A selection of scholarly literature on the topic "Dense Vision Tasks"

Format your source in APA, MLA, Chicago, Harvard, and other citation styles

Select a source type:

Consult the lists of current articles, books, dissertations, conference papers, and other scholarly sources on the topic "Dense Vision Tasks".

Next to each work in the reference list there is an "Add to bibliography" button. Use it, and we will automatically generate a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the publication as a .pdf file and read its abstract online, if the relevant details are available in the metadata.

Journal articles on the topic "Dense Vision Tasks"

1

Yao, Chao, Shuo Jin, Meiqin Liu, and Xiaojuan Ban. "Dense Residual Transformer for Image Denoising." Electronics 11, no. 3 (January 29, 2022): 418. http://dx.doi.org/10.3390/electronics11030418.

Abstract:
Image denoising is an important low-level computer vision task, which aims to reconstruct a noise-free, high-quality image from a noisy image. With the development of deep learning, convolutional neural networks (CNNs) have been gradually applied and have achieved great success in image denoising, image compression, image enhancement, etc. Recently, the Transformer has become a popular technique that is widely used to tackle computer vision tasks. However, few Transformer-based methods have been proposed for low-level vision tasks. In this paper, we propose an image denoising network structure based on the Transformer, named DenSformer. DenSformer consists of three modules: a preprocessing module, a local-global feature extraction module, and a reconstruction module. Specifically, the local-global feature extraction module consists of several Sformer groups, each of which has several ETransformer layers and a convolution layer, together with a residual connection. These Sformer groups are densely skip-connected to fuse the features of different layers, and they jointly capture the local and global information from the given noisy images. We evaluate our model in comprehensive experiments. In synthetic noise removal, DenSformer outperforms other state-of-the-art methods by up to 0.06–0.28 dB on gray-scale images and 0.57–1.19 dB on color images. In real noise removal, DenSformer achieves comparable performance while the number of parameters can be reduced by up to 40%. Experimental results show that our DenSformer achieves improvements over some state-of-the-art methods, for both synthetic and real noise data, in objective and subjective evaluations.
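The densely skip-connected fusion described in the abstract above can be sketched in a few lines of PyTorch. This is only an illustration: the Sformer/ETransformer internals are not reproduced, each group is stood in for by a plain convolutional block, and all layer sizes are assumed values rather than the paper's configuration.

import torch
import torch.nn as nn

class DenseGroupStack(nn.Module):
    """Feature-extraction groups whose outputs are densely fused, DenseNet-style."""
    def __init__(self, channels=64, num_groups=4):
        super().__init__()
        # Each plain conv block stands in for an Sformer group (transformer layers + conv).
        self.groups = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(channels, channels, 3, padding=1),
            )
            for _ in range(num_groups)
        ])
        # A 1x1 convolution fuses the concatenation of all group outputs
        # (the dense skip-connections between groups).
        self.fuse = nn.Conv2d(channels * num_groups, channels, 1)

    def forward(self, x):
        feats, h = [], x
        for group in self.groups:
            h = group(h) + h                           # residual connection inside each group
            feats.append(h)
        return self.fuse(torch.cat(feats, dim=1)) + x  # dense fusion plus a global residual

features = torch.randn(1, 64, 32, 32)                  # e.g. shallow features of a noisy image
print(DenseGroupStack()(features).shape)               # torch.Size([1, 64, 32, 32])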
2

Zhang, Qian, Yeqi Liu, Chuanyang Gong, Yingyi Chen, and Huihui Yu. "Applications of Deep Learning for Dense Scenes Analysis in Agriculture: A Review." Sensors 20, no. 5 (March 10, 2020): 1520. http://dx.doi.org/10.3390/s20051520.

Abstract:
Deep Learning (DL) is a state-of-the-art machine learning technology that shows superior performance in computer vision, bioinformatics, natural language processing, and other areas. Especially as a modern image processing technology, DL has been successfully applied in various tasks, such as object detection, semantic segmentation, and scene analysis. However, as dense scenes become more common in practice, their analysis becomes particularly challenging due to severe occlusions and the small size of objects. To overcome these problems, DL has recently been increasingly applied to dense scenes and has begun to be used in dense agricultural scenes. The purpose of this review is to explore the applications of DL for dense scene analysis in agriculture. In order to better elaborate the topic, we first describe the types of dense scenes in agriculture, as well as the challenges. Next, we introduce various popular deep neural networks used in these dense scenes. Then, the applications of these structures in various agricultural tasks are comprehensively introduced in this review, including recognition and classification, detection, counting, and yield estimation. Finally, the surveyed DL applications, their limitations, and future work for the analysis of dense images in agriculture are summarized.
3

Gan, Zhe, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, and Zicheng Liu. "Playing Lottery Tickets with Vision and Language." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 1 (June 28, 2022): 652–60. http://dx.doi.org/10.1609/aaai.v36i1.19945.

Abstract:
Large-scale pre-training has recently revolutionized vision-and-language (VL) research. Models such as LXMERT and UNITER have significantly lifted the state of the art over a wide range of VL tasks. However, the large number of parameters in such models hinders their application in practice. In parallel, work on the lottery ticket hypothesis (LTH) has shown that deep neural networks contain small matching subnetworks that can achieve on par or even better performance than the dense networks when trained in isolation. In this work, we perform the first empirical study to assess whether such trainable subnetworks also exist in pre-trained VL models. We use UNITER as the main testbed (also test on LXMERT and ViLT), and consolidate 7 representative VL tasks for experiments, including visual question answering, visual commonsense reasoning, visual entailment, referring expression comprehension, image-text retrieval, GQA, and NLVR2. Through comprehensive analysis, we summarize our main findings as follows. (i) It is difficult to find subnetworks that strictly match the performance of the full model. However, we can find relaxed winning tickets at 50%-70% sparsity that maintain 99% of the full accuracy. (ii) Subnetworks found by task-specific pruning transfer reasonably well to the other tasks, while those found on the pre-training tasks at 60%/70% sparsity transfer universally, matching 98%/96% of the full accuracy on average over all the tasks. (iii) Besides UNITER, other models such as LXMERT and ViLT can also play lottery tickets. However, the highest sparsity we can achieve for ViLT is far lower than LXMERT and UNITER (30% vs. 70%). (iv) LTH also remains relevant when using other training methods (e.g., adversarial training).
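The winning-ticket search described above rests on global magnitude pruning: keep the largest-magnitude weights at a target sparsity, rewind the survivors to their early values, and retrain. Below is a minimal sketch of the masking step only; it is a generic illustration, not the authors' UNITER/LXMERT/ViLT pruning code, and the toy model and the 70% sparsity level are assumptions chosen for demonstration.

import torch

def magnitude_mask(model, sparsity):
    """Return {parameter name: 0/1 mask} keeping the largest-magnitude weights globally."""
    prunable = {n: p.detach().abs() for n, p in model.named_parameters() if p.dim() > 1}
    all_scores = torch.cat([s.flatten() for s in prunable.values()])
    k = int(sparsity * all_scores.numel())          # number of weights to remove
    threshold = all_scores.kthvalue(k).values if k > 0 else all_scores.min() - 1
    return {n: (s > threshold).float() for n, s in prunable.items()}

net = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
masks = magnitude_mask(net, sparsity=0.7)           # e.g. the 70% sparsity regime discussed above
kept = sum(m.sum().item() for m in masks.values())
total = sum(m.numel() for m in masks.values())
print(f"kept {kept / total:.0%} of prunable weights")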
4

Dinh, My-Tham, Deok-Jai Choi, and Guee-Sang Lee. "DenseTextPVT: Pyramid Vision Transformer with Deep Multi-Scale Feature Refinement Network for Dense Text Detection." Sensors 23, no. 13 (June 25, 2023): 5889. http://dx.doi.org/10.3390/s23135889.

Abstract:
Detecting dense text in scene images is a challenging task due to the high variability, complexity, and overlapping of text areas. To adequately distinguish text instances with high density in scenes, we propose an efficient approach called DenseTextPVT. We first generated high-resolution features at different levels to enable accurate dense text detection, which is essential for dense prediction tasks. Additionally, to enhance the feature representation, we designed the Deep Multi-scale Feature Refinement Network (DMFRN), which effectively detects texts of varying sizes, shapes, and fonts, including small-scale texts. DenseTextPVT, then, is inspired by Pixel Aggregation (PA) similarity vector algorithms to cluster text pixels into correct text kernels in the post-processing step. In this way, our proposed method enhances the precision of text detection and effectively reduces overlapping between text regions under dense adjacent text in natural images. The comprehensive experiments indicate the effectiveness of our method on the TotalText, CTW1500, and ICDAR-2015 benchmark datasets in comparison to existing methods.
5

Pan, Zizheng, Bohan Zhuang, Haoyu He, Jing Liu, and Jianfei Cai. "Less Is More: Pay Less Attention in Vision Transformers." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 2 (June 28, 2022): 2035–43. http://dx.doi.org/10.1609/aaai.v36i2.20099.

Abstract:
Transformers have become one of the dominant architectures in deep learning, particularly as a powerful alternative to convolutional neural networks (CNNs) in computer vision. However, Transformer training and inference in previous works can be prohibitively expensive due to the quadratic complexity of self-attention over a long sequence of representations, especially for high-resolution dense prediction tasks. To this end, we present a novel Less attention vIsion Transformer (LIT), building upon the fact that the early self-attention layers in Transformers still focus on local patterns and bring minor benefits in recent hierarchical vision Transformers. Specifically, we propose a hierarchical Transformer where we use pure multi-layer perceptrons (MLPs) to encode rich local patterns in the early stages while applying self-attention modules to capture longer dependencies in deeper layers. Moreover, we further propose a learned deformable token merging module to adaptively fuse informative patches in a non-uniform manner. The proposed LIT achieves promising performance on image recognition tasks, including image classification, object detection and instance segmentation, serving as a strong backbone for many vision tasks. Code is available at https://github.com/zip-group/LIT.
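The core idea above, cheap token mixing in the early high-resolution stages and self-attention only in the deeper stages, can be sketched as follows. The dimensions, depths, and head counts are illustrative assumptions and do not reproduce the LIT configuration or its deformable token merging.

import torch
import torch.nn as nn

class MLPBlock(nn.Module):
    """Attention-free block used in early stages: only a per-token MLP."""
    def __init__(self, dim):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):                            # x: (batch, tokens, dim)
        return x + self.mlp(self.norm(x))

class AttnBlock(nn.Module):
    """Self-attention block reserved for deeper stages to capture long-range context."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        h = self.norm(x)
        return x + self.attn(h, h, h, need_weights=False)[0]

backbone = nn.Sequential(MLPBlock(96), MLPBlock(96), AttnBlock(96), AttnBlock(96))
tokens = torch.randn(2, 196, 96)                     # (batch, tokens, dim)
print(backbone(tokens).shape)                        # torch.Size([2, 196, 96])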
6

CASCO, CLARA, GIANLUCA CAMPANA, ALBA GRIECO, SILVANA MUSETTI, and SALVATORE PERRONE. "Hyper-vision in a patient with central and paracentral vision loss reflects cortical reorganization." Visual Neuroscience 20, no. 5 (September 2003): 501–10. http://dx.doi.org/10.1017/s0952523803205046.

Abstract:
SM, a 21-year-old female, presents an extensive central scotoma (30 deg) with dense absolute scotoma (visual acuity = 10/100) in the macular area (10 deg) due to Stargardt's disease. We provide behavioral evidence of cortical plastic reorganization since the patient could perform several visual tasks with her poor-vision eyes better than controls, although high spatial frequency sensitivity and visual acuity are severely impaired. Between 2.5-deg and 12-deg eccentricities, SM presented (1) normal acuity for crowded letters, provided stimulus size is above acuity thresholds for single letters; (2) a two-fold sensitivity increase (d-prime) with respect to controls in a simple search task; and (3) largely above-threshold performance in a lexical decision task carried out randomly by controls. SM's hyper-vision may reflect a long-term sensory gain specific for unimpaired low spatial-frequency mechanisms, which may result from modifications in response properties due to practice-dependent changes in excitatory/inhibitory intracortical connections.
7

Zhang, Xu, DeZhi Han, and Chin-Chen Chang. "RDMMFET: Representation of Dense Multimodality Fusion Encoder Based on Transformer." Mobile Information Systems 2021 (October 18, 2021): 1–9. http://dx.doi.org/10.1155/2021/2662064.

Abstract:
Visual question answering (VQA) is natural-language question answering about visual images. A VQA model must produce answers to specific questions based on its understanding of an image; the most important requirement is understanding the relationship between images and language. Therefore, this paper proposes a new model, the Representation of Dense Multimodality Fusion Encoder Based on Transformer (RDMMFET for short), which can learn the related knowledge between vision and language. The RDMMFET model consists of three parts: a dense language encoder, an image encoder, and a multimodality fusion encoder. In addition, we designed three types of pretraining tasks: masked language modeling, masked image modeling, and a multimodality fusion task. These pretraining tasks help the model learn the fine-grained alignment between text and image regions. Results on the VQA v2.0 data set show that the RDMMFET model works better than previous models. Finally, we conducted detailed ablation studies on the RDMMFET model and provided attention visualizations, which demonstrate that the RDMMFET model can significantly improve VQA performance.
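Of the three pretraining tasks listed above, the masked language model is the simplest to illustrate: random tokens are replaced by a mask id and the model is trained to recover them. The sketch below is generic; the vocabulary size, mask id, and masking probability are assumed example values, not the paper's settings.

import torch

def mask_tokens(token_ids, mask_id, p=0.15):
    """Return (masked_input, labels); labels are -100 wherever no prediction is required."""
    labels = token_ids.clone()
    mask = torch.rand(token_ids.shape) < p
    labels[~mask] = -100                   # positions ignored by nn.CrossEntropyLoss
    masked = token_ids.clone()
    masked[mask] = mask_id                 # replace the selected tokens with the mask id
    return masked, labels

ids = torch.randint(0, 30522, (2, 16))               # a toy batch of token ids
inputs, labels = mask_tokens(ids, mask_id=103)       # e.g. [MASK] in a BERT-style vocabulary
print((inputs == 103).sum().item(), "tokens masked")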
8

Li, Bin, Haifeng Ye, Sihan Fu, Xiaojin Gong, and Zhiyu Xiang. "UnVELO: Unsupervised Vision-Enhanced LiDAR Odometry with Online Correction." Sensors 23, no. 8 (April 13, 2023): 3967. http://dx.doi.org/10.3390/s23083967.

Abstract:
Due to the complementary characteristics of visual and LiDAR information, these two modalities have been fused to facilitate many vision tasks. However, current studies of learning-based odometries mainly focus on either the visual or LiDAR modality, leaving visual–LiDAR odometries (VLOs) under-explored. This work proposes a new method to implement an unsupervised VLO, which adopts a LiDAR-dominant scheme to fuse the two modalities. We, therefore, refer to it as unsupervised vision-enhanced LiDAR odometry (UnVELO). It converts 3D LiDAR points into a dense vertex map via spherical projection and generates a vertex color map by colorizing each vertex with visual information. Further, a point-to-plane distance-based geometric loss and a photometric-error-based visual loss are, respectively, placed on locally planar regions and cluttered regions. Last, but not least, we designed an online pose-correction module to refine the pose predicted by the trained UnVELO during test time. In contrast to the vision-dominant fusion scheme adopted in most previous VLOs, our LiDAR-dominant method adopts the dense representations for both modalities, which facilitates the visual–LiDAR fusion. Besides, our method uses the accurate LiDAR measurements instead of the predicted noisy dense depth maps, which significantly improves the robustness to illumination variations, as well as the efficiency of the online pose correction. The experiments on the KITTI and DSEC datasets showed that our method outperformed previous two-frame-based learning methods. It was also competitive with hybrid methods that integrate a global optimization on multiple or all frames.
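The projection step described above, turning a 3D LiDAR sweep into a dense vertex map via spherical projection, can be sketched as follows. The field-of-view limits and image resolution are assumptions made for illustration, not the sensor parameters used in the paper.

import numpy as np

def spherical_projection(points, h=64, w=1024, fov_up_deg=3.0, fov_down_deg=-25.0):
    """Project an (N, 3) point cloud to an (h, w, 3) vertex map holding (x, y, z) per pixel."""
    fov_up, fov_down = np.radians(fov_up_deg), np.radians(fov_down_deg)
    fov = fov_up - fov_down
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1) + 1e-8
    yaw = np.arctan2(y, x)                                  # azimuth in [-pi, pi]
    pitch = np.arcsin(z / depth)                            # elevation
    u = ((1.0 - (yaw + np.pi) / (2 * np.pi)) * w).astype(np.int32) % w
    v = ((fov_up - pitch) / fov * h).clip(0, h - 1).astype(np.int32)
    vertex_map = np.zeros((h, w, 3), dtype=np.float32)
    vertex_map[v, u] = points                               # later points overwrite earlier ones
    return vertex_map

cloud = np.random.randn(2048, 3) * np.array([20.0, 20.0, 1.5])   # a toy point cloud
print(spherical_projection(cloud).shape)                          # (64, 1024, 3)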
9

Liang, Junling, Heng Li, Fei Xu, Jianpin Chen, Meixuan Zhou, Liping Yin, Zhenzhen Zhai, and Xinyu Chai. "A Fast Deployable Instance Elimination Segmentation Algorithm Based on Watershed Transform for Dense Cereal Grain Images." Agriculture 12, no. 9 (September 16, 2022): 1486. http://dx.doi.org/10.3390/agriculture12091486.

Abstract:
Cereal grains are a vital part of the human diet. The appearance quality and size distribution of cereal grains play major roles as determinants or indicators of market acceptability, storage stability, and breeding. Computer vision is widely used for quality assessment and size analysis tasks, in which accurate instance segmentation is a key step to completing the tasks smoothly. This study proposes a fast, deployable instance segmentation method based on a generative marker-based watershed segmentation algorithm, which combines two strategies (one for optimizing kernel areas and another for comprehensive segmentation) to overcome the problems of over-segmentation and under-segmentation in images with dense and small targets. Results show that the average segmentation accuracy of our method reaches 98.73%, which is significantly higher than that of the marker-based watershed segmentation algorithm (82.98%). To further verify the engineering practicality of our method, we compute the size distribution of the segmented cereal grains. The results are highly consistent with the manually sketched ground truth. Moreover, our proposed algorithm framework can serve as a useful reference for other segmentation tasks involving dense targets.
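For context, the generic marker-based watershed recipe that the paper builds on can be sketched with OpenCV as follows: distance-transform peaks act as per-object markers ("kernel areas"), and the watershed flooding separates touching objects. The threshold values and the toy two-circle image are illustrative assumptions; the paper's kernel-area optimization and comprehensive-segmentation strategies are not reproduced here.

import cv2
import numpy as np

def split_touching_objects(gray):
    """Return a label map in which each touching object receives its own integer label."""
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # Peaks of the distance transform act as per-object markers.
    dist = cv2.distanceTransform(binary, cv2.DIST_L2, 5)
    _, sure_fg = cv2.threshold(dist, 0.7 * dist.max(), 255, cv2.THRESH_BINARY)
    sure_fg = sure_fg.astype(np.uint8)
    sure_bg = cv2.dilate(binary, np.ones((3, 3), np.uint8), iterations=3)
    unknown = cv2.subtract(sure_bg, sure_fg)           # region the watershed must decide
    _, markers = cv2.connectedComponents(sure_fg)
    markers = markers + 1                              # background becomes 1, objects 2..K
    markers[unknown == 255] = 0                        # 0 marks the undecided region
    color = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
    return cv2.watershed(color, markers)               # boundaries are labeled -1

img = np.zeros((120, 120), np.uint8)
cv2.circle(img, (40, 60), 25, 255, -1)
cv2.circle(img, (80, 60), 25, 255, -1)                 # two overlapping "grains"
print(np.unique(split_touching_objects(img)))          # [-1  1  2  3]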
10

Wang, Yaming, Minjie Wang, Wenqing Huang, Xiaoping Ye, and Mingfeng Jiang. "Deep Spatial-Temporal Neural Network for Dense Non-Rigid Structure from Motion." Mathematics 10, no. 20 (October 14, 2022): 3794. http://dx.doi.org/10.3390/math10203794.

Abstract:
Dense non-rigid structure from motion (NRSfM) has long been a challenge in computer vision because of the vast number of feature points. As neural networks develop rapidly, novel solutions are emerging. However, existing methods ignore the significance of spatial–temporal data and the strong learning capacity of neural networks. This study proposes a deep spatial–temporal NRSfM framework (DST-NRSfM) and introduces a weighted spatial constraint to further optimize the 3D reconstruction results. Layer normalization layers are applied in dense NRSfM tasks to prevent vanishing gradients and accelerate network convergence. Our DST-NRSfM framework outperforms both classical approaches and recent advancements. It achieves state-of-the-art performance across commonly used synthetic and real benchmark datasets.

Dissertations on the topic "Dense Vision Tasks"

1

Kundu, Jogendra Nath. "Self-Supervised Domain Adaptation Frameworks for Computer Vision Tasks." Thesis, 2022. https://etd.iisc.ac.in/handle/2005/5782.

Abstract:
There is a strong incentive to build intelligent machines that can understand and adapt to changes in the visual world without human supervision. While humans and animals learn to perceive the world on their own, almost all state-of-the-art vision systems heavily rely on external supervision from millions of manually annotated training examples. Gathering such large-scale manual annotations for structured vision tasks, such as monocular depth estimation, scene segmentation, and human pose estimation, faces several practical limitations. Usually, the annotations are gathered in two broad ways: 1) via specialized instruments (sensors) or laboratory setups, or 2) via manual annotation. Both processes have several drawbacks. While human annotations are expensive, scarce, or error-prone, instrument-based annotations are often noisy or limited to specific laboratory environments. Such limitations not only stand as a major bottleneck in our efforts to gather unambiguous ground truth but also limit the diversity of the collected labeled datasets. This motivates us to develop innovative ways to utilize synthetic environments to create labeled synthetic datasets with noise-free, unambiguous ground truths. However, the performance of models trained on such synthetic data markedly degrades when tested on real-world samples due to input distribution shift (a.k.a. domain shift). Unsupervised domain adaptation (DA) seeks learning techniques that can minimize the domain discrepancy between a labeled source and an unlabeled target. However, it mostly remains unexplored for challenging structured-prediction vision tasks. Motivated by the above observations, my research focuses on addressing the following key aspects: (1) developing algorithms that support improved transferability to domain and task shifts, (2) leveraging inter-entity or cross-modal relationships to develop self-supervised objectives, and (3) instilling natural priors to constrain the model output within the realm of natural distributions. First, we present AdaDepth, an unsupervised domain adaptation (DA) strategy for the pixel-wise regression task of monocular depth estimation. Mode collapse is a common phenomenon observed during adversarial training in the absence of paired supervision. Without access to target depth maps, we address this challenge using a novel content-congruent regularization technique. In a follow-up work, we introduced UM-Adapt, a unified framework that addresses two distinct objectives in a multi-task adaptation setting, i.e., a) achieving balanced performance across all tasks and b) performing domain adaptation in an unsupervised setting. This is realized using two novel regularization strategies: contour-based content regularization and exploitation of inter-task coherency via a novel cross-task distillation module. Moving forward, we identified certain key issues in existing domain adaptation algorithms that hinder their practical deployability to a large extent. Existing approaches demand the coexistence of source and target data, which is highly impractical in scenarios where data sharing is restricted due to proprietary or privacy concerns. To address this, we propose a new setting termed Source-Free DA and tailored learning protocols for the dense prediction task of semantic segmentation and for image classification, both with and without category shift. Further, we investigate the problem of self-supervised domain adaptation for the challenging monocular 3D human pose estimation task.
The key differentiating factor in our approach is the idea of infusing model-based structural prior as a means to constrain the pose estimation predictions within the realm of natural pose and shape distributions. Towards self-supervised learning, our contribution lies in the effective use of new inter-entity relationships to discern the co-salient foreground appearance and thereby the corresponding pose from just a pair of images having diverse backgrounds. Unlike self-supervised solutions that aim for better generalization, self-adaptive solutions aim for target-specific adaptation, i.e., adaptation to deployment-specific environmental attributes. To this end, we propose a self-adaptive method to align the latent space of human pose from unpaired image-to-latent and the pose-to-latent, by enforcing well-formed non-local latent space rules available for unpaired image (or video) and pose (or motion) domains. This idea of non-local relation distillation against the broadly employed general contrastive learning techniques shows significant improvements in the self-adaptation performance. Further, in a recent work, we propose a novel way to effectively utilize uncertainty estimation for out-of-distribution (OOD) detection, and thus enabling inference-time self-adaptation. The ability to discern OOD samples allows a model to assess when to perform re-adaptation while deployed in a continually changing environment. Such solutions are in high demand for enabling effective real-world deployment across various industries, from virtual and augmented reality to gaming and health-care applications.
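As general background to the adversarial alignment used in parts of the thesis (e.g., AdaDepth), unsupervised domain adaptation is often implemented with a gradient-reversal layer: the feature extractor learns to fool a domain classifier because the classifier's gradient is flipped before it reaches the features. The sketch below shows only this standard building block (Ganin and Lempitsky, 2015) and is not the thesis's actual code.

import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; multiplies the gradient by -lam in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lam * grad_output, None            # no gradient for lam itself

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)

feats = torch.randn(4, 8, requires_grad=True)          # features fed to a domain classifier
grad_reverse(feats, lam=0.5).sum().backward()
print(feats.grad[0, 0].item())                         # -0.5: the gradient is flipped and scaled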

Book chapters on the topic "Dense Vision Tasks"

1

Appe, Seetharam Nagesh, G. Arulselvi, and Balaji G. N. "Detection and Classification of Dense Tomato Fruits by Integrating Coordinate Attention Mechanism With YOLO Model." In Advances in Computational Intelligence and Robotics, 278–89. IGI Global, 2023. http://dx.doi.org/10.4018/978-1-6684-8098-4.ch016.

Abstract:
Real-time object detection is one of the important tasks in computer vision applications such as agriculture, surveillance, self-driving cars, etc. The fruit detection rate of traditional approaches is low due to complex backgrounds, substantial texture interference, partial occlusion of fruits, etc. This chapter proposes an improved YOLOv5 model that detects and classifies dense tomatoes by adding a coordinate attention mechanism and a bidirectional pyramid network. The coordinate attention mechanism is used to detect and classify the dense tomatoes, and the bidirectional pyramid network is used to detect tomatoes at different scales. The proposed model produces good results in detecting small, dense tomatoes, with an accuracy of 87.4%.
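A coordinate attention block of the kind this chapter adds to YOLOv5 pools features separately along the height and width directions and re-weights the input with position-aware attention maps. The sketch below follows the generic formulation of coordinate attention (Hou et al., 2021); the channel count and reduction ratio are assumed values, and it is not the chapter's exact module.

import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))   # pool along width  -> (B, C, H, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))   # pool along height -> (B, C, 1, W)
        self.conv1 = nn.Conv2d(channels, mid, 1)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, channels, 1)
        self.conv_w = nn.Conv2d(mid, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        xh = self.pool_h(x)                             # (B, C, H, 1)
        xw = self.pool_w(x).permute(0, 1, 3, 2)         # (B, C, W, 1)
        y = self.act(self.conv1(torch.cat([xh, xw], dim=2)))         # encode both directions jointly
        yh, yw = torch.split(y, [h, w], dim=2)
        att_h = torch.sigmoid(self.conv_h(yh))                       # (B, C, H, 1)
        att_w = torch.sigmoid(self.conv_w(yw.permute(0, 1, 3, 2)))   # (B, C, 1, W)
        return x * att_h * att_w                         # position-aware channel re-weighting

out = CoordinateAttention(64)(torch.randn(1, 64, 40, 40))
print(out.shape)                                         # torch.Size([1, 64, 40, 40])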

Conference papers on the topic "Dense Vision Tasks"

1

Levinshtein, Alex, Alborz Rezazadeh Sereshkeh, and Konstantinos G. Derpanis. "DATNet: Dense Auxiliary Tasks for Object Detection." In 2020 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2020. http://dx.doi.org/10.1109/wacv45572.2020.9093325.

2

Jeeveswaran, Kishaan, Senthilkumar Kathiresan, Arnav Varma, Omar Magdy, Bahram Zonooz, and Elahe Arani. "A Comprehensive Study of Vision Transformers on Dense Prediction Tasks." In 17th International Conference on Computer Vision Theory and Applications. SCITEPRESS - Science and Technology Publications, 2022. http://dx.doi.org/10.5220/0010917800003124.

3

Li, Wei-Hong, Xialei Liu, and Hakan Bilen. "Learning Multiple Dense Prediction Tasks from Partially Annotated Data." In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2022. http://dx.doi.org/10.1109/cvpr52688.2022.01831.

4

Takahashi, Naoya, and Yuki Mitsufuji. "Densely connected multidilated convolutional networks for dense prediction tasks." In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2021. http://dx.doi.org/10.1109/cvpr46437.2021.00105.

5

Jung, HyunJun, Patrick Ruhkamp, Guangyao Zhai, Nikolas Brasch, Yitong Li, Yannick Verdie, Jifei Song, et al. "On the Importance of Accurate Geometry Data for Dense 3D Vision Tasks." In 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2023. http://dx.doi.org/10.1109/cvpr52729.2023.00082.

6

Lai, Shenqi, Xi Du, Jia Guo, and Kaipeng Zhang. "RaMLP: Vision MLP via Region-aware Mixing." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/111.

Abstract:
Recently, MLP-based architectures have achieved impressive results in image classification compared with CNNs and ViTs. However, they have an obvious limitation: their parameters are tied to the image size, so they can only process images of a fixed size. Therefore, they cannot be directly adapted to dense prediction tasks (e.g., object detection and semantic segmentation) where images come in various sizes. Recent methods have tried to address this but introduced two new problems: long-range dependencies or important visual cues are ignored. This paper presents a new MLP-based architecture, Region-aware MLP (RaMLP), to serve various vision tasks and address the above three problems. In particular, we propose a well-designed module, Region-aware Mixing (RaM). RaM captures important local information and further aggregates these important visual cues. Based on RaM, RaMLP achieves a global receptive field even within one block. It is worth noting that, unlike most existing MLP-based architectures that apply the same spatial weights to all samples, RaM is region-aware and adaptively determines weights to better extract region-level features. Impressively, our RaMLP outperforms state-of-the-art ViTs, CNNs, and MLPs on both ImageNet-1K image classification and downstream dense prediction tasks, including MS-COCO object detection, MS-COCO instance segmentation, and ADE20K semantic segmentation. In particular, RaMLP outperforms MLPs by a large margin (around 1.5% APb or 1.0% mIoU) on dense prediction tasks. The training code can be found at https://github.com/xiaolai-sqlai/RaMLP.
7

Schuster, Rene, Oliver Wasenmuller, Christian Unger, and Didier Stricker. "SDC – Stacked Dilated Convolution: A Unified Descriptor Network for Dense Matching Tasks." In 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2019. http://dx.doi.org/10.1109/cvpr.2019.00266.

8

Swain, Michael J., and Lambert E. Wixson. "Efficient Estimation for Markov Random Fields." In Image Understanding and Machine Vision. Washington, D.C.: Optica Publishing Group, 1989. http://dx.doi.org/10.1364/iumv.1989.wc2.

Abstract:
The problem of assigning labels from a fixed set to each member of a set of sites appears at all levels of computer vision. Recently, an optimization algorithm known as Highest Confidence First (HCF) [Chou, 1988] has been applied to labeling tasks in low-level vision. Examples of such tasks include edge detection, in which each inter-pixel site must be labeled as either edge or non-edge, and the integration of intensity and sparse depth data for the labeling of depth discontinuities and the generation of dense depth estimates. In these tasks, it often outperforms conventional optimization techniques such as simulated annealing [Geman and Geman, 1984], Monte Carlo sampling [Marroquin, 1985], and Iterated Conditional Modes (ICM) estimation [Besag, 1986].
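To make the labeling setup concrete, the sketch below runs Iterated Conditional Modes, one of the baseline estimators the abstract compares HCF against, on a toy binary MRF for image denoising: each site greedily takes the label that minimizes a unary data term plus a pairwise smoothness term. The weights and the toy image are arbitrary illustration values; this is not the HCF algorithm itself.

import numpy as np

def icm_denoise(noisy, beta=2.0, iters=5):
    """Greedy per-site updates: each pixel takes the label with the lowest local energy."""
    labels = noisy.copy()
    h, w = labels.shape
    for _ in range(iters):
        for i in range(h):
            for j in range(w):
                best_lab, best_e = labels[i, j], np.inf
                for lab in (0, 1):
                    unary = 0.0 if lab == noisy[i, j] else 1.0          # data term
                    pair = sum(beta * (lab != labels[ni, nj])           # smoothness term
                               for ni, nj in ((i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1))
                               if 0 <= ni < h and 0 <= nj < w)
                    if unary + pair < best_e:
                        best_lab, best_e = lab, unary + pair
                labels[i, j] = best_lab
    return labels

rng = np.random.default_rng(0)
clean = np.zeros((32, 32), dtype=int)
clean[8:24, 8:24] = 1                                    # a clean binary square
noisy = np.where(rng.random(clean.shape) < 0.1, 1 - clean, clean)
print("errors before:", (noisy != clean).sum(), "after:", (icm_denoise(noisy) != clean).sum())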
9

Kobina, Piriziwè, Thierry Duval, Laurent Brisson, and Anthony David. "Human-centered Evaluation of 3D Radial Layouts for Centrality Visualization." In WSCG'2022 - 30. International Conference in Central Europe on Computer Graphics, Visualization and Computer Vision'2022. Západočeská univerzita, 2022. http://dx.doi.org/10.24132/csrn.3201.9.

Abstract:
In this paper we propose improvements to 3D radial layouts that make it possible to visualize centrality measures of the nodes in a graph. Our improvements mainly relate to edge drawing and to the evaluation of the 3D radial layouts. First, we projected not only the nodes but also the edges onto the visualization surfaces in order to reduce the node overlap that could be observed in previous 3D radial layouts. Second, we proposed a human-centered evaluation in order to compare the efficiency score and the task completion time of the 3D radial layouts to those of the 2D radial layouts. The proposed evaluation tasks are related to the central nodes, the peripheral nodes, and the dense areas of a graph. The results showed that 3D layouts can perform significantly better than 2D layouts in terms of efficiency when tasks are related to the central and peripheral nodes, while the difference in time is not statistically significant between these layouts. Additionally, we found that the participants preferred interacting with 3D layouts over 2D layouts.
10

Suo, Wei, MengYang Sun, Peng Wang, and Qi Wu. "Proposal-free One-stage Referring Expression via Grid-Word Cross-Attention." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/143.

Abstract:
Referring Expression Comprehension (REC) has become one of the most important tasks in visual reasoning, since it is an essential step for many vision-and-language tasks such as visual question answering. However, it has not been widely used in many downstream tasks because 1) two-stage methods incur heavy computation cost and inevitable error accumulation, and 2) one-stage methods have to depend on many hyper-parameters (such as anchors) to generate bounding boxes. In this paper, we present a proposal-free one-stage (PFOS) model that is able to regress the region of interest from the image, based on a textual query, in an end-to-end manner. Instead of the dominant anchor-proposal fashion, we directly take the dense grid of the image as input for a cross-attention transformer that learns grid-word correspondences. The final bounding box is predicted directly from the image without the time-consuming anchor-selection process that previous methods suffer from. Our model achieves state-of-the-art performance on four referring expression datasets with higher efficiency, compared to the previous best one-stage and two-stage methods.
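The anchor-free idea above, letting a dense image grid attend to the words of the query and regressing one box directly, can be sketched as follows. The feature dimensions, the mean pooling, and the box head are assumptions made for illustration and do not reproduce the PFOS architecture.

import torch
import torch.nn as nn

class GridWordCrossAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.box_head = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, 4))

    def forward(self, grid_feats, word_feats):
        # grid_feats: (B, H*W, dim) dense image grid; word_feats: (B, T, dim) query tokens.
        attended, _ = self.cross(grid_feats, word_feats, word_feats)   # grid attends to words
        pooled = attended.mean(dim=1)                # no anchors, no proposal generation
        return self.box_head(pooled).sigmoid()       # (B, 4) normalized box coordinates

model = GridWordCrossAttention()
box = model(torch.randn(2, 19 * 19, 256), torch.randn(2, 12, 256))
print(box.shape)                                     # torch.Size([2, 4])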