Academic literature on the topic 'Scene parsing'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Scene parsing.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Scene parsing"

1

Shi, Hengcan, Hongliang Li, Fanman Meng, Qingbo Wu, Linfeng Xu, and King Ngi Ngan. "Hierarchical Parsing Net: Semantic Scene Parsing From Global Scene to Objects." IEEE Transactions on Multimedia 20, no. 10 (October 2018): 2670–82. http://dx.doi.org/10.1109/tmm.2018.2812600.

2

Liu, Ce, Jenny Yuen, and Antonio Torralba. "Nonparametric Scene Parsing via Label Transfer." IEEE Transactions on Pattern Analysis and Machine Intelligence 33, no. 12 (December 2011): 2368–82. http://dx.doi.org/10.1109/tpami.2011.131.

3

Zhang, Rui, Sheng Tang, Yongdong Zhang, Jintao Li, and Shuicheng Yan. "Perspective-Adaptive Convolutions for Scene Parsing." IEEE Transactions on Pattern Analysis and Machine Intelligence 42, no. 4 (April 1, 2020): 909–24. http://dx.doi.org/10.1109/tpami.2018.2890637.

4

Li, Xuelong, Lichao Mou, and Xiaoqiang Lu. "Scene Parsing From an MAP Perspective." IEEE Transactions on Cybernetics 45, no. 9 (September 2015): 1876–86. http://dx.doi.org/10.1109/tcyb.2014.2361489.

5

Zhang, Botao, Tao Hong, Rong Xiong, and Sergey A. Chepinskiy. "A terrain segmentation method based on pyramid scene parsing-mobile network for outdoor robots." International Journal of Advanced Robotic Systems 18, no. 5 (September 1, 2021): 172988142110486. http://dx.doi.org/10.1177/17298814211048633.

Abstract:
Terrain segmentation is of great significance to robot navigation, cognition, and map building. However, existing vision-based methods struggle to meet both high-accuracy and real-time requirements. A terrain segmentation method based on a novel lightweight pyramid scene parsing mobile network is proposed for robot navigation. It combines the feature extraction structure of MobileNet with the encoding path of the pyramid scene parsing network. Depthwise separable convolution, spatial pyramid pooling, and feature fusion are employed to reduce the onboard computing time of the pyramid scene parsing mobile network. A unique data set, the Hangzhou Dianzi University Terrain Dataset, is constructed for terrain segmentation; it contains more than 4000 images from 10 different scenes. The data set was collected from a robot’s perspective to make it more suitable for robotic applications. Experimental results show that the proposed method achieves high accuracy and real-time performance on the onboard computer. Moreover, its real-time performance is better than that of most state-of-the-art methods for terrain segmentation.
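The abstract combines two standard building blocks. A minimal PyTorch sketch of both, assuming illustrative layer sizes rather than the paper's actual configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthwiseSeparableConv(nn.Module):
    """MobileNet-style block: depthwise 3x3 conv, then 1x1 pointwise conv."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return F.relu(self.bn(self.pointwise(self.depthwise(x))))

class PyramidPooling(nn.Module):
    """PSPNet-style head: pool at several grid sizes, project, upsample, fuse."""
    def __init__(self, in_ch, bins=(1, 2, 3, 6)):
        super().__init__()
        self.stages = nn.ModuleList(
            nn.Sequential(nn.AdaptiveAvgPool2d(b),
                          nn.Conv2d(in_ch, in_ch // len(bins), 1, bias=False))
            for b in bins)

    def forward(self, x):
        h, w = x.shape[2:]
        pooled = [F.interpolate(stage(x), size=(h, w), mode="bilinear",
                                align_corners=False) for stage in self.stages]
        return torch.cat([x] + pooled, dim=1)  # multi-scale feature fusion

feat = DepthwiseSeparableConv(3, 64)(torch.randn(1, 3, 64, 64))
fused = PyramidPooling(64)(feat)  # shape (1, 128, 64, 64)
```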
6

Chen, Xiaoyu, Chuan Wang, Jun Lu, Lianfa Bai, and Jing Han. "Road-Scene Parsing Based on Attentional Prototype-Matching." Sensors 22, no. 16 (August 17, 2022): 6159. http://dx.doi.org/10.3390/s22166159.

Abstract:
Road-scene parsing is complex and changeable; interference in the background destroys the visual structure in the image data, increasing the difficulty of target detection. The key to addressing road-scene parsing is to amplify the feature differences between the targets, as well as those between the targets and the background. This paper proposes a novel scene-parsing network, the Attentional Prototype-Matching Network (APMNet), to segment targets by matching candidate features with target prototypes regressed from labeled road-scene data. To obtain reliable target prototypes, we designed the Sample-Selection and Class-Repellence Algorithms for the prototype-regression process. We also built class-to-class and target-to-background attention mechanisms to increase feature recognizability based on the target’s visual characteristics and spatial-target distribution. Experiments conducted on two road-scene datasets, CamVid and Cityscapes, demonstrate that our approach effectively improves the representation of targets and achieves impressive results compared with other approaches.
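A toy NumPy sketch of the prototype-matching step the abstract describes (cosine similarity between pixel embeddings and class prototypes); APMNet's prototype regression and attention mechanisms are omitted, and all names and sizes are illustrative:

```python
import numpy as np

def match_prototypes(features, prototypes):
    """features: (H, W, D) pixel embeddings; prototypes: (C, D) class vectors.
    Returns an (H, W) label map via cosine similarity."""
    f = features / (np.linalg.norm(features, axis=-1, keepdims=True) + 1e-8)
    p = prototypes / (np.linalg.norm(prototypes, axis=-1, keepdims=True) + 1e-8)
    scores = np.einsum("hwd,cd->hwc", f, p)  # per-pixel, per-class similarity
    return scores.argmax(axis=-1)

labels = match_prototypes(np.random.rand(4, 4, 8), np.random.rand(3, 8))
```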
7

Zhang, Pingping, Wei Liu, Yinjie Lei, Hongyu Wang, and Huchuan Lu. "Deep Multiphase Level Set for Scene Parsing." IEEE Transactions on Image Processing 29 (2020): 4556–67. http://dx.doi.org/10.1109/tip.2019.2957915.

8

Boutell, Matthew R., Jiebo Luo, and Christopher M. Brown. "Scene Parsing Using Region-Based Generative Models." IEEE Transactions on Multimedia 9, no. 1 (January 2007): 136–46. http://dx.doi.org/10.1109/tmm.2006.886372.

9

Hager, Gregory D., and Ben Wegbreit. "Scene parsing using a prior world model." International Journal of Robotics Research 30, no. 12 (June 3, 2011): 1477–507. http://dx.doi.org/10.1177/0278364911399340.

10

Bu, Shuhui, Pengcheng Han, Zhenbao Liu, and Junwei Han. "Scene parsing using inference Embedded Deep Networks." Pattern Recognition 59 (November 2016): 188–98. http://dx.doi.org/10.1016/j.patcog.2016.01.027.


Dissertations / Theses on the topic "Scene parsing"

1

Zhao, Hang. "Visual and auditory scene parsing." Thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122101.

Abstract:
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Thesis: Ph. D. in Mechanical Engineering and Computation, Massachusetts Institute of Technology, Department of Mechanical Engineering, 2019
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 121-132).
Scene parsing is a fundamental topic in computer vision and computational audition, where people develop computational approaches to match the human perceptual system's ability to understand scenes, e.g., grouping the visual regions of an image into objects and segregating sound components in a noisy environment. This thesis investigates fully-supervised and self-supervised machine learning approaches to parse visual and auditory signals, including images, videos, and audio. Visual scene parsing refers to densely grouping and labeling image regions as object concepts. First, I build the MIT scene parsing benchmark based on a large-scale, densely annotated dataset, ADE20K. This benchmark, together with the state-of-the-art models we open-source, offers a powerful tool for the research community to solve semantic and instance segmentation tasks. Then I investigate the challenge of parsing a large number of object categories in the wild. An open vocabulary scene parsing model which combines a convolutional neural network with a structured knowledge graph is proposed to address the challenge. Auditory scene parsing refers to recognizing and decomposing sound components in complex auditory environments. I propose a general audio-visual self-supervised learning framework that learns from a large number of unlabeled internet videos. The learning process discovers the natural synchronization of vision and sounds without human annotation. The learned model is able to localize sound sources in videos and separate them from the mixture. Furthermore, I demonstrate that motion cues in videos are tightly associated with sounds, which helps in solving the sound localization and separation problems.
by Hang Zhao.
Ph.D. in Mechanical Engineering and Computation, Massachusetts Institute of Technology, Department of Mechanical Engineering.
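The benchmark mentioned in the abstract scores models by per-pixel accuracy and mean intersection-over-union; a minimal NumPy sketch of those two metrics (not the benchmark's official scoring code):

```python
import numpy as np

def pixel_accuracy_and_miou(pred, gt, num_classes):
    """pred, gt: integer label maps of identical shape."""
    acc = float((pred == gt).mean())
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:  # skip classes absent from both prediction and ground truth
            ious.append(inter / union)
    return acc, float(np.mean(ious))

acc, miou = pixel_accuracy_and_miou(np.random.randint(0, 5, (16, 16)),
                                    np.random.randint(0, 5, (16, 16)), 5)
```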
2

Lan, Cyril. "Urban scene parsing via low-rank texture patches." Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/77536.

Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 52-55).
Automatic 3-D reconstruction of city scenes from ground, aerial, and satellite imagery is a difficult problem that has seen active research for nearly two decades. The problem is difficult because many algorithms require salient areas in the image to be identified and segmented, a task that is typically done by humans. We propose a pipeline that detects these salient areas using low-rank texture patches. Areas in images such as building facades contain low-rank textures, which are an intrinsic property of the scene and invariant to viewpoint. The pipeline uses these low-rank patches to automatically rectify images and detect and segment out the patches with an energy-minimizing graph cut. The output is then further parameterized to provide useful data to existing 3-D reconstruction methods. The pipeline was evaluated on challenging test images from Microsoft Bing Maps oblique aerial photography and produced an 80% recall and precision with superb empirical results.
by Cyril Lan.
M.Eng.
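The pipeline's key test, that rectified facade-like patches are nearly low-rank while clutter is not, can be sketched with a singular-value energy criterion; the 95% threshold and the toy textures below are illustrative assumptions:

```python
import numpy as np

def effective_rank(patch, energy=0.95):
    """Smallest number of singular values capturing `energy` of the patch."""
    s = np.linalg.svd(patch - patch.mean(), compute_uv=False)
    cum = np.cumsum(s**2) / np.sum(s**2)
    return int(np.searchsorted(cum, energy)) + 1

stripes = np.tile(np.sin(np.linspace(0, 6, 64)), (64, 1))  # facade-like texture
clutter = np.random.rand(64, 64)                           # unstructured region
print(effective_rank(stripes), effective_rank(clutter))    # small vs. large
```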
3

Tung, Frederick. "Towards large-scale nonparametric scene parsing of images and video." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/60790.

Abstract:
In computer vision, scene parsing is the problem of labelling every pixel in an image or video with its semantic category. Its goal is a complete and consistent semantic interpretation of the structure of the real world scene. Scene parsing forms a core component in many emerging technologies such as self-driving vehicles and prosthetic vision, and also informs complementary computer vision tasks such as depth estimation. This thesis presents a novel nonparametric scene parsing framework for images and video. In contrast to conventional practice, our scene parsing framework is built on nonparametric search-based label transfer instead of discriminative classification. We formulate exemplar-based scene parsing for both 2D (from images) and 3D (from video), and demonstrate accurate labelling on standard benchmarks. Since our framework is nonparametric, it is easily extensible to new categories and examples as the database grows. Nonparametric scene parsing is computationally demanding at test time, and requires methods for searching large collections of data that are time and memory efficient. This thesis also presents two novel binary encoding algorithms for large-scale approximate nearest neighbor search: the bank of random rotations is data independent and does not require training, while the supervised sparse projections algorithm targets efficient search of high-dimensional labelled data. We evaluate these algorithms on standard retrieval benchmarks, and then demonstrate their integration into our nonparametric scene parsing framework. Using 256-bit codes, binary encoding reduces search times by an order of magnitude and memory requirements by three orders of magnitude, while maintaining a mean per-class accuracy within 1% on the 3D scene parsing task.
Faculty of Science, Department of Computer Science.
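A NumPy sketch of the data-independent binary-encoding idea the abstract calls a bank of random rotations, reduced here to a single rotation: take coordinate signs after a random orthogonal rotation as bits, then search by Hamming distance. Sizes and code length are toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 64))   # database descriptors
q = rng.standard_normal(64)           # query descriptor

R, _ = np.linalg.qr(rng.standard_normal((64, 64)))  # random orthogonal rotation
codes = X @ R > 0                     # 64-bit binary codes (sign of each coordinate)
qcode = q @ R > 0

hamming = (codes != qcode).sum(axis=1)  # Hamming distance to every database item
nearest = np.argsort(hamming)[:10]      # approximate 10 nearest neighbors
```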
4

Shu, Allen. "Use of shot/scene parsing in generating and browsing video databases." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/36985.

5

Pan, Hong. "Superparsing with Improved Segmentation Boundaries through Nonparametric Context." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/32329.

Abstract:
Scene parsing, or segmenting all the objects in an image and identifying their categories, is one of the core problems of computer vision. In order to achieve an object-level semantic segmentation, we build upon the recent superparsing approach by Tighe and Lazebnik, which is a nonparametric solution to the image labeling problem. Superparsing consists of four steps. For a new query image, the most similar images from the training dataset of labeled images are retrieved based on global features. In the second step, the query image is segmented into superpixels and 20 different local features are computed for each superpixel. We propose to use the SLICO segmentation method to allow control of the size, shape and compactness of the superpixels because SLICO is able to produce accurate boundaries. After all superpixel features have been extracted, feature-based matching of superpixels is performed to find the nearest-neighbour superpixels in the retrieval set for each query superpixel. Based on the neighbouring superpixels, a likelihood score for each class is calculated. Finally, we formulate a Conditional Random Field (CRF) using the likelihoods and a pairwise cost, both computed from nonparametric estimation, to optimize the labeling of the image. Specifically, we define a novel pairwise cost to provide stronger semantic contextual constraints by incorporating the similarity of adjacent superpixels depending on local features. The optimized labeling obtained with the CRF results in superpixels with the same labels grouped together to generate segmentation results which also identify the categories of objects in an image. We evaluate our improvements to the superparsing approach using segmentation evaluation measures as well as the per-pixel rate and average per-class rate in a labeling evaluation. We demonstrate the success of our modified approach on the SIFT Flow dataset, and compare our results with the basic superparsing methods proposed by Tighe and Lazebnik.
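The SLICO superpixel step can be reproduced with scikit-image, whose SLIC implementation exposes the zero-parameter SLICO variant via `slic_zero=True`; the parameter values below are illustrative:

```python
from skimage.data import astronaut
from skimage.segmentation import slic

image = astronaut()
segments = slic(image, n_segments=400, compactness=10, slic_zero=True)
print(segments.shape, segments.max())  # superpixel label map and segment count
```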
6

Munoz, Daniel. "Inference Machines: Parsing Scenes via Iterated Predictions." Research Showcase @ CMU, 2013. http://repository.cmu.edu/dissertations/305.

Abstract:
Extracting a rich representation of an environment from visual sensor readings can benefit many tasks in robotics, e.g., path planning, mapping, and object manipulation. While important progress has been made, it remains a difficult problem to effectively parse entire scenes, i.e., to recognize semantic objects, man-made structures, and landforms. This process requires not only recognizing individual entities but also understanding the contextual relations among them. The prevalent approach to encode such relationships is to use a joint probabilistic or energy-based model which enables one to naturally write down these interactions. Unfortunately, performing exact inference over these expressive models is often intractable and instead we can only approximate the solutions. While there exists a set of sophisticated approximate inference techniques to choose from, the combination of learning and approximate inference for these expressive models is still poorly understood in theory and limited in practice. Furthermore, using approximate inference on any learned model often leads to suboptimal predictions due to the inherent approximations. As we ultimately care about predicting the correct labeling of a scene, and not necessarily learning a joint model of the data, this work proposes to instead view the approximate inference process as a modular procedure that is directly trained in order to produce a correct labeling of the scene. Inspired by early hierarchical models in the computer vision literature for scene parsing, the proposed inference procedure is structured to incorporate both feature descriptors and contextual cues computed at multiple resolutions within the scene. We demonstrate that this inference machine framework for parsing scenes via iterated predictions offers the best of both worlds: state-of-the-art classification accuracy and computational efficiency when processing images and/or unorganized 3-D point clouds. Additionally, we address critical problems that arise in practice when parsing scenes on board real-world systems: integrating data from multiple sensor modalities and efficiently processing data that is continuously streaming from the sensors.
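A schematic sketch of the inference-machine recipe with scikit-learn: a sequence of classifiers, each trained on the descriptors plus the label beliefs predicted by the previous round. A global belief average stands in for the thesis's multi-resolution context features; everything here is an illustrative reduction, not the thesis's training procedure:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_inference_machine(X, y, rounds=3):
    """X: (N, D) descriptors of scene elements; y: (N,) semantic labels."""
    stages = []
    beliefs = np.zeros((len(X), len(np.unique(y))))  # initial (empty) beliefs
    for _ in range(rounds):
        context = beliefs.mean(axis=0, keepdims=True).repeat(len(X), axis=0)
        inputs = np.hstack([X, beliefs, context])    # descriptors + current beliefs
        clf = LogisticRegression(max_iter=500).fit(inputs, y)
        beliefs = clf.predict_proba(inputs)          # feed predictions forward
        stages.append(clf)
    return stages
```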
7

Taghavi Namin, Sarah. "Scene Parsing using Multiple Modalities." PhD thesis, 2016. http://hdl.handle.net/1885/116781.

Abstract:
Scene parsing is the task of assigning a semantic class label to the elements of a scene. It has many applications in autonomous systems where we need to understand the visual data captured from our environment. Different sensing modalities, such as RGB cameras, multi-spectral cameras and Lidar sensors, can be beneficial when pursuing this goal. Scene analysis using multiple modalities aims at leveraging complementary information captured by multiple sensing modalities. When multiple modalities are used together, the strength of each modality can combat the weaknesses of the others. Therefore, working with multiple modalities enables us to use powerful tools for scene analysis. However, the possible gains of using multiple modalities come with new challenges, such as dealing with misalignments between different modalities. In this thesis, our aim is to take advantage of multiple modalities to improve outdoor scene parsing and address the associated challenges. We initially investigate the potential of multi-spectral imaging for outdoor scene analysis. Our approach is to combine the discriminative strength of the multi-spectral signature in each pixel and the corresponding nature of the surrounding texture. Many materials that appear similar when viewed by a common RGB camera will show discriminating properties if viewed by a camera capturing a greater number of separated wavelengths. When using imagery data for scene parsing, a number of challenges stem from, e.g., color saturation, shadow and occlusion. To address such challenges, we focus on scene parsing using multiple modalities, panoramic RGB images and 3D Lidar data in particular, and propose a multi-view approach to select the best 2D view that describes each element in the 3D point cloud data. Keeping our focus on using multiple modalities, we then introduce a multi-modal graphical model to address the problems of scene parsing using 2D-3D data exhibiting extensive many-to-one correspondences. Existing methods often impose a hard correspondence between the 2D and 3D data, where the 2D and 3D corresponding regions are forced to receive identical labels. This results in performance degradation due to misalignments, 3D-2D projection errors and occlusions. We address this issue by defining a graph over the entire set of data that models soft correspondences between the two modalities. This graph encourages each region in a modality to leverage the information from its corresponding regions in the other modality to better estimate its class label. Finally, we introduce latent nodes to explicitly model inconsistencies between the modalities. The latent nodes allow us not only to leverage information from various domains in order to improve the labeling of the modalities, but also to cut the edges between inconsistent regions. To eliminate the need for hand-tuning the parameters of our model, we propose to learn the potential functions from training data. In addition, to demonstrate the benefits of the proposed approaches on publicly available multi-modality datasets, we introduce a new multi-modal dataset of panoramic images and 3D point cloud data captured from outdoor scenes (the NICTA/2D3D Dataset).
8

Wang, Ren (王任). "Transferring Weakly-Supervised Convolutional Networks for Scene Parsing." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/29046824010257775924.

Abstract:
Master's thesis, National Tsing Hua University, Department of Computer Science, ROC academic year 103 (2014-15).
Deep neural networks have become more and more popular in computer vision because of their powerful ability to extract distinctive image features. In deep neural networks, transfer learning plays an important role to avoid overfitting. In this thesis, we present a clustering-based method to combine fully-labeled data with weakly-labeled data for convolutional networks. By transfer learning, these convolutional networks can be viewed as pre-trained models for another target task. Next, we design a framework of convolutional networks for scene parsing to demonstrate our idea. Preliminary experimental results show that it is helpful to use these pre-trained convolutional networks for transfer learning.
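A minimal PyTorch sketch of the transfer step the abstract describes: reuse a pre-trained convolutional network as a feature extractor and retrain only a fresh head for the target task. The torchvision ResNet-18 here merely stands in for the thesis's pre-trained convolutional networks, and the class count is illustrative:

```python
import torch.nn as nn
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1")  # stands in for the pre-trained network
for p in model.parameters():               # freeze the transferred features
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, 21)  # fresh head for the target task
# ...then train only model.fc on the target-task data...
```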
9

Yu, Jie-Kuan (余界寬). "A Scene Parsing and Classification Method for Baseball Videos." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/wrt3k3.

Abstract:
Master's thesis, National Taipei University of Technology, Department of Computer Science and Information Engineering, ROC academic year 94 (2005-06).
This thesis proposes a scene parsing and classification system for baseball videos. The system automatically parses baseball video and extracts important scenes with image content analysis. First, the system selects several candidate important scenes by field/cloth color ratio and scene change detection. Second, the system utilizes image features, e.g., object motion detection, field and cloth color detection, camera motion parameters, key-frame analysis, and motion-map comparison, to analyze each candidate important scene. Finally, the system classifies scenes according to the above-mentioned features and predefined rules. Subsequently, the system establishes indexes of scenes corresponding to the rules in the baseball video database.
10

He, Tong. "Efficient Scene Parsing with Imagery and Point Cloud Data." Thesis, 2020. http://hdl.handle.net/2440/129534.

Abstract:
Scene parsing, aiming to provide a comprehensive understanding of the scene, is a fundamental task in the field of computer vision and remains a challenging problem for unconstrained environments and open scenes. The results of scene parsing can provide semantic labels, location distribution, and instance shape information for each element, which has shown great potential in applications like automatic driving and video surveillance, just to name a few. Also, the efficiency of the methods determines whether they can be used on a large scale. With the easy availability of various sensors, more and more solutions resort to different data modalities according to the requirements of the applications. Imagery and point clouds are two representative data sources. How to design efficient frameworks in the separate domains remains an open problem and, more importantly, lays a solid foundation for multimodal fusion. In this thesis, we study the task of scene parsing under different data modalities, i.e., imagery and point cloud data, using deep neural networks. The first part of this thesis addresses the task of efficient semantic segmentation in 2D image data. The aim is to improve the accuracy of small models while maintaining their fast inference speed without introducing extra computation overhead. To achieve this, we propose a knowledge-distillation-based method tailored for semantic segmentation that improves the performance of a small Fully Convolutional Network (FCN) model by injecting compact feature representations and long-tail dependencies from a large, complex FCN model (incorporated in Chapter 3). The second part of this thesis addresses the task of semantic and instance segmentation on point cloud data. Compared to rasterized image data, point cloud data often suffer from two problems: (1) how to efficiently extract and aggregate context information; (2) how to solve the forgetting issue (Lin et al., 2017c) caused by extreme data imbalance. For the first problem, we study the influence of instance-aware knowledge by proposing an Instance-Aware Module that learns discriminative instance embedding features via metric learning (incorporated in Chapter 4). We address the second problem by proposing a memory-augmented network that learns and memorizes representative prototypes covering diverse samples universally (incorporated in Chapter 5).
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2020
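The distillation idea of Chapter 3 builds on the standard temperature-scaled KL term between teacher and student logit maps; a per-pixel PyTorch sketch of that generic term (the thesis's compact-feature and long-tail-dependency transfer terms are omitted):

```python
import torch
import torch.nn.functional as F

def pixelwise_distill_loss(student_logits, teacher_logits, T=4.0):
    """logits: (N, C, H, W); temperature-scaled KL(teacher || student)."""
    p_teacher = F.softmax(teacher_logits / T, dim=1)
    log_p_student = F.log_softmax(student_logits / T, dim=1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T

loss = pixelwise_distill_loss(torch.randn(2, 19, 8, 8), torch.randn(2, 19, 8, 8))
```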

Books on the topic "Scene parsing"

1

Museum of Contemporary Art (Los Angeles, Calif.). The social scene: The Ralph M. Parsons Foundation photography collection at the Museum of Contemporary Art, Los Angeles. Los Angeles: Museum of Contemporary Art, 2000.

2

Levinson, Marjorie. Parsing the Frost. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198810315.003.0009.

Abstract:
The reading of Coleridge’s “Frost at Midnight” at the center of this chapter opens up the cognitive and aesthetic stakes of seeing writing. It does so by analyzing the encounter with visible script, an experience that can be understood as a reworking of a previously unrecognized source, the scene of writing in David Hume’s A Treatise of Human Nature, Book 4. Just such an encounter is the activity in play with the figure of the window frost and with the entire poem. Broadly speaking, sentence formation is seen as analogous to frost formation. In this way, the discussion seeks to shift the sensory register of criticism of the poem from its traditional emphasis on the acoustic to a new appreciation of the visible.
3

Coleman, A. D., Cornelia H. Butler, Liz Kotz, and Museum of Contemporary Art (Los Angeles, Calif.). The Social Scene: The Ralph M. Parsons Foundation Photography Collection. Museum of Contemporary Art, 2000.


Book chapters on the topic "Scene parsing"

1

Liu, Ce, Jenny Yuen, and Antonio Torralba. "Nonparametric Scene Parsing via Label Transfer." In Dense Image Correspondences for Computer Vision, 207–36. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-23048-1_10.

2

Zhong, Guangyu, Yi-Hsuan Tsai, and Ming-Hsuan Yang. "Weakly-Supervised Video Scene Co-parsing." In Computer Vision – ACCV 2016, 20–36. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-54181-5_2.

3

Xiao, Tete, Yingcheng Liu, Bolei Zhou, Yuning Jiang, and Jian Sun. "Unified Perceptual Parsing for Scene Understanding." In Computer Vision – ECCV 2018, 432–48. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01228-1_26.

4

Wu, Tianyi, Yu Lu, Yu Zhu, Chuang Zhang, Ming Wu, Zhanyu Ma, and Guodong Guo. "GINet: Graph Interaction Network for Scene Parsing." In Computer Vision – ECCV 2020, 34–51. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58520-4_3.

5

Lu, Ye, Xian Zhong, Wenxuan Liu, Jingling Yuan, and Bo Ma. "Tree-Structured Channel-Fuse Network for Scene Parsing." In Advances in Intelligent Systems and Computing, 697–709. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-55180-3_53.

6

Tung, Frederick, and James J. Little. "CollageParsing: Nonparametric Scene Parsing by Adaptive Overlapping Windows." In Computer Vision – ECCV 2014, 511–25. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-10599-4_33.

7

Li, Xiangtai, Ansheng You, Zhen Zhu, Houlong Zhao, Maoke Yang, Kuiyuan Yang, Shaohua Tan, and Yunhai Tong. "Semantic Flow for Fast and Accurate Scene Parsing." In Computer Vision – ECCV 2020, 775–93. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-58452-8_45.

8

Tang, Keke, Zhe Zhao, and Xiaoping Chen. "Joint Visual Phrase Detection to Boost Scene Parsing." In Advances in Visual Computing, 389–99. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-27863-6_36.

9

Yu, Hui, Yuecheng Song, Wenyu Ju, and Zhenbao Liu. "Scene Parsing with Deep Features and Spatial Structure Learning." In Lecture Notes in Computer Science, 715–22. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-48896-7_71.

10

Cui, Xiaofei, Hanbing Qu, Xi Chen, Ziliang Qi, and Liang Dong. "Scene Parsing with Deep Features and Per-Exemplar Detectors." In Lecture Notes in Electrical Engineering, 367–76. Singapore: Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-6445-6_40.


Conference papers on the topic "Scene parsing"

1

Wang, Yu-Siang, Chenxi Liu, Xiaohui Zeng, and Alan Yuille. "Scene Graph Parsing as Dependency Parsing." In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers). Stroudsburg, PA, USA: Association for Computational Linguistics, 2018. http://dx.doi.org/10.18653/v1/n18-1037.

2

Zhao, Hengshuang, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, and Jiaya Jia. "Pyramid Scene Parsing Network." In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017. http://dx.doi.org/10.1109/cvpr.2017.660.

3

Zhao, Hang, Xavier Puig, Bolei Zhou, Sanja Fidler, and Antonio Torralba. "Open Vocabulary Scene Parsing." In 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017. http://dx.doi.org/10.1109/iccv.2017.221.

4

Yu, Chengcheng, Xiaobai Liu, and Song-Chun Zhu. "Single-Image 3D Scene Parsing Using Geometric Commonsense." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/649.

Abstract:
This paper presents a unified grammatical framework capable of reconstructing a variety of scene types (e.g., urban, campus, county, etc.) from a single input image. The key idea of our approach is to study a novel commonsense reasoning framework that mainly exploits two types of prior knowledge: (i) prior distributions over a single dimension of objects, e.g., that the length of a sedan is about 4.5 meters; (ii) pair-wise relationships between the dimensions of scene entities, e.g., that the length of a sedan is shorter than that of a bus. This unary and relative geometric knowledge, once extracted, is fairly stable across different types of natural scenes, and is informative for enhancing the understanding of various scenes in both 2D images and the 3D world. Methodologically, we propose to construct a hierarchical graph representation as a unified representation of the input image and related geometric knowledge. We formulate these objectives with a unified probabilistic formula and develop a data-driven Monte Carlo method to infer the optimal solution with both bottom-up and top-down computations. Results with comparisons on public datasets showed that our method clearly outperforms the alternative methods.
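A toy Python illustration of the two kinds of priors the abstract names, a Gaussian prior on a single object dimension and a pairwise ordering constraint; all numbers are illustrative assumptions, not values from the paper:

```python
import math

def unary_log_prior(length, mean=4.5, std=0.5):
    """Gaussian prior on one dimension, e.g. a sedan is about 4.5 m long."""
    return -0.5 * ((length - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

def pairwise_ok(sedan_len, bus_len):
    """Relative constraint, e.g. a sedan is shorter than a bus."""
    return sedan_len < bus_len

hypothesis = {"sedan": 4.3, "bus": 11.0}
score = (unary_log_prior(hypothesis["sedan"])
         if pairwise_ok(hypothesis["sedan"], hypothesis["bus"]) else float("-inf"))
```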
5

Zhou, Bolei, Hang Zhao, Xavier Puig, Sanja Fidler, Adela Barriuso, and Antonio Torralba. "Scene Parsing through ADE20K Dataset." In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2017. http://dx.doi.org/10.1109/cvpr.2017.544.

6

Zhang, Rui, Sheng Tang, Luoqi Liu, Yongdong Zhang, Jintao Li, and Shuicheng Yan. "High Resolution Feature Recovering for Accelerating Urban Scene Parsing." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/161.

Abstract:
Both accuracy and speed are equally important in urban scene parsing. Most existing methods mainly focus on improving parsing accuracy, ignoring the problem of low inference speed caused by large-sized inputs and high-resolution feature maps. To tackle this issue, we propose a High Resolution Feature Recovering (HRFR) framework to accelerate a given parsing network. A Super-Resolution Recovering module is employed to recover features of the large original-sized images from features of the down-sampled input. Therefore, our framework can combine the advantages of (1) the fast speed of networks with down-sampled input and (2) the high accuracy of networks with large original-sized input. Additionally, we employ auxiliary intermediate supervision and boundary region re-weighting to facilitate the optimization of the network. Extensive experiments on the two challenging Cityscapes and CamVid datasets demonstrate the effectiveness of the proposed HRFR framework, which can accelerate scene parsing inference by about 3.0x from 1/2 down-sampled input with negligible accuracy reduction.
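A schematic PyTorch sketch of the trade the abstract describes: run the backbone on a half-size input for speed, then recover higher-resolution features with a learned up-sampling module. A plain sub-pixel block stands in for the paper's Super-Resolution Recovering module, and all sizes are illustrative:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureRecovery(nn.Module):
    """2x feature up-sampling via a conv + sub-pixel shuffle."""
    def __init__(self, ch):
        super().__init__()
        self.expand = nn.Conv2d(ch, ch * 4, 3, padding=1)
        self.shuffle = nn.PixelShuffle(2)

    def forward(self, feat):
        return self.shuffle(self.expand(feat))

image = torch.randn(1, 3, 512, 1024)
small = F.interpolate(image, scale_factor=0.5, mode="bilinear", align_corners=False)
# features = backbone(small)              # fast forward pass on the half-size input
features = torch.randn(1, 64, 64, 128)    # stand-in for the backbone's output
recovered = FeatureRecovery(64)(features) # recovered toward original resolution
```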
7

Zhang, Rui, Sheng Tang, Min Lin, Jintao Li, and Shuicheng Yan. "Global-residual and Local-boundary Refinement Networks for Rectifying Scene Parsing Predictions." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/479.

Abstract:
Most existing scene parsing methods suffer from two serious problems: inconsistent parsing results and object boundary shift. To tackle these problems, we first propose an iterative Global-residual Refinement Network (GRN) that exploits global contextual information to predict parsing residuals and iteratively smooth inconsistent parsing labels. Furthermore, we propose a Local-boundary Refinement Network (LRN) to learn position-adaptive propagation coefficients so that local contextual information from neighbors can be optimally captured for refining object boundaries. Finally, we cascade the two proposed refinement networks after a fully residual convolutional neural network within a uniform framework. Extensive experiments on the ADE20K and Cityscapes datasets demonstrate the effectiveness of the two refinement methods for refining scene parsing predictions.
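A compact PyTorch sketch in the spirit of the abstract's iterative global-residual refinement: a small network repeatedly predicts a residual over the current logits from the image and the softened prediction. The architecture is an illustrative stand-in, not the paper's GRN/LRN:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualRefiner(nn.Module):
    def __init__(self, num_classes, ch=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + num_classes, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, num_classes, 3, padding=1))

    def forward(self, image, logits, iters=3):
        for _ in range(iters):  # iteratively smooth inconsistent predictions
            logits = logits + self.net(torch.cat([image, F.softmax(logits, 1)], 1))
        return logits

refined = ResidualRefiner(19)(torch.randn(1, 3, 64, 64), torch.randn(1, 19, 64, 64))
```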
8

Liu, Ce, Jenny Yuen, and Antonio Torralba. "Nonparametric scene parsing: Label transfer via dense scene alignment." In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 2009. http://dx.doi.org/10.1109/cvprw.2009.5206536.

9

Liu, Ce, Jenny Yuen, and Antonio Torralba. "Nonparametric scene parsing: Label transfer via dense scene alignment." In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPR Workshops). IEEE, 2009. http://dx.doi.org/10.1109/cvpr.2009.5206536.

10

Zhang, Rui, Sheng Tang, Yongdong Zhang, Jintao Li, and Shuicheng Yan. "Scale-Adaptive Convolutions for Scene Parsing." In 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017. http://dx.doi.org/10.1109/iccv.2017.224.
