Dissertations on the topic "Scene parsing"
Format your source according to APA, MLA, Chicago, Harvard, and other citation styles
Consult the top 16 dissertations for your research on the topic "Scene parsing".
You can also download the full text of each publication in .pdf format and read its abstract online, where these are available in the metadata.
Browse dissertations from a variety of disciplines and compile your bibliography correctly.
Zhao, Hang. "Visual and auditory scene parsing." Ph.D. thesis, Massachusetts Institute of Technology, 2019. https://hdl.handle.net/1721.1/122101.
Full text of the source. Thesis: Ph.D. in Mechanical Engineering and Computation, Massachusetts Institute of Technology, Department of Mechanical Engineering, 2019.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 121-132).
Scene parsing is a fundamental topic in computer vision and computational audition, where computational approaches are developed to match the human perceptual system's ability to understand scenes, e.g., grouping the visual regions of an image into objects and segregating sound components in a noisy environment. This thesis investigates fully-supervised and self-supervised machine learning approaches to parse visual and auditory signals, including images, videos, and audio. Visual scene parsing refers to densely grouping and labeling image regions into object concepts. First, I build the MIT scene parsing benchmark based on ADE20K, a large-scale, densely annotated dataset. This benchmark, together with the state-of-the-art models we open-source, offers a powerful tool for the research community to solve semantic and instance segmentation tasks. Then I investigate the challenge of parsing a large number of object categories in the wild. An open-vocabulary scene parsing model that combines a convolutional neural network with a structured knowledge graph is proposed to address this challenge. Auditory scene parsing refers to recognizing and decomposing sound components in complex auditory environments. I propose a general audio-visual self-supervised learning framework that learns from a large amount of unlabeled internet videos. The learning process discovers the natural synchronization of vision and sound without human annotation. The learned model is able to localize sound sources in videos and separate them from mixtures. Furthermore, I demonstrate that motion cues in videos are tightly associated with sounds, which helps in solving sound localization and separation problems. (A toy code sketch of the audio-visual correspondence idea follows this entry.)
by Hang Zhao.
Ph.D. in Mechanical Engineering and Computation
Massachusetts Institute of Technology, Department of Mechanical Engineering
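The self-supervised idea in the abstract above, that synchronized vision and sound supervise each other, can be illustrated with a toy contrastive objective. The sketch below is not Zhao's architecture; the encoders, feature dimensions, and loss form (an InfoNCE-style symmetric cross-entropy) are illustrative assumptions.

```python
# Toy sketch of an audio-visual correspondence objective: embeddings of a video
# frame and the audio from the same clip are pulled together, while mismatched
# pairs are pushed apart. Encoders and tensor shapes are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallEncoder(nn.Module):
    """Maps an input feature vector to a unit-norm embedding."""
    def __init__(self, in_dim, emb_dim=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                                 nn.Linear(256, emb_dim))
    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)

def av_contrastive_loss(frame_emb, audio_emb, temperature=0.07):
    """Symmetric cross-entropy over the frame-audio similarity matrix.
    Diagonal entries are the 'synchronized' (positive) pairs."""
    logits = frame_emb @ audio_emb.t() / temperature   # (B, B) similarities
    targets = torch.arange(frame_emb.size(0))          # positives on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

if __name__ == "__main__":
    B = 8
    frame_feats = torch.randn(B, 512)   # stand-in for pooled CNN features of a frame
    audio_feats = torch.randn(B, 128)   # stand-in for pooled spectrogram features
    f_enc, a_enc = SmallEncoder(512), SmallEncoder(128)
    loss = av_contrastive_loss(f_enc(frame_feats), a_enc(audio_feats))
    print(float(loss))
```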
Lan, Cyril. "Urban scene parsing via low-rank texture patches." Thesis, Massachusetts Institute of Technology, 2012. http://hdl.handle.net/1721.1/77536.
Full text of the source. Cataloged from PDF version of thesis.
Includes bibliographical references (p. 52-55).
Automatic 3-D reconstruction of city scenes from ground, aerial, and satellite imagery is a difficult problem that has seen active research for nearly two decades. The problem is difficult because many algorithms require salient areas in the image to be identified and segmented, a task that is typically done by humans. We propose a pipeline that detects these salient areas using low-rank texture patches. Areas in images such as building facades contain low-rank textures, which are an intrinsic property of the scene and invariant to viewpoint. The pipeline uses these low-rank patches to automatically rectify images and to detect and segment out the patches with an energy-minimizing graph cut. The output is then further parameterized to provide useful data to existing 3-D reconstruction methods. The pipeline was evaluated on challenging test images from Microsoft Bing Maps oblique aerial photography and achieved 80% recall and precision with superb empirical results. (A short code sketch of the low-rank patch cue follows this entry.)
by Cyril Lan.
M.Eng.
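The low-rank texture cue from the abstract above can be demonstrated with a few lines of linear algebra: repetitive facade-like patches concentrate their singular-value energy in a few components, while unstructured texture does not. This sketch only shows that cue on synthetic patches; the thesis pipeline additionally rectifies patches and segments them with a graph cut, neither of which is reproduced here.

```python
# Illustrative scoring of image patches by how "low-rank" their texture is,
# using the singular-value spectrum of each grayscale patch.
import numpy as np

def low_rank_score(patch, k=3):
    """Fraction of spectral energy captured by the top-k singular values.
    Values close to 1.0 indicate an (approximately) low-rank patch such as a
    regular facade grid; values well below 1.0 indicate unstructured texture."""
    s = np.linalg.svd(patch - patch.mean(), compute_uv=False)
    return s[:k].sum() / (s.sum() + 1e-8)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # A repetitive "facade-like" patch: outer product of two 1-D patterns (rank 1).
    facade = np.outer(np.tile([0.0, 1.0], 32), np.tile([0.0, 0.0, 1.0, 1.0], 16))
    noise = rng.random((64, 64))          # unstructured texture
    print("facade:", round(low_rank_score(facade), 3))
    print("noise :", round(low_rank_score(noise), 3))
```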
Tung, Frederick. "Towards large-scale nonparametric scene parsing of images and video." Thesis, University of British Columbia, 2017. http://hdl.handle.net/2429/60790.
Full text of the source. Faculty of Science, Department of Computer Science, Graduate.
Shu, Allen. "Use of shot/scene parsing in generating and browsing video databases." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/36985.
Full text of the source.
Pan, Hong. "Superparsing with Improved Segmentation Boundaries through Nonparametric Context." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/32329.
Full text of the source.
Munoz, Daniel. "Inference Machines: Parsing Scenes via Iterated Predictions." Research Showcase @ CMU, 2013. http://repository.cmu.edu/dissertations/305.
Full text of the source.
Taghavi Namin, Sarah. "Scene Parsing using Multiple Modalities." Ph.D. thesis, 2016. http://hdl.handle.net/1885/116781.
Full text of the source.
Wang, Ren, and 王任. "Transferring Weakly-Supervised Convolutional Networks for Scene Parsing." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/29046824010257775924.
Повний текст джерела國立清華大學
資訊工程學系
103
Deep neural networks have become increasingly popular in computer vision because of their powerful ability to extract distinctive image features. In deep neural networks, transfer learning plays an important role in avoiding overfitting. In this thesis, we present a clustering-based method to combine fully-labeled data with weakly-labeled data for convolutional networks. Through transfer learning, these convolutional networks can be viewed as pre-trained models for another target task. Next, we design a framework of convolutional networks for scene parsing to demonstrate our idea. Preliminary experimental results show that using these pre-trained convolutional networks for transfer learning is helpful.
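A minimal sketch of the transfer-learning step described above: a network pre-trained on one task is reused as a feature extractor, and only a new head is trained for the target task. The backbone (torchvision's ResNet-18), class count, and optimizer settings are placeholders, not the thesis's clustering-based pre-training.

```python
# Minimal transfer-learning sketch: freeze a pre-trained backbone and train
# only a new task-specific classification head.
import torch
import torch.nn as nn
from torchvision import models

def build_transferred_model(num_target_classes: int) -> nn.Module:
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # pre-trained backbone
    for p in model.parameters():                                      # freeze pre-trained weights
        p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, num_target_classes)    # new trainable head
    return model

if __name__ == "__main__":
    model = build_transferred_model(num_target_classes=8)
    head_params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(head_params, lr=1e-2, momentum=0.9)
    x = torch.randn(2, 3, 224, 224)        # dummy image batch
    logits = model(x)                      # shape (2, 8)
    print(logits.shape)
```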
Yu, Jie-Kuan, and 余界寬. "A Scene Parsing and Classification Method for Baseball Videos." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/wrt3k3.
Full text of the source. National Taipei University of Technology
Department of Computer Science and Information Engineering
94 (2005–2006 academic year)
This thesis proposes a scene parsing and classification system for baseball videos. The system automatically parses baseball video and extracts important scenes through image content analysis. First, the system selects several candidate important scenes using the field/clothing color ratio and scene change detection. Second, the system uses image features, e.g., object motion detection, field and clothing color detection, camera motion parameters, key-frame analysis, and motion-map comparison, to analyze each candidate important scene. Finally, the system classifies scenes according to the above-mentioned features and predefined rules, and then builds indexes of the scenes corresponding to these rules in a baseball video database.
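Two of the cues named in the abstract, the field color ratio and scene change detection, can be sketched with basic OpenCV operations on synthetic frames. The HSV green range, histogram comparison, and thresholds below are illustrative guesses rather than the thesis's actual rules.

```python
# Illustrative field-color ratio and histogram-based scene-change test.
import numpy as np
import cv2

def field_color_ratio(frame_bgr, lo=(35, 60, 60), hi=(85, 255, 255)):
    """Fraction of pixels whose HSV value falls inside a rough 'grass green' range."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, np.array(lo, np.uint8), np.array(hi, np.uint8))
    return mask.mean() / 255.0

def is_scene_change(prev_bgr, cur_bgr, threshold=0.5):
    """Declare a scene change when the hue histograms of consecutive frames decorrelate."""
    def hue_hist(img):
        h = cv2.calcHist([cv2.cvtColor(img, cv2.COLOR_BGR2HSV)], [0], None, [32], [0, 180])
        return cv2.normalize(h, None).flatten()
    corr = cv2.compareHist(hue_hist(prev_bgr), hue_hist(cur_bgr), cv2.HISTCMP_CORREL)
    return corr < threshold

if __name__ == "__main__":
    green = np.zeros((120, 160, 3), np.uint8); green[:] = (40, 160, 40)   # grass-like frame
    gray = np.full((120, 160, 3), 128, np.uint8)                          # crowd/close-up stand-in
    print("field ratio (green frame):", field_color_ratio(green))
    print("scene change green->gray :", is_scene_change(green, gray))
```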
He, Tong. "Efficient Scene Parsing with Imagery and Point Cloud Data." Thesis, 2020. http://hdl.handle.net/2440/129534.
Full text of the source. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2020.
Ma, Chih Hao, and 馬智豪. "Nonparametric Scene Parsing with Deep Convolutional Features and Dense Alignment." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/65701079918564835437.
Full text of the source. National Tsing Hua University
Department of Computer Science
103 (2014–2015 academic year)
This thesis addresses two key issues that affect the performance of nonparametric scene parsing: (1) the semantic quality of image retrieval, and (2) the accuracy of label transfer. First, because nonparametric methods annotate a query image by transferring labels from retrieved images, image retrieval should find a set of images "semantically similar" to the query. Second, given the retrieval set, a good strategy is needed to transfer semantic labels with pixel-level accuracy. In this thesis, we focus on improving scene parsing accuracy with respect to these two issues. We propose using state-of-the-art deep convolutional features as visual descriptors to improve the semantic quality of retrieved images. In addition, we include dense alignment in the Markov Random Field (MRF) inference framework to transfer labels at pixel-level accuracy. Next, we use the derived semantic labels as queries to expand the retrieval set and then conduct a second round of label transfer. Finally, we combine the label-transfer cues of the two rounds in the MRF model to improve the labeling results. Our experiments on the SIFT Flow and LMSun datasets show the improvement of the proposed approach over other nonparametric methods.
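A bare-bones sketch of the nonparametric pipeline described above: retrieve database images whose global (deep) features are closest to the query, then transfer labels to each query segment by a nearest-neighbour vote. Features and labels are random stand-ins, and the MRF inference and dense alignment steps are omitted.

```python
# Toy nonparametric label transfer: feature retrieval plus per-segment voting.
import numpy as np

def retrieve(query_feat, db_feats, k=3):
    """Indices of the k database images most similar to the query (cosine similarity)."""
    q = query_feat / np.linalg.norm(query_feat)
    db = db_feats / np.linalg.norm(db_feats, axis=1, keepdims=True)
    return np.argsort(-(db @ q))[:k]

def transfer_labels(query_segs, db_segs, db_labels, retrieved):
    """For each query segment, vote with the labels of its nearest retrieved segments."""
    out = []
    for seg in query_segs:
        votes = []
        for i in retrieved:
            d = np.linalg.norm(db_segs[i] - seg, axis=1)   # distances to segments of image i
            votes.append(db_labels[i][np.argmin(d)])       # label of the closest segment
        out.append(np.bincount(votes).argmax())            # majority vote
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    db_feats = rng.normal(size=(10, 64))                   # global features of 10 database images
    db_segs = rng.normal(size=(10, 5, 16))                 # 5 segment descriptors per image
    db_labels = rng.integers(0, 4, size=(10, 5))           # segment labels (4 classes)
    query_feat, query_segs = rng.normal(size=64), rng.normal(size=(6, 16))
    nn_imgs = retrieve(query_feat, db_feats)
    print(transfer_labels(query_segs, db_segs, db_labels, nn_imgs))
```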
Najafi, Mohammad. "On the Role of Context at Different Scales in Scene Parsing." Ph.D. thesis, 2017. http://hdl.handle.net/1885/116302.
Повний текст джерелаWang, Yi-Ru, and 王怡儒. "Incremental object detection and scene parsing from a moving vehicle via exemplar cut." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/y9fr2u.
Full text of the source. National Tsing Hua University
Department of Computer Science
105 (2016–2017 academic year)
This thesis presents a nonparametric scene parsing system based on superpixel matching and exemplar cut. Foreground classes are often neglected by other algorithms because they occupy only a small portion of the pixels in an image. To solve this problem, we use the concept of "exemplars" to improve their recognition rate. Our experimental images are distinctive in that they are photographed continuously from a moving vehicle, so the characteristics of consecutive frames can be exploited to raise labeling accuracy. By adding the previous parsing result to the retrieval set, we increase the resemblance between the query image and the images in the retrieval set. We also remove images whose class proportions differ greatly from those of the previous frame, which prevents unlikely classes from appearing in the query image, and we add exemplars from the previous image to the candidate exemplars of the query image. This idea can hopefully be applied to autonomous driving in the near future. Our experimental dataset contains 4 foreground labels and 4 background labels, and the system achieves state-of-the-art recognition rates in both per-pixel and per-class accuracy.
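One step from the abstract above, removing retrieval-set images whose class proportions differ too much from the previous frame's parse, can be sketched in a few lines. The class count, proportions, and L1 threshold below are illustrative assumptions.

```python
# Filter a retrieval set by class-proportion similarity to the previous frame's parse.
import numpy as np

def class_proportions(label_map, num_classes):
    """Normalized histogram of class labels over all pixels of a parse."""
    counts = np.bincount(label_map.ravel(), minlength=num_classes)
    return counts / counts.sum()

def filter_retrieval_set(prev_parse, candidate_parses, num_classes=8, max_l1=0.6):
    """Keep candidates whose class-proportion L1 distance to the previous frame is small."""
    ref = class_proportions(prev_parse, num_classes)
    keep = []
    for idx, cand in enumerate(candidate_parses):
        if np.abs(class_proportions(cand, num_classes) - ref).sum() <= max_l1:
            keep.append(idx)
    return keep

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    prev = rng.integers(0, 4, size=(60, 80))         # previous frame: classes 0-3 only
    road_like = rng.integers(0, 4, size=(60, 80))    # similar class mix -> kept
    indoor_like = rng.integers(4, 8, size=(60, 80))  # disjoint classes -> rejected
    print(filter_retrieval_set(prev, [road_like, indoor_like]))
```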
Shih, Yi-Hsuan, and 施亦宣. "Vehicles Detection at Urban Intersections via Adaptive Neighbor Sets of Nonparametric Scene Parsing." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/2rpj85.
Full text of the source. National Tsing Hua University
Department of Computer Science
105 (2016–2017 academic year)
A challenge faced by many approaches to vehicle detection at urban intersections is that a bounding box must be manually drawn around the target object in the first frame, and losing the target during tracking can lead to tracking errors. The objective of this thesis is to explore solutions to these problems. We apply nonparametric scene parsing to vehicle detection at urban intersections to automatically find car and motorcycle objects in the first frame without a manually given bounding box. Moreover, the annotation results of scene parsing can help recover lost objects. Nonparametric scene parsing, which annotates a query image by transferring labels from a training data set, has been studied extensively. Following the method of [5], our proposed method first segments the images into superpixels. By computing features, we extract a set of similar images from the training data set as the retrieval set. In addition, we learn weights for each image in the training data set to minimize classification error using a leave-one-out strategy. To boost the classification of rare classes, we compute the semantic context of segments in the training data set and add the nearest rare-class examples to the retrieval set. Finally, we minimize a Markov Random Field (MRF) energy function to label the query image. Since urban intersection scenes form our main testing data set, we use background subtraction to extract foregrounds and thereby reduce classification error. Our experimental results show that combining nonparametric scene parsing with background subtraction can effectively solve the problems of vehicle detection at urban intersections.
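The background-subtraction step mentioned at the end of the abstract can be sketched with OpenCV's MOG2 model on synthetic frames (a bright square sliding over a static background stands in for a vehicle). The history length and variance threshold are illustrative, and the combination with scene parsing is not shown.

```python
# Minimal background subtraction with OpenCV's MOG2 on synthetic frames.
import numpy as np
import cv2

subtractor = cv2.createBackgroundSubtractorMOG2(history=50, varThreshold=16,
                                                detectShadows=False)

def make_frame(t, size=(120, 160)):
    """Static gray background with a bright square sliding to the right."""
    frame = np.full((*size, 3), 90, np.uint8)
    x = 10 + 4 * t
    cv2.rectangle(frame, (x, 50), (x + 20, 70), (255, 255, 255), -1)
    return frame

if __name__ == "__main__":
    for t in range(30):
        mask = subtractor.apply(make_frame(t))   # 0 = background, 255 = moving foreground
    moving_pixels = int((mask == 255).sum())
    print("foreground pixels in last frame:", moving_pixels)
```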
Liu, Keng-Chi, and 劉庚錡. "Low Discrepancy Adaptation with Weak Domain-specific Annotations for Efficient Indoor Scene Parsing." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/56zc4y.
Full text of the source. National Taiwan University
Graduate Institute of Electronics Engineering
107 (2018–2019 academic year)
Developing autonomous mobile agents that can behave like humans based on their visual perception is a goal in the field of artificial intelligence, and pixel-wise visual cues such as scene parsing are beneficial to such high-level applications. Significant improvements in these tasks have been made in recent years due to the evolution of deep learning. Nevertheless, in addition to accuracy, efficiency remains a major issue; the term "efficiency" here refers to both data collection and computational complexity. Remarkable scene parsing results achieved by supervised methods rely on numerous pixel-level annotations, which are time-consuming and expensive to obtain, so alleviating this cumbersome manual effort is a crucial issue during training. Synthetic rendered data and weakly-supervised methods have been explored to overcome this challenge; unfortunately, the former suffers from severe domain shift and the latter from imprecise information. Moreover, the majority of existing research on weak supervision is only capable of handling salient foreground "things". To address this issue, we employ an auxiliary teacher-student learning framework that trains this otherwise untransferable task through pseudo-ground truths, constructed by adapting auxiliary cues with lower domain discrepancy (e.g., depth) and leveraging domain-specific information (e.g., real appearance) in weak form. This imperfect information is then integrated effectively by a two-stage voting mechanism. From the inference perspective, complexity has long been the main issue for edge computing: a typical network requires large run-time memory and 32-bit floating-point computation, and unlike general classification networks with only a few category outputs, an hourglass network's output has the same size and dimensionality as its input, which costs more resources. However, most previous research has focused on classification networks. In this thesis, considering the practicality and necessity of real-world applications, our goal is to develop an "efficient" scene parsing algorithm focused on three objectives: labeling, complexity, and performance. First, we show that depth exhibits less domain discrepancy for indoor scenes when min-max normalization is introduced into the loss function. Additionally, we argue that the generator for real-to-sim reconstruction is capable of performing unsupervised sensor depth-map restoration. Second, we propose a scene parsing framework that performs auxiliary teacher-student learning with depth adaptation and domain-specific weak supervision. We train a network with a loss function that penalizes predictions disagreeing with the highly confident pseudo-ground truths provided by a two-stage integration mechanism, producing more accurate segmentations. The proposed method outperforms the state-of-the-art adaptation method by 14.63% in terms of mean Intersection over Union (mIoU). Lastly, we extend an existing method to quantize the target lightweight scene parsing network into ternary weights and low bit-width activations (3-4 bits), which reduces the model size by 21.9X and the activation size by 8.2X with only a 1.8% mIoU loss.
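The ternary quantization mentioned in the final sentence can be illustrated with a short routine that maps a weight tensor to three levels. The thresholding rule (a dead zone at 0.7 times the mean absolute weight) follows a common ternary-weight-network recipe and is an assumption, not the thesis's exact scheme.

```python
# Illustrative ternary weight quantization: map weights to {-alpha, 0, +alpha}.
import torch

def ternarize(w: torch.Tensor, delta_scale: float = 0.7):
    """Quantize a weight tensor to three levels.
    delta = delta_scale * mean(|w|) is the dead-zone threshold; alpha is the
    mean magnitude of the surviving (non-zeroed) weights."""
    delta = delta_scale * w.abs().mean()
    mask = (w.abs() > delta).float()                       # 1 where the weight is kept
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
    return alpha * torch.sign(w) * mask

if __name__ == "__main__":
    w = torch.randn(64, 64)
    w_t = ternarize(w)
    print("unique levels:", torch.unique(w_t).tolist())
    print("sparsity:", float((w_t == 0).float().mean()))
```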
Liu, Buyu. "Efficient multi-level scene understanding in videos." Ph.D. thesis, 2016. http://hdl.handle.net/1885/110787.
Full text of the source.