Dissertations / Theses on the topic 'Monocular depth'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Monocular depth.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Andraghetti, Lorenzo. "Monocular Depth Estimation enhancement by depth from SLAM Keypoints." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16626/.

Abstract:
Training a neural network for depth estimation in a supervised way is extremely challenging, since ground truth is expensive, time-consuming to collect and limited. The best alternative is therefore to train unsupervised, exploiting easier-to-obtain binocular stereo images and epipolar geometry constraints. Sometimes, however, this is not enough to predict reasonably correct depth maps, because colour images are ambiguous, for instance due to shadows or reflective surfaces. A Simultaneous Localization and Mapping (SLAM) algorithm tracks hundreds of 3D landmarks in each frame of a sequence. Given the base assumption that its scale is correct, it can therefore support depth prediction by providing a depth value for each of those 3D points. This work proposes a novel approach that pushes the SLAM depth points to their limits to enhance depth prediction.
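
A sparse supervision term of this kind is commonly implemented as an L1 penalty restricted to the pixels where a SLAM landmark projects. The following PyTorch sketch is illustrative only (not the thesis code); the tensor names and shapes are assumptions:

```python
import torch

def sparse_slam_depth_loss(pred_depth, slam_depth, slam_mask):
    """L1 penalty only at pixels where a SLAM landmark projects.

    pred_depth: (B, 1, H, W) network prediction
    slam_depth: (B, 1, H, W) depth of projected SLAM landmarks (0 elsewhere)
    slam_mask:  (B, 1, H, W) 1 where a landmark projects, 0 otherwise
    """
    err = torch.abs(pred_depth - slam_depth) * slam_mask
    return err.sum() / slam_mask.sum().clamp(min=1)

# combined objective: photometric_loss + lambda_slam * sparse_slam_depth_loss(...)
```
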
2

Pinheiro de Carvalho, Marcela. "Deep Depth from Defocus: Neural Networks for Monocular Depth Estimation." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS609.

Abstract:
Depth estimation from a single image is a key instrument for several applications, from robotics to virtual reality. Successful deep learning approaches in computer vision tasks such as object recognition and classification have also benefited the domain of depth estimation. In this thesis, we develop methods for monocular depth estimation with deep neural networks by exploring different cues: defocus blur and semantics. We conduct several experiments to understand the contribution of each cue in terms of generalization and model performance. First, we propose an efficient convolutional neural network for depth estimation along with a conditional generative adversarial training framework. Our method achieves performance among the best on standard depth estimation datasets. Then, we explore defocus blur, an optical cue fundamentally related to depth. We show that deep models are able to implicitly learn and use this information to improve performance and overcome known limitations of classical depth-from-defocus. We also build a new dataset with real focused and defocused images that we use to validate our approach. Finally, we explore the use of semantic information, which brings rich contextual information when learned jointly with depth in a multi-task approach. We validate our approaches on several datasets containing indoor, outdoor and aerial images.
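
The defocus cue exploited here follows from the thin-lens model: the blur-circle diameter grows as an object moves away from the focal plane, so blur carries depth information. A minimal sketch of that textbook relation (standard optics, not code from the thesis):

```python
def circle_of_confusion(depth, focus_dist, focal_len, f_number):
    """Thin-lens blur-circle diameter (same units as focal_len) for an
    object at `depth` when the lens is focused at `focus_dist`:

        c = (A * f / (focus_dist - f)) * |depth - focus_dist| / depth,

    with aperture diameter A = f / N.
    """
    aperture = focal_len / f_number
    return (aperture * focal_len / (focus_dist - focal_len)
            * abs(depth - focus_dist) / depth)

# Example: a 50 mm f/2 lens focused at 2 m blurs an object at 4 m to a
# circle of roughly 0.32 mm on the sensor (units here are millimetres).
print(circle_of_confusion(depth=4000, focus_dist=2000, focal_len=50, f_number=2))
```
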
3

Cheda, Diego. "Monocular Depth Cues in Computer Vision Applications." Doctoral thesis, Universitat Autònoma de Barcelona, 2012. http://hdl.handle.net/10803/121644.

Abstract:
Depth perception is a key aspect of human vision. It is a routine and essential visual task that humans perform effortlessly in many daily activities. It has often been associated with stereo vision, but humans have an amazing ability to perceive depth relations even from a single image by using several monocular cues. In the computer vision field, if image depth information were available, many tasks could be posed from a different perspective for the sake of higher performance and robustness. Nevertheless, given a single image, this possibility is usually discarded, since obtaining depth information has frequently required three-dimensional reconstruction techniques, which need two or more images of the same scene taken from different viewpoints. Recently, some proposals have shown the feasibility of computing depth information from single images. In essence, the idea is to take advantage of a priori knowledge of the acquisition conditions and the observed scene to estimate depth from monocular pictorial cues. These approaches try to estimate the scene depth maps precisely by employing computationally demanding techniques. However, to assist many computer vision algorithms, it is not really necessary to compute a costly and detailed depth map of the image. Indeed, just a rough depth description can be very valuable in many problems. In this thesis, we have demonstrated how coarse depth information can be integrated in different tasks following holistic and alternative strategies to obtain more precise and robust results. In that sense, we have proposed a simple but reliable technique whereby image regions are categorized into discrete depth ranges to build a coarse depth map. Based on this representation, we have explored the potential usefulness of our method in three application domains from novel viewpoints: camera rotation estimation, background estimation and pedestrian candidate generation. In the first case, we compute the rotation of a camera mounted in a moving vehicle with two novel methods that identify distant elements in the image, where the translation component of the image flow field is negligible. In background estimation, we propose a novel method to reconstruct the background by penalizing close regions in a cost function that integrates color, motion and depth terms. Finally, we benefit from the geometric and depth information available in single images for pedestrian candidate generation, significantly reducing the number of generated windows to be further processed by a pedestrian classifier. In all cases, results show that our depth-based approaches contribute to better performance.
4

Toschi, Marco. "Towards Monocular Depth Estimation for Robot Guidance." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Abstract:
Human visual perception is a powerful tool that lets us interact with the world, interpreting depth using both physiological and psychological cues. In the early days, machine vision was primarily inspired by physiological cues, guiding robots with bulky sensors based on focal length adjustments, pattern matching and binocular disparity. In reality, however, we always get a certain degree of depth sensation from the monocular image reproduced on the retina, which our brain judges on empirical grounds. With the advent of deep learning techniques, estimating depth from a monocular image has become a major research topic. Currently, it is still far from industrial use, as the estimated depth is valid only up to a scale factor, leaving us with relative depth information. We propose an algorithm to estimate the depth of a scene at its actual global scale, leveraging geometric constraints and state-of-the-art techniques in optical flow and depth estimation. We first compute the three-dimensional structure of multiple similar scenes, triangulating multi-view images for which dense correspondences have been estimated by an optical flow estimation network. Then we train a monocular depth estimation network on the precomputed scenes to learn their similarities, like object sizes, and ignore their differences, like object arrangements. Experimental results suggest that our method is able to learn to estimate metric depth of a novel similar scene, opening the possibility of performing robot guidance using an affordable, light and compact smartphone camera as depth sensor.
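
The triangulation step described above can be sketched with OpenCV, assuming dense flow from view 1 to view 2 and a metric relative pose (illustrative only; all names and shapes are assumptions):

```python
import cv2
import numpy as np

def triangulate_from_flow(K, R, t, pts1, flow):
    """Triangulate view-1 pixels against view 2, with view-2
    correspondences taken from a dense optical-flow field.

    K: (3,3) intrinsics; R, t: pose of view 2 w.r.t. view 1 (a metric t
    yields metric depth); pts1: (N,2) float pixels; flow: (H,W,2) flow 1->2.
    """
    pts2 = pts1 + flow[pts1[:, 1].astype(int), pts1[:, 0].astype(int)]
    P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])   # reference camera
    P2 = K @ np.hstack([R, t.reshape(3, 1)])            # second camera
    X = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)   # (4, N) homogeneous
    X = (X[:3] / X[3]).T                                # (N, 3) Euclidean
    return X[:, 2]                                      # depth along view-1 z-axis
```
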
5

Rovinelli, Marco. "Realtime Monocular Depth Estimation on Mobile Phones." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24159/.

Abstract:
Depth estimation is a necessary task to understand and navigate the environment around us. Over the years, many active sensors have been developed to measure depth, but they are expensive and require additional space to be mounted. A cheaper alternative consists of estimating depth maps using images taken by a mobile phone camera. Since most mobile phones don't have cameras built for stereo depth sensing, it would be ideal to recover depth from a single image using only the computational capability of the phone itself. This can be achieved by training a neural network on ground-truth depth maps. This type of data is very expensive to obtain, so it is preferable to train the neural network using self-supervision from multiple images. Since the devices where the trained models will be deployed have only one camera, it is best to train the network on monocular videos representing the actual data distribution at deployment. Self-supervised training on monocular videos lowers the accuracy of the depth maps and brings the additional challenge that depth can only be predicted up to an unknown scale factor. To this end, additional information, the velocity provided by the GPS and sparse points computed by a monocular SLAM algorithm, is employed to recover scale and improve accuracy. This study investigates different neural network architectures and training schemes to produce depth maps as accurate as possible given the computational budget available on modern mobile phones.
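
The scale recovery from GPS velocity mentioned above often reduces to a single ratio between the metric distance actually travelled and the norm of the estimated translation; a hedged sketch with hypothetical names:

```python
def recover_scale(pred_translation, gps_speed, dt):
    """Scale factor aligning an up-to-scale monocular estimate with metres.

    pred_translation: (3,) estimated camera translation between two frames,
    in arbitrary units; gps_speed: speed in m/s; dt: time between frames (s).
    """
    metric_dist = gps_speed * dt                      # distance truly travelled
    est_dist = sum(c * c for c in pred_translation) ** 0.5
    return metric_dist / max(est_dist, 1e-8)

# metric_depth = recover_scale(t_pred, v_gps, dt) * depth_up_to_scale
```
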
6

Rivero, Pindado Víctor. "Monocular visual SLAM based on Inverse depth parametrization." Thesis, Mälardalen University, School of Innovation, Design and Engineering, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-10166.

Abstract:

The first objective of this research has been to carry out a study of visual SLAM (Simultaneous Localization and Mapping) techniques, specifically the monocular variant, which is less studied than the stereo one. These techniques are well established in robotics and focus on reconstructing a map of the robot's environment while maintaining the robot's position within that map. We chose to investigate a method that encodes each point by the inverse of its depth, relative to the first time the feature was observed. This parametrization permits an efficient and accurate representation of uncertainty during undelayed initialization and beyond, all within the standard extended Kalman filter (EKF). The initial plan was to consolidate this study by developing an application implementing the method; after various difficulties, we decided to build on a MATLAB platform developed by the author of the SLAM method himself. Up to that point we had developed the calibration, feature extraction and matching stages. From there, the application was adapted to the characteristics of our camera and our video footage. We recorded a video with our camera following a known trajectory to check the path computed by the application, corroborating the results and studying the limitations and advantages of this method.
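
For reference, the inverse-depth parametrization encodes each landmark as a six-vector: the camera centre at first observation, the azimuth and elevation of the observation ray, and the inverse depth rho = 1/d. A sketch of the standard conversion back to a Euclidean point (one common angle convention is assumed here):

```python
import numpy as np

def inverse_depth_to_xyz(x0, theta, phi, rho):
    """Euclidean point from the 6-parameter encoding (x0, theta, phi, rho):
    x0 is the (3,) camera centre at first observation, theta/phi the
    azimuth/elevation of the ray, and rho the inverse depth."""
    m = np.array([np.cos(phi) * np.sin(theta),   # unit ray direction
                  -np.sin(phi),
                  np.cos(phi) * np.cos(theta)])
    return x0 + m / rho                          # point at depth 1/rho along the ray
```
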

7

Chan, Kevin S. (Kevin Sao Wei). "Multiview monocular depth estimation using unsupervised learning methods." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119753.

Abstract:
Existing learned methods for monocular depth estimation use only a single view of the scene for depth evaluation, so they inherently overfit to their training scenes and cannot generalize well to new datasets. This thesis presents a neural network for multiview monocular depth estimation. Teaching a network to estimate depth via structure from motion allows it to generalize better to new environments with unfamiliar objects. This thesis extends recent work in unsupervised methods for single-view monocular depth estimation and trains with the reconstruction losses posed in those works. Models and baselines were evaluated on a variety of datasets, and results indicate that multiview models generalize across datasets better than previous work. This work is unique in that it emphasizes cross-domain performance and the ability to generalize more than performance on the training set.
8

Larsson, Susanna. "Monocular Depth Estimation Using Deep Convolutional Neural Networks." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159981.

Abstract:
For a long time stereo cameras have been deployed in visual Simultaneous Localization And Mapping (SLAM) systems to gain 3D information. Even though stereo cameras show good performance, their main disadvantage is the complex and expensive hardware setup they require, which limits the use of such systems. A simpler and cheaper alternative is the monocular camera; however, monocular images lack the important depth information. Recent work has shown that having access to depth maps in a monocular SLAM system is beneficial, since they can be used to improve the 3D reconstruction. This work proposes a deep neural network that predicts dense high-resolution depth maps from monocular RGB images by casting the problem as a supervised regression task. The network architecture follows an encoder-decoder structure in which multi-scale information is captured and skip connections are used to recover details. The network is trained and evaluated on the KITTI dataset, achieving results comparable to state-of-the-art methods. With further development, this network shows good potential to be incorporated in a monocular SLAM system to improve the 3D reconstruction.
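
A toy PyTorch version of such an encoder-decoder with a skip connection, trained as supervised regression, might look as follows (a structural sketch only, far smaller than the thesis network):

```python
import torch
import torch.nn as nn

class TinyDepthNet(nn.Module):
    """Minimal encoder-decoder with one skip connection for depth regression."""

    def __init__(self):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(3, 32, 3, 2, 1), nn.ReLU())   # 1/2 res
        self.enc2 = nn.Sequential(nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU())  # 1/4 res
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU())
        self.dec2 = nn.ConvTranspose2d(64, 1, 4, 2, 1)  # 64 = 32 decoded + 32 skip

    def forward(self, x):
        s = self.enc1(x)                     # skip feature at 1/2 resolution
        y = self.dec1(self.enc2(s))          # decode back to 1/2 resolution
        y = self.dec2(torch.cat([y, s], 1))  # skip connection recovers detail
        return torch.relu(y)                 # depth is non-negative

# supervised regression: loss = nn.functional.l1_loss(model(rgb), gt_depth)
```
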
9

Möckelind, Christoffer. "Improving deep monocular depth predictions using dense narrow field of view depth images." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235660.

Abstract:
In this work we study a depth prediction problem where we provide a narrow field-of-view depth image and a wide field-of-view RGB image to a deep network tasked with predicting depth for the entire RGB image. We show that by providing a narrow field-of-view depth image, we improve results for the area outside the provided depth compared to an earlier approach utilizing only a single RGB image for depth prediction. We also show that larger depth maps provide a greater advantage than smaller ones, and that the accuracy of the model decreases with distance from the provided depth. Further, we investigate several architectures and study the effect of adding noise and lowering the resolution of the provided depth image. Our results show that models provided with low-resolution, noisy data perform on par with models provided unaltered depth.
10

Pilzer, Andrea. "Learning Unsupervised Depth Estimation, from Stereo to Monocular Images." Doctoral thesis, Università degli studi di Trento, 2020. http://hdl.handle.net/11572/268252.

Abstract:
In order to interact with the real world, humans need to perform several tasks such as object detection, pose estimation, motion estimation and distance estimation. These tasks are all part of scene understanding and are fundamental tasks of computer vision. Depth estimation has received unprecedented attention from the research community in recent years, due to the growing interest in its practical applications (e.g., robotics and autonomous driving) and the performance improvements achieved with deep learning. In fact, the applications have expanded from more traditional tasks such as robotics to new fields such as autonomous driving, augmented reality devices and smartphone applications. This is due to several factors. First, with the increased availability of training data, bigger and bigger datasets were collected. Second, deep learning frameworks running on graphics cards exponentially increased data processing capabilities, allowing higher-precision deep convolutional networks (ConvNets) to be trained. Third, researchers applied unsupervised optimization objectives to ConvNets, overcoming the hurdle of collecting expensive ground truth and fully exploiting the abundance of images available in datasets. This thesis presents several proposals and their benefits for unsupervised depth estimation: (i) learning from resynthesized data, (ii) adversarial learning, (iii) coupling generator and discriminator losses for collaborative training, and (iv) self-improvement ability of the learned model. For the first two points, we developed a binocular stereo unsupervised depth estimation model that uses reconstructed data as an additional self-constraint during training. In addition, adversarial learning improves the quality of the reconstructions, further increasing the performance of the model. The third point is inspired by scene understanding as a structured task: a generator and a discriminator joining their efforts in a structured way improve the quality of the estimations. This may sound counterintuitive when cast in the general framework of adversarial learning, but our experiments demonstrate the effectiveness of the proposed approach. Finally, self-improvement is inspired by estimation refinement, a widespread practice in dense reconstruction tasks like depth estimation. We devise a monocular unsupervised depth estimation approach that measures the reconstruction errors in an unsupervised way to produce a refinement of the depth predictions. Furthermore, we apply knowledge distillation to improve the student ConvNet with the knowledge of a teacher ConvNet that has access to the errors.
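
The distillation idea in point (iv) can be sketched in a few lines: a student network regresses toward the refined output of a frozen teacher (illustrative PyTorch; `student` and `teacher` are assumed callables, not the thesis's actual models):

```python
import torch
import torch.nn.functional as F

def distill_step(student, teacher, image):
    """One distillation step: the student mimics the teacher's refined
    depth; the teacher (which has access to reconstruction errors during
    its own refinement) is kept frozen."""
    with torch.no_grad():
        target = teacher(image)              # refined depth as a soft target
    return F.l1_loss(student(image), target)
```
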
11

Nassir, Cesar. "Domain-Independent Moving Object Depth Estimation using Monocular Camera." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233519.

Abstract:
Today automotive companies across the world strive to create vehicles with fully autonomous capabilities. There are many benefits of developing autonomous vehicles, such as reduced traffic congestion, increased safety and reduced pollution. To achieve that goal there are many challenges ahead, one of them being visual perception. The ability to estimate depth from a 2D image has been shown to be a key component for 3D recognition, reconstruction and segmentation. Estimating depth in an image from a monocular camera is an ill-posed problem, since the mapping from colour intensity to depth value is ambiguous. Depth estimation from stereo images has come far compared to monocular depth estimation and was initially what depth estimation relied on. However, being able to exploit monocular cues is necessary for scenarios where stereo depth estimation is not possible. We present a novel CNN, BiNet, inspired by ENet, to tackle depth estimation of moving objects using only a monocular camera in real time. It performs better than ENet on the Cityscapes dataset while adding only a small overhead in complexity.
12

Palou, Visa Guillem. "Monocular depth estimation in images and sequences using occlusion cues." Doctoral thesis, Universitat Politècnica de Catalunya, 2014. http://hdl.handle.net/10803/144653.

Abstract:
When humans observe a scene, they are able to perfectly distinguish the different parts composing it. Moreover, humans can easily reconstruct the spatial position of these parts and conceive a consistent structure. The mechanisms involved in visual perception have been studied since the beginning of neuroscience but, still today, not all of the component processes are known. In usual situations, humans can make use of three different methods to estimate scene structure. The first is so-called divergence, which makes use of both eyes: when objects lie in front of the observer at distances up to a hundred meters, subtle differences in the image formed in each eye can be used to determine depth. When objects are not in the field of view of both eyes, other mechanisms must be used. In these cases, both visual cues and prior learned information can be used to determine depth. Even if these mechanisms are less accurate than divergence, humans can almost always infer the correct depth structure when using them. As examples of visual cues, occlusion, perspective and object size provide a lot of information about the structure of the scene. A priori information depends on each observer, but it is normally used subconsciously by humans to detect commonly known regions such as the sky, the ground or different types of objects. In recent years, since technology has been able to handle the processing burden of vision systems, much effort has been devoted to designing automated scene-interpreting systems. In this thesis we address the problem of depth estimation using only one point of view and only occlusion depth cues. The objective is to detect occlusions present in the scene and combine them with a segmentation system so as to generate a relative depth-order map for the scene. We explore both static and dynamic situations: single images, frames within sequences, and full video sequences. For the case where a full image sequence is available, a system exploiting motion information to recover depth structure is also designed. Results are promising and competitive with respect to the state-of-the-art literature, but there is still much room for improvement compared to human depth perception.
13

Ho, Hon-tat (何漢達). "An integrated approach to depth estimation using a monocular image sequence." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1996. http://hub.hku.hk/bib/B31213108.

14

Ho, Hon-tat. "An integrated approach to depth estimation using a monocular image sequence." Hong Kong: University of Hong Kong, 1996. http://sunzi.lib.hku.hk/hkuto/record.jsp?B17592045.

15

Schennings, Jacob. "Deep Convolutional Neural Networks for Real-Time Single Frame Monocular Depth Estimation." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-336923.

Abstract:
Vision-based active safety systems have become more common in modern vehicles, estimating the depth of objects ahead for autonomous driving (AD) and advanced driver-assistance systems (ADAS). In this thesis a lightweight deep convolutional neural network performing real-time depth estimation on single monocular images is implemented and evaluated. Many of the vision-based automatic brake systems in modern vehicles only detect pre-trained object types such as pedestrians and vehicles; they fail to detect general objects such as road debris and roadside obstacles. In stereo vision systems the problem is resolved by calculating a disparity image from the stereo image pair to extract depth information. The distance to an object can also be determined using radar and LiDAR systems. Using this depth information, the system performs the necessary actions to avoid collisions with objects that are determined to be too close. However, these systems are more expensive than a regular mono camera system and are therefore not very common in the average consumer car. By implementing robust depth estimation in mono vision systems, the benefits of active safety systems could be brought to a larger segment of the vehicle fleet. This could drastically reduce traffic accidents related to human error and possibly save many lives. The network architecture evaluated in this thesis is more lightweight than other CNN architectures previously used for monocular depth estimation, and is therefore preferable on computationally constrained systems. The network solves a supervised regression problem during training in order to produce a pixel-wise depth estimation map. It was trained on sparse ground-truth images with spatially incoherent and discontinuous data, and outputs a dense, spatially coherent and continuous depth map prediction. The spatially incoherent ground truth posed a problem of discontinuity that was addressed by a masked loss function with regularization. The network was able to predict dense depth estimates on the KITTI dataset with close to state-of-the-art performance.
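
A masked regression loss of the kind described, a data term on valid pixels plus a smoothness regularizer that propagates depth into the gaps, can be written as follows (a minimal PyTorch sketch, assuming invalid ground-truth pixels are stored as zero):

```python
import torch

def masked_l1_loss(pred, target, smooth_weight=0.1):
    """L1 loss evaluated only where the sparse ground truth is valid
    (target > 0), plus a total-variation regularizer filling the gaps."""
    valid = (target > 0).float()
    data = (torch.abs(pred - target) * valid).sum() / valid.sum().clamp(min=1)
    dx = torch.abs(pred[..., :, 1:] - pred[..., :, :-1]).mean()  # horizontal smoothness
    dy = torch.abs(pred[..., 1:, :] - pred[..., :-1, :]).mean()  # vertical smoothness
    return data + smooth_weight * (dx + dy)
```
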
16

Pinard, Clément. "Robust Learning of a depth map for obstacle avoidance with a monocular stabilized flying camera." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLY003/document.

Abstract:
Consumer unmanned aerial vehicles (UAVs) are mainly flying cameras. They have democratized aerial footage, but with their success came security concerns. This work aims at improving UAV safety with obstacle avoidance while keeping a smooth flight. In this context, we use only one stabilized camera, because of weight and cost constraints. For their robustness in computer vision and their capacity to solve complex tasks, we chose to use convolutional neural networks (CNNs). Our strategy is based on incrementally learning tasks of increasing complexity, the first step of which is to construct a depth map from the stabilized camera. This thesis studies the ability of CNNs to be trained for this task. In the case of stabilized footage, the depth map is closely linked to optical flow. We thus adapt FlowNet, a CNN known for optical flow, to output depth directly from two stabilized frames. This network is called DepthNet. This experiment succeeded with synthetic footage, but is not robust enough to be used directly on real videos. Consequently, we consider self-supervised training with real videos, based on differentiable image reprojection. This training method for CNNs being rather novel in the literature, a thorough study is needed in order not to depend too much on heuristics. Finally, we developed a depth fusion algorithm to use DepthNet efficiently on real videos: multiple frame pairs are fed to DepthNet to obtain a wide depth sensing range.
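
The differentiable reprojection at the heart of this self-supervision back-projects reference pixels with the predicted depth, transforms them into the source view, and samples the source image there; a compact PyTorch sketch (shapes and names are assumptions, not the thesis code):

```python
import torch
import torch.nn.functional as F

def inverse_warp(src, depth, K, K_inv, T):
    """Differentiably warp `src` into the reference view given predicted depth.

    src: (B,3,H,W) source image; depth: (B,1,H,W) reference-view depth;
    K, K_inv: (B,3,3) intrinsics; T: (B,3,4) reference-to-source pose.
    """
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()   # (3,H,W)
    pix = pix.view(1, 3, -1).expand(B, -1, -1)
    cam = (K_inv @ pix) * depth.view(B, 1, -1)                    # back-project
    cam = torch.cat([cam, torch.ones_like(cam[:, :1])], 1)        # homogeneous
    proj = K @ (T @ cam)                                          # into source view
    uv = proj[:, :2] / proj[:, 2:].clamp(min=1e-6)
    u = 2 * uv[:, 0] / (W - 1) - 1                                # normalize for grid_sample
    v = 2 * uv[:, 1] / (H - 1) - 1
    grid = torch.stack([u, v], -1).view(B, H, W, 2)
    return F.grid_sample(src, grid, align_corners=True)

# photometric self-supervision: (ref - inverse_warp(src, depth, K, K_inv, T)).abs().mean()
```
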
17

Bartoli, Simone. "Deploying deep learning for 3D reconstruction from monocular video sequences." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22402/.

Abstract:
3D reconstruction from monocular video sequences is a field of increasing interest in recent years. Before the growth of deep learning, retrieving depth information from single images was possible only with RGB-D sensors or algorithmic approaches. However, the availability of more and more data has allowed the training of monocular depth estimation neural networks, introducing innovative data-driven techniques. Since recovering ground-truth labels for depth estimation is very challenging, most of the research has focused on unsupervised or semi-supervised training approaches. The current state of the art for 3D reconstruction is an algorithmic method that exploits a Structure-from-Motion and Multi-View Stereo pipeline. Nevertheless, that approach is based on keypoint extraction, which has well-known limitations on texture-less, reflective and/or transparent surfaces. Consequently, a possible way to predict dense depth maps even in the absence of keypoints is to employ neural networks. This work proposes a novel data-driven pipeline for 3D reconstruction from monocular video sequences. It exploits a fine-tuning technique to adjust the weights of a pre-trained depth estimation neural network to the input scene. In doing so, the network can learn the features of a particular object and provide semi-real-time depth predictions for 3D reconstruction. Furthermore, the project provides a comparison with a custom implementation of the current state-of-the-art approach and shows the potential of this innovative data-driven pipeline.
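
The per-scene fine-tuning step can be sketched as a short self-supervised optimization loop over the video's own frames (illustrative only; `loss_fn` is a hypothetical self-supervised objective such as photometric reprojection):

```python
import torch

def finetune_on_scene(model, frames, loss_fn, steps=10, lr=1e-5):
    """Adapt a pre-trained depth network to one scene by briefly
    minimizing a self-supervised loss on consecutive frame pairs."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        for ref, src in zip(frames[:-1], frames[1:]):
            loss = loss_fn(model, ref, src)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```
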
18

Gampher, John Eric. "Perception of motion-in-depth: induced motion effects on monocular and binocular cues." Birmingham, Ala.: University of Alabama at Birmingham, 2008. https://www.mhsl.uab.edu/dt/2009r/gampher.pdf.

19

Dey, Rohit. "MonoDepth-vSLAM: A Visual EKF-SLAM using Optical Flow and Monocular Depth Estimation." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627666226301079.

20

Ye, Mao. "MONOCULAR POSE ESTIMATION AND SHAPE RECONSTRUCTION OF QUASI-ARTICULATED OBJECTS WITH CONSUMER DEPTH CAMERA." UKnowledge, 2014. http://uknowledge.uky.edu/cs_etds/25.

Abstract:
Quasi-articulated objects, such as human beings, are among the most commonly seen objects in our daily lives. Extensive research has been dedicated to 3D shape reconstruction and motion analysis for this type of object for decades, largely motivated by wide applications in entertainment, surveillance and health care. Most existing studies relied on one or more regular video cameras. In recent years, commodity depth sensors have become more and more widely available, and the geometric measurements they deliver provide significantly valuable information for these tasks. In this dissertation, we propose three algorithms for monocular pose estimation and shape reconstruction of quasi-articulated objects using a single commodity depth sensor. These three algorithms achieve shape reconstruction with increasing levels of granularity and personalization, and we further develop a method for highly detailed shape reconstruction based on our pose estimation techniques. Our first algorithm takes advantage of a motion database acquired with an active marker-based motion capture system. It combines pose detection through nearest-neighbour search with pose refinement via non-rigid point cloud registration. It is capable of accommodating different body sizes and achieves more than twice the accuracy of a previous state of the art on a publicly available dataset. This algorithm performs frame-by-frame estimation and is therefore less prone to tracking failure, but it does not guarantee temporal consistency of the skeletal structure and the shape, which can be problematic for some applications. To address this, we develop a real-time model-based approach for quasi-articulated pose and 3D shape estimation based on the Iterative Closest Point (ICP) principle, with several novel constraints that are critical for the monocular scenario. Within this algorithm we further propose a novel method for automatic body size estimation that enables it to accommodate different subjects. Due to its local-search nature, the ICP-based method can be trapped in local minima in the case of complex and fast motions. To address this issue, we explore the potential of using a statistical model for soft point-correspondence association, and propose a unified framework based on the Gaussian Mixture Model for joint pose and shape estimation of quasi-articulated objects. This method achieves state-of-the-art performance on various publicly available datasets. Based on our pose estimation techniques, we then develop a novel framework that achieves highly detailed shape reconstruction by only requiring the user to move naturally in front of a single depth sensor. Our experiments demonstrate reconstructed shapes with rich geometric details for various subjects with different apparel. Last but not least, we explore the applicability of our method in two real-world applications. First, we combine our ICP-based method with cloth simulation techniques for virtual try-on; our system delivers the first promising 3D-based virtual clothing system. Second, we explore the possibility of extending our pose estimation algorithms to assist physical therapists in identifying their patients' movement dysfunctions related to injuries. Our preliminary experiments have demonstrated promising results in comparison with the gold-standard active marker-based commercial system. Throughout the dissertation, we develop various state-of-the-art algorithms for pose estimation and shape reconstruction of quasi-articulated objects by leveraging the geometric information from depth sensors, and we demonstrate their great potential for different real-world applications.
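
As a building block for the ICP-based stage described above, one rigid ICP iteration (nearest-neighbour correspondences followed by the closed-form Kabsch alignment) looks like this in NumPy/SciPy; the dissertation's articulated variant adds per-part constraints on top of such a step:

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(src, dst):
    """One rigid ICP iteration: match each src point (N,3) to its nearest
    dst point (M,3), then solve the least-squares rigid transform by SVD."""
    matched = dst[cKDTree(dst).query(src)[1]]
    mu_s, mu_d = src.mean(0), matched.mean(0)
    H = (src - mu_s).T @ (matched - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:        # guard against a reflection solution
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return src @ R.T + t, R, t      # updated points and the increment
```
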
21

Djikic, Addi. "Segmentation and Depth Estimation of Urban Road Using Monocular Camera and Convolutional Neural Networks." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235496.

Abstract:
Deep learning for safe autonomous transport is rapidly emerging. Fast and robust perception will be crucial for future autonomous navigation in urban areas with high traffic and human interplay. Previous work focuses on extracting full-image depth maps or finding specific road features such as lanes. However, in urban environments lanes are not always present, and sensors such as LiDAR provide only a sparse 3D point-cloud perception of the road and demand heavy algorithmic processing. In this thesis we derive a novel convolutional neural network that we call AutoNet. It is designed as an encoder-decoder network for pixel-wise depth estimation of the drivable free space of an urban road, using only a monocular camera and treated as a supervised regression problem. AutoNet is also constructed as a classification network that solely classifies and segments the drivable free space in real time with monocular vision, treated as a supervised classification problem, which proves to be a simpler and more robust solution than the regression approach. We also implement the state-of-the-art neural network ENet for comparison, which is designed for fast real-time semantic segmentation and fast inference. The evaluation shows that AutoNet outperforms ENet on every performance metric, but is slower in terms of frame rate. Optimization techniques are proposed as future work for increasing the frame rate of the network while maintaining robustness and performance. All training and evaluation is done on the Cityscapes dataset. New ground-truth labels for road depth perception are created for training with a novel approach that fuses pre-computed depth maps with semantic labels. Data collection with a Scania vehicle mounted with a monocular camera is conducted to test the final derived models. The proposed AutoNet shows promising state-of-the-art performance in both road depth estimation and road classification.
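
The ground-truth fusion described above, keeping depth only where the semantic label says "road", reduces to a masked selection (a tiny NumPy sketch; `ROAD_ID` is a hypothetical label index):

```python
import numpy as np

ROAD_ID = 0  # hypothetical id of the "road" class in the semantic map

def road_depth_labels(depth, semantics):
    """Fuse a precomputed depth map (H,W) with semantic labels (H,W):
    keep depth on road pixels, mark everything else invalid (0)."""
    return np.where(semantics == ROAD_ID, depth, 0.0)
```
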
22

Paerhati, Paruku. "Real-time monocular depth mapping system using variance of focal plane and pixel focus measure." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/113117.

Abstract:
Vision is one of the most powerful senses available to living creatures. Many of the fundamental operations of humans, such as planning paths, avoiding obstacles and recognizing objects, depend heavily on visual perception of the surrounding world. Although humans have naturally evolved to use their stereo optical prowess efficiently to develop an understanding of their environment, artificial machines and systems have only begun to use computer vision to build awareness of local physical entities. One of the most important sensory skills is depth perception, which allows the relative distance of objects to be estimated from many visual cues. Many systems have been developed to help machines perceive a depth map of their environment, each with its drawbacks and benefits. This thesis introduces the design and implementation of a new system that produces a depth map from a single optical camera by varying the focal plane across the images taken. The work focuses on the methods used to scale the depth-from-focus algorithm to run in real time. The results showcase a real-time depth mapping system capable of providing rich depth maps of scenes at a high frame rate, with advanced noise filtration techniques.
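
A depth-from-focus core of this kind can be sketched by scoring per-pixel sharpness in every frame of the focal stack and assigning each pixel the focus distance of its sharpest frame (illustrative OpenCV/NumPy, not the thesis implementation):

```python
import cv2
import numpy as np

def depth_from_focus(stack, focus_dists, ksize=9):
    """Per-pixel depth from a focal stack.

    stack: list of grayscale frames (H,W), one per focal plane;
    focus_dists: matching list of focus distances in metres.
    """
    measures = []
    for img in stack:
        lap = cv2.Laplacian(img.astype(np.float32), cv2.CV_32F)
        # local energy of the Laplacian as a focus measure
        measures.append(cv2.boxFilter(lap * lap, -1, (ksize, ksize)))
    best = np.argmax(np.stack(measures), axis=0)   # index of sharpest frame
    return np.asarray(focus_dists)[best]           # (H,W) depth map
```
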
23

Tucker, Andrew James. "Visual space attention in three-dimensional space." Swinburne University of Technology, 2006. http://adt.lib.swin.edu.au./public/adt-VSWT20070301.085637.

Abstract:
Current models of visual spatial attention are based on the extent to which attention can be allocated in 2-dimensional displays. The distribution of attention in 3-dimensional space has received little consideration. A series of experiments were devised to explore the apparent inconsistencies in the literature pertaining to the allocation of spatial attention in the third dimension. A review of the literature attributed these inconsistencies to differences and limitations in the various methodologies employed, in addition to the use of differing attentional paradigms. An initial aim of this thesis was to develop a highly controlled novel adaptation of the conventional robust covert orienting of visual attention task (COVAT) in depth defined by either binocular (stereoscopic) or monocular cues. The results indicated that attentional selection in the COVAT is not allocated within a 3-dimensional representation of space. Consequently, an alternative measure of spatial attention in depth, the overlay interference task, was successfully validated in a different stereoscopic depth environment and then manipulated to further examine the allocation of attention in depth. Findings from the overlay interference experiments indicated that attentional selection is based on a representation that includes depth information, but only when an additional feature can aid 3D selection. Collectively, the results suggest a dissociation between two paradigms that are both purported to be measures of spatial attention. There appears to be a further dissociation between 2-dimensional and 3-dimensional attentional selection in both paradigms for different reasons. These behavioural results, combined with recent electrophysiological evidence suggest that the temporal constraints of the 3D COVAT paradigm result in early selection based predominantly on retinotopic spatial coordinates prior to the complete construction of a 3-dimensional representation. Task requirements of the 3D overlay interference paradigm, on the other hand, while not being restricted by temporal constraints, demand that attentional selection occurs later, after the construction of a 3-dimensional representation, but only with the guidance of a secondary feature. Regardless of whether attentional selection occurs early or late, however, some component of selection appears to be based on viewer-centred spatial coordinates.
24

Jungåker, Jonas. "Monocular depth estimation for level assessment in an industrial waste management environment: A thesis within smart waste management." Thesis, KTH, Skolan för industriell teknik och management (ITM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-303107.

Abstract:
With the transition to Industry 4.0, actors in many industries face challenges such as how to successfully implement technical solutions and retain competitive advantages. In the smart waste management sector, many solutions have been presented for creating efficient sensors, but a practical way of comparing these solutions has been non-existent. From research within the Industrial Internet of Things (IIoT) and interviews with operators at Scania, we present a clear and effective way of comparing smart waste management sensors with regard to operational effectiveness. Along with this, we present a way to measure the fill volume of garbage containers using monocular depth estimation and compare this to using ultrasonic sensors. Our findings show that depth estimation with deep convolutional neural networks is viable as long as environmental conditions can be controlled, although we have also found that ultrasonic sensors outperform depth estimation on many metrics and are the preferred way of measuring the fill level of containers in many applications. Despite this, the results show promise in that depth estimation can be used in conjunction with object recognition models, making ultrasonic sensors obsolete in more complex applications.
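
A rough version of the fill-volume computation from an estimated depth map might integrate per-pixel fill height over each pixel's footprint (a hedged NumPy sketch under a fixed top-down camera assumption; all names are hypothetical):

```python
import numpy as np

def fill_volume(depth_map, empty_depth, pixel_area_at_1m):
    """Approximate fill volume (m^3) of a container under a fixed camera.

    depth_map: (H,W) metres from camera to the waste surface;
    empty_depth: (H,W) metres to the container floor (empty reference);
    pixel_area_at_1m: area in m^2 that one pixel covers at 1 m depth.
    """
    height = np.clip(empty_depth - depth_map, 0, None)  # fill height per pixel
    area = pixel_area_at_1m * depth_map ** 2            # footprint grows with depth^2
    return float((height * area).sum())
```
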
APA, Harvard, Vancouver, ISO, and other styles
25

Ekström, Marcus. "Road Surface Preview Estimation Using a Monocular Camera." Thesis, Linköpings universitet, Datorseende, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-151873.

Full text
Abstract:
Recently, sensors such as radars and cameras have been widely used in the automotive industry, especially in Advanced Driver-Assistance Systems (ADAS), to collect information about the vehicle's surroundings. Stereo cameras are very popular as they can be used passively to construct a 3D representation of the scene in front of the car. This has allowed the development of several ADAS algorithms that need 3D information to perform their tasks. One interesting application is Road Surface Preview (RSP), where the task is to estimate the road height along the future path of the vehicle. An active suspension control unit can then use this information to regulate the suspension, improving driving comfort, extending the durability of the vehicle and warning the driver about potential risks on the road surface. Stereo cameras have been successfully used in RSP and have demonstrated very good performance. However, their main disadvantages are their high production cost and high power consumption, which limits installing several ADAS features in economy-class vehicles. A less expensive alternative is the monocular camera, which has a significantly lower cost and power consumption. Therefore, this thesis investigates the possibility of solving the Road Surface Preview task using a monocular camera. We try two different approaches: structure-from-motion and Convolutional Neural Networks. The proposed methods are evaluated against the stereo-based system. Experiments show that both structure-from-motion and CNNs have good potential for solving the problem, but they are not yet reliable enough to be a complete solution to the RSP task and be used in an active suspension control unit.
APA, Harvard, Vancouver, ISO, and other styles
26

Diskin, Yakov. "Dense 3D Point Cloud Representation of a Scene Using Uncalibrated Monocular Vision." University of Dayton / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1366386933.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Cavalcanti, Ugo Leone. "Miglioramento tramite reti monoculari di mappe di disparità ottenute da reti stereo." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.

Find full text
Abstract:
In this work we introduce a new architecture to improve the disparity maps obtained with current state-of-the-art stereo matching algorithms. Specifically, the proposed approach exploits the synergy between a stereo network, traditionally used end-to-end, and a monocular network dedicated to refining the map output by the stereo one.
APA, Harvard, Vancouver, ISO, and other styles
28

Ali, Shahnewaz. "Robotic vision for knee arthroscopy." Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/235890/1/Shahnewaz%2BAli%2BThesis%282%29.pdf.

Full text
Abstract:
This research focuses on visualisation challenges associated with anatomical imaging of complex joints such as the human knee. Current imaging systems are inadequate to provide 3D perception and lack the level of situational awareness needed for performing highly complex minimally invasive surgeries such as knee arthroscopy. As a result, unintended tissue damage is a common occurrence and training new surgeons takes a very long time. To improve surgical precision and training, this study presents a series of novel methods and computational tools that provide 3D perception for safer surgery, with the added ability to automatically recognise multiple tissue types in real time.
APA, Harvard, Vancouver, ISO, and other styles
29

Banach, Artur. "Visual navigation in minimally invasive surgery." Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/228730/1/Artur_Banach_Thesis.pdf.

Full text
Abstract:
This thesis investigates how to maximise spatial information from surgical camera images for robotic navigation. Image enhancement and depth estimation techniques are extended not only to improve the visual navigation capability in robotic surgery, but also to enable more accurate clinical diagnosis. Targeted applications include bronchoscopy and arthroscopy, with trials using real and synthetic data demonstrating potential for improved clinical outcomes of minimally invasive surgery.
APA, Harvard, Vancouver, ISO, and other styles
30

Moukari, Michel. "Estimation de profondeur à partir d'images monoculaires par apprentissage profond." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC211/document.

Full text
Abstract:
Computer vision is a branch of artificial intelligence whose purpose is to enable a machine to analyze, process and understand the content of digital images. Scene understanding in particular is a major issue in computer vision. It involves both a semantic and a structural characterization of the image, on the one hand to describe its content and, on the other hand, to understand its geometry. However, while the real space is three-dimensional, the image representing it is two-dimensional. Part of the 3D information is thus lost during the process of image formation, and it is therefore non-trivial to describe the geometry of a scene from 2D images of it. There are several ways to retrieve the depth information lost in the image. In this thesis we are interested in estimating a depth map given a single image of the scene. In this case, the depth information corresponds, for each pixel, to the distance between the camera and the object represented at this pixel. The automatic estimation of a distance map of the scene from an image is indeed a critical algorithmic building block in a very large number of domains, in particular that of autonomous vehicles (obstacle detection, navigation aids). Although the problem of estimating depth from a single image is difficult and inherently ill-posed, we know that humans can judge distances with one eye. This capacity is not innate but acquired, and is made possible mostly by the identification of cues reflecting prior knowledge of the surrounding objects. Moreover, we know that learning algorithms can extract these cues directly from images. We are particularly interested in statistical learning methods based on deep neural networks, which have recently led to major breakthroughs in many fields, and we study the case of monocular depth estimation.
APA, Harvard, Vancouver, ISO, and other styles
31

Kaller, Ondřej. "Pokročilé metody snímání a hodnocení kvality 3D videa." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-369744.

Full text
Abstract:
This dissertation deals with methods for capturing 3D images and videos and for assessing their quality. After a brief summary of the physiology of spatial perception, it reviews the state of the art on the adaptive parallax problem and on camera configurations for capturing a classic stereo pair. It also summarizes current options for depth map estimation: both active and passive methods are mentioned, and profilometric scanning is explained in more detail. Selected technical parameters of two current 3D display technologies, polarization-separating and time-multiplexed, were measured, for example the crosstalk between the left and right images. The core of the thesis is a new method, designed and tested by the author, for creating a depth map when capturing a 3D scene. The novelty of this approach lies in a clever combination of current active and passive scene depth sensing methods that exploits the advantages of both. Finally, results of subjective 3D video quality tests are presented; the main contribution here is a proposed metric modelling the results of these subjective tests.
APA, Harvard, Vancouver, ISO, and other styles
32

Wu, Cheng-En, and 吳承恩. "Depth Estimation from Multiple Monocular Cues." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/25774947344479431961.

Full text
Abstract:
Master's thesis, National Taiwan University, Graduate Institute of Networking and Multimedia, ROC academic year 100.
3-D display technology has been one of the most popular topics in recent years. In the process of generating a 3-D visual perspective, the depth map plays an important role in rebuilding the stereoscopic effect. So far, manually drawing the depth map has been the mainstream approach in the movie industry; however, it costs a great deal of money and time. Therefore, many automatic and semi-automatic depth estimation methods have been published in recent years. In this thesis, a three-phase, semi-automatic system is proposed. First, the input image/frames are analyzed to extract information about the scene. Then absolute depth estimation and relative depth estimation are employed to generate the depth map. The proposed system is applicable to both single images and image sequences. For sequence inputs, temporal coherence is enforced so that the depth maps between frames are smooth and continuous. The experimental results show that this method can estimate depth successfully, and the effectiveness of the proposed system supports further development of automatic depth estimation. With improvements in segmentation algorithms, fully automatic depth map generation should become possible in the future.
APA, Harvard, Vancouver, ISO, and other styles
33

Chang, Yu-Tzu, and 章祐慈. "Learning 3D Geometry for Monocular Depth Estimation." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/y5k7aw.

Full text
Abstract:
Master's thesis, National Chiao Tung University, Institute of Computer Science and Engineering, ROC academic year 107.
In this thesis, we propose a convolutional neural network (CNN) for monocular depth estimation. Previous depth estimation works often take only RGB images as input to reconstruct a dense depth map, which may limit the accuracy of the predicted depth values. In the proposed method, we take an RGB image and corresponding sparse depth information as input, and extract both multi-scale context features and multi-resolution spatial features to reconstruct a dense depth map. By utilizing the sparse depth information, we can significantly improve the accuracy of the predicted depth map. Moreover, we introduce the concept of multi-view learning to our network and compute the photometric consistency between reference and neighboring views. This provides a geometric constraint and helps the network recover a more complete depth map. The proposed network can efficiently predict accurate, detailed depth maps from sparse depth information and geometry cues. In addition, we use the depth maps predicted by our method to demonstrate the network's ability on 3D reconstruction tasks. The 3D point clouds can be reconstructed well even in areas lacking ground truth, such as textureless and reflective surfaces. In conclusion, the proposed network takes an RGB image and sparse depth information as input, and learns the geometric constraint to predict the depth map. It provides dense depth maps with both accurate depth values and high visual quality on a variety of datasets, including RGBD, SUN3D, MVS and ETH3D.
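The photometric consistency term mentioned in this abstract can be sketched as follows. This is a generic, hedged PyTorch illustration with assumed names and conventions, not the network described in the thesis: a neighbouring view is inverse-warped into the reference view using the predicted depth, a known relative pose, and the camera intrinsics K.

```python
# A minimal sketch of a photometric consistency loss (assumed conventions).
import torch
import torch.nn.functional as F

def photometric_loss(ref_img, src_img, depth, K, T_ref_to_src):
    """ref_img, src_img: (B,3,H,W); depth: (B,1,H,W); K: (B,3,3); T: (B,4,4)."""
    B, _, H, W = ref_img.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], 0).float()     # (3,H,W)
    pix = pix.view(1, 3, -1).expand(B, 3, H * W)
    # Back-project reference pixels to 3D and move them into the source frame.
    cam = torch.linalg.inv(K) @ pix * depth.view(B, 1, -1)          # (B,3,N)
    cam_h = torch.cat([cam, torch.ones(B, 1, H * W)], 1)            # (B,4,N)
    src_cam = (T_ref_to_src @ cam_h)[:, :3]
    src_pix = K @ src_cam
    src_pix = src_pix[:, :2] / src_pix[:, 2:].clamp(min=1e-6)
    # Normalise pixel coordinates to [-1,1] and resample the source image.
    gx = 2 * src_pix[:, 0] / (W - 1) - 1
    gy = 2 * src_pix[:, 1] / (H - 1) - 1
    grid = torch.stack([gx, gy], -1).view(B, H, W, 2)
    warped = F.grid_sample(src_img, grid, align_corners=True)
    return (warped - ref_img).abs().mean()
```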
APA, Harvard, Vancouver, ISO, and other styles
34

Yin, Wei. "3D Scene Reconstruction from A Monocular Image." Thesis, 2022. https://hdl.handle.net/2440/134585.

Full text
Abstract:
3D scene reconstruction is a fundamental task in computer vision. The established approaches to this task are based on multi-view geometry, which establishes correspondences between feature points across consecutive frames or multiple views, from which the 3D positions of these points can be recovered. In contrast, we aim to achieve dense 3D scene shape reconstruction from a single in-the-wild image. Without multiple views available, we rely on deep learning techniques. Recently, deep neural networks have been the dominant solution for various computer vision problems; thus, we propose a two-stage learning-based method. First, we employ fully convolutional neural networks to learn accurate depth from a monocular image. To recover high-quality depth, we lift the depth to 3D space and propose a global geometric constraint, termed the virtual normal loss. To improve the generalization ability of the monocular depth estimation module, we construct a large-scale and diverse dataset and propose to learn affine-invariant depth on it. Experiments demonstrate that our monocular depth estimation methods work robustly in the wild and recover high-quality 3D geometry. Furthermore, we propose a novel second stage that predicts the focal length with a point cloud network. Instead of predicting it directly, the point cloud module leverages point cloud encoder networks that predict focal length adjustment factors from an initial guess of the scene point cloud reconstruction. The domain gap is significantly less of an issue for point clouds than for images. Combining the two stages, 3D shape can be recovered from a single image input. Note that such reconstruction is up to scale. To recover metric 3D shape, we propose to input sparse points as guidance. Our proposed training method significantly improves the robustness of the system, including robustness to various sparsity patterns and diverse scenes.
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2022
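The virtual normal idea described above, comparing the normals of planes spanned by random point triplets in the predicted and ground-truth point clouds, can be sketched as below. This is a simplified numpy illustration with assumed names: degenerate, near-collinear triplets are not filtered out, as the thesis method presumably does.

```python
# A hedged sketch of a virtual-normal-style constraint (not the thesis code).
import numpy as np

def lift(depth, K_inv):
    """Back-project a (H, W) depth map to camera-frame 3D points."""
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    pix = np.stack([xs, ys, np.ones_like(xs)], 0).reshape(3, -1)
    return (K_inv @ pix) * depth.reshape(1, -1)          # (3, H*W)

def virtual_normal_loss(pred, gt, K_inv, n_triplets=1000, rng=None):
    if rng is None:
        rng = np.random.default_rng(0)
    P, Q = lift(pred, K_inv), lift(gt, K_inv)
    idx = rng.integers(0, P.shape[1], size=(n_triplets, 3))
    def normals(X):
        a, b, c = X[:, idx[:, 0]], X[:, idx[:, 1]], X[:, idx[:, 2]]
        n = np.cross((b - a).T, (c - a).T)               # (n_triplets, 3)
        return n / (np.linalg.norm(n, axis=1, keepdims=True) + 1e-8)
    return np.abs(normals(P) - normals(Q)).mean()

K_inv = np.linalg.inv(np.array([[100., 0, 32], [0, 100., 24], [0, 0, 1]]))
pred = np.random.default_rng(1).uniform(1, 5, (48, 64))
print(virtual_normal_loss(pred, pred * 1.0, K_inv))      # identical -> 0.0
```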
APA, Harvard, Vancouver, ISO, and other styles
35

Su, Wei-Cheng, and 蘇偉誠. "Unsupervised Monocular Depth Estimation Using Spatial-Temporal Information." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/ns39f3.

Full text
Abstract:
Master's thesis, National Chiao Tung University, Institute of Electronics, ROC academic year 107.
Depth estimation is used in many applications in daily life. First, the 3D information in a depth map supports many real applications, for example 3D reconstruction, human-machine interaction, virtual/augmented reality, and navigation. Second, many tasks can be simplified with the help of a depth map: for instance, in 3D human pose estimation, simply thresholding RGB-D images can separate the human from the background before a random forest regresses the joint positions. Another example is simultaneous localization and mapping (SLAM), where an RGB-D sensor is much more reliable than a monocular camera alone, since it provides more information to support tracking and mapping. In short, many tasks benefit from depth information. In this thesis, we focus on unsupervised monocular depth estimation. We refine the monodepth model and utilize atrous convolution to enlarge the receptive field, which increases the accuracy of our model. In addition, we propose a model trained with spatial-temporal information; with the help of learned pose transformations, performance increases on different datasets. Lastly, the temporal branch increases inference speed.
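As a side note on the atrous convolution mentioned in this abstract, the following minimal PyTorch sketch shows how dilation enlarges the receptive field without adding parameters; it is purely illustrative and unrelated to the thesis code.

```python
# Dense vs. atrous (dilated) 3x3 convolution: same parameter count,
# larger effective receptive field for the dilated layer.
import torch
import torch.nn as nn

dense  = nn.Conv2d(32, 32, kernel_size=3, padding=1, dilation=1)
atrous = nn.Conv2d(32, 32, kernel_size=3, padding=4, dilation=4)

x = torch.randn(1, 32, 64, 64)
print(dense(x).shape, atrous(x).shape)   # both (1, 32, 64, 64)
# Both kernels are 3x3, but the dilated one covers a 9x9 neighbourhood:
# effective kernel size = dilation * (k - 1) + 1 = 4 * 2 + 1 = 9.
```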
APA, Harvard, Vancouver, ISO, and other styles
36

Bian, Jiawang. "Self-supervised Learning of Monocular Depth from Video." Thesis, 2022. https://hdl.handle.net/2440/136692.

Full text
Abstract:
Image-based depth estimation, as a fundamental problem in computer vision, allows for understanding scene geometry using only cameras. This thesis addresses the specific problem of monocular depth estimation via self-supervised learning from RGB-only videos. Although existing work has shown excellent results on parts of benchmark datasets, several vital challenges remain that limit the use of these algorithms in general scenarios. To summarize, my identified challenges and contributions include: (i) Previous methods predict inconsistent depths over a video, which limits their use in visual localization and mapping. To this end, I propose a geometry consistency loss that penalizes multi-view depth misalignment in training, which enables scale-consistent depth estimation at inference time; (ii) Previous methods often diverge or show low-accuracy results when trained on videos captured with handheld cameras. To address this challenge, I analyze the effect of camera motion on depth network gradients, and I propose an auto-rectify network to remove the relative rotation in training image pairs for robust learning; (iii) Previous methods fail to learn reasonable depths from highly dynamic scenes due to non-rigidity. In this scenario, I propose a novel method which constrains dynamic regions using an external well-trained depth estimation network and supervises static regions via multi-view losses. Comprehensive quantitative results and rich qualitative results are provided to demonstrate the advantages of the proposed methods over existing alternatives. The code and pre-trained models have been released at https://github.com/JiawangBian
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2022
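The geometry consistency loss described in item (i) above can be sketched, under assumed names, as a normalized difference between the depth of frame-1 points expressed in frame 2 and the resampled frame-2 depth prediction; the same term also yields a weighting mask that down-weights dynamic or occluded regions.

```python
# A hedged sketch of a depth (geometry) consistency term; the projection
# and resampling steps that produce the two inputs are omitted here.
import torch

def depth_consistency(d_computed, d_sampled, eps=1e-6):
    """d_computed: depth of frame-1 points expressed in frame 2, (B,1,H,W);
    d_sampled: frame-2 depth prediction resampled at the projected pixels."""
    diff = (d_computed - d_sampled).abs() / (d_computed + d_sampled + eps)
    weight_mask = 1.0 - diff   # low weight where the two depths disagree
    return diff.mean(), weight_mask

d1 = torch.full((1, 1, 4, 4), 2.0)
d2 = torch.full((1, 1, 4, 4), 2.5)
loss, mask = depth_consistency(d1, d2)
print(float(loss))             # ~0.111 = |2 - 2.5| / (2 + 2.5)
```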
APA, Harvard, Vancouver, ISO, and other styles
37

Chen, L., W. Tang, Tao Ruan Wan, and N. W. John. "Self-supervised monocular image depth learning and confidence estimation." 2019. http://hdl.handle.net/10454/17908.

Full text
Abstract:
We present a novel self-supervised framework for monocular image depth learning and confidence estimation. Our framework reduces the amount of ground truth annotation data required for training Convolutional Neural Networks (CNNs), which is often a challenging problem for the fast deployment of CNNs in many computer vision tasks. Our DepthNet adopts a novel, fully differentiable, patch-based cost function built on the Zero-Mean Normalized Cross-Correlation (ZNCC), using multi-scale patches as the matching and learning strategy. This approach greatly increases the accuracy and robustness of depth learning. Since the proposed patch-based cost function naturally provides a 0-to-1 confidence, it is then used to self-supervise the training of a parallel network for confidence map learning and estimation, exploiting the fact that ZNCC is a normalized measure of similarity which can be interpreted as the confidence of the depth estimate. The proposed confidence map learning and estimation therefore operates in a self-supervised manner, in a network parallel to DepthNet. Evaluation on the KITTI depth prediction evaluation dataset and the Make3D dataset shows that our method outperforms state-of-the-art results.
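For reference, a ZNCC score for a pair of patches can be computed as below. This is a minimal numpy illustration of the similarity measure named in the abstract, not the DepthNet cost function itself.

```python
# Zero-mean normalized cross-correlation between two equally sized patches.
import numpy as np

def zncc(p, q, eps=1e-8):
    """ZNCC in [-1, 1]; (1 + zncc) / 2 can be read as a 0-to-1 confidence
    in the spirit described above."""
    p = (p - p.mean()) / (p.std() + eps)
    q = (q - q.mean()) / (q.std() + eps)
    return float((p * q).mean())

rng = np.random.default_rng(0)
a = rng.random((7, 7))
print(zncc(a, a))       # ~1.0: identical patches
print(zncc(a, 1 - a))   # ~-1.0: inverted patch
```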
APA, Harvard, Vancouver, ISO, and other styles
38

CHANG, PO-CHAO, and 張博詔. "Excluding non-matched patches to do unsupervised monocular depth estimation." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/3r3z6n.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

"Monocular Depth Estimation with Edge-Based Constraints and Active Learning." Master's thesis, 2019. http://hdl.handle.net/2286/R.I.54881.

Full text
Abstract:
The ubiquity of single-camera systems in society has made improving monocular depth estimation a topic of increasing interest in the broader computer vision community. Inspired by recent work in sparse-to-dense depth estimation, this thesis focuses on sparse patterns generated from feature-detection-based algorithms, as opposed to the regular-grid sparse patterns used by previous work. This work uses these feature-based sparse patterns to generate additional depth information by interpolating regions between clusters of samples that are in close proximity to each other. These interpolated sparse depths are used to enforce additional constraints on the network's predictions. In addition to the improved depth prediction performance observed from incorporating the sparse sample information compared to pure RGB-based methods, the experiments show that actively retraining a network on a small number of samples that deviate most from the interpolated sparse depths leads to better depth prediction overall. This thesis also introduces a new metric, titled Edge, to quantify model performance in regions of an image that show the highest change in ground truth depth values along either the x-axis or the y-axis. Existing metrics in depth estimation, like Root Mean Square Error (RMSE) and Mean Absolute Error (MAE), quantify model performance across the entire image and do not focus on specific regions that are hard to predict; the proposed Edge metric focuses specifically on these regions. The experiments also show that using the Edge metric as a small addition to existing loss functions, like the L1 loss in current state-of-the-art methods, leads to vastly improved performance in these hard-to-predict regions, while also improving performance across the board on every other metric.
Dissertation/Thesis
Masters Thesis Computer Engineering 2019
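An edge-focused metric in the spirit described above can be sketched as follows; the exact thesis formulation is not reproduced here, and the percentile threshold is an assumption.

```python
# A hedged sketch of an edge-region error metric: RMSE restricted to pixels
# with the largest ground-truth depth change along the x- or y-axis.
import numpy as np

def edge_metric(pred, gt, percentile=95):
    gy, gx = np.gradient(gt)                      # depth change along y and x
    grad_mag = np.maximum(np.abs(gx), np.abs(gy))
    mask = grad_mag >= np.percentile(grad_mag, percentile)
    return float(np.sqrt(np.mean((pred[mask] - gt[mask]) ** 2)))

gt = np.tile(np.linspace(1, 5, 64), (48, 1))
pred = gt + 0.1
print(edge_metric(pred, gt))                      # -> 0.1
```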
APA, Harvard, Vancouver, ISO, and other styles
40

Lin, Xinyi, and 林心怡. "Enhancing Unsupervised Monocular Depth Estimation via Fusing Layer-wised Features." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/qg72r4.

Full text
Abstract:
Master's thesis, National Taiwan University, Graduate Institute of Networking and Multimedia, ROC academic year 107.
Recently, deep methods have shown good performance in depth estimation and visual odometry from monocular video sequences by optimizing the photometric consistency between frames. However, it remains hard to obtain large-scale ground truth depth maps for supervising a neural network for depth estimation, and existing solutions typically produce low-resolution results. Inspired by recent deep learning methods for semantic segmentation, we present a simple but effective unsupervised deep network for more accurate depth and camera motion estimation. An atrous spatial pyramid pooling module and an additional refinement layer are combined with an encoder-decoder base model. Besides, we introduce a consistency-regularization loss to increase robustness to illumination changes. Our approach produces high-resolution depth maps with sharper object boundaries and achieves better results on the KITTI benchmark.
APA, Harvard, Vancouver, ISO, and other styles
41

KE, MIN-HUNG, and 柯旻宏. "Monocular Depth Estimation and Collision Avoidance on a Multirotor Drone." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/7z7n9k.

Full text
Abstract:
Master's thesis, National United University, Master's Program, Department of Electrical Engineering, ROC academic year 107.
This thesis presents a multirotor drone capable of autonomous flight and obstacle avoidance in an outdoor environment. The system consists of a Pixhawk flight controller, a Raspberry Pi and a notebook computer, and uses deep learning, image processing and control strategies to achieve outdoor autonomous flight. The overall system is composed of the multirotor drone and a ground control station. A single-lens camera captures images that are transmitted back to the ground control station via the real-time messaging protocol, and a socket API returns obstacle-avoidance information to the Raspberry Pi on the drone. Obstacle distance detection is based on deep learning: training data are collected offline with a stereo camera, and synchronized, level-aligned left and right images are input to a convolutional neural network whose output is the disparity map of a single image. To find the conversion between disparity values and real distances, we use curve fitting, with true distances measured by a laser range finder; we can thus obtain the true distance of each pixel in a single image. From the resulting depth map, image processing techniques find the flyable area, which determines whether the multirotor should go straight, turn left or turn right to carry out the mission. Finally, the Raspberry Pi and Pixhawk communicate using DroneKit-Python once the Raspberry Pi receives obstacle-avoidance information. When the drone encounters an obstacle during autonomous flight, it can instantly change its flight attitude and smoothly avoid the obstacle ahead. The user can set a target so that the drone flies toward it after take-off. If the drone encounters an obstacle during the flight, it obtains the obstacle information, leaves the original flight path to avoid the obstacle, and flies toward the target point again once avoidance is complete. In the end, the drone arrives at the set target point and lands, completing automatic obstacle avoidance during autonomous flight.
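The disparity-to-distance calibration step described here can be sketched as a simple curve fit. The inverse model follows stereo geometry (Z = fB/d), but the numbers below are synthetic and the exact functional form used in the thesis is an assumption.

```python
# A hedged sketch of fitting laser-measured distances against disparities.
import numpy as np

disparity = np.array([40.0, 25.0, 18.0, 12.0, 8.0])   # network output [px]
distance  = np.array([1.0, 1.6, 2.2, 3.3, 5.0])       # laser truth [m]

# Fit Z ~ a * (1/d) + b, linear in the inverse disparity.
a, b = np.polyfit(1.0 / disparity, distance, deg=1)

def to_metres(d):
    return a / d + b

print(to_metres(20.0))   # estimated distance for a 20-px disparity (~2.0 m)
```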
APA, Harvard, Vancouver, ISO, and other styles
42

Huang, Yao-Pao, and 黃耀葆. "Transfer2Depth: Dual attention network with transfer learning for monocular depth estimation." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/qcx3q7.

Full text
Abstract:
Master's thesis, National Sun Yat-sen University, Department of Electrical Engineering, ROC academic year 107.
Resolving depth from a monocular RGB image has been a long-standing task in computer vision and robotics. In this work we propose a monocular depth estimation method which takes only a single image as input. Unlike most existing learning-based methods that take two images as input, our network has the advantage of high applicability, as it does not require sufficient and static camera motion to reach optimal performance. We also propose a spatial-channel attention module to improve feature extraction. The proposed method utilizes transfer learning to achieve higher estimation accuracy while using less training data and fewer training epochs. The experimental results show that the proposed method outperforms state-of-the-art single-image depth estimation methods.
APA, Harvard, Vancouver, ISO, and other styles
43

Chiu, Mian-Jhong, and 邱勉中. "Real-time Monocular Depth Estimation with Extremely Light-weight Neural Network." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/4p2ddx.

Full text
Abstract:
Master's thesis, National Chiao Tung University, Institute of Multimedia Engineering, ROC academic year 107.
Obstacle avoidance and environment sensing are crucial applications in autonomous driving and robotics. Among all types of sensors, the camera is widely used in these applications because it offers rich visual content at relatively low cost. Thus, using images from a single camera to perform depth estimation has become one of the main focuses of recent research. However, prior works usually rely on highly complicated computation and power-consuming equipment to achieve such a task; therefore, this thesis focuses on developing a real-time, light-weight system for depth prediction. Based on the well-known encoder-decoder architecture, we propose a supervised learning-based CNN with detachable decoders that outputs predicted depth maps at multiple resolutions. We also formulate a novel multi-task loss function for each decoder block, which considers both the depth map and semantic segmentation simultaneously to encourage model convergence and speed up training. To train our model on the KITTI dataset, we generate depth maps and semantic segmentations via PSMNet and DeepLabV3, respectively, as ground truth, and test various pre-processing methods. We also collect a synthetic dataset in AirSim with a wide range of camera views to evaluate the robustness of the proposed depth estimation approach. A series of ablation studies and experiments validates that our model efficiently performs real-time depth prediction with few parameters and fairly low computation cost, with the best trained model outperforming previous works on the KITTI dataset across various evaluation metrics. Trained and tested on our AirSim dataset, our model is also shown to handle images captured from quite different camera poses and altitudes.
APA, Harvard, Vancouver, ISO, and other styles
44

Torralba, Antonio, and Aude Oliva. "Global Depth Perception from Familiar Scene Structure." 2001. http://hdl.handle.net/1721.1/7267.

Full text
Abstract:
In the absence of cues for absolute depth measurement, such as binocular disparity, motion, or defocus, the absolute distance between the observer and a scene cannot be measured. The interpretation of shading, edges and junctions may provide a 3D model of the scene, but it will not inform about the actual "size" of the space. One possible source of information for absolute depth estimation is the image size of known objects. However, this is computationally complex due to the difficulty of the object recognition process. Here we propose a source of information for absolute depth estimation that does not rely on specific objects: we introduce a procedure for absolute depth estimation based on the recognition of the whole scene. The shape of the space of the scene and the structures present in the scene are strongly related to the scale of observation. We demonstrate that, by recognizing the properties of the structures present in the image, we can infer the scale of the scene, and therefore its absolute mean depth. We illustrate the value of computing the mean depth of the scene with applications to scene recognition and object detection.
APA, Harvard, Vancouver, ISO, and other styles
45

Huang, Po-Yu, and 黃柏諭. "Supervised Monocular Depth Estimation Using Deep Neural Network in Robotic Operating System." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/3ztypn.

Full text
Abstract:
Master's thesis, National Chiao Tung University, Institute of Electronics, ROC academic year 107.
Recently, unmanned aerial vehicles (UAVs) have played an important role not only in military use but also in commercial applications such as damage assessment, environmental monitoring and pesticide spraying. More accurate and reliable technological capabilities such as autonomous flight, obstacle avoidance, battery performance, and localization are required. Deep neural networks (DNNs) are driving huge improvements in many artificial intelligence (AI) tasks such as image classification, object detection, and image segmentation, which makes UAVs an important commercial AI technology with potential applications in rescue, transportation and monitoring services. However, UAVs are powered by batteries, which limits flight time and payload capacity. When developing deep learning algorithms for inference on UAVs, platform resources should be considered to achieve a better accuracy-versus-latency trade-off. To fly a drone automatically, we develop a monocular depth estimation algorithm based on a deep neural network that takes an RGB image and predicts the corresponding depth image. The depth image is further transformed into a point cloud and an occupancy grid map in the Robot Operating System (ROS), so the drone is equipped with knowledge of its surrounding environment. This information is critical to obstacle avoidance and path planning algorithms. After reducing the model complexity and compiling it with the open-source compiler TVM, the proposed depth estimation algorithm has been deployed to the energy-efficient embedded system Nvidia Tegra X1 (TX1); taking advantage of a fast and powerful deep neural network with only 6.1 M parameters, it reaches 14 FPS for depth estimation.
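The depth-to-point-cloud and occupancy-grid conversion described in this abstract can be sketched as below. The intrinsics, grid resolution and function names are assumptions; a real ROS node would publish these results as sensor_msgs/PointCloud2 and nav_msgs/OccupancyGrid messages.

```python
# A hedged sketch: depth image -> point cloud -> coarse 2D occupancy grid.
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a (H, W) depth image with a pinhole camera model."""
    H, W = depth.shape
    ys, xs = np.mgrid[0:H, 0:W]
    z = depth
    x = (xs - cx) * z / fx
    y = (ys - cy) * z / fy
    return np.stack([x, y, z], -1).reshape(-1, 3)

def occupancy_grid(points, cell=0.1, size=50):
    """Project points onto the ground plane into a size x size grid."""
    grid = np.zeros((size, size), dtype=np.uint8)
    ix = (points[:, 0] / cell + size // 2).astype(int)   # lateral bins
    iz = (points[:, 2] / cell).astype(int)               # forward bins
    ok = (ix >= 0) & (ix < size) & (iz >= 0) & (iz < size)
    grid[iz[ok], ix[ok]] = 1                             # 1 = occupied
    return grid

depth = np.full((120, 160), 3.0)   # toy 3 m wall in front of the camera
pts = depth_to_points(depth, fx=100, fy=100, cx=80, cy=60)
print(occupancy_grid(pts).sum())   # number of occupied cells
```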
APA, Harvard, Vancouver, ISO, and other styles
46

Fan, Chen-shuo, and 范辰碩. "Monocular Vision Based Depth Map Extraction Method for 2D to 3D Video Conversion." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/16383026759959088781.

Full text
Abstract:
Master's thesis, National Central University, Department of Electrical Engineering, ROC academic year 101.
Two semi-automatic depth map extraction methods for stereo video conversion are presented in this thesis. Due to the demand for 3D visualization and the lack of 3D video content, low-cost and efficient post-processing methods for converting 2D to 3D video must be developed if everyone is to enjoy vivid 3D video. For video sequences with a static background, we propose a method that combines foreground segmentation with the vanishing-line monocular depth cue. Based on the foreground/background separation produced by the foreground segmentation algorithm, the viewer uses acquired visual experience to assign some background depth information at the initialization step; the foreground then obtains relative depth information from the background depth map. In our experiments, this algorithm runs at 0.17 s/frame on CIF-size video with 3D visualization quality close to other reference methods. Moreover, we propose another conversion method following the same concept for video sequences with a dynamic background, in which foreground segmentation is replaced by relative velocity estimation based on motion estimation and motion compensation. Although this method does not reach the quality of the foreground segmentation method, it has wide utility and runs at 0.15 s/frame on CIF-size video.
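The vanishing-line background depth cue can be illustrated with a simple row-wise ramp. This sketch is a simplification (the thesis assigns background depth with viewer guidance), and the names and 8-bit depth convention are assumptions.

```python
# A hedged sketch of background depth from a vanishing line: rows at/above
# the vanishing line get the farthest value, ramping to 'near' at the bottom.
import numpy as np

def background_depth(H, W, v_line, near=255, far=0):
    """Return an (H, W) 8-bit depth map for a frame of height H, width W."""
    rows = np.arange(H, dtype=float)
    ramp = np.clip((rows - v_line) / max(H - 1 - v_line, 1), 0, 1)
    depth_col = far + (near - far) * ramp
    return np.tile(depth_col[:, None], (1, W)).astype(np.uint8)

# Rows 0 and 100 lie at/above the vanishing line (far); row 239 is nearest.
print(background_depth(240, 352, v_line=100)[[0, 100, 239], 0])  # [0 0 255]
```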
APA, Harvard, Vancouver, ISO, and other styles
47

Chen, Ting-Wei, and 陳庭瑋. "Image Depth Initialization and Fuzzy Data Association for Aerial Robot Monocular Visual Localization and Mapping." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/07348162565621224367.

Full text
Abstract:
Master's thesis, Tamkang University, Department of Mechanical and Electro-Mechanical Engineering, ROC academic year 102.
This study investigates visual-sensor-assisted aerial robot navigation. The major objectives are to provide the aerial robot with localization and mapping capabilities in global positioning system (GPS)-denied environments. When the aerial robot navigates in a GPS-denied environment, the visual sensor provides the measurements for robot state estimation and environmental mapping. Considering the carrying capacity of the aerial robot, a single camera is used in this study, and the image is transmitted to a PC-based controller for image processing using a radio frequency module. The extended Kalman filter is used as the state estimator to recursively predict and update the states of the aerial robot and the environment landmarks. For the monocular vision sensor, the image depth is represented using the inverse depth parameterization method, and image feature initialization is achieved by a non-delayed procedure. The results of this study are twofold. First, an ultrasonic sensor is used to provide one-dimensional distance measurements and solve the image depth estimation problem of monocular vision. Second, a novel data association procedure is designed based on a fuzzy system in order to improve the performance of map management. The software program of the robot navigation system is developed on a PC-based controller using Microsoft Visual Studio C++. The navigation system integrates the sensor inputs, image processing, and state estimation, and the resultant system is used to perform simultaneous localization and mapping tasks for aerial robots.
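The inverse depth parameterization mentioned in this abstract stores a landmark as an anchor position, two viewing angles and an inverse depth, which a sketch like the following converts to a Euclidean point. The names follow the common formulation (Civera et al.) rather than the thesis code, and the seeded range is illustrative.

```python
# A hedged sketch of inverse-depth landmark conversion (not the thesis code).
import numpy as np

def landmark_to_point(x0, y0, z0, theta, phi, rho):
    """theta: azimuth, phi: elevation, rho = 1/depth along the view ray."""
    m = np.array([np.cos(phi) * np.sin(theta),   # unit ray direction
                  -np.sin(phi),
                  np.cos(phi) * np.cos(theta)])
    return np.array([x0, y0, z0]) + m / rho

# A feature first seen from the origin, straight ahead, 4 m away
# (e.g. a range seeded by the ultrasonic sensor):
print(landmark_to_point(0, 0, 0, theta=0.0, phi=0.0, rho=0.25))  # -> [0 0 4]
```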
APA, Harvard, Vancouver, ISO, and other styles
48

Ju-PengHuang and 黃如鵬. "Realization of Depth Estimation from Monocular Camera Based on Defocus Algorithm and Reverse Heat Equation." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/rtgkgk.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Zhuo, Wei. "2D+3D Indoor Scene Understanding from a Single Monocular Image." Phd thesis, 2018. http://hdl.handle.net/1885/144616.

Full text
Abstract:
Scene understanding, as a broad field encompassing many subtopics, has gained great interest in recent years. Among these subtopics, indoor scene understanding, having its own specific attributes and challenges compared to outdoor scene understanding, has drawn a lot of attention. It has potential applications in a wide variety of domains, such as robotic navigation, object grasping for personal robotics, augmented reality, etc. To our knowledge, existing research for indoor scenes typically makes use of depth sensors, such as Kinect, which are however not always available. In this thesis, we focus on addressing indoor scene understanding tasks in the general case, where only a monocular color image of the scene is available. Specifically, we first study the problem of estimating a detailed depth map from a monocular image. Then, benefiting from deep-learning-based depth estimation, we tackle the higher-level tasks of 3D box proposal generation, and scene parsing with instance segmentation, semantic labeling and support relationship inference from a monocular image. Our research on indoor scene understanding provides a comprehensive scene interpretation at various perspectives and scales. For monocular image depth estimation, previous approaches are limited in that they only reason about depth locally on a single scale, and do not utilize the important information of geometric scene structures. Here, we develop a novel graphical model which reasons about detailed depth while leveraging geometric scene structures at multiple scales. For 3D box proposals, to the best of our knowledge, our approach constitutes the first attempt to reason about class-independent 3D box proposals from a single monocular image. To this end, we develop a novel integrated, differentiable framework that estimates depth, extracts a volumetric scene representation and generates 3D proposals. At the core of this framework lies a novel residual, differentiable truncated signed distance function module, which is able to handle the relatively low accuracy of the predicted depth map. For scene parsing, we tackle its three subtasks of instance segmentation, semantic labeling, and support relationship inference on instances. Existing work typically reasons about these individual subtasks independently. Here, we leverage the fact that they bear strong connections, which can facilitate addressing these subtasks if modeled properly. To this end, we develop an integrated graphical model that reasons about the mutual relationships of the above subtasks. In summary, in this thesis, we introduce novel and effective methodologies for each of three indoor scene understanding tasks, i.e., depth estimation, 3D box proposal generation, and scene parsing, and exploit the dependencies on depth estimates of the latter two tasks. Evaluation on several benchmark datasets demonstrates the effectiveness of our algorithms and the benefits of utilizing depth estimates for higher-level tasks.
APA, Harvard, Vancouver, ISO, and other styles
50

Huang, Jian-hao, and 黃建豪. "Dense Piecewise Planar Reconstruction based on Low Gradient Region Depth Estimation from a Monocular Image Sequence." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/ujhn5v.

Full text
Abstract:
Master's thesis, National Taiwan University, Graduate Institute of Electrical Engineering, ROC academic year 106.
Visual robot navigation has been a popular and challenging research topic in recent years. An important part of navigation is environment sensing, especially in previously unknown and GPS-denied environments. This thesis uses a monocular camera to obtain image data and estimates the depth map in each keyframe with LSD SLAM [11: Engel et al. 2014]. The RGB image and depth map of each keyframe are used to detect low-texture regions with a region-growing segmentation method, under the assumption that image areas with low photometric gradients are mostly planar, which holds in most indoor and man-made scenes. This thesis proposes a depth filling method to optimize depth map completeness in each keyframe, providing the robot more environmental information for navigation. For the unknown-scale problem of monocular vision, a marker placed in the scene is used to compute the scale; the estimated scale is then used to define thresholds that filter out unreasonable plane estimates in the depth filling process. This thesis compares the depth filling method against several alternatives using Gazebo simulation [35: Gazebo from OSRF, Inc], the public TUM dataset [23: Sturm et al. 2012], and experiments with a Microsoft Kinect sensor. The comparison demonstrates that our depth filling method for piecewise planar monocular SLAM produces denser maps than LSD SLAM [11: Engel et al. 2014] and DPPTAM [12: Concha & Civera 2015].
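The low-gradient region growing step described here can be sketched as follows; the 4-connectivity, the gradient threshold and the function names are assumptions, not the thesis implementation.

```python
# A hedged sketch of region growing over low-gradient (candidate planar)
# pixels, starting from a seed and expanding with 4-connectivity.
import numpy as np
from collections import deque

def grow_low_gradient_region(gray, seed, thresh=2.0):
    gy, gx = np.gradient(gray.astype(float))
    low = np.hypot(gx, gy) < thresh          # candidate planar pixels
    H, W = gray.shape
    region = np.zeros((H, W), dtype=bool)
    q = deque([seed])
    while q:
        r, c = q.popleft()
        if 0 <= r < H and 0 <= c < W and low[r, c] and not region[r, c]:
            region[r, c] = True
            q.extend([(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)])
    return region

img = np.zeros((20, 20)); img[:, 10:] = 100.0   # flat half plus a step edge
print(grow_low_gradient_region(img, seed=(5, 2)).sum())  # -> 180 pixels
```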
APA, Harvard, Vancouver, ISO, and other styles