Dissertations / Theses on the topic 'Monocular depth'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 50 dissertations / theses for your research on the topic 'Monocular depth.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Andraghetti, Lorenzo. "Monocular Depth Estimation enhancement by depth from SLAM Keypoints." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16626/.
Full text
Pinheiro de Carvalho, Marcela. "Deep Depth from Defocus: Neural Networks for Monocular Depth Estimation." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS609.
Full text
Depth estimation from a single image is a key instrument for several applications, from robotics to virtual reality. Successful deep learning approaches in computer vision tasks such as object recognition and classification have also benefited the domain of depth estimation. In this thesis, we develop methods for monocular depth estimation with deep neural networks by exploring different cues: defocus blur and semantics. We conduct several experiments to understand the contribution of each cue in terms of generalization and model performance. First, we propose an efficient convolutional neural network for depth estimation along with a conditional generative adversarial framework. Our method achieves performance among the best on standard datasets for depth estimation. Then, we explore defocus blur cues, optical information deeply related to depth. We show that deep models are able to implicitly learn and use this information to improve performance and overcome known limitations of classical depth-from-defocus. We also build a new dataset with real focused and defocused images, which we use to validate our approach. Finally, we explore the use of semantic information, which brings rich contextual information when learned jointly with depth in a multi-task approach. We validate our approaches on several datasets containing indoor, outdoor and aerial images.
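The defocus-blur cue exploited in this thesis follows from thin-lens geometry: the diameter of the blur circle grows as an object moves away from the focal plane. A minimal sketch of that relation (the lens parameters in the usage below are illustrative values, not taken from the thesis):

```python
def circle_of_confusion(depth, focus_dist, focal_len, f_number):
    """Blur-circle diameter for a thin lens (all lengths in metres):
    an object at `depth` when the lens is focused at `focus_dist`."""
    aperture = focal_len / f_number  # aperture diameter
    return (aperture * abs(depth - focus_dist) / depth
            * focal_len / (focus_dist - focal_len))
```

In-focus objects map to zero blur (e.g. `circle_of_confusion(2.0, 2.0, 0.05, 2.0)` is 0), and blur grows monotonically away from the focal plane, which is exactly the signal a depth-from-defocus network can learn to invert.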
Cheda, Diego. "Monocular Depth Cues in Computer Vision Applications." Doctoral thesis, Universitat Autònoma de Barcelona, 2012. http://hdl.handle.net/10803/121644.
Full text
Depth perception is a key aspect of human vision. It is a routine and essential visual task that humans perform effortlessly in many daily activities. It has often been associated with stereo vision, but humans have an amazing ability to perceive depth relations even from a single image by using several monocular cues. In the computer vision field, if image depth information were available, many tasks could be posed from a different perspective for the sake of higher performance and robustness. Nevertheless, given a single image, this possibility is usually discarded, since obtaining depth information has frequently been performed by three-dimensional reconstruction techniques, requiring two or more images of the same scene taken from different viewpoints. Recently, some proposals have shown the feasibility of computing depth information from single images. In essence, the idea is to take advantage of a priori knowledge of the acquisition conditions and the observed scene to estimate depth from monocular pictorial cues. These approaches try to estimate the scene depth map precisely by employing computationally demanding techniques. However, to assist many computer vision algorithms, it is not really necessary to compute a costly and detailed depth map of the image. Indeed, just a rough depth description can be very valuable in many problems. In this thesis, we have demonstrated how coarse depth information can be integrated in different tasks, following holistic and alternative strategies, to obtain more precise and robust results. In that sense, we have proposed a simple but reliable technique whereby image scene regions are categorized into discrete depth ranges to build a coarse depth map. Based on this representation, we have explored the potential usefulness of our method in three application domains from novel viewpoints: camera rotation parameter estimation, background estimation and pedestrian candidate generation.
In the first case, we have computed the rotation of a camera mounted on a moving vehicle using two novel methods that identify distant elements in the image, where the translation component of the image flow field is negligible. In background estimation, we have proposed a novel method to reconstruct the background by penalizing close regions in a cost function that integrates color, motion, and depth terms. Finally, we have benefited from the geometric and depth information available in single images for pedestrian candidate generation, significantly reducing the number of generated windows to be further processed by a pedestrian classifier. In all cases, results have shown that our depth-based approaches contribute to better performance.
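The coarse depth representation this thesis relies on (image regions categorized into discrete depth ranges rather than a dense metric map) amounts to quantizing depth against a set of range boundaries. A minimal sketch, with hypothetical near/medium/far boundaries that are not the thesis's actual values:

```python
import numpy as np

def coarse_depth_map(depth, range_edges):
    """Quantize per-pixel depth values into discrete depth-range labels:
    0 = nearer than range_edges[0], len(range_edges) = beyond the last edge."""
    return np.digitize(depth, range_edges)
```

With edges `[5.0, 20.0]` metres, depths of 0.5, 10 and 50 m map to labels 0 (near), 1 (medium) and 2 (far), the kind of rough map the thesis feeds to its rotation-estimation, background-estimation and pedestrian-candidate tasks.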
Toschi, Marco. "Towards Monocular Depth Estimation for Robot Guidance." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Find full text
Rovinelli, Marco. "Realtime Monocular Depth Estimation on Mobile Phones." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24159/.
Full text
Rivero Pindado, Víctor. "Monocular visual SLAM based on Inverse depth parametrization." Thesis, Mälardalen University, School of Innovation, Design and Engineering, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-10166.
Full text
The first objective of this research has always been to carry out a study of visual SLAM (simultaneous localization and mapping) techniques, specifically the monocular variant, which is less studied than the stereo one. These techniques have been well studied in the world of robotics. They focus on reconstructing a map of the robot's environment while maintaining the robot's position information within that map. We chose to investigate a method that encodes points by the inverse of their depth, from the first time the feature was observed. This method permits an efficient and accurate representation of uncertainty during undelayed initialization and beyond, all within the standard extended Kalman filter (EKF). At first, the study was to be consolidated by developing an application implementing this method. After various difficulties, it was decided to make use of a MATLAB platform developed by the author of the SLAM method in question. By then, the tasks of calibration, feature extraction and matching had been developed. From that point, the application was adapted to the characteristics of our camera and our video. We recorded a video with our camera following a known trajectory to check the computed path shown in the application, corroborating the work and studying the limitations and advantages of this method.
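The inverse-depth parametrization studied in this thesis encodes a feature by the camera position at first observation, a viewing direction, and the inverse of its depth. A minimal sketch of recovering the Euclidean point, following the directional convention of Civera et al. (which this line of work builds on); the exact convention in the thesis may differ:

```python
import math

def inverse_depth_to_point(x0, y0, z0, theta, phi, rho):
    """Euclidean 3D point from an inverse-depth feature:
    (x0, y0, z0) = camera position when first observed,
    theta/phi = azimuth/elevation of the observation ray, rho = 1/depth."""
    m = (math.cos(phi) * math.sin(theta),  # unit ray direction
         -math.sin(phi),
         math.cos(phi) * math.cos(theta))
    d = 1.0 / rho                          # depth along the ray
    return (x0 + d * m[0], y0 + d * m[1], z0 + d * m[2])
```

Features near infinity have rho close to zero, which is precisely where this parametrization keeps the EKF well-behaved: uncertainty stays nearly Gaussian in rho even when the depth itself is huge or unbounded.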
Chan, Kevin S. (Kevin Sao Wei). "Multiview monocular depth estimation using unsupervised learning methods." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/119753.
Full text
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 50-51).
Existing learned methods for monocular depth estimation use only a single view of the scene for depth evaluation, so they inherently overfit to their training scenes and cannot generalize well to new datasets. This thesis presents a neural network for multiview monocular depth estimation. Teaching a network to estimate depth via structure from motion allows it to generalize better to new environments with unfamiliar objects. This thesis extends recent work on unsupervised methods for single-view monocular depth estimation and uses the reconstruction losses for training posed in those works. Models and baseline models were evaluated on a variety of datasets, and the results indicate that multiview models generalize across datasets better than previous work. This work is unique in that it emphasizes cross-domain performance and the ability to generalize more so than performance on the training set.
by Kevin S. Chan.
M. Eng.
Larsson, Susanna. "Monocular Depth Estimation Using Deep Convolutional Neural Networks." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159981.
Full text
Möckelind, Christoffer. "Improving deep monocular depth predictions using dense narrow field of view depth images." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235660.
Full text
In this work we study a depth-estimation problem in which a narrow field-of-view depth image and a wide field-of-view RGB image are provided to a deep network tasked with predicting depth for the entire RGB image. We show that providing the depth image to the network improves results for the area outside the provided depth, compared to an existing method that uses only an RGB image to predict depth. We investigate several architectures and depth-image field-of-view sizes, and study the effect of adding noise and lowering the resolution of the depth image. We show that a larger field of view for the depth image yields a larger advantage, and that the model's accuracy decreases with distance from the provided depth. Our results also show that models using the noisy, low-resolution depth performed on par with models using the unmodified depth.
Pilzer, Andrea. "Learning Unsupervised Depth Estimation, from Stereo to Monocular Images." Doctoral thesis, Università degli studi di Trento, 2020. http://hdl.handle.net/11572/268252.
Full text
Nassir, Cesar. "Domain-Independent Moving Object Depth Estimation using Monocular Camera." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233519.
Full text
Today, car companies all over the world strive to create vehicles with fully autonomous capabilities. There are many benefits to developing autonomous vehicles, such as reduced traffic congestion, increased safety and reduced pollution. To reach that goal there are many challenges ahead, one of them being visual perception. Being able to estimate depth from a 2D image has been shown to be a key component for 3D recognition, reconstruction and segmentation. Estimating depth in an image from a monocular camera is a hard problem, since there is ambiguity in the mapping between colour intensity and depth value. Depth estimation from stereo images has come a long way compared to monocular depth estimation and was originally the method relied upon. However, being able to exploit monocular images is necessary in scenarios where stereo depth estimation is not possible. We present a new network, BiNet, inspired by ENet, to tackle depth estimation of moving objects in real time using only a monocular camera. It performs better than ENet on the Cityscapes dataset, at only a small added cost in complexity.
Palou Visa, Guillem. "Monocular depth estimation in images and sequences using occlusion cues." Doctoral thesis, Universitat Politècnica de Catalunya, 2014. http://hdl.handle.net/10803/144653.
Full text
When humans observe a scene, they are able to distinguish perfectly the parts that compose it and to organize them spatially so as to orient themselves. The mechanisms governing visual perception have been studied since the beginnings of neuroscience, but not all the biological processes involved are yet known. In normal situations, humans can use three tools to estimate the structure of the scene. The first is so-called divergence. It takes advantage of two points of view (the two eyes) and can determine very accurately the position of objects in front of the observer at distances of up to a hundred metres. As distance increases, or when objects fall outside the field of view of both eyes, other mechanisms must be used. Both prior experience and certain visual cues are used in these cases and, although their precision is lower, humans almost always manage to interpret their surroundings correctly. The best-known and most widely used visual cues that provide depth information are, for example, perspective, occlusions and the size of certain objects. Prior experience makes it possible to resolve previously seen situations, such as knowing which regions correspond to the ground, the sky or objects. In recent years, as technology has allowed it, attempts have been made to design systems that automatically interpret different types of scene. This thesis addresses depth estimation using only one point of view and occlusion-based visual cues. The goal of the work is to detect these cues and combine them with a segmentation system in order to automatically generate the different depth planes present in a scene. The thesis explores both static situations (still images) and dynamic ones, such as frames within video sequences or complete sequences.
In the case of complete sequences, an automatic system is also proposed to reconstruct the structure of the scene using only motion information. The results of the work are promising and competitive with the current literature, but they still show that computer vision has a large margin for improvement with respect to human precision.
何漢達 and Hon-tat Ho. "An integrated approach to depth estimation using a monocular image sequence." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1996. http://hub.hku.hk/bib/B31213108.
Full text
Ho, Hon-tat. "An integrated approach to depth estimation using a monocular image sequence /." Hong Kong : University of Hong Kong, 1996. http://sunzi.lib.hku.hk/hkuto/record.jsp?B17592045.
Full text
Schennings, Jacob. "Deep Convolutional Neural Networks for Real-Time Single Frame Monocular Depth Estimation." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-336923.
Full text
Pinard, Clément. "Robust Learning of a depth map for obstacle avoidance with a monocular stabilized flying camera." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLY003/document.
Full text
Consumer unmanned aerial vehicles (UAVs) are mainly flying cameras. They democratized aerial footage, but with their success came security concerns. This work aims at improving UAV safety with obstacle avoidance while keeping a smooth flight. In this context, we use only one stabilized camera, because of weight and cost incentives. For their robustness in computer vision and their capacity to solve complex tasks, we chose to use convolutional neural networks (CNNs). Our strategy is based on incrementally learning tasks of increasing complexity, the first step of which is to construct a depth map from the stabilized camera. This thesis focuses on studying the ability of CNNs to train for this task. In the case of stabilized footage, the depth map is closely linked to optical flow. We thus adapt FlowNet, a CNN known for optical flow, to output depth directly from two stabilized frames. This network is called DepthNet. The experiment succeeded with synthetic footage, but the result is not robust enough to be used directly on real videos. Consequently, we consider self-supervised training with real videos, based on differentiably reprojecting images. This training method for CNNs being rather novel in the literature, a thorough study is needed in order not to depend too much on heuristics. Finally, we developed a depth fusion algorithm to use DepthNet efficiently on real videos: multiple frame pairs are fed to DepthNet to obtain a wide depth-sensing range.
Bartoli, Simone. "Deploying deep learning for 3D reconstruction from monocular video sequences." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22402/.
Full text
Gampher, John Eric. "Perception of motion-in-depth induced motion effects on monocular and binocular cues /." Birmingham, Ala. : University of Alabama at Birmingham, 2008. https://www.mhsl.uab.edu/dt/2009r/gampher.pdf.
Full text
Title from PDF title page (viewed Mar. 30, 2010). Additional advisors: Franklin R. Amthor, James E. Cox, Timothy J. Gawne, Rosalyn E. Weller. Includes bibliographical references (p. 104-114).
Dey, Rohit. "MonoDepth-vSLAM: A Visual EKF-SLAM using Optical Flow and Monocular Depth Estimation." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627666226301079.
Full text
Ye, Mao. "Monocular Pose Estimation and Shape Reconstruction of Quasi-Articulated Objects with Consumer Depth Camera." UKnowledge, 2014. http://uknowledge.uky.edu/cs_etds/25.
Full text
Djikic, Addi. "Segmentation and Depth Estimation of Urban Road Using Monocular Camera and Convolutional Neural Networks." Thesis, KTH, Robotik, perception och lärande, RPL, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235496.
Full text
Deep learning for safe autonomous transport systems is becoming increasingly prominent in research and development. Fast and robust perception of the environment will be crucial for the future navigation of autonomous vehicles in urban areas with dense traffic. In this thesis we derive a new neural network that we call AutoNet. The network is designed as an autoencoder for pixel-wise depth estimation of the drivable free road surface in urban areas, using only a monocular camera and its images; the depth estimation is handled as a regression problem. AutoNet is also constructed as a classification network that classifies and segments the drivable road surface in real time with monocular vision, handled as a supervised classification problem, which proves to be a simpler and more robust solution for finding road surface in urban areas. We also implement one of the leading neural networks, ENet, for comparison. ENet is designed for fast real-time semantic segmentation with high prediction speed. The evaluation shows that AutoNet outperforms ENet on every accuracy metric, but is slower in terms of frames per second. Various optimizations are proposed as future work for increasing the model's frame rate while retaining robustness. All training and evaluation is done on the Cityscapes dataset. New data for training and evaluating the road depth estimation is created with a new approach that combines precomputed depth maps with semantic road labels. Data collection with a Scania vehicle fitted with a monocular camera is also performed to test the final derived model. The proposed network AutoNet proves to be a promising top-performing model for road depth estimation and road classification in urban areas.
Paerhati, Paruku. "Real-time monocular depth mapping system using variance of focal plane and pixel focus measure." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/113117.
Full text
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (page 48).
Vision is one of the most powerful senses available to creatures. Undoubtedly, many of the fundamental operations of humans, such as the ability to plan paths, avoid obstacles, and recognize objects, depend heavily on their visual perception of the world around them. Although humans have naturally evolved to efficiently use their stereo optical prowess to develop an understanding of their environment, artificial machines and systems in comparison have just begun to utilize computer vision to create awareness of local physical entities. One of the most important sensory skills creatures have is depth perception, which allows them to estimate the relative distance of objects in their vision from many visual cues. Many systems have been developed to aid machines in perceiving the depth map of their environment, and each system has its drawbacks and benefits. In this paper, we introduce the design and implementation of a new system which provides a depth map from the use of a single optical camera with focal plane variation in the images taken. The paper focuses on the methods used to scale the depth from focus algorithm to perform in real-time. The results also showcase a real-time depth mapping system capable of providing rich depth maps of scenes at a high framerate and with advanced noise filtration techniques.
by Paruku Paerhati.
M. Eng.
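The depth-from-focus idea in the Paerhati thesis above can be sketched compactly: sweep the focal plane, score per-pixel sharpness in each frame, and assign each pixel the depth of the focal plane where it is sharpest. A toy version using a squared-Laplacian focus measure (the thesis's actual measure, scaling and noise filtering are more sophisticated):

```python
import numpy as np

def depth_from_focus(stack, focal_depths):
    """stack: (n, H, W) images taken at the focal planes in focal_depths.
    Each pixel gets the depth of the frame where its local contrast
    (squared Laplacian response, wrap-around borders) is maximal."""
    focus = []
    for img in stack:
        lap = (np.roll(img, 1, 0) + np.roll(img, -1, 0) +
               np.roll(img, 1, 1) + np.roll(img, -1, 1) - 4 * img)
        focus.append(lap ** 2)
    best = np.argmax(np.stack(focus), axis=0)  # (H, W) frame indices
    return np.asarray(focal_depths)[best]
```

A pixel that is sharp (high local contrast) in the frame focused at 1 m and flat in the frame focused at 2 m is assigned a depth of 1 m; real-time operation then reduces to computing the focus measure fast enough over the whole stack.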
Tucker, Andrew James. "Visual space attention in three-dimensional space." Swinburne University of Technology, 2006. http://adt.lib.swin.edu.au./public/adt-VSWT20070301.085637.
Full text
Jungåker, Jonas. "Monocular depth estimation for level assessment in an industrial waste management environment: A thesis within smart waste management." Thesis, KTH, Skolan för industriell teknik och management (ITM), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-303107.
Full text
With the technological transformation to Industry 4.0, leading actors in many industries face challenges such as how to implement technical solutions while maintaining competitiveness. In the field of smart waste management, many technical solutions that efficiently measure bin fill levels have been presented, but a practical way to compare these solutions is lacking. Drawing on research within the Industrial Internet of Things (IIoT) and interviews with operators at Scania, we have produced a concise and concrete way to compare these solutions with respect to operational efficiency. Alongside this, we have also developed a depth estimation model that uses deep convolutional neural networks to measure the fill volume of waste bins. Our research shows that this depth estimation network is a feasible alternative to other sensors. We then compare this system against ultrasonic sensors and find that the ultrasonic sensors outperform the depth estimation model on several of the central metrics. Despite this, we conclude that our method of measuring bin fill volume with depth estimation can be used together with object recognition in more complex applications, avoiding the use of simpler sensors such as ultrasound.
Ekström, Marcus. "Road Surface Preview Estimation Using a Monocular Camera." Thesis, Linköpings universitet, Datorseende, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-151873.
Full text
Diskin, Yakov. "Dense 3D Point Cloud Representation of a Scene Using Uncalibrated Monocular Vision." University of Dayton / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1366386933.
Full text
Cavalcanti, Ugo Leone. "Miglioramento tramite reti monoculari di mappe di disparità ottenute da reti stereo [Improving disparity maps obtained from stereo networks by means of monocular networks]." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.
Find full text
Ali, Shahnewaz. "Robotic vision for knee arthroscopy." Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/235890/1/Shahnewaz%2BAli%2BThesis%282%29.pdf.
Full text
Banach, Artur. "Visual navigation in minimally invasive surgery." Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/228730/1/Artur_Banach_Thesis.pdf.
Full text
Moukari, Michel. "Estimation de profondeur à partir d'images monoculaires par apprentissage profond [Depth estimation from monocular images by deep learning]." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC211/document.
Full text
Computer vision is a branch of artificial intelligence whose purpose is to enable a machine to analyze, process and understand the content of digital images. Scene understanding in particular is a major issue in computer vision. It involves a semantic and structural characterization of the image, on the one hand to describe its content and, on the other, to understand its geometry. However, while real space is three-dimensional, the image representing it is two-dimensional. Part of the 3D information is thus lost during the process of image formation, and it is therefore non-trivial to describe the geometry of a scene from 2D images of it. There are several ways to retrieve the depth information lost in the image. In this thesis we are interested in estimating a depth map given a single image of the scene. In this case, the depth information corresponds, for each pixel, to the distance between the camera and the object represented in that pixel. The automatic estimation of a distance map of the scene from an image is indeed a critical algorithmic building block in a very large number of domains, in particular that of autonomous vehicles (obstacle detection, navigation aids). Although estimating depth from a single image is a difficult and inherently ill-posed problem, we know that humans can judge distances with one eye. This capacity is not innate but acquired, made possible mostly by identifying cues that reflect prior knowledge of the surrounding objects. Moreover, we know that learning algorithms can extract these cues directly from images. We are particularly interested in statistical learning methods based on deep neural networks, which have recently led to major breakthroughs in many fields, and we study the case of monocular depth estimation.
Kaller, Ondřej. "Pokročilé metody snímání a hodnocení kvality 3D videa." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-369744.
Full text
Wu, Cheng-En, and 吳承恩. "Depth Estimation from Multiple Monocular Cues." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/25774947344479431961.
Full text國立臺灣大學
資訊網路與多媒體研究所
100
3-D display technology has been one of the most popular topics in recent years. In the process of generating a 3-D visual perspective, the depth map plays an important role in rebuilding the stereoscopic effect. So far, manually drawing the depth map has been the mainstream approach in the movie industry. However, it costs a great deal of money and time. Therefore, many automatic and semi-automatic depth estimation methods have been published in recent years. In this thesis, a three-phase, semi-automatic system is proposed. First, the input image/frames are analyzed to extract scene information. Then absolute depth estimation and relative depth estimation are employed to generate the depth map. The proposed system is applicable to both single images and image sequences. For sequence inputs, temporal coherence is enforced so that the depth maps are smooth and continuous across frames. The experimental results show that this method can estimate depth successfully. The effectiveness of the proposed system also opens a path toward automatic depth estimation: with improved segmentation algorithms, fully automatic depth map generation will become possible in the future.
Chang, Yu-Tzu, and 章祐慈. "Learning 3D Geometry for Monocular Depth Estimation." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/y5k7aw.
Full text國立交通大學
資訊科學與工程研究所
107
In this thesis, we propose a convolutional neural network (CNN) for monocular depth estimation. In previous depth estimation works, various approaches take only RGB images as input to reconstruct a dense depth map, which may limit the accuracy of the predicted depth values. In the proposed method, we take an RGB image and corresponding sparse depth information as input, and extract both multi-scale context features and multi-resolution spatial features to reconstruct a dense depth map. By utilizing the sparse depth information, we can significantly improve the accuracy of the predicted depth map. Moreover, we introduce the concept of multi-view learning to our network, computing the photometric consistency between reference and neighborhood views. This provides a geometry constraint and helps the network recover a more complete depth map. The proposed network can efficiently predict accurate, detailed depth maps from sparse depth information and geometry cues. In addition, we use the depth maps predicted by our method to demonstrate the network's ability on a 3D reconstruction task. The 3D point clouds can be reconstructed well even in areas without ground truth, such as textureless and reflective materials. In conclusion, the proposed network takes an RGB image and sparse depth information as input and learns a geometry constraint to predict the depth map, producing dense depth maps with accurate depth values and high visualization quality on a variety of datasets, including RGBD, SUN3D, MVS and ETH3D.
Yin, Wei. "3D Scene Reconstruction from A Monocular Image." Thesis, 2022. https://hdl.handle.net/2440/134585.
Full text
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2022
Su, Wei-Cheng, and 蘇偉誠. "Unsupervised Monocular Depth Estimation Using Spatial-Temporal Information." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/ns39f3.
Full text國立交通大學
電子研究所
107
Depth estimation is used in many applications in our daily life. First, the 3D information in a depth map assists many real applications, for example 3D reconstruction, human-machine interaction, virtual/augmented reality, and navigation. Second, many tasks can be simplified with the help of depth information. For instance, simply setting a threshold on RGB-D images can separate the human from the background in 3D human pose estimation, after which a random forest regresses the positions of the joints. Another example is simultaneous localization and mapping (SLAM): using an RGB-D sensor is much more reliable than simply using a monocular camera for visual SLAM, since the RGB-D sensor provides more information to support tracking and mapping. It may be said without fear of exaggeration that many tasks benefit from depth information. In this thesis, we focus on unsupervised monocular depth estimation. We refine the monodepth model and utilize atrous convolution to enlarge the receptive field, which increases the accuracy of our model. In addition, we propose a model trained with spatial-temporal information. With the help of learned pose transformations, performance on different datasets increases. Lastly, the temporal branch increases the inference speed.
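The claim above, that atrous (dilated) convolution enlarges the receptive field without adding parameters, can be checked with the standard receptive-field recurrence. A small sketch (the layer specs in the usage are illustrative, not the thesis's architecture):

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of stacked convolutions.
    Each layer is a (kernel_size, dilation, stride) tuple."""
    rf, jump = 1, 1
    for k, d, s in layers:
        rf += (k - 1) * d * jump  # dilation widens each layer's reach
        jump *= s                 # stride grows the step between outputs
    return rf
```

A single 3x3 convolution with dilation 2 sees as far (5 pixels) as two stacked ordinary 3x3 convolutions, with half the depth and parameters, which is why atrous convolutions are a cheap way to grow context in depth networks.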
Bian, Jiawang. "Self-supervised Learning of Monocular Depth from Video." Thesis, 2022. https://hdl.handle.net/2440/136692.
Full text
Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2022
Chen, L., W. Tang, Tao Ruan Wan, and N. W. John. "Self-supervised monocular image depth learning and confidence estimation." 2019. http://hdl.handle.net/10454/17908.
Full text
We present a novel self-supervised framework for monocular image depth learning and confidence estimation. Our framework reduces the amount of ground truth annotation data required for training convolutional neural networks (CNNs), which is often a challenging problem for the fast deployment of CNNs in many computer vision tasks. Our DepthNet adopts a novel, fully differentiable patch-based cost function through Zero-Mean Normalized Cross-Correlation (ZNCC), taking multi-scale patches as the matching and learning strategy. This approach greatly increases the accuracy and robustness of depth learning. The proposed patch-based cost function naturally provides a 0-to-1 confidence, which is then used to self-supervise the training of a parallel network for confidence map learning and estimation, exploiting the fact that ZNCC is a normalized measure of similarity that can be approximated as the confidence of the depth estimation. The confidence map learning and estimation therefore operate in a self-supervised manner, in a network parallel to the DepthNet. Evaluation on the KITTI depth prediction evaluation dataset and the Make3D dataset shows that our method outperforms state-of-the-art results.
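The ZNCC score at the heart of this framework is a mean- and scale-invariant patch similarity. A minimal sketch of the raw score (the paper's multi-scale patch sampling and its exact mapping of the score to a confidence are omitted):

```python
import numpy as np

def zncc(p, q, eps=1e-8):
    """Zero-Mean Normalized Cross-Correlation of two equally-sized patches.
    Returns a value in [-1, 1]; 1 means identical up to gain and offset."""
    p = p - p.mean()
    q = q - q.mean()
    return float((p * q).sum() / (np.linalg.norm(p) * np.linalg.norm(q) + eps))
```

Invariance to affine intensity changes (per-patch gain and offset) is what makes ZNCC robust to illumination differences between views, and its normalized range is what lets it double as a confidence signal.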
CHANG, PO-CHAO, and 張博詔. "Excluding non-matched patches to do unsupervised monocular depth estimation." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/3r3z6n.
Full text
"Monocular Depth Estimation with Edge-Based Constraints and Active Learning." Master's thesis, 2019. http://hdl.handle.net/2286/R.I.54881.
Full text
Dissertation/Thesis
Master's Thesis, Computer Engineering, 2019
Lin, Xinyi, and 林心怡. "Enhancing Unsupervised Monocular Depth Estimation via Fusing Layer-wised Features." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/qg72r4.
Full text
National Taiwan University
Graduate Institute of Networking and Multimedia
107
Recently, deep methods have shown good performance in depth estimation and visual odometry from monocular video sequences by optimizing the photometric consistency between frames. However, it remains hard to obtain large-scale ground-truth depth maps for supervising a neural network for depth estimation. Meanwhile, existing solutions for depth estimation typically produce low-resolution results. Inspired by recent deep learning methods for semantic segmentation, we present a simple but effective unsupervised deep network for more accurate depth estimation and camera motion estimation. An atrous spatial pyramid pooling module and an additional refinement layer are combined with an encoder-decoder base model. Besides, we introduce a consistency-regularization loss to increase robustness to illumination changes. Our approach produces high-resolution depth maps with sharper object boundaries and achieves better results on the KITTI benchmark.
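The receptive-field enlargement provided by atrous (dilated) convolution, the building block of the ASPP module mentioned above, can be illustrated in 1-D; this sketch is generic and not taken from the thesis:

```python
import numpy as np

def atrous_conv1d(signal, kernel, rate):
    """1-D atrous (dilated) convolution: kernel taps are spaced `rate`
    samples apart, so the receptive field grows to (k-1)*rate + 1
    samples without adding any parameters."""
    k = len(kernel)
    span = (k - 1) * rate + 1          # effective receptive field
    out = np.array([
        sum(kernel[j] * signal[i + j * rate] for j in range(k))
        for i in range(len(signal) - span + 1)
    ])
    return out, span
```

With `rate=2`, a 3-tap kernel sees a 5-sample window; an ASPP module applies several such rates in parallel and concatenates the results.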
KE, MIN-HUNG, and 柯旻宏. "Monocular Depth Estimation and Collision Avoidance on a Multirotor Drone." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/7z7n9k.
Full text
National United University
Master's Program, Department of Electrical Engineering
107
The thesis presents a multirotor drone capable of autonomous flight and obstacle avoidance in an outdoor environment. The system consists of a Pixhawk, a Raspberry Pi, and a notebook, and it uses deep learning, image processing, and control strategies to achieve outdoor autonomous flight. The overall system is composed of the multirotor drone and a ground control station. A single-lens camera captures an image, which is transmitted to the ground control station via the Real-Time Messaging Protocol; obstacle-avoidance information is then returned to the Raspberry Pi on the drone through a socket API. Obstacle distance detection is based on deep learning: for offline training we collected training data captured by a stereo camera, feeding time-synchronized, rectified left and right images into a convolutional neural network whose output is the disparity map of a single image. To find the conversion between disparity value and real distance, we used curve fitting, with true distances measured by a laser range finder; this yields the true distance of each pixel in a single image. From the resulting depth map, image processing techniques find the flyable area, which determines whether the multirotor should go straight, turn left, or turn right to carry out its mission. The Raspberry Pi and Pixhawk communicate using DroneKit-Python once the Raspberry Pi receives the obstacle-avoidance information. When the drone encounters an obstacle during autonomous flight, it can instantly change its flight attitude and smoothly avoid the obstacle ahead.
The user can set a target so that the drone flies toward it after takeoff. During the flight, if the drone encounters an obstacle, it obtains the obstacle information and leaves the original flight path to avoid it, then flies toward the target point again once avoidance is complete. Finally, the drone lands at the set target point, completing the automatic obstacle avoidance function during autonomous flight.
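The disparity-to-distance curve fitting described above can be sketched under the pinhole stereo model Z = fB/d, which is linear in 1/d; the calibration pairs below are made up for illustration, not measurements from the thesis:

```python
import numpy as np

# Hypothetical calibration pairs: network disparity (pixels) vs. true
# distance (meters) measured with a laser range finder.
disparity = np.array([40.0, 20.0, 10.0, 8.0, 5.0])
distance = np.array([1.0, 2.0, 4.0, 5.0, 8.0])

# Pinhole stereo gives Z = (f * B) / d, so distance is linear in 1/d;
# a degree-1 fit recovers f*B (slope) plus any systematic offset.
fb, offset = np.polyfit(1.0 / disparity, distance, deg=1)

def disparity_to_distance(d):
    """Convert a disparity value (pixels) to metric distance (meters)."""
    return fb / d + offset
```

Once fitted, the mapping can be applied per pixel to turn the whole disparity map into a depth map.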
Huang, Yao-Pao, and 黃耀葆. "Transfer2Depth: Dual attention network with transfer learning for monocular depth estimation." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/qcx3q7.
Full text
National Sun Yat-sen University
Department of Electrical Engineering
107
Resolving depth from a monocular RGB image has been a long-standing task in computer vision and robotics. In this work we propose a monocular depth estimation method that takes only a single image as input. Unlike most existing learning-based methods that take two images as input, our network has the advantage of high applicability, since it does not require sufficient and static camera motion to reach optimal performance. We also propose a spatial-channel attention module to improve feature extraction. The proposed methods utilize transfer learning to achieve higher estimation accuracy while using less training data and fewer training epochs. The experimental results show that the proposed method outperforms state-of-the-art single-image depth estimation methods.
Chiu, Mian-Jhong, and 邱勉中. "Real-time Monocular Depth Estimation with Extremely Light-weight Neural Network." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/4p2ddx.
Full text
National Chiao Tung University
Institute of Multimedia Engineering
107
Obstacle avoidance and environment sensing are crucial applications in autonomous driving and robotics. Among all types of sensors, the camera is widely used in these applications because it offers rich visual content at relatively low cost. Thus, using images from a single camera to perform depth estimation has become one of the main focuses of recent research. However, prior works usually rely on highly complicated computation and power-consuming equipment to achieve this task; therefore, in this thesis we focus on developing a real-time, light-weight system for depth prediction. Based on the well-known encoder-decoder architecture, we propose a supervised learning-based CNN with detachable decoders that outputs predicted depth maps at multiple resolutions. We also formulate a novel multi-task loss function for each decoder block, which considers both the depth map and semantic segmentation simultaneously to encourage model convergence and speed up training. To train our model on the KITTI dataset, we generate depth maps and semantic segmentations as ground truth via PSMNet and DeepLabV3, respectively, and test various pre-processing methods. We also collect a synthetic dataset in AirSim with a wide range of camera views to evaluate the robustness of the proposed depth estimation approach. A series of ablation studies and experiments validates that our model efficiently performs real-time depth prediction with few parameters and fairly low computation cost, with the best trained model outperforming previous works on the KITTI dataset for various evaluation metrics. Trained and tested on our AirSim dataset, our model is also shown to handle images captured with quite different camera poses and altitudes.
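A per-decoder-block multi-task loss of the kind described (depth regression plus semantic segmentation) might look like the following sketch; the L1/cross-entropy choice and the weights are assumptions for illustration, not the thesis's exact formulation:

```python
import numpy as np

def multitask_loss(pred_depth, gt_depth, pred_seg, gt_seg,
                   w_depth=1.0, w_seg=0.5):
    """Combine an L1 depth-regression term with a pixel-wise
    cross-entropy segmentation term for one decoder block.

    pred_seg: (H, W, C) class probabilities; gt_seg: (H, W) integer labels.
    Weights w_depth / w_seg are illustrative hyperparameters."""
    depth_term = np.abs(pred_depth - gt_depth).mean()
    h, w = gt_seg.shape
    # Probability assigned to the correct class at each pixel.
    probs = pred_seg[np.arange(h)[:, None], np.arange(w)[None, :], gt_seg]
    seg_term = -np.log(np.clip(probs, 1e-8, 1.0)).mean()
    return w_depth * depth_term + w_seg * seg_term
```

In a detachable-decoder setup, one such loss would be evaluated per resolution and the per-block losses summed.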
Torralba, Antonio, and Aude Oliva. "Global Depth Perception from Familiar Scene Structure." 2001. http://hdl.handle.net/1721.1/7267.
Full text
Huang, Po-Yu, and 黃柏諭. "Supervised Monocular Depth Estimation Using Deep Neural Network in Robotic Operating System." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/3ztypn.
Full text
National Chiao Tung University
Institute of Electronics
107
Recently, unmanned aerial vehicles (UAVs) have come to play an important role not only in military use but also in commercial applications such as damage assessment, environment monitoring, and pesticide spraying. More accurate and reliable technological capabilities such as autonomous flying, obstacle avoidance, battery performance, and localization are required. Deep neural networks (DNNs) are driving huge improvements in many artificial intelligence (AI) tasks such as image classification, object detection, and image segmentation, which makes UAVs one of the important AI commercial technologies with potential applications in rescue, transportation, and monitoring services. However, UAVs are powered by batteries, which limits flight time and payload capacity. When developing deep learning algorithms to run on a UAV, platform resources should be considered to achieve better accuracy-versus-latency trade-offs. In order to fly a drone automatically, we develop a monocular depth estimation algorithm based on a deep neural network that takes an RGB image and predicts the corresponding depth image. The depth image is further transformed into a point cloud and an occupancy grid map in the Robot Operating System (ROS), equipping the drone with knowledge of its surrounding environment. This information is critical to obstacle avoidance and path planning algorithms. After reducing the model complexity and compiling with the open-source compiler TVM, the proposed depth estimation algorithm has been deployed to the energy-efficient embedded system Nvidia Tegra X1 (TX1), taking advantage of a fast and powerful deep neural network with only 6.1 M parameters and reaching 14 FPS for depth estimation.
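The depth-image-to-point-cloud step can be sketched with standard pinhole back-projection; the intrinsics below are placeholders, and this is not the thesis's ROS implementation:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) into an N x 3 point cloud
    using the pinhole model:
        X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

In a ROS pipeline the resulting points would be packed into a point-cloud message and rasterized into the occupancy grid map.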
Fan, Chen-shuo, and 范辰碩. "Monocular Vision Based Depth Map Extraction Method for 2D to 3D Video Conversion." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/16383026759959088781.
Full text
National Central University
Department of Electrical Engineering
101
This thesis presents two semi-automatic depth map extraction methods for stereo video conversion. Due to the demand for 3D visualization and the lack of 3D video content, low-cost and high-efficiency post-processing methods must be developed to convert 2D video to 3D efficiently if everyone is to enjoy vivid 3D video. For static-background video sequences, we propose a method that combines foreground segmentation with the vanishing-line technique based on monocular depth cues. Given the foreground/background separation produced by the foreground segmentation algorithm, the viewer can use acquired visual experience to assign some background depth information at the initialization step. The foreground then obtains relative depth information from the background depth map. In our experiments this algorithm runs at 0.17 s/frame on CIF-size video with 3D visualization quality close to other references. Moreover, following the same concept, we propose another conversion method for dynamic-background video sequences, in which foreground segmentation is replaced by relative velocity estimation based on motion estimation and motion compensation. Although this method cannot match the quality of the foreground segmentation method, it has wide applicability and runs at 0.15 s/frame on CIF-size video.
Chen, Ting-Wei, and 陳庭瑋. "Image Depth Initialization and Fuzzy Data Association for Aerial Robot Monocular Visual Localization and Mapping." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/07348162565621224367.
Full text
Tamkang University
Master's Program, Department of Mechanical and Electro-Mechanical Engineering
102
This study investigates the issues of visual-sensor-assisted aerial robot navigation. The major objectives are to provide the aerial robot with localization and mapping capabilities in global positioning system (GPS)-denied environments. When the aerial robot navigates in a GPS-denied environment, the visual sensor provides the measurements for robot state estimation and environmental mapping. Considering the carrying capacity of the aerial robot, a single camera is used in this study, and the image is transmitted to a PC-based controller for image processing using a radio frequency module. The extended Kalman filter is used as the state estimator to recursively predict and update the states of the aerial robot and the environment landmarks. For the monocular vision sensor, the image depth is represented using the inverse depth parameterization method, and image feature initialization is achieved by a non-delayed procedure. The results of this study are twofold. First, an ultrasonic sensor is used to provide one-dimensional distance measurements and solve the image depth estimation problem of monocular vision. Second, a novel data association procedure is designed based on a fuzzy system in order to improve the performance of map management. The software of the robot navigation system is developed on a PC-based controller using Microsoft Visual Studio C++. The navigation system integrates the sensor inputs, image processing, and state estimation, and the resultant system performs the tasks of simultaneous localization and mapping for aerial robots.
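The inverse depth parameterization mentioned above represents a landmark by an anchor camera position, a bearing (azimuth theta, elevation phi), and an inverse depth rho. A minimal sketch of recovering the 3-D point, with axis conventions assumed for illustration:

```python
import numpy as np

def landmark_from_inverse_depth(anchor, theta, phi, rho):
    """Recover a 3-D landmark from the inverse-depth parameterization:
    point = anchor + (1/rho) * m(theta, phi), where m is the unit
    bearing vector. Small rho (distant points) stays numerically
    well-behaved, which is why the parameterization suits monocular SLAM."""
    m = np.array([np.cos(phi) * np.sin(theta),   # x: right
                  -np.sin(phi),                  # y: up (convention assumed)
                  np.cos(phi) * np.cos(theta)])  # z: forward
    return np.asarray(anchor, dtype=float) + m / rho
```

An EKF would carry (anchor, theta, phi, rho) in its state vector and only convert to Euclidean coordinates when needed.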
Huang, Ju-Peng, and 黃如鵬. "Realization of Depth Estimation from Monocular Camera Based on Defocus Algorithm and Reverse Heat Equation." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/rtgkgk.
Full text
Zhuo, Wei. "2D+3D Indoor Scene Understanding from a Single Monocular Image." Phd thesis, 2018. http://hdl.handle.net/1885/144616.
Full textHuang, Jian-hao, and 黃建豪. "Dense Piecewise Planar Reconstruction based on Low Gradient Region Depth Estimation from a Monocular Image Sequence." Thesis, 2018. http://ndltd.ncl.edu.tw/handle/ujhn5v.
Full text
National Taiwan University
Graduate Institute of Electrical Engineering
106
Visual navigation of robots has been a popular and challenging research topic in the past few years. One important part of navigation is environment sensing, especially for previously unknown and GPS-denied environments. This thesis uses a monocular camera to obtain image data and estimates the depth map in each keyframe with LSD SLAM [11: Engel et al. 2014]. The RGB image and depth map in each keyframe are used to detect low-texture regions with a region growing segmentation method. The assumption made is that image areas with low photometric gradients are mostly planar, which holds in most indoor and man-made scenes. This thesis proposes a depth filling method that optimizes the completeness of the depth map in each keyframe, providing the robot with more environment information for navigation. For the monocular unknown-scale problem, an assigned marker in the scene is used to compute the scale; the estimated scale then defines the thresholds used to filter out unreasonable plane estimates in the depth filling process. This thesis compares the depth filling method against several alternatives using Gazebo simulation [35: Gazebo from OSRF, Inc], the public TUM dataset [23: Sturm et al. 2012], and an experiment with a Microsoft Kinect sensor. The comparison demonstrates that our depth filling method for piecewise planar monocular SLAM produces denser maps than LSD SLAM [11: Engel et al. 2014] and DPPTAM [12: Concha & Civera 2015].
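The low-gradient region detection described above can be sketched as a flood-fill style region growing; the 4-connectivity and seed-difference criterion here are assumptions for illustration, not the thesis's exact algorithm:

```python
from collections import deque

import numpy as np

def grow_low_gradient_region(gray, seed, diff_thresh):
    """Starting from `seed` (row, col), collect 4-connected pixels whose
    absolute intensity difference to the seed is below diff_thresh.
    Such low-gradient regions are assumed (near-)planar."""
    h, w = gray.shape
    mask = np.zeros((h, w), dtype=bool)
    mask[seed] = True
    seed_val = float(gray[seed])
    q = deque([seed])
    while q:
        r, c = q.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < h and 0 <= nc < w and not mask[nr, nc]
                    and abs(float(gray[nr, nc]) - seed_val) < diff_thresh):
                mask[nr, nc] = True
                q.append((nr, nc))
    return mask
```

The resulting mask marks a candidate planar patch whose sparse depth samples can then be fitted with a plane during depth filling.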