Dissertations / Theses on the topic 'Deep Learning, Computer Vision, Object Detection'
Kohmann, Erich. "Tecniche di deep learning per l'object detection." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19637/.
Andersson, Dickfors Robin, and Nick Grannas. "OBJECT DETECTION USING DEEP LEARNING ON METAL CHIPS IN MANUFACTURING." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-55068.
Arefiyan, Khalilabad Seyyed Mostafa. "Deep Learning Models for Context-Aware Object Detection." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/88387.
Bartoli, Giacomo. "Edge AI: Deep Learning techniques for Computer Vision applied to embedded systems." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16820/.
Espis, Andrea. "Object detection and semantic segmentation for assisted data labeling." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.
Norrstig, Andreas. "Visual Object Detection using Convolutional Neural Networks in a Virtual Environment." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-156609.
Dickens, James. "Depth-Aware Deep Learning Networks for Object Detection and Image Segmentation." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42619.
Solini, Arianna. "Applicazione di Deep Learning e Computer Vision ad un Caso d'uso aziendale: Progettazione, Risoluzione ed Analisi." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Cuan, Bonan. "Deep similarity metric learning for multiple object tracking." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI065.
Multiple object tracking, i.e. simultaneously tracking multiple objects in a scene, is an important but challenging visual task. Objects must be accurately detected and distinguished from each other to avoid erroneous trajectories. Since remarkable progress has been made in the field of object detection, "tracking-by-detection" approaches are widely adopted in multiple object tracking research: objects are detected in advance, and tracking reduces to an association problem, linking detections of the same object across frames into trajectories. Most tracking algorithms employ both motion and appearance models for data association. For multiple object tracking problems in which many objects of the same category are present, a fine-grained discriminative appearance model is paramount and indispensable. We therefore propose an appearance-based re-identification model using deep similarity metric learning to address multiple object tracking in mono-camera videos. Two main contributions are reported in this dissertation. First, a deep Siamese network is employed to learn an end-to-end mapping from input images to a discriminative embedding space. Different metric learning configurations using various metrics, loss functions, deep network structures, etc., are investigated in order to determine the best re-identification model for tracking. In addition, with an intuitive and simple classification design, the proposed model achieves satisfactory re-identification results, comparable to state-of-the-art approaches using triplet losses. Our approach is easy and fast to train, and the learned embedding can be readily transferred to tracking tasks. Second, we integrate the proposed re-identification model into multiple object tracking as appearance guidance for detection association. For each object to be tracked in a video, we establish an identity-related appearance model based on the learned embedding for re-identification.
Similarities among detected object instances are exploited for identity classification. The collaboration and interference between the appearance and motion models are also investigated. An online appearance-motion model coupling is proposed to further improve tracking performance. Experiments on the Multiple Object Tracking Challenge benchmark demonstrate the effectiveness of our modifications, with state-of-the-art tracking accuracy.
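Abstracts like the one above describe tracking-by-detection association driven by appearance similarity between learned embeddings. As a minimal, hypothetical sketch (not the thesis's actual implementation; function names and the threshold are illustrative), each new detection is linked to the most similar existing track by cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two (nonzero) embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def associate(track_embeddings, detection_embeddings, threshold=0.5):
    """Greedy appearance-based association: link each detection to the most
    similar existing track if the similarity exceeds the threshold."""
    matches = {}
    for d_id, d_emb in detection_embeddings.items():
        best_track, best_sim = None, threshold
        for t_id, t_emb in track_embeddings.items():
            sim = cosine_similarity(d_emb, t_emb)
            if sim > best_sim:
                best_track, best_sim = t_id, sim
        matches[d_id] = best_track  # None means: start a new trajectory
    return matches
```

A full tracker would combine this appearance score with a motion model before solving the assignment, as the abstract describes.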
Chen, Zhe. "Augmented Context Modelling Neural Networks." Thesis, The University of Sydney, 2019. http://hdl.handle.net/2123/20654.
Gustafsson, Fredrik, and Erik Linder-Norén. "Automotive 3D Object Detection Without Target Domain Annotations." Thesis, Linköpings universitet, Datorseende, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-148585.
Ogier du Terrail, Jean. "Réseaux de neurones convolutionnels profonds pour la détection de petits véhicules en imagerie aérienne." Thesis, Normandie, 2018. http://www.theses.fr/2018NORMC276/document.
The following manuscript tackles the problem of small-vehicle detection in vertical aerial imagery using deep learning algorithms. The specificities of the problem allow the use of innovative techniques leveraging the invariance and self-similarity of automobile and aircraft vehicles seen from the sky. We start with a thorough study of single-shot detectors. Building on that, we examine the effect of adding multiple stages to the detection decision process. Finally, we address the domain adaptation problem in detection through the generation of better-looking synthetic data and its use in the training process of these detectors.
Taurone, Francesco. "3D Object Recognition from a Single Image via Patch Detection by a Deep CNN." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/18669/.
Al Hakim, Ezeddin. "3D YOLO: End-to-End 3D Object Detection Using Point Clouds." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-234242.
For autonomous vehicles to perceive their surroundings well, modern sensors such as LiDAR and RADAR are used. These generate a large number of 3-dimensional data points known as point clouds. In the development of autonomous vehicles there is a great need to interpret LiDAR data and classify other road users. A large number of studies have been made on 2D object detection that analyses images to detect vehicles, but we are interested in 3D object detection using only LiDAR data. We therefore introduce the model 3D YOLO, which builds on YOLO (You Only Look Once), one of the fastest state-of-the-art models for 2D object detection in images. 3D YOLO takes a point cloud as input and produces 3D boxes that mark the different objects and state each object's category. We trained and evaluated the model on the public KITTI training data. Our results show that 3D YOLO is faster than today's state-of-the-art LiDAR-based models while maintaining high accuracy, which makes it a good candidate for use in autonomous vehicles.
Fucili, Mattia. "3D object detection from point clouds with dense pose voters." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17616/.
Capuzzo, Davide. "3D StixelNet Deep Neural Network for 3D object detection stixel-based." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/22017/.
Azizpour, Hossein. "Visual Representations and Models: From Latent SVM to Deep Learning." Doctoral thesis, KTH, Datorseende och robotik, CVAP, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-192289.
Kalogeiton, Vasiliki. "Localizing spatially and temporally objects and actions in videos." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/28984.
Söderlund, Henrik. "Real-time Detection and Tracking of Moving Objects Using Deep Learning and Multi-threaded Kalman Filtering : A joint solution of 3D object detection and tracking for Autonomous Driving." Thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-160180.
Peng, Zeng. "Pedestrian Tracking by using Deep Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-302107.
This project aims to use deep learning to solve the pedestrian tracking problem for autonomous driving. The work lies within computer vision and deep learning. Multi-object tracking (MOT) aims to track multiple targets simultaneously in video data. The main application scenarios of MOT are security surveillance and autonomous driving. In these scenarios we often need to track many targets at the same time, which is not feasible with object detection or single-object tracking algorithms alone because of their lack of stability and usability, so the field of multiple object tracking must be explored. Our method breaks MOT into separate stages and uses the motion and appearance information of the targets to track them in the video data. We used three different object detectors to detect pedestrians in frames, a person re-identification model as the appearance feature extractor, and a Kalman filter as the motion predictor. Our proposed model achieves 47.6% MOT accuracy and 53.2% IDF1, while the results obtained by the model without the person re-identification module are only 44.8% and 45.8% respectively. Our experimental results showed that a robust multiple object tracking algorithm can be achieved by splitting the task into stages, and can be improved by representative DNN-based appearance features.
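The pipeline above uses a Kalman filter as the motion predictor. The sketch below is a deliberately simplified fixed-gain (alpha-beta) predict/update cycle for one coordinate of a track, not the full covariance-based filter a thesis like this would use; all names and gain values are illustrative assumptions.

```python
def predict(state, dt=1.0):
    """Constant-velocity prediction for one coordinate: state = (position, velocity)."""
    pos, vel = state
    return (pos + vel * dt, vel)

def update(state, measured, alpha=0.85, beta=0.005, dt=1.0):
    """Fixed-gain correction toward the measurement (a full Kalman filter
    would derive the gains from the state and measurement covariances)."""
    pos, vel = state
    residual = measured - pos
    return (pos + alpha * residual, vel + beta * residual / dt)
```

Running predict/update per frame keeps the track's position estimate close to the detections while smoothing noise.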
Papakis, Ioannis. "A Graph Convolutional Neural Network Based Approach for Object Tracking Using Augmented Detections With Optical Flow." Thesis, Virginia Tech, 2021. http://hdl.handle.net/10919/103372.
This thesis presents a novel method for Multi-Object Tracking (MOT) in videos, with the main goal of associating objects between frames. The proposed method is based on a deep neural network architecture operating on a graph structure. The graph-based approach makes it possible to use both the appearance and the geometry of detected objects to retrieve high-level information about their characteristics and interaction. The framework includes the Sinkhorn algorithm, which can be embedded in the training phase to satisfy MOT constraints such as the one-to-one matching between previous and new objects. Another approach is also proposed to improve the sensitivity of the object detector by using previous-frame detections as a guide to detect objects in each new frame, resulting in fewer missed objects. Alongside the new methods, a new dataset is also provided, which includes naturalistic video footage from current infrastructure cameras in Virginia Beach City with a variety of vehicle densities and environmental conditions. Experimental evaluation demonstrates the efficacy of the proposed approaches on the provided dataset and the popular MOT Challenge benchmark.
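The Sinkhorn algorithm mentioned above softly enforces one-to-one matching by alternately normalizing the rows and columns of a score matrix until it is approximately doubly stochastic. A minimal sketch of that normalization loop, independent of the thesis code and assuming a square matrix of positive scores:

```python
def sinkhorn(scores, n_iters=50):
    """Alternately normalize rows and columns so the matrix approaches a
    doubly stochastic one, softly enforcing one-to-one assignment."""
    m = [row[:] for row in scores]
    for _ in range(n_iters):
        # Row normalization: each row sums to 1.
        m = [[v / sum(row) for v in row] for row in m]
        # Column normalization: each column sums to 1.
        col_sums = [sum(m[i][j] for i in range(len(m))) for j in range(len(m[0]))]
        m = [[m[i][j] / col_sums[j] for j in range(len(m[0]))] for i in range(len(m))]
    return m
```

Because each step is differentiable, this can sit inside a training loop, which is what makes it attractive for learning-based matching.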
Mhalla, Ala. "Multi-object detection and tracking in video sequences." Thesis, Université Clermont Auvergne (2017-2020), 2018. http://www.theses.fr/2018CLFAC084/document.
The work developed in this PhD thesis focuses on video sequence analysis, which consists of object detection, categorization and tracking. The development of reliable solutions for the analysis of video sequences opens new horizons for several applications such as intelligent transport systems, video surveillance and robotics. In this thesis, we put forward several contributions to deal with the problems of detecting and tracking multiple objects in video sequences. The proposed frameworks are based on deep learning networks and transfer learning approaches. In a first contribution, we tackle the problem of multi-object detection by putting forward a new transfer learning framework based on the formalism and theory of the Sequential Monte Carlo (SMC) filter to automatically specialize a deep convolutional neural network (DCNN) detector towards a target scene. The suggested specialization framework is used to transfer knowledge from the source and the target domain to the target scene and to estimate the unknown target distribution as a specialized dataset composed of samples from the target domain. These samples are selected according to the importance of their weights, which reflects the likelihood that they belong to the target distribution.
The obtained specialized dataset allows training a specialized DCNN detector for a target scene without human intervention. In a second contribution, we propose an original multi-object tracking framework based on spatio-temporal strategies (interlacing/inverse interlacing) and an interlaced deep detector, which improves the performance of tracking-by-detection algorithms and helps to track objects in complex videos (occlusion, intersection, strong motion). In a third contribution, we provide an embedded system for traffic surveillance, which integrates an extension of the SMC framework so as to improve detection accuracy in both day and night conditions and to specialize any DCNN detector for both mobile and stationary cameras. Throughout this report, we provide both quantitative and qualitative results. On several aspects related to video sequence analysis, this work outperforms the state-of-the-art detection and tracking frameworks. In addition, we have successfully implemented our frameworks on an embedded hardware platform for road traffic safety and monitoring.
Grossman, Mikael. "Proposal networks in object detection." Thesis, KTH, Matematisk statistik, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-241918.
Localizing useful data in images is something that has been revolutionized over the last decade, as computing power has increased to a level where artificial neural networks can be used in practice. A neural network that uses convolutions is excellently suited to images, since it allows the network to create its own filters, which were previously created by hand. For localizing objects in images, the Faster R-CNN architecture is mainly used. It works in two steps: first the RPN creates boxes containing the regions where the network believes it is most likely to find an object; then a detector verifies whether the box is on an object. In this thesis we review the current literature on artificial neural networks, object detection and proposal methods, and present a new way of generating region proposals. We show that by replacing the RPN with our method (MPN) we increase precision by 12% and reduce the run time by 10%.
Moussallik, Laila. "Towards Condition-Based Maintenance of Catenary wires using computer vision : Deep Learning applications on eMaintenance & Industrial AI for railway industry." Thesis, Luleå tekniska universitet, Institutionen för samhällsbyggnad och naturresurser, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-83123.
IACONO, MASSIMILIANO. "Object detection and recognition with event driven cameras." Doctoral thesis, Università degli studi di Genova, 2020. http://hdl.handle.net/11567/1005981.
Lamberti, Lorenzo. "A deep learning solution for industrial OCR applications." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19777/.
Cottignoli, Lorenzo. "Strumento di Realtà Aumentata su Dispositivi Mobili per Labeling di Immagini Semi-Automatico." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/17734/.
Arcidiacono, Claudio Salvatore. "An empirical study on synthetic image generation techniques for object detectors." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235502.
Convolutional neural networks are a very powerful machine learning tool that has outperformed other techniques in image recognition. The main drawback of this method is the massive amount of training data required, since producing training data for image recognition tasks is very labour-intensive. To address this problem, different techniques have been proposed for generating synthetic training data automatically. These synthetic data generation techniques can be grouped into two categories: the first generates synthetic images using computer graphics software and CAD models of the objects to recognize; the second generates synthetic images by cutting the object out of one image and pasting it onto another. Since both techniques have their advantages and drawbacks, it would be interesting for industry to investigate both methods in more depth. A common task in industrial scenarios is to detect and classify objects in an image. Different objects belonging to the classes relevant in industrial scenarios are often indistinguishable (for example, they are all the same component). For these reasons, this thesis aims to answer the question "Among the CAD generation techniques, the cut-and-paste generation techniques, and a combination of the two, which technique is most suitable for generating images for training object detectors in industrial scenarios?" To answer the research question, two synthetic image generation techniques are proposed, one from each category. The proposed techniques are tailored to applications where all objects belonging to the same class are indistinguishable, but they can also be extended to other applications. The two synthetic image generation techniques are compared by measuring the performance of an object detector trained on synthetic images on a test dataset of real images.
The performance of the two synthetic data generation techniques when used for data augmentation has also been measured. The empirical results show that the CAD-model technique performs considerably better than the cut-and-paste technique when synthetic images are the only source of training data (61% better), whereas the two generation techniques perform equally well as data augmentation techniques. Moreover, the empirical results show that the models trained using only synthetic images perform almost as well as the model trained on real images (7.4% worse), and that augmenting a dataset of real images with synthetic images improves the performance of the model (9.5% better).
Schennings, Jacob. "Deep Convolutional Neural Networks for Real-Time Single Frame Monocular Depth Estimation." Thesis, Uppsala universitet, Avdelningen för systemteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-336923.
Pitteri, Giorgia. "3D Object Pose Estimation in Industrial Context." Thesis, Bordeaux, 2020. http://www.theses.fr/2020BORD0202.
3D object detection and pose estimation are of primary importance for tasks such as robotic manipulation and augmented reality, and they have been the focus of intense research in recent years. Methods relying on depth data acquired by depth cameras are robust. Unfortunately, active depth sensors are power-hungry, or it is sometimes not possible to use them. It is therefore often desirable to rely on color images. When training machine learning algorithms that aim to estimate an object's 6D pose from images, many challenges arise, especially in industrial contexts that require handling objects with symmetries and generalizing to unseen objects, i.e. objects never seen by the networks during training. In this thesis, we first analyse the link between the symmetries of a 3D object and its appearance in images. Our analysis explains why symmetrical objects can be a challenge when training machine learning algorithms to predict their 6D pose from images. We then propose an efficient and simple solution that relies on the normalization of the pose rotation. This approach is general and can be used with any 6D pose estimation algorithm. Then, we address the second main challenge: generalization to unseen objects. Many recent methods for 6D pose estimation are robust and accurate, but their success can be attributed to supervised machine learning approaches. For each new object, these methods have to be retrained on many different images of this object, which are not always available. Even if domain transfer methods allow training such methods with synthetic images instead of real ones, at least to some extent, such training sessions take time, and it is highly desirable to avoid them in practice. We propose two methods to handle this problem. The first method relies only on the objects' geometries and focuses on objects with prominent corners, which covers a large number of industrial objects.
We first learn to detect object corners of various shapes in images and to predict their 3D poses, using training images of a small set of objects. To detect a new object in a given image, we first identify its corners from its CAD model; we also detect the corners visible in the image and predict their 3D poses. We then introduce a RANSAC-like algorithm that robustly and efficiently detects and estimates the object's 3D pose by matching its corners on the CAD model with their detected counterparts in the image. The second method overcomes the limitations of the first, as it does not require objects to have specific corners nor the offline selection of corners on the CAD model. It combines deep learning and 3D geometry and relies on an embedding of the local 3D geometry to match CAD models to input images. For points at the surface of objects, this embedding can be computed directly from the CAD model; for image locations, we learn to predict it from the image itself. This establishes correspondences between 3D points on the CAD model and 2D locations in the input images. However, many of these correspondences are ambiguous, as many points may have similar local geometries. We also show that we can use Mask-RCNN in a class-agnostic way to detect new objects without retraining and thus drastically limit the number of possible correspondences. We can then robustly estimate a 3D pose from these discriminative correspondences using a RANSAC-like algorithm.
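Both methods above estimate a pose from noisy correspondences with a RANSAC-like algorithm. The toy sketch below illustrates the RANSAC principle on a much simpler model, a 2D translation, with hypothetical names throughout: hypotheses are sampled from single correspondences and scored by their inlier count.

```python
import random

def ransac_translation(model_pts, image_pts, n_iters=100, tol=1.0, seed=0):
    """Minimal RANSAC sketch: estimate a 2D translation mapping model points
    to image points, robust to outlier correspondences. This is an
    illustrative stand-in for the full 3D pose estimation described above."""
    rng = random.Random(seed)
    best_t, best_inliers = None, -1
    for _ in range(n_iters):
        # Sample one correspondence and hypothesize the translation it implies.
        i = rng.randrange(len(model_pts))
        tx = image_pts[i][0] - model_pts[i][0]
        ty = image_pts[i][1] - model_pts[i][1]
        # Score the hypothesis by counting correspondences it explains.
        inliers = sum(
            1 for (mx, my), (ix, iy) in zip(model_pts, image_pts)
            if abs(mx + tx - ix) <= tol and abs(my + ty - iy) <= tol
        )
        if inliers > best_inliers:
            best_t, best_inliers = (tx, ty), inliers
    return best_t, best_inliers
```

A real 6D pose version samples minimal sets of 2D-3D matches and solves a PnP problem per hypothesis, but the sample-score-keep-best loop is the same.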
Grard, Matthieu. "Generic instance segmentation for object-oriented bin-picking." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEC015.
Referred to as robotic random bin-picking, a fast-expanding industrial task consists in robotizing the unloading of many object instances piled up in bulk, one at a time, for further processing such as kitting or part assembling. However, explicit object models are not always available in many bin-picking applications, especially in the food and automotive industries. Furthermore, object instances are often subject to intra-class variations, for example due to elastic deformations. Object pose estimation techniques, which require an explicit model and assume rigid transformations, are therefore not suitable in such contexts. The alternative approach, which consists in detecting grasps without an explicit notion of object, proves hardly efficient when the object geometry makes bulk instances prone to occlusion and entanglement. These approaches also typically rely on a multi-view scene reconstruction that may be unfeasible due to transparent and shiny textures, or that critically reduces the time frame for image processing in high-throughput robotic applications. In collaboration with Siléane, a French company in industrial robotics, we thus aim at developing a learning-based solution for localizing the most affordable instance of a pile from a single image, in open loop, without explicit object models. In the context of industrial bin-picking, our contribution is twofold. First, we propose a novel fully convolutional network (FCN) for jointly delineating instances and inferring the spatial layout at their boundaries. Indeed, the state-of-the-art methods for such a task rely on two independent streams for boundaries and occlusions respectively, whereas occlusions often cause boundaries. Specifically, the mainstream approach, which consists in isolating instances in boxes before detecting boundaries and occlusions, fails in bin-picking scenarios, as a rectangular region often includes several instances.
By contrast, our box-proposal-free architecture recovers fine instance boundaries, augmented with their occluding side, from a unified scene representation. As a result, the proposed network outperforms the two-stream baselines on synthetic data and public real-world datasets. Second, as FCNs require large training datasets that are not available in bin-picking applications, we propose a simulation-based pipeline for generating training images using physics and rendering engines. Specifically, piles of instances are simulated and rendered with their ground-truth annotations from sets of texture images and meshes to which multiple random deformations are applied. We show that the proposed synthetic data is plausible for real-world applications in the sense that it enables the learning of deep representations transferable to real data. Through extensive experiments on a real-world robotic setup, our synthetically trained network outperforms the industrial baseline while achieving real-time performance. The proposed approach thus establishes a new baseline for model-free object-oriented bin-picking.
Rispoli, Luca. "Un approccio deep learning-based per il conteggio di persone tramite videocamere low-cost in un contesto Smart Campus." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19567/.
Suzano Massa, Francisco Vitor. "Mise en relation d'images et de modèles 3D avec des réseaux de neurones convolutifs." Thesis, Paris Est, 2017. http://www.theses.fr/2017PESC1198/document.
The recent availability of large catalogs of 3D models opens new possibilities for 3D reasoning on photographs. This thesis investigates the use of convolutional neural networks (CNNs) for relating 3D objects to 2D images. We first introduce two contributions that are used throughout this thesis: an automatic memory-reduction library for deep CNNs, and a study of CNN features for cross-domain matching. In the first, we develop a library built on top of Torch7 which automatically reduces the memory requirements for deploying a deep CNN by up to 91%. In the second, we study the effectiveness of various CNN features extracted from a pre-trained network in the case of images from different modalities (real or synthetic images). We show that despite the large cross-domain difference between rendered views and photographs, it is possible to use some of these features for instance retrieval, with possible applications to image-based rendering. CNNs have recently been used for the task of object viewpoint estimation, sometimes with very different design choices. We present these approaches in a unified framework and analyse the key factors that affect performance. We propose a joint training method that combines both detection and viewpoint estimation, which performs better than considering viewpoint estimation separately. We also study the impact of formulating viewpoint estimation either as a discrete or a continuous task, quantify the benefits of deeper architectures, and demonstrate that using synthetic data is beneficial. With all these elements combined, we improve over previous state-of-the-art results on the Pascal3D+ dataset by approximately 5% of mean average viewpoint precision. In the instance retrieval study, the image of the object is given and the goal is to identify, among a number of 3D models, which object it is.
We extend this work to object detection, where instead we are given a 3D model (or a set of 3D models) and are asked to locate and align the model in the image. We show that simply using CNN features is not enough for this task, and we propose to learn a transformation that brings the features of the real images close to the features of the rendered views. We evaluate our approach both qualitatively and quantitatively on two standard datasets: the IKEA object dataset and a subset of the chair category of the Pascal VOC 2012 dataset, and we show state-of-the-art results on both of them.
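The detection work above learns a transformation that brings real-image features close to rendered-view features. As an illustrative stand-in (not the thesis's learned network; all names are hypothetical), the sketch below fits a per-dimension least-squares scaling between paired feature vectors:

```python
def fit_feature_map(real_feats, rendered_feats):
    """Per-dimension least-squares fit of a diagonal linear map taking
    real-image features to rendered-view features: for each dimension d,
    choose the scale s_d minimizing sum((s_d * x - y)^2) over the pairs."""
    dims = len(real_feats[0])
    scales = []
    for d in range(dims):
        xs = [f[d] for f in real_feats]
        ys = [f[d] for f in rendered_feats]
        scales.append(sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs))
    return scales

def apply_map(scales, feat):
    """Apply the fitted diagonal map to one feature vector."""
    return [s * x for s, x in zip(scales, feat)]
```

A learned adaptation would replace this closed-form diagonal map with a small network trained on rendered/real pairs, but the goal of aligning the two feature domains is the same.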
Carletti, Angelo. "Development of a machine learning algorithm for the automatic analysis of microscopy images in an in-vitro diagnostic platform." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Sievert, Rolf. "Instance Segmentation of Multiclass Litter and Imbalanced Dataset Handling : A Deep Learning Model Comparison." Thesis, Linköpings universitet, Datorseende, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-175173.
Gao, Yuan. "Surround Vision Object Detection Using Deep Learning." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-231929.
The thesis starts by developing a framework for object detection in images from the forward-facing camera in surround-vision data. With the goal of reducing the amount of annotated data as much as possible, different domain adaptation methods are applied to train on the other camera images based on a base model. Relevant data analysis is performed, revealing useful information in the object distributions of all cameras. Regularization techniques including dropout, weight decay and data augmentation are tested to reduce the complexity of the trained model. Ratio reduction experiments are also carried out to find the relationship between model performance and the amount of training data. It is shown that 30% of the training data for the left rear-facing and left forward-facing cameras can be removed without significantly reducing model performance. In addition, the thesis visualizes the errors in vehicle locations through heat maps, which are useful for further studies. Overall, the results of these extensive experiments indicate that the model trained with domain adaptation is, as expected, effective.
Mordan, Taylor. "Conception d'architectures profondes pour l'interprétation de données visuelles." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS270.
Nowadays, images are ubiquitous through the use of smartphones and social media. It thus becomes necessary to process them automatically in order to analyze and interpret the large amount of available data. In this thesis, we are interested in object detection, i.e. the problem of identifying and localizing all objects present in an image. This can be seen as a first step toward a complete visual understanding of scenes. It is tackled with deep convolutional neural networks under the deep learning paradigm. One drawback of this approach is the need for labeled data to learn from. Since precise annotations are time-consuming to produce, bigger datasets can be built with partial labels. We design global pooling functions to work with them and to recover latent information in two cases: learning spatially localized and part-based representations from image-level and object-level supervision respectively. We address the issue of efficiency in end-to-end learning of these representations by leveraging fully convolutional networks. Besides, exploiting additional annotations on available images can be an alternative to having more images, especially in the data-deficient regime. We formalize this problem as a specific kind of multi-task learning with a primary objective to focus on, and design a way to effectively learn from this auxiliary supervision under this framework.
Zhang, Kaige. "Deep Learning for Crack-Like Object Detection." DigitalCommons@USU, 2019. https://digitalcommons.usu.edu/etd/7616.
Full textYellapantula, Sudha Ravali. "Synthesizing Realistic Data for Vision Based Drone-to-Drone Detection." Thesis, Virginia Tech, 2019. http://hdl.handle.net/10919/91460.
Full textMaster of Science
In recent years, technologies such as deep learning and machine learning have developed rapidly. Among their many applications, object detection is one of the most widely used and well-established problems. In this thesis, we consider a scenario with a swarm of drones, where the aim is for one drone to recognize another drone in its field of vision. As no drone image dataset was readily available, we explored different ways of generating realistic data to address this issue. Finally, we propose a solution that generates realistic images using deep learning techniques, train an object detection model on them, and evaluate how well it performs against other models.
Matosevic, Antonio. "Batch Active Learning for Deep Object Detection in Videos." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292035.
Full textRecent advances in object detection can largely be attributed to deep neural networks. Training such models, however, requires large amounts of annotated data. This poses a twofold problem: obtaining annotated data is a time-consuming process, and the training itself is often computationally expensive. A common approach to both problems is active learning, i.e. devising a strategy to interactively select and use considerably fewer data points while maximizing performance. In the context of deep object detection on video data, two new challenges arise. First, typical uncertainty-based strategies depend on the quality of the uncertainty estimates, which often require special treatment in deep neural networks. Second, ignoring the batch-based nature of training and the similarity between frames in a video sequence can yield uninformative selections that lack diversity. In this work, we address both problems by proposing strategies built on improved uncertainty estimates and diversification methods. Empirical experiments show that our proposed uncertainty-based strategies are on par with a baseline that selects data points at random, while the diversification strategies, given improved uncertainty estimates, clearly outperform the baseline. Notably, with only 15% of the data, our best strategy reaches as much as 90.27% of the performance obtained when training the detector on all available data.
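The core idea of the thesis above — rank frames by predictive uncertainty, then enforce diversity so that near-duplicate video frames are not all selected — can be illustrated with a minimal sketch. Everything here is a hypothetical simplification: entropy stands in for the thesis's improved uncertainty estimates, and frame-index distance stands in for a real frame-similarity measure.

```python
import math

def entropy(probs):
    """Predictive entropy of a class-probability vector (higher = more uncertain)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_batch(candidates, batch_size, min_gap):
    """Greedy batch selection: rank frames by uncertainty, then skip any frame
    whose index is within `min_gap` of an already-chosen frame (a crude proxy
    for visual similarity between nearby video frames).

    candidates: list of (frame_index, class_probs) pairs.
    """
    ranked = sorted(candidates, key=lambda c: entropy(c[1]), reverse=True)
    chosen = []
    for idx, _ in ranked:
        if all(abs(idx - c) >= min_gap for c in chosen):
            chosen.append(idx)
        if len(chosen) == batch_size:
            break
    return sorted(chosen)
```

For example, with frames 0 and 1 both maximally uncertain, a gap constraint of 5 forces the second pick to come from elsewhere in the sequence: `select_batch([(0, [0.5, 0.5]), (1, [0.5, 0.5]), (10, [0.9, 0.1]), (20, [0.6, 0.4])], 2, 5)` returns `[0, 20]`.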
Ibrahim, Ahmed Sobhy Elnady. "End-To-End Text Detection Using Deep Learning." Diss., Virginia Tech, 2017. http://hdl.handle.net/10919/81277.
Full textPh. D.
Rahman, Quazi Marufur. "Performance monitoring of deep learning vision systems during deployment." Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/229733/1/Quazi%20Marufur_Rahman_Thesis.pdf.
Full textCapellier, Édouard. "Application of machine learning techniques for evidential 3D perception, in the context of autonomous driving." Thesis, Compiègne, 2020. http://www.theses.fr/2020COMP2534.
Full textThe perception task is paramount for self-driving vehicles. Being able to extract accurate and significant information from sensor inputs is mandatory to ensure safe operation. Recent progress in machine-learning techniques has revolutionized the way perception modules for autonomous driving are developed and evaluated, vastly surpassing previous state-of-the-art results in practically all perception-related tasks. Efficient and accurate ways to model the knowledge used by a self-driving vehicle are therefore mandatory: self-awareness, and appropriate modeling of doubt, are desirable properties for such a system. In this work, we assumed that evidence theory is an efficient way to finely model the information extracted from deep neural networks. Based on this intuition, we developed three perception modules that rely on machine learning and evidence theory, and tested them on real-life data. First, we proposed an asynchronous evidential occupancy grid mapping algorithm that fuses semantic segmentation results obtained from RGB images with LIDAR scans. Its asynchronous nature makes it particularly efficient at handling sensor failures. The semantic information is used to define decay rates at the cell level and to handle potentially moving objects. Then, we proposed an evidential classifier of LIDAR objects, trained to distinguish between vehicles and vulnerable road users detected via a clustering algorithm. The classifier can be reinterpreted as performing a fusion of simple evidential mass functions. Moreover, a simple statistical filtering scheme can be used to reject outputs of the classifier that are incoherent with regard to the training set, allowing the classifier to work in an open world and reject other types of objects. Finally, we investigated the possibility of performing road detection in LIDAR scans with deep neural networks.
We proposed two architectures inspired by recent state-of-the-art LIDAR processing systems. A training dataset was acquired and labeled in a semi-automatic fashion from road maps. A set of fused neural networks reaches satisfactory results, which allowed us to use them in an evidential road mapping and object detection algorithm that runs at 10 Hz.
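The "fusion of simple evidential mass functions" mentioned in the abstract above refers to Dempster's rule of combination from evidence theory. A minimal sketch of that rule over a two-class frame (vehicle vs. vulnerable road user) is shown below; the frame labels and example masses are illustrative, not taken from the thesis.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    m1, m2: dicts mapping focal sets (frozensets of hypotheses) to mass.
    E.g. over the frame {"V", "U"} (vehicle, vulnerable road user),
    frozenset({"V", "U"}) represents total ignorance.
    Assumes the two sources are not fully conflicting (conflict < 1).
    """
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb  # mass assigned to the empty set
    # Renormalize by the non-conflicting mass
    return {s: w / (1.0 - conflict) for s, w in combined.items()}
```

For instance, combining a source that supports "vehicle" with mass 0.6 (rest ignorance) and a second source with masses 0.5/0.2/0.3 on vehicle/user/ignorance yields a renormalized belief of 0.68/0.88 ≈ 0.77 on "vehicle".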
Aytar, Yusuf. "Transfer learning for object category detection." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:c9e18ff9-df43-4f67-b8ac-28c3fdfa584b.
Full textMoniruzzaman, Md. "Seagrass detection using deep learning." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2019. https://ro.ecu.edu.au/theses/2261.
Full textRunow, Björn. "Deep Learning for Point Detection in Images." Thesis, Linköpings universitet, Datorseende, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166644.
Full textÖhman, Wilhelm. "Data augmentation using military simulators in deep learning object detection applications." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-264917.
Full textDeep learning solutions have made great progress in recent years, but the requirement for a large, labeled dataset remains a limiting factor. This is an even greater problem in domains where even unlabeled data is hard to obtain, such as the military domain. Synthetic data has recently attracted attention as a potential solution to this problem. This thesis explores the possibility of using synthetic data to improve the performance of a deep learning model whose task is to detect and localize firearms in images. The military simulator VBS3 is used to generate the synthetic data, and the neural network uses the Faster R-CNN architecture. Several models were trained on varying amounts of real and synthetic data. The synthetic datasets were generated following two different philosophies: one dataset attempts to mimic the real world, while the other discards realism in favor of variation. The dataset striving for variation is shown to improve performance on the weapon-detection task, whereas the dataset striving for realism gives mixed results.
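Experiments like the one above hinge on controlling the ratio of synthetic to real samples in the training set. A minimal, hypothetical sketch of that bookkeeping is shown below; real detection pipelines would instead mix at the data-loader level, and `synth_fraction` must be strictly between 0 and 1.

```python
import random

def mix_datasets(real, synthetic, synth_fraction, seed=0):
    """Build a training list in which roughly `synth_fraction` of the samples
    are synthetic: keep all real samples and subsample the synthetic pool
    so that n_synth / (n_real + n_synth) ≈ synth_fraction.
    """
    rng = random.Random(seed)  # fixed seed for reproducible experiments
    n_synth = int(len(real) * synth_fraction / (1.0 - synth_fraction))
    picked = rng.sample(synthetic, min(n_synth, len(synthetic)))
    mixed = list(real) + picked
    rng.shuffle(mixed)
    return mixed
```

With 100 real samples and `synth_fraction=0.5`, this yields a 200-sample training list containing every real sample and 100 synthetic ones.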
Estgren, Martin. "Bone Fragment Segmentation Using Deep Interactive Object Selection." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-157668.
Full textCase, Isaac. "Automatic object detection and tracking in video /." Online version of thesis, 2010. http://hdl.handle.net/1850/12332.
Full textCogswell, Michael Andrew. "Understanding Representations and Reducing their Redundancy in Deep Networks." Thesis, Virginia Tech, 2016. http://hdl.handle.net/10919/78167.
Full textMaster of Science