Academic literature on the topic 'Deep Learning, Computer Vision, Object Detection'

Dissertations / Theses on the topic "Deep Learning, Computer Vision, Object Detection"

1

Kohmann, Erich. "Tecniche di deep learning per l'object detection." Bachelor's thesis, Alma Mater Studiorum - Università di Bologna, 2019. http://amslaurea.unibo.it/19637/.

Abstract:
Object detection is one of the main problems in computer vision. In recent years, with the advent of neural networks and deep learning, considerable progress has been made in the methods used to tackle it. This thesis surveys the main deep-learning-based object detection models, illustrating their fundamental characteristics and the elements that distinguish them from earlier models. After an initial primer on deep learning and neural networks in general, it presents the models whose innovative techniques led to significant improvements, both in the precision and accuracy of predictions and in resource consumption. The second part focuses on YOLO and its successors. YOLO is a model based on convolutional neural networks in which the problems of localizing and classifying objects in an image were treated, for the first time, as a single regression problem. This change of perspective by the YOLO authors paved the way for a new approach to object detection, facilitating the subsequent development of increasingly accurate and efficient models.
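To illustrate what treating localization as regression means in practice, here is a minimal sketch. It assumes a YOLOv1-style encoding in which each grid cell regresses a box centre as an offset inside the cell and a width and height as fractions of the whole image. The function name `decode_cell` and the example numbers are hypothetical and are not taken from the thesis.

```python
def decode_cell(row, col, pred, grid_size, img_w, img_h):
    """Convert one grid cell's regressed box into pixel corner coordinates.

    pred = (x_off, y_off, w_rel, h_rel, confidence), all in [0, 1].
    x_off, y_off place the box centre inside the cell; w_rel, h_rel are
    fractions of the full image size (assumed YOLOv1-style encoding).
    """
    x_off, y_off, w_rel, h_rel, conf = pred
    # Centre: cell index plus in-cell offset, scaled from grid units to pixels.
    cx = (col + x_off) / grid_size * img_w
    cy = (row + y_off) / grid_size * img_h
    # Size: predicted directly as a fraction of the image.
    w = w_rel * img_w
    h = h_rel * img_h
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2, conf)


# Hypothetical prediction from the central cell of a 7x7 grid on a 448x448 image.
box = decode_cell(row=3, col=3, pred=(0.5, 0.5, 0.25, 0.5, 0.9),
                  grid_size=7, img_w=448, img_h=448)
print(box)
```

In a full detector the same output tensor also carries per-class scores for each cell, so a single forward pass yields both the boxes and their labels.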
2

Andersson, Dickfors Robin, and Nick Grannas. "OBJECT DETECTION USING DEEP LEARNING ON METAL CHIPS IN MANUFACTURING." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-55068.

Abstract:
When designing cutting tools for the turning industry, providing optimal cutting parameters is important both for the client and for the company's own research. By examining the metal chips that form during the turning process, operators can recommend optimal cutting parameters. An automated approach to detecting and classifying chips is preferable to manual classification. This thesis evaluates whether such an approach is possible using either a Convolutional Neural Network (CNN) or CNN feature extraction coupled with a machine learning (ML) algorithm. The thesis started with a research phase in which we reviewed existing state-of-the-art CNNs, image processing and ML algorithms. Based on this research, we implemented our own object detection algorithm and chose to implement two CNNs, AlexNet and VGG16. A third CNN was designed and implemented with our specific task in mind. The three models were tested against each other, both as standalone image classifiers and as feature extractors coupled with an ML algorithm. Because the chips were inside a machine, different camera angles and lighting setups had to be tested to determine which provided the best image for classification. A top view of the cutting area was found to be the optimal angle, with light focused both below the cutting area and in the chip disposal tray. The smaller proposed CNN, with three convolutional layers, three pooling layers and two dense layers, was found to rival both AlexNet and VGG16 as a standalone classifier and as a feature extractor. The proposed model was designed with a resource-limited system in mind and is therefore better suited to such systems while retaining high accuracy. As a standalone classifier, the proposed model reached a classification accuracy of 92.03%, compared with 92.20% for the state-of-the-art classifier AlexNet and 91.88% for VGG16.
When used as feature extractors, all three models paired best with the Random Forest algorithm, but the differences in accuracy between the feature extractors were small. The proposed feature extractor combined with Random Forest had an accuracy of 82.56%, compared with 81.93% for AlexNet and 79.14% for VGG16.
DIGICOGS
3

Arefiyan, Khalilabad Seyyed Mostafa. "Deep Learning Models for Context-Aware Object Detection." Thesis, Virginia Tech, 2017. http://hdl.handle.net/10919/88387.

Abstract:
In this thesis, we present ContextNet, a novel general object detection framework for incorporating context cues into a detection pipeline. Current deep learning methods for object detection exploit state-of-the-art image recognition networks to classify a given region of interest (ROI) into predefined classes and to regress a bounding box around it, without using any information about the corresponding scene. ContextNet is based on the intuitive idea that cues about the general scene (e.g., kitchen or library) change the priors on the presence or absence of some object classes. We provide a general means of integrating this notion into the decision process for a given ROI by running a network pretrained on scene recognition datasets in parallel with a pretrained network that extracts object-level features for the corresponding ROI. Through comprehensive experiments on PASCAL VOC 2007, we demonstrate the effectiveness of our design choices: the resulting system outperforms the baseline in most object classes and reaches 57.5 mAP (mean Average Precision) on the PASCAL VOC 2007 test set, compared with 55.6 mAP for the baseline.
MS
4

Bartoli, Giacomo. "Edge AI: Deep Learning techniques for Computer Vision applied to embedded systems." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2018. http://amslaurea.unibo.it/16820/.

Abstract:
In the last decade, Machine Learning techniques have been used in many fields, ranging from finance to healthcare and even marketing. Among these techniques, those adopting a Deep Learning approach have been shown to outperform humans in tasks such as object detection, image classification and speech recognition. This thesis introduces the concept of Edge AI: the possibility of building learning models capable of running inference locally, without any dependence on expensive servers or cloud services. The first case study is based on the Google AIY Vision Kit, an intelligent camera equipped with a graphics board to optimize Computer Vision algorithms. We then evaluate performance on CORe50, a dataset for continuous object recognition, on embedded systems. The techniques developed in these chapters are finally used to solve a challenge within the Audi Autonomous Driving Cup 2018, in which a model car equipped with a camera, sensors and a graphics board must recognize pedestrians and stop before hitting them.
5

Espis, Andrea. "Object detection and semantic segmentation for assisted data labeling." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2022.

Abstract:
Automating data labeling tasks is a way to reduce the errors and time costs of human labeling. In this thesis, CenterNet, DeepLabV3, and K-Means applied to the RGB color space are deployed to build a pipeline for assisted data labeling: a semi-automatic process that iteratively improves the quality of the annotations. The proposed pipeline flagged a total of 1,547 wrong and missing annotations when applied to a dataset originally containing 8,300 annotations. Moreover, the quality of each annotation was drastically improved while saving more than 600 hours of work. The same models were also used to address a real-time tire inspection task: the detection of markers on the surface of tires. According to the experiments, the combination of DeepLabV3 output and post-processing based on the area and shape of the predicted blobs achieves a maximum mean precision of 0.992 (with mean recall 0.982) and a maximum mean recall of 0.998 (with mean precision 0.960).
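The two operating points reported above reflect the usual precision/recall trade-off: a stricter acceptance rule raises precision and lowers recall, and vice versa. The following is a minimal sketch of that trade-off, assuming each detection has a confidence score and a flag saying whether it matches a ground-truth marker. The function name, the data, and the convention of returning precision 1.0 when nothing is kept are illustrative assumptions, not the thesis's evaluation code.

```python
def precision_recall_at(threshold, detections, num_ground_truth):
    """Precision and recall when keeping detections scored >= threshold.

    detections: list of (score, is_correct) pairs.
    num_ground_truth: number of true objects that should be found.
    """
    kept = [is_correct for score, is_correct in detections if score >= threshold]
    true_positives = sum(kept)
    # Assumed convention: with no detections kept, precision is reported as 1.0.
    precision = true_positives / len(kept) if kept else 1.0
    recall = true_positives / num_ground_truth if num_ground_truth else 0.0
    return precision, recall


# Hypothetical detections for an image set containing 4 real markers.
detections = [(0.95, True), (0.90, True), (0.80, True),
              (0.60, False), (0.55, True), (0.30, False)]

strict = precision_recall_at(0.7, detections, num_ground_truth=4)
loose = precision_recall_at(0.5, detections, num_ground_truth=4)
print(strict, loose)
```

With the stricter threshold every kept detection is correct but one marker is missed; with the looser threshold all markers are found at the cost of one false positive.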
6

Norrstig, Andreas. "Visual Object Detection using Convolutional Neural Networks in a Virtual Environment." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-156609.

Abstract:
Visual object detection is a popular computer vision task that has been intensively investigated using deep learning on real data. However, data from virtual environments have not received the same attention. A virtual environment makes it possible to generate data for locations that are not easily reachable for data collection, e.g. aerial environments. In this thesis, we study the problem of object detection in virtual environments, more specifically an aerial virtual environment. We use a simulator to generate a synthetic data set of 16 different types of vehicles captured from an airplane. To study the performance of existing methods in virtual environments, we train and evaluate two state-of-the-art detectors on the generated data set. Experiments show that both detectors, You Only Look Once version 3 (YOLOv3) and Single Shot MultiBox Detector (SSD), reach performance similar to that previously reported in the literature on real data sets. In addition, we investigate different fusion techniques for detectors trained on two different subsets of the data set, in this case a subset containing cars with fixed colors and a subset containing cars with varying colors. Experiments show that it is possible to train multiple instances of a detector on different subsets of the data set and to combine these detectors in order to boost performance.
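One simple way to combine the outputs of two detectors is to pool their boxes and apply non-maximum suppression (NMS). The sketch below shows that generic late-fusion technique for a single class; it is an assumption made for illustration and is not necessarily one of the fusion techniques studied in the thesis. The function names, the 0.5 threshold, and the boxes are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_detections(dets_a, dets_b, iou_threshold=0.5):
    """Pool two detectors' (box, score) lists and suppress overlapping duplicates."""
    pooled = sorted(dets_a + dets_b, key=lambda d: d[1], reverse=True)
    kept = []
    for box, score in pooled:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(box, kept_box) < iou_threshold for kept_box, _ in kept):
            kept.append((box, score))
    return kept


# Hypothetical outputs: both detectors see the same car, only B sees a second one.
detector_a = [((10, 10, 50, 50), 0.9)]
detector_b = [((12, 12, 52, 52), 0.8), ((100, 100, 140, 140), 0.7)]
fused = fuse_detections(detector_a, detector_b)
print(fused)
```

The duplicate box from detector B is suppressed because it overlaps the higher-scoring box from detector A, while the vehicle found only by detector B is retained.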
7

Dickens, James. "Depth-Aware Deep Learning Networks for Object Detection and Image Segmentation." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42619.

Abstract:
The rise of convolutional neural networks (CNNs) in computer vision has occurred in tandem with the advancement of depth sensing technology. Depth cameras yield two-dimensional arrays that store, at each pixel, the distance from the sensor to the objects and surfaces in a scene; aligned with a regular color image, these form so-called RGBD images. Inspired by prior models in the literature, this work develops a suite of RGBD CNN models to tackle the challenging tasks of object detection, instance segmentation, and semantic segmentation. Prominent architectures for object detection and image segmentation are modified to incorporate dual-backbone approaches that take RGB and depth images as input and combine features from both modalities through novel fusion modules. For each task, the models developed are competitive with state-of-the-art RGBD architectures. In particular, the proposed RGBD object detection approach achieves 53.5% mAP on the SUN RGBD 19-class object detection benchmark, while the proposed RGBD semantic segmentation architecture yields 69.4% accuracy on the SUN RGBD 37-class semantic segmentation benchmark. An original 13-class RGBD instance segmentation benchmark is introduced for the SUN RGBD dataset, on which the proposed model achieves 38.4% mAP. Additionally, an original depth-aware panoptic segmentation model is developed, trained, and tested on new benchmarks conceived for the NYUDv2 and SUN RGBD datasets. These benchmarks offer researchers a baseline for the task of RGBD panoptic segmentation on these datasets, where the novel depth-aware model outperforms a comparable RGB counterpart.
8

Solini, Arianna. "Applicazione di Deep Learning e Computer Vision ad un Caso d'uso aziendale: Progettazione, Risoluzione ed Analisi." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Abstract:
In computer vision, Machine Learning (ML) has been discussed for more than ten years, with the goal of creating autonomous systems able to build approximate models of three-dimensional reality from two-dimensional images. This capability makes it possible to interpret and understand images, emulating human vision. Many researchers have built neural networks that compete on large datasets of millions of images. As a result, the image classification performance of these networks has improved continuously, and it has become possible to identify the most suitable framework for each situation, obtaining results that are as performant, fast and accurate as possible. Numerous companies worldwide use Machine Learning and computer vision, in applications ranging from quality control to direct assistance for people working on repetitive and often tiring tasks. The thesis work was carried out during an internship at Injenia (an Italian IT company and Google partner), as part of an industrial project commissioned from Injenia by an Italian multi-utility. The project called for the use of one or more ML models in the computer vision domain and, to that end, an investigation was conducted on several fronts to guide choices during development. Part of the investigation's results provided information useful for optimizing the ML model used. Another part was used to fine-tune a pre-trained ML model, thereby applying the principle of transfer learning to the image dataset supplied by the multi-utility. The aim of the thesis is therefore to present the development and application of Machine Learning, Deep Learning and computer vision techniques to a concrete business use case.
9

Cuan, Bonan. "Deep similarity metric learning for multiple object tracking." Thesis, Lyon, 2019. http://www.theses.fr/2019LYSEI065.

Abstract:
Multiple object tracking, i.e. simultaneously tracking multiple objects in a scene, is an important but challenging visual task. Objects must be accurately detected and distinguished from each other to avoid erroneous trajectories. Since remarkable progress has been made in the field of object detection, "tracking-by-detection" approaches are widely adopted in multiple object tracking research. Objects are detected in advance and tracking reduces to an association problem: linking detections of the same object across frames into trajectories. Most tracking algorithms employ both motion and appearance models for data association. For multiple object tracking problems in which many objects of the same category are present, a fine-grained discriminative appearance model is indispensable. We therefore propose an appearance-based re-identification model using deep similarity metric learning to deal with multiple object tracking in mono-camera videos. Two main contributions are reported in this dissertation. First, a deep Siamese network is employed to learn an end-to-end mapping from input images to a discriminative embedding space. Different metric learning configurations using various metrics, loss functions, deep network structures, etc., are investigated in order to determine the best re-identification model for tracking. In addition, with an intuitive and simple classification design, the proposed model achieves satisfactory re-identification results, comparable to state-of-the-art approaches using triplet losses. Our approach is easy and fast to train, and the learned embedding can be readily transferred to the domain of tracking tasks.
Second, we integrate the proposed re-identification model into multiple object tracking as appearance guidance for detection association. For each object to be tracked in a video, we establish an identity-related appearance model based on the learned embedding for re-identification. Similarities among detected object instances are exploited for identity classification. The collaboration and interference between appearance and motion models are also investigated. An online coupling of the appearance and motion models is proposed to further improve tracking performance. Experiments on the Multiple Object Tracking Challenge benchmark demonstrate the effectiveness of our modifications, with state-of-the-art tracking accuracy.
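The association step of tracking-by-detection can be sketched as matching each track's appearance embedding to the most similar new detection. The example below uses cosine similarity and a greedy one-to-one assignment, both chosen for illustration only. The thesis's learned metric, its identity classification, and its appearance-motion coupling are not reproduced here, and the names, embeddings, and the 0.5 similarity threshold are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def associate(track_embeddings, detection_embeddings, min_similarity=0.5):
    """Greedily match track ids to detection indices by appearance similarity."""
    pairs = sorted(
        ((cosine_similarity(t, d), track_id, det_idx)
         for track_id, t in track_embeddings.items()
         for det_idx, d in enumerate(detection_embeddings)),
        key=lambda p: p[0], reverse=True)
    matches, used_detections = {}, set()
    for similarity, track_id, det_idx in pairs:
        if similarity < min_similarity:
            break  # Remaining pairs are too dissimilar to be the same identity.
        if track_id in matches or det_idx in used_detections:
            continue  # Enforce one detection per track and one track per detection.
        matches[track_id] = det_idx
        used_detections.add(det_idx)
    return matches


# Hypothetical 2-D embeddings; real embeddings would come from the trained network.
tracks = {"A": [1.0, 0.0], "B": [0.0, 1.0]}
detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, 0.0]]
matches = associate(tracks, detections)
print(matches)
```

The third detection matches no existing track above the threshold, so a tracker would typically treat it as a candidate for a new trajectory.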
10

Chen, Zhe. "Augmented Context Modelling Neural Networks." Thesis, The University of Sydney, 2019. http://hdl.handle.net/2123/20654.

Abstract:
Contexts provide beneficial information for machine-based image understanding tasks. However, existing context modelling methods still cannot fully exploit contexts, especially for object recognition and detection. In this thesis, we develop augmented context modelling neural networks to better utilize contexts for different object recognition and detection tasks. Our contributions are two-fold: 1) we introduce neural networks to better model instance-level visual relationships; 2) we introduce neural network-based algorithms to better utilize contexts from 3D information and synthesized data. In particular, to augment the modelling of instance-level visual relationships, we propose a context refinement network and an encapsulated context modelling network for object detection. In the context refinement study, we improve the modelling of visual relationships by introducing overlap scores and confidence scores for different regions. In the encapsulated context modelling study, we boost context modelling performance by exploiting more powerful capsule-based neural networks. To augment the modelling of contexts from different sources, we propose novel neural networks to better utilize 3D information and synthesis-based contexts. For the modelling of 3D information, we mainly investigate LiDAR data for road detection and depth data for instance segmentation. In road detection, we develop a progressive LiDAR adaptation algorithm to improve the fusion of 3D LiDAR data and 2D image data. For instance segmentation, we model depth data as context to help tackle the problem of training with low-resolution annotations. Moreover, to improve the modelling of synthesis-based contexts, we devise a shape translation-based pedestrian generation framework to help improve pedestrian detection performance.