
Dissertations / Theses on the topic 'Dataset VISION'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 39 dissertations / theses for your research on the topic 'Dataset VISION.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Toll, Abigail. "Matrices of Vision : Sonic Disruption of a Dataset." Thesis, Kungl. Musikhögskolan, Institutionen för komposition, dirigering och musikteori, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kmh:diva-4152.

Full text
Abstract:
Matrices of Vision is a sonic deconstruction of a higher education dataset compiled by the influential Swedish higher education authority Universitetskanslersämbetet (UKÄ). The title Matrices of Vision and the project theme are inspired by Indigenous cyberfeminist, scholar and artist Tiara Roxanne's work on data colonialism. The method explores how practical applications of sound and theory can be used to meditate on political struggles and envision emancipatory modes of creation that hold space through a music-making practice. The artistic approach uses just intonation as a system, or grid of fixed points, which it then refuses. The pitch strategy diverges from this approach by way of its political motivations: it disobeys just intonation's rigid structure through practice and breaks with its order as a way to explore its experiential qualities. The approach seeks to engage beyond the structures designed to regulate behaviors and ways of perceiving, and rather holds space for a multiplicity of viewpoints, which are explored through cacophony, emotion and deep listening techniques.
APA, Harvard, Vancouver, ISO, and other styles
2

Berriel, Rodrigo Ferreira. "Vision-based ego-lane analysis system : dataset and algorithms." Mestrado em Informática, 2016. http://repositorio.ufes.br/handle/10/6775.

Full text
Abstract:
Lane detection and analysis are important and challenging tasks in advanced driver assistance systems and autonomous driving. These tasks are required in order to help autonomous and semi-autonomous vehicles operate safely. Decreasing costs of vision sensors and advances in embedded hardware boosted lane-related research – detection, estimation, tracking, etc. – in the past two decades. The interest in this topic has increased even more with the demand for advanced driver assistance systems (ADAS) and self-driving cars. Although extensively studied independently, there is still a need for studies that propose a combined solution for the multiple problems related to the ego-lane, such as lane departure warning (LDW), lane change detection, lane marking type (LMT) classification, road marking detection and classification, and detection of adjacent lanes. This work proposes a real-time Ego-Lane Analysis System (ELAS) capable of estimating ego-lane position, classifying LMTs and road markings, performing LDW and detecting lane change events. The proposed vision-based system works on a temporal sequence of images. Lane marking features are extracted in perspective and Inverse Perspective Mapping (IPM) images and combined to increase robustness. The final estimated lane is modeled as a spline using a combination of methods (Hough lines, Kalman filter and particle filter). Based on the estimated lane, all other events are detected. Moreover, the proposed system was integrated for experimentation into an autonomous car that is being developed by the High Performance Computing Laboratory of the Universidade Federal do Espírito Santo. To validate the proposed algorithms and cover the lack of lane datasets in the literature, a new dataset with more than 20 different scenes (in more than 15,000 frames) and considering a variety of scenarios (urban road, highways, traffic, shadows, etc.) was created. The dataset was manually annotated and made publicly available to enable evaluation of several events that are of interest to the research community (i.e. lane estimation, change, and centering; road markings; intersections; LMTs; crosswalks and adjacent lanes). Furthermore, the system was also validated qualitatively based on the integration with the autonomous vehicle. ELAS achieved high detection rates in all real-world events and proved to be ready for real-time applications.
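As an illustrative aside (not code from the thesis), the sketch below shows the filtering idea behind combining per-frame lane measurements with a Kalman filter: a noisy sequence of lane lateral-offset measurements is smoothed with a constant-velocity model. The function name, noise levels and toy data are assumptions made for the example, not values from ELAS.

```python
import numpy as np

def kalman_smooth_lane_offset(measurements, dt=1.0, process_var=1e-3, meas_var=0.05):
    """Smooth noisy lane lateral-offset measurements (metres) with a
    constant-velocity Kalman filter. Returns the filtered offsets."""
    F = np.array([[1.0, dt], [0.0, 1.0]])        # state transition: [offset, rate]
    H = np.array([[1.0, 0.0]])                   # we only observe the offset
    Q = process_var * np.eye(2)                  # process noise covariance
    R = np.array([[meas_var]])                   # measurement noise covariance
    x = np.array([[measurements[0]], [0.0]])     # initial state
    P = np.eye(2)                                # initial state covariance

    filtered = []
    for z in measurements:
        # Predict
        x = F @ x
        P = F @ P @ F.T + Q
        # Update
        y = np.array([[z]]) - H @ x              # innovation
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)           # Kalman gain
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        filtered.append(float(x[0, 0]))
    return filtered

# Toy usage: a drifting lane offset observed with noise.
true_offset = np.linspace(0.0, 0.5, 50)
noisy = true_offset + np.random.normal(0, 0.2, size=50)
print(kalman_smooth_lane_offset(noisy)[:5])
```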
APA, Harvard, Vancouver, ISO, and other styles
3

RAGONESI, RUGGERO. "Addressing Dataset Bias in Deep Neural Networks." Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1069001.

Full text
Abstract:
Deep Learning has achieved tremendous success in recent years in several areas such as image classification, text translation, and autonomous agents, to name a few. Deep Neural Networks are able to learn non-linear features in a data-driven fashion from complex, large-scale datasets to solve tasks. However, some fundamental issues remain to be fixed: the kind of data that is provided to the neural network directly influences its capability to generalize. This is especially true when training and test data come from different distributions (the so-called domain gap or domain shift problem): in this case, the neural network may learn a data representation that is representative of the training data but not of the test data, thus performing poorly when deployed in actual scenarios. The domain gap problem is addressed by so-called Domain Adaptation, for which a large literature has recently been developed. In this thesis, we first present a novel method to perform Unsupervised Domain Adaptation. Starting from the typical scenario in which we have labeled source distributions and an unlabeled target distribution, we pursue a pseudo-labeling approach to assign a label to the target data, and then, in an iterative way, we refine them using Generative Adversarial Networks. Subsequently, we face the debiasing problem. Simply put, bias occurs when there are factors in the data which are spuriously correlated with the task label, e.g., the background, which might be a strong clue to guess what class is depicted in an image. When this happens, neural networks may erroneously learn such spurious correlations as predictive factors, and may therefore fail when deployed in different scenarios. Learning a debiased model can be done using supervision regarding the type of bias affecting the data, or without any annotation about what the spurious correlations are. We tackle the problem of supervised debiasing -- where a ground-truth annotation for the bias is given -- under the lens of information theory. We design a neural network architecture that learns to solve the task while, at the same time, achieving statistical independence of the data embedding with respect to the bias label. We finally address the unsupervised debiasing problem, in which no bias annotation is available. We tackle this challenging problem with a two-stage approach: we first coarsely split the training dataset into two subsets, samples that exhibit spurious correlations and those that do not. Second, we learn a feature representation that can accommodate both subsets and an augmented version of them.
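As an illustrative aside (not the thesis's method), one common kernel-based proxy for "statistical independence between embedding and bias label" is the Hilbert-Schmidt Independence Criterion (HSIC); the thesis frames the problem information-theoretically, so HSIC here is only a stand-in to make the independence idea concrete. All names and the toy data below are assumptions for the example.

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """Gaussian (RBF) kernel matrix for rows of X."""
    sq = np.sum(X**2, axis=1, keepdims=True)
    d2 = sq + sq.T - 2 * X @ X.T
    return np.exp(-d2 / (2 * sigma**2))

def hsic(X, Y, sigma=1.0):
    """Biased empirical HSIC between samples X and Y (rows are samples).
    Values near zero indicate (approximate) statistical independence."""
    n = X.shape[0]
    K = rbf_kernel(X, sigma)
    L = rbf_kernel(Y, sigma)
    H = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

# Toy check: embeddings correlated with a bias label vs. independent ones.
rng = np.random.default_rng(0)
bias = rng.integers(0, 2, size=(200, 1)).astype(float)
dependent = bias + 0.1 * rng.normal(size=(200, 1))
independent = rng.normal(size=(200, 1))
print(hsic(dependent, bias), hsic(independent, bias))
```

In a debiasing setup of this kind, a penalty such as this is typically added to the task loss so that minimizing it pushes the learned embedding away from the bias label.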
APA, Harvard, Vancouver, ISO, and other styles
4

Xie, Shuang. "A Tiny Diagnostic Dataset and Diverse Modules for Learning-Based Optical Flow Estimation." Thesis, Université d'Ottawa / University of Ottawa, 2019. http://hdl.handle.net/10393/39634.

Full text
Abstract:
Recent work has shown that flow estimation from a pair of images can be formulated as a supervised learning task to be solved with convolutional neural networks (CNN). However, basic straightforward CNN methods estimate optical flow with motion and occlusion boundary blur. To tackle this problem, we propose a tiny diagnostic dataset called FlowClevr to quickly evaluate various modules that can be used to enhance standard CNN architectures. Based on experiments on the FlowClevr dataset, we find that a deformable module can improve model prediction accuracy by around 30% to 100% in most tasks and, more significantly, reduce boundary blur. Based on these results, we are able to design modifications to various existing network architectures, improving their performance. Compared with the original model, the model with the deformable module clearly reduces boundary blur and achieves a large improvement on the MPI Sintel dataset, an omni-directional stereo (ODS) dataset and a novel omni-directional optical flow dataset.
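For readers unfamiliar with deformable modules, the following is a minimal, generic deformable-convolution block using torchvision (a sketch of the general mechanism, not the thesis's specific module): a small convolution predicts per-location sampling offsets, which a deformable convolution then uses. Channel sizes and the class name are assumptions for the example.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class DeformableBlock(nn.Module):
    """Generic deformable-convolution block: a plain conv predicts
    per-location sampling offsets consumed by DeformConv2d."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        # 2 offsets (dx, dy) per kernel sample position.
        self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=3, padding=1)
        self.deform_conv = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

    def forward(self, x):
        offsets = self.offset_conv(x)
        return self.deform_conv(x, offsets)

# Toy usage on a random feature map.
block = DeformableBlock(in_ch=16, out_ch=32)
feat = torch.randn(1, 16, 64, 64)
print(block(feat).shape)  # torch.Size([1, 32, 64, 64])
```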
APA, Harvard, Vancouver, ISO, and other styles
5

Nett, Ryan. "Dataset and Evaluation of Self-Supervised Learning for Panoramic Depth Estimation." DigitalCommons@CalPoly, 2020. https://digitalcommons.calpoly.edu/theses/2234.

Full text
Abstract:
Depth detection is a very common computer vision problem. It shows up primarily in robotics, automation, or 3D visualization domains, as it is essential for converting images to point clouds. One of the poster-child applications is self-driving cars. Currently, the best methods for depth detection are either very expensive, like LIDAR, or require precise calibration, like stereo cameras. These costs have given rise to attempts to detect depth from a monocular camera (a single camera). While this is possible, it is harder than LIDAR or stereo methods, since depth can't be measured from monocular images; it has to be inferred. A good example is covering one eye: you still have some idea how far away things are, but it's not exact. Neural networks are a natural fit for this. Here, we build on previous neural network methods by applying a recent state-of-the-art model to panoramic images in addition to pinhole ones and performing a comparative evaluation. First, we create a simulated depth detection dataset that lends itself to panoramic comparisons and contains pre-made cylindrical and spherical panoramas. We then modify monodepth2 to support cylindrical and cubemap panoramas, incorporating current best practices for depth detection on those panorama types, and evaluate its performance for each type of image using our dataset. We also consider the resources used in training and other qualitative factors.
APA, Harvard, Vancouver, ISO, and other styles
6

Andruccioli, Matteo. "Previsione del Successo di Prodotti di Moda Prima della Commercializzazione: un Nuovo Dataset e Modello di Vision-Language Transformer." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/24956/.

Full text
Abstract:
Unlike traditional retail, in online retail the customer cannot touch or try the product. The purchase decision is made on the basis of the data made available by the seller through the title, descriptions and images, and on the reviews of previous customers. It is therefore possible to predict how well a product will sell from this information. Most of the solutions currently found in the literature make predictions based on reviews, or analyze the language used in descriptions to understand how it influences sales. Reviews, however, are not available to sellers before the product goes on the market; using only textual data, moreover, neglects the influence of images. The goal of this thesis is to use machine learning models to predict the sales success of a product from the information available to the seller before commercialization. This is done by introducing a cross-modal model based on a Vision-Language Transformer that performs classification. A model of this kind can help sellers maximize the sales success of their products. Owing to the lack, in the literature, of datasets containing information on products sold online together with an indication of their sales success, the work also includes the creation of a dataset suitable for testing the developed solution. The dataset contains a list of 78,300 fashion products sold on Amazon; for each of them, the main information made available by the seller is reported together with a measure of market success. This measure is derived from the rating expressed by buyers and from the product's position in a ranking based on the number of units sold.
APA, Harvard, Vancouver, ISO, and other styles
7

Joubert, Deon. "Saliency grouped landmarks for use in vision-based simultaneous localisation and mapping." Diss., University of Pretoria, 2013. http://hdl.handle.net/2263/40834.

Full text
Abstract:
The effective application of mobile robotics requires that robots be able to perform tasks with an extended degree of autonomy. Simultaneous localisation and mapping (SLAM) aids automation by providing a robot with the means of exploring an unknown environment while being able to position itself within this environment. Vision-based SLAM benefits from the large amounts of data produced by cameras but requires intensive processing of these data to obtain useful information. In this dissertation it is proposed that, as the saliency content of an image distils a large amount of the information present, it can be used to benefit vision-based SLAM implementations. The proposal is investigated by developing a new landmark for use in SLAM. Image keypoints are grouped together according to the saliency content of an image to form the new landmark. A SLAM system utilising this new landmark is implemented in order to demonstrate the viability of using the landmark. The landmark extraction, data filtering and data association routines necessary to make use of the landmark are discussed in detail. A Microsoft Kinect is used to obtain video images as well as 3D information of a viewed scene. The system is evaluated using computer simulations and real-world datasets from indoor structured environments. The datasets used are both newly generated and freely available benchmarking ones.
Dissertation (MEng)--University of Pretoria, 2013.
Electrical, Electronic and Computer Engineering
APA, Harvard, Vancouver, ISO, and other styles
8

Horečný, Peter. "Metody segmentace obrazu s malými trénovacími množinami." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2020. http://www.nusl.cz/ntk/nusl-412996.

Full text
Abstract:
The goal of this thesis was to propose an image segmentation method capable of an effective segmentation process with small datasets. The recently published ODE neural network was used for this method, because its features should provide better generalization in tasks where only small datasets are available. The proposed ODE-UNet network was created by combining the UNet architecture with an ODE neural network, using the benefits of both networks. ODE-UNet reached the following results on the ISBI dataset: Rand: 0.950272 and Info: 0.978061. These results are better than those obtained with the UNet model, which was also tested in this thesis, but it has been shown that the state of the art cannot be outperformed using ODE neural networks. However, the advantages of the ODE neural network over the tested UNet architecture and other methods were confirmed, and there is still room for improvement by extending this method.
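To make the ODE-network idea concrete, here is a minimal sketch of an ODE-inspired feature block that integrates dh/dt = f(h) with a fixed-step explicit Euler scheme (real ODE networks typically use adaptive solvers, and this is not the thesis's ODE-UNet implementation; channel counts and names are assumptions).

```python
import torch
import torch.nn as nn

class ODEBlock(nn.Module):
    """Minimal ODE-inspired block: integrates dh/dt = f(h) with fixed-step
    explicit Euler, trading explicit network depth for integration steps."""
    def __init__(self, channels, steps=4):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.steps = steps

    def forward(self, h):
        dt = 1.0 / self.steps
        for _ in range(self.steps):
            h = h + dt * self.f(h)   # explicit Euler step
        return h

x = torch.randn(1, 8, 32, 32)
print(ODEBlock(8)(x).shape)  # torch.Size([1, 8, 32, 32])
```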
APA, Harvard, Vancouver, ISO, and other styles
9

Tagebrand, Emil, and Ek Emil Gustafsson. "Dataset Generation in a Simulated Environment Using Real Flight Data for Reliable Runway Detection Capabilities." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-54974.

Full text
Abstract:
Implementing object detection methods for runway detection during landing approaches is limited in the safety-critical aircraft domain. This limitation is due to the difficulty of verifying the design and of understanding how the object detection behaves during operation. During operation, object detection needs to consider the aircraft's position, environmental factors, different runways and aircraft attitudes. Training such an object detection model requires a comprehensive dataset that covers the features mentioned above. Each feature's impact on the detection capabilities needs to be analysed to ensure the correct distribution of images in the dataset. Gathering real images for these scenarios would be costly, yet it is needed due to the aviation industry's safety standards. Synthetic data can be used to limit the cost and time required to create a dataset in which all features occur. By generating datasets in a simulated environment, these features could be applied to the dataset directly. The features could also be implemented separately in different datasets and compared to each other to analyse their impact on the object detection capabilities. By utilising this method for the features mentioned above, the following results could be determined. For object detection to consider most landing cases and different runways, the dataset needs to replicate real flight data and generate additional extreme landing cases. The dataset also needs to consider landings at different altitudes, which can differ between airports. Environmental conditions such as clouds and time of day reduce detection capabilities far from the runway, while attitude and runway appearance reduce them at close range. Runway appearance also affected detection at long range, but only for darker runways.
APA, Harvard, Vancouver, ISO, and other styles
10

Sievert, Rolf. "Instance Segmentation of Multiclass Litter and Imbalanced Dataset Handling : A Deep Learning Model Comparison." Thesis, Linköpings universitet, Datorseende, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-175173.

Full text
Abstract:
Instance segmentation has a great potential for improving the current state of littering by autonomously detecting and segmenting different categories of litter. With this information, litter could, for example, be geotagged to aid litter pickers or to give precise locational information to unmanned vehicles for autonomous litter collection. Land-based litter instance segmentation is a relatively unexplored field, and this study aims to give a comparison of the instance segmentation models Mask R-CNN and DetectoRS using the multiclass litter dataset called Trash Annotations in Context (TACO) in conjunction with the Common Objects in Context precision and recall scores. TACO is an imbalanced dataset, and therefore imbalanced data handling is addressed, exercising a second-order-relation iterative stratified split, and additionally oversampling when training Mask R-CNN. Mask R-CNN without oversampling resulted in a segmentation mAP of 0.127, and with oversampling 0.163. DetectoRS achieved a segmentation mAP of 0.167, and improves the segmentation mAP of small objects most noticeably, by a factor of at least 2, which is important within the litter domain since small objects such as cigarettes are overrepresented. In contrast, oversampling with Mask R-CNN does not seem to improve the general precision of small and medium objects, but only improves the detection of large objects. It is concluded that DetectoRS improves results compared to Mask R-CNN, as does oversampling. However, using a dataset that cannot have an all-class representation for train, validation, and test splits, together with an iterative stratification that does not guarantee all-class representations, makes it hard for future works to make exact comparisons to this study. Results are therefore approximate when considering all categories, since 12 categories are missing from the test set, 4 of which were impossible to split into train, validation, and test sets. Further image collection and annotation to mitigate the imbalance would most noticeably improve results, since results depend on class-averaged values. Oversampling with DetectoRS would also help improve results. There is also the option to combine the two datasets TACO and MJU-Waste to enforce training of more categories.
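As an illustrative aside (not code from the thesis), the simplest form of the oversampling idea mentioned above is to draw training samples with weights inversely proportional to class frequency; the sketch below does this at image level with a weighted sampler. The labels and helper name are made up for the example, and real instance-segmentation oversampling is usually more involved.

```python
from collections import Counter
import torch
from torch.utils.data import WeightedRandomSampler

def make_oversampler(labels):
    """Build a sampler that draws rare classes more often by weighting each
    sample with the inverse frequency of its class."""
    counts = Counter(labels)
    weights = [1.0 / counts[lbl] for lbl in labels]
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

# Toy usage: an imbalanced label list (e.g. one dominant litter category per image).
labels = ["cigarette"] * 80 + ["bottle"] * 15 + ["can"] * 5
sampler = make_oversampler(labels)
drawn = [labels[i] for i in list(sampler)]
print(Counter(drawn))  # roughly balanced draw counts
```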
APA, Harvard, Vancouver, ISO, and other styles
11

Molin, David. "Pedestrian Detection Using Convolutional Neural Networks." Thesis, Linköpings universitet, Datorseende, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-120019.

Full text
Abstract:
Pedestrian detection is an important field with applications in active safety systems for cars as well as autonomous driving. Since autonomous driving and active safety are becoming technically feasible now, the interest in these applications has dramatically increased. The aim of this thesis is to investigate convolutional neural networks (CNN) for pedestrian detection. The reason for this is that CNN have recently been successfully applied to several different computer vision problems. The main applications of pedestrian detection are in real-time systems. For this reason, this thesis investigates strategies for reducing the computational complexity of forward propagation for CNN. The approach used in this thesis for extracting pedestrians is to use a CNN to find a probability map of where pedestrians are located. From this probability map, bounding boxes for pedestrians are generated. A method for handling scale invariance for the objects of interest has also been developed in this thesis. Experiments show that using this method gives significantly better results for the problem of pedestrian detection. The accuracy which this thesis has managed to achieve is similar to the accuracy of some other works which use CNN.
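As an illustrative aside (not the thesis's implementation), one simple way to go from a per-pixel probability map to bounding boxes is to threshold the map and take one box per connected component; the threshold and data below are assumptions for the example.

```python
import numpy as np
from scipy import ndimage

def boxes_from_probability_map(prob_map, threshold=0.5):
    """Threshold a per-pixel pedestrian probability map and return one
    bounding box (x0, y0, x1, y1) per connected component."""
    mask = prob_map > threshold
    labeled, num = ndimage.label(mask)
    boxes = []
    for sl in ndimage.find_objects(labeled):
        y, x = sl
        boxes.append((x.start, y.start, x.stop, y.stop))
    return boxes

# Toy usage: a fake probability map with two "detections".
pm = np.zeros((100, 100))
pm[10:40, 20:35] = 0.9
pm[60:90, 70:85] = 0.8
print(boxes_from_probability_map(pm))
```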
APA, Harvard, Vancouver, ISO, and other styles
12

Arcidiacono, Claudio Salvatore. "An empirical study on synthetic image generation techniques for object detectors." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-235502.

Full text
Abstract:
Convolutional Neural Networks are a very powerful machine learning tool that has outperformed other techniques in image recognition tasks. The biggest drawback of this method is the massive amount of training data required, since producing training data for image recognition tasks is very labor intensive. To tackle this issue, different techniques have been proposed to generate synthetic training data automatically. These synthetic data generation techniques can be grouped into two categories: the first category generates synthetic images using computer graphics software and CAD models of the objects to recognize; the second category generates synthetic images by cutting the object from an image and pasting it on another image. Since both techniques have their pros and cons, it would be interesting for industries to investigate the two approaches more in depth. A common use case in industrial scenarios is detecting and classifying objects inside an image. Different objects belonging to classes relevant in industrial scenarios are often indistinguishable (for example, they are all the same component). For these reasons, this thesis work aims to answer the research question "Among the CAD model generation techniques, the Cut-paste generation techniques and a combination of the two techniques, which technique is more suitable for generating images for training object detectors in industrial scenarios?". In order to answer the research question, two synthetic image generation techniques belonging to the two categories are proposed. The proposed techniques are tailored for applications where all the objects belonging to the same class are indistinguishable, but they can also be extended to other applications. The two synthetic image generation techniques are compared by measuring the performance of an object detector trained using synthetic images on a test dataset of real images. The performance of the two synthetic data generation techniques used for data augmentation has also been measured. The empirical results show that the CAD model generation technique works significantly better than the Cut-Paste generation technique when synthetic images are the only source of training data (61% better), whereas the two generation techniques perform equally well as data augmentation techniques. Moreover, the empirical results show that the models trained using only synthetic images perform almost as well as the model trained using real images (7.4% worse) and that augmenting the dataset of real images with synthetic images improves the performance of the model (9.5% better).
APA, Harvard, Vancouver, ISO, and other styles
13

Capuzzo, Davide. "3D StixelNet Deep Neural Network for 3D object detection stixel-based." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/22017/.

Full text
Abstract:
This thesis presents a deep learning algorithm for 3D object detection from point clouds in an outdoor environment. The algorithm is fed with stixels, an intermediate data representation generated from a point cloud or depth map. A stixel can be thought of as a small rectangle that starts from the base of the road and rises to the top of the obstacle, summarizing the vertical surface of an object. The goal of stixels is to compress the data coming from the sensors in order to allow fast transmission without losing information. The stixel generation algorithm is a novel algorithm developed by the author that can be applied both to point clouds generated by LIDAR and to depth maps generated by stereo and mono cameras. The main steps to create this type of data are: eliminating the points that lie on the ground plane; creating an average matrix that summarizes the depth of each group of stixels; and creating the stixels by merging all the cells that belong to the same object. The generated stixels reduce the number of points from 40,000 to 1,200 for the LIDAR point cloud and from 480,000 to 1,200 for the depth map. In order to extract 3D information from stixels, these data are fed into a deep learning algorithm adapted to receive this type of data as input. The adaptation was made starting from an existing neural network used for 3D object detection in indoor environments. This network was adapted in order to overcome the sparsity of the data and the large size of the scene. Despite the reduction in the number of data points, thanks to the right tuning the network created in this thesis was able to achieve the state of the art for 3D object detection. This is a relevant result because it opens the way to the use of intermediate data representations and underlines that a reduction of points does not mean a reduction of information if the data are compressed in a smart way.
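As an illustrative aside (a very simplified sketch, not the author's algorithm), the column-wise compression idea behind stixels can be pictured as: drop the ground region of a depth map, group columns into fixed-width bins, and keep one representative depth and top row per bin. All parameter values and the toy depth map below are assumptions for the example.

```python
import numpy as np

def depth_map_to_stixels(depth, ground_row, column_width=8, max_depth=80.0):
    """Very simplified stixel-style compression of a depth map (metres):
    ignore rows below the ground line, group columns into fixed-width bins,
    and keep one representative depth and top row per bin."""
    h, w = depth.shape
    obstacle = depth[:ground_row, :]                 # drop the ground region
    stixels = []
    for c0 in range(0, w, column_width):
        col = obstacle[:, c0:c0 + column_width]
        valid_mask = (col > 0) & (col < max_depth)
        if not valid_mask.any():
            continue
        rep_depth = float(np.median(col[valid_mask]))            # representative depth
        top_row = int(np.where(valid_mask.any(axis=1))[0][0])    # first row with valid depth
        stixels.append({"col": c0, "top": top_row, "base": ground_row,
                        "depth": rep_depth})
    return stixels

# Toy usage: a synthetic depth map with one near obstacle.
depth = np.full((120, 160), 60.0)
depth[40:100, 64:96] = 12.0
print(len(depth_map_to_stixels(depth, ground_row=100)))
```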
APA, Harvard, Vancouver, ISO, and other styles
14

Mahmood, Muhammad Habib. "Motion annotation in complex video datasets." Doctoral thesis, Universitat de Girona, 2018. http://hdl.handle.net/10803/667583.

Full text
Abstract:
Motion segmentation refers to the process of separating regions and trajectories from a video sequence into coherent subsets of space and time. In this thesis, we created a new multifaceted motion segmentation dataset comprising real-life long and short sequences, with different numbers of motions and frames per sequence, and real distortions with missing data. Trajectory- and region-based ground truth is provided on all the frames of all the sequences. We also propose a new semi-automatic tool for delineating the trajectories in complex videos, even in videos captured from moving cameras. With a minimal manual annotation of an object mask, the algorithm is able to propagate the label mask to all the frames. Object label correction based on static and moving occluders is performed by applying occluder mask tracking for a given depth ordering. The results show that our cascaded-naive approach provides successful results in a variety of video sequences.
APA, Harvard, Vancouver, ISO, and other styles
15

Bustos, Aurelia. "Extraction of medical knowledge from clinical reports and chest x-rays using machine learning techniques." Doctoral thesis, Universidad de Alicante, 2019. http://hdl.handle.net/10045/102193.

Full text
Abstract:
This thesis addresses the extraction of medical knowledge from clinical text using deep learning techniques. In particular, the proposed methods focus on cancer clinical trial protocols and chest x-ray reports. The main results are a proof of concept of the capability of machine learning methods to discern which statements are regarded as inclusion or exclusion criteria in short free-text clinical notes, and a large-scale chest x-ray image dataset labeled with radiological findings, diagnoses and anatomic locations. Clinical trials provide the evidence needed to determine the safety and effectiveness of new medical treatments. These trials are the basis employed for clinical practice guidelines and greatly assist clinicians in their daily practice when making decisions regarding treatment. However, the eligibility criteria used in oncology trials are too restrictive. Patients are often excluded on the basis of comorbidity, past or concomitant treatments and the fact they are over a certain age, and those patients that are selected do not, therefore, mimic clinical practice. This signifies that the results obtained in clinical trials cannot be extrapolated to patients if their clinical profiles were excluded from the clinical trial protocols. The efficacy and safety of new treatments for patients with these characteristics are not, therefore, defined. Given the clinical characteristics of particular patients, their type of cancer and the intended treatment, discovering whether or not they are represented in the corpus of available clinical trials requires the manual review of numerous eligibility criteria, which is impracticable for clinicians on a daily basis. In this thesis, a large medical corpus comprising all cancer clinical trial protocols published by competent authorities in the last 18 years was used to extract medical knowledge in order to help automatically learn patients' eligibility in these trials. For this, a model is built to automatically predict whether short clinical statements were considered inclusion or exclusion criteria. A method based on deep neural networks is trained on a dataset of 6 million short free-texts to classify them as eligible or not eligible. For this, pretrained word embeddings were used as inputs in order to predict whether or not short free-text statements describing clinical information were considered eligible. The semantic reasoning of the word-embedding representations obtained was also analyzed, being able to identify equivalent treatments for a type of tumor in an analogy with the drugs used to treat other tumors. Results show that representation learning using deep neural networks can be successfully leveraged to extract medical knowledge from clinical trial protocols and potentially assist practitioners when prescribing treatments. The second main task addressed in this thesis is related to knowledge extraction from medical reports associated with radiographs. Conventional radiology remains the most performed technique in radiodiagnosis services, with a percentage close to 75% (Radiología Médica, 2010). In particular, chest x-ray is the most common medical imaging exam, with over 35 million taken every year in the US alone (Kamel et al., 2017). They allow for inexpensive screening of several pathologies including masses, pulmonary nodules, effusions, cardiac abnormalities and pneumothorax.
For this task, all the chest x-rays that had been interpreted and reported by radiologists at the Hospital Universitario de San Juan (Alicante) from Jan 2009 to Dec 2017 were used to build a novel large-scale dataset in which each high-resolution radiograph is labeled with its corresponding metadata, radiological findings and pathologies. This dataset, named PadChest, includes more than 160,000 images obtained from 67,000 patients, covering six different position views and additional information on image acquisition and patient demography. The free-text reports written in Spanish by radiologists were labeled with 174 different radiographic findings, 19 differential diagnoses and 104 anatomic locations organized as a hierarchical taxonomy and mapped onto standard Unified Medical Language System (UMLS) terminology. For this, a subset of the reports (27%) was manually annotated by trained physicians, whereas the remaining set was automatically labeled with deep supervised learning methods using attention mechanisms and fed with the text reports. The labels generated were then validated on an independent test set, achieving a 0.93 Micro-F1 score. To the best of our knowledge, this is one of the largest public chest x-ray databases suitable for training supervised models concerning radiographs, and also the first to contain radiographic reports in Spanish. The PadChest dataset can be downloaded on request from http://bimcv.cipf.es/bimcv-projects/padchest/. PadChest is intended for training image classifiers based on deep learning techniques to extract medical knowledge from chest x-rays. It is essential that automatic radiology reporting methods can be integrated in a clinically validated manner into radiologists' workflow in order to help specialists improve their efficiency and enable safer and actionable reporting. Computer vision methods capable of identifying the large spectrum of thoracic abnormalities (and also normality) need to be trained on comprehensively labeled large-scale x-ray datasets such as PadChest. The development of these computer vision tools, once clinically validated, could serve to fulfill a broad range of unmet needs. Beyond implementing and obtaining results for both clinical trials and chest x-rays, this thesis studies the nature of the health data, the novelty of applying deep learning methods to obtain large-scale labeled medical datasets, and the relevance of its applications in medical research, which have contributed to its extramural diffusion and worldwide reach. This thesis describes this journey so that the reader is guided across multiple disciplines, from engineering to medicine up to ethical considerations in artificial intelligence applied to medicine.
APA, Harvard, Vancouver, ISO, and other styles
16

Cooper, Lee Alex Donald. "High Performance Image Analysis for Large Histological Datasets." The Ohio State University, 2009. http://rave.ohiolink.edu/etdc/view?acc_num=osu1250004647.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Chaudhary, Gautam. "RZSweep: A new volume-rendering technique for uniform rectilinear datasets." Master's thesis, Mississippi State: Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-04012003-141739.

Full text
APA, Harvard, Vancouver, ISO, and other styles
18

Bäck, Erik. "En vision om en ny didaktik för undervisning i företagsekonomi." Thesis, Malmö högskola, Lärarutbildningen (LUT), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-34009.

Full text
Abstract:
This degree project was carried out within the framework of the degree project in the teacher education programme at Malmö högskola. The purpose of the thesis is to show the possibilities of a new didactics for teaching business economics at upper secondary school. The method used is a study of relevant pedagogical and didactic literature. The result is a proposal for the possibilities of a new didactics in the subject of Business Economics with the help of the functions of MMORPGs. The conclusion is that the use of MMORPGs in teaching Business Economics creates many valuable advantages from a didactic point of view in comparison with today's classroom-dominated didactics.
APA, Harvard, Vancouver, ISO, and other styles
19

Ramswamy, Lakshmy. "PARZSweep: A novel parallel algorithm for volume rendering of regular datasets." Master's thesis, Mississippi State: Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-04012003-140443.

Full text
APA, Harvard, Vancouver, ISO, and other styles
20

Jia, Sen. "Data from the wild in computer vision : generating and exploiting large scale and noisy datasets." Thesis, University of Bristol, 2016. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.738203.

Full text
APA, Harvard, Vancouver, ISO, and other styles
21

Godavarthy, Sridhar. "Microexpression Spotting in Video Using Optical Strain." Scholar Commons, 2010. https://scholarcommons.usf.edu/etd/1642.

Full text
Abstract:
Microexpression detection plays a vital role in applications such as lie detection and psychological consultations. Current research is progressing in the direction of automating microexpression recognition by aiming at classifying the microexpressions in terms of FACS Action Units. Although high detection rates are being achieved, the datasets used for evaluation of these systems are highly restricted. They are limited in size - usually still pictures or extremely short videos; motion constrained; containing only a single microexpression; and lacking negative cases where microexpressions are absent. Only a few of these systems run in real time and even fewer have been tested on real-life videos. This work proposes a novel method for automated spotting of facial microexpressions as a preprocessing step to existing microexpression recognition systems. By identifying and rejecting sequences that do not contain microexpressions, longer sequences can be converted into shorter, constrained, relevant sequences which comprise only single microexpressions, which can then be passed as input to existing systems, improving their performance and efficiency. This method utilizes the small temporal extent of microexpressions for their identification. The extent is determined by the period for which strain, due to the non-rigid motion caused during facial movement, is exerted on the facial skin. The subject's face is divided into sub-regions, and facial strain is calculated for each of these regions. The strain patterns in individual regions are used to identify subtle changes which facilitate the detection of microexpressions. The strain magnitude is calculated using the central difference method over the robust and dense optical flow field of each subject's face. The computed strain is then thresholded using a variable threshold. If the duration for which the strain is above the threshold corresponds to the duration of a microexpression, a detection is reported. The datasets used for algorithm evaluation comprise a mix of natural and enacted microexpressions. The results were promising, with up to an 80% true detection rate. Increased false positive spots in the Canal 9 dataset can be attributed to talking by the subjects causing fine movements in the mouth region. Performing speech detection to identify sequences where the subject is talking and excluding the mouth region during those periods could help reduce the number of false positives.
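As an illustrative aside (not the thesis's code), the strain-magnitude computation described above can be sketched as: estimate dense optical flow between consecutive frames, take central-difference spatial derivatives of the flow, form the symmetric strain tensor, and compute its magnitude. Farneback flow is used here only as a convenient stand-in for whichever dense flow method the thesis uses, and the toy frames are placeholders.

```python
import cv2
import numpy as np

def optical_strain_magnitude(prev_gray, next_gray):
    """Per-pixel strain magnitude from dense optical flow.
    Spatial derivatives use central differences (np.gradient)."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    u, v = flow[..., 0], flow[..., 1]
    du_dy, du_dx = np.gradient(u)           # derivatives of horizontal flow
    dv_dy, dv_dx = np.gradient(v)           # derivatives of vertical flow
    # Symmetric strain tensor components.
    e_xx, e_yy = du_dx, dv_dy
    e_xy = 0.5 * (du_dy + dv_dx)
    return np.sqrt(e_xx**2 + e_yy**2 + 2.0 * e_xy**2)

# Toy usage with two synthetic frames (replace with consecutive face frames).
f0 = np.random.randint(0, 255, (120, 120), dtype=np.uint8)
f1 = np.roll(f0, 1, axis=1)
strain = optical_strain_magnitude(f0, f1)
print(strain.shape, strain.max())
```

Thresholding this magnitude per facial sub-region over time, as the abstract describes, then yields candidate intervals whose duration can be compared against the expected microexpression duration.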
APA, Harvard, Vancouver, ISO, and other styles
22

Touranakou, Maria. "A Novel System for Deep Analysis of Large-Scale Hand Pose Datasets." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-240419.

Full text
Abstract:
This degree project proposes the design and the implementation of a novel system for deep analysis of large-scale datasets of hand poses. The system consists of a set of modules for automatic redundancy removal, classification, statistical analysis and visualization of large-scale datasets based on their content characteristics. In this project, work is performed on the specific use case of images of hand movements in front of smartphone cameras. The characteristics of the images are investigated, and the images are pre-processed to reduce repetitive content and noise in the data. Two different design paradigms for content analysis and image classification are employed, a computer vision pipeline and a deep learning pipeline. The computer vision pipeline incorporates several stages of image processing including image segmentation, hand detection as well as feature extraction followed by a classification stage. The deep learning pipeline utilizes a convolutional neural network for classification. For industrial applications with high diversity in data content, deep learning is suggested for image classification and computer vision is recommended for feature analysis. Finally, statistical analysis is performed to visually extract required information about hand features and diversity of the classified data. The main contribution of this work lies in the customization of computer vision and deep learning tools for the design and the implementation of a hybrid system for deep data analysis.
APA, Harvard, Vancouver, ISO, and other styles
23

Kratochvíla, Lukáš. "Trasování objektu v reálném čase." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-403748.

Full text
Abstract:
Tracking a generic object in real time on a device with limited resources is difficult. Many algorithms addressing this problem already exist, and this thesis reviews them. Different approaches to the problem are discussed, including deep learning. Object representations, datasets and evaluation metrics are presented. Many tracking algorithms are introduced; eight of them are implemented and evaluated on the VOT dataset.
APA, Harvard, Vancouver, ISO, and other styles
24

Hummel, Georg [Verfasser], Peter [Akademischer Betreuer] Stütz, and Paolo [Gutachter] Remagnino. "On synthetic datasets for development of computer vision algorithms in airborne reconnaissance applications / Georg Hummel ; Gutachter: Peter Stütz, Paolo Remagnino ; Akademischer Betreuer: Peter Stütz ; Universität der Bundeswehr München, Fakultät für Luft- und Raumfahrttechnik." Neubiberg : Universitätsbibliothek der Universität der Bundeswehr München, 2017. http://d-nb.info/1147386331/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Hummel, Georg [Verfasser], Peter [Akademischer Betreuer] [Gutachter] Stütz, and Paolo [Gutachter] Remagnino. "On synthetic datasets for development of computer vision algorithms in airborne reconnaissance applications / Georg Hummel ; Gutachter: Peter Stütz, Paolo Remagnino ; Akademischer Betreuer: Peter Stütz ; Universität der Bundeswehr München, Fakultät für Luft- und Raumfahrttechnik." Neubiberg : Universitätsbibliothek der Universität der Bundeswehr München, 2017. http://d-nb.info/1147386331/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Michaud, Dorian. "Indexation bio-inspirée pour la recherche d'images par similarité." Thesis, Poitiers, 2018. http://www.theses.fr/2018POIT2288/document.

Full text
Abstract:
Image Retrieval is still a very active field of image processing as the number of available image datasets continuously increases. One of the principal objectives of Content-Based Image Retrieval (CBIR) is to return the most similar images to a given query with respect to their visual content. Our work fits in a very specific application context: indexing small expert image datasets, with no prior knowledge of the images. Because of the image complexity, one of our contributions is the choice of effective descriptors from the literature, placed in direct competition. Two strategies are used to combine features: a psycho-visual one and a statistical one. In this context, we propose an unsupervised and adaptive framework based on the well-known bags of visual words and phrases models that selects relevant visual descriptors for each keypoint to construct a more discriminative image representation. Experiments show the interest of using this type of methodology at a time when convolutional neural networks are ubiquitous. We also propose a study, together with the results of our first tests, about semi-interactive retrieval to improve the accuracy of CBIR systems by using the knowledge of expert users.
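As an illustrative aside (not the thesis's framework), the classic bag-of-visual-words representation that this work builds on can be sketched as: cluster local descriptors into a codebook with k-means, then describe each image by a normalized histogram of visual-word assignments. The random descriptors below stand in for real local features such as SIFT; sizes and names are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(descriptors, n_words=64, seed=0):
    """Cluster local descriptors into a visual-word codebook."""
    return KMeans(n_clusters=n_words, random_state=seed, n_init=10).fit(descriptors)

def bovw_histogram(codebook, image_descriptors):
    """Represent one image as a normalized histogram of visual-word counts."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy usage with random 128-D descriptors standing in for SIFT-like features.
rng = np.random.default_rng(0)
all_desc = rng.normal(size=(2000, 128))
codebook = build_codebook(all_desc, n_words=32)
query_desc = rng.normal(size=(150, 128))
print(bovw_histogram(codebook, query_desc)[:5])
```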
APA, Harvard, Vancouver, ISO, and other styles
27

Malireddi, Sri Raghu. "Systematic generation of datasets and benchmarks for modern computer vision." Thesis, 2019. http://hdl.handle.net/1828/10689.

Full text
Abstract:
Deep Learning is dominant in the field of computer vision thanks to its high performance. This high performance is driven by large annotated datasets and proper evaluation benchmarks. However, two important areas in computer vision, depth-based hand segmentation and local features, respectively lack a large well-annotated dataset and a benchmark protocol that properly demonstrates their practical performance. Therefore, in this thesis, we focus on these two problems. For hand segmentation, we create a novel systematic way to easily create automatic semantic segmentation annotations for large datasets. We achieve this with the help of traditional computer vision techniques and a minimal hardware setup of one RGB-D camera and two distinctly colored skin-tight gloves. Our method allows easy creation of large-scale datasets with high annotation quality. For local features, we create a new, modern benchmark that reveals their different aspects, specifically wide-baseline stereo matching and Multi-View Stereo (MVS) with keypoints, in a more practical setup, namely Structure-from-Motion (SfM). We believe that through our new benchmark, we will be able to spur research on learned local features in a more practical direction. In this respect, the benchmark developed for the thesis will be used to host a challenge on local features.
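As an illustrative aside (not the thesis's pipeline), the colored-glove annotation idea can be pictured as simple color thresholding: convert the frame to HSV, keep pixels inside the glove's color range, and clean the mask morphologically. The HSV ranges and the synthetic image below are made-up placeholders.

```python
import cv2
import numpy as np

def glove_mask(bgr_image, lower_hsv, upper_hsv):
    """Segment a distinctly colored glove by thresholding in HSV space and
    cleaning the mask with a morphological opening."""
    hsv = cv2.cvtColor(bgr_image, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, lower_hsv, upper_hsv)
    kernel = np.ones((5, 5), np.uint8)
    return cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)

# Toy usage: placeholder HSV range for a green glove (values are assumptions).
img = np.zeros((240, 320, 3), dtype=np.uint8)
img[60:180, 80:200] = (40, 200, 40)                    # fake green "glove"
mask = glove_mask(img, np.array([35, 80, 80]), np.array([85, 255, 255]))
print(int(mask.sum() / 255), "glove pixels")
```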
APA, Harvard, Vancouver, ISO, and other styles
28

Karpenko, Alexandre. "50,000 Tiny Videos: A Large Dataset for Non-parametric Content-based Retrieval and Recognition." Thesis, 2009. http://hdl.handle.net/1807/17690.

Full text
Abstract:
This work extends the tiny image data-mining techniques developed by Torralba et al. to videos. A large dataset of over 50,000 videos was collected from YouTube. This is the largest user-labeled research database of videos available to date. We demonstrate that a large dataset of tiny videos achieves high classification precision in a variety of content-based retrieval and recognition tasks using very simple similarity metrics. Content-based copy detection (CBCD) is evaluated on a standardized dataset, and the results are applied to related video retrieval within tiny videos. We use our similarity metrics to improve text-only video retrieval results. Finally, we apply our large labeled video dataset to various classification tasks. We show that tiny videos are better suited for classifying activities than tiny images. Furthermore, we demonstrate that classification can be improved by combining the tiny images and tiny videos datasets.
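As an illustrative aside (not code from the thesis), the "very simple similarity metric" idea can be sketched as a sum-of-squared-differences distance between clips that have been downsampled in time and space into tiny normalized descriptors; all sizes and the random clips below are assumptions for the example.

```python
import numpy as np

def tiny_video_descriptor(frames, size=(32, 32), n_frames=8):
    """Downsample a clip to a fixed number of tiny grayscale frames by
    uniform temporal sampling and nearest-neighbour spatial resizing."""
    idx = np.linspace(0, len(frames) - 1, n_frames).astype(int)
    tiny = []
    for i in idx:
        f = frames[i].astype(float)
        rows = np.linspace(0, f.shape[0] - 1, size[0]).astype(int)
        cols = np.linspace(0, f.shape[1] - 1, size[1]).astype(int)
        tiny.append(f[np.ix_(rows, cols)])
    d = np.stack(tiny)
    return (d - d.mean()) / (d.std() + 1e-8)     # simple normalization

def ssd_distance(desc_a, desc_b):
    """Sum-of-squared-differences distance between two clip descriptors."""
    return float(np.sum((desc_a - desc_b) ** 2))

# Toy usage with random grayscale clips.
rng = np.random.default_rng(0)
clip_a = [rng.integers(0, 255, (120, 160)) for _ in range(30)]
clip_b = [rng.integers(0, 255, (120, 160)) for _ in range(30)]
print(ssd_distance(tiny_video_descriptor(clip_a), tiny_video_descriptor(clip_b)))
```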
APA, Harvard, Vancouver, ISO, and other styles
29

Shullani, Dasara. "Video forensic tools exploiting features from video-container to video-encoder level." Doctoral thesis, 2018. http://hdl.handle.net/2158/1126144.

Full text
Abstract:
The escalation of multimedia content exchange, especially of videos originating from mobile devices, and the availability of a great amount of editing software have raised grave doubts about their digital life-cycle. In this thesis, we first introduce a new dataset for multimedia forensics and then develop forensic tools that analyse the video container and the video signal in order to detect possible tampering introduced in the life-cycle of a video content. The first contribution is the release of a new dataset of videos and images captured from 35 modern smartphones/tablets belonging to 11 different brands: Apple, Asus, Huawei, Lenovo, LG Electronics, Microsoft, OnePlus, Samsung, Sony, Wiko and Xiaomi. Overall, we collected 11732 native images; 7565 of them were shared through Facebook, in both high and low quality, and through WhatsApp, resulting in a total of 34427 images. Furthermore, we acquired 648 native videos, 622 of which were shared through YouTube at the maximum available resolution, and 644 through WhatsApp, resulting in a total of 1914 videos. The uniqueness of the VISION dataset was tested on a well-known forensic tool, i.e., the detection of the Sensor Pattern Noise (SPN) left by the acquisition device, for the source identification of native and social-media contents. The second contribution is based on the analysis of the container structure of videos acquired with mobile devices. We argue that the atoms of the container, in terms of order and value, are fragile and that it is harder to hide their modifications than to alter regular metadata. This characteristic can be exploited to perform Source Identification and Integrity Verification of videos taken from devices of well-known operating systems and manufacturers. In the third contribution we focus on the video signal and on its encoding process. We use codecs that follow a hybrid video coding scheme and develop a classification technique able to perform Group Of Pictures (GOP) length estimation and double compression detection. The proposed technique is one of the fastest approaches that handle videos encoded with B-frames, at both constant and variable bit rate.
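The Sensor Pattern Noise matching mentioned above can be illustrated in a few lines: a noise residual is extracted from each image, residuals from a known camera are averaged into a fingerprint, and a test residual is matched by normalized correlation. Real SPN/PRNU tools use wavelet denoising and Peak-to-Correlation-Energy statistics; the Gaussian filter below is a simplification for the sketch only.

```python
# Hedged sketch of the SPN/PRNU source-identification idea (not the
# actual forensic tool used to validate VISION).
import numpy as np
import cv2

def noise_residual(gray):
    gray = gray.astype(np.float32)
    return gray - cv2.GaussianBlur(gray, (5, 5), 1.0)   # image minus denoised image

def camera_fingerprint(gray_images):
    return np.mean([noise_residual(g) for g in gray_images], axis=0)

def correlation(residual, fingerprint):
    a = residual - residual.mean()
    b = fingerprint - fingerprint.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))
```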
APA, Harvard, Vancouver, ISO, and other styles
30

Moreira, Gonçalo Rebelo de Almeida. "Neuromorphic Event-based Facial Identity Recognition." Master's thesis, 2021. http://hdl.handle.net/10316/98251.

Full text
Abstract:
Integrated Master's dissertation in Electrical and Computer Engineering presented to the Faculdade de Ciências e Tecnologia
Facial recognition research has been around for more than half a century. The great interest in the field stems from its tremendous potential to enhance various industries, such as video surveillance, personal authentication, criminal investigation, and leisure. Most state-of-the-art algorithms rely on facial appearance; in particular, these methods use the static characteristics of the human face (e.g., the distance between the eyes, nose location, nose shape) to determine the subject's identity extremely accurately. However, it is further argued that humans also make use of another type of facial information to identify other people, namely one's idiosyncratic facial motion. This kind of facial data is relevant because it is hard to replicate or forge, whereas appearance can be easily distorted with cheap software available to anyone. On another note, event cameras are recent neuromorphic devices that are remarkable at encoding the dynamic information in a scene. These sensors are inspired by the biological operation of the human eye: rather than detecting light intensity, they capture light-intensity variations in the setting. Thus, in comparison to standard cameras, this sensing mechanism has a high temporal resolution, so it does not suffer from motion blur, and it has low power consumption, among other benefits. A few of its early applications have been real-time Simultaneous Localization And Mapping (SLAM), anomaly detection, and action/gesture recognition. Taking all this into account, the main purpose of this work is to evaluate the aptitude of the technology offered by event cameras for completing a more complex task, namely facial identity recognition, and how easily it could be integrated into real-world systems. Additionally, the dataset created in the scope of this dissertation (NVSFD Dataset) is made available in order to facilitate future third-party investigation on the topic.
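A common first step when feeding event-camera data to a recognition network is to accumulate the asynchronous event stream into a fixed-size frame. The sketch below shows a simple polarity-separated event-count image; it is only one possible representation and not necessarily the one used in this dissertation.

```python
# Hedged sketch: events (x, y, timestamp, polarity) accumulated into a
# two-channel event-count frame suitable as network input.
import numpy as np

def events_to_frame(events, height, width):
    # events: iterable of (x, y, t, polarity) with polarity in {-1, +1}
    frame = np.zeros((2, height, width), dtype=np.float32)
    for x, y, _, p in events:
        channel = 0 if p > 0 else 1
        frame[channel, int(y), int(x)] += 1.0
    return frame / max(frame.max(), 1.0)  # normalize counts to [0, 1]
```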
APA, Harvard, Vancouver, ISO, and other styles
31

Foroozandeh, Mehdi. "GAN-Based Synthesis of Brain Tumor Segmentation Data : Augmenting a dataset by generating artificial images." Thesis, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-169863.

Full text
Abstract:
Machine learning applications within medical imaging often suffer from a lack of data, as a consequence of restrictions that hinder the free distribution of patient information. In this project, GANs (generative adversarial networks) are used to generate data synthetically, in an effort to circumvent this issue. The GAN framework PGAN is trained on the brain tumor segmentation dataset BraTS to generate new, synthetic brain tumor masks with the same visual characteristics as the real samples. The image-to-image translation network SPADE is subsequently trained on the image pairs in the real dataset, to learn a transformation from segmentation masks to brain MR images, and is in turn used to map the artificial segmentation masks generated by PGAN to corresponding artificial MR images. The images generated by these networks form a new, synthetic dataset, which is used to augment the original dataset. Different quantities of real and synthetic data are then evaluated in three different brain tumor segmentation tasks, where the image segmentation network U-Net is trained on this data to segment (real) MR images into the classes in question. The final segmentation performance of each training instance is evaluated over test data from the real dataset with the Weighted Dice Loss metric. The results indicate a slight increase in performance across all segmentation tasks evaluated in this project, when including some quantity of synthetic images. However, the differences were largest when the experiments were restricted to using only 20 % of the real data, and less significant when the full dataset was made available. A majority of the generated segmentation masks appear visually convincing to an extent (although somewhat noisy with regards to the intra-tumoral classes), while a relatively large proportion appear heavily noisy and corrupted. However, the translation of segmentation masks to MR images via SPADE proved more reliable and consistent.
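The Weighted Dice Loss named above can be written down compactly. The sketch below is a generic NumPy formulation over C classes; the actual per-class weights and implementation details used in the thesis are not specified here, so the weight vector w is a hypothetical input.

```python
# Hedged sketch of a weighted Dice loss over C classes (w is a
# hypothetical per-class weight vector, not the thesis's exact choice).
import numpy as np

def weighted_dice_loss(probs, one_hot_target, w, eps=1e-6):
    # probs, one_hot_target: arrays of shape (C, H, W)
    intersection = (probs * one_hot_target).sum(axis=(1, 2))
    denominator = probs.sum(axis=(1, 2)) + one_hot_target.sum(axis=(1, 2))
    dice_per_class = (2.0 * intersection + eps) / (denominator + eps)
    return 1.0 - float((w * dice_per_class).sum() / w.sum())
```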
APA, Harvard, Vancouver, ISO, and other styles
32

Bojja, Abhishake Kumar. "Deep neural networks for semantic segmentation." Thesis, 2020. http://hdl.handle.net/1828/11696.

Full text
Abstract:
Segmenting an image into multiple meaningful regions is an essential task in Computer Vision. Deep Learning has been highly successful for segmentation, benefiting from the availability of annotated datasets and deep neural network architectures. However, depth-based hand segmentation, an important application area of semantic segmentation, has yet to benefit from rich and large datasets. In addition, while deep methods provide robust solutions, they are often not efficient enough for low-powered devices. In this thesis, we focus on these two problems. To tackle the problem of lack of rich data, we propose an automatic method for generating high-quality annotations and introduce a large-scale hand segmentation dataset. By exploiting the visual cues given by an RGBD sensor and a pair of colored gloves, we automatically generate dense annotations for two-hand segmentation. Our automatic annotation method lowers the cost and complexity of creating high-quality datasets and makes it easy to expand the dataset in the future. To reduce the computational requirement and allow real-time segmentation on low-power devices, we propose a new representation and architecture for deep networks that predict segmentation maps based on Voronoi Diagrams. Voronoi Diagrams split space into discrete regions based on proximity to a set of points, making them a powerful representation of regions, which we can then use to represent our segmentation outcomes. Specifically, we propose to estimate the location and class for these sets of points, which are then rasterized into an image. Notably, we use a differentiable definition of the Voronoi Diagram based on the softmax operator, enabling its use as a decoder layer in an end-to-end trainable network. As rasterization can take place at any given resolution, our method especially excels at rendering high-resolution segmentation maps given a low-resolution image. We believe that our new HandSeg dataset will open new frontiers in Hand Segmentation research, and our cost-effective automatic annotation pipeline can benefit other relevant labeling tasks. Our newly proposed segmentation network enables high-quality segmentation representations that are not practically possible on low-power devices using existing approaches.
Graduate
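The softmax-based differentiable Voronoi rasterization mentioned in this abstract can be sketched as a soft assignment of every pixel to its nearest predicted site. The NumPy sketch below only illustrates the idea; in the thesis the sites and their class scores are predicted by a deep network and the operation is implemented in a differentiable framework.

```python
# Hedged sketch of a softmax "soft" Voronoi rasterizer: pixels are softly
# assigned to sites by distance, and site class scores are blended.
import numpy as np

def soft_voronoi_raster(sites, site_class_scores, height, width, temperature=10.0):
    # sites: (N, 2) array of (y, x); site_class_scores: (N, C)
    sites = np.asarray(sites, dtype=np.float32)
    site_class_scores = np.asarray(site_class_scores, dtype=np.float32)
    ys, xs = np.mgrid[0:height, 0:width]
    pixels = np.stack([ys, xs], axis=-1).reshape(-1, 1, 2)   # (H*W, 1, 2)
    d2 = ((pixels - sites[None]) ** 2).sum(-1)               # squared distance to each site
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights /= weights.sum(axis=1, keepdims=True)            # soft Voronoi assignment
    scores = weights @ site_class_scores                      # (H*W, C) blended class scores
    return scores.reshape(height, width, -1)
```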
APA, Harvard, Vancouver, ISO, and other styles
33

Anderson, Peter James. "Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents." Phd thesis, 2018. http://hdl.handle.net/1885/164018.

Full text
Abstract:
Each time we ask for an object, describe a scene, follow directions or read a document containing images or figures, we are converting information between visual and linguistic representations. Indeed, for many tasks it is essential to reason jointly over visual and linguistic information. People do this with ease, typically without even noticing. Intelligent systems that perform useful tasks in unstructured situations, and interact with people, will also require this ability. In this thesis, we focus on the joint modelling of visual and linguistic information using deep neural networks. We begin by considering the challenging problem of automatically describing the content of an image in natural language, i.e., image captioning. Although there is considerable interest in this task, progress is hindered by the difficulty of evaluating the generated captions. Our first contribution is a new automatic image caption evaluation metric that measures the quality of generated captions by analysing their semantic content. Extensive evaluations across a range of models and datasets indicate that our metric, dubbed SPICE, shows high correlation with human judgements. Armed with a more effective evaluation metric, we address the challenge of image captioning. Visual attention mechanisms have been widely adopted in image captioning and visual question answering (VQA) architectures to facilitate fine-grained visual processing. We extend existing approaches by proposing a bottom-up and top-down attention mechanism that enables attention to be focused at the level of objects and other salient image regions, which is the natural basis for attention to be considered. Applying this approach to image captioning we achieve state of the art results on the COCO test server. Demonstrating the broad applicability of the method, applying the same approach to VQA we obtain first place in the 2017 VQA Challenge. Despite these advances, recurrent neural network (RNN) image captioning models typically do not generalise well to out-of-domain images containing novel scenes or objects. This limitation severely hinders the use of these models in real applications. To address this problem, we propose constrained beam search, an approximate search algorithm that enforces constraints over RNN output sequences. Using this approach, we show that existing RNN captioning architectures can take advantage of side information such as object detector outputs and ground-truth image annotations at test time, without retraining. Our results significantly outperform previous approaches that incorporate the same information into the learning algorithm, achieving state of the art results for out-of-domain captioning on COCO. Last, to enable and encourage the application of vision and language methods to problems involving embodied agents, we present the Matterport3D Simulator, a large-scale interactive reinforcement learning environment constructed from densely-sampled panoramic RGB-D images of 90 real buildings. Using this simulator, which can in future support a range of embodied vision and language tasks, we collect the first benchmark dataset for visually-grounded natural language navigation in real buildings. We investigate the difficulty of this task, and particularly the difficulty of operating in unseen environments, using several baselines and a sequence-to-sequence model based on methods successfully applied to other vision and language tasks.
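The bottom-up and top-down attention described above ultimately reduces to a softmax-weighted pooling of region features conditioned on a query (for example, a captioning LSTM state). The sketch below is a generic, illustrative formulation; the weight shapes and exact scoring function are placeholders rather than the paper's architecture.

```python
# Hedged sketch of a top-down attention step over bottom-up region features
# (illustrative shapes; not the exact model from the thesis).
import numpy as np

def top_down_attention(region_feats, query, W_v, W_q, w_a):
    # region_feats: (K, D) features, query: (H,) hidden state,
    # W_v: (D, A), W_q: (H, A), w_a: (A,) projection parameters
    scores = np.tanh(region_feats @ W_v + query @ W_q) @ w_a   # (K,) relevance scores
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                                       # attention weights over regions
    return alpha @ region_feats                                # (D,) attended feature vector
```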
APA, Harvard, Vancouver, ISO, and other styles
34

Amelio, Ravelli Andrea. "Annotation of Linguistically Derived Action Concepts in Computer Vision Datasets." Doctoral thesis, 2020. http://hdl.handle.net/2158/1200356.

Full text
Abstract:
The present work traces an in-depth exploration of the IMAGACT Ontology of Action Verbs, with a focus on exploiting the resource in NLP tasks. Starting from the Introduction, the idea of making use of IMAGACT's multimodal action conceptualisation is laid out, with some reflections on evidence of the deep link between Language and Vision, and on the fact that action plays a key role in this linkage. The multimodal and multilingual features of IMAGACT are then described, together with some details on the framework used to build the resource. There follows a concrete case study on IMAGACT internal data, which led to the proposal of an inter-linguistic manual mapping between the Action Types of verbs referring to cutting eventualities in English and Italian. Then, a series of experiments is presented, involving the exploitation of IMAGACT in linking with other resources and in building deliverable NLP products (such as the Ref-vectors of action verbs). One of the experiments is described extensively: the visual enrichment of IMAGACT through instance population of its action concepts, making use of Audio Descriptions of movies for visually impaired people. From this last experiment it emerged that dealing with non-conventional scenarios, such as assessing action reference similarity between texts from different domains, is particularly challenging, given that fine-grained differences among action concepts are difficult to derive purely from the textual representation.
APA, Harvard, Vancouver, ISO, and other styles
35

Gebali, Aleya. "Detection of salient events in large datasets of underwater video." Thesis, 2012. http://hdl.handle.net/1828/4156.

Full text
Abstract:
NEPTUNE Canada possesses a large collection of video data for monitoring marine life. Such data is important for marine biologists, who can observe species in their natural habitat on a 24/7 basis. It is counterproductive for researchers to manually search for the events of interest (EOI) in a large database. Our study aims to perform automatic detection of the EOI, defined as animal motion. The output of this approach is a summary video clip of the original video file that contains all salient events with their associated start and end frames. Our work is based on Laptev's [1] spatio-temporal interest point detection method. Interest points in the 3D spatio-temporal domain (x, y, t) require frame values in local spatio-temporal volumes to have large variations along all three dimensions. These local intensity variations are captured in the magnitude of the spatio-temporal derivatives. We report experimental results on video summarization using a database of videos from NEPTUNE Canada. The effect of several parameters on the performance of the proposed approach is studied in detail.
Graduate
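The underlying cue, large variation along x, y and t, can be illustrated with a per-frame spatio-temporal gradient energy. The sketch below flags frames whose total gradient magnitude exceeds a threshold; Laptev's actual detector is more involved (scale-space filtering and a spatio-temporal cornerness criterion), so this is only a simplification of the idea.

```python
# Hedged sketch: per-frame spatio-temporal gradient energy as a crude
# motion-saliency score over a video volume.
import numpy as np

def salient_frame_indices(frames, threshold):
    # frames: (T, H, W) grayscale video volume
    volume = np.asarray(frames, dtype=np.float32)
    gt, gy, gx = np.gradient(volume)              # derivatives along t, y, x
    energy = (gx ** 2 + gy ** 2 + gt ** 2).sum(axis=(1, 2))   # one score per frame
    return np.nonzero(energy > threshold)[0]      # candidate salient (motion) frames
```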
APA, Harvard, Vancouver, ISO, and other styles
36

Breslav, Mikhail. "3D pose estimation of flying animals in multi-view video datasets." Thesis, 2016. https://hdl.handle.net/2144/19720.

Full text
Abstract:
Flying animals such as bats, birds, and moths are actively studied by researchers wanting to better understand these animals’ behavior and flight characteristics. Towards this goal, multi-view videos of flying animals have been recorded both in laboratory conditions and natural habitats. The analysis of these videos has shifted over time from manual inspection by scientists to more automated and quantitative approaches based on computer vision algorithms. This thesis describes a study on the largely unexplored problem of 3D pose estimation of flying animals in multi-view video data. This problem has received little attention in the computer vision community where few flying animal datasets exist. Additionally, published solutions from researchers in the natural sciences have not taken full advantage of advancements in computer vision research. This thesis addresses this gap by proposing three different approaches for 3D pose estimation of flying animals in multi-view video datasets, which evolve from successful pose estimation paradigms used in computer vision. The first approach models the appearance of a flying animal with a synthetic 3D graphics model and then uses a Markov Random Field to model 3D pose estimation over time as a single optimization problem. The second approach builds on the success of Pictorial Structures models and further improves them for the case where only a sparse set of landmarks are annotated in training data. The proposed approach first discovers parts from regions of the training images that are not annotated. The discovered parts are then used to generate more accurate appearance likelihood terms which in turn produce more accurate landmark localizations. The third approach takes advantage of the success of deep learning models and adapts existing deep architectures to perform landmark localization. Both the second and third approaches perform 3D pose estimation by first obtaining accurate localization of key landmarks in individual views, and then using calibrated cameras and camera geometry to reconstruct the 3D position of key landmarks. This thesis shows that the proposed algorithms generate first-of-a-kind and leading results on real-world datasets of bats and moths, respectively. Furthermore, a variety of resources are made freely available to the public to further strengthen the connection between research communities.
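The final reconstruction step described above, recovering 3D landmark positions from calibrated cameras, is standard linear triangulation. A minimal two-view sketch with OpenCV follows; the thesis combines localizations from multiple views, so this is only the basic building block.

```python
# Hedged sketch: linear triangulation of one landmark from two calibrated
# views using OpenCV's triangulatePoints.
import numpy as np
import cv2

def triangulate_landmark(P1, P2, xy1, xy2):
    # P1, P2: 3x4 camera projection matrices; xy1, xy2: pixel coordinates
    pts1 = np.asarray(xy1, dtype=np.float64).reshape(2, 1)
    pts2 = np.asarray(xy2, dtype=np.float64).reshape(2, 1)
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)   # homogeneous 4x1 result
    return (X_h[:3] / X_h[3]).ravel()                 # 3D point in world coordinates
```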
APA, Harvard, Vancouver, ISO, and other styles
37

Hult, Jim, and Pontus Pihl. "Inspecting product quality with computer vision techniques : Comparing traditional image processingmethodswith deep learning methodson small datasets in finding surface defects." Thesis, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hj:diva-54056.

Full text
Abstract:
Quality control is an important part of any production line. It can be done manually but is most efficient if automated. Inspecting quality can include many different processes, but this thesis is focused on visual inspection for cracks and scratches. The best way of doing this at the time of writing is with the help of Artificial Intelligence (AI), more specifically Deep Learning (DL). However, these methods need a training dataset to learn from, and for some smaller companies this might not be an option. This study tries to find an alternative visual inspection method that does not rely on a trained deep learning model, for cases when training data is severely limited. Our method is to use edge detection algorithms in combination with a template to find any edge that does not belong. These include scratches, cracks, or misaligned stickers. These anomalies are then highlighted in the original picture to show where the defect is. Since deep learning is the state of the art in visual inspection, it is expected to outperform template matching when sufficiently trained. To find where this occurs, the accuracy of template matching is compared to the accuracy of a deep learning model at different training levels. The deep learning model is trained on image-augmented datasets of size 6, 12, 24, 48, 84, 126, 180, 210, 315, and 423. Both template matching and the deep learning model were tested on the same balanced dataset of size 216. Half of the dataset consisted of images of scratched units, and the other half of unscratched units. This gave a baseline of 50%, where anything under would be worse than guessing. Template matching achieved an accuracy of 88%, and the deep learning model's accuracy rose from 51% to 100% as the training set increased. This makes template matching more accurate than a model trained on a dataset of 84 images or smaller, but a deep learning model trained on 126 images does start to outperform template matching. Template matching performed well where no data was available and training a deep learning model was not an option. Unlike a deep learning model, template matching would not need retraining to find other kinds of surface defects; it could also be used to find, for example, misplaced stickers, since any edge that does not match the template is detected. The ways to train a deep learning model are highly customizable to the user's needs; due to resource and knowledge restrictions, a deep dive into this subject was not conducted. For template matching, only Canny edge detection was used when measuring accuracy. Other edge detection methods, such as Sobel and Prewitt, were ruled out earlier in this study.
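The edge-plus-template comparison described above can be sketched in a few lines of OpenCV: edges of the inspected unit that have no counterpart in a (dilated) reference edge map are kept as defect candidates and highlighted. Thresholds and the tolerance are illustrative, not the thesis's tuned values.

```python
# Hedged sketch of Canny-based template comparison for surface-defect
# detection: unexpected edges are isolated and marked in red.
import cv2
import numpy as np

def defect_edges(template_gray, test_gray, tolerance_px=3):
    template_edges = cv2.Canny(template_gray, 50, 150)
    test_edges = cv2.Canny(test_gray, 50, 150)
    kernel = np.ones((2 * tolerance_px + 1, 2 * tolerance_px + 1), np.uint8)
    allowed = cv2.dilate(template_edges, kernel)               # expected edges plus slack
    return cv2.bitwise_and(test_edges, cv2.bitwise_not(allowed))  # edges that do not belong

def highlight(test_bgr, anomalies):
    out = test_bgr.copy()
    out[anomalies > 0] = (0, 0, 255)                           # mark defect pixels in red
    return out
```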
APA, Harvard, Vancouver, ISO, and other styles
38

RUSSO, PAOLO. "Broadening deep learning horizons: models for RGB and depth images adaptation." Doctoral thesis, 2020. http://hdl.handle.net/11573/1365047.

Full text
Abstract:
Deep Learning has revolutionized the whole field of Computer Vision. Very deep models with a huge number of parameters have been successfully applied to big image datasets for difficult tasks like object classification, person re-identification, and semantic segmentation. Two-fold results have been obtained: on one hand, astonishing performance, with accuracy often comparable to or better than a human counterpart; on the other, the development of robust, complex and powerful visual features which exhibit the ability to generalize to new visual tasks. Still, the success of Deep Learning methods relies on the availability of big datasets: whenever the available labeled data is limited or redundant, a deep neural network model will typically overfit on the training data, showing poor performance on new, unseen data. A typical solution used by the Deep Learning community in those cases is to rely on Transfer Learning techniques; among the several available methods, the most successful one has been to pre-train the deep model on a big heterogeneous dataset (like ImageNet) and then to finetune the model on the available training data. Among several fields of application, this approach has been heavily used by the robotics community for object recognition on depth images. Depth images are usually provided by depth sensors (e.g., Kinect) and their availability is somewhat scarce: the biggest depth image dataset publicly available includes 50,000 samples, making the use of a pre-trained network the only successful method to exploit deep models on depth data. Without any doubt, this method provides suboptimal results, as the network is trained on traditional RGB images having very different perceptual information with respect to depth maps; better results could be obtained if a big enough depth dataset were available, enabling the training of a deep model from scratch. Another frequent issue is the difference in statistical properties between training and test data (domain gap). In this case, even in the presence of enough training data, the generalization ability of the model will be poor, thus requiring a Domain Adaptation method able to reduce the domain gap; this can improve both the robustness of the model and its final classification performance. In this thesis both problems have been tackled by developing a series of Deep Learning solutions for Domain Adaptation and Transfer Learning tasks on RGB and depth image domains: a new synthetic depth image dataset is presented, showing the performance of a deep model trained from scratch on depth-only data. At the same time, a new powerful depth-RGB mapping module is analyzed, to optimize classification accuracy on depth image tasks while using deep models pretrained on ImageNet. The study of the depth domain ends with a recurrent neural network for egocentric action recognition capable of exploiting depth images as an additional source of attention. A novel GAN model and a hybrid pixel/feature adaptation architecture for RGB images have been developed: the former for single-domain adaptation tasks, the latter for multi-domain adaptation and generalization tasks. Finally, a preliminary approach to the problem of multi-source Domain Adaptation on a semantic segmentation task is examined, based on the combination of a multi-branch segmentation model and an adversarial technique, capable of exploiting all the available synthetic training datasets and increasing the overall performance.
The performance obtained by the proposed algorithms is often better than or equivalent to the currently available state-of-the-art methods on several datasets and domains, demonstrating the superiority of our approach. Moreover, our analysis shows that creating ad-hoc domain adaptation and transfer learning techniques is mandatory in order to obtain the best accuracy in the presence of any domain gap, with little or negligible additional computational cost.
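The transfer-learning baseline discussed in this abstract, pre-training on ImageNet and finetuning on depth data, can be sketched with torchvision (assumed here, version 0.13 or later for the weights API). Depth maps are assumed to be replicated or colorized into three channels before being fed to the RGB-pretrained backbone; this is a generic sketch, not the thesis's depth-RGB mapping module.

```python
# Hedged sketch: finetuning an ImageNet-pretrained CNN on three-channel
# depth inputs (torchvision >= 0.13 assumed; class count is illustrative).
import torch
import torch.nn as nn
from torchvision import models

def depth_classifier(num_classes):
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # new head for depth classes
    return model

model = depth_classifier(num_classes=51)     # hypothetical number of object categories
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
# training loop over (three-channel depth tensor, label) batches omitted
```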
APA, Harvard, Vancouver, ISO, and other styles
39

(8771429), Ashley S. Dale. "3D OBJECT DETECTION USING VIRTUAL ENVIRONMENT ASSISTED DEEP NETWORK TRAINING." Thesis, 2021.

Find full text
Abstract:

An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic data and real-world data, F1 scores improved in four of the five classes: the average maximum F1-score of all classes and all epochs for the networks trained with synthetic data is F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1-score for synthetically trained networks is σ*_F1 = 0.015, compared to σ_F1 = 0.020 for the networks trained exclusively with real data. Various backgrounds in synthetic data were shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder and then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
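The latent-space analysis step described above can be reproduced in outline with scikit-learn and umap-learn (both assumed installed). The arrays below are placeholders standing in for the VAE latent vectors and the real/synthetic labels; plotting the reduced coordinates colored by label reveals whether the two sources cluster apart or mix.

```python
# Hedged sketch of PCA + UMAP inspection of VAE latent vectors
# (placeholder data; not the thesis's actual latents).
import numpy as np
from sklearn.decomposition import PCA
import umap

latents = np.random.rand(500, 128)           # placeholder for VAE latent vectors
is_synthetic = np.random.rand(500) > 0.5     # placeholder real/synthetic labels

pca_coords = PCA(n_components=2).fit_transform(latents)
umap_coords = umap.UMAP(n_neighbors=15, min_dist=0.1).fit_transform(latents)
# scatter-plot pca_coords / umap_coords colored by is_synthetic to check
# for separation between real and synthetic samples.
```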

APA, Harvard, Vancouver, ISO, and other styles
