
Dissertations / Theses on the topic 'Neural networks; Visual information'


Consult the top 50 dissertations / theses for your research on the topic 'Neural networks; Visual information.'


1

Song, Yue. "Towards Multi-Scale Visual Explainability for Convolutional Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281359.

Full text
Abstract:
Explainability methods seek to produce visual explanations for neural network decisions. Existing techniques mainly fall into two categories: backpropagation-based methods and occlusion-based methods. The former category selectively highlights the computed gradients, while the latter occludes the input to maximally confuse the classifier and visualize the distinct regions. Motivated by the occlusion methods, we propose an explainability model which, to our knowledge, is the first attempt to extract multi-scale explanations by perturbing the intermediate representations. Furthermore, we present two visualization techniques that can fuse the multi-scale explanations into a single image and suggest a general evaluation metric to assess the explanation's quality. Both qualitative and quantitative experimental results on several kinds of datasets demonstrate the efficacy of our model.
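The occlusion idea that motivates this work can be pictured with a minimal single-scale sketch: slide a grey patch over the input and record how much the target-class score drops. The `predict` callable, patch size and stride below are placeholders, and the thesis itself goes further by perturbing intermediate feature maps at several scales rather than the raw input.

```python
import numpy as np

def occlusion_saliency(image, predict, target_class, patch=8, stride=4):
    """Classic single-scale occlusion map: slide a grey patch over the input
    and record how much the target-class score drops at each location."""
    h, w = image.shape[:2]
    base_score = predict(image)[target_class]
    heatmap = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = image.mean()  # grey out a region
            heatmap[i, j] = base_score - predict(occluded)[target_class]
    return heatmap  # large values = regions the classifier relies on
```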
2

Newman, Rhys A. "Automatic learning in computer vision." Thesis, University of Oxford, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.390526.

Full text
3

Mayer, Nikolaus [Verfasser], and Thomas [Akademischer Betreuer] Brox. "Synthetic training data for deep neural networks on visual correspondence tasks." Freiburg : Universität, 2020. http://d-nb.info/1216826692/34.

Full text
4

Yavari, Najib. "Few-Shot Learning with Deep Neural Networks for Visual Quality Control: Evaluations on a Production Line." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-283119.

Full text
Abstract:
Having a representative and adequate amount of data samples plays an important role in the success of deep learning algorithms used for image recognition. On the other hand, collecting and manually labeling a large-scale dataset requires a great deal of human interaction, which in turn is very time-consuming. In this thesis project, we explore the possibilities of new deep-learning approaches for image recognition that do not require a large amount of data. Since Few-Shot Learning (FSL) models are considered the most promising approach to tackle the problem of not having an adequate dataset, a handful of state-of-the-art FSL algorithms, such as Model-Agnostic Meta-Learning (MAML), Prototypical Networks (ProtoNet), Relation Networks (RelationNet), Baseline, and Baseline++, are implemented and analyzed. These models are used to classify a series of issues for the automation of visual quality inspection in a production line. Moreover, the performance of deeper networks in comparison to shallower networks is explored. Our experimental results on the available dataset show that the Baseline++ model has the best performance among the models. Furthermore, Baseline++ with a six-layer convolutional network as a feature backbone is a relatively simple model to train and does not require high computational power compared to the other models.
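As an illustration of the few-shot approach compared here, below is a minimal sketch of the Prototypical Networks classification step: class prototypes are the mean support embeddings, and queries are assigned to the nearest prototype. The embedding arrays are assumed to come from a feature backbone (e.g. the six-layer convolutional network mentioned in the abstract); all names are illustrative.

```python
import numpy as np

def prototypical_predict(support_emb, support_labels, query_emb):
    """Classify query embeddings by distance to class prototypes (mean support
    embeddings), in the spirit of Prototypical Networks (Snell et al., 2017)."""
    classes = np.unique(support_labels)
    prototypes = np.stack([support_emb[support_labels == c].mean(axis=0) for c in classes])
    # squared Euclidean distance from every query to every prototype
    d2 = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    return classes[np.argmin(d2, axis=1)]
```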
5

Aboudib, Ala. "Neuro-inspired Architectures for the Acquisition and Processing of Visual Information." Thesis, Télécom Bretagne, 2016. http://www.theses.fr/2016TELB0419/document.

Full text
Abstract:
Computer vision and machine learning are two hot research topics that have witnessed major breakthroughs in recent years. Much of the progress in these domains has been the fruit of many years of research on the visual cortex and brain function. In this thesis, we focus on designing neuro-inspired architectures for processing information along three different stages of the visual cortex. At the lowest stage, we propose a neural model for the acquisition of visual signals. This model is adapted to emulating eye movements and is closely inspired by the function and the architecture of the retina and early layers of the ventral stream. At the highest stage, we address the memory problem. We focus on an existing neuro-inspired associative memory model called the Sparse Clustered Network. We propose a new information retrieval algorithm that offers more flexibility and better performance than existing ones. Furthermore, we suggest a generic formulation within which all existing retrieval algorithms can fit. It can also be used to guide the design of new retrieval approaches in a modular fashion. At the intermediate stage, we propose a new way of dealing with the image feature correspondence problem using a neural network model. This model deploys the structure of Sparse Clustered Networks, offers a gain in matching performance over the state of the art, and provides useful insight into how neuro-inspired architectures can serve as a substrate for implementing various vision tasks.
6

Ajamlou, Kevin, and Max Sonebäck. "Multimodal Convolutional Graph Neural Networks for Information Extraction from Visually Rich Documents." Thesis, Uppsala universitet, Avdelningen för visuell information och interaktion, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-445457.

Full text
Abstract:
Monotonous and repetitive tasks consume a lot of time and resources in businesses today, and the incentive to fully or partially automate such tasks, in order to relieve office workers and increase productivity in the industry, is therefore high. One such task is to process and extract information from Visually Rich Documents (VRDs), i.e., documents where the visual attributes carry important information about the contents of the document. Many recent studies have focused on information extraction from invoices, where graph-based convolutional neural networks have shown a lot of promise for extracting relevant entities. By modelling the invoice as a graph, the text of the invoice can be modelled as nodes, and the topological relationship between nodes, i.e., the visual representation of the document, can be preserved by connecting the nodes through edges. The idea is then to propagate the features of neighboring nodes to each other in order to find meaningful patterns for distinct entities in the document, based on both the features of the node itself and the features of its neighbors. This master thesis aims to investigate, analyze and compare the performance of state-of-the-art multimodal graph-based convolutional neural networks, as well as evaluate how well the models generalize across unseen invoice templates. Three models, with two different architecture designs, have been trained with either underlying ChebNet or GCN convolutional layers. Two of these models have been re-trained, and compared to their predecessors, using the over-smoothing-combatting technique DropEdge. All models have been tested on two datasets: one containing both seen and unseen templates, and a subset of the previous dataset containing only invoices with unseen templates. The results show that multimodal graph-based convolutional neural networks are a viable option for information extraction from invoices and that the models built in this thesis show great potential to generalize across unseen invoice templates. Moreover, due to the inherently sparse nature of graphs modelled from invoices, DropEdge does not yield an overall better performance for the models.
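A minimal sketch of the graph-propagation idea described above, assuming each text box of the invoice is a node with a feature vector and spatially neighbouring boxes are connected by edges; the normalisation follows the common GCN formulation, and the feature dimensions and adjacency below are toy values rather than anything from the thesis.

```python
import numpy as np

def gcn_layer(node_feats, adj, weight):
    """One graph-convolution step: each text-box node mixes its own features
    with those of spatially neighbouring boxes (Kipf & Welling style)."""
    a_hat = adj + np.eye(adj.shape[0])                 # add self-loops
    deg = a_hat.sum(axis=1)
    norm = a_hat / np.sqrt(np.outer(deg, deg))         # symmetric normalisation
    return np.maximum(norm @ node_feats @ weight, 0)   # propagate + ReLU

# toy invoice: 3 text boxes with 4-dim features (e.g. text + position encodings)
feats = np.random.rand(3, 4)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)  # box 1 neighbours boxes 0 and 2
hidden = gcn_layer(feats, adj, np.random.rand(4, 8))
```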
7

Michler, Frank [Verfasser], and Thomas [Akademischer Betreuer] Wachtler. "Self-Organization of Spiking Neural Networks for Visual Object Recognition / Frank Michler ; Betreuer: Thomas Wachtler." Marburg : Philipps-Universität Marburg, 2020. http://d-nb.info/1204199876/34.

Full text
8

Dercksen, Vincent Jasper [Verfasser]. "Visual computing techniques for the reconstruction and analysis of anatomically realistic neural networks / Vincent Jasper Dercksen." Berlin : Freie Universität Berlin, 2016. http://d-nb.info/1081935391/34.

Full text
9

Tong, Song. "Informatics Approaches for Understanding Human Facial Attractiveness Perception and Visual Attention." Doctoral thesis, Kyoto University, 2021. http://hdl.handle.net/2433/264679.

Full text
Abstract:
Kyoto University
Doctor of Informatics (new-system doctoral programme)
Degree No. Kou 23398; Informatics No. 767
Call number: 新制||情||131 (University Library)
Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
Examining committee: Prof. Takatsune Kumada (chief examiner), Prof. Shin'ya Nishida, Prof. Jun Saiki, Assoc. Prof. Shohei Nobuhara
Qualified under Article 4, Paragraph 1 of the Degree Regulations
DFAM
10

Salem, Tawfiq. "Learning to Map the Visual and Auditory World." UKnowledge, 2019. https://uknowledge.uky.edu/cs_etds/86.

Full text
Abstract:
The appearance of the world varies dramatically not only from place to place but also from hour to hour and month to month. Billions of images that capture this complex relationship are uploaded to social-media websites every day and often are associated with precise time and location metadata. This rich source of data can be beneficial to improve our understanding of the globe. In this work, we propose a general framework that uses these publicly available images for constructing dense maps of different ground-level attributes from overhead imagery. In particular, we use well-defined probabilistic models and a weakly-supervised, multi-task training strategy to provide an estimate of the expected visual and auditory ground-level attributes consisting of the type of scenes, objects, and sounds a person can experience at a location. Through a large-scale evaluation on real data, we show that our learned models can be used for applications including mapping, image localization, image retrieval, and metadata verification.
11

Jiu, Mingyuan. "Spatial information and end-to-end learning for visual recognition." Thesis, Lyon, INSA, 2014. http://www.theses.fr/2014ISAL0038/document.

Full text
Abstract:
In this thesis, we present our research on visual recognition and machine learning. Two types of visual recognition problems are investigated: an action recognition problem and a human body part segmentation problem. Our objective is to combine spatial information, such as label configuration in feature space or the spatial layout of labels, into an end-to-end framework to improve recognition performance. For human action recognition, we apply the bag-of-words model and reformulate it as a neural network for end-to-end learning. We propose two algorithms that make use of label configuration in feature space to optimize the codebook. One is based on classical error backpropagation: the codewords are adjusted using gradient descent. The other is based on cluster reassignments, where the cluster labels are reassigned for all the feature vectors in a Voronoi diagram. As a result, the codebook is learned in a supervised way. We demonstrate the effectiveness of the proposed algorithms on the standard KTH human action dataset. For human body part segmentation, we treat the segmentation problem as a classification problem, where a classifier acts on each pixel. Two machine learning frameworks are adopted: randomized decision forests and convolutional neural networks. We integrate a priori information on the spatial part layout, in terms of pairs of labels or pairs of pixels, into both frameworks in the training procedure to make the classifier more discriminative, while pixelwise classification is still performed in the testing stage. Three algorithms are proposed: (i) the spatial part layout is integrated into the randomized decision forest training procedure; (ii) spatial pre-training is proposed for feature learning in the ConvNets; (iii) spatial learning is proposed in the logistic regression (LR) or multilayer perceptron (MLP) classifier.
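The bag-of-words encoding that the thesis turns into a trainable network layer can be sketched as follows. The soft (RBF-like) assignment here is an illustrative choice that makes the histogram differentiable with respect to the codebook; the thesis also considers hard Voronoi assignments updated by cluster reassignment.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Soft-assignment bag-of-words encoding: each local descriptor votes for
    codewords by (negative) squared distance, giving a differentiable histogram
    through which a classifier loss can be backpropagated into the codebook."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    weights = np.exp(-(d2 - d2.min(axis=1, keepdims=True)))  # stabilised soft assignment
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.sum(axis=0) / len(descriptors)

# toy usage: 100 local descriptors of dimension 16, a codebook of 32 words
rng = np.random.default_rng(0)
hist = bow_histogram(rng.normal(size=(100, 16)), rng.normal(size=(32, 16)))
```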
12

Shunmugam, Nagarajan. "Operational data extraction using visual perception." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-292216.

Full text
Abstract:
The information era has led truck manufacturers and logistics solution providers to move towards software-as-a-service (SaaS) based solutions. With advancements in software technologies like artificial intelligence and deep learning, the domain of computer vision has achieved performance boosts significant enough to compete with hardware-based solutions. Firstly, data is typically collected from a large number of sensors, which can increase production costs and the carbon footprint. Secondly, certain useful physical quantities/variables are impossible to measure or turn out to be very expensive to obtain. In this dissertation, we therefore investigate the feasibility of providing a similar solution using a single sensor (a dashboard camera) to measure multiple variables. This provides a sustainable solution even when scaled up to huge fleets. The video frames that can be collected from the visual perception of the truck (i.e. the on-board camera of the truck) are processed with deep learning techniques, and operational data can be extracted. Techniques such as image classification and semantic segmentation were experimented with and show potential to replace costly hardware counterparts such as lidar- or radar-based solutions.
13

Dang, Hieu. "Adaptive multiobjective memetic optimization: algorithms and applications." Journal of Cognitive Informatics and Natural Intelligence, 2012. http://hdl.handle.net/1993/30856.

Full text
Abstract:
The thesis presents research on multiobjective optimization based on memetic computing and its applications in engineering. We have introduced a framework for adaptive multiobjective memetic optimization algorithms (AMMOA) with an information theoretic criterion for guiding the selection, clustering, and local refinements. A robust stopping criterion for AMMOA has also been introduced to solve non-linear and large-scale optimization problems. The framework has been implemented for different benchmark test problems with remarkable results. This thesis also presents two applications of these algorithms. First, an optimal image data hiding technique has been formulated as a multiobjective optimization problem with conflicting objectives. In particular, trade-off factors in designing an optimal image data hiding are investigated to maximize the quality of watermarked images and the robustness of watermark. With the fixed size of a logo watermark, there is a conflict between these two objectives, thus a multiobjective optimization problem is introduced. We propose to use a hybrid between general regression neural networks (GRNN) and the adaptive multiobjective memetic optimization algorithm (AMMOA) to solve this challenging problem. This novel image data hiding approach has been implemented for many different test natural images with remarkable robustness and transparency of the embedded logo watermark. We also introduce a perceptual measure based on the relative Rényi information spectrum to evaluate the quality of watermarked images. The second application is the problem of joint spectrum sensing and power control optimization for a multichannel, multiple-user cognitive radio network. We investigated trade-off factors in designing efficient spectrum sensing techniques to maximize the throughput and minimize the interference. To maximize the throughput of secondary users and minimize the interference to primary users, we propose a joint determination of the sensing and transmission parameters of the secondary users, such as sensing times, decision threshold vectors, and power allocation vectors. There is a conflict between these two objectives, thus a multiobjective optimization problem is used again in the form of AMMOA. This algorithm learns to find optimal spectrum sensing times, decision threshold vectors, and power allocation vectors to maximize the averaged opportunistic throughput and minimize the averaged interference to the cognitive radio network.
February 2016
14

Hazarika, Subhashis. "Statistical and Machine Learning Approaches For Visualizing and Analyzing Large-Scale Simulation Data." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1574692702479196.

Full text
15

Finfando, Filip. "Indoor scene verification : Evaluation of indoor scene representations for the purpose of location verification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-288856.

Full text
Abstract:
When the human visual system looks at two pictures taken in some indoor location, it is fairly easy to tell whether they were taken in exactly the same place, even when the location has never been visited in reality. This is possible because we can pay attention to multiple factors such as spatial properties (window shape, room shape), common patterns (floor, walls) or the presence of specific objects (furniture, lighting). Changes in camera pose, illumination, furniture location or digital alteration of the image (e.g. watermarks) have little influence on this ability. Traditional approaches to measuring the perceptual similarity of images have struggled to reproduce this skill. This thesis defines the Indoor Scene Verification (ISV) problem as distinguishing whether two indoor scene images were taken in the same indoor space or not. It explores the capabilities of state-of-the-art perceptual similarity metrics by introducing two new datasets designed specifically for this problem. Perceptual hashing, ORB, FaceNet and NetVLAD are evaluated as the baseline candidates. The results show that NetVLAD provides the best results on both datasets and is therefore chosen as the baseline for the experiments aiming to improve it. Three such experiments are carried out, testing the impact of using a different training dataset, changing the deep neural network architecture, and introducing a new loss function. Quantitative analysis of the AUC score shows that switching from VGG16 to MobileNetV2 allows for an improvement over the baseline.
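The verification step itself can be reduced to comparing two global image descriptors, as in this minimal sketch; the descriptors are assumed to come from a backbone such as VGG16 or MobileNetV2 with NetVLAD-style pooling, and the threshold is an illustrative value that would in practice be swept to produce the AUC score.

```python
import numpy as np

def same_place(desc_a, desc_b, threshold=0.8):
    """Verify whether two images show the same indoor space by thresholding the
    cosine similarity of their global descriptors (e.g. NetVLAD-style vectors)."""
    a = desc_a / np.linalg.norm(desc_a)
    b = desc_b / np.linalg.norm(desc_b)
    similarity = float(a @ b)
    return similarity >= threshold, similarity
```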
16

Baddeley, Roland. "Visual statistics using neural networks." Thesis, University of Stirling, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.259833.

Full text
17

Lee, Ji Young Ph D. Massachusetts Institute of Technology. "Information extraction with neural networks." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/111905.

Full text
Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2017.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 85-97).
Electronic health records (EHRs) have been widely adopted, and are a gold mine for clinical research. However, EHRs, especially their text components, remain largely unexplored due to the fact that they must be de-identified prior to any medical investigation. Existing systems for de-identification rely on manual rules or features, which are time-consuming to develop and fine-tune for new datasets. In this thesis, we propose the first de-identification system based on artificial neural networks (ANNs), which achieves state-of-the-art results without any human-engineered features. The ANN architecture is extended to incorporate features, further improving the de-identification performance. Under practical considerations, we explore transfer learning to take advantage of a large annotated dataset to improve the performance on datasets with a limited number of annotations. The ANN-based system is publicly released as an easy-to-use software package for general-purpose named-entity recognition as well as de-identification. Finally, we present an ANN architecture for relation extraction, which ranked first in the SemEval-2017 task 10 (ScienceIE) for relation extraction in scientific articles (subtask C).
by Ji Young Lee.
Ph. D.
18

Rabi, Gihad. "Visual speech recognition by recurrent neural networks." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape16/PQDD_0010/MQ36169.pdf.

Full text
19

Sørngård, Bård. "Information Theory for Analyzing Neural Networks." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for datateknikk og informasjonsvitenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-26773.

Full text
Abstract:
The goal of this thesis was to investigate how information theory could be used to analyze artificial neural networks. For this purpose, two problems, a classification problem and a controller problem, were considered. The classification problem was solved with a feedforward neural network trained with backpropagation, while the controller problem was solved with a continuous-time recurrent neural network optimized with evolution. Results from the classification problem show that mutual information might indicate how much a particular neuron contributes to the classification. Tracking these neurons' mutual information during training might serve as an indicator of their progression, including for neurons in the hidden layers. Results from the controller problem showed that time-delayed mutual information between a neuron and an environment variable might indicate which variable each neuron is estimating, and tracking this during evolution might tell us when this particular neuron started taking on this role. Furthermore, unrolled transfer entropy appears to be a good measure of how neurons affect each other during simulation.
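A simple plug-in estimate of the mutual information between a neuron's activation and a discrete label, of the kind tracked during training in the abstract above, might look like the sketch below; the binning scheme is an illustrative choice, and the time-delayed variant is the same computation with one signal shifted in time.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in estimate (in bits) of I(X;Y) between a neuron's activation x and
    a discrete variable y (class label or environment variable), via a joint
    histogram over binned activations and labels."""
    joint, _, _ = np.histogram2d(x, y, bins=(bins, len(np.unique(y))))
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal over activation bins
    py = pxy.sum(axis=0, keepdims=True)   # marginal over labels
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

# toy check: an activation that tracks a binary label carries roughly 1 bit about it
labels = np.random.randint(0, 2, size=5000)
activation = labels + 0.1 * np.random.randn(5000)
print(round(mutual_information(activation, labels), 2))
```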
20

Hodge, Victoria J. "Integrating information retrieval & neural networks." Thesis, University of York, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.247019.

Full text
21

Folta, Kristian. "Neural mechanisms of lateralized visual information processing." [S.l.] : [s.n.], 2005. http://deposit.ddb.de/cgi-bin/dokserv?idn=973557702.

Full text
22

Barraclough, Nicholas Edward. "The neural processing of visual motion information." Thesis, University of Nottingham, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.395577.

Full text
23

Cheetham, Emma. "The neural networks recruited during visual feature binding." Thesis, Cardiff University, 2014. http://orca.cf.ac.uk/68612/.

Full text
Abstract:
The binding problem presents one of the most challenging questions in psychology and cognitive neuroscience, despite its seemingly effortless resolution in daily life. Binding of visual features begins with stimulation of peripheral receptors and ends with the emergence of a perceived object, yet many questions remain unanswered about the nature of the intervening mechanisms. The primary focus of this thesis was to elucidate neurocognitive processes that support the binding of features into a coherent object. Experiment 1 sought to dissociate neural correlates of feature binding from spatial and temporal attention, which are frequently conflated in previous studies. Results showed a widespread network engaged during both forms of attention, without any significant clusters of activity in response to an explicit feature-binding task. One explanation for these results may lie in evidence that feature binding is a spontaneous process that happens implicitly upon observing an object. Therefore, in order to measure the network associated with implicit visual feature binding, the established reviewing paradigm was employed in the subsequent studies. Experiments 2 and 3 sought to replicate key aspects of the reviewing paradigm. The reviewing paradigm exploits the finding that when an object is shown in close spatial and temporal succession to another object it is perceived as a continuation of the same object. Therefore, if a feature changes between the initial object and the second presentation of this object, then a rebinding of features occurs and a behavioural cost termed a partial repetition cost is often incurred. In order to observe the impact of a relevant feature change compared with an irrelevant feature change, the reviewing paradigm was modified. Results indicated that an irrelevant feature change carried with it a reaction time (RT) cost almost as large as the RT cost observed following a relevant feature change. Experiment 4 aimed to observe the neural network recruited during completion of the reviewing task using fMRI and a whole-brain analysis. Results showed a widespread network encompassing bilateral frontal and occipital areas. Furthermore, the network that was recruited during the irrelevant feature change condition was different from that engaged during the relevant feature change condition. In order to probe the causality of these activations, experiment 5 applied offline transcranial magnetic stimulation (TMS) to three key cortical areas: the right lateral occipital complex, the left superior frontal gyrus and the left post-central gyrus. The overarching conclusion of this thesis is that feature binding is an implicit and spontaneous process that is coordinated by a wider cortical network than expected from previous research. The parietal cortex has often been regarded as the key area in which object representations become bound; however, the results of this thesis do not support a unique or privileged role of this area in binding. The latter experiments show that feature binding is an interaction between the memory trace, action-based implications and perceptual demands of an object. How the brain coordinates this widespread cortical network during feature binding is a key question for future research involving TMS and brain imaging techniques.
24

Olde, Scheper Tjeerd. "Chaos and information in dynamic neural networks." Thesis, Oxford Brookes University, 2002. https://radar.brookes.ac.uk/radar/items/e2a920c8-ff78-4ad6-adf3-8217d18c3b96/1/.

Full text
Abstract:
This research attempts to identify and model ways to store information in dynamic, chaotic neural networks. The justification for this research is given by both biological and theoretical motivations [2, 27, 29, 46, 60, 107]. Firstly, there seems to be substantial support for the use of dynamic networks to study more complex and interesting behaviour. Artificial neural networks (ANN) have specific properties that define their order, such as size, type and function. Simply extending the ANN with complex non-linear dynamics does not improve the memory performance of the network; it modifies the rate at which a global minimum may be located, if such a state exists. Using non-linear differential equations may add more complexity to the system and thereby increase the possible memory states. Secondly, even though chaos is generally undesirable, it has important properties that may be exploited to store and retrieve information [98]. These are the space filling, the possibility of control via delayed feedback, synchronization and the sensitive dependence on initial conditions. It is demonstrated in this thesis that by using delayed feedback, Unstable Periodic Orbits (UPO) may be stabilized to reduce the complexity of a chaotic system to n-periodic behaviour. This is a well-known effect of delayed control in many types of chaotic models (e.g. the Rossler equation), and the periodicity of the resulting orbit is determined by the model parameters as well as the delay (T) and the feedback strength (K) of the control function (F). Even though a theoretically infinite number of UPOs exist within a chaotic attractor, only some can practically be stabilized. Furthermore, it is shown that input added to the delayed feedback controlled system allows different orbits to be stabilized. The addition of multiple delays changes the number and types of orbits that are available for stabilization. The use of synchronization between similar sets of chaotic systems may be used to target specific orbits.
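The delayed-feedback (Pyragas-style) control described above is commonly demonstrated on the Rossler system; the sketch below adds the control term F = K(x(t-T) - x(t)) to the x-equation, using simple Euler integration and illustrative parameter values. Whether a periodic orbit is actually stabilized depends on the choice of K and T (T is usually taken near the period of the targeted orbit).

```python
import numpy as np

def rossler_pyragas(a=0.2, b=0.2, c=5.7, K=0.2, T=5.9, dt=0.01, steps=60000):
    """Rossler system with Pyragas delayed feedback F = K*(x(t-T) - x(t)) added
    to the x-equation; with suitable K and T the chaotic trajectory can settle
    onto a periodic (formerly unstable) orbit."""
    delay = int(T / dt)
    xs = np.zeros(steps)
    x, y, z = 1.0, 1.0, 1.0
    for i in range(steps):
        xs[i] = x
        f = K * (xs[i - delay] - x) if i >= delay else 0.0  # delayed control signal
        dx = -y - z + f
        dy = x + a * y
        dz = b + z * (x - c)
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
    return xs  # inspect the tail of xs to see whether the orbit has become periodic
```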
25

Smith, Julian P. "Neural networks, information theory and knowledge representation." Thesis, University of Edinburgh, 1996. http://hdl.handle.net/1842/20801.

Full text
26

Lee, Hyo-Dong. "Visual tasks beyond categorization for training convolutional neural networks." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/106095.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 21-23).
Humans can perceive a variety of visual properties of objects besides their category. In this paper, we explore whether convolutional neural networks (CNNs) can also learn object-related variables. The models are trained for object position, size and pose, respectively, from synthetic images and tested on unseen held-out objects. First, we show that some object properties come "for free" from learning others, and that the pose-optimized model can generalize to both categorical and non-categorical variables. Second, we demonstrate that pre-training the model with pose facilitates learning object categories from both synthetic and realistic images.
by Hyodong Lee.
S.M.
27

Oquab, Maxime. "Convolutional neural networks : towards less supervision for visual recognition." Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE061.

Full text
Abstract:
Convolutional Neural Networks are flexible learning algorithms for computer vision that scale particularly well with the amount of data that is provided for training them. Although these methods had successful applications already in the '90s, they were not used in visual recognition pipelines because of their lesser performance on realistic natural images. It is only after the amount of data and the computational power both reached a critical point that these algorithms revealed their potential during the ImageNet challenge of 2012, leading to a paradigm shift in visual recognition. The first contribution of this thesis is a transfer learning setup with a Convolutional Neural Network for image classification. Using a pre-training procedure, we show that image representations learned in a network generalize to other recognition tasks, and their performance scales up with the amount of data used in pre-training. The second contribution of this thesis is a weakly supervised setup for image classification that can predict the location of objects in complex cluttered scenes, based on a dataset indicating only the presence or absence of objects in training images. The third contribution of this thesis aims at finding possible paths for progress in unsupervised learning with neural networks. We study the recent trend of Generative Adversarial Networks and propose two-sample tests for evaluating models. We investigate possible links with concepts related to causality, and propose a two-sample test method for the task of causal discovery. Finally, building on a recent connection with optimal transport, we investigate what these generative algorithms learn from unlabeled data.
28

Evans, Benjamin D. "Learning transformation-invariant visual representations in spiking neural networks." Thesis, University of Oxford, 2012. https://ora.ox.ac.uk/objects/uuid:15bdf771-de28-400e-a1a7-82228c7f01e4.

Full text
Abstract:
This thesis aims to understand the learning mechanisms which underpin the process of visual object recognition in the primate ventral visual system. The computational crux of this problem lies in the ability to retain specificity to recognize particular objects or faces, while exhibiting generality across natural variations and distortions in the view (DiCarlo et al., 2012). In particular, the work presented is focussed on gaining insight into the processes through which transformation-invariant visual representations may develop in the primate ventral visual system. The primary motivation for this work is the belief that some of the fundamental mechanisms employed in the primate visual system may only be captured through modelling the individual action potentials of neurons and therefore, existing rate-coded models of this process constitute an inadequate level of description to fully understand the learning processes of visual object recognition. To this end, spiking neural network models are formulated and applied to the problem of learning transformation-invariant visual representations, using a spike-time dependent learning rule to adjust the synaptic efficacies between the neurons. The ways in which the existing rate-coded CT (Stringer et al., 2006) and Trace (Földiák, 1991) learning mechanisms may operate in a simple spiking neural network model are explored, and these findings are then applied to a more accurate model using realistic 3-D stimuli. Three mechanisms are then examined, through which a spiking neural network may solve the problem of learning separate transformation-invariant representations in scenes composed of multiple stimuli by temporally segmenting competing input representations. The spike-time dependent plasticity in the feed-forward connections is then shown to be able to exploit these input layer dynamics to form individual stimulus representations in the output layer. Finally, the work is evaluated and future directions of investigation are proposed.
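The rate-coded trace rule (Földiák, 1991) that the abstract above carries over to spiking networks can be sketched in a few lines: the postsynaptic term in the Hebbian update is replaced by a running temporal trace, so temporally adjacent views of an object reinforce the same output cell. The parameter values and normalisation step below are illustrative assumptions, not the thesis's exact formulation.

```python
import numpy as np

def trace_rule_update(w, x_t, y_t, y_trace, eta=0.8, lr=0.01):
    """Földiák-style trace learning step: the postsynaptic activity is replaced
    by a running temporal trace, so inputs occurring close together in time
    (e.g. different views of one object) strengthen the same output cell."""
    y_trace = eta * y_trace + (1.0 - eta) * y_t        # update the memory trace
    w += lr * np.outer(y_trace, x_t)                   # Hebbian update with the trace
    w /= np.maximum(np.linalg.norm(w, axis=1, keepdims=True), 1e-12)  # weight normalisation
    return w, y_trace
```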
29

Wu, Lizhong. "Speech processing with neural networks." Thesis, University of Cambridge, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.259529.

Full text
30

Lester, Ben. "Attentional and Neural Manipulations of Visuospatial Contextual Information." Thesis, University of Oregon, 2013. http://hdl.handle.net/1794/12985.

Full text
Abstract:
A critical function of the human visual system is to parse objects from the larger context of the environment, allowing for the identification of, and potential interaction with, those objects. The use of contextual information allows us to rapidly locate, identify, and interact with objects that appear in the environment. Contextual information can help specify an object's location within the environment (allocentric encoding) or with respect to the observer (egocentric encoding). Understanding how contextual information influences perceptual organization, and the neural systems that process a complex scene, is critical in understanding how contextual information assists in parsing local information from background. In the real world, relying on context is typically beneficial, as most objects occur in circumscribed environments. However, there are circumstances in which context can harm performance. In the case of visual illusions, relying on the context can bias observers' perceptions and cause significant motor errors. Studying the illusory conditions under which perceptual/motor functions are "fooled", or break down, can provide valuable information about how the brain computes allocentric and egocentric frames of reference. The following studies examine how attentional (Chapters II & III) manipulations of visuospatial context affect components of observers' egocentric reference frames (e.g., perceived vertical or subjective midline) and how neural manipulations (Chapter IV) can modulate observers' reliance on contextual information. In Chapter II, the role of attentional control settings in contextual processing is examined. Chapter III addresses the question of how visuospatial shifts of attention interact with an egocentric frame of reference. Finally, Chapter IV examines the functional role of the superior parietal cortex in the processing of egocentric contextual information.
31

Hallum, Luke Edward Graduate School of Biomedical Engineering Faculty of Engineering UNSW. "Prosthetic vision : Visual modelling, information theory and neural correlates." Publisher:University of New South Wales. Graduate School of Biomedical Engineering, 2008. http://handle.unsw.edu.au/1959.4/41450.

Full text
Abstract:
Electrical stimulation of the retina affected by photoreceptor loss (e.g., cases of retinitis pigmentosa) elicits the perception of luminous spots (so-called phosphenes) in the visual field. This phenomenon, attributed to the relatively high survival rates of neurons comprising the retina's inner layer, serves as the cornerstone of efforts to provide a microelectronic retinal prosthesis -- a device analogous to the cochlear implant. This thesis concerns phosphenes -- their elicitation and modulation, and, in turn, image analysis for use in a prosthesis. This thesis begins with a comparative review of visual modelling of electrical epiretinal stimulation and analogous acoustic modelling of electrical cochlear stimulation. The latter models involve coloured noise played to normal listeners so as to investigate speech processing and electrode design for use in cochlear implants. Subsequently, four experiments (three psychophysical and one numerical), and two statistical analyses, are presented. Intrinsic signal optical imaging in cerebral cortex is canvassed in an appendix. The first experiment describes a visual tracking task administered to 20 normal observers afforded simulated prosthetic vision. Fixation, saccade, and smooth pursuit, and the effect of practice, were assessed. Further, an image analysis scheme is demonstrated that, compared to existing approaches, assisted fixation and pursuit (but not saccade) accuracy (35.8% and 6.8%, respectively), and required less phosphene array scanning. Subsequently, (numerical) information-theoretic reasoning is provided for the scheme's superiority. This reasoning was then employed to further optimise the scheme (resulting in a filter comprising overlapping Gaussian kernels), and may be readily extended to arbitrary arrangements of many phosphenes. A face recognition study, wherein stimuli comprised either size- or intensity-modulated phosphenes, is then presented. The study involved unpracticed observers (n=85), and showed no 'size'-versus-'intensity' effect. Overall, a 400-phosphene (100-phosphene) image afforded subjects 89.0% (64.0%) correct recognition (two-interval forced-choice paradigm) when five seconds' scanning was allowed. Performance fell (64.5%) when the 400-phosphene image was stabilised on the retina and presented briefly. Scanning was similar in 400- and 100-phosphene tasks. The final chapter presents the statistical effects of sampling and rendering jitter on the phosphene image. These results may generalise to low-resolution imaging systems involving loosely packed pixels.
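The "filter comprising overlapping Gaussian kernels" can be pictured with the following sketch, which samples a grayscale image with a grid of Gaussian kernels and renders each sample back as a Gaussian blob; the grid size and kernel width are illustrative assumptions, not values from the thesis.

```python
import numpy as np

def phosphene_image(img, grid=25, sigma_frac=0.5):
    """Render a low-resolution 'phosphene' view of a grayscale image: sample the
    scene with a grid x grid array of overlapping Gaussian kernels and draw each
    sample back as a Gaussian blob of matching brightness."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.zeros_like(img, dtype=float)
    step_y, step_x = h / grid, w / grid
    sigma = sigma_frac * (step_y + step_x) / 2.0
    for cy in (np.arange(grid) + 0.5) * step_y:
        for cx in (np.arange(grid) + 0.5) * step_x:
            kernel = np.exp(-((ys - cy) ** 2 + (xs - cx) ** 2) / (2 * sigma ** 2))
            brightness = (kernel * img).sum() / kernel.sum()   # Gaussian sampling
            out += brightness * kernel                         # Gaussian rendering
    return out / max(out.max(), 1e-12)
```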
32

Lowe, Scott Corren. "Decoding information from neural populations in the visual cortex." Thesis, University of Edinburgh, 2017. http://hdl.handle.net/1842/28861.

Full text
Abstract:
Visual perception in mammals is made possible by the visual system and the visual cortex. However, precisely how visual information is coded in the brain and how training can improve this encoding is unclear. The ability to see and process visual information is not an innate property of the visual cortex. Instead, it is learnt from exposure to visual stimuli. We first considered how visual perception is learnt, by studying the perceptual learning of contrast discrimination in macaques. We investigated how changes in population activity in the visual cortices V1 and V4 correlate with the changes in behavioural response during training on this task. Our results indicate that changes in the learnt neural and behavioural responses are directed toward optimising the performance on the training task, rather than a general improvement in perception of the presented stimulus type. We report that the most informative signal about the contrast of the stimulus within V1 and V4 is the transient stimulus-onset response in V1, 50 ms after the stimulus presentation begins. However, this signal does not become more informative with training, suggesting it is an innate and untrainable property of the system, on these timescales at least. Using a linear decoder to classify the stimulus based on the population activity, we find that information in the V4 population is closely related to the information available to the higher cortical regions involved with decision making, since the performance of the decoder is similar to the performance of the animal throughout training. These findings suggest that training the subject on this task directs V4 to improve its read out of contrast information contained in V1, and cortical regions responsible for decision making use this to improve the performance with training. The structure of noise correlations between the recorded neurons changes with training, but this does not appear to cause the increase in behavioural performance. Furthermore, our results suggest there is feedback of information about the stimulus into the visual cortex after 300 ms of stimulus presentation, which may be related to the high-level percept of the stimulus within the brain. After training on the task, but not before, information about the stimulus persists in the activity of both V1 and V4 at least 400 ms after the stimulus is removed. In the second part, we explore how information is distributed across the anatomical layers of the visual cortex. Cortical oscillations in the local field potential (LFP) and current source density (CSD) within V1, driven by population-level activity, are known to contain information about visual stimulation. However the purpose of these oscillations, the sites where they originate, and what properties of the stimulus is encoded within them is still unknown. By recording the LFP at multiple recording sites along the cortical depth of macaque V1 during presentation of a natural movie stimulus, we investigated the structure of visual information encoded in cortical oscillations. We found that despite a homogeneous distribution of the power of oscillations across the cortical depth, information was compartmentalised into the oscillations of the 4 Hz to 16 Hz range at the granular (G, layer 4) depths and the 60Hz to 170Hz range at the supragranular (SG, layers 1–3) depths, the latter of which is redundant with the population-level firing rate. 
These two frequency ranges contain independent information about the stimulus, which we identify as related to two spatiotemporal aspects of the visual stimulus. Oscillations in the visual cortex with frequencies < 40 Hz contain information about fast changes in low spatial frequency. Frequencies > 40 Hz and multi-unit firing rates contain information about properties of the stimulus related to changes, both slow and fast, at finer-grained spatial scales. The spatiotemporal domains encoded in each are complementary. In particular, both the power and phase of oscillations in the 7 Hz to 20Hz range contain information about scene transitions in the presented movie stimulus. Such changes in the stimulus are similar to saccades in natural behaviour, and this may be indicative of predictive coding within the cortex.
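Decoding stimulus information from band-limited LFP activity, as described in the abstract above, typically reduces to extracting band-power features and fitting a linear readout; the sketch below is a generic ridge-regularised least-squares decoder with illustrative parameters, not the analysis pipeline used in the thesis.

```python
import numpy as np

def band_power(lfp, fs, lo, hi):
    """Average spectral power of an LFP trace in the [lo, hi] Hz band."""
    spec = np.abs(np.fft.rfft(lfp)) ** 2
    freqs = np.fft.rfftfreq(len(lfp), d=1.0 / fs)
    return spec[(freqs >= lo) & (freqs <= hi)].mean()

def train_linear_decoder(features, labels, ridge=1e-3):
    """Least-squares linear readout from band-power features to stimulus labels,
    the kind of decoder used to ask which frequency bands carry information."""
    X = np.hstack([features, np.ones((len(features), 1))])     # add a bias column
    Y = np.eye(labels.max() + 1)[labels]                       # one-hot targets
    W = np.linalg.solve(X.T @ X + ridge * np.eye(X.shape[1]), X.T @ Y)
    return W  # predict with np.argmax(X_new @ W, axis=1), X_new including the bias column
```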
33

Franklin, D. R. "Neural networks for visual feedback control of an industrial robot." Thesis, University of Cambridge, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.599180.

Full text
Abstract:
The majority of industrial robots in use today are configured by on-line programming at the start of each production run. The workpieces are located using precision indexing. The robots have little or no sensory input, other than joint position feedback, and are unable to operate in changing or loosely constrained environments. To overcome these constraints and to increase the range of practical applications, robots need to be able to apply adaptive intelligence to manufacturing operations. This calls for enhanced sensory capabilities. Vision systems have been introduced successfully into many production processes to perform component identification, inspection and location. When introduced into the robot workspace as part of a dynamic visual feedback control scheme they have the potential to reduce the costs associated with precise component fixturing, to compensate for calibration errors, to extend the working life of the robot, to align a robot program developed off-line with the part it is operating on, and to compensate for variations in components. The research presented here used a world-based stereo vision system to control an industrial robot in 3-dimensional space. A visual tracking algorithm was developed to follow the robot end-effector. Iterative and dynamic visual feedback control strategies were investigated. To achieve this it was necessary to translate between the visually observed position of the robot end-effector and its position in the workspace. The bulk of the experimental work was devoted to techniques for achieving this. Methods based on an affine stereo algorithm, a geometric perspective stereo algorithm, and a neural gas network were investigated. The neural gas network is an artificial neural network algorithm that uses a rapid interpolative training scheme. The network was used to implement either an image to robot joint space mapping or an image to Cartesian space mapping. The neural network algorithm had no prior knowledge of the positions of the cameras or the kinematics of the robot, but instead learned the mapping by making a series of trial movements and by updating the network weights based on the results. A number of different training scheme variations were investigated and optimised. The most accurate mapping algorithms were used to implement a dynamic dual loop visual control system. The resulting system was capable of driving the end-effector along a visually defined path. The system was able to tolerate a degree of robot miscalibration as well as serious image to robot miscalibration.
APA, Harvard, Vancouver, ISO, and other styles
34

McClure, Patrick. "Adapting deep neural networks as models of human visual perception." Thesis, University of Cambridge, 2018. https://www.repository.cam.ac.uk/handle/1810/278073.

Full text
Abstract:
Deep neural networks (DNNs) have recently been used to solve complex perceptual and decision tasks. In particular, convolutional neural networks (CNN) have been extremely successful for visual perception. In addition to performing well on the trained object recognition task, these CNNs also model brain data throughout the visual hierarchy better than previous models. However, these DNNs are still far from completely explaining visual perception in the human brain. In this thesis, we investigated two methods with the goal of improving DNNs’ capabilities to model human visual perception: (1) deep representational distance learning (RDL), a method for driving representational spaces in deep nets into alignment with other (e.g. brain) representational spaces and (2) variational DNNs that use sampling to perform approximate Bayesian inference. In the first investigation, RDL successfully transferred information from a teacher model to a student DNN. This was achieved by driving the student DNN’s representational distance matrix (RDM), which characterises the representational geometry, into alignment with that of the teacher. This led to a significant increase in test accuracy on machine learning benchmarks. In the future, we plan to use this method to simultaneously train DNNs to perform complex tasks and to predict neural data. In the second investigation, we showed that sampling during learning and inference using simple Bernoulli- and Gaussian-based noise improved a CNN’s representation of its own uncertainty for object recognition. We also found that sampling during learning and inference with Gaussian noise improved how well CNNs predict human behavioural data for image classification. While these methods alone do not fully explain human vision, they allow for training CNNs that better model several features of human visual perception.
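One plausible form of the representational alignment idea described in this abstract is sketched below in PyTorch; the distance metric, loss form and tensor sizes are assumptions, not the exact loss used in the thesis.

```python
# Sketch of the general idea behind representational distance learning
# (assumed form): compute pairwise distance matrices of student and teacher
# representations for a batch and penalise their mismatch alongside the task loss.
import torch

def rdm(acts):
    """Pairwise Euclidean distance matrix of a (batch, features) tensor."""
    return torch.cdist(acts, acts, p=2)

def rdl_penalty(student_acts, teacher_acts):
    """Mean squared difference between the two representational geometries."""
    return ((rdm(student_acts) - rdm(teacher_acts)) ** 2).mean()

# usage sketch: combine with an ordinary classification loss during training
student_acts = torch.randn(8, 128, requires_grad=True)  # e.g. a hidden layer
teacher_acts = torch.randn(8, 64)                       # e.g. brain-derived features
loss = rdl_penalty(student_acts, teacher_acts)
loss.backward()
```

Because both distance matrices are computed per batch, the penalty only constrains the geometry of the representations, not their dimensionality, which is why student and teacher layers of different widths can be aligned.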
APA, Harvard, Vancouver, ISO, and other styles
35

Xie, Ning. "Towards Interpretable and Reliable Deep Neural Networks for Visual Intelligence." Wright State University / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=wright1596208422672732.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Folsom, Tyler C. "Neural networks modeling cortical cells for machine vision /." Thesis, Connect to this title online; UW restricted, 1994. http://hdl.handle.net/1773/6135.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Stigeborn, Patrik. "Generating 3D-objects using neural networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-230668.

Full text
Abstract:
Enabling 2D-to-3D reconstruction is an interesting future service for Mutate AB, where this thesis is conducted. Convolutional neural networks (CNNs) are examined from several angles in order to give a realistic picture of what this technology is capable of. The task is the creation of a CNN that can be used to predict how an object in a 2D image would look in 3D. The main areas the CNN is optimized for are quality, speed, and simplicity, where quality is the output resolution of the 3D object, speed is measured as the number of seconds it takes to complete a reconstruction, and simplicity is achieved by using machine learning (ML). Enabling this could ease the creation of 3D games and speed up development. The chosen solution uses two CNNs. The first CNN uses convolution to extract features from an input image. The second CNN uses transposed convolution to predict, from the features extracted by the first network, how the object would look in 3D. This thesis uses an empirical development approach to reach an optimal solution for the CNN structure and its hyperparameters. The 3D reconstruction is inspired by a sculpting process, meaning that the reconstruction starts at a low resolution and improves it iteratively. The results show that the quality gained from each iteration grows exponentially while the added time grows far less. The conclusion is therefore that the trade-off between speed and quality is favourable. However, for commercializing this technology or deploying it in a professional environment, it is still too slow to generate high-resolution output, and the CNN is fragile when the input image contains many unrecognized shapes.
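A simplified PyTorch sketch of the two-network idea (a convolutional encoder followed by a transposed-convolution 3D decoder) is given below; the layer sizes, input resolution and voxel output are illustrative assumptions, not the architecture used in the thesis.

```python
# Assumed, simplified sketch: a 2D convolutional encoder extracts features from
# an image and a transposed-convolution 3D decoder predicts a coarse voxel grid.
import torch
import torch.nn as nn

class Encoder2D(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
        )
        self.fc = nn.Linear(128 * 8 * 8, 512)

    def forward(self, img):
        return self.fc(self.net(img).flatten(1))

class Decoder3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(512, 128 * 4 * 4 * 4)
        self.net = nn.Sequential(
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 4 -> 8
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 8 -> 16
            nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1),               # 16 -> 32
        )

    def forward(self, feats):
        vol = self.fc(feats).view(-1, 128, 4, 4, 4)
        return torch.sigmoid(self.net(vol))  # voxel occupancy probabilities

voxels = Decoder3D()(Encoder2D()(torch.randn(1, 3, 64, 64)))
print(voxels.shape)  # torch.Size([1, 1, 32, 32, 32])
```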
APA, Harvard, Vancouver, ISO, and other styles
38

Gousseau, Clément. "Hyperparameter Optimization for Convolutional Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-272107.

Full text
Abstract:
Training algorithms for artificial neural networks depend on parameters called hyperparameters. They can have a strong influence on the trained model but are often chosen manually through trial-and-error experiments. This thesis, conducted at Orange Labs Lannion, presents and evaluates three algorithms that aim to solve this task: a naive approach (random search), a Bayesian approach (Tree-structured Parzen Estimator, TPE) and an evolutionary approach (Particle Swarm Optimization, PSO). A well-known dataset for handwritten digit recognition (MNIST) is used to compare these algorithms. They are also evaluated on audio classification, which is one of the main activities of the company team where the thesis was conducted. The evolutionary algorithm (PSO) showed better results than the two other methods.
Hyperparameter optimization is an important but difficult task when training an artificial neural network. This thesis, carried out at Orange Labs Lannion, presents and evaluates three algorithms that aim to solve this task: a naive strategy (random search), a Bayesian method (TPE) and an evolutionary strategy (PSO). The MNIST dataset was used to compare these algorithms. The algorithms are also evaluated on audio classification, which is the core activity of the company where the thesis was carried out. The evolutionary algorithm (PSO) gave better results than the two other methods.
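A minimal particle swarm optimization sketch over two hyperparameters is shown below; the `validation_error` objective is a stand-in for training a network and returning its validation error, and all constants (swarm size, inertia, bounds) are assumptions rather than values from the thesis.

```python
# Minimal PSO sketch over two hyperparameters (learning-rate exponent, dropout).
import numpy as np

rng = np.random.default_rng(42)

def validation_error(params):
    log_lr, dropout = params
    # stand-in objective with an optimum near log_lr = -3, dropout = 0.3
    return (log_lr + 3.0) ** 2 + 4.0 * (dropout - 0.3) ** 2

n_particles, n_iters = 10, 30
lo, hi = np.array([-5.0, 0.0]), np.array([-1.0, 0.8])
pos = rng.uniform(lo, hi, (n_particles, 2))
vel = np.zeros_like(pos)
best_pos = pos.copy()
best_err = np.array([validation_error(p) for p in pos])
g_best = best_pos[best_err.argmin()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random((n_particles, 1)), rng.random((n_particles, 1))
    # inertia + pull toward personal best + pull toward global best
    vel = 0.7 * vel + 1.5 * r1 * (best_pos - pos) + 1.5 * r2 * (g_best - pos)
    pos = np.clip(pos + vel, lo, hi)
    err = np.array([validation_error(p) for p in pos])
    improved = err < best_err
    best_pos[improved], best_err[improved] = pos[improved], err[improved]
    g_best = best_pos[best_err.argmin()].copy()

print("best hyperparameters:", g_best)
```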
APA, Harvard, Vancouver, ISO, and other styles
39

Zhang, Shuyuan. "AlphaZero with Input Convex Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-281860.

Full text
Abstract:
Modelling and solving real-life problems using reinforcement learning (RL) approaches is a typical and important branch of artificial intelligence (AI). For board games, AlphaZero has proved successful in games such as Go, chess and shogi against professional human players and other AI counterparts. The basic components of the AlphaZero algorithm are Monte Carlo tree search (MCTS) and deep neural networks for state-value and policy prediction. These deep neural networks are designed to fit the mapping between a state and its value/policy, making the initialization of the state value/policy more accurate. In this thesis project, we propose Convex-AlphaZero, which exploits a new prediction structure for the state value and policy, and we test its viability with theoretical arguments and experimental results. Instead of using a single feed-forward pass to obtain these values, our adaptation treats the problem as an optimization process by using input convex neural networks, which can model the state value as a convex function of the policy given the state (i.e. the game board configuration). The results of our experiments show that our method outperforms traditional minimax approaches and is worth further research on games other than Connect Four, which is used in this thesis project.
Modelling and solving real-world problems with reinforcement learning (RL) approaches is a typical and important branch of artificial intelligence (AI). For board games, AlphaZero has shown itself to be successful in games such as Go, chess and shogi against professional human players and other AI counterparts. The fundamental components of the AlphaZero algorithm are MCTS tree search and deep neural networks for state-value and policy prediction. These deep neural networks are designed to fit the mapping between a state and its value/policy so that the initialization of the state value/policy becomes more accurate. In this thesis project we propose Convex-AlphaZero, which exploits a new prediction structure for the state value and policy and tests its viability by providing theoretical arguments and experimental results. Instead of using a single forward pass to obtain these values, our adaptation treats the problem as an optimization process by using input convex neural networks that can model the state value as a convex function of the policy given the state (i.e. the board configuration). The results of our experiments show that our method outperforms traditional minimax approaches and is worth further research on games other than Connect Four, which is used in this thesis.
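The following is a rough PyTorch sketch of a network that is convex in the policy input given the state, in the spirit of the input convex networks mentioned above; the layer sizes, weight-clamping scheme and usage are assumptions, not the thesis architecture.

```python
# Sketch of a partially input convex network: the output is convex in "policy"
# given "state", enforced with non-negative weights on the z-path and convex,
# non-decreasing activations. Sizes and training details are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PartiallyConvexNet(nn.Module):
    def __init__(self, state_dim, policy_dim, hidden=64):
        super().__init__()
        self.state_net = nn.Linear(state_dim, hidden)
        self.policy_in = nn.Linear(policy_dim, hidden)
        self.z_weight = nn.Parameter(torch.rand(hidden, hidden) * 0.1)  # kept >= 0
        self.policy_skip = nn.Linear(policy_dim, hidden)
        self.out = nn.Parameter(torch.rand(hidden) * 0.1)               # kept >= 0

    def forward(self, state, policy):
        # z1 is convex in `policy` (ReLU of an affine map plus a state term)
        z1 = F.relu(self.policy_in(policy) + self.state_net(state))
        # non-negative weights on z preserve convexity in `policy`
        z2 = F.relu(z1 @ self.z_weight.clamp(min=0).t() + self.policy_skip(policy))
        return z2 @ self.out.clamp(min=0)

net = PartiallyConvexNet(state_dim=42, policy_dim=7)
value = net(torch.randn(1, 42), torch.rand(1, 7))
print(value.shape)
```

Because the value is convex in the policy for a fixed state, the policy that minimises (or, with a sign flip, maximises) the predicted value can be found by a standard convex optimisation step rather than a single forward pass.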
APA, Harvard, Vancouver, ISO, and other styles
40

Berry, Ian Michael. "Data classification using unsupervised artificial neural networks." Thesis, University of Sussex, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.390079.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Mazurek, Mark. "Neural mechanisms for combining information in a visual discrimination task /." Thesis, Connect to this title online; UW restricted, 2004. http://hdl.handle.net/1773/10649.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Kim, Daehyon. "Acquiring parking information by image processing and neural networks." Thesis, University of Newcastle Upon Tyne, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.308978.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Lagerhjelm, Linus. "Extracting Information from Encrypted Data using Deep Neural Networks." Thesis, Umeå universitet, Institutionen för tillämpad fysik och elektronik, 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-155904.

Full text
Abstract:
In this paper we explore various approaches to using deep neural networks to perform cryptanalysis, with the ultimate goal of having a deep neural network decipher encrypted data. We use long short-term memory networks to try to decipher encrypted text, and we use a convolutional neural network to perform classification tasks on encrypted MNIST images. We find that although the network is unable to decipher encrypted data, it is able to perform classification on encrypted data. We also find that the network's performance depends on which key was used to encrypt the data. These findings could be valuable for further research into the topic of cryptanalysis using deep neural networks.
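A toy version of the classify-encrypted-images setup can be sketched as follows; the byte-wise XOR "cipher", random placeholder data and small CNN are assumptions standing in for the actual ciphers and MNIST pipeline studied in the thesis.

```python
# Toy illustration (assumptions throughout): images are scrambled with a fixed
# keyed byte-wise XOR as a stand-in cipher, and a small CNN is trained directly
# on the scrambled pixels.
import numpy as np
import torch
import torch.nn as nn

rng = np.random.default_rng(0)
key = rng.integers(0, 256, size=(28, 28), dtype=np.uint8)  # fixed "key"

def encrypt(images_uint8):
    """Deterministic stand-in cipher: XOR every image with the same key."""
    return np.bitwise_xor(images_uint8, key)

# placeholder data standing in for MNIST: (batch, 28, 28) uint8 images
images = rng.integers(0, 256, size=(32, 28, 28), dtype=np.uint8)
labels = torch.from_numpy(rng.integers(0, 10, size=32))
x = torch.from_numpy(encrypt(images)).float().unsqueeze(1) / 255.0

model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(), nn.Linear(32 * 7 * 7, 10),
)
loss = nn.CrossEntropyLoss()(model(x), labels)
loss.backward()
print(float(loss))
```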
APA, Harvard, Vancouver, ISO, and other styles
44

Molter, Colin. "Storing information through complex dynamics in recurrent neural networks." Doctoral thesis, Universite Libre de Bruxelles, 2005. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/211039.

Full text
Abstract:
The neural net computer simulations presented here are based on a set of assumptions that have been expressed over the last twenty years in the fields of information processing, neurophysiology and cognitive science. First, neural networks and their dynamical behaviour in terms of attractors are the natural way adopted by the brain to encode information. Any information item to be stored in the neural net should be coded in one way or another in one of the dynamical attractors of the brain, and retrieved by stimulating the net so as to trap its dynamics in the desired item's basin of attraction. The second view shared by neural net researchers is to base the learning of the synaptic matrix on a local Hebbian mechanism. The last assumption is the presence of chaos and the benefit gained from it. Chaos, although very simply produced, inherently possesses an infinite number of cyclic regimes that can be exploited for coding information. Moreover, the network spontaneously and randomly wanders around these unstable regimes, thus rapidly proposing alternative responses to external stimuli and being able to switch easily from one of these potential attractors to another in response to any incoming stimulus.

In this thesis, it is shown experimentally that the more information is stored in robust cyclic attractors, the more chaos appears as a background regime, erratically itinerating among brief appearances of these attractors. Chaos appears to be not the cause but the consequence of the learning; it is, however, a helpful consequence that widens the net's encoding capacity. To learn the information to be stored, an unsupervised Hebbian learning algorithm is introduced. By leaving unprescribed the semantics of the attractors to be associated with the incoming data, promising results have been obtained in terms of storage capacity.
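The basic Hebbian-attractor idea referred to above can be illustrated with a textbook-style asymmetric Hebb rule that stores a short cycle of patterns; this sketch is not the thesis algorithm, and the pattern count, network size and update rule are assumptions.

```python
# Sketch of storing a cyclic attractor in a recurrent network with an
# asymmetric Hebbian rule: each pattern is wired to recall its successor.
import numpy as np

rng = np.random.default_rng(1)
n, n_patterns = 100, 4
patterns = rng.choice([-1.0, 1.0], size=(n_patterns, n))

# asymmetric Hebbian weights: pattern k points to pattern (k+1) mod n_patterns
W = np.zeros((n, n))
for k in range(n_patterns):
    W += np.outer(patterns[(k + 1) % n_patterns], patterns[k]) / n

state = patterns[0] + 0.3 * rng.standard_normal(n)  # noisy cue
for step in range(8):
    state = np.sign(W @ state)                       # synchronous update
    overlaps = patterns @ state / n                  # similarity to each pattern
    print(step, np.round(overlaps, 2))               # the state cycles through the patterns
```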
Doctorate in Applied Sciences

APA, Harvard, Vancouver, ISO, and other styles
45

Norrstig, Andreas. "Visual Object Detection using Convolutional Neural Networks in a Virtual Environment." Thesis, Linköpings universitet, Datorseende, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-156609.

Full text
Abstract:
Visual object detection is a popular computer vision task that has been intensively investigated using deep learning on real data. However, data from virtual environments have not received the same attention. A virtual environment enables generating data for locations that are not easily reachable for data collection, e.g. aerial environments. In this thesis, we study the problem of object detection in virtual environments, more specifically an aerial virtual environment. We use a simulator to generate a synthetic data set of 16 different types of vehicles captured from an airplane. To study the performance of existing methods in virtual environments, we train and evaluate two state-of-the-art detectors on the generated data set. Experiments show that both detectors, You Only Look Once version 3 (YOLOv3) and Single Shot MultiBox Detector (SSD), reach performance comparable to that previously reported in the literature on real data sets. In addition, we investigate different techniques for fusing detectors trained on two different subsets of the data set, in this case one subset with fixed-color cars and one with varying-color cars. Experiments show that it is possible to train multiple instances of the detector on different subsets of the data set and combine these detectors in order to boost the performance.
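One simple fusion baseline, pooling the two detectors' boxes and applying non-maximum suppression, is sketched below; this is an assumed illustration rather than one of the fusion techniques evaluated in the thesis, and the box coordinates and threshold are made up.

```python
# Assumed fusion baseline: pool boxes from two detectors trained on different
# subsets and run non-maximum suppression on the combined, score-sorted list.
import numpy as np

def iou(box, boxes):
    """IoU between one box and an array of boxes, format [x1, y1, x2, y2]."""
    x1 = np.maximum(box[0], boxes[:, 0]); y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2]); y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = lambda b: (b[..., 2] - b[..., 0]) * (b[..., 3] - b[..., 1])
    return inter / (area(box) + area(boxes) - inter)

def fuse(boxes_a, scores_a, boxes_b, scores_b, iou_thr=0.5):
    boxes = np.vstack([boxes_a, boxes_b])
    scores = np.concatenate([scores_a, scores_b])
    order = scores.argsort()[::-1]           # highest score first
    keep = []
    while order.size:
        i = order[0]
        keep.append(i)
        # drop remaining boxes that overlap the kept box too much
        order = order[1:][iou(boxes[i], boxes[order[1:]]) < iou_thr]
    return boxes[keep], scores[keep]

# detector A and detector B both fire on roughly the same car
a = np.array([[10, 10, 50, 40]])
b = np.array([[12, 11, 52, 42], [80, 80, 120, 110]])
print(fuse(a, np.array([0.9]), b, np.array([0.8, 0.7])))
```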
APA, Harvard, Vancouver, ISO, and other styles
46

Masih, Lawrence. "Associative recall in multilayered logical neural networks." Thesis, Brunel University, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.235933.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Allred, Sarah R. "The Neural basis of visual object perception /." Thesis, Connect to this title online; UW restricted, 2006. http://hdl.handle.net/1773/10645.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Besharat, Pour Shiva. "Hierarchical sales forecasting using Recurrent Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-290892.

Full text
Abstract:
Sales forecasting equips businesses with the essential basis for planning future investments and controlling costs and production. This research is conducted in cooperation with a property development company for the purpose of improving the accuracy of manual sales forecasting. The objective is to investigate the effects of using the underlying factors that affect the company's individual sales when forecasting the company's income. One approach aggregates estimates of the individual sales to approximate the company's income; it uses the underlying hierarchical factors of the company's individual sales to forecast future sales and is known as the bottom-up approach. Another approach, known as the direct approach, uses the history of the company's income instead. The bottom-up approach estimates the income of the company in the chosen target quarter, Q4 2019, with a percentage error of 33 percent. In contrast, the direct approach provides an estimate of the company's income in Q4 2019 with a percentage error of 3 percent. The strength of the bottom-up approach is in providing detailed forecasts of the individual sales of the company, while the direct approach is more convenient for learning the overall behavior of the company's earnings.
Sales forecasts give companies the basis for planning future investments and controlling both costs and production. This research was carried out in collaboration with a property development company with the aim of improving the accuracy of manual sales forecasting. The goal is to investigate the effects of using the underlying factors that influence individual sales when forecasting the company's income. One of the approaches investigated uses an aggregation of individual historical sales to predict the company's future income. This approach uses the underlying hierarchical factors of the company's individual sales to forecast future sales and is known as the bottom-up approach. Another approach, known as the direct approach, instead uses the company's historical income as data. The bottom-up approach was used to estimate the company's income in Q4 2019 and gave a percentage error of 33 percent. The direct approach, on the other hand, estimated the company's income in Q4 2019 with a percentage error of 3 percent. The strength of the bottom-up approach is that it can provide detailed forecasts of the company's individual sales, while the direct approach is more practical for estimating the company's total income.
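The bottom-up versus direct comparison can be illustrated with the small sketch below; the sales figures and forecasts are made-up numbers, not company data, and only the percentage-error comparison mirrors the text above.

```python
# Toy illustration (made-up numbers): bottom-up sums forecasts of individual
# sales, direct forecasts the total income, and both are scored with the
# percentage error used in the abstracts above.
import numpy as np

actual_individual_sales = np.array([4.0, 2.5, 3.0, 1.5])  # true quarterly values
actual_income = actual_individual_sales.sum()             # 11.0

bottom_up_forecasts = np.array([5.0, 3.0, 3.5, 1.5])      # one forecast per sale
direct_forecast = 11.5                                     # one model for the total

def percentage_error(forecast, actual):
    return abs(forecast - actual) / actual * 100

print(percentage_error(bottom_up_forecasts.sum(), actual_income))  # bottom-up error
print(percentage_error(direct_forecast, actual_income))            # direct error
```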
APA, Harvard, Vancouver, ISO, and other styles
49

Styren, Buster. "Uveal melanoma identification using artificial neural networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-241086.

Full text
Abstract:
Uveal melanoma is a deadly form of cancer that can develop from a uveal nevus in the eye fundus. Using deep convolutional networks, this thesis aims to classify fundus images based on malignancy. A baseline model was compared against two state-of-the-art networks, Inception-v3 and ResNet. The baseline model was trained using different gradient descent optimizers and image augmentations to find the best hyperparameters for the data. The state-of-the-art networks achieved comparable accuracy, with Inception-v3 achieving 0.912 AUC after training on 8360 samples. At 96% sensitivity, the same value as ophthalmologists, the top network achieves a specificity of 59%, meaning that the network can greatly reduce the amount of manual naevi eye examinations by filtering out healthy subjects.
Uveal melanoma is a deadly form of cancer that arises from pigment changes in the retina. The disease has a high risk of metastasising to the liver, and once the metastases are clinically manifest, survival is generally limited to a few months. By training a neural network, the goal of this work is to classify fundus images of choroidal naevi as benign or malignant. This was done by evaluating three different convolutional networks: Inception-v3 and ResNet were compared with a simple six-layer network. A range of hyperparameter configurations was evaluated to find an optimal model. After training on 8360 data points, Inception-v3 reached an AUC value of 0.912. With 96% sensitivity, which is the same level as ophthalmologists, the network achieves 59% specificity. The network can therefore filter out a large proportion of the healthy patients who are examined by clinicians, which could make the handling of patients with pigment changes in the retina considerably more resource-efficient.
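How an operating point like the one above is obtained can be sketched as follows; the score distributions are simulated placeholders, not the thesis data, and only the mechanics of fixing sensitivity and reading off specificity are illustrated.

```python
# Sketch (toy scores): choose a threshold so that sensitivity reaches a target
# (e.g. 96%) and read off the resulting specificity on the benign cases.
import numpy as np

rng = np.random.default_rng(3)
malignant_scores = rng.beta(5, 2, 200)  # network scores for malignant naevi
benign_scores = rng.beta(2, 5, 800)     # network scores for benign naevi

target_sensitivity = 0.96
# threshold below which at most 4% of malignant cases fall
threshold = np.quantile(malignant_scores, 1 - target_sensitivity)

sensitivity = (malignant_scores >= threshold).mean()
specificity = (benign_scores < threshold).mean()
print(threshold, sensitivity, specificity)
```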
APA, Harvard, Vancouver, ISO, and other styles
50

Zamboni, Simone. "Pedestrian trajectory prediction with Convolutional Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-278818.

Full text
Abstract:
Modelling the behaviour of pedestrians is essential in autonomous driving because the consequences of misjudging the intentions of a pedestrian can be severe when vehicles are involved. Therefore, for an autonomous vehicle to plan a safe and collision-free path, it is necessary to know not only the current position of nearby pedestrians but also their future trajectory. In the literature, methods for pedestrian trajectory prediction have evolved, transitioning from physics-based models to data-driven models based on recurrent neural networks. This thesis proposes a new approach to pedestrian trajectory prediction with the introduction of a convolutional model. This new model outperforms recurrent models and achieves state-of-the-art results on the ETH-UCY and TrajNet datasets. Moreover, this thesis presents an effective system for representing pedestrian positions and powerful data augmentation techniques, such as the addition of noise and the use of random rotations, which can be applied to any model. Finally, a study on the effectiveness of various techniques for including social information is presented, which demonstrates that simpler approaches fail to capture complex social interaction.
Modelling pedestrian behaviour is a fundamental building block of autonomous driving, because misjudging a pedestrian's intentions can have severe consequences when vehicles are involved. For an autonomous vehicle to plan a collision-free and efficient trajectory, it therefore needs not only the positions of the surrounding pedestrians but also their future movement patterns. In the literature, methods for pedestrian prediction have evolved, with a transition from physics-based models to data-driven models based on recurrent neural networks. This thesis proposes a new approach to pedestrian trajectory prediction that introduces a convolutional neural network model. The new model is capable of outperforming recurrent models and achieves top results on the ETH-UCY and TrajNet datasets. In addition, the thesis presents an effective system for representing pedestrian positions and powerful data augmentation techniques, such as adding noise and applying random rotations, which can be applied to any model. Finally, a study of the effect of including social information, that is, the relative states between pedestrians, shows that simpler models fail to capture complex social interaction.
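The noise and random-rotation augmentations mentioned above can be sketched on a toy trajectory as follows; the trajectory, noise level and rotation range are assumptions, not values from the thesis.

```python
# Sketch of the two augmentations applied to a toy pedestrian trajectory of
# (x, y) positions: additive Gaussian noise and a random rotation about the origin.
import numpy as np

rng = np.random.default_rng(7)
trajectory = np.stack([np.linspace(0, 8, 9), np.linspace(0, 4, 9)], axis=1)  # (T, 2)

def augment(traj, noise_std=0.05):
    theta = rng.uniform(0, 2 * np.pi)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    rotated = traj @ rot.T                                   # random rotation
    return rotated + rng.normal(0, noise_std, traj.shape)    # additive noise

print(augment(trajectory))
```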
APA, Harvard, Vancouver, ISO, and other styles
