Dissertations / Theses on the topic 'Visual learning'
Consult the top 50 dissertations / theses for your research on the topic 'Visual learning.'
Zhu, Fan. "Visual feature learning." Thesis, University of Sheffield, 2015. http://etheses.whiterose.ac.uk/8218/.
Goh, Hanlin. "Learning deep visual representations." Paris 6, 2013. http://www.theses.fr/2013PA066356.
Recent advancements in the areas of deep learning and visual information processing have presented an opportunity to unite both fields. These complementary fields combine to tackle the problem of classifying images into their semantic categories. Deep learning brings learning and representational capabilities to a visual processing model that is adapted for image classification. This thesis addresses problems that lead to the proposal of learning deep visual representations for image classification. The problem of deep learning is tackled on two fronts. The first aspect is the problem of unsupervised learning of latent representations from input data. The main focus is the integration of prior knowledge into the learning of restricted Boltzmann machines (RBM) through regularization. Regularizers are proposed to induce sparsity, selectivity and topographic organization in the coding to improve discrimination and invariance. The second direction introduces the notion of gradually transitioning from unsupervised layer-wise learning to supervised deep learning. This is done through the integration of bottom-up information with top-down signals. Two novel implementations supporting this notion are explored. The first method uses top-down regularization to train a deep network of RBMs. The second method combines predictive and reconstructive loss functions to optimize a stack of encoder-decoder networks. The proposed deep learning techniques are applied to tackle the image classification problem. The bag-of-words model is adopted due to its strengths in image modeling through the use of local image descriptors and spatial pooling schemes. Deep learning with spatial aggregation is used to learn a hierarchical visual dictionary for encoding the image descriptors into mid-level representations. This method achieves leading image classification performance for object and scene images. The learned dictionaries are diverse and non-redundant. The speed of inference is also high. From this, a further optimization is performed for the subsequent pooling step. This is done by introducing a differentiable pooling parameterization and applying the error backpropagation algorithm. This thesis represents one of the first attempts to synthesize deep learning and the bag-of-words model. This union results in many challenging research problems, leaving much room for further study in this area.
Walker, Catherine Livesay. "Visual learning through Hypermedia." CSUSB ScholarWorks, 1996. https://scholarworks.lib.csusb.edu/etd-project/1148.
Owens, Andrew (Andrew Hale). "Learning visual models from paired audio-visual examples." Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/107352.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 93-104).
From the clink of a mug placed onto a saucer to the bustle of a busy café, our days are filled with visual experiences that are accompanied by distinctive sounds. In this thesis, we show that these sounds can provide a rich training signal for learning visual models. First, we propose the task of predicting the sound that an object makes when struck as a way of studying physical interactions within a visual scene. We demonstrate this idea by training an algorithm to produce plausible soundtracks for videos in which people hit and scratch objects with a drumstick. Then, with human studies and automated evaluations on recognition tasks, we verify that the sounds produced by the algorithm convey information about actions and material properties. Second, we show that ambient audio - e.g., crashing waves, people speaking in a crowd - can also be used to learn visual models. We train a convolutional neural network to predict a statistical summary of the sounds that occur within a scene, and we demonstrate that the visual representation learned by the model conveys information about objects and scenes.
by Andrew Owens.
Ph. D.
Peyre, Julia. "Learning to detect visual relations." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEE016.
In this thesis, we study the problem of detecting visual relations of the form (subject, predicate, object) in images, which are intermediate-level semantic units between objects and complex scenes. Our work addresses two main challenges in visual relation detection: (1) the difficulty of obtaining box-level annotations to train fully-supervised models, and (2) the variability of appearance of visual relations. We first propose a weakly-supervised approach which, given pre-trained object detectors, enables us to learn relation detectors using image-level labels only, maintaining a performance close to fully-supervised models. Second, we propose a model that combines different granularities of embeddings (for subject, object, predicate and triplet) to better model appearance variation and introduce an analogical reasoning module to generalize to unseen triplets. Experimental results demonstrate the improvement of our hybrid model over a purely compositional model and validate the benefits of our transfer by analogy to retrieve unseen triplets.
Wang, Zhaoqing. "Self-supervised Visual Representation Learning." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/29595.
Tang-Wright, Kimmy. "Visual topography and perceptual learning in the primate visual system." Thesis, University of Oxford, 2016. https://ora.ox.ac.uk/objects/uuid:388b9658-dceb-443a-a19b-c960af162819.
Shi, Xiaojin. "Visual learning from small training datasets /." Diss., Digital Dissertations Database. Restricted to UC campuses, 2005. http://uclibs.org/PID/11984.
Liu, Jingen. "Learning Semantic Features for Visual Recognition." Doctoral diss., University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/3358.
Ph.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science PhD
Beale, Dan. "Autonomous visual learning for robotic systems." Thesis, University of Bath, 2012. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.558886.
Lakshmi, Ratan Aparna. "Learning visual concepts for image classification." Thesis, Massachusetts Institute of Technology, 1999. http://hdl.handle.net/1721.1/80092.
Includes bibliographical references (leaves 166-174).
by Aparna Lakshmi Ratan.
Ph.D.
Moghaddam, Baback 1963. "Probabilistic visual learning for object detection." Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/10242.
Includes bibliographical references (leaves 78-82).
by Baback Moghaddam.
Ph.D.
Wilson, Andrew David. "Learning visual behavior for gesture analysis." Thesis, Massachusetts Institute of Technology, 1995. http://hdl.handle.net/1721.1/62924.
Zhou, Bolei. "Interpretable representation learning for visual intelligence." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/117837.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 131-140).
Recent progress of deep neural networks in computer vision and machine learning has enabled transformative applications across robotics, healthcare, and security. However, despite the superior performance of the deep neural networks, it remains challenging to understand their inner workings and explain their output predictions. This thesis investigates several novel approaches for opening up the "black box" of neural networks used in visual recognition tasks and understanding their inner working mechanism. I first show that objects and other meaningful concepts emerge as a consequence of recognizing scenes. A network dissection approach is further introduced to automatically identify the internal units as the emergent concept detectors and quantify their interpretability. Then I describe an approach that can efficiently explain the output prediction for any given image. It sheds light on the decision-making process of the networks and why the predictions succeed or fail. Finally, I show some ongoing efforts toward learning efficient and interpretable deep representations for video event understanding and some future directions.
by Bolei Zhou.
Ph. D.
Pillai, Sudeep. "Learning articulated motions from visual demonstration." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/89861.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 94-98).
Robots operating autonomously in household environments must be capable of interacting with articulated objects on a daily basis. They should be able to infer each object's underlying kinematic linkages purely by observing its motion during manipulation. This work proposes a framework that enables robots to learn the articulation in objects from user-provided demonstrations, using RGB-D sensors. We introduce algorithms that combine concepts in sparse feature tracking, motion segmentation, object pose estimation, and articulation learning, to develop our proposed framework. Additionally, our methods can predict the motion of previously seen articulated objects in future encounters. We present experiments that demonstrate the ability of our method, given RGB-D data, to identify, analyze and predict the articulation of a number of everyday objects within a human-occupied environment.
by Sudeep Pillai.
S.M. in Computer Science and Engineering
Williams, Oliver Michael Christian. "Bayesian learning for efficient visual inference." Thesis, University of Cambridge, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.613979.
North, Ben. "Learning dynamical models for visual tracking." Thesis, University of Oxford, 1998. http://ora.ox.ac.uk/objects/uuid:6ed12552-4c30-4d80-88ef-7245be2d8fb8.
Florence, Peter R. (Peter Raymond). "Dense visual learning for robot manipulation." Thesis, Massachusetts Institute of Technology, 2020. https://hdl.handle.net/1721.1/128398.
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2020
Cataloged from student-submitted PDF of thesis.
Includes bibliographical references (pages 115-127).
We would like to have highly useful robots which can richly perceive their world, semantically distinguish its fine details, and physically interact with it sufficiently for useful robotic manipulation. This is hard to achieve with previous methods: prior work has not equipped robots with the scalable ability to understand the dense visual state of their varied environments. The limitations have been both in the state representations used and in how to acquire them without significant human labeling effort. In this thesis we present work that leverages self-supervision, particularly via a mix of geometrical computer vision, deep visual learning, and robotic systems, to scalably produce dense visual inferences of the world state. These methods either enable robots to teach themselves dense visual models without human supervision, or they act as a large multiplying factor on the value of information provided by humans. Specifically, we develop a pipeline for providing ground truth labels of visual data in cluttered and multi-object scenes, we introduce the novel application of dense visual object descriptors to robotic manipulation and provide a fully robot-supervised pipeline to acquire them, and we leverage this dense visual understanding to efficiently learn new manipulation skills through imitation. With real robot hardware we demonstrate contact-rich tasks manipulating household objects, including generalizing across a class of objects, manipulating deformable objects, and manipulating a textureless symmetrical object, all with closed-loop, real-time vision-based manipulation policies.
by Peter R. Florence.
Ph. D.
Ph.D. Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science
Chen, Zhenghao. "Deep Learning for Visual Data Compression." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/29729.
Dey, Priya. "Visual speech in technology-enhanced learning." Thesis, University of Sheffield, 2012. http://etheses.whiterose.ac.uk/3329/.
Nguyen, Duc Minh Chau. "Affordance learning for visual-semantic perception." Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2021. https://ro.ecu.edu.au/theses/2443.
SANGUINETI, VALENTINA. "Audio-Visual Learning for Scene Understanding." Doctoral thesis, Università degli studi di Genova, 2022. http://hdl.handle.net/11567/1068960.
Santolin, Chiara. "Learning Regularities from the Visual World." Doctoral thesis, Università degli studi di Padova, 2016. http://hdl.handle.net/11577/3424417.
The sensory world is made up of a set of regularities. Sequences of syllables and musical notes, objects arranged in the visual environment and sequences of events are only some of the types of patterns that characterize sensory input. The ability to detect these regularities is fundamental for acquiring certain properties of natural language (for example, syntax), learning sequences of actions (for example, sign language), discriminating complex environmental events, and planning behavior. Indeed, detecting regularities across a multiplicity of events makes it possible to anticipate and plan future actions, which are crucial aspects of adaptation to the environment. This learning mechanism, known in the literature as statistical learning, consists of detecting probability distributions in sensory input, that is, dependency relations among its components (for example, X predicts Y). As illustrated in the introductory chapter of this research, although it is one of the mechanisms responsible for learning human natural language, statistical learning does not seem to have evolved specifically to serve this function. It is a general cognitive process that appears in different sensory domains (auditory, visual, tactile), modalities (temporal or spatial-static) and species (human and non-human). Pattern detection therefore plays a fundamental role in processing sensory information, which is necessary for a correct representation of the environment. Once the regularities and structures present in the environment have been learned, living organisms must be able to generalize those structures to stimuli that are perceptually new but represent the same regularities. The crucial aspect of generalization is therefore the ability to recognize a familiar regularity even when it is implemented by new stimuli. The generalization process also plays a fundamental role in learning the syntax of human natural language. Nevertheless, it is a domain-general and non-species-specific mechanism. What was not clear from the literature was the ontogeny of both mechanisms, especially in the visual domain. In other words, it was not clear whether the abilities of statistical learning and generalization of visual structures are fully developed at birth. The main aim of the experiments conducted in this thesis was therefore to investigate the origins of visual statistical learning and generalization, using the domestic chick (Gallus gallus) as an animal model. Belonging to a precocial species, the newborn chick is almost fully autonomous for a range of behavioral functions, which makes it an ideal candidate for studying the ontogeny of various perceptual and cognitive abilities. The possibility of observing it just after birth, and of fully manipulating the pre- and post-natal environment (through hatching and rearing in controlled conditions), makes the chick an excellent experimental model for studying the learning of regularities. The first series of experiments was devoted to the study of statistical learning (Chapter 2). Using an experimental paradigm based on learning by exposure (filial imprinting), visually naive newborn chicks were exposed to a video sequence of arbitrary visual elements (geometric shapes). This stimulus is defined by a "statistical" structure based on transitional (conditional) probabilities that determine the order of appearance of each element (for example, the square predicts the cross with a probability of 100%). At the end of the exposure phase, the chicks were able to recognize this sequence, discriminating it from unfamiliar sequences that consisted either of a random presentation of the same elements (that is, no element predicted the appearance of any other element; Experiment 1) or of a recombination of the same familiar elements according to new statistical patterns (for example, the square predicts the T with a probability of 100%, but this statistical relation had never been experienced by the chicks; Experiment 2). In both experiments the chicks discriminated the familiar sequence from the unfamiliar one, showing that they could recognize the statistical structure to which they had been exposed during the imprinting phase. One of the most fascinating aspects of this result is that the learning process is unsupervised: no reinforcement was given to the chicks during the exposure phase. Two further experiments (Experiments 3 and 4) were then conducted to test whether chicks could learn regularities more complex than those tested previously. Specifically, the chicks had to differentiate a familiar sequence structured similarly to the one just described from an unfamiliar sequence composed of part-pairs, that is, pairs of shapes formed by joining the last shape of one familiar pair with the first shape of another familiar pair. Being formed by joining elements belonging to familiar pairs, the part-pairs were experienced by the chicks during the familiarization phase, but with a lower probability than the pairs. The difficulty of the task therefore lies in detecting a subtle difference in the probability distributions of the two stimuli. Unfortunately, the chicks were unable to discriminate the two sequences, either when they were composed of 8 elements (Experiment 3) or of 6 (Experiment 4). The ultimate aim of these two experiments was to discover the type of regularity learned by the chicks. Indeed, in Experiments 1 and 2 the chicks may have discriminated familiar and unfamiliar sequences on the basis of the co-occurrence frequencies of the shapes making up the familiar pairs (for example, co-occurrence of X and Y) rather than on conditional probabilities (for example, X predicts Y). However, since the chicks did not pass the test presented in Experiments 3 and 4, the question of which type of statistical cue this species learns remains open. Possible explanations and theoretical implications of this non-significant result are discussed in the concluding chapter. The second group of experiments conducted in this research investigates the generalization of visual regularities (Chapter 3). The regularities investigated are represented as spatially organized strings of geometric shapes whose elements are visible simultaneously. For example, the regularity defined as AAB is described as a triplet in which the first two elements are identical to each other (AA), followed by another element different from the preceding ones (B). The patterns used were AAB, ABA (Experiment 5), ABB and BAA (Experiment 6), and the experimental procedure involved training with food reinforcement. Once they had learned to distinguish the reinforced pattern (for example, AAB implemented as cross-cross-circle) from the non-reinforced one (for example, ABA implemented as cross-circle-cross), the chicks had to recognize these structures when represented by new elements (for example, hourglass-hourglass-arrow vs. hourglass-arrow-hourglass). The animals proved able to generalize all the regularities to new exemplars. The most important aspect of these results is what Experiment 6 demonstrated; its aim was to investigate the possible learning strategies used by the animals in the previous study. Indeed, considering the comparison AAB vs. ABA, the chicks may have recognized (and generalized) the familiar pattern on the basis of the presence of a consecutive repetition of the same element (present in AAB but not in ABA, where the same element A is repeated and positioned at the two ends of the triplet). In Experiment 6, regularities characterized by repetitions were therefore compared: AAB vs. ABB and AAB vs. BAA. The chicks were still able to distinguish the new regularities and to generalize to new exemplars, suggesting that this ability is not limited to a particular type of configuration. Overall, the results obtained in this research constitute the first evidence of statistical learning and generalization of visual regularities in an animal model observed just after birth. As regards statistical learning, chicks show abilities comparable to those observed in other animal species and in human infants, but apparently superior to those observed in human newborns. Hypotheses and theoretical implications of these differences are reported in the concluding chapter. As regards generalization processes, the chicks' performance is in line with what has been shown in human newborns in the linguistic domain. In light of these results, it is plausible to think that the chick is biologically predisposed to detect the regularities that characterize its visual environment from the first moments of life.
Durand, Thibaut. "Weakly supervised learning for visual recognition." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066142/document.
This thesis studies the problem of image classification, where the goal is to predict whether a semantic category is present in the image, based on its visual content. To analyze complex scenes, it is important to learn localized representations. To limit the cost of annotation during training, we have focused on weakly supervised learning approaches. In this thesis, we propose several models that simultaneously classify and localize objects, using only global labels during training. The weak supervision significantly reduces the cost of full annotation, but it makes learning more challenging. The key issue is how to aggregate local scores (e.g. for regions) into a global score (e.g. for the image). The main contribution of this thesis is the design of new pooling functions for weakly supervised learning. In particular, we propose a “max + min” pooling function, which unifies many pooling functions. We describe how to use this pooling in the Latent Structured SVM framework as well as in convolutional networks. To solve the optimization problems, we present several solvers, some of which allow optimizing a ranking metric such as Average Precision. We experimentally demonstrate the value of our models compared with state-of-the-art methods on ten standard image classification datasets, including the large-scale dataset ImageNet.
Dancette, Corentin. "Shortcut Learning in Visual Question Answering." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS073.
This thesis focuses on the task of VQA, which consists of answering textual questions about images. We investigate Shortcut Learning in this task: the literature reports the tendency of models to learn superficial correlations that lead them to correct answers in most cases, but which can fail when encountering unusual input data. We first propose two methods to reduce shortcut learning on VQA. The first, which we call RUBi, consists of an additional loss to encourage the model to learn from the most difficult and less biased examples, namely those which cannot be answered solely from the question. We then propose SCN, a model for the more specific task of visual counting, which incorporates architectural priors designed to make it more robust to distribution shifts. We then study the existence of multimodal shortcuts in the VQA dataset. We show that shortcuts are not only based on correlations between the question and the answer but can also involve image information. We design an evaluation benchmark to measure the robustness of models to multimodal shortcuts. We show that existing models are vulnerable to multimodal shortcut learning. The learning of those shortcuts is particularly harmful when models are evaluated in an out-of-distribution context. Therefore, it is important to evaluate the reliability of VQA models. We propose a method to improve their ability to abstain from answering when their confidence is too low. It consists of training an external "selector" model to predict the confidence of the VQA model. This selector is trained using a cross-validation-like scheme in order to avoid overfitting on the training set.
Chen, Yifu. "Deep learning for visual semantic segmentation." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS200.
In this thesis, we are interested in Visual Semantic Segmentation, one of the high-level tasks that pave the way towards complete scene understanding. Specifically, it requires a semantic understanding at the pixel level. With the success of deep learning in recent years, semantic segmentation problems are being tackled using deep architectures. In the first part, we focus on the construction of a more appropriate loss function for semantic segmentation. More precisely, we define a novel loss function by employing a semantic edge detection network. This loss forces pixel-level predictions to be consistent with the ground truth semantic edge information, and thus leads to better shaped segmentation results. In the second part, we address another important issue, namely, alleviating the need for training segmentation models with large amounts of fully annotated data. We propose a novel attribution method that identifies the most significant regions in an image considered by classification networks. We then integrate our attribution method into a weakly supervised segmentation framework. The semantic segmentation models can thus be trained with only image-level labeled data, which can be easily collected in large quantities. All models proposed in this thesis are thoroughly experimentally evaluated on multiple datasets and the results are competitive with the literature.
De, Pasquale Roberto. "Visual discrimination learning and LTP-like changes in primary visual cortex." Doctoral thesis, Scuola Normale Superiore, 2009. http://hdl.handle.net/11384/85939.
Doyon, Julien. "Right temporal-lobe contribution to global visual processing and visual-cue learning." Thesis, McGill University, 1988. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=75696.
Gepperth, Alexander Rainer Tassilo. "Neural learning methods for visual object detection." [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=981053998.
Qin, Lei. "Online machine learning methods for visual tracking." Thesis, Troyes, 2014. http://www.theses.fr/2014TROY0017/document.
We study the challenging problem of tracking an arbitrary object in video sequences with no prior knowledge other than a template annotated in the first frame. To tackle this problem, we build a robust tracking system consisting of the following components. First, for image region representation, we propose some improvements to the region covariance descriptor. Characteristics of a specific object are taken into consideration before constructing the covariance descriptor. Second, for building the object appearance model, we propose to combine the merits of both generative models and discriminative models by organizing them in a detection cascade. Specifically, generative models are deployed in the early layers for eliminating most easy candidates, whereas discriminative models are deployed in the later layers for distinguishing the object from a few similar "distracters". The Partial Least Squares Discriminant Analysis (PLS-DA) is employed for building the discriminative object appearance models. Third, for updating the generative models, we propose a weakly-supervised model updating method, which is based on cluster analysis using the mean-shift gradient density estimation procedure. Fourth, a novel online PLS-DA learning algorithm is developed for incrementally updating the discriminative models. The final tracking system that integrates all these building blocks exhibits good robustness for most challenges in visual tracking. Comparative experiments conducted on challenging video sequences show that the proposed tracking system performs favorably with respect to a number of state-of-the-art methods.
Pralle, Mandi Jo. "Visual design in the online learning environment." [Ames, Iowa : Iowa State University], 2007.
Hussain, Sibt Ul. "Machine Learning Methods for Visual Object Detection." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00680048.
Cabral, Ricardo da Silveira. "Unifying Low-Rank Models for Visual Learning." Research Showcase @ CMU, 2015. http://repository.cmu.edu/dissertations/506.
Xu, Yang. "Cortical spatiotemporal plasticity in visual category learning." Research Showcase @ CMU, 2013. http://repository.cmu.edu/dissertations/272.
Ramachandran, Suchitra. "Visual Statistical Learning in Monkey Inferotemporal Cortex." Research Showcase @ CMU, 2014. http://repository.cmu.edu/dissertations/463.
Frier, Helen Jane. "Compass orientation during visual learning by honeybees." Thesis, University of Sussex, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.321446.
Kodirov, Elyor. "Cross-class transfer learning for visual data." Thesis, Queen Mary, University of London, 2017. http://qmro.qmul.ac.uk/xmlui/handle/123456789/31852.
Crowley, Elliott Joseph. "Visual recognition in art using machine learning." Thesis, University of Oxford, 2016. https://ora.ox.ac.uk/objects/uuid:d917f38e-64cb-4b09-9ccf-b081fe68b187.
Kashyap, Karan. "Learning digits via joint audio-visual representations." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/113143.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 59-60).
Our goal is to explore models for language learning in the manner that humans learn languages as children. Namely, children do not have intermediary text transcriptions in correlating visual and audio inputs from the environment; rather, they directly make connections between what they see and what they hear, sometimes even across languages! In this thesis, we present weakly-supervised models for learning representations of numerical digits between two modalities: speech and images. We experiment with architectures of convolutional neural networks taking in spoken utterances of numerical digits and images of handwritten digits as inputs. In nearly all cases we randomly initialize network weights (without pre-training) and evaluate the model's ability to return a matching image for a spoken input or to identify the number of overlapping digits between an utterance and an image. We also provide some visuals as evidence that our models are truly learning correspondences between the two modalities.
by Karan Kashyap.
M.Eng.
Gilja, Vikash. "Learning and applying model-based visual context." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/33139.
Includes bibliographical references (p. 53).
I believe that context's ability to reduce the ambiguity of an input signal makes it a vital constraint for understanding the real world. I specifically examine the role of context in vision and how a model-based approach can aid visual search and recognition. Through the implementation of a system capable of learning visual context models from an image database, I demonstrate the utility of the model-based approach. The system is capable of learning models for "water-horizon scenes" and "suburban street scenes" from a database of 745 images.
by Vikash Gilja.
M.Eng.
Woodley, Thomas Edward. "Visual tracking using offline and online learning." Thesis, University of Cambridge, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.608814.
Naha, Shujon. "Zero-shot Learning for Visual Recognition Problems." IEEE, 2015. http://hdl.handle.net/1993/31806.
October 2016
Rao, Anantha N. "Learning-based Visual Odometry - A Transformer Approach." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1627658636420617.
Horn, Robert R. "Visual attention and information in observational learning." Thesis, Liverpool John Moores University, 2003. http://researchonline.ljmu.ac.uk/5624/.
White, Alan Daniel. "Visual-motor learning in minimally invasive surgery." Thesis, University of Leeds, 2016. http://etheses.whiterose.ac.uk/17321/.
Hanwell, David. "Weakly supervised learning of visual semantic attributes." Thesis, University of Bristol, 2014. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687063.
Hussain, Sabit ul. "Machine Learning Methods for Visual Object Detection." Thesis, Grenoble, 2011. http://www.theses.fr/2011GRENM070/document.
The goal of this thesis is to develop better practical methods for detecting common object classes in real world images. We present a family of object detectors that combine Histogram of Oriented Gradient (HOG), Local Binary Pattern (LBP) and Local Ternary Pattern (LTP) features with efficient Latent SVM classifiers and effective dimensionality reduction and sparsification schemes to give state-of-the-art performance on several important datasets including PASCAL VOC2006 and VOC2007, INRIA Person and ETHZ. The three main contributions are as follows. Firstly, we pioneer the use of Local Ternary Pattern features for object detection, showing that LTP gives better overall performance than HOG and LBP, because it captures both rich local texture and object shape information while being resistant to variations in lighting conditions. It thus works well both for classes that are recognized mainly by their structure and ones that are recognized mainly by their textures. We also show that HOG, LBP and LTP complement one another, so that an extended feature set that incorporates all three of them gives further improvements in performance. Secondly, in order to tackle the speed and memory usage problems associated with high-dimensional modern feature sets, we propose two effective dimensionality reduction techniques. The first, feature projection using Partial Least Squares, allows detectors to be trained more rapidly with negligible loss of accuracy and no loss of run time speed for linear detectors. The second, feature selection using SVM weight truncation, allows active feature sets to be reduced in size by almost an order of magnitude with little or no loss, and often a small gain, in detector accuracy. Despite its simplicity, this feature selection scheme outperforms all of the other sparsity-enforcing methods that we have tested.
Lastly, we describe work in progress on Local Quantized Patterns (LQP), a generalized form of local pattern features that uses lookup-table-based vector quantization to provide local-pattern-style pixel neighbourhood codings that have the speed of LBP/LTP and some of the flexibility and power of traditional visual word representations. Our experiments show that LQP outperforms all of the other feature sets tested, including HOG, LBP and LTP.
Campanholo, Guizilini Vitor. "Non-Parametric Learning for Monocular Visual Odometry." Thesis, The University of Sydney, 2013. http://hdl.handle.net/2123/9903.
Liu, Li. "Learning discriminative feature representations for visual categorization." Thesis, University of Sheffield, 2015. http://etheses.whiterose.ac.uk/8239/.