Dissertations / Theses on the topic 'Visual representation learning'
Consult the top 50 dissertations / theses for your research on the topic 'Visual representation learning.'
Wang, Zhaoqing. "Self-supervised Visual Representation Learning." Thesis, The University of Sydney, 2022. https://hdl.handle.net/2123/29595.
Zhou, Bolei. "Interpretable representation learning for visual intelligence." Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/117837.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 131-140).
Recent progress of deep neural networks in computer vision and machine learning has enabled transformative applications across robotics, healthcare, and security. However, despite the superior performance of the deep neural networks, it remains challenging to understand their inner workings and explain their output predictions. This thesis investigates several novel approaches for opening up the "black box" of neural networks used in visual recognition tasks and understanding their inner working mechanism. I first show that objects and other meaningful concepts emerge as a consequence of recognizing scenes. A network dissection approach is further introduced to automatically identify the internal units as the emergent concept detectors and quantify their interpretability. Then I describe an approach that can efficiently explain the output prediction for any given image. It sheds light on the decision-making process of the networks and why the predictions succeed or fail. Finally, I show some ongoing efforts toward learning efficient and interpretable deep representations for video event understanding and some future directions.
by Bolei Zhou.
Ph. D.
Ben-Younes, Hedi. "Multi-modal representation learning towards visual reasoning." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS173.
The quantity of images that populate the Internet is dramatically increasing. It becomes of critical importance to develop the technology for a precise and automatic understanding of visual contents. As image recognition systems are becoming more and more relevant, researchers in artificial intelligence now seek the next generation of vision systems that can perform high-level scene understanding. In this thesis, we are interested in Visual Question Answering (VQA), which consists in building models that answer any natural language question about any image. Because of its nature and complexity, VQA is often considered as a proxy for visual reasoning. Classically, VQA architectures are designed as trainable systems that are provided with images, questions about them and their answers. To tackle this problem, typical approaches involve modern Deep Learning (DL) techniques. In the first part, we focus on developing multi-modal fusion strategies to model the interactions between image and question representations. More specifically, we explore bilinear fusion models and exploit concepts from tensor analysis to provide tractable and expressive factorizations of parameters. These fusion mechanisms are studied under the widely used visual attention framework: the answer to the question is provided by focusing only on the relevant image regions. In the last part, we move away from the attention mechanism and build a more advanced scene understanding architecture where we consider objects and their spatial and semantic relations. All models are thoroughly experimentally evaluated on standard datasets and the results are competitive with the literature.
Sharif Razavian, Ali. "Convolutional Network Representation for Visual Recognition." Doctoral thesis, KTH, Robotik, perception och lärande, RPL, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-197919.
Yu, Mengyang. "Feature reduction and representation learning for visual applications." Thesis, Northumbria University, 2016. http://nrl.northumbria.ac.uk/30222/.
Venkataramanan, Shashanka. "Metric learning for instance and category-level visual representation." Electronic Thesis or Diss., Université de Rennes (2023-....), 2024. http://www.theses.fr/2024URENS022.
The primary goal in computer vision is to enable machines to extract meaningful information from visual data, such as images and videos, and leverage this information to perform a wide range of tasks. To this end, substantial research has focused on developing deep learning models capable of encoding comprehensive and robust visual representations. A prominent strategy in this context involves pretraining models on large-scale datasets, such as ImageNet, to learn representations that can exhibit cross-task applicability and facilitate the successful handling of diverse downstream tasks with minimal effort. To facilitate learning on these large-scale datasets and encode good representations, complex data augmentation strategies have been used. However, these augmentations can be limited in their scope, either being hand-crafted and lacking diversity, or generating images that appear unnatural. Moreover, the focus of these augmentation techniques has primarily been on the ImageNet dataset and its downstream tasks, limiting their applicability to a broader range of computer vision problems. In this thesis, we aim to tackle these limitations by exploring different approaches to enhance the efficiency and effectiveness in representation learning. The common thread across the works presented is the use of interpolation-based techniques, such as mixup, to generate diverse and informative training examples beyond the original dataset. In the first work, we are motivated by the idea of deformation as a natural way of interpolating images rather than using a convex combination. We show that geometrically aligning the two images in the feature space allows for more natural interpolation that retains the geometry of one image and the texture of the other, connecting it to style transfer. Drawing from these observations, we explore the combination of mixup and deep metric learning. We develop a generalized formulation that accommodates mixup in metric learning, leading to improved representations that explore areas of the embedding space beyond the training classes. Building on these insights, we revisit the original motivation of mixup and generate a larger number of interpolated examples beyond the mini-batch size by interpolating in the embedding space. This approach allows us to sample on the entire convex hull of the mini-batch, rather than just along linear segments between pairs of examples. Finally, we investigate the potential of using natural augmentations of objects from videos. We introduce a "Walking Tours" dataset of first-person egocentric videos, which capture a diverse range of objects and actions in natural scene transitions. We then propose a novel self-supervised pretraining method called DoRA, which detects and tracks objects in video frames, deriving multiple views from the tracks and using them in a self-supervised manner.
Li, Nuo Ph D. Massachusetts Institute of Technology. "Unsupervised learning of invariant object representation in primate visual cortex." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/65288.
Cataloged from PDF version of thesis.
Includes bibliographical references.
Visual object recognition (categorization and identification) is one of the most fundamental cognitive functions for our survival. Our visual system has the remarkable ability to convey to us visual object and category information in a manner that is largely tolerant ("invariant") to the exact position, size, pose of the object, illumination, and clutter. The ventral visual stream in non-human primate has solved this problem. At the highest stage of the visual hierarchy, the inferior temporal cortex (IT), neurons have selectivity for objects and maintain that selectivity across variations in the images. A reasonably sized population of these tolerant neurons can support object recognition. However, we do not yet understand how IT neurons construct this neuronal tolerance. The aim of this thesis is to tackle this question and to examine the hypothesis that the ventral visual stream may leverage experience to build its neuronal tolerance. One potentially powerful idea is that time can act as an implicit teacher, in that each object's identity tends to remain temporally stable, thus different retinal images of the same object are temporally contiguous. In theory, the ventral stream could take advantage of this natural tendency and learn to associate together the neuronal representations of temporally contiguous retinal images to yield tolerant object selectivity in IT cortex. In this thesis, I report neuronal support for this hypothesis in IT of non-human primates. First, targeted alteration of temporally contiguous experience with object images at different retinal positions rapidly reshaped IT neurons' position tolerance. Second, similar temporal contiguity manipulation of experience with object images at different sizes similarly reshaped IT size tolerance. These instances of experience-induced effect were similar in magnitude, grew gradually stronger with increasing visual experience, and the size of the effect was large. Taken together, these studies show that unsupervised, temporally contiguous experience can reshape and build at least two types of IT tolerance, and that they can do so under a wide range of spatiotemporal regimes encountered during natural visual exploration. These results suggest that the ventral visual stream uses temporal contiguity visual experience with a general unsupervised tolerance learning (UTL) mechanism to build its invariant object representation.
by Nuo Li.
Ph.D.
Dalens, Théophile. "Learnable factored image representation for visual discovery." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEE036.
This thesis proposes an approach for analyzing unpaired visual data annotated with time stamps by generating how images would have looked if they were from different times. To isolate and transfer time-dependent appearance variations, we introduce a new trainable bilinear factor separation module. We analyze its relation to classical factored representations and concatenation-based auto-encoders. We demonstrate that this new module has clear advantages compared to standard concatenation when used in a bottleneck encoder-decoder convolutional neural network architecture. We also show that it can be inserted in a recent adversarial image translation architecture, enabling the image transformation to multiple different target time periods using a single network.
Jonaityte, Inga <1981>. "Visual representation and financial decision making." Doctoral thesis, Università Ca' Foscari Venezia, 2014. http://hdl.handle.net/10579/4593.
This thesis experimentally addresses the effects of visual representations on financial decisions. We hypothesize that visual representations of financial information can influence decisions. To test these hypotheses, we conducted online experiments and showed that the choice of visual representation leads to changes in attention, comprehension, and evaluation of the information. The second study concerns the ability of financial advisors to offer expert judgment to help inexperienced consumers with financial decisions. We found that advertising content significantly influences experts and non-experts alike, which offers a new perspective on the decisions of financial advisors. The third topic concerns learning from multidimensional information, adapting to change, and developing new strategies. We investigated the effects of cue importance and of changes in the decision environment on learning. Sudden transformations in the decision environment are more damaging than gradual ones.
Büchler, Uta [Verfasser], and Björn [Akademischer Betreuer] Ommer. "Visual Representation Learning with Minimal Supervision / Uta Büchler ; Betreuer: Björn Ommer." Heidelberg : Universitätsbibliothek Heidelberg, 2021. http://d-nb.info/1225868505/34.
Sanakoyeu, Artsiom [Verfasser], and Björn [Akademischer Betreuer] Ommer. "Visual Representation Learning with Limited Supervision / Artsiom Sanakoyeu ; Betreuer: Björn Ommer." Heidelberg : Universitätsbibliothek Heidelberg, 2021. http://d-nb.info/1231632488/34.
Anand, Gaurangi. "Unsupervised visual perception-based representation learning for time-series and trajectories." Thesis, Queensland University of Technology, 2021. https://eprints.qut.edu.au/212901/1/Gaurangi_Anand_Thesis.pdf.
Jones, Carl. "Localisation and representation of visual memory in the domestic chick." Thesis, University of Sussex, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.324183.
Wang, Qian. "Zero-shot visual recognition via latent embedding learning." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/zeroshot-visual-recognition-via-latent-embedding-learning(bec510af-6a53-4114-9407-75212e1a08e1).html.
Xu, Dan. "Exploring Multi-Modal and Structured Representation Learning for Visual Image and Video Understanding." Doctoral thesis, Università degli studi di Trento, 2018. https://hdl.handle.net/11572/367610.
Lee, Wooyoung. "Learning Statistical Features of Scene Images." Research Showcase @ CMU, 2014. http://repository.cmu.edu/dissertations/540.
Azizpour, Hossein. "Visual Representations and Models: From Latent SVM to Deep Learning." Doctoral thesis, KTH, Datorseende och robotik, CVAP, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-192289.
Varol, Gül. "Learning human body and human action representations from visual data." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEE029.
The focus of visual content is often people. Automatic analysis of people from visual data is therefore of great importance for numerous applications in content search, autonomous driving, surveillance, health care, and entertainment. The goal of this thesis is to learn visual representations for human understanding. Particular emphasis is given to two closely related areas of computer vision: human body analysis and human action recognition. In summary, our contributions are the following: (i) we generate photo-realistic synthetic data for people that allows training CNNs for human body analysis, (ii) we propose a multi-task architecture to recover a volumetric body shape from a single image, (iii) we study the benefits of long-term temporal convolutions for human action recognition using 3D CNNs, (iv) we incorporate similarity training in multi-view videos to design view-independent representations for action recognition.
Krawec, Jennifer Lee. "Problem Representation and Mathematical Problem Solving of Students of Varying Math Ability." Scholarly Repository, 2010. http://scholarlyrepository.miami.edu/oa_dissertations/455.
Zhao, Yongheng. "3D feature representations for visual perception and geometric shape understanding." Doctoral thesis, Università degli studi di Padova, 2019. http://hdl.handle.net/11577/3424787.
Rouhafzay, Ghazal. "3D Object Representation and Recognition Based on Biologically Inspired Combined Use of Visual and Tactile Data." Thesis, Université d'Ottawa / University of Ottawa, 2021. http://hdl.handle.net/10393/42122.
Plebe, Alice. "Cognitively Guided Modeling of Visual Perception in Intelligent Vehicles." Doctoral thesis, Università degli studi di Trento, 2021. http://hdl.handle.net/11572/299909.
Yaner, Patrick William. "From Shape to Function: Acquisition of Teleological Models from Design Drawings by Compositional Analogy." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/19791.
Committee Chair: Goel, Ashok; Committee Member: Eastman, Charles; Committee Member: Ferguson, Ronald; Committee Member: Glasgow, Janice; Committee Member: Nersessian, Nancy; Committee Member: Ram, Ashwin.
Garg, Sourav. "Robust visual place recognition under simultaneous variations in viewpoint and appearance." Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/134410/1/Sourav%20Garg%20Thesis.pdf.
Full textGoh, Hanlin. "Learning deep visual representations." Paris 6, 2013. http://www.theses.fr/2013PA066356.
Recent advancements in the areas of deep learning and visual information processing have presented an opportunity to unite both fields. These complementary fields combine to tackle the problem of classifying images into their semantic categories. Deep learning brings learning and representational capabilities to a visual processing model that is adapted for image classification. This thesis addresses problems that lead to the proposal of learning deep visual representations for image classification. The problem of deep learning is tackled on two fronts. The first aspect is the problem of unsupervised learning of latent representations from input data. The main focus is the integration of prior knowledge into the learning of restricted Boltzmann machines (RBM) through regularization. Regularizers are proposed to induce sparsity, selectivity and topographic organization in the coding to improve discrimination and invariance. The second direction introduces the notion of gradually transiting from unsupervised layer-wise learning to supervised deep learning. This is done through the integration of bottom-up information with top-down signals. Two novel implementations supporting this notion are explored. The first method uses top-down regularization to train a deep network of RBMs. The second method combines predictive and reconstructive loss functions to optimize a stack of encoder-decoder networks. The proposed deep learning techniques are applied to tackle the image classification problem. The bag-of-words model is adopted due to its strengths in image modeling through the use of local image descriptors and spatial pooling schemes. Deep learning with spatial aggregation is used to learn a hierarchical visual dictionary for encoding the image descriptors into mid-level representations. This method achieves leading image classification performances for object and scene images. The learned dictionaries are diverse and non-redundant. The speed of inference is also high. From this, a further optimization is performed for the subsequent pooling step. This is done by introducing a differentiable pooling parameterization and applying the error backpropagation algorithm. This thesis represents one of the first attempts to synthesize deep learning and the bag-of-words model. This union results in many challenging research problems, leaving much room for further study in this area.
Sicilia Gómez, Álvaro. "Supporting Tools for Automated Generation and Visual Editing of Relational-to-Ontology Mappings." Doctoral thesis, Universitat Ramon Llull, 2016. http://hdl.handle.net/10803/398843.
Integration of data from heterogeneous formats and domains based on Semantic Web technologies enables us to solve their structural and semantic heterogeneity. Ontology-based data access (OBDA) is a comprehensive solution which relies on the use of ontologies as mediator schemas and relational-to-ontology mappings to facilitate data source querying. However, one of the greatest obstacles in the adoption of OBDA is the lack of tools to support the creation of mappings between physically stored data and ontologies. The objective of this research has been to develop new tools that allow non-ontology experts to create relational-to-ontology mappings. For this purpose, two lines of work have been carried out: the automated generation of relational-to-ontology mappings, and visual support for mapping editing. The tools currently available to automate the generation of mappings are far from providing a complete solution, since they rely on relational schemas and barely take into account the contents of the relational data source and features of the ontology. However, the data may contain hidden relationships that can help in the process of mapping generation. To overcome this limitation, we have developed AutoMap4OBDA, a system that automatically generates R2RML mappings from the analysis of the contents of the relational source and takes into account the characteristics of the ontology. The system employs an ontology learning technique to infer class hierarchies, selects the string similarity metric based on the labels of ontologies, and analyses the graph structures to generate the mappings from the structure of the ontology. Visual representation through intuitive interfaces can help non-technical users to establish mappings between a relational source and an ontology. However, existing tools for visual editing of mappings show some limitations. In particular, the visual representation of mappings does not embrace the structure of the relational source and the ontology at the same time. To overcome this problem, we have developed Map-On, a visual web environment for the manual editing of mappings. AutoMap4OBDA has been shown to outperform existing solutions in the generation of mappings. Map-On has been applied in research projects to verify its effectiveness in managing mappings.
Åberg, Ludvig. "Multimodal Classification of Second-Hand E-Commerce Ads." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-233324.
Products listed on marketplaces such as Blocket.se are usually categorized by the sellers themselves. Automating the categorization process therefore makes posting ads both easier and faster, and can reduce the number of products placed in the wrong category. Automatic categorization also makes it possible for the marketplace to use a more detailed category system, which could make searching for products more efficient for potential buyers. Product categorization is often addressed as a text classification problem, since most of the product information is available in written form. By also including product images, however, we can expect better results. This thesis evaluates different methods that use both image and text for ad classification on data from blocket.se. In particular, we examine late fusion models, where the information from the modalities is combined at classification time, and early fusion models, where the modalities are instead combined at an abstract level before classification. We also introduce our own model, Text Based Visual Attention (TBVA), an extension of the image classifier Inception v3 [1], which uses an attention mechanism to incorporate text information. All models described in this thesis use the text classifier fastText [2] to process text and the image classifier Inception v3 to process images. Our results show that late fusion models perform best on our data. We conclude that the late fusion models learn in which cases to 'trust' the text or the image information, whereas the early fusion and TBVA models instead learn more abstract concepts. As future work, we believe it would be valuable to investigate how TBVA models perform on other tasks, such as assessing similarity between ads.
SAGHIÉ, Najla Fouad. "O ENSINO / APRENDIZAGEM DA LÍNGUA INGLESA NA PERSPECTIVA DA CULTURA VISUAL." Universidade Federal de Goiás, 2008. http://repositorio.bc.ufg.br/tede/handle/tde/2802.
This work discusses a methodological teaching approach based on reading and interpreting images from electronic and print media, with the aim of recognizing linguistic elements (foreign words) linked to the captions of advertisements. In the educational framework, it suggests English language teaching integrated with Visual Culture and Art, contemplating imagistic culture. In this work, we also investigate the mediation process, that is, the teacher's role as a guide in the engagement between images, students, and their representations. This dissertation results from field research in which we applied an interpretative and prescriptive study of advertising images with seventh-grade students (two groups of fifteen students) from a private school in Goiânia-GO, as a way of reflecting on verbal language (text written in English) and nonverbal language (practices of looking at images), in order to contribute to the educational context, that is, to the English language teaching/learning process. This research was supported by theories of Visual Culture, Education, Discourse Analysis, and Transdisciplinarity, discussed by several theorists: Maingueneau (2004), Duncun (2003), Barbosa (2002), among other authors who contributed significantly to the comprehension and analysis of this work. We hope this investigation can, in some way, help deepen interdisciplinary teaching with Visual Culture and the critical reading of visual manifestations.
Kochukhova, Olga. "When, Where and What : The Development of Perceived Spatio-Temporal Continuity." Doctoral thesis, Uppsala : Acta Universitatis Upsaliensis, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-7760.
Kashyap, Karan. "Learning digits via joint audio-visual representations." Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/113143.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 59-60).
Our goal is to explore models for language learning in the manner that humans learn languages as children. Namely, children do not have intermediary text transcriptions in correlating visual and audio inputs from the environment; rather, they directly make connections between what they see and what they hear, sometimes even across languages! In this thesis, we present weakly-supervised models for learning representations of numerical digits between two modalities: speech and images. We experiment with architectures of convolutional neural networks taking in spoken utterances of numerical digits and images of handwritten digits as inputs. In nearly all cases we randomly initialize network weights (without pre-training) and evaluate the model's ability to return a matching image for a spoken input or to identify the number of overlapping digits between an utterance and an image. We also provide some visuals as evidence that our models are truly learning correspondences between the two modalities.
by Karan Kashyap.
M. Eng.
Liu, Li. "Learning discriminative feature representations for visual categorization." Thesis, University of Sheffield, 2015. http://etheses.whiterose.ac.uk/8239/.
Rahim, Medhat H., and Radcliffe Siddo. "The use of visualization for learning and teaching mathematics." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2012. http://nbn-resolving.de/urn:nbn:de:bsz:14-qucosa-80852.
Doersch, Carl. "Supervision Beyond Manual Annotations for Learning Visual Representations." Research Showcase @ CMU, 2016. http://repository.cmu.edu/dissertations/787.
Jankowska, Gierus Bogumila. "Learning with visual representations through cognitive load theory." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=104827.
This study examined two different strategies for learning with diagrams: drawing diagrams while learning, or learning from pre-constructed diagrams. One hundred ninety-six high school students were randomly assigned to a condition in which they either drew diagrams while learning how airplanes fly or studied from pre-constructed diagrams. Before the study, students' prior knowledge and elaboration strategies were assessed. While studying under either condition, students reported their mental effort. Afterwards, students' learning was examined on a similar task and a transfer task. Cook's (2006) theoretical framework, which combines prior knowledge and cognitive load theory on visual representations in science education, was used to analyze the results. The results showed that students' mental effort increased significantly in the drawing condition, yet the post-test results were mixed. In general, students performed somewhat worse on post-test measures when they learned by drawing diagrams rather than by using pre-constructed diagrams. However, students with low prior knowledge performed better on the post-test when drawing their own diagrams. Elaboration strategies had no effect on students' achievement or mental effort in either condition.
Parekh, Sanjeel. "Learning representations for robust audio-visual scene analysis." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT015/document.
The goal of this thesis is to design algorithms that enable robust detection of objects and events in videos through joint audio-visual analysis. This is motivated by humans' remarkable ability to meaningfully integrate auditory and visual characteristics for perception in noisy scenarios. To this end, we identify two kinds of natural associations between the modalities in recordings made using a single microphone and camera, namely motion-audio correlation and appearance-audio co-occurrence. For the former, we use audio source separation as the primary application and propose two novel methods within the popular non-negative matrix factorization framework. The central idea is to utilize the temporal correlation between audio and motion for objects/actions where the sound-producing motion is visible. The first proposed method focuses on soft coupling between audio and motion representations capturing temporal variations, while the second is based on cross-modal regression. We segregate several challenging audio mixtures of string instruments into their constituent sources using these approaches. To identify and extract many commonly encountered objects, we leverage appearance-audio co-occurrence in large datasets. This complementary association mechanism is particularly useful for objects where motion-based correlations are not visible or available. The problem is dealt with in a weakly-supervised setting wherein we design a representation learning framework for robust AV event classification, visual object localization, audio event detection and source separation. We extensively test the proposed ideas on publicly available datasets. The experiments demonstrate several intuitive multimodal phenomena that humans utilize on a regular basis for robust scene understanding.
Silberer, Carina Helga. "Learning visually grounded meaning representations." Thesis, University of Edinburgh, 2015. http://hdl.handle.net/1842/14236.
Robert, Thomas. "Improving Latent Representations of ConvNets for Visual Understanding." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS343.
For a decade now, convolutional deep neural networks have demonstrated their ability to produce excellent results for computer vision. For this, these models transform the input image into a series of latent representations. In this thesis, we work on improving the "quality" of the latent representations of ConvNets for different tasks. First, we work on regularizing those representations to increase their robustness toward intra-class variations and thus improve their performance for classification. To do so, we develop a loss based on information theory metrics to decrease the entropy conditionally to the class. Then, we propose to structure the information in two complementary latent spaces, solving a conflict between the invariance of the representations and the reconstruction task. This structure releases the constraint posed by classical architectures, making it possible to obtain better results in the context of semi-supervised learning. Finally, we address the problem of disentangling, i.e. explicitly separating and representing independent factors of variation of the dataset. We pursue our work on structuring the latent spaces and use adversarial costs to ensure an effective separation of the information. This allows us to improve the quality of the representations and enables semantic image editing.
Fernández, López Adriana. "Learning of meaningful visual representations for continuous lip-reading." Doctoral thesis, Universitat Pompeu Fabra, 2021. http://hdl.handle.net/10803/671206.
In recent decades, there has been growing interest in decoding speech using exclusively visual signals, that is, mimicking the human ability to read lips, giving rise to automatic lip-reading (ALR) systems. However, it is known that access to speech through the visual channel is subject to many limitations compared to the acoustic signal; it has been argued that humans can read around 30% of the information from the lips, with the rest filled in from context. Thus, one of the main challenges of ALR lies in the visual ambiguities that arise at the word level, highlighting that not all the sounds we hear can be easily distinguished by observing the lips. In the literature, early ALR systems addressed simple recognition tasks, such as alphabet or digit recognition, but progressively moved to more complex and realistic settings, leading to several recent systems aimed at continuous lip-reading. To a large extent, these advances have been possible thanks to powerful systems based on deep learning architectures that have rapidly begun to replace traditional systems. Although the recognition rates of continuous lip-reading may seem modest compared to those achieved by audio-based systems, the field has clearly taken a step forward. Interestingly, an analogous effect can be observed when humans try to decode speech: given noise-free signals, most people can decode the audio channel effortlessly, but would struggle to read lips, since the ambiguity of visual signals makes the use of additional context necessary to decode the message. In this thesis we explore the appropriate modeling of visual representations with the aim of improving continuous lip-reading. To this end, we present different data-driven mechanisms to address the main challenges of lip-reading related to the ambiguities and the speaker dependence of visual signals. Our results highlight the benefits of properly encoding the visual channel, for which the most useful features are those that encode corresponding lip positions in a similar way, regardless of the speaker. This opens the door to i) lip-reading in many different languages without the need for large-scale datasets, and ii) increasing the contribution of the visual channel in audio-visual speech systems. Furthermore, our experiments identify a trend toward modeling the temporal context as the key to advancing the field, where there is a need for ALR models trained on datasets that include wide speech variability at multiple context levels. In this thesis, we show that both the appropriate modeling of visual representations and the ability to retain context at multiple levels are necessary conditions for building successful lip-reading systems.
Evans, Benjamin D. "Learning transformation-invariant visual representations in spiking neural networks." Thesis, University of Oxford, 2012. https://ora.ox.ac.uk/objects/uuid:15bdf771-de28-400e-a1a7-82228c7f01e4.
Feng, Zeyu. "Learning Deep Representations from Unlabelled Data for Visual Recognition." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/26876.
Full textAl, chanti Dawood. "Analyse Automatique des Macro et Micro Expressions Faciales : Détection et Reconnaissance par Machine Learning." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAT058.
Facial expression analysis is an important problem in many biometric tasks, such as face recognition, face animation, affective computing and human-computer interfaces. In this thesis, we aim at analyzing facial expressions using images and video sequences. We divided the problem into three main parts. First, we study Macro Facial Expressions for Emotion Recognition and we propose three different levels of feature representations: low-level features through a Bag of Visual Words model, mid-level features through Sparse Representation and hierarchical features through a Deep Learning based method. The objective of doing this is to find the most effective and efficient representation that contains distinctive information of expressions and that overcomes various challenges coming from: 1) intrinsic factors such as appearance and expressiveness variability and 2) extrinsic factors such as illumination, pose, scale and imaging parameters, e.g., resolution, focus, imaging noise. Then, we incorporate the time dimension to extract spatio-temporal features with the objective of describing subtle feature deformations to discriminate ambiguous classes. Second, we direct our research toward transfer learning, where we aim at Adapting Facial Expression Category Models to New Domains and Tasks. Thus we study domain adaptation and zero-shot learning for developing a method that solves the two tasks jointly. Our method is suitable for unlabelled target datasets coming from different data distributions than the source domain and for unlabelled target datasets with different label distributions but sharing the same context as the source domain. Therefore, to permit knowledge transfer between domains and tasks, we use Euclidean learning and Convolutional Neural Networks to design a mapping function that maps the visual information coming from facial expressions into a semantic space coming from a Natural Language model that encodes the visual attribute description or uses the label information. The consistency between the two subspaces is maximized by aligning them using the visual feature distribution. Third, we study Micro Facial Expression Detection. We propose an algorithm to spot micro-expression segments, including the onset and offset frames, and to spatially pinpoint in each image the regions involved in the micro facial muscle movements. The problem is formulated as Anomaly Detection, due to the fact that micro-expressions occur infrequently and thus yield little data compared to natural facial behaviours. First, we propose a deep Recurrent Convolutional Auto-Encoder to capture spatial and motion feature changes of natural facial behaviours. Then, a statistical model for estimating the probability density function of normal facial behaviours, while associating a discriminating score to spot micro-expressions, is learned based on a Gaussian Mixture Model. Finally, an adaptive thresholding technique for identifying micro-expressions from natural facial behaviour is proposed. Our algorithms are tested over deliberate and spontaneous facial expression benchmarks.
Clapés i Sintes, Albert. "Learning to recognize human actions: from hand-crafted to deep-learning based visual representations." Doctoral thesis, Universitat de Barcelona, 2019. http://hdl.handle.net/10803/666794.
Action recognition is a challenge of great relevance in computer vision. Researchers working in the field aim to provide computers with the ability to visually perceive human actions, that is, to observe, interpret and understand, from visual data, the events that involve humans and take place in the physical environment. The applications of this technology are numerous: human-computer interaction, e-health, monitoring/surveillance, video content indexing, etc. Hand-crafted methods dominated the field until the appearance of the first successful deep learning works, which have since become the state of the art. However, hand-crafted methods remain useful in certain scenarios, such as when there is not enough data to train deep methods, as well as by providing additional knowledge that the latter cannot easily learn. This is why the two are often found combined, achieving a general improvement in recognition. This thesis has coincided in time with this paradigm shift and therefore reflects it in two clearly distinguished parts. In the first part, we study possible improvements to existing hand-crafted feature methods for action recognition, from several points of view. Using dense trajectories as the foundation of our work: first, we explore the use of multi-modal and multi-view input data to enrich the trajectory descriptors. Second, we focus on the classification stage of action recognition, proposing an ensemble of action classifiers that act on different feature sets and fusing their outputs with a strategy based on Dempster-Shafer theory. And third, we propose a new hand-crafted feature extraction method that builds an intermediate description of the videos in order to achieve better modeling of the long-term spatio-temporal dynamics present in action videos. As for the second part of the thesis, we begin with an exhaustive study of current deep learning methods for action recognition. We review the most fundamental methodologies and the most advanced recent ones, and establish a taxonomy that summarizes their most important aspects. More specifically, we analyze how each method handles the temporal dimension of video data. Last but not least, we propose a new recurrent neural network with residual connections that implicitly integrates our previous contributions into a new, powerful coupling framework and shows promising results.
Hom, John S. "Making the Invisible Visible: Interrogating social spaces through photovoice." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1284482097.
McNeill, Dean K. "Adaptive visual representations for autonomous mobile robots using competitive learning algorithms." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk2/ftp02/NQ35045.pdf.
Li, Muhua 1973. "Learning invariant neuronal representations for objects across visual-related self-actions." Thesis, McGill University, 2005. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=85565.
In contrast to the bulk of previous research work on the learning of invariance that focuses on the pure bottom-up visual information, we incorporate visual-related self-action signals such as commands for eye, head or body movements, to actively collect the changing visual information and gate the learning process. This helps neural networks learn certain degrees of invariance in an efficient way. We describe a method that can produce a network with invariance to changes in visual input caused by eye movements and covert attention shifts. Training of the network is controlled by signals associated with eye movements and covert attention shifting. A temporal perceptual stability constraint is used to drive the output of the network towards remaining constant across temporal sequences of saccadic motions and covert attention shifts. We use a four-layer neural network model to perform the position-invariant extraction of local features and temporal integration of invariant presentations of local features. The model is further extended to handle viewpoint invariance over eye, head, and/or body movements. We also study cases of multiple features instead of single features in the retinal images, which need a self-organized system to learn over a set of feature classes. A modified saliency map mechanism with spatial constraint is employed to assure that attention stays as much as possible on the same targeted object in a multiple-object scene during the first few shifts.
We present results on both simulated data and real images, to demonstrate that our network can acquire invariant neuronal representations, such as position and attention shift invariance. We also demonstrate that our method performs well in realistic situations in which the temporal sequence of input data is not smooth, situations in which earlier approaches have difficulty.
Sena, Claudia Pinto Pereira. "Colaboração e mediação no processo de construção e representação do conhecimento por pessoas com deficiência visual, a partir da utilização da aprendizagem baseada em problemas." Faculdade de Educação, 2014. http://repositorio.ufba.br/ri/handle/ri/18154.
The human being, throughout its historical and cultural path, has been producing and using technologies, from the most rudimentary, like a wooden spear, to the most complex, not only as a guarantee of survival but also in a continuous process of construction and diffusion of knowledge. Society has increasingly required autonomy, creativity and adaptation, valuing information and, in particular, the knowledge built. Education, as an area that forms citizens, needs to be aware of these constant social, economic and political changes and provide opportunities that emphasize meaningful learning. Understanding that the environment, culture and other subjects influence the construction of knowledge (mediation), this work proposes to investigate PBL (Problem Based Learning) as an educational strategy of collaborative learning in a group of visually impaired people, through the lived experience at a Support Center for the Visually Impaired in Feira de Santana - Ba. Mediation, understood in this work as the interaction between subjects and the use of tools and signs, permeates the teaching and learning process, focusing on the dialogue between peers and on intervention. In the case of people with visual impairment, non-visual signs should be used, favoring the development of other skills, such as tactile perception and hearing, among others. The relevance of experiencing PBL in a group of visually impaired people lies in the possibility of offering these people an environment of interaction that favors the acquisition of concepts and the representation of knowledge, a space of dialogue and social inclusion, and of observing the strengths and weaknesses of the method in this context. Faced with the transformations of the contemporary world, education has used information and communication technologies (ICT) with the intention of participating in the process of sociodigital inclusion. Therefore, we also intend to observe how ICT can be used to increase the skills of people with visual impairments, supporting the tutorial sessions and collaborating in the construction and diffusion of shared knowledge.
Eigenstetter, Angela [Verfasser], and Björn [Akademischer Betreuer] Ommer. "Learning Mid-Level Representations for Visual Recognition / Angela Eigenstetter ; Betreuer: Björn Ommer." Heidelberg : Universitätsbibliothek Heidelberg, 2015. http://d-nb.info/1180499883/34.