Dissertations / Theses on the topic 'Questions visuelles'
Consult the top 16 dissertations / theses for your research on the topic 'Questions visuelles.'
Lerner, Paul. "Répondre aux questions visuelles à propos d'entités nommées." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG074.
This thesis is positioned at the intersection of several research fields, Natural Language Processing, Information Retrieval (IR) and Computer Vision, which have unified around representation learning and pre-training methods. In this context, we have defined and studied a new multimodal task: Knowledge-based Visual Question Answering about Named Entities (KVQAE). We were particularly interested in cross-modal interactions and different ways of representing named entities. We also focused on the data used to train and, more importantly, evaluate Question Answering systems through different metrics. More specifically, we proposed a dataset for this purpose, the first in KVQAE comprising various types of entities. We also defined an experimental framework for dealing with KVQAE in two stages through an unstructured knowledge base and identified IR as the main bottleneck of KVQAE, especially for questions about non-person entities. To improve the IR stage, we studied different multimodal fusion methods, which are pre-trained through an original task: the Multimodal Inverse Cloze Task. We found that these models leveraged a cross-modal interaction that we had not originally considered, and which may address the heterogeneity of visual representations of named entities. These results were strengthened by a study of the CLIP model, which allows this cross-modal interaction to be modeled directly. These experiments were carried out while staying aware of biases present in the dataset or evaluation metrics, especially textual biases, which affect any multimodal task.
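To make the retrieval stage mentioned in this abstract more concrete, here is a minimal sketch of CLIP-based cross-modal retrieval, where the image accompanying a question is scored against short textual passages about candidate entities. The checkpoint name, file path, and passages are illustrative assumptions, not the pipeline actually used in the thesis.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Hypothetical inputs: the question image and candidate knowledge-base passages.
image = Image.open("question_image.jpg")
passages = [
    "The Eiffel Tower is a wrought-iron lattice tower in Paris.",
    "The Statue of Liberty stands on Liberty Island in New York Harbor.",
]

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

inputs = processor(text=passages, images=image, return_tensors="pt",
                   padding=True, truncation=True)
with torch.no_grad():
    outputs = model(**inputs)

# logits_per_image holds temperature-scaled image-text similarities.
ranking = outputs.logits_per_image.squeeze(0).argsort(descending=True)
print([passages[i] for i in ranking])
```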
Bordes, Patrick. "Deep Multimodal Learning for Joint Textual and Visual Reasoning." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS370.
In the last decade, the evolution of Deep Learning techniques to learn meaningful data representations for text and images, combined with an important increase in multimodal data, mainly from social networks and e-commerce websites, has triggered growing interest in the research community in the joint understanding of language and vision. The challenge at the heart of Multimodal Machine Learning is the intrinsic difference in semantics between language and vision: while vision faithfully represents reality and conveys low-level semantics, language is a human construction carrying high-level reasoning. On the one hand, language can enhance the performance of vision models. The underlying hypothesis is that textual representations contain visual information. We apply this principle to two Zero-Shot Learning tasks. In the first contribution on ZSL, we extend a common assumption in ZSL, which states that textual representations encode information about the visual appearance of objects, by showing that they also encode information about their visual surroundings and their real-world frequency. In a second contribution, we consider the transductive setting in ZSL. We propose a solution to the limitations of current transductive approaches, which assume that the visual space is well-clustered, an assumption that does not hold when the number of unknown classes is high. On the other hand, vision can expand the capacities of language models. We demonstrate this by tackling Visual Question Generation (VQG), which extends the standard Question Generation task by using an image as complementary input, drawing on visual representations derived from Computer Vision.
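The underlying hypothesis, that textual representations carry enough visual information to recognise unseen classes, can be illustrated with a minimal zero-shot classification sketch: each image is assigned to the class whose text embedding is most similar. The shared 300-dimensional space and the random vectors are assumptions for illustration only; in practice a compatibility function is learned to project both modalities into a common space.

```python
import numpy as np

def zero_shot_classify(image_features, class_embeddings):
    """Assign each image to the class whose text embedding is most similar.

    image_features:   (n_images, d) visual features, e.g. from a CNN.
    class_embeddings: (n_classes, d) text embeddings of class names,
                      including classes never seen during training.
    """
    img = image_features / np.linalg.norm(image_features, axis=1, keepdims=True)
    cls = class_embeddings / np.linalg.norm(class_embeddings, axis=1, keepdims=True)
    scores = img @ cls.T          # cosine similarities
    return scores.argmax(axis=1)  # predicted class indices

# Toy usage with random vectors standing in for projected features.
predictions = zero_shot_classify(np.random.randn(5, 300), np.random.randn(10, 300))
```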
Castro, Teresa. "Le cinéma et la vocation cartographique des images : questions de culture visuelle." Paris 3, 2008. http://www.theses.fr/2008PA030099.
How does the mapping impulse of images become apparent in the cinema? Conceived as an investigation in visual culture, this research is grounded on the following premise: the existence of a cartographic reason of images, expressed in and by cartographic shapes, illustrating the turn from “map” to “mapping impulse”. The enquiry is built on the analysis of the cinematographic expressions of three cartographic shapes: panoramas, atlases and aerial views. Confronting a welter of films and images from different periods and genres, ranging from silent non-fiction films to contemporary artists’ projects, our discussion proceeds by accumulating visual objects and creating associations between them. If the mapping impulse of images is embodied in the cinema in many different ways, it seems to be related to two visibility regimes: a descriptive regime and a diagrammatic regime. Suggesting different ways of conceiving the spatiotemporal representation of the real, these visibility regimes concern the fabrication of points of view and, at times, the creation of new realities. The consideration of the mapping impulse of images eventually allows for the identification of two cartographic rationalities, the first spanning the first decades of the 20th century and the second the beginning of the 21st century. If the implications of these cartographic rationalities go well beyond the field of the moving image, both seem to be related to the proliferation of different image technologies and to globalisation as an historical phenomenon.
Lindmark, Olivia, and Aino Soukko. "A graphic profile should answer questions, not create them - a case study about usability in a visual identity manual." Thesis, Linköpings universitet, Medie- och Informationsteknik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-119946.
Full textOzdil, Yilmaz. "La construction visuelle des identités kurdes : cinema turc, cinéma kurde." Thesis, Paris 3, 2013. http://www.theses.fr/2013PA030165.
In the four countries dominating Kurdistan (Turkey, Iran, Iraq and Syria), the Kurdish question translates first and foremost into the concept of visibility/invisibility, around the problem of the recognition of the Kurds as a denied nation. This is especially apparent in the case of Turkey, the first of the countries which imposed its own nation-state on the Kurds: this question is associated with the negationist state policies on Kurdish culture and identity, which, since 1924, have been considered as obstacles on the path to the creation of a national Turkish identity. In this conflictual relation between Kurdish and Turkish nationalisms – the fruit, among others, of a traumatic memory and a long history of Kurdish resistance in the respective sections of Kurdistan – the imagery of the Kurds refers to a historical dimension which has spontaneously become an essential reference for the cinematographic treatment of « Kurdishness », under the form of interactions constructed by themselves or by their own political opponents. The present thesis aims at describing that permanent influence of nationalism on the cinematographic treatment of « Kurdishness », first in the Turkish cinema, which principally treats the Kurds without designating them as Kurds, then in the Kurdish cinema in the service of the « Kurdish cause » from the 1990s onwards.
Lemaire, Laurence. "Approche comportementale et anatomo-fonctionnelle de la question de Molyneux." Université Louis Pasteur (Strasbourg) (1971-2008), 2002. http://www.theses.fr/2002STR1PS01.
This work interrogates the processes involved in intermodal perception, that is, the mechanisms which allow the same spatial information to be obtained through different sensory modalities (Streri 1993). While intermodality is now accepted for adults, there is no agreement concerning the processes on which it is based. The question posed by William Molyneux at the dawn of the 18th century illustrates this problem and is at the origin of a theoretical debate whose ambition remains to determine whether intermodal equivalences are innate or gradually acquired during development. Driven by cognitive research, recent work carried out since the 1990s has exposed multiple factors likely to constrain the possibilities of intermodal integration, on the perceptual side as well as on the representational side.
Borlizzi, Vincenzo. "Trois questions sur le modelage des films - Les obstacles visuels, la pesanteur et la durée." Thesis, Paris 3, 2011. http://www.theses.fr/2011PA030039.
This research is brought about by the interaction between two ideas Eisenstein wrote: in 1934 he stated that a film is stronger than granite, but the next year he maintained there is no physical reality in films, only reflections and grey shadows. So the question of this work is: can a film director touch his film? A movie is not an object. The contradiction between the two Eisenstein ideas is apparent and can be overcome if films begin to be studied beyond the “shadow-granite” analogy, without any words that compel an image to be a thing. On the contrary, if every image can be studied as an integrating part of a film, if each movie can be considered as a visual body which suggests some questions and which becomes a material of visual thought, then Eisenstein's ideas can express not only the physical force of a film, but also one instrument a director can develop to touch and to model his film: shadows and lights. Consequently, the question can be redefined: how could a director touch his film? How could a film propose some questions about its visual form and about its interaction with the physical forces that try to model it? This study does not impose a dogmatic definition of film modelling; it tries to examine three modelling paths thoroughly: film creation by lights and visual obstacles in Bergman's movies with the actress Harriet Andersson; film modelling and the problem of the force of gravity on the bodies in some works by Ford and Hitchcock; finally, the ways to model movies by expressing the duration of human bodies (the modelling of the eyes of Vera Miles) or extra-human bodies (Victor Erice's film creation and the disintegration of the fruits of a quince tree because of the light).
Jourdain, Christine. "Etude des difficultés de lecture chez l'adulte : la question de l'automatisation de la reconnaissance visuelle des mots." Dijon, 1995. http://www.theses.fr/1995DIJOL019.
The aim of this thesis is to study the visual word-recognition processes that prevent efficient reading in adults. In order to do so, we first focused on the different visual word-recognition processes (chapter 1) and on reading disabilities (chapter 2). Second, we proposed experiments (11) investigating the efficiency of word-recognition processes using on-line paradigms (chapter 3). The results show differences in the automatization of word-recognition processes and a gradation of disabilities (chapters 4 and 5). Thus the difficulties could be simultaneously visual, phonological and lexical (group 4), phonological with consequences for lexical processing (group 3), or strictly phonological (group 2). However, such interpretations should be taken with caution since subjects within the same group do not show identical patterns of difficulties (chapter 6).
Maunet, Isabelle. "La Poésie à la lettre et à la question : Du coup de dés aux poésies concrète et visuelle." Tours, 2000. http://www.theses.fr/2000TOUR2033.
Full textThorisdottir, Rosa Rut. "L'Arctique en images : l'analyse des films de Jean Malaurie et la question de la valeur des documents visuels." Paris 7, 2010. http://www.theses.fr/2010PA070102.
In this thesis, we study the value of images in the discourse of the current cultural awakening amongst the Inuit population and what these images could bring in terms of tools for collective memory. We defend the view that the image and the film, as scientific documents, do not today receive the attention they deserve. During our research, we examine specifically Jean Malaurie's films on the Inuit and his testimonies on the cultural crisis lived by the Arctic people in the 1970s. We conclude that these films should be accessible to the Inuit population themselves, as their importance to these very same people lies at both the private and the public level. We have realised that the Inuit have very limited rights to the documentaries and films concerning their lives and culture. What seems to be preventing the Inuit from using these films are Western copyright laws. We thus explore the question of whose property these films really are and propose that they should be considered as Inuit public cultural goods. As such, they should be shared with institutions and universities in the Arctic and put at the service of the people who inspired them. This would be a generous gesture, giving a good example to other directors and ethnologists, and a new dimension to the legacy of Jean Malaurie's audio-visual works.
Papantoniou, Nowak Stéphane. "Le livre. Dedans / Dehors. Autour des éditions Al Dante : la question du medium : Livre, transmédialité et intermédialité. Contemporanéité et avant-garde. Questions de création littéraire et artistique. L'édition comparée." Thesis, Lyon, 2018. http://www.theses.fr/2018LYSEN002.
This doctoral thesis proposes to study poetic practices in and out of the book through the itineraries of poets published by the Al Dante publishing house. The thesis questions the notion of performance, most often reduced to its scenic dimension, as well as the idea of the avant-garde, too often limited to a political history that has ended. The notion of the avant-garde no longer appears as the element structuring the group but as an acting spectrality, leading to a mixing of political issues - criticism of institutions, criticism of the dominant language, challenging the places assigned by culture - with aesthetic issues. It is therefore a question of poetic translation as actualization of the political situation, and of transmediation. The stylistic approach has been gradually supplanted by a mediological approach in order to problematize heterogeneous practices. The specificities of the Al Dante publishing house allow us to see the book within a more general poetic ecosystem, where the book is no longer the sole end, but a mediation between a process of creation and public events. We can therefore read this contemporary moment not only as the emergence of dominant themes, but also as a crisis of the book's centrality and of its economy. The practices of the Al Dante publishing house have led us to defend a theory of the "editorial gesture" that cannot be reduced to the layout of a manuscript or the production and marketing of a book, but sometimes leads to the creation of books that had not found an editorial space. Pushing the boundaries of publishing and thinking through the specificity of the poetry-action book raises paradoxes: the disintegration of the linearity of discourse, the reconfiguration of the space of the page, and the specific adaptation of book forms and typefaces. These practices concern the book's performative dimension, so that it participates in a renewed way in a "typographic performance".
Dancette, Corentin. "Shortcut Learning in Visual Question Answering." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS073.
This thesis focuses on the task of VQA, which consists in answering textual questions about images. We investigate Shortcut Learning in this task: the literature reports the tendency of models to learn superficial correlations that lead them to correct answers in most cases, but which can fail when encountering unusual input data. We first propose two methods to reduce shortcut learning in VQA. The first, which we call RUBi, consists of an additional loss that encourages the model to learn from the most difficult and least biased examples, those which cannot be answered solely from the question. We then propose SCN, a model for the more specific task of visual counting, which incorporates architectural priors designed to make it more robust to distribution shifts. We then study the existence of multimodal shortcuts in the VQA dataset. We show that shortcuts are not only based on correlations between the question and the answer but can also involve image information. We design an evaluation benchmark to measure the robustness of models to multimodal shortcuts. We show that existing models are vulnerable to multimodal shortcut learning, and that learning those shortcuts is particularly harmful when models are evaluated in an out-of-distribution context. It is therefore important to evaluate the reliability of VQA models. We propose a method to improve their ability to abstain from answering when their confidence is too low. It consists of training an external "selector" model to predict the confidence of the VQA model. This selector is trained using a cross-validation-like scheme in order to avoid overfitting on the training set.
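The selective-prediction idea at the end of this abstract can be sketched as follows: a small external model estimates the probability that the VQA model's answer is correct, and the system abstains when that estimate falls below a threshold. The input features, architecture, and threshold are illustrative assumptions, and the cross-validation-style training scheme is omitted.

```python
import torch
import torch.nn as nn

class Selector(nn.Module):
    """Predicts the probability that the VQA model's answer is correct."""
    def __init__(self, num_answers, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_answers, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, answer_logits):
        # answer_logits: (batch, num_answers) produced by the VQA model.
        return torch.sigmoid(self.net(answer_logits)).squeeze(-1)

def answer_or_abstain(selector, answer_logits, threshold=0.5):
    """Return the predicted answer index, or -1 to abstain."""
    confidence = selector(answer_logits)
    answers = answer_logits.argmax(dim=-1)
    return torch.where(confidence >= threshold, answers, torch.full_like(answers, -1))

# Toy usage with random logits standing in for a VQA model's output.
selector = Selector(num_answers=3000)
decisions = answer_or_abstain(selector, torch.randn(4, 3000))
```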
Strub, Florian. "Développement de modèles multimodaux interactifs pour l'apprentissage du langage dans des environnements visuels." Thesis, Lille 1, 2020. http://www.theses.fr/2020LIL1I030.
While our representation of the world is shaped by our perceptions, our languages, and our interactions, they have traditionally been distinct fields of study in machine learning. Fortunately, this partitioning started opening up with the recent advent of deep learning methods, which standardized raw feature extraction across communities. However, multimodal neural architectures are still in their infancy, and deep reinforcement learning is often limited to constrained environments. Yet, we ideally aim to develop large-scale multimodal and interactive models towards correctly apprehending the complexity of the world. As a first milestone, this thesis focuses on visually grounded language learning for three reasons: (i) vision and language are both well-studied modalities across different scientific fields; (ii) the work builds upon deep learning breakthroughs in natural language processing and computer vision; (iii) the interplay between language and vision has been acknowledged in cognitive science. More precisely, we first designed the GuessWhat?! game for assessing visually grounded language understanding of the models: two players collaborate to locate a hidden object in an image by asking a sequence of questions. We then introduce modulation as a novel deep multimodal mechanism, and we show that it successfully fuses visual and linguistic representations by taking advantage of the hierarchical structure of neural networks. Finally, we investigate how reinforcement learning can support visually grounded language learning and cement the underlying multimodal representation. We show that such interactive learning leads to consistent language strategies but gives rise to new research issues.
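Here is a simplified sketch of the kind of feature-wise modulation this abstract refers to: visual feature maps are scaled and shifted by parameters predicted from a language embedding. Layer names and sizes are illustrative assumptions, not the exact architecture from the thesis.

```python
import torch
import torch.nn as nn

class LanguageModulatedBlock(nn.Module):
    """Scales and shifts visual feature maps with parameters predicted from language."""
    def __init__(self, num_channels, lang_dim):
        super().__init__()
        self.to_gamma_beta = nn.Linear(lang_dim, 2 * num_channels)

    def forward(self, visual_feats, lang_embedding):
        # visual_feats: (batch, channels, H, W); lang_embedding: (batch, lang_dim)
        gamma, beta = self.to_gamma_beta(lang_embedding).chunk(2, dim=-1)
        gamma = gamma.unsqueeze(-1).unsqueeze(-1)  # broadcast over spatial dims
        beta = beta.unsqueeze(-1).unsqueeze(-1)
        return (1 + gamma) * visual_feats + beta

# Toy usage: modulate a 64-channel feature map with a 128-d question embedding.
block = LanguageModulatedBlock(num_channels=64, lang_dim=128)
out = block(torch.randn(2, 64, 7, 7), torch.randn(2, 128))
```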
Laforge, Frédéric. "De la question éthique à l'esthétique." Thesis, 2003. http://constellation.uqac.ca/765/1/17710606.pdf.
Pahuja, Vardaan. "Visual question answering with modules and language modeling." Thesis, 2019. http://hdl.handle.net/1866/22534.
Full textBahdanau, Dzmitry. "On sample efficiency and systematic generalization of grounded language understanding with deep learning." Thesis, 2020. http://hdl.handle.net/1866/23943.
By using the methodology of deep learning that advocates relying more on data and flexible neural models rather than on the expert's knowledge of the domain, the research community has recently achieved remarkable progress in natural language understanding and generation. Nevertheless, it remains unclear whether simply scaling up existing deep learning methods will be sufficient to achieve the goal of using natural language for human-computer interaction. We focus on two related aspects in which current methods appear to require major improvements. The first such aspect is the data inefficiency of deep learning systems: they are known to require extreme amounts of data to perform well. The second aspect is their limited ability to generalize systematically, namely to understand language in situations when the data distribution changes yet the principles of syntax and semantics remain the same. In this thesis, we present four case studies in which we seek to provide more clarity regarding the aforementioned data efficiency and systematic generalization aspects of deep learning approaches to language understanding, as well as to facilitate further work on these topics. In order to separate the problem of representing open-ended real-world knowledge from the problem of core language learning, we conduct all these studies using synthetic languages that are grounded in simple visual environments. In the first article, we study how to train agents to follow compositional instructions in environments with a restricted form of supervision. Namely, for every instruction and initial environment configuration we only provide a goal-state instead of a complete trajectory with actions at all steps. We adapt adversarial imitation learning methods to this setting and demonstrate that such a restricted form of data is sufficient to learn compositional meanings of the instructions. Our second article also focuses on instruction following. We develop the BabyAI platform to facilitate further, more extensive and rigorous studies of this setup. The platform features a compositional Baby language with 10^19 instructions, whose semantics is precisely defined in a partially-observable gridworld environment. We report baseline results on how much supervision is required to teach the agent certain subsets of Baby language with different training methods, such as reinforcement learning and imitation learning. In the third article, we study systematic generalization of visual question answering (VQA) models. In the VQA setting the system must answer compositional questions about images. We construct a dataset of spatial questions about object pairs and evaluate how well different models perform on questions about pairs of objects that never occurred in the same question in the training distribution. We show that models in which word meanings are represented by separate modules that perform independent computation generalize much better than models whose design is not explicitly modular. The modular models, however, generalize well only when the modules are connected in an appropriate layout, and our experiments highlight the challenges of learning the layout by end-to-end learning on the training distribution. In our fourth and final article, we also study generalization of VQA models to questions outside of the training distribution, but this time using the popular CLEVR dataset of complex questions about 3D-rendered scenes as the platform.
We generate novel CLEVR-like questions by using similarity-based references (e.g. "the ball that has the same color as ...") in contexts that occur in CLEVR questions but only with location-based references (e.g. "the ball that is to the left of ..."). We analyze zero- and few-shot generalization to this new benchmark, called CLOSURE, after training on CLEVR, for a number of existing models as well as a novel one.
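As an illustration of the modular models discussed in the third article of this abstract, here is a minimal, hand-wired sketch of neural modules composed according to a fixed layout for a spatial question. The module names, layout, and dimensions are illustrative assumptions rather than the exact architectures studied in the thesis.

```python
import torch
import torch.nn as nn

class Find(nn.Module):
    """Grounds one word (e.g. 'cube') as an attention map over image regions."""
    def __init__(self, feat_dim):
        super().__init__()
        self.score = nn.Conv2d(feat_dim, 1, kernel_size=1)

    def forward(self, feats):                          # feats: (B, C, H, W)
        b, _, h, w = feats.shape
        return torch.softmax(self.score(feats).view(b, -1), dim=1).view(b, 1, h, w)

class Relate(nn.Module):
    """Shifts attention towards a related region (e.g. 'left of' the attended object)."""
    def __init__(self, feat_dim):
        super().__init__()
        self.conv = nn.Conv2d(feat_dim + 1, 1, kernel_size=3, padding=1)

    def forward(self, feats, attention):
        b, _, h, w = feats.shape
        x = torch.cat([feats, attention], dim=1)
        return torch.softmax(self.conv(x).view(b, -1), dim=1).view(b, 1, h, w)

class Answer(nn.Module):
    """Predicts an answer from the attention-pooled visual features."""
    def __init__(self, feat_dim, num_answers):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_answers)

    def forward(self, feats, attention):
        pooled = (feats * attention).sum(dim=(2, 3))   # (B, C)
        return self.fc(pooled)

# Fixed layout for "what color is the object left of the cube?":
# Answer(Relate_left(Find_cube(image)))
feats = torch.randn(2, 64, 7, 7)                       # toy image features
find_cube, relate_left, answer = Find(64), Relate(64), Answer(64, 28)
logits = answer(feats, relate_left(feats, find_cube(feats)))
```

In this style of model, each word's meaning lives in its own parameters, and the layout (which module feeds which) mirrors the structure of the question; the abstract's point is that such models generalize better to unseen word combinations, provided the layout is appropriate.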