Dissertations / Theses on the topic 'Compréhension de scènes'
Consult the top 15 dissertations / theses for your research on the topic 'Compréhension de scènes.'
Picco, Frédérique. "La compréhension et la mémorisation de scènes imagées." Montpellier 3, 1999. http://www.theses.fr/1999MON30050.
Bauda, Marie-Anne. "Compréhension de scènes urbaines par combinaison d'information 2D/3D." PhD thesis, Toulouse, INPT, 2016. http://oatao.univ-toulouse.fr/16483/1/BAUDA_MarieAnne.pdf.
Deléarde, Robin. "Configurations spatiales et segmentation pour la compréhension de scènes, application à la ré-identification." Electronic Thesis or Diss., Université Paris Cité, 2022. http://www.theses.fr/2022UNIP7020.
Modeling the spatial configuration of objects in an image is a subject that is still little discussed to date, including in the most modern computer vision approaches such as convolutional neural networks (CNN). However, it is an essential aspect of scene perception, and integrating it into the models should benefit many tasks in the field, by helping to bridge the "semantic gap" between the digital image and the interpretation of its content. Thus, this thesis aims to improve spatial configuration modeling techniques, in order to exploit them in description and recognition systems. First, we looked at the case of the spatial configuration between two objects, by proposing an improvement of an existing descriptor. This new descriptor called "force banner" is an extension of the histogram of the same name to a whole range of forces, which makes it possible to better describe complex configurations. We were able to show its interest in the description of scenes, by learning to automatically classify relations in natural language from pairs of segmented objects. We then tackled the problem of the transition to scenes containing several objects and proposed a per-object approach, confronting each object with all the others rather than having one descriptor per pair. Secondly, the industrial context of this thesis led us to deal with an application to the problem of re-identification of scenes or objects, a task which is similar to fine recognition from few examples. To do so, we rely on a traditional approach by describing scene components with different descriptors dedicated to specific characteristics, such as color or shape, to which we add the spatial configuration. The comparison of two scenes is then achieved by matching their components thanks to these characteristics, using the Hungarian algorithm for instance. Different combinations of characteristics can be considered for the matching and for the final score, depending on the present and desired invariances. For each of these two topics, we had to cope with the problems of data and segmentation. We therefore generated and annotated a synthetic dataset, and exploited two existing datasets by segmenting them, in two different frameworks. The first approach concerns object-background segmentation, and more precisely the case where a detection is available, which may help the segmentation. It consists in using an existing global segmentation model and exploiting the detection to select the right segment, using several geometric and semantic criteria. The second approach concerns the decomposition of a scene or an object into parts and addresses the unsupervised case. It is based on the color of the pixels, using a clustering method in an adapted color space, such as the HSV cone that we used. All these works have shown the possibility of using the spatial configuration for the description of real scenes containing several objects, as well as in a complex processing chain such as the one we used for re-identification. In particular, the force histogram could be used for this, which makes it possible to take advantage of its good performance, by using a segmentation method adapted to the use case when processing natural images.
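The component-matching step mentioned in this abstract (comparing two scenes by matching their components with the Hungarian algorithm) can be illustrated with a minimal Python sketch. The descriptors below are random placeholders, not the colour/shape/spatial-configuration features used in the thesis, and the function name is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_components(desc_a, desc_b):
    """Match two sets of component descriptors with the Hungarian algorithm.

    desc_a, desc_b: arrays of shape (n_a, d) and (n_b, d), one row per
    scene component (e.g. concatenated colour / shape / spatial features).
    Returns the matched index pairs and the total matching cost.
    """
    # Pairwise cost: Euclidean distance between descriptors.
    cost = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)  # optimal one-to-one assignment
    pairs = list(zip(rows.tolist(), cols.tolist()))
    return pairs, cost[rows, cols].sum()

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    scene_a = rng.normal(size=(4, 8))   # 4 components, 8-D descriptors
    scene_b = scene_a[[2, 0, 3, 1]] + 0.05 * rng.normal(size=(4, 8))
    pairs, score = match_components(scene_a, scene_b)
    print(pairs, round(float(score), 3))
```

In practice the cost matrix would combine several per-characteristic distances, weighted according to the invariances desired for the re-identification task.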
Trujillo, Morales Noël. "Stratégie de perception pour la compréhension de scènes par une approche focalisante, application à la reconnaissance d'objets." PhD thesis, Université Blaise Pascal - Clermont-Ferrand II, 2007. http://tel.archives-ouvertes.fr/tel-00926395.
Trujillo, Morales Noël. "Stratégie de perception pour la compréhension de scènes par une approche focalisante, application à la reconnaissance d'objets." Clermont-Ferrand 2, 2007. http://www.theses.fr/2007CLF21803.
Oesau, Sven. "Modélisation géométrique de scènes intérieures à partir de nuage de points." Thesis, Nice, 2015. http://www.theses.fr/2015NICE4034/document.
Geometric modeling and semantization of indoor scenes from sampled point data is an emerging research topic. Recent advances in acquisition technologies provide highly accurate laser scanners and low-cost handheld RGB-D cameras for real-time acquisition. However, the processing of large data sets is hampered by high amounts of clutter and various defects such as missing data, outliers and anisotropic sampling. This thesis investigates three novel methods for efficient geometric modeling and semantization from unstructured point data: shape detection, classification and geometric modeling. Chapter 2 introduces two methods for abstracting the input point data with primitive shapes. First, we propose a line extraction method to detect wall segments from a horizontal cross-section of the input point cloud. Second, we introduce a region growing method that progressively detects and reinforces regularities of planar shapes. This method utilizes regularities common to man-made architecture, i.e. coplanarity, parallelism and orthogonality, to reduce complexity and improve data fitting in defect-laden data. Chapter 3 introduces a method based on statistical analysis for separating clutter from structure. We also contribute a supervised machine learning method for object classification based on sets of planar shapes. Chapter 4 introduces a method for 3D geometric modeling of indoor scenes. We first partition the space using primitive shapes detected from permanent structures. An energy formulation is then used to solve an inside/outside labeling of a space partitioning, the latter providing robustness to missing data and outliers.
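As an illustration of the general idea of abstracting a point cloud with planar primitives, here is a minimal RANSAC plane-detection sketch. Note that this is a generic stand-in, not the regularity-driven region-growing method described in the thesis; the thresholds and the synthetic "wall" data are arbitrary assumptions.

```python
import numpy as np

def ransac_plane(points, n_iter=200, dist_thresh=0.02, rng=None):
    """Detect the dominant plane in a 3-D point cloud with plain RANSAC.

    points: (N, 3) array. Returns (normal, d, inlier_mask) for the plane
    n . x + d = 0 supported by the largest number of inliers.
    """
    rng = np.random.default_rng(rng)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_model = None
    for _ in range(n_iter):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:          # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        dist = np.abs(points @ normal + d)
        inliers = dist < dist_thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_model = inliers, (normal, d)
    return best_model[0], best_model[1], best_inliers

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic vertical "wall" at y = 2 m plus uniform clutter.
    wall = np.column_stack([rng.uniform(0, 4, 500),
                            np.full(500, 2.0) + 0.005 * rng.normal(size=500),
                            rng.uniform(0, 2.5, 500)])
    clutter = rng.uniform(0, 4, size=(200, 3))
    n, d, inl = ransac_plane(np.vstack([wall, clutter]))
    print("plane normal", np.round(n, 2), "inliers", int(inl.sum()))
```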
Macé, Nadège. "Contraintes temporelles des traitements visuels dans une tâche de catégorisation de scènes naturelles." Toulouse 3, 2006. http://www.theses.fr/2006TOU30063.
The different experiments of this thesis focused on a categorisation task in which subjects have to initiate a response on the basis of incomplete visual information, limited by a masking procedure. The results of these experiments not only confirmed that visual processing is extremely robust and fast (Chapter 2), but also demonstrated that the accumulated sensory information could be interpreted into a decisional signal to efficiently guide the response, depending on the motor effector (manual or ocular – Chapter 3) or the level of representation required in the task (Detection – Categorisation – Identification – Chapter 4). The early latencies recorded in this set of experiments are compatible with the idea that object recognition is initially based on the rapid transfer of visual information through the visual system, in a feed-forward and massively parallel way.
Jouen, Anne-Lise. "Au-delà des mots et des images, bases neurophysiologiques d'un système sémantique commun à la compréhension des phrases et des scènes visuelles." Thesis, Lyon 1, 2013. http://www.theses.fr/2013LYO10322.
Certain theories of cognitive function postulate a neural system for processing meaning, independent of the stimulus input modality. The objective of this thesis work, in line with the embodied cognition domain, was to study the functionalities of such a network involved in both sentence and visual scene comprehension. In the literature, a wide network of fronto-temporo-parietal sensorimotor and associative areas is described as being involved in this process, and while there is a lack of consensus on the amodal nature of this system, extensive research has focused on identifying distributed cortical systems that participate in meaning representations separately in the visual and language modalities. Moreover, the stimuli used are generally less complex than the everyday-life situations we encounter. However, a significant portion of human mental life is built upon the construction of perceptually and socially rich internal scene representations, and these mental models are involved in a large variety of processes for exploring specific memories of the past, planning the future, or understanding current situations. Although diffusion-tensor-imaging-based techniques make it feasible to visualize white matter tracts in the human brain, the connectivity of the semantic network has been little studied. Through different experimental protocols involving mainly neuroimaging techniques (fMRI, DTI, EEG), we were able to reveal the neurophysiological basis of this common semantic network involved in building representations and comprehension of rich verbal and non-verbal stimuli. In our first experiment, we examined brain activation and connectivity in 19 subjects who read sentences and viewed pictures corresponding to everyday events, in a combined fMRI and DTI study. Conjunction of activity in understanding sentences and pictures revealed a common fronto-temporo-parietal network that included the inferior frontal gyrus, the precentral gyrus, the retrosplenial complex, and the medial temporal gyrus extending into the temporo-parietal junction (TPJ) and the inferior parietal lobe. DTI tractography revealed a specific architecture of white matter fibers supporting this network, which principally involves the pathways described as the ventral semantic route (IFOF, UF, ILF, MdLF). Our second experiment, a behavioral protocol, explored interindividual differences in the ability to represent sentences presented in the auditory or visual modality. We demonstrated that individuals are not equal in this capacity to represent sentences; these differences were reflected in behavioral markers including scores of ease of representation (COR) and response times (TR), and they are also related to the number of fibers of the MdLF, which suggests a role for this fasciculus in representation capacities. Both the results of this behavioral protocol and the results of our third, EEG experiment also showed that the contextual effect was significant: the context induced by the presentation of a first stimulus can influence the representation of a second stimulus, whether or not the second is semantically consistent with the first stimulus presented. Our EEG results (ERPs) revealed components influenced by the available semantic information: early attentional effects, which could be modality-specific, and a later semantic integration process common to verbal and non-verbal stimuli... [etc]
Nguyen, Van Dinh. "Exploitation de la détection de contours pour la compréhension de texte dans une scène visuelle." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS473.
Scene texts have been attracting increasing interest in recent years, as witnessed by a large number of applications such as car licence plate recognition systems, navigation systems, self-driving cars relying on traffic signs, and so on. In this research, we tackle the challenges of designing robust and reliable automatic scene text reading systems. Two major steps of such a system, scene text localization and scene text recognition, have been studied, and novel algorithms have been developed to address them. Our work is based on the observation that providing candidate scene text regions with a high probability of being text is very important for localizing and recognizing text in scenes. This factor can influence both the accuracy and the efficiency of detection and recognition systems. Inspired by the success of object proposal research in general object detection and recognition, two state-of-the-art scene text proposal techniques are proposed, namely Text-Edge-Box (TEB) and Max-Pooling Text Proposal (MPT). In the TEB, proposed bottom-up features, extracted from binary Canny edge maps, are used to group edge connected components into proposals and score them. In the MPT technique, a novel grouping solution inspired by the max-pooling idea is proposed. Unlike existing grouping techniques, it does not rely on any text-specific heuristic rules or thresholds to make grouping decisions. Based on our proposed scene text proposal techniques, we designed an end-to-end scene text reading system by integrating proposals with state-of-the-art scene text recognition models, where false-positive proposal suppression and word recognition can be processed concurrently. Furthermore, we developed an assisted scene text searching system by building a web-page user interface on top of the proposed end-to-end system. The system can be accessed by any smart device at the link: dinh.ubismart.org:27790. Experiments on various public scene text datasets show that the proposed scene text proposal techniques outperform other state-of-the-art scene text proposals under different evaluation frameworks. The designed end-to-end system also outperforms other scene-text-proposal-based end-to-end systems and is competitive with other systems presented in the robust reading competition community. It achieves the fifth position in the champion list (Dec-2017): http://rrc.cvc.uab.es/?ch=2&com=evaluation&task=4
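A rough sketch of the "group edge connected components into scored proposals" idea, using OpenCV's Canny detector and connected-component statistics. This is only a toy illustration: the actual TEB and MPT grouping and scoring schemes described above are considerably more elaborate, and the scoring rule here (edge density inside the box) is an assumption.

```python
import cv2
import numpy as np

def text_region_proposals(gray, min_area=20):
    """Toy text-proposal sketch: Canny edges -> connected components ->
    bounding boxes scored by the edge density inside each box."""
    edges = cv2.Canny(gray, 50, 150)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(edges, connectivity=8)
    proposals = []
    for i in range(1, n):                       # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < min_area:
            continue
        density = edges[y:y + h, x:x + w].mean() / 255.0
        proposals.append(((int(x), int(y), int(w), int(h)), float(density)))
    return sorted(proposals, key=lambda p: -p[1])

if __name__ == "__main__":
    img = np.zeros((120, 320), np.uint8)
    cv2.putText(img, "SCENE TEXT", (10, 70), cv2.FONT_HERSHEY_SIMPLEX, 1.2, 255, 2)
    for box, score in text_region_proposals(img)[:5]:
        print(box, round(score, 2))
```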
Macé, Marc. "Représentations visuelles précoces dans la catégorisation rapide de scènes naturelles chez l'homme et le singe." PhD thesis, Université Paul Sabatier - Toulouse III, 2006. http://tel.archives-ouvertes.fr/tel-00077594.
Full textcomposée de trois chapitres, chacun abordant un aspect particulier de la construction des
représentations visuelles précoces utilisées pour catégoriser rapidement les objets.
Nous montrons dans le premier chapitre que les informations magnocellulaires sont probablement très
impliquées dans la construction des représentations visuelles précoces. Ces représentations
rudimentaires de la scène visuelle pourraient servir à guider les traitements effectués sur les
informations parvocellulaires accessibles plus tardivement.
Dans le deuxième chapitre, nous nous intéressons à la chronométrie des traitements visuels, en
analysant les résultats de tâches conçues pour diminuer le temps de réaction des sujets ainsi que la
latence de l'activité différentielle cérébrale. Nous étudions également la dynamique fine de ces
traitements grâce à un protocole de masquage dans lequel l'information n'est accessible à l'écran que
pendant une période de temps très courte et nous montrons ainsi toute l'importance des 20-40
premières millisecondes de traitement.
Le troisième chapitre traite de la nature des représentations visuelles précoces et des tâches qu'elles
permettent de réaliser. Des expériences dans lesquelles les sujets doivent catégoriser des animaux à
différents niveaux montrent que le premier niveau auquel le système visuel accède n'est pas le niveau
de base mais le niveau superordonné. Ces résultats vont à l'encontre de l'architecture classiquement
admise sur la base de travaux utilisant des processus lexicaux et met en évidence l'importance de
facteurs comme l'expertise et la diagnosticité des indices visuels pour expliquer la vitesse d'accès aux
différents niveaux de catégorie.
Ces différents résultats permettent de caractériser les représentations précoces que le système visuel
utilise pour extraire le sens des informations qui lui parviennent et faire émerger la représentation
interne du monde telle que nous la percevons.
Xu, Philippe. "Information fusion for scene understanding." Thesis, Compiègne, 2014. http://www.theses.fr/2014COMP2153/document.
Image understanding is a key issue in modern robotics, computer vision and machine learning. In particular, driving scene understanding is very important in the context of advanced driver assistance systems for intelligent vehicles. In order to recognize the large number of objects that may be found on the road, several sensors and decision algorithms are necessary. To make the most of existing state-of-the-art methods, we address the issue of scene understanding from an information fusion point of view. The combination of many diverse detection modules, which may deal with distinct classes of objects and different data representations, is handled by reasoning in the image space. We consider image understanding at two levels: object detection and semantic segmentation. The theory of belief functions is used to model and combine the outputs of these detection modules. We emphasize the need for a fusion framework flexible enough to easily include new classes, new sensors and new object detection algorithms. In this thesis, we propose a general method to model the outputs of classical machine learning techniques as belief functions. Next, we apply our framework to the combination of pedestrian detectors using the Caltech Pedestrian Detection Benchmark. The KITTI Vision Benchmark Suite is then used to validate our approach in a semantic segmentation context using multi-modal information.
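The combination step of the belief-function framework can be sketched with Dempster's rule on a two-class frame of discernment. The detector names and mass values below are hypothetical; the thesis's contribution is a general method for turning classifier outputs into such mass functions, which is not reproduced here.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination.

    m1, m2: dicts mapping frozenset (focal element) -> mass, each summing to 1.
    Returns the normalized combined mass function.
    """
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb                 # mass assigned to the empty set
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

if __name__ == "__main__":
    PED, BG = frozenset({"ped"}), frozenset({"bg"})
    THETA = PED | BG                            # total ignorance
    # Hypothetical outputs of two pedestrian detectors turned into masses:
    # each commits part of its belief and leaves the rest on Theta.
    m_cam = {PED: 0.6, BG: 0.1, THETA: 0.3}
    m_lidar = {PED: 0.5, BG: 0.2, THETA: 0.3}
    for focal, mass in dempster_combine(m_cam, m_lidar).items():
        print(set(focal), round(mass, 3))
```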
Sanchez, Corentin. "A world model enabling information integrity for autonomous vehicles." Thesis, Compiègne, 2022. http://www.theses.fr/2022COMP2683.
To drive in complex urban environments, autonomous vehicles need to understand their driving context. This task, also known as situation awareness, relies on an internal virtual representation of the world made by the vehicle, called a world model. This representation is generally built from information provided by multiple sources. High-definition navigation maps supply prior information such as road network topology, geometric description of the carriageway, and semantic information including traffic laws. The perception system provides a description of the space and of the road users evolving in the vehicle surroundings. Conjointly, they provide representations of the environment (static and dynamic) and allow interactions to be modeled. In complex situations, a reliable and non-misleading world model is mandatory to avoid inappropriate decision-making and to ensure safety. The goal of this PhD thesis is to propose a novel formalism for the concept of world model that fulfills the situation awareness requirements of an autonomous vehicle. This world model integrates prior knowledge of the road network topology, a lane-level grid representation, its prediction over time and, more importantly, a mechanism to control and monitor the integrity of information. The concept of world model is present in many autonomous vehicle architectures but takes many different forms and sometimes appears only implicitly. In some works it is part of the perception process, while in others it is part of a decision-making process. The first contribution of this thesis is a survey of the concept of world model for autonomous driving, covering different levels of abstraction for information representation and reasoning. Then, a novel representation is proposed for the world model at the tactical level, combining dynamic objects and spatial occupancy information. First, a graph-based top-down approach using a high-definition map is proposed to extract the areas of interest with respect to the situation from the vehicle's perspective. It is then used to build a Lane Grid Map (LGM), which is an intermediate space-state representation from the ego-vehicle point of view. A top-down approach is chosen to assess and characterize the relevant information of the situation. In addition to the classical free and occupied states, the unknown state is further characterized by the notions of neutralized and safe areas, which provide a deeper level of understanding of the situation. Another contribution to the world model is an integrity management mechanism built upon the LGM representation. It consists in managing the spatial sampling of the grid cells in order to take into account localization and perception errors and to avoid misleading information. Regardless of the confidence in localization and perception information, the LGM is capable of providing reliable information to decision making in order not to take hazardous decisions. The last part of the situation awareness strategy is the prediction of the world model based on the LGM representation. The main contribution is to show how a classical object-level prediction fits this representation and that integrity can also be extended to the prediction stage. It is also shown how a neutralized area can be used in the prediction stage to provide a better situation prediction. The work relies on experimental data in order to demonstrate a real application of a complex situation awareness representation.
The approach is evaluated with real data obtained with several experimental vehicles equipped with LiDAR sensors and IMUs with RTK corrections in the city of Compiègne. A high-definition map has also been used in the framework of the SIVALab joint laboratory between Renault and Heudiasyc CNRS-UTC. The world model module has been implemented (with ROS software) in order to fulfill real-time requirements and is functional on the experimental vehicles for live demonstrations.
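A toy version of a lane-level grid with free/occupied/unknown states, where occupied intervals are inflated by a localization-error margin so that pose errors cannot produce misleadingly free cells. This is only a simplified 1-D illustration under assumed parameters; the LGM of the thesis is built from a high-definition map and also distinguishes neutralized and safe areas, which are not modeled here.

```python
import numpy as np

FREE, OCCUPIED, UNKNOWN = 0, 1, 2

def build_lane_grid(lane_length, cell_size, obstacles, sensed_range, pos_sigma):
    """Toy lane-level grid along a single lane (1-D curvilinear abscissa).

    obstacles: list of (start, end) intervals, in metres along the lane,
    occupied by detected objects. Cells beyond sensed_range stay UNKNOWN.
    pos_sigma: localization uncertainty (m); occupied intervals are inflated
    by this margin so that errors cannot turn an occupied cell into a free one.
    """
    n_cells = int(np.ceil(lane_length / cell_size))
    grid = np.full(n_cells, UNKNOWN, dtype=np.int8)
    centers = (np.arange(n_cells) + 0.5) * cell_size
    grid[centers <= sensed_range] = FREE             # observed space defaults to free
    for start, end in obstacles:
        lo, hi = start - pos_sigma, end + pos_sigma  # inflate by the error margin
        grid[(centers >= lo) & (centers <= hi)] = OCCUPIED
    return grid

if __name__ == "__main__":
    g = build_lane_grid(lane_length=100, cell_size=5,
                        obstacles=[(30, 38)], sensed_range=70, pos_sigma=2.0)
    print(g)   # 0 = free, 1 = occupied, 2 = unknown beyond sensor range
```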
Delaitre, Vincent. "Modeling and recognizing interactions between people, objects and scenes." Thesis, Paris, Ecole normale supérieure, 2015. http://www.theses.fr/2015ENSU0003/document.
In this thesis, we focus on modeling interactions between people, objects and scenes and show the benefits of combining the corresponding cues for improving both action classification and scene understanding. In the first part, we seek to exploit scene and object context to improve action classification in still images. We explore alternative bag-of-features models and propose a method that takes advantage of the scene context. We then propose a new model exploiting the object context for action classification based on pairs of body-part and object detectors. We evaluate our methods on our newly collected still-image dataset as well as three other action classification datasets and show performance close to the state of the art. In the second part of this thesis, we address the reverse problem and aim at using the contextual information provided by people to help object localization and scene understanding. We collect a new dataset of time-lapse videos involving people interacting with indoor scenes. We develop an approach to describe image regions by the distribution of human co-located poses and use this pose-based representation to improve object localization. We further demonstrate that people cues can improve several steps of existing pipelines for indoor scene understanding. Finally, we extend the annotation of our time-lapse dataset to 3D and show how to infer object labels for occupied 3D volumes of a scene. To summarize, the contributions of this thesis are the following: (i) we design action classification models for still images that take advantage of scene and object context, and we gather a new dataset to evaluate their performance; (ii) we develop a new model to improve object localization thanks to observations of people interacting with an indoor scene, and test it on a new dataset centered on person, object and scene interactions; (iii) we propose the first method to evaluate the volumes occupied by different object classes in a room, which allows us to analyze the current 3D scene understanding pipeline and identify its main sources of error.
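For reference, a minimal bag-of-features classification pipeline (k-means codebook, histogram encoding, linear SVM) in the spirit of the baseline models explored in the first part. The local descriptors here are synthetic stand-ins for real image features, and the class structure is invented for the demo; the thesis's context-aware variants are not reproduced.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def bof_histogram(local_descriptors, codebook):
    """Quantize an image's local descriptors against the codebook and
    return a normalized bag-of-features histogram."""
    words = codebook.predict(local_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 32-D local descriptors for 40 training images, 2 classes;
    # a real system would extract them densely from each image.
    images = [rng.normal(loc=(i % 2), size=(100, 32)) for i in range(40)]
    labels = [i % 2 for i in range(40)]
    codebook = KMeans(n_clusters=16, n_init=10, random_state=0).fit(np.vstack(images))
    X = np.array([bof_histogram(d, codebook) for d in images])
    clf = LinearSVC().fit(X, labels)
    print("train accuracy:", clf.score(X, labels))
```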
Wang, Fan. "How polarimetry may contribute to understand reflective road scenes : theory and applications." Thesis, Rouen, INSA, 2016. http://www.theses.fr/2016ISAM0003/document.
Advanced Driver Assistance Systems (ADAS) aim to automate/adapt/enhance transportation systems for safety and better driving. Various research topics have emerged around ADAS, including object detection and recognition, image understanding, disparity map estimation, etc. The presence of specular highlights restricts the accuracy of such algorithms, since it covers the original image texture and leads to a loss of information. Light polarization implicitly encodes object-related information, such as the surface direction, material nature, roughness, etc. In the context of ADAS, we are inspired to further inspect the use of polarization imaging to remove image highlights and analyze road scenes. We first propose in this thesis to remove image specularity through polarization by applying a global energy minimization. Polarization information provides a color constraint that reduces the color distortion of the results. The global smoothness assumption further integrates long-range information in the image and produces an improved diffuse image. We then propose to use polarization images as a new feature, since in road scenes the high reflection appears only on certain objects such as cars. Polarization features are applied to image understanding and car detection in two different ways. The experimental results show that, once properly fused with RGB-based features, the complementary information provided by the polarization images improves algorithm accuracy. We finally test polarization imaging for depth estimation. A post-aggregation stereo matching method is first proposed and validated on a color database. A fusion rule is then proposed to use polarization imaging as a constraint on the disparity map estimation. From these applications, we have shown the potential and the feasibility of applying polarization imaging to outdoor tasks for ADAS.
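The basic polarimetric quantities underlying this kind of work can be computed from four intensity images acquired behind a linear polarizer at 0°, 45°, 90° and 135°. The sketch below only derives the Stokes parameters and the degree/angle of linear polarization, which flag strongly polarized (typically specular) pixels; it does not implement the thesis's energy-minimization highlight removal, and the toy data are assumptions.

```python
import numpy as np

def polarization_parameters(i0, i45, i90, i135):
    """Compute Stokes parameters, degree and angle of linear polarization
    from four intensity images taken behind a polarizer at 0/45/90/135 deg."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)       # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / np.maximum(s0, 1e-9)
    aolp = 0.5 * np.arctan2(s2, s1)
    return s0, dolp, aolp

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    unpolarized = rng.uniform(80, 120, size=(4, 4))
    i0, i45, i90, i135 = (unpolarized.copy() for _ in range(4))
    i0[1, 1], i90[1, 1] = 200.0, 20.0        # one strongly polarized (specular) pixel
    _, dolp, _ = polarization_parameters(i0, i45, i90, i135)
    print(np.round(dolp, 2))                 # high DoLP flags the specular pixel
```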
Huet, Moïra-Phoebé. "Voice mixology at a cocktail party : Combining behavioural and neural tracking for speech segregation." Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI070.
It is not always easy to follow a conversation in a noisy environment. In order to discriminate two speakers, we have to mobilize many perceptual and cognitive processes to maintain attention on a target voice and avoid shifting attention to the background. In this dissertation, the processes underlying speech segregation are explored through behavioural and neurophysiological experiments. In a preliminary phase, the development of an intelligibility task -- the Long-SWoRD test -- is introduced. This protocol allows participants to benefit from cognitive resources, such as linguistic knowledge, to separate two talkers in a realistic listening environment. The similarity between the two speakers, and thus by extension the difficulty of the task, was controlled by manipulating the acoustic parameters of the target and masker voices. In a second phase, the performance of the participants on this task is evaluated through three behavioural and neurophysiological (EEG) studies. Behavioural results are consistent with the literature and show that the distance between voices, spatialisation cues, and semantic information influence participants' performance. Neurophysiological results, analysed with temporal response functions (TRF), indicate that the neural representations of the two speakers differ according to the difficulty of the listening conditions. In addition, these representations are constructed more quickly when the voices are easily distinguishable. It is often presumed in the literature that participants' attention remains constantly on the same voice. The experimental protocol presented in this work provides the opportunity to retrospectively infer when participants were listening to each voice. Therefore, in a third stage, a combined analysis of this attentional information and the EEG signals is presented. Results show that information about attentional focus can be used to improve the neural representation of the attended voice in situations where the voices are similar.
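Temporal response functions of the kind mentioned above are commonly estimated by ridge regression of the EEG onto lagged copies of the speech envelope. The following single-channel sketch with synthetic data illustrates that estimation step only; the sampling rate, lag span and regularization strength are arbitrary assumptions, not the parameters used in the thesis.

```python
import numpy as np

def fit_trf(stimulus, eeg, max_lag, alpha=1.0):
    """Estimate a temporal response function by ridge regression.

    stimulus: (T,) speech envelope; eeg: (T,) one EEG channel.
    Returns the (max_lag,) filter mapping lagged stimulus samples to EEG.
    """
    T = len(stimulus)
    # Design matrix of lagged copies of the stimulus (lags 0..max_lag-1).
    X = np.zeros((T, max_lag))
    for lag in range(max_lag):
        X[lag:, lag] = stimulus[:T - lag]
    ridge = X.T @ X + alpha * np.eye(max_lag)
    return np.linalg.solve(ridge, X.T @ eeg)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fs, T = 64, 64 * 60                        # one minute at 64 Hz
    envelope = np.abs(rng.normal(size=T))
    true_trf = np.exp(-np.arange(16) / 4.0)    # hypothetical ~250 ms response
    eeg = np.convolve(envelope, true_trf)[:T] + 0.5 * rng.normal(size=T)
    est = fit_trf(envelope, eeg, max_lag=16, alpha=1.0)
    print(np.round(est[:5], 2), np.round(true_trf[:5], 2))
```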