Theses on the topic « Reconnaissance de scènes » (scene recognition)
Browse the top 50 theses for research on the topic « Reconnaissance de scènes ».
Blachon, David. « Reconnaissance de scènes multimodale embarquée ». Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM001/document.
Context: This PhD takes place in the context of Ambient Intelligence and (mobile) context/scene awareness. Historically, the project comes from the company ST-Ericsson, where it was framed as the need to develop and embed a “context server” on the smartphone that would gather context information and provide it to the applications requiring it. One use case was given for illustration: when someone involved in a meeting receives a call, then, thanks to an understanding of the current scene (meeting at work), the smartphone can automatically switch to vibrate mode so as not to disturb the meeting. The main problems consist of i) proposing a definition of what a scene is and which examples of scenes would suit the use case, ii) acquiring a corpus of data to be exploited with machine-learning-based approaches, and iii) proposing algorithmic solutions to the problem of scene recognition. Data collection: After a review of existing databases, it appeared that none fitted the criteria I had set (long continuous recordings; multi-source synchronized recordings necessarily including audio; relevant labels). Hence, I developed an Android application for collecting data. The application, called RecordMe, has been successfully tested on 10+ devices running Android OS versions 2.3 and 4.0. It has been used for 3 different campaigns, including the one for scenes, resulting in 500+ hours of recordings from 25+ volunteers, mostly in the Grenoble area but also abroad (Dublin, Singapore, Budapest).
The application and the collection protocol both include features for protecting volunteers' privacy: for instance, raw audio is not saved; instead, MFCCs are saved, and sensitive strings (GPS coordinates, device IDs) are hashed on the phone. Scene definition: The study of existing work related to the task of scene recognition, together with the analysis of the annotations provided by the volunteers during data collection, allowed me to propose a definition of a scene. It is defined as a generalisation of a situation, composed of a place and an action performed by one person (the smartphone owner). Examples of scenes include taking transportation, being involved in a work meeting, and walking in the street. This composition makes it possible to provide different kinds of information on the current scene. However, the definition is still too generic, and I think it might be completed with additional information, integrated as new elements of the composition. Algorithmics: I have performed experiments involving machine learning techniques, both supervised and unsupervised. The supervised one is about classification. The method is quite standard: find relevant descriptors of the data through an attribute selection method, then train and test several classifiers (in my case, J48 and Random Forest trees; GMMs; HMMs; and DNNs). I have also tried a 2-stage system composed of a first layer of classifiers trained to identify intermediate concepts, whose predictions are then merged in order to estimate the most likely scene. The unsupervised part of the work aimed at extracting structure from the data without labels. For this purpose, I applied bottom-up hierarchical clustering, based on the EM algorithm, to acceleration and audio data, taken separately and together. One of the results is the partition of acceleration data into groups based on the amount of agitation.
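The unsupervised part described above can be illustrated with a toy sketch. The following one-dimensional EM for a Gaussian mixture (NumPy only; an assumed simplification, not the thesis's actual hierarchical pipeline) separates synthetic acceleration magnitudes into two "agitation" groups:

```python
import numpy as np

def em_gmm_1d(x, k=2, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM (toy sketch)."""
    mu = np.quantile(x, np.linspace(0.0, 1.0, k))     # deterministic init
    var = np.full(k, x.var())
    pi = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample
        d = x[:, None] - mu[None, :]
        p = pi * np.exp(-0.5 * d**2 / var) / np.sqrt(2.0 * np.pi * var)
        r = p / p.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances
        nk = r.sum(axis=0)
        mu = (r * x[:, None]).sum(axis=0) / nk
        d = x[:, None] - mu[None, :]
        var = np.maximum((r * d**2).sum(axis=0) / nk, 1e-6)
        pi = nk / len(x)
    return mu, var, pi

# two synthetic "agitation levels": near-rest vs. strong motion
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0.1, 0.05, 200), rng.normal(1.5, 0.3, 200)])
mu, var, pi = em_gmm_1d(x, k=2)
```

Running EM on this well-separated toy data recovers two component means close to the generating ones, which is the kind of "agitation group" separation the abstract reports.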
Paumard, José. « Reconnaissance multiéchelle d'objets dans des scènes ». Cachan, Ecole normale supérieure, 1996. http://www.theses.fr/1996DENS0025.
Bremond, François. « Interprétation de Scènes : perception, fusion multi-capteurs, raisonnement spatio-temporel et reconnaissance d'activités ». Habilitation à diriger des recherches, Université de Nice Sophia-Antipolis, 2007. http://tel.archives-ouvertes.fr/tel-00275889.
Tupin, Florence. « Reconnaissance des formes et analyse de scènes en imagerie radar a ouverture synthetique ». Paris, ENST, 1997. http://www.theses.fr/1997ENST0016.
Grandjean, Pierrick. « Perception multisensorielle et interprétation de scènes ». Toulouse 3, 1991. http://www.theses.fr/1991TOU30232.
Fua, Pascal. « Une approche variationnelle pour la reconnaissance d'objets ». Paris 11, 1989. http://www.theses.fr/1989PA112357.
Tran, Thi Hà Châu. « La reconnaissance des objets et des scènes naturelles dans la dégénérescence maculaire liée à l'âge ». Phd thesis, Université du Droit et de la Santé - Lille II, 2011. http://tel.archives-ouvertes.fr/tel-00638964.
Peyrin, Carole. « Reconnaissance des scènes naturelles : approche neurocognitive de la spécialisation hémisphérique du traitement des fréquences spatiales ». Grenoble 2, 2003. http://www.theses.fr/2003GRE29020.
Tran, Thi Hà Châu. « La reconnaissance des objets et des scènes naturelles dans la dégénérescence maculaire liée à l’âge ». Thesis, Lille 2, 2011. http://www.theses.fr/2011LIL2S010/document.
AMD (Age-Related Macular Degeneration) is the leading cause of blindness in western countries. Quality-of-life questionnaires indicate that people with AMD exhibit difficulties in finding objects and in mobility. In the natural environment, objects seldom appear in isolation. They appear in their natural setting, in which they can be masked by other objects. The contrast of a scene may also change, as light varies as a function of the hour of the day and the light source. The objective of the study was to assess object and scene recognition impairments in people with AMD. We studied the perception of natural scenes, figure/ground discrimination, the effect of contrast on object recognition in achromatic scenes, and then navigation and spatial memory in a virtual environment. Performance was compared between people with AMD and age-matched normally sighted controls. The results show that scene gist recognition can be accomplished with high accuracy with the low spatial resolution of peripheral vision, which supports the “scene-centered approach” to scene recognition. Figure/ground discrimination is impaired in AMD. A white space surrounding the object is sufficient to improve its recognition and to facilitate figure/ground segregation. Performance is also improved when the object is displayed in its natural setting rather than on a non-structured, non-meaningful background. Sensitivity for the detection of a target object in achromatic scenes is impaired in AMD patients, who are more affected by contrast reductions than normally sighted people. A study on spatial navigation showed a compression of space representation: people with AMD underestimate virtual distances in a spatial navigation task.
The results of our studies have implications for rehabilitation, for improving texts and magazines intended for people with low vision, and for the improvement of the spatial environment of people suffering from AMD, in order to facilitate mobility and object search and to reduce the risk of falls.
Romdhane, Rim. « Reconnaissance d'activités et connaissances incertaines dans les scènes vidéos appliquées à la surveillance de personnes âgées ». Phd thesis, Université Nice Sophia Antipolis, 2013. http://tel.archives-ouvertes.fr/tel-00967943.
Trujillo, Morales Noël. « Stratégie de perception pour la compréhension de scènes par une approche focalisante, application à la reconnaissance d'objets ». Phd thesis, Université Blaise Pascal - Clermont-Ferrand II, 2007. http://tel.archives-ouvertes.fr/tel-00926395.
Pham, Trong-Ton. « Modélisation et recherche de graphes visuels : une approche par modèles de langue pour la reconnaissance de scènes ». Phd thesis, Université de Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00996067.
Lemaire, Jérôme. « Utilisation de descriptions de haut niveau et gestion de l'incertitude dans un système de reconnaissance de scènes ». Toulouse, ENSAE, 1996. http://www.theses.fr/1996ESAE0021.
Tan, Shengbiao. « Contribution à la reconnaissance automatique des images : application à l'analyse de scènes de vrac planaire en robotique ». Paris 11, 1987. http://www.theses.fr/1987PA112349.
A method for object modeling and automatic recognition of overlapped objects is presented. Our work is composed of three essential parts: image processing, object modeling, and evaluation and implementation of the stated concepts. In the first part, we present a method of edge encoding based on a re-sampling of the data encoded according to Freeman; this method generates an isotropic, homogeneous and very precise representation. The second part relates to object modeling. This important step makes the recognition work much easier. The proposed method characterizes a model with two groups of information: the description group, containing the primitives, and the discrimination group, containing data packs called "transition vectors". Based on this original organization of information, a "relative learning" scheme is able to select, ignore and update the information concerning the objects already learned, according to the new information to be included in the database. Recognition is a two-pass process: the first pass determines very efficiently hypotheses about the presence of objects by making use of each object's particularities, and each hypothesis is then either confirmed or rejected by a fine verification pass. The last part describes the experimental results in detail. We demonstrate the robustness of the algorithms on images with both poor lighting and overlapping objects. The system, named SOFIA, has been integrated into an industrial vision system series and works in real time.
Trujillo, Morales Noel. « Stratégie de perception pour la compréhension de scènes par une approche focalisante, application à la reconnaissance d'objets ». Clermont-Ferrand 2, 2007. http://www.theses.fr/2007CLF21803.
Oliva, Aude. « Perception de scènes : traitement fréquentiel du signal visuel : aspects psychophysiques et neurophysiologiques ». Grenoble INPG, 1995. http://www.theses.fr/1995INPG0060.
Dexter, Émilie. « Modélisation de l'auto-similarité dans les vidéos : applications à la synchronisation de scènes et à la reconnaissance d'actions ». Rennes 1, 2009. ftp://ftp.irisa.fr/techreports/theses/2009/dexter.pdf.
This PhD work deals with action recognition and image sequence synchronization. We propose to compute temporal similarities within image sequences to build self-similarity matrices. Although these matrices are not strictly view-invariant, they remain stable across views, providing temporal descriptors of image sequences that are useful for synchronization as well as discriminant for action recognition. Synchronization is achieved with a dynamic programming algorithm known as Dynamic Time Warping. We opt for “bag-of-features” methods for recognizing actions, in which actions are represented either as unordered sets of descriptors or as normalized histograms of quantized descriptor occurrences. Classification is performed with standard classifiers such as the nearest-neighbor classifier or support vector machines. The proposed methods are characterized by their simplicity and flexibility: they do not require point correspondences between views.
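The two ingredients named above, self-similarity matrices and Dynamic Time Warping, can be sketched in a few lines of NumPy (a minimal illustration on an assumed toy trajectory, not the thesis implementation):

```python
import numpy as np

def self_similarity(seq):
    """Self-similarity matrix: pairwise distances between frames of one sequence."""
    d = seq[:, None, :] - seq[None, :, :]
    return np.sqrt((d ** 2).sum(axis=-1))

def dtw(a, b):
    """Dynamic Time Warping cost between two descriptor sequences."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dij = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = dij + min(cost[i - 1, j],
                                   cost[i, j - 1],
                                   cost[i - 1, j - 1])
    return cost[n, m]

t = np.linspace(0.0, 2.0 * np.pi, 40)
seq = np.stack([np.cos(t), np.sin(t)], axis=1)   # toy 2-D "trajectory"
ssm = self_similarity(seq)
```

The self-similarity matrix is symmetric with a zero diagonal; DTW then aligns two such descriptor sequences without requiring any cross-view point correspondences, which is the property the abstract emphasizes.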
Besbes, Bassem. « Intégration de méthodes de représentation et de classification pour la détection et la reconnaissance d'obstacles dans des scènes routières ». Phd thesis, INSA de Rouen, 2011. http://tel.archives-ouvertes.fr/tel-00633109.
Texte intégralPotelle, Alexis. « Reconnaissance par propagation d'informations dans une structure hiérarchique de tâches organisée par apprentissage : application à l'interprétation de scènes routières ». Clermont-Ferrand 2, 1996. http://www.theses.fr/1996CLF21790.
Texte intégralIzquierdo, David. « Contribution au développement d'une architecture générique dédiée au suivi d'objets en télésurveillance : application au suivi de véhicules et de visages ». Bordeaux 1, 2004. http://www.theses.fr/2004BOR12889.
Texte intégralEl, Ez Eddine El Dandachy Nancy. « Techniques alternatives de visualisation pour la prise de connaissance de scènes tridimensionnelles ». Limoges, 2007. http://aurore.unilim.fr/theses/nxfile/default/b0a2c636-a13a-4923-97ea-cb655a15baeb/blobholder:0/2007LIMO4043.pdf.
The fast development of the image synthesis domain, its spread into many applications, and the growth of PCs in speed and memory capacity have made the problem of understanding a scene and extracting knowledge from it ever more relevant and complicated. Since the mid-seventies, practically no new basic visualization techniques have been invented. Researchers' efforts were focused on enhancing existing techniques, whether by reducing computation time or by inventing more sophisticated photometric models yielding better image quality. Other researchers turned their attention to methods that automatically compute a good viewpoint, or that produce an automatic animation around the scene along a path respecting heuristic rules, in order to avoid brusque changes that might disconcert the observer. However, these techniques are not sufficient to resolve the problem of visualizing every type of scene that today's computers can create. In this thesis we propose alternative techniques, based on combinations of existing visualization techniques, in order to enhance the understanding of complex scenes. We first study the case of three-dimensional complex scenes containing many lights, mirrors and transparent objects, which produce realistic effects that may create illusions due to the presence of shadows, reflections and refractions. These realistic effects may confuse the observer and prevent him from distinguishing between real objects of the scene and illusions.
To enhance the understanding of this type of scene, we propose a new method that combines the ray-tracing visualization technique with a selective refinement algorithm and a contour-following technique based on the code direction method, in order to underline the real objects of the scene by detecting their apparent contours, so that they can be distinguished from their reflections and refractions. Another type of scene is also introduced in this thesis: scenes containing objects that enclose other objects. Three new alternative techniques are described to enhance the visualization and understanding of this type of scene. The first visualizes the exterior object in wireframe mode while the interior one is visualized in fill mode; hidden-surface removal is handled by combining the z-buffer method with back-face culling. The second approach creates a hole in the surface of the exterior object in order to show the interior one, and two methods are proposed to achieve this. The first applies only to scenes whose exterior objects are modeled by a polygonal mesh, and eliminates the exterior faces that hide the interior object. The second method can be applied to any scene model: it first visualizes both the exterior and the interior objects, and then darkens the pixels orthogonal to the silhouette of the interior object, oriented towards the outside of the interior object.
Song, Jianming. « Contribution à l'étude de la reconnaissance des objets 2-D partiellement visibles ». Compiègne, 1988. http://www.theses.fr/1988COMPD127.
The problem of recognizing 2D objects from a partially occluded boundary image is considered. Two methods are proposed, one global-feature-based and one local-feature-based. Effort is made to develop an efficient method capable of recognizing a large number of different objects. The proposed methods are characterized by the use of a decision tree for object classification, overlapping contour detection, and the technique of local feature sequencing. Implementation problems such as image processing, object representation and model training are also addressed.
Pusiol, Guido. « Découverte des activités humaines dans des vidéos ». Nice, 2012. http://www.theses.fr/2012NICE4036.
The main objective of this thesis is to propose a complete framework for activity discovery, modelling and recognition using video information. The framework uses perceptual information (e.g. trajectories) as input and goes up to activities (semantics). The framework is divided into five main parts. First, we break the video into chunks to characterize activities. We propose different techniques to extract perceptual features from the chunks; this way, we build packages of perceptual features capable of describing the activity occurring in small periods of time. Second, we propose to learn the contextual information of the video. We build scene models by learning salient perceptual features; the model ends up containing interesting scene regions capable of describing basic semantics (i.e. regions where interactions occur). Third, we propose to reduce the gap between low-level vision information and semantic interpretation by building an intermediate layer composed of Primitive Events. The proposed representation for primitive events aims at describing the meaningful motions over the scene. This is achieved by abstracting perceptual features using contextual information in an unsupervised manner. Fourth, we propose a pattern-based method to discover activities at multiple resolutions (i.e. activities and sub-activities), as well as a generative method to model multi-resolution activities. The models are built as a flexible probabilistic framework that is easy to update. Finally, we propose an activity recognition method that finds, in a deterministic manner, the occurrences of modelled activities in unseen datasets. Semantics are provided by the method through interaction. All this research work has been evaluated using real datasets of people living in an apartment (home-care application) and elderly patients in a hospital.
Bąk, Slawomir. « Human re-identification through a video camera network ». Nice, 2012. http://www.theses.fr/2012NICE4040.
This thesis targets the appearance-based re-identification of humans in images and videos. Human re-identification is defined as the problem of determining whether a given individual has already appeared over a network of cameras. This problem is made particularly hard by significant appearance changes across different camera views, where variations in viewing angle, illumination and object pose make the task challenging. We focus on developing robust appearance models that are able to match human appearances registered in disjoint camera views. As the encoding of image regions is fundamental for appearance matching, we study different kinds of image descriptors. These descriptors imply different strategies for appearance matching, yielding different models of human appearance representation. By applying machine learning techniques, we generate descriptive and discriminative models which enhance the distinctive characteristics of extracted features, improving re-identification accuracy. This thesis makes the following contributions. We propose six techniques for human re-identification. The first two belong to single-shot approaches, in which a single image is sufficient to extract a robust signature. These approaches divide the human body into predefined body parts and then extract image features, which makes it possible to establish corresponding body parts when comparing signatures. The remaining four methods address the re-identification problem using signatures computed from multiple images (the multiple-shot case). We propose two techniques which learn the human appearance model online using a boosting scheme; the boosting approaches improve recognition accuracy at the expense of computation time. The last two approaches either assume a predefined model or learn a model offline, to meet time requirements. We find that the covariance feature is in general the best descriptor for matching appearances across disjoint camera views.
As the distance operator for this descriptor is computationally intensive, we also propose a new GPU-based implementation which significantly speeds up computations. Our experiments suggest that the mean Riemannian covariance computed from multiple images improves the state-of-the-art performance of human re-identification techniques. Finally, we provide two new image sets of individuals for evaluating the multiple-shot scenario.
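As an illustration of the kind of covariance feature discussed above, here is a sketch in the spirit of region covariance descriptors, with an assumed feature set (pixel coordinates, intensity, gradient magnitudes; the actual thesis features may differ), compared through generalized eigenvalues:

```python
import numpy as np
from scipy.linalg import eigh

def covariance_descriptor(patch):
    """Region covariance of per-pixel features [x, y, I, |Ix|, |Iy|]."""
    h, w = patch.shape
    y, x = np.mgrid[0:h, 0:w]
    iy, ix = np.gradient(patch.astype(float))
    feats = np.stack([x.ravel(), y.ravel(), patch.ravel(),
                      np.abs(ix).ravel(), np.abs(iy).ravel()])
    return np.cov(feats) + 1e-8 * np.eye(5)   # regularize to keep it SPD

def spd_distance(A, B):
    """Affine-invariant distance between covariance matrices,
    computed from the generalized eigenvalues of (A, B)."""
    w = eigh(A, B, eigvals_only=True)
    return np.sqrt((np.log(w) ** 2).sum())

rng = np.random.default_rng(0)
C1 = covariance_descriptor(rng.random((16, 16)))
C2 = covariance_descriptor(rng.random((16, 16)))
```

The generalized-eigenvalue distance is exactly the kind of operator the abstract calls computationally intensive, which motivates the GPU implementation it mentions.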
Yang, Di. « Apprendre des représentations vidéo efficaces pour la reconnaissance d'actions ». Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4000.
Human action recognition is an active research field with significant contributions to applications such as home-care monitoring, human-computer interaction, and game control. However, recognizing human activities in real-world videos remains challenging: it requires learning effective video representations with enough expressive power to capture human spatio-temporal motion, view-invariant actions, complex composable actions, etc. To address this challenge, this thesis makes three contributions towards learning such effective video representations, which can be applied and evaluated on real-world human action classification and segmentation tasks by transfer learning. The first contribution improves the generalizability of human skeleton motion representation models. We propose a unified framework for real-world skeleton-based human action recognition, including a novel skeleton model that not only effectively learns spatio-temporal features on human skeleton sequences but also generalizes across datasets. The second contribution extends the proposed framework by introducing two novel joint skeleton action generation and representation learning frameworks for different downstream tasks. The first is a self-supervised framework that learns from synthesized composable motions for skeleton-based action segmentation. The second is a view-invariant model for self-supervised skeleton action representation learning that can deal with large variations across subjects and camera viewpoints. The third contribution targets general RGB-based video action recognition. Specifically, a time-parameterized contrastive learning strategy is proposed: it captures time-aware motions to improve the performance of action classification in fine-grained and human-oriented tasks. Experimental results on benchmark datasets demonstrate that the proposed approaches achieve state-of-the-art performance in action classification and segmentation tasks.
The proposed frameworks improve the accuracy and interpretability of human activity recognition and provide insights into the underlying structure and dynamics of human actions in videos. Overall, this thesis contributes to the field of video understanding by proposing novel methods for skeleton-based action representation learning and general RGB video representation learning. Such representations benefit both action classification and segmentation tasks.
Mahiddine, Amine. « Recalage hétérogène pour la reconstruction 3D de scènes sous-marines ». Thesis, Aix-Marseille, 2015. http://www.theses.fr/2015AIXM4027/document.
The survey and 3D reconstruction of underwater sites have become indispensable given our growing interest in studying the seabed. Most of the existing works in this area are based on acoustic imaging sensors. The objective of this thesis is to develop techniques for the fusion of heterogeneous data from a photogrammetric system and an acoustic system. The presented work is organized in three parts. The first is devoted to processing 2D data to improve the colors of underwater images, in order to increase the repeatability of feature descriptors; we then propose a system for creating mosaics in order to visualize the scene. In the second part, a 3D reconstruction method from an unordered set of images is proposed. The computed 3D data are then merged with data from the acoustic system in order to reconstruct the underwater scene. In the last part of this thesis, we propose an original 3D registration method based on the nature of the descriptor extracted at each point. The descriptor we propose is invariant to isometric transformations (rotation, translation) and addresses the problem of multi-resolution. We validate our approach with a study on synthetic and real data, where we show the limits of existing registration methods in the literature. Finally, we propose an application of our method to the recognition of 3D objects.
De, Mezzo Benoît. « Reconnaissance d'objets par la génération d'hypothèses de modèles de forme appliquée à l'extraction des feuilles de plantes dans des scènes naturelles complexes ». Montpellier 2, 2004. http://www.theses.fr/2004MON20153.
Minetto, Rodrigo. « Reconnaissance de zones de texte et suivi d'objets dans les images et les vidéos ». Paris 6, 2012. http://www.theses.fr/2012PA066108.
In this thesis we address three computer vision problems: (1) the detection and recognition of flat text objects in images of real scenes; (2) the tracking of such text objects in a digital video; and (3) the tracking of an arbitrary three-dimensional rigid object with known markings in a digital video. For each problem we developed innovative algorithms which are at least as accurate and robust as other state-of-the-art algorithms. Specifically, for text recognition we developed (and extensively evaluated) a new HOG-based descriptor specialized for Roman script, which we call T-HOG, and showed its value as a post-filter for an existing text detector (SnooperText). We also improved the SnooperText algorithm by using a multi-scale technique to handle widely different letter sizes while limiting the sensitivity of the algorithm to various artifacts. For text tracking, we describe four basic ways of combining a text detector and a text tracker, and we developed a specific tracker based on a particle filter which exploits the T-HOG recognizer. For rigid object tracking we developed a new accurate and robust algorithm (AFFTrack) that combines the KLT feature tracker with an improved camera calibration procedure. We extensively tested our algorithms on several benchmarks well known in the literature. We also created publicly available benchmarks for the evaluation of text detection, text tracking and rigid object tracking algorithms.
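For readers unfamiliar with HOG, on which the T-HOG descriptor builds, here is a deliberately minimal sketch (the cell layout and bin count are assumptions for illustration; this is not the actual T-HOG):

```python
import numpy as np

def tiny_hog(patch, n_bins=9, cells=(2, 2)):
    """Minimal HOG: per-cell histograms of gradient orientation, concatenated."""
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)          # unsigned orientation
    h, w = patch.shape
    ch, cw = h // cells[0], w // cells[1]
    hist = []
    for i in range(cells[0]):
        for j in range(cells[1]):
            sl = (slice(i * ch, (i + 1) * ch), slice(j * cw, (j + 1) * cw))
            # magnitude-weighted orientation histogram of this cell
            b = np.minimum((ang[sl] / np.pi * n_bins).astype(int), n_bins - 1)
            hist.append(np.bincount(b.ravel(), mag[sl].ravel(),
                                    minlength=n_bins))
    v = np.concatenate(hist)
    return v / (np.linalg.norm(v) + 1e-9)            # L2 normalization

desc = tiny_hog(np.random.default_rng(0).random((16, 16)))
```

A real HOG adds interpolation between bins and overlapping block normalization; T-HOG further specializes the layout for horizontal text lines.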
Dahyot, Rozenn. « Analyse d'images séquentielles de scènes routières par modèle d'apparence pour la gestion du réseau routier ». Université Louis Pasteur (Strasbourg) (1971-2008), 2001. https://publication-theses.unistra.fr/public/theses_doctorat/2001/DAHYOT_Rozenn_2001.pdf.
Deléarde, Robin. « Configurations spatiales et segmentation pour la compréhension de scènes, application à la ré-identification ». Electronic Thesis or Diss., Université Paris Cité, 2022. http://www.theses.fr/2022UNIP7020.
Modeling the spatial configuration of objects in an image is a subject that is still little discussed to date, including in the most modern computer vision approaches such as convolutional neural networks (CNNs). However, it is an essential aspect of scene perception, and integrating it into models should benefit many tasks in the field by helping to bridge the “semantic gap” between the digital image and the interpretation of its content. Thus, this thesis aims to improve spatial configuration modeling techniques in order to exploit them in description and recognition systems. First, we looked at the case of the spatial configuration between two objects, proposing an improvement of an existing descriptor. This new descriptor, called the “force banner”, extends the histogram of the same name to a whole range of forces, which makes it possible to better describe complex configurations. We were able to show its interest in the description of scenes by learning to automatically classify natural-language relations from pairs of segmented objects. We then tackled the transition to scenes containing several objects and proposed a per-object approach, confronting each object with all the others rather than keeping one descriptor per pair. Second, the industrial context of this thesis led us to apply this work to the problem of re-identification of scenes or objects, a task which amounts to fine-grained recognition from few examples. To do so, we rely on a traditional approach: describing scene components with different descriptors dedicated to specific characteristics, such as color or shape, to which we add the spatial configuration. The comparison of two scenes is then achieved by matching their components according to these characteristics, using the Hungarian algorithm for instance.
Different combinations of characteristics can be considered for the matching and for the final score, depending on the invariances present and desired. For each of these two topics, we had to cope with problems of data and segmentation. We therefore generated and annotated a synthetic dataset, and exploited two existing datasets by segmenting them, in two different frameworks. The first approach concerns object/background segmentation, and more precisely the case where a detection is available to help the segmentation. It consists in using an existing global segmentation model and exploiting the detection to select the right segment, based on several geometric and semantic criteria. The second approach concerns the decomposition of a scene or an object into parts and addresses the unsupervised case. It is based on the color of the pixels, using a clustering method in an adapted color space, such as the HSV cone that we used. All these works have shown the possibility of using the spatial configuration for the description of real scenes containing several objects, as well as within a complex processing chain such as the one we used for re-identification. In particular, the force histogram could be used for this, which makes it possible to take advantage of its good performance, provided a segmentation method adapted to the use case is available when processing natural images.
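The component-matching step mentioned above can be sketched with SciPy's Hungarian solver (the toy descriptors below are invented for illustration; the thesis combines several real characteristics such as color, shape and spatial configuration):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# toy descriptors for the components of two scenes (one row per component)
scene_a = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
scene_b = np.array([[0.1, 0.9], [0.9, 0.1], [0.5, 0.6]])  # shuffled + noisy

# cost = pairwise Euclidean distance between component descriptors
cost = np.linalg.norm(scene_a[:, None, :] - scene_b[None, :, :], axis=-1)
rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
score = cost[rows, cols].sum()             # lower = more similar scenes
```

Here `cols` recovers the permutation between the two component sets, and the summed cost of the optimal assignment can serve directly as a re-identification score.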
Bey, Aurélien. « Reconstruction de modèles CAO de scènes complexes à partir de nuages de points basés sur l’utilisation de connaissances a priori ». Thesis, Lyon 1, 2012. http://www.theses.fr/2012LYO10103/document.
Texte intégral
3D models are often used in order to plan the maintenance of industrial environments. When it comes to the simulation of maintenance interventions, these 3D models have to describe accurately the actual state of the scenes they stand for. These representations are usually built from 3D point clouds, that is, huge sets of 3D measurements acquired in industrial sites, which guarantees the accuracy of the resulting 3D model. Although there exist many works addressing the reconstruction problem, there is no solution to our knowledge which can provide results that are reliable enough to be further used in industrial applications. Therefore this task is in fact handled by human experts nowadays. This thesis aims at providing a solution automating the reconstruction of industrial sites from 3D point clouds and providing highly reliable results. For that purpose, our approach relies on some available a priori knowledge and data about the scene to be processed. First, we consider that the 3D models of industrial sites are made of simple primitive shapes. Indeed, in the Computer Aided Design (CAD) field, such scenes are described as assemblies of shapes such as planes, spheres, cylinders, cones, tori, etc. Our own work focuses on planes, cylinders and tori, since these three kinds of shapes allow the description of most of the main components in industrial environments. Furthermore, we set some a priori rules about the way shapes should be assembled in a CAD model standing for an industrial facility, which are based on expert knowledge about these environments. Eventually, we suppose that a CAD model standing for a scene which is similar to the one to be processed is available. This a priori CAD model typically comes from the prior reconstruction of a scene which looks like the one we are interested in.
Although the two sites are theoretically similar, there may be significant differences between them, since each one has its own life cycle. Our work first states the reconstruction task as a Bayesian problem in which we have to find the most probable CAD model with respect to both the point cloud and the a priori expectations. In order to reach the CAD model maximizing the target probability, we propose an iterative approach which refines the solution under construction each time a new randomly generated shape is tentatively inserted into it. Thus, the CAD model is built step by step by adding and removing shapes, until the algorithm gets to a local maximum of the target probability.
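The iterative add/remove search described above can be illustrated by a generic stochastic hill-climbing sketch. The `propose_shape` and `score` callables are hypothetical placeholders for the thesis's shape generator and Bayesian posterior, not its actual implementation:

```python
import random

def reconstruct(point_cloud, propose_shape, score, n_iter=1000, seed=0):
    """Stochastic hill-climbing sketch: grow a CAD model shape by shape.

    propose_shape(rng) draws a random candidate primitive (plane/cylinder/torus);
    score(model, point_cloud) stands for the target probability combining data
    fit and a priori assembly rules. Both are placeholders for illustration.
    """
    rng = random.Random(seed)
    model, best = [], score([], point_cloud)
    for _ in range(n_iter):
        trial = model + [propose_shape(rng)]
        # Occasionally try removing a shape instead of adding one.
        if model and rng.random() < 0.3:
            trial = model[:]
            trial.pop(rng.randrange(len(trial)))
        s = score(trial, point_cloud)
        if s > best:  # keep only improving moves, up to a local maximum
            model, best = trial, s
    return model, best

# Toy illustration: "shapes" are integers, the "cloud" is a set of integers,
# and the score rewards covered points while penalizing model complexity.
cloud = {1, 3, 5, 7}
model, best = reconstruct(
    cloud,
    propose_shape=lambda rng: rng.randrange(10),
    score=lambda m, c: len(set(m) & c) - 0.1 * len(m),
)
```

In the toy run the search converges to a model covering exactly the cloud points, mirroring how improving insertions are kept and useless ones rejected.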
Kulikova, Maria. « Shape recognition for image scene analysis ». Nice, 2009. http://www.theses.fr/2009NICE4081.
Texte intégral
This thesis includes two main parts. In the first part we address the problem of tree crown classification into species using shape features, without, or in combination with, those of radiometry and texture, to demonstrate that shape information improves classification performance. For this purpose, we first study the shapes of tree crowns extracted from very high resolution aerial infra-red images. For our study, we choose a methodology based on the shape analysis of closed continuous curves on shape spaces using geodesic paths, under the bending metric with the angle-function curve representation, and under the elastic metric with the square-root q-function representation. A necessary preliminary step to classification is the extraction of the tree crowns. In the second part, we thus address the problem of extraction of multiple objects with complex, arbitrary shape from remote sensing images of very high resolution. We develop a model based on a marked point process. Its originality lies in its use of arbitrarily-shaped objects, as opposed to objects with parametric shapes, e.g. ellipses or rectangles. The shapes considered are obtained by local minimisation of an active-contour-type energy with weak and strong shape prior knowledge included. The objects in the final (optimal) configuration are then selected from amongst these candidates by a birth-and-death dynamics embedded in an annealing scheme. The approach is validated on very high resolution images of forests provided by the Swedish University of Agriculture.
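The square-root q-function representation mentioned above can be sketched as follows. This computes only an unaligned pre-shape distance between sampled curves; the thesis's geodesic computation additionally optimizes over rotations and reparametrizations, which is omitted here:

```python
import numpy as np

def srv(curve):
    """Square-root velocity representation q(t) = c'(t) / sqrt(|c'(t)|).

    curve: (n, 2) array of points sampled along a contour. Under this
    representation the elastic metric between curves reduces to the
    ordinary L2 metric between their q-functions.
    """
    v = np.gradient(curve, axis=0)              # discrete derivative c'(t)
    speed = np.linalg.norm(v, axis=1)
    return v / np.sqrt(np.maximum(speed, 1e-12))[:, None]

def preshape_distance(c1, c2):
    """L2 distance between q-functions (no rotation or reparametrization
    alignment, hence only a pre-shape distance sketch)."""
    q1, q2 = srv(c1), srv(c2)
    return np.sqrt(np.mean(np.sum((q1 - q2) ** 2, axis=1)))

# A circle is at distance zero from itself and nonzero from an ellipse.
t = np.linspace(0, 2 * np.pi, 100, endpoint=False)
circle = np.c_[np.cos(t), np.sin(t)]
ellipse = np.c_[2 * np.cos(t), np.sin(t)]
d_same = preshape_distance(circle, circle)
d_diff = preshape_distance(circle, ellipse)
```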
Vaquette, Geoffrey. « Reconnaissance robuste d'activités humaines par vision ». Thesis, Sorbonne université, 2018. http://www.theses.fr/2018SORUS090.
Texte intégral
This thesis focuses on supervised activity segmentation from video streams within the application context of smart homes. Three semantic levels are defined, namely gesture, action and activity; this thesis focuses mainly on the latter. Based on the Deeply Optimized Hough Transform paradigm, three fusion levels are introduced in order to benefit from various modalities. A review of existing action-based datasets is presented and the lack of activity-detection-oriented databases is noticed. Then, a new dataset is introduced. It is composed of unsegmented, long-time-range daily activities and has been recorded in a realistic environment. Finally, a hierarchical activity detection method is proposed, aiming to detect high-level activities from unsupervised action detection.
Lefèvre, Florent. « Contributions au montage automatique de scènes complexes multi-vues en interaction avec l'environnement ». Electronic Thesis or Diss., Université de Lorraine, 2019. http://www.theses.fr/2019LORR0239.
Texte intégral
This thesis, resulting from a collaboration between CRAN and CitizenCam, aims to capture and broadcast public events at a lower cost. Thus, the company wishes to offer an automatic editing system, adaptable to each application context and taking into account the spectators’ requirements. A bibliographical study on the automatic editing of video sequences is presented in the first chapter. This study shows that the existing methods are very specific to their application context and thus not very generalizable. The objective of the second chapter is therefore to propose a methodological approach to automatic editing, based on a generic framework adaptable according to the context, while taking into account user preferences. This approach, based on knowledge modelling of the application context using the NIAM-ORM method, allows us to identify people (POI) and actions (AOI) of interest. The modelled knowledge also facilitates the choice and configuration of algorithms for extracting the POI and AOI features required for editing. Chapter 3 focuses on the implementation of an automatic editing system for municipal councils, with the proposal of an original method for speaker detection and identification based on the VLC concept. The broadcasting of basketball matches is covered in Chapter 4, with the proposal of an automatic camera selection method for broadcasting the AOI "relevant game", with two customizations: free-throw detection and player tracking. Thus, the proposed methodology is validated by its application to these two types of events.
Zuniga, Marcos. « Incremental learning of events in video using reliable information ». Nice, 2008. http://www.theses.fr/2008NICE4098.
Texte intégral
The goal of this thesis is to propose a general video understanding framework for learning and recognition of events occurring in videos, for real world applications. This video understanding framework is composed of four tasks: first, at each video frame, a segmentation task detects the moving regions, represented by bounding boxes enclosing them. Second, a new 3D classifier associates to each moving region an object class label (e.g. person, vehicle) and a 3D parallelepiped described by its width, height, length, position and orientation, together with visual reliability measures of these attributes. Third, a new multi-object tracking algorithm uses these object descriptions to generate tracking hypotheses about the objects evolving in the scene. Finally, a new incremental event learning algorithm aggregates on-line the attributes and reliability information of the tracked objects to learn a hierarchy of concepts describing the events occurring in the scene. Reliability measures are used to focus the learning process on the most valuable information. Simultaneously, the event learning approach recognizes the events associated with the objects evolving in the scene. The tracking approach has been validated using publicly accessible video-surveillance benchmarks. The complete video understanding framework has been evaluated with videos from a real elderly care application. The framework has been able to successfully learn events related to trajectory (e.g. change in 3D position and velocity), posture (e.g. standing up, crouching), and object interaction (e.g. a person approaching a table), among other events, with a minimal configuration effort.
Crouzet, Sébastien. « Jeter un regard sur une phase précoce des traitements visuels ». Phd thesis, Université Paul Sabatier - Toulouse III, 2010. http://tel.archives-ouvertes.fr/tel-00505864.
Texte intégral
Strat, Sabin Tiberius. « Analyse et interprétation de scènes visuelles par approches collaboratives ». Phd thesis, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-00959081.
Texte intégralOesau, Sven. « Modélisation géométrique de scènes intérieures à partir de nuage de points ». Thesis, Nice, 2015. http://www.theses.fr/2015NICE4034/document.
Texte intégral
Geometric modeling and semantization of indoor scenes from sampled point data is an emerging research topic. Recent advances in acquisition technologies provide highly accurate laser scanners and low-cost handheld RGB-D cameras for real-time acquisition. However, the processing of large data sets is hampered by high amounts of clutter and various defects such as missing data, outliers and anisotropic sampling. This thesis investigates three novel methods for efficient geometric modeling and semantization from unstructured point data: shape detection, classification and geometric modeling. Chapter 2 introduces two methods for abstracting the input point data with primitive shapes. First, we propose a line extraction method to detect wall segments from a horizontal cross-section of the input point cloud. Second, we introduce a region growing method that progressively detects and reinforces regularities of planar shapes. This method utilizes regularities common to man-made architecture, i.e. coplanarity, parallelism and orthogonality, to reduce complexity and improve data fitting in defect-laden data. Chapter 3 introduces a method based on statistical analysis for separating clutter from structure. We also contribute a supervised machine learning method for object classification based on sets of planar shapes. Chapter 4 introduces a method for 3D geometric modeling of indoor scenes. We first partition the space using primitive shapes detected from permanent structures. An energy formulation is then used to solve an inside/outside labeling of a space partitioning, the latter providing robustness to missing data and outliers.
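The region-growing idea for planar shapes can be illustrated by a minimal sketch. The thresholds, the brute-force neighbor search, and the fixed region normal are illustrative simplifications, not the thesis's implementation (which also reinforces coplanarity, parallelism and orthogonality regularities):

```python
import numpy as np

def grow_planar_regions(points, normals, k=8, angle_thresh=0.95, dist_thresh=0.05):
    """Region-growing sketch: cluster points into planar patches.

    A seed point starts a region; neighbors are absorbed while their normal
    agrees with the seed normal (dot product above angle_thresh) and they
    lie close to the seed plane. Thresholds are illustrative choices.
    """
    n = len(points)
    unvisited = set(range(n))
    # Brute-force k-nearest neighbors (fine for a small sketch).
    d2 = np.sum((points[:, None] - points[None, :]) ** 2, axis=2)
    knn = np.argsort(d2, axis=1)[:, 1:k + 1]
    regions = []
    while unvisited:
        seed = unvisited.pop()
        region, queue = [seed], [seed]
        normal, anchor = normals[seed], points[seed]
        while queue:
            cur = queue.pop()
            for nb in knn[cur]:
                if (nb in unvisited
                        and abs(normals[nb] @ normal) > angle_thresh
                        and abs((points[nb] - anchor) @ normal) < dist_thresh):
                    unvisited.remove(nb)
                    region.append(nb)
                    queue.append(nb)
        regions.append(region)
    return regions

# Toy scene: two perpendicular planar patches with known normals.
gx, gy = np.meshgrid(np.linspace(0, 1, 5), np.linspace(0, 1, 5))
floor = np.c_[gx.ravel(), gy.ravel(), np.zeros(25)]     # z = 0 plane
wall = np.c_[np.full(25, 5.0), gx.ravel(), gy.ravel()]  # x = 5 plane
points = np.vstack([floor, wall])
normals = np.vstack([np.tile([0.0, 0.0, 1.0], (25, 1)),
                     np.tile([1.0, 0.0, 0.0], (25, 1))])
regions = grow_planar_regions(points, normals)
```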
Devanne, Maxime. « 3D human behavior understanding by shape analysis of human motion and pose ». Thesis, Lille 1, 2015. http://www.theses.fr/2015LIL10138/document.
Texte intégral
The emergence of RGB-D sensors providing the 3D structure of both the scene and the human body offers new opportunities for studying human motion and understanding human behaviors. However, the design and development of models for behavior recognition that are both accurate and efficient is a challenging task due to the variability of the human pose, the complexity of human motion and possible interactions with the environment. In this thesis, we first focus on the action recognition problem by representing human action as the trajectory of the 3D coordinates of human body joints over time, thus capturing simultaneously the body shape and the dynamics of the motion. The action recognition problem is then formulated as the problem of computing the similarity between shapes of trajectories in a Riemannian framework. Experiments carried out on four representative benchmarks demonstrate the potential of the proposed solution in terms of accuracy/latency for low-latency action recognition. Second, we extend the study to more complex behaviors by analyzing the evolution of the human pose shape to decompose the motion stream into short motion units. Each motion unit is then characterized by the motion trajectory and depth appearance around hand joints, so as to describe the human motion and interaction with objects. Finally, the sequence of temporal segments is modeled through a Dynamic Naive Bayesian Classifier. Experiments on four representative datasets evaluate the potential of the proposed approach in different contexts, including recognition and online detection of behaviors.
Zouba, Valentin Nadia. « Multisensor fusion for monitoring elderly activities at home ». Nice, 2010. http://www.theses.fr/2010NICE4001.
Texte intégral
In this thesis, an approach combining heterogeneous sensor data for recognizing elderly activities at home is proposed. This approach consists in combining data provided by video cameras with data provided by environmental sensors to monitor the interaction of people with the environment. The first contribution is a new sensor model able to give a coherent and efficient representation of the information provided by various types of physical sensors. This sensor model includes an uncertainty in sensor measurement. The second contribution is a multisensor-based activity recognition approach. This approach consists in detecting people, tracking people as they move, recognizing human postures and recognizing activities of interest based on multisensor analysis and human activity recognition. To address the problem of a heterogeneous sensor system, we choose to perform fusion at the high level (event level) by combining video events with environmental events. The third contribution is the extension of a description language which lets users (i.e. medical staff) describe the activities of interest as formal models. The results of this approach are shown for the recognition of ADLs of real elderly people evolving in an experimental apartment called Gerhome, equipped with video sensors and environmental sensors. The obtained results for the recognition of the different ADLs are encouraging.
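High-level (event-level) fusion of the kind described can be sketched as follows. The event names and the time window are hypothetical illustrations, not the thesis's actual event models:

```python
from dataclasses import dataclass

@dataclass
class Event:
    name: str
    t: float  # timestamp in seconds

def fuse_events(video_events, env_events, window=5.0):
    """Pair a video event with an environmental event when they fall within
    a short time window, yielding a composite multisensor event. Names and
    the window length are illustrative choices."""
    fused = []
    for v in video_events:
        for e in env_events:
            if abs(v.t - e.t) <= window:
                fused.append(Event(f"{v.name}+{e.name}", min(v.t, e.t)))
    return fused

# Toy example: a video event near a matching environmental event.
video = [Event("person_at_stove", 10.0)]
env = [Event("stove_on", 12.0), Event("fridge_open", 60.0)]
fused = fuse_events(video, env)
```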
Delorme, Arnaud. « Traitement visuel rapide de scènes naturelles chez le singe, l'homme et la machine : une vision qui va de l'avant ». Phd thesis, Université Paul Sabatier - Toulouse III, 2000. http://tel.archives-ouvertes.fr/tel-00078924.
Texte intégral
Ercolessi, Philippe. « Extraction multimodale de la structure narrative des épisodes de séries télévisées ». Toulouse 3, 2013. http://thesesups.ups-tlse.fr/2056/.
Texte intégral
Our contributions concern the extraction of the structure of TV series episodes at two hierarchical levels. The first level of structuring is to find the scene transitions based on the analysis of the color information and the speakers involved in the scenes. We show that the analysis of the speakers improves the result of a color-based segmentation into scenes. It is common to see several stories (or lines of action) told in parallel in a single TV series episode. Thus, the second level of structure is to cluster scenes into stories. We seek to deinterlace the stories in order to visualize the different lines of action independently. The main difficulty is to determine the most relevant descriptors for grouping scenes belonging to the same story. We explore the use of descriptors from the three different modalities described above. We also propose methods to combine these three modalities. To address the variability of the narrative structure of TV series episodes, we propose a method that adapts to each episode. It can automatically select the most relevant clustering method among the various methods we propose. Finally, we developed StoViz, a tool for visualizing the structure of a TV series episode (scenes and stories). It allows easy browsing of each episode, revealing the different stories told in parallel. It also allows playing back an episode story by story, and visualizing a summary of the episode through a short overview of each story.
Gidel, Samuel. « Méthodes de détection et de suivi multi-piétons multi-capteurs embarquées sur un véhicule routier : application à un environnement urbain ». Clermont-Ferrand 2, 2010. http://www.theses.fr/2010CLF22028.
Texte intégralPerotin, Lauréline. « Localisation et rehaussement de sources de parole au format Ambisonique : analyse de scènes sonores pour faciliter la commande vocale ». Electronic Thesis or Diss., Université de Lorraine, 2019. http://www.theses.fr/2019LORR0124.
Texte intégral
This work was conducted in the fast-growing context of hands-free voice command. In domestic environments, smart devices are usually laid in a fixed position, while the human speaker gives orders from anywhere, not necessarily next to the device, nor even facing it. This adds difficulties compared to the problem of near-field voice command (typically for mobile phones): strong reverberation, early reflections on furniture around the device, and surrounding noises can degrade the signal. Moreover, other speakers may interfere, which makes understanding the target speaker quite difficult. In order to facilitate speech recognition in such adverse conditions, several preprocessing methods are introduced here. We use a spatialized audio format suitable for audio scene analysis: the Ambisonic format. We first propose a sound source localization method that relies on a convolutional and recurrent neural network. We define an input feature vector inspired by the acoustic intensity vector which improves the localization performance, in particular in real conditions involving several speakers and a microphone array laid on a table. We exploit the visualization technique called layerwise relevance propagation (LRP) to highlight the time-frequency zones that correlate positively with the network output. This analysis is of paramount importance to establish the validity of a neural network. In addition, it shows that the neural network essentially relies on time-frequency zones where direct sound dominates reverberation and background noise. We then present a method to enhance the voice of the main speaker and ease its recognition. We adopt a mask-based beamforming framework based on a time-frequency mask estimated by a neural network. To deal with the situation of multiple speakers with similar loudness, we first use a wideband beamformer to enhance the target speaker thanks to the associated localization information.
We show that this additional information is not enough for the network when two speakers are close to each other. However, if we also give an enhanced version of the interfering speaker as input to the network, it returns much better masks. The filters generated from those masks greatly improve speech recognition performance. We evaluate this algorithm in various environments, including real ones, with a black-box automatic speech recognition system. Finally, we combine the proposed localization and enhancement systems and evaluate the robustness of the latter to localization errors in real environments.
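The acoustic-intensity feature underlying the localization input above can be sketched from first-order Ambisonic STFT channels. The channel convention and the broadband averaging below are simplifying assumptions; the thesis feeds per-bin intensity features to a neural network rather than a single broadband estimate:

```python
import numpy as np

def pseudo_intensity_doa(W, X, Y, Z):
    """Estimate a broadband direction of arrival from first-order Ambisonic
    STFT frames. W, X, Y, Z: complex STFT coefficients (freq x time) of the
    four FOA channels. The active intensity vector is Re{W* . [X, Y, Z]};
    summing it over time-frequency and converting to angles gives a rough
    broadband DOA (simplified sketch of the intensity-based feature)."""
    Ix = np.real(np.conj(W) * X).sum()
    Iy = np.real(np.conj(W) * Y).sum()
    Iz = np.real(np.conj(W) * Z).sum()
    azimuth = np.degrees(np.arctan2(Iy, Ix))
    elevation = np.degrees(np.arctan2(Iz, np.hypot(Ix, Iy)))
    return azimuth, elevation

# Synthetic plane wave encoded from azimuth 60 deg, elevation 0 deg.
rng = np.random.default_rng(0)
s = rng.standard_normal((64, 10)) + 1j * rng.standard_normal((64, 10))
az_true, el_true = np.radians(60.0), np.radians(0.0)
W = s
X = s * np.cos(az_true) * np.cos(el_true)
Y = s * np.sin(az_true) * np.cos(el_true)
Z = s * np.sin(el_true)
az, el = pseudo_intensity_doa(W, X, Y, Z)
```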
Ménier, Clément. « Système de vision temps-réel pour les intéractions ». Grenoble INPG, 2007. http://www.theses.fr/2007INPG0041.
Texte intégral
This thesis focuses on the real-time acquisition of 3D information on a scene from multiple cameras in the context of interactive applications. A complete vision system, from image acquisition to motion and shape modeling, is presented. The distribution of tasks on a PC cluster, and more precisely the parallelization of different shape modeling algorithms, enables real-time execution with a low latency. Several applications are developed and validate the practical implementation of this system. An original approach to motion modeling is also presented. It allows for limb tracking and identification while not requiring prior information on the shape of the user.
Tu, Xiao-Wei. « Détection et estimation des objets mobiles dans une séquence d'images ». Compiègne, 1987. http://www.theses.fr/1987COMPD063.
Texte intégralAlqasir, Hiba. « Apprentissage profond pour l'analyse de scènes de remontées mécaniques : amélioration de la généralisation dans un contexte multi-domaines ». Thesis, Lyon, 2020. http://www.theses.fr/2020LYSES045.
Texte intégral
This thesis presents our work on chairlift safety using deep learning techniques as part of the Mivao project, which aims to develop a computer vision system that acquires images of the chairlift boarding station, analyzes the crucial elements, and detects dangerous situations. In this scenario, we have different chairlifts spread over different ski resorts, with a high diversity of acquisition conditions and geometries; thus, each chairlift is considered a domain. When the system is installed for a new chairlift, the objective is to perform an accurate and reliable scene analysis, given the lack of labeled data on this new domain (chairlift). In this context, we mainly concentrate on the chairlift safety bar and propose to classify each image into two categories, depending on whether the safety bar is closed (safe) or open (unsafe). Thus, it is an image classification problem with three specific features: (i) the image category depends on a small detail (the safety bar) in a cluttered background, (ii) manual annotations are not easy to obtain, (iii) a classifier trained on some chairlifts should provide good results on a new one (generalization). To guide the classifier towards the important regions of the images, we have proposed two solutions: object detection and Siamese networks. Furthermore, we analyzed the generalization property of these two approaches. Our solutions are motivated by the need to minimize human annotation efforts while improving the accuracy of the chairlift safety problem. However, these contributions are not necessarily limited to this specific application context, and they may be applied to other problems in a multi-domain context.
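The Siamese-network option mentioned above is typically trained with a contrastive loss on image pairs. The sketch below shows the standard loss on precomputed embeddings; it is a generic illustration, not the thesis's exact training objective:

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_label, margin=1.0):
    """Contrastive loss used to train Siamese networks (generic sketch).

    emb_a, emb_b: embeddings of an image pair produced by the shared
    network; same_label: 1 if both images show the same safety-bar state
    (e.g. both 'closed'), else 0. Same-label pairs are pulled together,
    different pairs pushed at least `margin` apart."""
    d = np.linalg.norm(emb_a - emb_b)
    if same_label:
        return 0.5 * d ** 2
    return 0.5 * max(margin - d, 0.0) ** 2

# Identical same-label pair: zero loss; distant different-label pair: zero
# loss; close different-label pair: penalized.
a = np.array([0.2, 0.1])
loss_same = contrastive_loss(a, a, same_label=1)
loss_far = contrastive_loss(np.array([0.0, 0.0]), np.array([2.0, 0.0]), same_label=0)
loss_near = contrastive_loss(np.array([0.0, 0.0]), np.array([0.5, 0.0]), same_label=0)
```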
Bardet, François. « Suivi et catégorisation multi-objets par vision artificielle ». Phd thesis, Clermont-Ferrand 2, 2009. http://www.theses.fr/2009CLF21972.
Texte intégralHannecart, Claire. « Des musiciens sur les scènes locales en Nord de France : formes d'engagement et enjeux de pluriactivité des pratiques de création collective ». Thesis, Lille 1, 2014. http://www.theses.fr/2014LIL12031/document.
Texte intégral
The present research studies the social practices to be observed on local scenes, i.e. groups of various actors such as musicians, “support systems” and audiences. This thesis contributes to the understanding of the way creators are committed to practices driven by a desire to express their singularity. What is at stake here is to identify how these practices have been shaped by pluralist social representations in northern France. A dual theoretical framework combines both comprehensive and pragmatic sociologies. The period under study spans 5 years from 2009 to 2013 and the empirical study relies on two methodological approaches. On the one hand, a qualitative analysis based upon both semi-structured and unstructured interviews with 52 respondents that were involved one way or another on local scenes, be they artists or associated private or political intermediaries. On the other hand, a quantitative survey used to verify the empirical data relative to the practices and profiles of the musicians, the sample being made of musicians from the city of Lille. The results show the ambivalence in the representation of all the actors that contribute to the formation of local scenes. The cooperative dimension of such practices, together with the material conditions favored by the digital era, has been underlined. Finally, the artisan dimension of the projects represents one of the major stakes this research highlights.