Dissertations / Theses on the topic 'Analyse de scènes par vision'
Consult the top 50 dissertations / theses for your research on the topic 'Analyse de scènes par vision.'
Le Borgne, Hervé. "Analyse de scènes naturelles par Composantes Indépendantes." PhD thesis, Grenoble INPG, 2004. http://tel.archives-ouvertes.fr/tel-00005925.
Strat, Sabin Tiberius. "Analyse et interprétation de scènes visuelles par approches collaboratives." PhD thesis, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-00959081.
Carrasco, Miguel. "Non-calibrated multiple views : applications and methodologies." Paris 6, 2010. http://www.theses.fr/2010PA066015.
Frémont, Vincent. "Analyse de séquences d'images pour la reconstruction 3D euclidienne : cas des scènes complexes pour des mouvements de caméra contraints et non contraints." Nantes, 2003. http://www.theses.fr/2003NANT2102.
Servant, Fabien. "Localisation et cartographie simultanées en vision monoculaire et en temps réel basé sur les structures planes." Rennes 1, 2009. ftp://ftp.irisa.fr/techreports/theses/2009/servant.pdf.
Our work deals with computer vision. Augmented reality requires real-time estimation of the relative position between the camera and the scene. This thesis presents a complete pose-tracking method that works with planar structures, which are abundant in indoor and outdoor urban environments. Pose tracking is performed using a low-cost camera and an inertial sensor. Our approach uses the planes to make pose estimation easier. Homographies computed by an image-tracking algorithm presented in this document serve as measurements for our Simultaneous Localization And Mapping (SLAM) method. This SLAM method permits long-term, robust pose tracking by propagating the measurement uncertainties. Work on selecting the regions to track and initializing their corresponding plane parameters is also described in this thesis. Numerical and image-based experiments show the validity of our approach.
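The homographies used as SLAM measurements above can, in general, be estimated from point correspondences on a tracked plane. As a hedged illustration (a textbook direct linear transform, not the thesis's image tracker), a minimal sketch:

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: estimate H such that dst ~ H @ src
    (in homogeneous coordinates), from >= 4 point correspondences."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the right null vector of A (smallest singular value).
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

With four non-degenerate correspondences the SVD null space recovers the plane-induced homography exactly; with more, it gives the algebraic least-squares fit.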
Alquier, Laurent. "Analyse et représentation de scènes complexes par groupement perceptuel : Application à la perception de structures curvilignes." Montpellier 2, 1998. http://www.theses.fr/1998MON20137.
Boukarri, Bachir. "Reconstruction 3D récursive de scènes structurées au moyen d'une caméra mobile : application à la robotique." Paris 11, 1989. http://www.theses.fr/1989PA112290.
Dahyot, Rozenn. "Analyse d'images séquentielles de scènes routières par modèle d'apparence pour la gestion du réseau routier." Université Louis Pasteur (Strasbourg) (1971-2008), 2001. https://publication-theses.unistra.fr/public/theses_doctorat/2001/DAHYOT_Rozenn_2001.pdf.
Bugeau, Aurélie. "Détection et suivi d'objets en mouvement dans des scènes complexes : application à la surveillance des conducteurs." Rennes 1, 2007. ftp://ftp.irisa.fr/techreports/theses/2007/bugeau.pdf.
Detecting and tracking moving objects in dynamic scenes is a hard but essential task in many computer vision applications such as surveillance. This thesis aims at detecting, segmenting and tracking foreground moving objects in sequences (such as driver sequences) with highly dynamic backgrounds, illumination changes and low contrast, possibly shot by a moving camera. The thesis comprises two main steps. First, moving points, described by their motion and color, are selected within a sub-grid of image pixels. Clusters of points are then formed using a variable-bandwidth mean shift with automatic bandwidth selection. In the second part, a tracking method is proposed. It combines color and motion distributions, the prediction of the tracked object, and external observations (which can be the clusters from the detector) into an energy function minimized with graph cuts.
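A fixed-bandwidth Gaussian mean shift conveys the idea behind the clustering step (the thesis uses a variable bandwidth with automatic selection, which this generic sketch omits):

```python
import numpy as np

def mean_shift_modes(points, bandwidth=1.0, iters=50):
    """Shift every point toward the mode of the local density estimate,
    using a Gaussian kernel of fixed bandwidth. Points that converge to
    the same mode belong to the same cluster."""
    pts = np.asarray(points, float)
    shifted = pts.copy()
    for _ in range(iters):
        for i, p in enumerate(shifted):
            # Gaussian weights of all data points relative to the current position
            w = np.exp(-np.sum((pts - p) ** 2, axis=1) / (2 * bandwidth ** 2))
            shifted[i] = (w[:, None] * pts).sum(axis=0) / w.sum()
    return shifted
```

Grouping the converged positions (e.g., by distance thresholding) yields the clusters of moving points.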
Bardet, François. "Suivi et catégorisation multi-objets par vision artificielle." PhD thesis, Clermont-Ferrand 2, 2009. http://www.theses.fr/2009CLF21972.
Veit, Thomas. "Détection et analyse de mouvements dans des séquences d'images par une approche probabiliste a contrario." Rennes 1, 2005. http://www.theses.fr/2005REN1S096.
Pusiol, Guido. "Découverte des activités humaines dans des vidéos." Nice, 2012. http://www.theses.fr/2012NICE4036.
The main objective of this thesis is to propose a complete framework for activity discovery, modelling and recognition using video information. The framework takes perceptual information (e.g., trajectories) as input and goes up to activities (semantics). It is divided into five main parts. First, we break the video into chunks to characterize activities. We propose different techniques to extract perceptual features from the chunks. This way, we build packages of perceptual features capable of describing activity occurring over short periods of time. Second, we propose to learn the contextual information of the video. We build scene models by learning salient perceptual features. The resulting model contains interesting scene regions capable of describing basic semantics (i.e., regions where interactions occur). Third, we propose to reduce the gap between low-level vision information and semantic interpretation by building an intermediate layer composed of primitive events. The proposed representation for primitive events aims at describing the meaningful motions over the scene. This is achieved by abstracting perceptual features using contextual information in an unsupervised manner. Fourth, we propose a pattern-based method to discover activities at multiple resolutions (i.e., activities and sub-activities), together with a generative method to model multi-resolution activities. The models are built within a flexible probabilistic framework that is easy to update. Finally, we propose an activity recognition method that finds, in a deterministic manner, the occurrences of modelled activities in unseen datasets. Semantics are provided through user interaction. All this research work has been evaluated on real datasets of people living in an apartment (home-care application) and of elderly patients in a hospital.
Bąk, Slawomir. "Human re-identification through a video camera network." Nice, 2012. http://www.theses.fr/2012NICE4040.
This thesis targets the appearance-based re-identification of humans in images and videos. Human re-identification is defined as determining whether a given individual has already appeared over a network of cameras. The problem is made particularly hard by significant appearance changes across different camera views, where variations in viewing angle, illumination and object pose make matching challenging. We focus on developing robust appearance models able to match human appearances registered in disjoint camera views. As the encoding of image regions is fundamental for appearance matching, we study different kinds of image descriptors. These descriptors imply different strategies for appearance matching, yielding different models of the human appearance representation. By applying machine learning techniques, we generate descriptive and discriminative models which enhance the distinctive characteristics of the extracted features, improving re-identification accuracy. This thesis makes the following contributions. We propose six techniques for human re-identification. The first two are single-shot approaches, in which a single image is sufficient to extract a robust signature. These approaches divide the human body into predefined body parts and then extract image features, which makes it possible to establish corresponding body parts when comparing signatures. The remaining four methods address the re-identification problem using signatures computed from multiple images (the multiple-shot case). We propose two techniques which learn the human appearance model online using a boosting scheme; the boosting approaches improve recognition accuracy at the expense of computation time. The last two approaches either assume a predefined model or learn a model offline, to meet time requirements. We find that the covariance feature is in general the best descriptor for matching appearances across disjoint camera views.
As the distance operator of this descriptor is computationally intensive, we also propose a new GPU-based implementation which significantly speeds up computations. Our experiments suggest that the mean Riemannian covariance computed from multiple images improves on the state-of-the-art performance of human re-identification techniques. Finally, we extract two new image sets of individuals for evaluating the multiple-shot scenario.
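The region covariance descriptor and its generalized-eigenvalue distance (the operation the GPU implementation accelerates) can be sketched generically; the feature choice (x, y, I, |Ix|, |Iy|) below is a common assumption, not necessarily the thesis's exact feature set:

```python
import numpy as np

def region_covariance(patch):
    """Covariance descriptor of a grayscale patch over the per-pixel
    feature vector (x, y, intensity, |Ix|, |Iy|)."""
    patch = np.asarray(patch, float)
    gy, gx = np.gradient(patch)
    ys, xs = np.mgrid[0:patch.shape[0], 0:patch.shape[1]]
    F = np.stack([xs, ys, patch, np.abs(gx), np.abs(gy)], axis=-1).reshape(-1, 5)
    return np.cov(F, rowvar=False)

def cov_distance(C1, C2):
    """Distance between covariance matrices: sqrt of the sum of squared
    logs of the generalized eigenvalues of (C1, C2)."""
    lam = np.linalg.eigvals(np.linalg.solve(C1, C2)).real
    return np.sqrt(np.sum(np.log(np.clip(lam, 1e-12, None)) ** 2))
```

The eigenvalue decomposition per comparison is what makes large-scale matching expensive, motivating a GPU port.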
Atrevi, Dieudonne Fabrice. "Détection et analyse des évènements rares par vision, dans un contexte urbain ou péri-urbain." Thesis, Orléans, 2019. http://www.theses.fr/2019ORLE2008.
The main objective of this thesis is the development of complete methods for rare event detection. The work can be summarized in two parts. The first part is devoted to the study of state-of-the-art shape descriptors. On the one hand, the robustness of some descriptors to varying lighting conditions was studied. On the other hand, the ability of geometric moments to describe the human shape was studied through a 3D human pose estimation application based on 2D images. From this study, we have shown through a shape retrieval application that geometric moments can be used to estimate a human pose via an exhaustive search in a pose database. This kind of application can be used in a human action recognition system, which may be the final step of an event analysis system. In the second part of this report, three main contributions to rare event detection are presented. The first contribution is a global scene analysis method for crowd event detection. In this method, global scene modeling is based on spatiotemporal interest points filtered from the saliency map of the scene. The features used are the histogram of optical-flow orientations and a set of shape descriptors studied in the first part. The Latent Dirichlet Allocation algorithm is used to create event models from a visual-document representation of image sequences (video clips). The second contribution is a method for salient motion detection in video. This method is fully unsupervised and relies on the properties of the discrete cosine transform to explore the optical-flow information of the scene. Local modeling for event detection and localization is at the core of the last contribution of this thesis. The method is based on the saliency score of movements and a one-class SVM algorithm to create the event model. The methods have been tested on different public databases and the results obtained are promising.
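The histogram of optical-flow orientations used as a feature above can be sketched generically (bin count and magnitude weighting are assumptions for illustration, not values taken from the thesis):

```python
import numpy as np

def flow_orientation_histogram(flow, bins=8):
    """Histogram of optical-flow orientations over a region, with each
    vector's contribution weighted by its magnitude. flow: (H, W, 2)."""
    flow = np.asarray(flow, float)
    fx, fy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(fx, fy)
    ang = np.arctan2(fy, fx) % (2 * np.pi)   # orientation in [0, 2*pi)
    hist, _ = np.histogram(ang, bins=bins, range=(0, 2 * np.pi), weights=mag)
    s = hist.sum()
    return hist / s if s > 0 else hist
```

Such normalized histograms, one per interest-point neighbourhood, can then serve as "words" for a bag-of-words / LDA event model.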
Hervieu, Alexandre. "Analyse de trajectoires vidéos à l'aide de modélisations markoviennes pour l'interprétation de contenus." Rennes 1, 2009. ftp://ftp.irisa.fr/techreports/theses/2009/hervieu.pdf.
This thesis deals with the use of trajectories extracted from videos. The approach is invariant to translation, rotation and scaling, and takes into account both shape- and dynamics-related information on the trajectories. A hidden Markov model (HMM) is proposed to handle missing observations, and its parameters are properly estimated. A similarity measure between HMMs is used to tackle three dynamic video content understanding tasks: recognition, clustering and detection of unexpected events. Hierarchical semi-Markov chains are developed to process interacting trajectories. The interactions between trajectories are taken into account to recognize activity phases. Our method has been evaluated on sets of trajectories extracted from squash and handball videos. Applications of such interaction-based models have also been extended to 3D gesture and action recognition and clustering. The results show that taking interactions into account may be of great interest for such applications.
Dexter, Émilie. "Modélisation de l'auto-similarité dans les vidéos : applications à la synchronisation de scènes et à la reconnaissance d'actions." Rennes 1, 2009. ftp://ftp.irisa.fr/techreports/theses/2009/dexter.pdf.
This PhD work deals with action recognition and image sequence synchronization. We propose to compute temporal similarities of image sequences to build self-similarity matrices. Although these matrices are not strictly view-invariant, they remain stable across views, providing temporal descriptors of image sequences that are useful for synchronization as well as discriminative for action recognition. Synchronization is achieved with a dynamic programming algorithm known as Dynamic Time Warping. We opt for "Bag-of-Features" methods for recognizing actions, such that actions are represented either as unordered sets of descriptors or as normalized histograms of quantized descriptor occurrences. Classification is performed by well-known methods such as the nearest-neighbor classifier or the Support Vector Machine. The proposed methods are characterized by their simplicity and flexibility: they do not require point correspondences between views.
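Both ingredients, self-similarity matrices and Dynamic Time Warping, can be sketched in a few lines (generic textbook versions; the Euclidean frame distance and absolute local cost below are illustrative assumptions):

```python
import numpy as np

def self_similarity(frames):
    """Self-similarity matrix: pairwise Euclidean distances between the
    per-frame descriptors of one sequence. frames: (T, d)."""
    frames = np.asarray(frames, float)
    diff = frames[:, None, :] - frames[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences,
    via the standard cumulative-cost dynamic program."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    D = np.full((len(a) + 1, len(b) + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[-1, -1]
```

For synchronization, DTW is run on descriptors derived from the two sequences' self-similarity matrices, and the optimal warping path gives the temporal alignment.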
Trujillo Morales, Noel. "Stratégie de perception pour la compréhension de scènes par une approche focalisante, application à la reconnaissance d'objets." Clermont-Ferrand 2, 2007. http://www.theses.fr/2007CLF21803.
Ménier, Clément. "Système de vision temps-réel pour les intéractions." Grenoble INPG, 2007. http://www.theses.fr/2007INPG0041.
This thesis focuses on the real-time acquisition of 3D information on a scene from multiple cameras in the context of interactive applications. A complete vision system, from image acquisition to motion and shape modeling, is presented. The distribution of tasks over a PC cluster, and more precisely the parallelization of different shape modeling algorithms, enables real-time execution with low latency. Several applications are developed and validate the practical implementation of this system. An original approach to motion modeling is also presented. It allows for limb tracking and identification while not requiring prior information on the shape of the user.
Leignel, Christine. "Modèle 2D du corps pour l'analyse des gestes par l'image via une architecture de type tableau noir : application aux interfaces homme-machine évoluées." Rennes 1, 2006. http://www.theses.fr/2006REN1S095.
Ayral, Bruno. "Conception d'un système modulaire pour l'utilisation de connaissances hétérogènes en inspection visuelle de scènes : Application en vision tridimensionnelle ultra-sonore." Compiègne, 1990. http://www.theses.fr/1990COMPD246.
Oudjail, Veïs. "Réseaux de neurones impulsionnels appliqués à la vision par ordinateur." Electronic Thesis or Diss., Université de Lille (2022-....), 2022. http://www.theses.fr/2022ULILB048.
Artificial neural networks (ANN) have become a must-have technique in computer vision, a trend that started with the 2012 ImageNet challenge. However, this success comes with a non-negligible human cost for the manual data labeling that model learning depends on, and a high energy cost caused by the need for large computational resources. Spiking Neural Networks (SNN) provide solutions to these problems. They are a particular class of ANNs, close to the biological model, in which neurons communicate asynchronously by representing information through spikes. The learning of SNNs can rely on an unsupervised rule, STDP (spike-timing-dependent plasticity), which modulates the synaptic weights according to the local temporal correlations observed between incoming and outgoing spikes. Different hardware architectures have been designed to exploit the properties of SNNs (asynchrony, sparse and local operation, etc.) in order to design low-power solutions, some of them dividing the cost by several orders of magnitude. SNNs are gaining popularity and there is growing interest in applying them to vision. Recent work shows that SNNs are maturing, being competitive with the state of the art on "simple" image datasets such as MNIST (handwritten digits) but not on more complex datasets. However, SNNs can potentially stand out from ANNs in video processing. The first reason is that these models incorporate an additional temporal dimension. The second reason is that they lend themselves well to the use of event-driven cameras, bio-inspired sensors that perceive temporal contrasts in a scene; in other words, they are sensitive to motion. Each pixel can detect a light variation (positive or negative), which triggers an event. Coupling these cameras to neuromorphic chips allows the creation of totally asynchronous and massively parallelized vision systems. The objective of this thesis is to exploit the capabilities offered by SNNs in video processing.
To explore the potential offered by SNNs, we focus on motion analysis and more particularly on motion direction estimation. The goal is to develop a model capable of learning incrementally, without supervision and from few examples, to extract spatiotemporal features. We therefore performed several studies examining these points using synthetic event datasets. We show that tuning the SNN parameters is essential for the model to extract useful features. We also show that the model is able to learn incrementally when presented with new classes, without deteriorating performance on the classes already mastered. Finally, we discuss some limitations, especially regarding weight learning, and suggest the possibility of also learning delays, which are still not well exploited and could mark a break with ANNs.
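The pair-based STDP rule mentioned above is commonly written as an exponential window over the pre/post spike-time difference; a minimal sketch with assumed constants (not the thesis's parameter values, which the abstract does not give):

```python
import numpy as np

def stdp_update(w, dt, a_plus=0.05, a_minus=0.05, tau=20.0, w_min=0.0, w_max=1.0):
    """Pair-based STDP weight update. dt = t_post - t_pre (ms):
    pre-before-post (dt > 0) potentiates, post-before-pre depresses,
    both with an exponential time window of constant tau."""
    if dt > 0:
        w = w + a_plus * np.exp(-dt / tau)    # potentiation
    else:
        w = w - a_minus * np.exp(dt / tau)    # depression
    return float(np.clip(w, w_min, w_max))    # keep weight in bounds
```

Applied locally at each synapse, this rule strengthens connections whose input spikes reliably precede the neuron's output spikes, which is how correlated spatiotemporal features get extracted without supervision.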
Vignais, Nicolas. "Mise en oeuvre et évaluation d’une méthodologie fondée sur la réalité virtuelle pour l’analyse de la prise d’informations visuelles du gardien de but de handball." Rennes 2, 2009. http://tel.archives-ouvertes.fr/tel-00451040/fr/.
Visual perception is a basic element allowing us to interact with our environment. During sport activities, visual information uptake enables an athlete to extract and select the visual cues necessary to anticipate the opponent's action. In the field of sports, visual information uptake has been widely analyzed, but all the methodologies used involve functional and material limits. The purpose of this work is to introduce and evaluate an innovative methodology, based on virtual reality, for analyzing visual information uptake. This methodology is applied to the handball goalkeeper in a duel situation. Firstly, the results obtained with our methodology and with a video-based technique are compared in order to demonstrate the interest of virtual reality in the field of sport. Secondly, we focus on the setting-up of our methodology; specifically, we analyze the influence of the graphical level of detail of the throwing action on the goalkeeper's performance. Finally, our methodology is used to analyze the visual information uptake of the handball goalkeeper; more precisely, the relative importance of visual cues from the ball trajectory and from the throwing motion is estimated.
Ujjwal, Ujjwal. "Gestion du compromis vitesse-précision dans les systèmes de détection de piétons basés sur apprentissage profond." Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4087.
The main objective of this thesis is to improve the detection performance of deep-learning-based pedestrian detection systems without sacrificing detection speed. Detection speed and accuracy are traditionally known to trade off against one another; this thesis aims to handle the trade-off in a way that amounts to faster and better pedestrian detection. To achieve this, we first conduct a systematic quantitative analysis of various deep learning techniques with respect to pedestrian detection. This analysis allows us to identify the optimal configuration of the various deep learning components of a pedestrian detection pipeline. We then consider the important question of convolutional layer selection for pedestrian detection and propose a pedestrian detection system called Multiple-RPN, which utilizes multiple convolutional layers simultaneously. We propose Multiple-RPN in two configurations, early-fused and late-fused, and demonstrate that early fusion is a better approach than late fusion for detection across pedestrian scales and occlusion levels. This work furthermore provides a quantitative demonstration of the selectivity of various convolutional layers to pedestrian scale and occlusion level. We next integrate the early fusion approach with pseudo-semantic segmentation to reduce the number of processing operations. In this approach, pseudo-semantic segmentation is shown to reduce false positives and false negatives. Coupled with the reduced number of processing operations, this improves detection performance and speed (~20 fps) simultaneously, performing at state-of-the-art level on the Caltech-reasonable (3.79% miss rate) and CityPersons (7.19% miss rate) datasets. The final contribution of this thesis is an anchor classification layer, which further reduces the number of processing operations for detection.
The result is a doubling of detection speed (~40 fps) with a minimal loss in detection performance (3.99% and 8.12% miss rate on the Caltech-reasonable and CityPersons datasets respectively), which is still at the state-of-the-art standard.
Trujillo Morales, Noël. "Stratégie de perception pour la compréhension de scènes par une approche focalisante, application à la reconnaissance d'objets." PhD thesis, Université Blaise Pascal - Clermont-Ferrand II, 2007. http://tel.archives-ouvertes.fr/tel-00926395.
Far, Aïcha Beya. "Analyse multi-images : Application à l'extraction contrôlés d'indices images et à la détermination de descriptions scéniques." Université Louis Pasteur (Strasbourg) (1971-2008), 2005. https://publication-theses.unistra.fr/public/theses_doctorat/2005/FAR_Aicha_Beya_2005.pdf.
Computer-vision applications requiring a quantitative evaluation of machined parts need efficient tools for the computation of 3D descriptions. Stereovision is a powerful technique for building 3D information out of two images. Accordingly, we have contributed to the development of an automated stereoscopic approach for determining these descriptions, by studying more specifically the following points:
- A matching method adapted to the nature of the images to process. The approach suggested is tailored to matching contours and relies on the bidirectional estimation of the epipolar geometry in the stereoscopic image pair, as well as on the comparison of real and synthetic data in order to select the set of contours to be matched. This comparison exploits a priori knowledge (CAD models) as a constraint for further processing, in order to keep only the contours of the object seen in the two images.
- Illumination parameter control through adequate modeling of the illumination artefacts. This makes it possible to adjust image processing to the illumination conditions observed locally in the image and to anticipate the relevance of information contained in the real images.
- Dynamic planning using a control system relying on situation graph trees. The interest lies in the adaptation capacity of the system as a function of the results obtained while processing the data. Thus, decision rules allowing the control system to adapt the processing online have been devised. These rules rely on the adjustment of the scene illumination and, secondly, on the displacement of the stereoscopic sensor.
Integrated into a processing chain, the developed modules provide a partial reconstruction of the objects to evaluate. This reconstruction can then be compared to the corresponding CAD model in order to evaluate the object.
Gidel, Samuel. "Méthodes de détection et de suivi multi-piétons multi-capteurs embarquées sur un véhicule routier : application à un environnement urbain." Clermont-Ferrand 2, 2010. http://www.theses.fr/2010CLF22028.
Vaquette, Geoffrey. "Reconnaissance robuste d'activités humaines par vision." Thesis, Sorbonne université, 2018. http://www.theses.fr/2018SORUS090.
This thesis focuses on supervised activity segmentation from video streams within the application context of smart homes. Three semantic levels are defined, namely gesture, action and activity; this thesis focuses mainly on the latter. Based on the Deeply Optimized Hough Transform paradigm, three fusion levels are introduced in order to benefit from various modalities. A review of existing action-based datasets is presented, and the lack of a database oriented toward activity detection is noted. A new dataset is therefore introduced; it is composed of long, unsegmented daily activities and has been recorded in a realistic environment. Finally, a hierarchical activity detection method is proposed, aiming to detect high-level activities from unsupervised action detection.
Benamrane, Nacéra. "Contribution à la vision stéréoscopique par mise en correspondance de régions." Valenciennes, 1994. https://ged.uphf.fr/nuxeo/site/esupversions/f861a6a0-1e2f-489c-8859-05c0368d8969.
Kaiser, Adrien. "Analyse de scène temps réel pour l'interaction 3D." Electronic Thesis or Diss., Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT025.
This PhD thesis focuses on the analysis of visual scenes captured by commodity depth sensors, converting their data into a high-level understanding of the scene. It explores the use of 3D geometry analysis tools on visual depth data for enhancement, registration and consolidation. In particular, we aim to show how shape abstraction can generate lightweight representations of the data for fast analysis with low hardware requirements. This last property is important, as one of our goals is to design algorithms suitable for live embedded operation in, e.g., wearable devices, smartphones or mobile robots. The context of this thesis is the live operation of 3D interaction on a mobile device, which raises numerous issues, including placing 3D interaction zones in relation to real surrounding objects, tracking the interaction zones in space when the sensor moves, and providing a meaningful and understandable experience to non-expert users. Toward solving these problems, we make contributions where scene abstraction leads to fast and robust sensor localization as well as efficient frame data representation, enhancement and consolidation. While simple geometric surface shapes are not as faithful as heavy point sets or volumes for representing observed scenes, we show that they are an acceptable approximation, and their light weight strikes a good balance between accuracy and performance.
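Shape abstraction of depth data typically starts by fitting simple geometric proxies such as planes to point neighbourhoods; as a generic illustration of that idea (not the thesis's pipeline), a total-least-squares plane fit:

```python
import numpy as np

def fit_plane(points):
    """Total-least-squares plane fit to 3D points.
    Returns (unit normal, centroid); the plane passes through the centroid,
    and the normal is the singular vector of smallest singular value."""
    pts = np.asarray(points, float)
    centroid = pts.mean(axis=0)
    _, _, Vt = np.linalg.svd(pts - centroid)
    return Vt[-1], centroid
```

Replacing thousands of depth points by a few (normal, centroid, extent) tuples is what makes the abstracted representation light enough for embedded, low-latency use.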
Papadakis, Nicolas. "Assimilation de données d'images : application au suivi de courbes et de champs de vecteurs." Rennes 1, 2007. ftp://ftp.irisa.fr/techreports/theses/2007/papadakis.pdf.
This thesis presents the use of sequential and variational methods for tracking applications in image sequences. These techniques aim at estimating a system state from a dynamical model and a set of noisy and sparse observations. We first apply these methods to various tracking problems of computer vision (with an imperfect dynamical model): curve tracking, fluid motion estimation, and joint tracking of curve and motion. We thus show that data assimilation makes it possible to deal with complete data occlusions. Two particular applications where an accurate modeling of the dynamics can be considered are finally studied: atmospheric layer motion estimation from satellite imagery, and control of a low-order dynamical system from experimental visualisation.
Crivelli, Tomás. "Modèles de Markov à états mixtes pour l'analyse du mouvement dans des séquences d'images." Rennes 1, 2010. http://www.theses.fr/2010REN1S009.
This thesis deals with mixed-state random fields and their application to image motion analysis. The approach allows us to consider both discrete and continuous values within a single statistical model, exploiting the interaction between the two types of states. In this context, we identify two possible scenarios. First, we are concerned with the modeling of mixed-state observations. Typically, they are obtained from image motion measurements exhibiting a discrete value at zero (null motion) and continuous motion values elsewhere. Such motion maps, extracted from dynamic texture video sequences, are well suited to being modeled as mixed-state Markov fields. We thus design parametric models of motion textures based on the theory of mixed-state Markov random fields and mixed-state Markov chains, and apply them to motion texture characterization, recognition, segmentation and tracking. The second scenario involves inferring mixed-state random variables for simultaneous decision-estimation problems. In this case, the discrete state is a symbolic value indicating an abstract label. Such problems need to be solved jointly, and the mixed-state framework can be exploited to model the natural coupling that exists between them. In this context, we address the problem of motion detection (a decision problem) and background reconstruction (an estimation problem). An accurate estimation of the background is only possible if we locate the moving objects; meanwhile, a correct motion detection is achieved if we have a good background representation available. Solving motion detection and background reconstruction jointly reduces to obtaining a single optimal estimate of a mixed-state process.
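A mixed-state motion variable, as described above, combines a discrete probability mass at zero (null motion) with a continuous distribution elsewhere; a sampling sketch under an assumed Gaussian continuous part (the thesis's actual parametric families are not reproduced here):

```python
import numpy as np

def sample_mixed_state(n, rho=0.3, mu=0.5, sigma=0.2, rng=None):
    """Sample a mixed-state variable: with probability rho the discrete
    null-motion state (exactly 0.0), otherwise a continuous Gaussian value."""
    rng = np.random.default_rng(rng)
    discrete = rng.random(n) < rho         # which samples take the discrete state
    values = rng.normal(mu, sigma, size=n) # continuous component
    values[discrete] = 0.0
    return values
```

The point of the mixed-state formalism is that the exact mass at zero is modeled explicitly, rather than being smoothed into the continuous density.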
Louvat, Benoît. "Analyse de séquences d'images à cadence vidéo pour l'asservissement d'une caméra embarquée sur un drone." Grenoble INPG, 2008. https://tel.archives-ouvertes.fr/tel-00380091.
This thesis deals with visual servoing for a pan-and-tilt camera embedded in a drone. The aim is to control the camera to track any fixed object on the ground, without knowledge of its shape or texture, and to keep it centered in the image. In a first part, an algorithm that combines global and local motion estimation is proposed. In a second part, the control of the system is based on a double closed loop: the outer loop includes the video analysis while the inner loop controls the pan and tilt speeds. In order to improve the time response of the system, we propose a new upsampling scheme: controls are sent to the pan-and-tilt actuator during the convergence of the image analysis algorithm, and not only at the end as usual. We also propose an LQR controller to remove offsets and non-linearities. Simulations and experiments in real conditions show the effectiveness of the proposed scheme.
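An LQR gain of the kind mentioned above is classically obtained from the discrete-time Riccati recursion; a generic sketch (the system matrices and weights in the usage below are a toy double integrator, not the drone's pan-and-tilt model):

```python
import numpy as np

def dlqr(A, B, Q, R, iters=500):
    """Discrete-time LQR: iterate the backward Riccati recursion until the
    cost-to-go matrix P converges, and return the gain K for u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K
```

For a stabilizable pair (A, B) the closed-loop matrix A - B K has all eigenvalues strictly inside the unit circle, which is the stability property the controller relies on.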
Amer, Fawzy. "Les algorithmes d'extraction de contours ligne par ligne." Compiègne, 1986. http://www.theses.fr/1986COMPI235.
Minetto, Rodrigo. "Reconnaissance de zones de texte et suivi d'objets dans les images et les vidéos." Paris 6, 2012. http://www.theses.fr/2012PA066108.
In this thesis we address three computer vision problems: (1) the detection and recognition of flat text objects in images of real scenes; (2) the tracking of such text objects in a digital video; and (3) the tracking of an arbitrary three-dimensional rigid object with known markings in a digital video. For each problem we developed innovative algorithms, which are at least as accurate and robust as other state-of-the-art algorithms. Specifically, for text recognition we developed (and extensively evaluated) a new HOG-based descriptor specialized for Roman script, which we call T-HOG, and showed its value as a post-filter for an existing text detector (SnooperText). We also improved the SnooperText algorithm by using a multi-scale technique to handle widely different letter sizes while limiting the sensitivity of the algorithm to various artifacts. For text tracking, we describe four basic ways of combining a text detector and a text tracker, and we developed a specific tracker based on a particle filter which exploits the T-HOG recognizer. For rigid object tracking we developed a new accurate and robust algorithm (AFFTrack) that combines the KLT feature tracker with an improved camera calibration procedure. We extensively tested our algorithms on several benchmarks well known in the literature. We also created publicly available benchmarks for the evaluation of text detection and tracking and of rigid object tracking algorithms.
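T-HOG specializes the HOG family of descriptors for text; as a hedged point of reference only, a minimal generic HOG (cell size, bin count and the absence of block normalization are simplifying assumptions, not T-HOG's design):

```python
import numpy as np

def hog_descriptor(img, cell=4, bins=9):
    """Minimal HOG: per-cell histograms of unsigned gradient orientations,
    weighted by gradient magnitude, concatenated and L2-normalized."""
    img = np.asarray(img, float)
    gy, gx = np.gradient(img)
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx) % np.pi          # unsigned orientation in [0, pi)
    h, w = img.shape
    feats = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            a = ang[i:i + cell, j:j + cell].ravel()
            m = mag[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi), weights=m)
            feats.append(hist)
    v = np.concatenate(feats)
    n = np.linalg.norm(v)
    return v / n if n > 0 else v
```

A descriptor of this kind, computed on a candidate text region, can feed a classifier used as a post-filter behind a text detector.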
Pham, Haonhiên. "Contribution à la définition d'un système de vision bidimensionelle orienté objets : implantation des modules de base." Compiègne, 1986. http://www.theses.fr/1986COMPI247.
Boucher, Christophe. "Contribution à la fusion d'informations par filtrage non-linéaire : application à l'estimation de la structure et du mouvement 3D dans un contexte multi-capteurs." Littoral, 2000. http://www.theses.fr/2000DUNKA001.
This thesis deals with non-linear filtering for data fusion. The goal is to identify the motion and structure of 3D objects viewed by a multisensory system. The dynamics are described by an affine model whose parameters are unknown, and the feature used is the line segment. The characteristics of the structure and motion are first estimated from 2D projected data of the scene. The use of Plücker's representation allows the desired information to be recovered from monocular image sequences and from knowledge of the 3D object motion. The use of an active sensor increases the observability of the system. The joint estimation of the 3D structure and motion is performed by a single filter which fuses information from the sensors to track the 2D features in the image sequences and estimate the position and motion of the 3D object. The solution relies on a centralized Extended Kalman filter. This method was applied successfully to simulated and real data. Its interest lies especially in its independence from the kind of sensor and its capacity to manage a system composed of different sensors. Finally, to avoid the intrinsic drawbacks of Extended Kalman filtering, a first study is conducted on the contribution of particle filtering to this non-linear estimation problem
Mordan, Taylor. "Conception d'architectures profondes pour l'interprétation de données visuelles." Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS270.
Nowadays, images are ubiquitous through the use of smartphones and social media. It then becomes necessary to process them automatically, in order to analyze and interpret the large amount of available data. In this thesis, we are interested in object detection, i.e. the problem of identifying and localizing all objects present in an image. This can be seen as a first step toward a complete visual understanding of scenes. It is tackled with deep convolutional neural networks, under the Deep Learning paradigm. One drawback of this approach is the need for labeled data to learn from. Since precise annotations are time-consuming to produce, bigger datasets can be built with partial labels. We design global pooling functions to work with them and to recover latent information in two cases: learning spatially localized and part-based representations from image-level and object-level supervision, respectively. We address the issue of efficiency in end-to-end learning of these representations by leveraging fully convolutional networks. Besides, exploiting additional annotations on available images can be an alternative to having more images, especially in the data-deficient regime. We formalize this problem as a specific kind of multi-task learning with a primary objective to focus on, and design a way to effectively learn from this auxiliary supervision under this framework
Elloumi, Wael. "Contributions à la localisation de personnes par vision monoculaire embarquée." Phd thesis, Université d'Orléans, 2012. http://tel.archives-ouvertes.fr/tel-00843634.
Deléarde, Robin. "Configurations spatiales et segmentation pour la compréhension de scènes, application à la ré-identification." Electronic Thesis or Diss., Université Paris Cité, 2022. http://www.theses.fr/2022UNIP7020.
Modeling the spatial configuration of objects in an image is a subject that is still little studied to date, including in the most modern computer vision approaches such as convolutional neural networks (CNN). However, it is an essential aspect of scene perception, and integrating it into the models should benefit many tasks in the field, by helping to bridge the “semantic gap” between the digital image and the interpretation of its content. Thus, this thesis aims to improve spatial configuration modeling techniques, in order to exploit the spatial configuration in description and recognition systems. First, we looked at the case of the spatial configuration between two objects, by proposing an improvement of an existing descriptor. This new descriptor, called the “force banner”, is an extension of the histogram of the same name to a whole range of forces, which makes it possible to better describe complex configurations. We were able to show its interest in the description of scenes, by learning to automatically classify relations in natural language from pairs of segmented objects. We then tackled the problem of the transition to scenes containing several objects, and proposed a per-object approach that confronts each object with all the others, rather than having one descriptor per pair. Secondly, the industrial context of this thesis led us to deal with an application to the problem of re-identification of scenes or objects, a task which is similar to fine-grained recognition from few examples. To do so, we rely on a traditional approach by describing scene components with different descriptors dedicated to specific characteristics, such as color or shape, to which we add the spatial configuration. The comparison of two scenes is then achieved by matching their components thanks to these characteristics, using the Hungarian algorithm for instance.
Different combinations of characteristics can be considered for the matching and for the final score, depending on the present and desired invariances. For each of these two topics, we had to cope with the problems of data and segmentation. We therefore generated and annotated a synthetic dataset, and exploited two existing datasets by segmenting them, in two different frameworks. The first approach concerns object-background segmentation, and more precisely the case where a detection is available, which may help the segmentation. It consists in using an existing global segmentation model and exploiting the detection to select the right segment, using several geometric and semantic criteria. The second approach concerns the decomposition of a scene or an object into parts and addresses the unsupervised case. It is based on the color of the pixels, using a clustering method in an adapted color space, such as the HSV cone that we used. All this work has shown that the spatial configuration can be used for the description of real scenes containing several objects, as well as in a complex processing chain such as the one we used for re-identification. In particular, the force histogram can be used for this purpose, which makes it possible to take advantage of its good performance, by using a segmentation method adapted to the use case when processing natural images
Huguet, Frédéric. "Modélisation et calcul du flot de scène stéréoscopique par une méthode variationnelle." Phd thesis, Grenoble 1, 2009. http://www.theses.fr/2009GRE10053.
The scene flow is the displacement vector of any surface point estimated between two consecutive moments; mathematically, it is a three-dimensional vector field. It is useful whenever the temporal deformation of a surface has to be studied using two or more cameras. This thesis deals with scene flow computation and shows its use in a geophysical project. To this end, we worked with the geophysics laboratory Geosciences Azur, located in Sophia Antipolis (Alpes-Maritimes, UMR 6526 - CNRS - UNSA - UPMC - IRD). This thesis presents a method for scene flow estimation from a calibrated stereo image sequence. The scene flow contains the 3D displacement field of scene points, so that the 2D optical flow can be seen as a projection of the scene flow onto the images. We propose to recover the scene flow by coupling the optical flow estimation in both cameras with dense stereo matching between the images, thus reducing the number of unknowns per image point. Moreover, our approach handles occlusions both for the optical flow and for the stereo. We obtain a system of partial differential equations coupling the optical flow and the stereo, which is numerically solved using an original multi-resolution algorithm. Whereas previous variational methods estimated the 3D reconstruction at time t and the scene flow separately, our method jointly estimates both. We present numerical results on synthetic data with ground truth information, and we also compare the accuracy of the scene flow projected in one camera with a state-of-the-art single-camera optical flow computation method. Results are also presented on a real stereo sequence with large motion and stereo discontinuities. We finally present the original approach developed at Geosciences Azur to study gravitational mountain landslides: 3D physical modelling. We describe the experimental stereo device used to track the deformations of the reduced mountain model used by the geophysicists.
3D reconstruction and scene flow results are shown, as well as the tracking of the observed surface deformations, in the fourth chapter of the thesis
Huguet, Frédéric. "Modélisation et calcul du flot de scène stéréoscopique par une méthode variationnelle." Phd thesis, Université Joseph Fourier (Grenoble), 2009. http://tel.archives-ouvertes.fr/tel-00421958.
This thesis deals with scene flow estimation and with an application in the field of geophysics. It took place within the framework of the ACI GEOLSTEREO, in close collaboration with the Geosciences Azur laboratory, located in Sophia Antipolis (06, UMR 6526 - CNRS - UNSA - UPMC - IRD).
We propose to estimate the scene flow by coupling the evaluation of the optical flow in the image sequences associated with each camera with the estimation of the dense stereo correspondence between the images. Moreover, our approach evaluates, at the same time as the scene flow, the occlusions in both the optical flow and the stereo. We finally obtain a system of PDEs coupling the optical flow and the stereo, which we solve numerically with an original multi-resolution algorithm.
Whereas previous variational methods estimated the 3D reconstruction at time t and the scene flow separately, our method estimates both simultaneously. We present numerical results on synthetic sequences with their ground truth, and we also compare the accuracy of the scene flow projected in one camera with a recent, high-performance method for variational optical flow estimation. Results are presented on a real stereo sequence involving non-rigid motion and large discontinuities.
Finally, we present the original 3D physical modelling approach used at the Geosciences Azur laboratory. We describe the setup of the associated stereoscopic device, as well as the course of the experiment. 3D reconstruction, scene flow estimation and surface deformation tracking results are shown in chapter 4 of the thesis.
Benabbas, Yassine. "Analyse du comportement humain à partir de la vidéo en étudiant l'orientation du mouvement." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2012. http://tel.archives-ouvertes.fr/tel-00839699.
Mangin, Franck. "Amélioration de la détection de contours en imagerie artificielle par un modèle coopératif multi-résolution." Nice, 1994. http://www.theses.fr/1994NICE4715.
Zhang, Yiqun. "Contribution à l'étude de la vision dynamique : une approche basée sur la géométrie projective." Compiègne, 1993. http://www.theses.fr/1993COMPD650.
Kaiser, Adrien. "Analyse de scène temps réel pour l'interaction 3D." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLT025/document.
This PhD thesis focuses on the problem of analyzing visual scenes captured by commodity depth sensors in order to convert their data into a high-level understanding of the scene. It explores the use of 3D geometry analysis tools on visual depth data in terms of enhancement, registration and consolidation. In particular, we aim to show how shape abstraction can generate lightweight representations of the data for fast analysis with low hardware requirements. This last property is important, as one of our goals is to design algorithms suitable for live embedded operation in, e.g., wearable devices, smartphones or mobile robots. The context of this thesis is the live operation of 3D interaction on a mobile device, which raises numerous issues, including placing 3D interaction zones in relation to real surrounding objects, tracking the interaction zones in space when the sensor moves, and providing a meaningful and understandable experience to non-expert users. Towards solving these problems, we make contributions where scene abstraction leads to fast and robust sensor localization as well as efficient frame data representation, enhancement and consolidation. While simple geometric surface shapes are not as faithful as heavy point sets or volumes for representing observed scenes, we show that they are an acceptable approximation and that their light weight offers a good balance between accuracy and performance
Pérez, Patricio Madain. "Stéréovision dense par traitement adaptatif temps réel : algorithmes et implantation." Lille 1, 2005. https://ori-nuxeo.univ-lille1.fr/nuxeo/site/esupversions/0c4f5769-6f43-455c-849d-c34cc32f7181.
Joubert, Eric. "Reconstruction de surfaces en trois dimensions par analyse de la polarisation de la lumière réfléchie par les objets de la scène." Rouen, 1993. http://www.theses.fr/1993ROUES052.
Hamdoun, Omar. "Détection et ré-identification de piétons par points d'intérêt entre caméras disjointes." Phd thesis, École Nationale Supérieure des Mines de Paris, 2010. http://pastel.archives-ouvertes.fr/pastel-00566417.
Weinzaepfel, Philippe. "Le mouvement en action : estimation du flot optique et localisation d'actions dans les vidéos." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM013/document.
With the recent overwhelming growth of digital video content, automatic video understanding has become an increasingly important issue. This thesis introduces several contributions on two automatic video understanding tasks: optical flow estimation and human action localization. Optical flow estimation consists in computing the displacement of every pixel in a video and faces several challenges, including large non-rigid displacements, occlusions and motion boundaries. We first introduce an optical flow approach based on a variational model that incorporates a new matching method. The proposed matching algorithm is built upon a hierarchical multi-layer correlational architecture and effectively handles non-rigid deformations and repetitive textures. It improves the flow estimation in the presence of significant appearance changes and large displacements. We also introduce a novel scheme for estimating optical flow based on a sparse-to-dense interpolation of matches while respecting edges. This method leverages an edge-aware geodesic distance tailored to respect motion boundaries and to handle occlusions. Furthermore, we propose a learning-based approach for detecting motion boundaries. Motion boundary patterns are predicted at the patch level using structured random forests. We experimentally show that our approach outperforms the flow gradient baseline on both synthetic data and real-world videos, including an introduced dataset with consumer videos. Human action localization consists in recognizing the actions that occur in a video, such as 'drinking' or 'phoning', as well as their temporal and spatial extent. We first propose a novel approach based on deep convolutional neural networks. The method extracts class-specific tubes leveraging recent advances in detection and tracking. Tube description is enhanced by spatio-temporal local features. Temporal detection is performed using a sliding-window scheme inside each tube. Our approach outperforms the state of the art on challenging
action localization benchmarks. Second, we introduce a weakly-supervised action localization method, i.e., one that does not require bounding box annotation. Action proposals are computed by extracting tubes around the humans. This is performed using a human detector robust to unusual poses and occlusions, which is learned on a human pose benchmark. A high recall is reached with only a few human tubes, allowing Multiple Instance Learning to be applied effectively. Furthermore, we introduce a new dataset for human action localization. It overcomes the limitations of existing benchmarks, such as the diversity and the duration of the videos. Our weakly-supervised approach obtains results close to fully-supervised ones while significantly reducing the required amount of annotations
Duong, Nam duong. "Hybrid Machine Learning and Geometric Approaches for Single RGB Camera Relocalization." Thesis, CentraleSupélec, 2019. http://www.theses.fr/2019CSUP0008.
In the last few years, image-based camera relocalization has become an important issue in computer vision, applied to augmented reality, robotics and autonomous vehicles. Camera relocalization refers to the problem of estimating the camera pose, including both the 3D translation and the 3D rotation. In localization systems, a camera relocalization component is necessary to retrieve the camera pose after tracking is lost, rather than restarting the localization from scratch. This thesis aims at improving the performance of camera relocalization in terms of both runtime and accuracy, as well as handling the challenges of camera relocalization in dynamic environments. We present camera pose estimation based on combining multi-patch pose regression to overcome the uncertainty of end-to-end deep learning methods. To balance accuracy and computational time for camera relocalization from a single RGB image, we propose a sparse-feature hybrid method. A better prediction in the machine learning part of our method leads to a rapid inference of the camera pose in the geometric part. To tackle the challenge of dynamic environments, we propose an adaptive regression forest algorithm that adapts its predictive model in real time. It evolves part by part over time without requiring re-training of the whole model from scratch. When applying this algorithm to our real-time and accurate camera relocalization, we can cope with dynamic environments, especially moving objects. The experiments prove the efficiency of our proposed methods. Our method achieves results as accurate as the best state-of-the-art methods on the rigid scenes dataset. Moreover, we also obtain high accuracy even on the dynamic scenes dataset