Dissertations / Theses on the topic 'Apprentissage de représentations vidéos'
Francis, Danny. "Représentations sémantiques d'images et de vidéos." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS605.
Recent research in Deep Learning has sent the quality of results in multimedia tasks rocketing: thanks to new big datasets of annotated images and videos, Deep Neural Networks (DNN) have outperformed other models in most cases. In this thesis, we aim at developing DNN models for automatically deriving semantic representations of images and videos. In particular, we focus on two main tasks: vision-text matching and image/video automatic captioning. The matching task can be addressed by comparing visual objects and texts in a visual space, a textual space or a multimodal space. Based on recent works on capsule networks, we define two novel models to address the vision-text matching problem: Recurrent Capsule Networks and Gated Recurrent Capsules. In image and video captioning, we have to tackle a challenging task where a visual object has to be analyzed and translated into a textual description in natural language. For that purpose, we propose two novel curriculum learning methods. Moreover, regarding video captioning, analyzing videos requires not only parsing still images, but also drawing correspondences through time. We propose a novel Learned Spatio-Temporal Adaptive Pooling method for video captioning that combines spatial and temporal analysis. Extensive experiments on standard datasets assess the interest of our models and methods with respect to existing works.
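The vision-text matching task described above can be illustrated with a minimal sketch: project each modality into a shared space and rank pairs by cosine similarity. All dimensions and the random linear maps below are made-up stand-ins for illustration; the thesis itself uses capsule-based recurrent encoders, not linear projections.

    import numpy as np

    rng = np.random.default_rng(0)
    W_img = rng.normal(size=(128, 2048))   # hypothetical image encoder -> shared space
    W_txt = rng.normal(size=(128, 300))    # hypothetical text encoder  -> shared space

    def embed(W, x):
        z = W @ x
        return z / np.linalg.norm(z)       # unit norm, so the dot product is a cosine

    img, txt = rng.normal(size=2048), rng.normal(size=300)
    score = embed(W_img, img) @ embed(W_txt, txt)   # in [-1, 1]; rank candidates by it

Retrieval then amounts to scoring a query against all candidates of the other modality and sorting.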
Mazari, Ahmed. "Apprentissage profond pour la reconnaissance d’actions en vidéos." Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS171.
Nowadays, video contents are ubiquitous through the popular use of the internet and smartphones, as well as social media. Many daily-life applications such as video surveillance and video captioning, as well as scene understanding, require sophisticated technologies to process video data. It has become crucially important to develop automatic means to analyze and interpret the large amount of available video data. In this thesis, we are interested in video action recognition, i.e. the problem of assigning action categories to sequences of videos. This can be seen as a key ingredient to build the next generation of vision systems. It is tackled with AI frameworks, mainly with ML and Deep ConvNets. Current ConvNets are increasingly deep and data-hungry, which makes their success contingent on the abundance of labeled training data. ConvNets also rely on (max or average) pooling, which reduces the dimensionality of output layers (and hence attenuates their sensitivity to the availability of labeled data); however, this process may dilute the information of upstream convolutional layers and thereby affect the discrimination power of the trained video representations, especially when the learned action categories are fine-grained.
Franceschi, Jean-Yves. "Apprentissage de représentations et modèles génératifs profonds dans les systèmes dynamiques." Electronic Thesis or Diss., Sorbonne université, 2022. http://www.theses.fr/2022SORUS014.
The recent rise of deep learning has been motivated by numerous scientific breakthroughs, particularly regarding representation learning and generative modeling. However, most of these achievements have been obtained on image or text data, whose evolution through time remains challenging for existing methods. Given their importance for autonomous systems to adapt in a constantly evolving environment, these challenges have been actively investigated in a growing body of work. In this thesis, we follow this line of work and study several aspects of temporality and dynamical systems in deep unsupervised representation learning and generative modeling. Firstly, we present a general-purpose deep unsupervised representation learning method for time series tackling scalability and adaptivity issues arising in practical applications. We then further study in a second part representation learning for sequences by focusing on structured and stochastic spatiotemporal data: videos and physical phenomena. We show in this context that performant temporal generative prediction models help to uncover meaningful and disentangled representations, and conversely. We highlight to this end the crucial role of differential equations in the modeling and embedding of these natural sequences within sequential generative models. Finally, we more broadly analyze in a third part a popular class of generative models, generative adversarial networks, under the scope of dynamical systems. We study the evolution of the involved neural networks with respect to their training time by describing it with a differential equation, allowing us to gain a novel understanding of this generative model.
Saxena, Shreyas. "Apprentissage de représentations pour la reconnaissance visuelle." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM080/document.
In this dissertation, we propose methods and data-driven machine learning solutions which address and benefit from the recent overwhelming growth of digital media content. First, we consider the problem of improving the efficiency of image retrieval. We propose a coordinated local metric learning (CLML) approach which learns local Mahalanobis metrics, and integrates them in a global representation where the l2 distance can be used. This allows for data visualization in a single view, and the use of efficient l2-based retrieval methods. Our approach can be interpreted as learning a linear projection on top of an explicit high-dimensional embedding of a kernel. This interpretation allows for the use of existing frameworks for Mahalanobis metric learning for learning local metrics in a coordinated manner. Our experiments show that CLML improves over previous global and local metric learning approaches for the task of face retrieval. Second, we present an approach to leverage the success of CNN models for visible spectrum face recognition to improve heterogeneous face recognition, e.g., recognition of near-infrared images from visible spectrum training images. We explore different metric learning strategies over features from the intermediate layers of the networks, to reduce the discrepancies between the different modalities. In our experiments we found that the depth of the optimal features for a given modality is positively correlated with the domain shift between the source domain (CNN training data) and the target domain. Experimental results show that we can use CNNs trained on visible spectrum images to obtain results that improve over the state of the art for heterogeneous face recognition with near-infrared images and sketches. Third, we present convolutional neural fabrics for exploring the discrete and exponentially large CNN architecture space in an efficient and systematic manner. Instead of aiming to select a single optimal architecture, we propose a "fabric" that embeds an exponentially large number of architectures. The fabric consists of a 3D trellis that connects response maps at different layers, scales, and channels with a sparse homogeneous local connectivity pattern. The only hyperparameters of the fabric (the number of channels and layers) are not critical for performance. The acyclic nature of the fabric allows us to use backpropagation for learning. Learning can thus efficiently configure the fabric to implement each one of exponentially many architectures and, more generally, ensembles of all of them. While scaling linearly in terms of computation and memory requirements, the fabric leverages exponentially many chain-structured architectures in parallel by massively sharing weights between them. We present benchmark results competitive with the state of the art for image classification on MNIST and CIFAR10, and for semantic segmentation on the Part Labels dataset.
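The equivalence at the heart of CLML, between a learned Mahalanobis metric and a plain l2 distance after a linear projection, can be checked in a few lines. The projection L below is random purely for illustration; in CLML it is learned.

    import numpy as np

    # A Mahalanobis metric M = L^T L equals the squared l2 distance after
    # the linear projection x -> L x:
    #   d_M(x, y)^2 = (x - y)^T M (x - y) = ||L x - L y||^2
    rng = np.random.default_rng(0)
    d, k = 16, 8                          # input and projected dimensions
    L = rng.normal(size=(k, d))           # stand-in for the learned projection
    x, y = rng.normal(size=d), rng.normal(size=d)

    M = L.T @ L
    d_mahalanobis = (x - y) @ M @ (x - y)
    d_projected = np.sum((L @ x - L @ y) ** 2)
    assert np.allclose(d_mahalanobis, d_projected)

This identity is what allows the learned local metrics to be folded into one global representation searchable with standard l2-based methods.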
Chan Wai Tim, Stefen. "Apprentissage supervisé d’une représentation multi-couches à base de dictionnaires pour la classification d’images et de vidéos." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAT089/document.
In recent years, numerous works have been published on dictionary learning and sparse coding. They were initially used in image reconstruction and image restoration tasks. Recently, researchers became interested in the use of dictionaries for classification tasks because of their capability to represent underlying patterns in images. Good results have been obtained in specific conditions: centered objects of interest, homogeneous sizes and points of view. However, without these constraints, performance drops. In this thesis, we are interested in finding good dictionaries for classification. The learning methods classically used for dictionaries rely on unsupervised learning. Here, we study how to perform supervised dictionary learning. In order to push the performance further, we introduce a multilayer architecture for dictionaries. The proposed architecture is based on the local description of an input image and its transformation through a succession of encoding and processing steps. It outputs a vector of features effective for classification. The learning method we developed is based on the backpropagation algorithm, which allows a joint learning of the different dictionaries and an optimization solely with respect to the classification cost. The proposed architecture has been tested on the MNIST, CIFAR-10 and STL-10 datasets with good results compared to other dictionary-based methods. The proposed architecture can be extended to video analysis.
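A minimal sketch of one such encoding step, a dictionary projection followed by a sparsifying nonlinearity, stacked into two layers, is given below. The soft-threshold encoder and random dictionaries are illustrative stand-ins; in the thesis the dictionaries are trained jointly by backpropagating the classification cost.

    import numpy as np

    def dict_layer(x, D, theta=0.1):
        """One encoding step: project a local descriptor on a dictionary,
        then sparsify with a soft-threshold."""
        a = D @ x
        return np.sign(a) * np.maximum(np.abs(a) - theta, 0.0)

    rng = np.random.default_rng(0)
    D1 = rng.normal(size=(64, 32))                 # first-layer dictionary
    D2 = rng.normal(size=(128, 64))                # second-layer dictionary
    x = rng.normal(size=32)                        # local patch descriptor
    features = dict_layer(dict_layer(x, D1), D2)   # feature vector fed to a classifier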
Nguyen, Thanh Tuan. "Représentations efficaces des textures dynamiques." Electronic Thesis or Diss., Toulon, 2020. https://bu.univ-tln.fr/files/userfiles/file/intranet/travuniv/theses/sciences/2020/2020_Nguyen_ThanhTuan.pdf.
Representation of dynamic textures (DTs), well known as sequences of moving textures, is a challenge in video analysis for various computer vision applications. This is partly due to the disorientation of motions and to the negative impact of well-known issues on capturing turbulent features: noise, changes of environment, illumination, similarity transformations, etc. In this work, we introduce significant solutions to deal with the above problems. Accordingly, three strands are proposed for encoding DTs: i) based on dense trajectories extracted from a given video; ii) based on robust responses extracted by moment models; iii) based on filtered outcomes computed by variants of Gaussian filtering kernels. In parallel, we also propose several discriminative descriptors to capture spatio-temporal features for the above DT encodings. For DT representation based on dense trajectories, we first extract dense trajectories from a given video. Motion points along the paths of dense trajectories are then encoded by our xLVP operator, an important extension of Local Vector Patterns (LVP) in a completed encoding context, in order to capture directional dense-trajectory-based features for DT representation. For DT description based on moment models, motivated by the moment-image model, we propose a novel model of moment volumes based on statistical information of spherical supporting regions centered at a voxel. These two models are then applied to video analysis to produce moment-based images/volumes. In order to encode the moment-based images, we introduce the CLSP operator, a variant of completed local binary patterns (CLBP). Meanwhile, our xLDP, an important extension of Local Derivative Patterns (LDP) in a completed encoding context, is introduced to capture spatio-temporal features of the moment-volume-based outcomes. For DT representation based on Gaussian filterings, we investigate several kinds of filterings as pre-processing analysis of a video to produce its filtered outcomes. These outputs are then encoded by discriminative operators to structure the corresponding DT descriptors. More concretely, we exploit the Gaussian kernel and variants of high-order Gaussian gradients for the filtering analysis. In particular, we introduce a novel filtering kernel (DoDG) based on the difference of Gaussian gradients, which allows robust DoDG-filtered components to be extracted in order to construct prominent DoDG-based descriptors of small dimension. In parallel to the Gaussian filterings, some novel operators are introduced to meet different contexts of local DT encoding: CAIP, an adaptation of CLBP to fix the close-to-zero problem caused by separately bipolar features; LRP, based on the concept of a square cube of local neighbors sampled at a center voxel; CHILOP, a generalized formulation of CLBP to adequately investigate local relationships of hierarchical supporting regions. Experiments on DT recognition have validated that our proposals perform significantly well in comparison with the state of the art, some of them coming very close to deep-learning approaches. They are expected to be appreciated solutions for mobile applications due to their computational simplicity and the small number of bins of their DT descriptors.
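All the operators above (LVP, LDP, CLBP and the proposed variants) extend the basic local binary pattern idea: threshold a pixel's neighbours against its value to obtain a compact binary code. A minimal sketch of that building block, with a made-up 3x3 patch:

    import numpy as np

    def lbp_code(patch):
        """8-neighbour LBP code of a 3x3 patch; completed variants such as
        CLBP add sign/magnitude components on top of this."""
        center = patch[1, 1]
        # Neighbours in clockwise order, starting from the top-left corner.
        neighbours = patch[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
        bits = (neighbours >= center).astype(int)
        return int(np.sum(bits * (2 ** np.arange(8))))

    patch = np.array([[5, 9, 1],
                      [4, 6, 7],
                      [2, 8, 3]])
    print(lbp_code(patch))   # one code in [0, 255]; histograms of codes form the descriptor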
Ullah, Muhammad Muneeb. "Représentations statistiques supervisées pour la reconnaissance d'actions humaines dans les vidéos." Rennes 1, 2012. https://tel.archives-ouvertes.fr/tel-01063349.
In this thesis, we address the problem of human action recognition in realistic video data, such as movies and online videos. Automatic and accurate recognition of human actions in video is a fascinating capability. Potential applications range from surveillance and robotics to medical diagnosis, content-based image retrieval and intelligent human-computer interfaces. The task is highly challenging due to large variations in person appearance, dynamic backgrounds, changes in viewpoint, lighting conditions, action styles and other factors. Statistical video representations based on local spatio-temporal features have recently proven very effective for recognition in realistic scenarios. Their success can be attributed to mild assumptions about the data and to robustness against several types of variation in the video. Such representations, however, often encode videos as unordered sets of low-level primitives. This thesis extends current methods by developing more discriminative features and by integrating additional supervision into bag-of-features based video representations, aiming to improve action recognition in unconstrained and particularly challenging video data.
Roman, Mathilde. "Représentations et mises en scène de soi dans les vidéos d'artistes." Paris 1, 2005. http://www.theses.fr/2005PA010694.
Full textSafadi, Bahjat. "Indexation sémantique des images et des vidéos par apprentissage actif." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00766904.
Full textLuc, Pauline. "Apprentissage autosupervisé de modèles prédictifs de segmentation à partir de vidéos." Thesis, Université Grenoble Alpes (ComUE), 2019. http://www.theses.fr/2019GREAM024/document.
Predictive models of the environment hold promise for allowing the transfer of recent reinforcement learning successes to many real-world contexts, by decreasing the number of interactions needed with the real world. Video prediction has been studied in recent years as a particular case of such predictive models, with broad applications in robotics and navigation systems. While RGB frames are easy to acquire and hold a lot of information, they are extremely challenging to predict, and cannot be directly interpreted by downstream applications. Here we introduce the novel tasks of predicting semantic and instance segmentation of future frames. The abstract feature spaces we consider are better suited for recursive prediction and allow us to develop models which convincingly predict segmentations up to half a second into the future. Predictions are more easily interpretable by downstream algorithms and remain rich, spatially detailed and easy to obtain, relying on state-of-the-art segmentation methods. We first focus on the task of semantic segmentation, for which we propose a discriminative approach based on adversarial training. Then, we introduce the novel task of predicting future semantic segmentation, and develop an autoregressive convolutional neural network to address it. Finally, we extend our method to the more challenging problem of predicting future instance segmentation, which additionally segments out individual objects. To deal with a varying number of output labels per image, we develop a predictive model in the space of high-level convolutional image features of the Mask R-CNN instance segmentation model. We are able to produce visually pleasing segmentations at a high resolution for complex scenes involving a large number of instances, and with convincing accuracy up to half a second ahead.
Mensch, Arthur. "Apprentissage de représentations en imagerie fonctionnelle." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS300/document.
Thanks to the advent of functional brain-imaging technologies, cognitive neuroscience is accumulating maps of neural activity responses to specific tasks or stimuli, or of spontaneous activity. In this work, we consider data from functional Magnetic Resonance Imaging (fMRI), which we study in a machine learning setting: we learn a model of brain activity that should generalize to unseen data. After reviewing the standard fMRI data analysis techniques, we propose new methods and models to benefit from the recently released large fMRI data repositories. Our goal is to learn richer representations of brain activity. We first focus on unsupervised analysis of terabyte-scale fMRI data acquired on subjects at rest (resting-state fMRI). We perform this analysis using matrix factorization. We present new methods for running sparse matrix factorization/dictionary learning on hundreds of fMRI records in reasonable time. Our leading approach relies on introducing randomness in stochastic optimization loops and provides a speed-up of an order of magnitude on a variety of settings and datasets. We provide an extended empirical validation of our stochastic subsampling approach on datasets from fMRI, hyperspectral imaging and collaborative filtering. We derive convergence properties for our algorithm, in a theoretical analysis that reaches beyond the matrix factorization problem. We then turn to fMRI data acquired on subjects undergoing behavioral protocols (task fMRI). We investigate how to aggregate data from many source studies, acquired with many different protocols, in order to learn more accurate and interpretable decoding models that predict stimuli or tasks from brain maps. Our multi-study shared-layer model learns to reduce the dimensionality of input brain images while simultaneously learning to decode these images from their reduced representation. This fosters transfer learning between studies, as we learn the undocumented cognitive aspects that the many fMRI studies share. As a consequence, our multi-study model performs better than single-study decoding. Our approach identifies universally relevant representations of brain activity, supported by a few task-optimized networks learned during model fitting. Finally, on a related topic, we show how to use dynamic programming within end-to-end trained deep networks, with applications in natural language processing.
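The randomness-in-the-loop idea can be caricatured in a few lines: at each step of an online dictionary update, only a random subset of voxels is touched, so the per-iteration cost shrinks accordingly. This is a deliberately simplified sketch (ridge-regularised codes, plain gradient step, made-up sizes), not the actual algorithm of the thesis.

    import numpy as np

    rng = np.random.default_rng(0)
    n_samples, n_components, n_voxels = 500, 10, 1000
    X = rng.normal(size=(n_samples, n_voxels))      # stand-in for fMRI records
    D = rng.normal(size=(n_components, n_voxels))   # dictionary being learned
    lam, lr = 0.1, 0.01

    for x in X:                                     # one pass over the records
        mask = rng.random(n_voxels) < 0.1           # look at ~10% of voxels only
        Dm, xm = D[:, mask], x[mask]
        # Code estimated on the subsample (ridge regression)...
        a = np.linalg.solve(Dm @ Dm.T + lam * np.eye(n_components), Dm @ xm)
        # ...and a gradient step that also touches only the sampled columns.
        D[:, mask] += lr * np.outer(a, xm - a @ Dm)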
Risser-Maroix, Olivier. "Similarité visuelle et apprentissage de représentations." Electronic Thesis or Diss., Université Paris Cité, 2022. http://www.theses.fr/2022UNIP7327.
The objective of this CIFRE thesis is to develop an image search engine, based on computer vision, to assist customs officers. Indeed, we observe, paradoxically, an increase in security threats (terrorism, trafficking, etc.) coupled with a decrease in the number of customs officers. The images of cargoes acquired by X-ray scanners already allow the inspection of a load without requiring the opening and complete search of a controlled load. By automatically proposing similar images, such a search engine would help the customs officer in his decision making when faced with infrequent or suspicious visual signatures of products. Thanks to the development of modern artificial intelligence (AI) techniques, our era is undergoing great changes: AI is transforming all sectors of the economy. Some see this advent of "robotization" as the dehumanization of the workforce, or even its replacement. However, reducing the use of AI to the simple search for productivity gains would be reductive. In reality, AI could allow the work capacity of humans to be increased, rather than competing with them in order to replace them. It is in this context, the birth of Augmented Intelligence, that this thesis takes place. This manuscript, devoted to the question of visual similarity, is divided into two parts. Two practical cases where the collaboration between Man and AI is beneficial are proposed. In the first part, the problem of learning representations for the retrieval of similar images is investigated. After implementing a first system similar to those proposed by the state of the art, one of the main limitations is pointed out: the semantic bias. Indeed, the main contemporary methods use image datasets coupled with semantic labels only. The literature considers that two images are similar if they share the same label. This vision of the notion of similarity, however fundamental in AI, is reductive. It is therefore questioned in the light of work in cognitive psychology in order to propose an improvement: taking visual similarity into account. This new definition allows a better synergy between the customs officer and the machine. This work is the subject of scientific publications and a patent. In the second part, after having identified the key components that improve the performance of the previously proposed system, an approach mixing empirical and theoretical research is proposed. This second case, augmented intelligence, is inspired by recent developments in mathematics and physics. First applied to the understanding of an important hyperparameter (temperature), then to a larger task (classification), the proposed method provides an intuition on the importance and role of factors correlated to the studied variable (e.g. hyperparameter, score, etc.). The processing chain thus set up has demonstrated its efficiency by providing a highly explainable solution in line with decades of research in machine learning. These findings will allow the improvement of previously developed solutions.
Moradi, Fard Maziar. "Apprentissage de représentations de données dans un apprentissage non-supervisé." Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALM053.
Due to the great impact of deep learning on a variety of fields of machine learning, its ability to improve clustering approaches has recently been investigated. At first, deep learning approaches (mostly autoencoders) were used to reduce the dimensionality of the original space and to remove possible noise (and also to learn new data representations). Clustering approaches that utilize deep learning are called Deep Clustering. This thesis focuses on developing Deep Clustering models which can be used for different types of data (e.g., images, text). First we propose a Deep k-means (DKM) algorithm in which data representations (through a deep autoencoder) and cluster representatives (through k-means) are learned jointly. The results of our DKM approach indicate that this framework is able to outperform similar algorithms in Deep Clustering. Indeed, our proposed framework is able to truly and smoothly backpropagate the loss function error through all learnable variables. Moreover, we propose two frameworks named SD2C and PCD2C which are able to integrate, respectively, seed words and pairwise constraints into end-to-end Deep Clustering frameworks. By utilizing such frameworks, users can observe the reflection of their needs in the clustering. Finally, the results obtained from these frameworks indicate their ability to obtain more tailored results.
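The joint objective behind DKM can be sketched as a reconstruction term plus a differentiable soft k-means term on the embeddings. The snippet below only illustrates the loss with made-up shapes; the actual DKM anneals the softness parameter alpha and backpropagates through the encoder, decoder and centroids jointly.

    import numpy as np

    def dkm_loss(x, x_rec, z, centroids, lam=1.0, alpha=1.0):
        """Autoencoder reconstruction + soft k-means term on embeddings z."""
        rec = np.sum((x - x_rec) ** 2)
        # Squared distances from each embedding to each centroid: shape (n, K).
        d2 = np.sum((z[:, None, :] - centroids[None, :, :]) ** 2, axis=2)
        # Numerically stable softmax over centroids = differentiable assignments.
        logits = -alpha * (d2 - d2.min(axis=1, keepdims=True))
        soft = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        return rec + lam * np.sum(soft * d2)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(32, 784)); x_rec = x + 0.1 * rng.normal(size=(32, 784))
    z = rng.normal(size=(32, 10)); centroids = rng.normal(size=(4, 10))
    print(dkm_loss(x, x_rec, z, centroids))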
Phan, Thi Hai Hong. "Reconnaissance d'actions humaines dans des vidéos avec l'apprentissage automatique." Thesis, Cergy-Pontoise, 2019. http://www.theses.fr/2019CERG1038.
In recent years, human action recognition (HAR) has attracted research attention thanks to its various applications such as intelligent surveillance systems, video indexing, human activity analysis, human-computer interaction and so on. Typical issues that researchers face include the complexity of human motions, spatial and temporal variations, clutter, occlusion and changes in lighting conditions. This thesis focuses on automatically recognizing ongoing human actions in a given video. We address this research problem using both shallow learning and deep learning approaches. First, we began the research work with traditional shallow learning approaches based on hand-crafted features by introducing a novel descriptor named Motion of Oriented Magnitudes Patterns (MOMP). We then incorporated this discriminative descriptor into simple yet powerful representation techniques such as Bag of Visual Words, Vector of Locally Aggregated Descriptors (VLAD) and Fisher Vector to better represent actions. Also, PCA (Principal Component Analysis) and feature selection (statistical dependency, mutual information) are applied to find the best subset of features, in order to improve the performance and decrease the computational expense. The proposed method obtained state-of-the-art results on several common benchmarks. Recent deep learning approaches require intensive computation and large memory usage. They are therefore difficult to use and deploy on systems with limited resources. In the second part of this thesis, we present a novel efficient algorithm to compress Convolutional Neural Network models in order to decrease both the computational cost and the run-time memory footprint. We measure the redundancy of parameters based on their relationship using information-theoretic criteria, and we then prune the less important ones. The proposed method significantly reduces the model sizes of different networks such as AlexNet and ResNet by up to 70% without performance loss on the large-scale image classification task. The traditional approach with the proposed descriptor achieved great performance for human action recognition, but only on small datasets. In order to improve the performance on large-scale datasets, in the last part of this thesis we therefore exploit deep learning techniques to classify actions. We introduce the concept of the MOMP image as an input layer of CNNs and incorporate MOMP images into deep neural networks. We then apply our network compression algorithm to accelerate and improve the performance of the system. The proposed method reduces the model size, decreases over-fitting, and thus increases the overall performance of CNNs on large-scale action datasets. Throughout the thesis, we have shown that our algorithms obtain good performance in comparison to the state of the art on challenging action datasets (Weizmann, KTH, UCF Sports, UCF-101 and HMDB51) with low resource requirements.
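Filter pruning of the kind described above can be sketched as: score every convolutional filter, keep the top fraction, and rewire the next layer with the kept indices. The l1-norm score below is only a simple stand-in; the thesis ranks parameters with information-theoretic criteria (statistical dependency, mutual information).

    import numpy as np

    def prune_filters(W, keep_ratio=0.3):
        """Keep the highest-scoring conv filters of a layer.
        W: weights of shape (n_filters, in_channels, h, w)."""
        scores = np.abs(W).sum(axis=(1, 2, 3))        # stand-in importance score
        n_keep = max(1, int(keep_ratio * W.shape[0]))
        keep = np.sort(np.argsort(scores)[-n_keep:])
        return W[keep], keep    # pruned weights + indices to rewire the next layer

    rng = np.random.default_rng(0)
    W = rng.normal(size=(64, 3, 3, 3))
    W_pruned, kept = prune_filters(W)
    print(W_pruned.shape)       # (19, 3, 3, 3)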
Bouindour, Samir. "Apprentissage profond appliqué à la détection d'événements anormaux dans les flux vidéos." Electronic Thesis or Diss., Troyes, 2019. http://www.theses.fr/2019TROY0036.
The use of surveillance cameras has increased considerably in recent years. This proliferation poses a major societal problem, which is the exploitation of the generated video streams. Currently, most of these data are analyzed by human operators. However, several studies question the relevance of this approach: it is time-consuming and laborious for an operator to monitor surveillance videos for long periods of time. Given recent advances in computer vision, particularly through deep learning, one solution to this problem consists in the development of intelligent systems that can support the human operator in the exploitation of this data. These intelligent systems aim to model the normal behaviours of a monitored scene and detect any deviant event that could lead to a security breach. Within the context of this thesis, entitled "Deep learning applied to the detection of abnormal events in video streams", we propose to develop algorithms based on deep learning for the detection and localization of abnormal video events that may reflect dangerous situations. The purpose is to extract robust spatial and temporal descriptors and to define classification algorithms adapted to detecting suspicious behaviour with the minimum possible number of false alarms, while ensuring a high detection rate.
Calandre, Jordan. "Analyse non intrusive du geste sportif dans des vidéos par apprentissage automatique." Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS040.
In this thesis, we are interested in the characterization and fine-grained analysis of sports gestures in videos, and more particularly in non-intrusive 3D analysis using a single camera. Our case study is table tennis. We propose a method for reconstructing 3D ball positions using a high-speed calibrated camera (240 fps). For this, we propose and train a convolutional network that extracts the apparent diameter of the ball from the images. Knowing the real diameter of the ball allows us to compute the distance between the camera and the ball, and then to position the latter in a 3D coordinate system linked to the table. Then, we use a physical model, taking into account the Magnus effect, to estimate the kinematic parameters of the ball from its successive 3D positions. The proposed method segments the trajectories from the impacts of the ball on the table or the racket. This allows, using a physical model of rebound, the estimates of the kinematic parameters of the ball to be refined. It is then possible to compute the racket's speed and orientation after the stroke and to deduce relevant performance indicators. Two databases have been built: the first is made of real game sequence acquisitions. The second is a synthetic dataset that reproduces the acquisition conditions of the first one. This allows us to validate our methods, as the physical parameters used to generate it are known. Finally, we present our participation in the Sport&Vision task of the MediaEval challenge on the classification of human actions, using approaches based on the analysis and representation of movement.
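The depth-recovery step rests on the standard pinhole-camera relation: an object of real diameter D that appears with apparent diameter d pixels under a focal length of f pixels lies at depth Z = f * D / d. A worked example (the focal length and predicted apparent diameter are made-up values; only the 40 mm ball diameter is standard):

    # Pinhole-camera relation used to place the ball in depth.
    f_px = 1400.0        # assumed focal length, in pixels
    D_ball = 0.040       # table-tennis ball diameter: 40 mm
    d_apparent = 11.2    # apparent diameter predicted by the CNN, in pixels

    Z = f_px * D_ball / d_apparent
    print(f"camera-to-ball distance: {Z:.2f} m")   # -> 5.00 m

Combined with the ball's pixel coordinates and the camera calibration, this depth yields the full 3D position in the table's coordinate system.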
Tchobanov, Atanas. "Représentations et apprentissage des primitives phonologiques : approche neuromimétique." Paris 10, 2002. http://www.theses.fr/2002PA100018.
We develop the idea that the basic phonological objects (features, phonemes and syllables) are represented at the level of cortical activity by the reverberations of coherent neuron assemblies. These assemblies, of Hebbian type, are located in cortical areas specializing in phonological planning-production (Broca) and perception-comprehension (Wernicke). Neurobiological data and connectionist simulations support the view that synchronous activity of neurons from distant areas can be obtained rapidly if the model respects certain neurobiological properties. We claim that phonology should be neurologically plausible. Using a well-studied coding scheme such as the temporal synchrony of neuron activity gives the representations a cognitive realism. The resulting patterns are generic, not specifically phonological, and might be reused in modeling other linguistic and cognitive phenomena.
Melouki, Brahim. "Apprentissage du français en Palestine : motivations et représentations." Rouen, 2011. http://www.theses.fr/2011ROUEL013.
Full textTonnelier, Emeric. "Apprentissage de représentations pour les traces de mobilité." Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS389.
Urban transport is a crucial issue for the management of territories. In large cities, many inhabitants rely on urban public transport to move around, go to work or visit friends. Historically, urban transportation analysis has been based on surveys: questions are asked of a panel of users, which introduces various biases and provides no dynamic information. Since the late 1990s, we have seen the emergence of new types of data (GPS, smart card logs, etc.) that describe the mobility of individuals in the city. Available in large quantities, precisely sampled, but containing little semantics and a lot of noise, they allow the mobility of individuals to be monitored over the medium term. In this thesis, we propose to work on the modeling of users and the network on the one hand, and the detection of anomalies on the other. We do so using data collected automatically in the context of urban transport networks and using machine learning methods. Moreover, we focus on the design of methods suited to the particularities of mobility data. We show that user-oriented modeling of a transport network yields fine and robust profiles that can be aggregated efficiently in order to obtain a more precise and more descriptive view of the network than network-oriented modeling. Then, we show that the use of these profiles makes it possible to handle complex tasks such as anomaly detection or the partitioning of network stations. Finally, we show that contextualizing the models (spatial and temporal context, shared behaviors) improves quantitative and qualitative performance.
Bisot, Victor. "Apprentissage de représentations pour l'analyse de scènes sonores." Electronic Thesis or Diss., Paris, ENST, 2018. http://www.theses.fr/2018ENST0016.
This thesis work focuses on the computational analysis of environmental sound scenes and events. The objective of such tasks is to automatically extract information about the context in which a sound has been recorded. Interest in this area of research has been rapidly increasing in the last few years, leading to a constant growth in the number of works and proposed approaches. We explore and contribute to the main families of approaches to sound scene and event analysis, going from feature engineering to deep learning. Our work is centered on representation learning techniques based on nonnegative matrix factorization (NMF), which are particularly suited to analysing multi-source environments such as acoustic scenes. As a first approach, we propose a combination of image processing features with the goal of confirming that spectrograms contain enough information to discriminate sound scenes and events. From there, we leave the world of feature engineering to move towards automatically learning the features. The first step we take in that direction is to study the usefulness of matrix factorization for unsupervised feature learning, especially by relying on variants of NMF. Several of the compared approaches indeed allow us to outperform feature-engineering approaches on such tasks. Next, we propose to improve the learned representations by introducing the TNMF model, a supervised variant of NMF. The proposed TNMF models and algorithms are based on jointly learning nonnegative dictionaries and classifiers by minimising a target classification cost. The last part of our work highlights the links and the compatibility between NMF and certain deep neural network systems, by proposing and adapting neural network architectures to the use of NMF as an input representation. The proposed models allow us to reach state-of-the-art performance on scene classification and overlapping event detection tasks. Finally, we explore the possibility of jointly learning NMF and neural network parameters, grouping the different stages of our systems into one optimisation problem.
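As a hedged sketch of the unsupervised variant: factorize a magnitude spectrogram V ≈ WH with NMF and pool the activations H into a fixed-size feature vector for a classifier. The random matrix stands in for a real STFT spectrogram, and scikit-learn's NMF is used for brevity.

    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    V = np.abs(rng.normal(size=(257, 400)))   # stand-in spectrogram (freq x time)

    model = NMF(n_components=16, init='nndsvd', max_iter=300)
    W = model.fit_transform(V)        # spectral dictionary, shape (257, 16)
    H = model.components_             # activations over time, shape (16, 400)
    features = H.mean(axis=1)         # one fixed-size vector per recording

In the supervised TNMF variant described above, the dictionary is instead optimised jointly with the classifier.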
Tamaazousti, Youssef. "Vers l’universalité des représentations visuelle et multimodales." Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLC038/document.
Because of its key societal, economic and cultural stakes, Artificial Intelligence (AI) is a hot topic. One of its main goals is to develop systems that facilitate the daily life of humans, with applications such as household robots, industrial robots, autonomous vehicles and much more. The rise of AI is largely due to the emergence of tools based on deep neural networks which make it possible to simultaneously learn the representation of the data (traditionally hand-crafted) and the task to solve (traditionally learned with statistical models). This has resulted from the conjunction of theoretical advances, growing computational capacity and the availability of abundant annotated data. A long-standing goal of AI is to design machines inspired by humans, capable of perceiving the world and interacting with humans, in an evolutionary way. We categorize, in this thesis, the works around AI into the two following learning approaches: (i) Specialization: learn representations from a few specific tasks with the goal of carrying out very specific tasks (specialized in a certain field) with a very good level of performance; (ii) Universality: learn representations from several general tasks with the goal of performing as many tasks as possible in different contexts. While specialization has been extensively explored by the deep-learning community, only a few implicit attempts have been made towards universality. Thus, the goal of this thesis is to explicitly address the problem of improving universality with deep-learning methods, for image and text data. We have addressed this topic of universality in two different forms: through the implementation of methods to improve universality ("universalizing methods"); and through the establishment of a protocol to quantify universality. Concerning universalizing methods, we proposed three technical contributions: (i) in a context of large semantic representations, we proposed a method to reduce redundancy between the detectors through adaptive thresholding and the relations between concepts; (ii) in the context of neural-network representations, we proposed an approach that increases the number of detectors without increasing the amount of annotated data; (iii) in a context of multimodal representations, we proposed a method to preserve the semantics of unimodal representations in multimodal ones. Regarding the quantification of universality, we proposed to evaluate universalizing methods in a transfer-learning scheme. Indeed, this scheme is relevant to assess the universal ability of representations. This also led us to propose a new framework as well as new quantitative evaluation criteria for universalizing methods.
Ez-Zaher, Ahmed. "Représentations métaphonologiques et apprentissage de la lecture en arabe." Toulouse 2, 2004. http://www.theses.fr/2004TOU20028.
This study was designed to examine the relation between phonological awareness and learning to read Arabic. The main hypothesis holds that, unlike in other alphabetic languages, syllabic awareness may play an important role in learning to read. Some phonological and orthographic characteristics of the Arabic language have an influence on phonological awareness. The study, conducted with children, clearly shows that syllabic awareness is strongly related to learning to read in the beginning years, both as a prerequisite and as a consequence of this learning. Syllabic segmentation appears very useful for establishing letter/sound correspondences in the vowelised script. In contrast, phonemic awareness is needed only later, in a second stage, when children have to process an unvowelised, deep orthography. It was concluded that in the first stage phonemic awareness is not necessary to acquire reading abilities in the vowelised Arabic orthography, and thus teaching methods must rely on syllabic units to introduce children to literacy.
Villon, Sébastien. "Estimation automatisée sur vidéos de la biodiversité et de l’abondance des poissons coralliens." Thesis, Montpellier, 2019. http://www.theses.fr/2019MONTG058.
Coral reefs are home to a great fish biodiversity (approximately 7000 species). This biodiversity is the source of many vital ecosystem services such as protein intake for local populations, nutrient cycling or regulation of algae abundance. However, increasing human pressure through over-fishing and global warming is destroying both fish populations and their habitats. In this context, monitoring coral reef fish biodiversity, abundance and biomass with precision is one of the major issues for marine ecology. To face the increasing pressure and fast global changes, such monitoring has to be done at a large scale, temporally and spatially. To date, most underwater fish censuses are carried out by diving, during which the diver identifies fish species and counts them. Such manual censuses induce many constraints (depth and duration of the dive) and biases due to the diver's experience. These biases (mistaking fish species or over/under-estimating fish populations) are neither quantifiable nor correctable. Today, thanks to the improvement of high-resolution, low-cost underwater cameras, new protocols are being developed to use video censuses. However, there is not yet a way to automatically process these underwater videos. Therefore, the analysis of the videos remains a bottleneck between the data gathering through video censuses and the analysis of fish communities. During this thesis, we developed automated methods for the detection and identification of fish in underwater videos with Deep Learning based algorithms. We worked on all aspects of the pipeline, from video acquisition and data annotation to the design of models and post-processing, and model testing. Today, we have gathered more than 380,000 images of 300 coral reef species. We developed an identification model which successfully identified 20 of the most common species on Mayotte coral reefs with a 94% success rate, and post-processing methods allowing us to decrease the error rate down to 2%. We also developed a detection method allowing us to detect up to 84% of fish individuals in underwater videos.
Gaidon, Adrien. "Modèles structurés pour la reconnaissance d'actions dans des vidéos réalistes." Phd thesis, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00780679.
Full textGuilmart, Christophe. "Filtrage de segments informatifs dans des vidéos." Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2011. http://tel.archives-ouvertes.fr/tel-00668307.
Full textBoisson, Arthur. "Motricité et intégration multi-sensorielle : apprentissage des représentations grapho-phonémiques." Thesis, Lyon, 2018. http://www.theses.fr/2018LYSE2126/document.
In our daily lives, we are surrounded by audiovisual associations: we perceive and memorize them throughout our lives. However, the mechanisms involved in their learning are not fully understood. In particular, factors such as motor skills that promote such learning are rarely studied from a memory point of view. Thus, the general objectives of this thesis are to: i) study the cognitive mechanisms underlying the learning of audio-visual associations, ii) better understand the impact of motor skills on the effectiveness of these mechanisms, and iii) propose original methodologies likely to increase the effectiveness of these mechanisms and/or compensate for possible deficits. More precisely, this thesis work focuses on the benefit of motor exploration in learning grapho-phonemic correspondences (GPC). In addition to the purely theoretical interest of studying this learning, the importance of this acquisition for young pre-readers adds a practical and pedagogical dimension to this work. What stands out from this thesis is that two areas of study, that of learning to read and that of memory, are combined. Though both of them deal with learning and hence memory, there has never been a real attempt to apply memory models to help understand the mechanisms of learning word reading and writing; conversely, memory research has rarely looked to research on learning to read and write to validate its assumptions. However, one of the interests of the Act-In model used to support this thesis is precisely to propose an integrated approach to cognitive functioning and not only to memory.
Le Hy, Ronan. "Programmation et apprentissage bayésien de comportements pour des personnages synthétiques : applications aux personnages de jeux vidéos." Grenoble INPG, 2007. http://www.theses.fr/2007INPG0040.
We treat the problem of behaviours for autonomous characters (bots) in virtual worlds, with the example of video games. Our two essential objectives are: to reduce the time and difficulty of behaviour development; and to give the player a new possibility: teaching bots how to play. We propose a method to build behaviours based on Bayesian programming (a formalism for describing probabilistic models). It rests on two innovations: a generic technique for the definition of elementary tasks, called enhanced fusion by coherence; and a technique for sequencing these elementary tasks, called inverse programming. In contrast with classical approaches, this method allows behaviours to be learned efficiently by demonstration.
Hamadi, Abdelkader. "Utilisation du contexte pour l'indexation sémantique des images et vidéos." Thesis, Grenoble, 2014. http://www.theses.fr/2014GRENM047/document.
The automated indexing of images and videos is a difficult problem because of the "distance" between the arrays of numbers encoding these documents and the concepts (e.g. people, places, events or objects) with which we wish to annotate them. Methods exist for this but their results are far from satisfactory in terms of generality and accuracy. Existing methods typically use a single set of annotated examples and consider it as uniform. This is not optimal because the same concept may appear in various contexts and its appearance may be very different depending upon these contexts. In this thesis, we considered the use of context for indexing multimedia documents. Context has been widely used in the state of the art to treat various problems. In our work, we use relationships between concepts as a source of semantic context. For the case of videos, we exploit the temporal context that models relationships between the shots of the same video. We propose several approaches using both types of context and their combination, at different levels of an indexing system. We also address the problem of multiple concept detection, which we consider to be related to the problem of context use: detecting a set of concepts simultaneously is equivalent to detecting one or more concepts forming the group in a context where the others are present. To do so, we studied and compared two types of approaches. All our proposals are generic and can be applied to any system for the detection of any concept. We evaluated our contributions on the TRECVID and VOC collections, which are international standards recognized by the community. We achieved good results, comparable to those of the best indexing systems evaluated in recent years in the aforementioned evaluation campaigns.
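A minimal sketch of the temporal-context idea for videos: blend each shot's concept-detection score with the scores of neighbouring shots of the same video. The window size and blending weight below are illustrative choices, not the exact fusion studied in the thesis.

    import numpy as np

    def temporal_rescoring(scores, window=2, weight=0.5):
        """Blend each shot's score with a local average over neighbouring shots."""
        scores = np.asarray(scores, dtype=float)
        kernel = np.ones(2 * window + 1) / (2 * window + 1)
        context = np.convolve(scores, kernel, mode='same')
        return (1 - weight) * scores + weight * context

    print(temporal_rescoring([0.1, 0.9, 0.2, 0.85, 0.8]))

Isolated high or low scores get pulled toward their neighbourhood, which is the intended effect of exploiting the temporal context.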
Amate, Laure. "Apprentissage de modèles de formes parcimonieux basés sur des représentations splines." Phd thesis, Université de Nice Sophia-Antipolis, 2009. http://tel.archives-ouvertes.fr/tel-00456612.
Full textDo, Huu Nicolas. "Apprentissage de représentations sensori-motrices pour la reconnaissance d'objet en robotique." Phd thesis, Université Paul Sabatier - Toulouse III, 2007. http://tel.archives-ouvertes.fr/tel-00283073.
Full textBreton, Jean-Luc. "Apprentissage de l'anglais en section européenne au lycée : représentations et pratiques." Phd thesis, Paris 10, 2011. http://tel.archives-ouvertes.fr/tel-00812568.
Full textAmate, Laure. "Apprentissage de modèles de formes parcimonieux basés sur les représentations splines." Nice, 2009. http://www.theses.fr/2009NICE4117.
In many contexts it is important to be able to find compact representations of the collective morphological properties of a set of objects. This is the case for autonomous robotic platforms operating in natural environments that must use the perceptual properties of the objects present in their workspace to execute their mission. This thesis is a contribution to the definition of formalisms and methods for the automatic identification of such models. The shapes we want to characterize are closed curves corresponding to the contours of objects detected in the scene. We begin with the formal definition of the notion of shape as classes of equivalence with respect to groups of basic geometric operators, introducing two distinct approaches that have been used in the literature: discrete and continuous. The discrete theory, admitting the existence of a finite number of recognizable landmarks, provides a compact representation in an obvious manner but is sensitive to their selection. The continuous theory of shapes provides a more fundamental approach, but leads to shape spaces of infinite dimension, lacking the parsimony of the discrete representation. We thus combine in our work the advantages of both approaches, representing shapes of curves with splines: piecewise polynomials defined by sets of knots and control points. We first study the problem of fitting free-knot splines of varying complexity to a single observed curve. The trade-off between the parsimony of the representation and its fidelity to the observations is a well-known characteristic of model identification using nested families of increasing dimension. After presenting an overview of methods previously proposed in the literature, we single out a two-step approach which is formally sound and matches our specific requirements. It splits the identification: a reversible-jump Markov chain is simulated to select the complexity of the model, followed by a simulated annealing algorithm to estimate its parameters. We investigate the link between Kendall's shape space and spline representations when we take the spline control points as landmarks. We then consider the more complex problem of modeling a set of objects with similar morphological characteristics. We formulate the problem as finding the statistical distribution of the parameters of the spline representation, modeling the knots and control points as unobserved variables. The identified distribution is the maximizer of a marginal likelihood criterion, and we propose a new Expectation-Maximization algorithm to optimize it. Because we may want to treat a large number of curves observed sequentially, we adapt an iterative (on-line) version of the EM algorithm recently proposed in the literature. For the choice of statistical distributions that we consider, both the expectation and the maximization steps must resort to numerical approximations, leading to a stochastic/on-line variant of the EM algorithm that, as far as we know, is implemented here for the first time.
Munzer, Thibaut. "Représentations relationnelles et apprentissage interactif pour l'apprentissage efficace du comportement coopératif." Thesis, Bordeaux, 2017. http://www.theses.fr/2017BORD0574/document.
This thesis presents new approaches toward efficient and intuitive high-level plan learning for cooperative robots. More specifically, this work studies Learning from Demonstration algorithms for relational domains. Using relational representations to model the world simplifies the representation of concurrent and cooperative behavior. We first developed and studied the first algorithm for Inverse Reinforcement Learning in relational domains. We then presented how one can use the RAP formalism to represent cooperative tasks involving a robot and a human operator. RAP is an extension of the Relational MDP framework that allows the modeling of concurrent activities. Using RAP allows us to represent both the human and the robot in the same process, and also to model concurrent robot activities. Under this formalism, we have demonstrated that it is possible to learn the behavior, as a policy and as a reward, of a cooperative team. Prior knowledge about the task can also be used to learn only the preferences of the operator. We have shown that, using relational representations, it is possible to learn cooperative behaviors from a small number of demonstrations; that these behaviors are robust to noise, can generalize to new states and can transfer to different domains (for example, adding objects). We have also introduced an interactive training architecture that allows the system to make fewer mistakes while requiring less effort from the human operator. By estimating its confidence, the robot is able to ask for instructions when it is unsure of the correct activity to do. Lastly, we have implemented these approaches on a real robot and shown their potential impact in an ecological scenario.
Zuo, Jingwei. "Apprentissage de représentations et prédiction pour des séries-temporelles inter-dépendantes." Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG038.
Full textTime series are a common data type that arises in numerous real-life applications, such as financial analysis, medical diagnosis, environmental monitoring and astronomical discovery. Due to their complex structure, time series raise several challenges for data processing and mining. The representation of time series plays a key role in data mining tasks and machine learning algorithms for time series. Yet few methods consider the interrelation that may exist between different time series when building the representation. Moreover, time series mining requires considering not only the characteristics of time series in terms of data complexity, but also the concrete application scenarios where the data mining task is performed, in order to build task-specific representations. In this thesis, we study different time series representation approaches that can be used in various time series mining tasks, while capturing the relationships among them. We focus specifically on modeling the interrelations between different time series when building the representations, which can be the temporal relationship within each data source or the inter-variable relationship between various data sources. Accordingly, we study time series collected from various application contexts under different forms. First, considering the temporal relationship between observations, we study time series in a dynamic streaming context, i.e., time series streams, in which the data are continuously generated from the source. Second, for the inter-variable relationship, we study multivariate time series (MTS) with data collected from multiple sources. Finally, we study MTS in the Smart City context, where each data source is given a spatial position. The MTS then becomes a geo-located time series (GTS), for which the inter-variable relationship requires more modeling effort using external spatial information. Therefore, for each type of time series data collected from distinct contexts, the interrelations between observations are emphasized differently, on the temporal and/or variable axes. Beyond the data complexity arising from these interrelations, we study various machine learning tasks on time series in order to validate the learned representations. The high-level learning tasks studied in this thesis consist of time series classification, semi-supervised time series learning, and time series forecasting. We show how the learned representations connect with different time series learning tasks under distinct application contexts. More importantly, we conduct an interdisciplinary study of time series by leveraging real-life challenges in machine learning tasks, which allows improving the learning models' performance and addressing more complex time series scenarios. Concretely, for these time series learning tasks, our main research contributions are the following: (i) we propose a dynamic time series representation learning model in the streaming context, which considers both the characteristics of time series and the challenges of data streams. We claim and demonstrate that the Shapelet, a shape-based time series feature, is the best representation in such a dynamic context; (ii) we propose a semi-supervised model for representation learning in multivariate time series (MTS). The inter-variable relationship over multiple data sources is modeled in a real-life context where data annotations are limited; (iii) we design a geo-located time series (GTS) representation learning model for Smart City applications. We specifically study the traffic forecasting task, with a focus on missing-value treatment within the forecasting algorithm
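As an illustration of the shapelet representation mentioned in contribution (i), here is a minimal NumPy sketch of the classical (non-streaming) shapelet transform: each series is mapped to its vector of minimal sliding-window distances to a set of shapelets. The data, shapelet choices and normalization are illustrative assumptions, not the author's streaming algorithm.

```python
import numpy as np

def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and all
    same-length sliding windows of a time series."""
    m = len(shapelet)
    windows = np.lib.stride_tricks.sliding_window_view(series, m)
    return np.linalg.norm(windows - shapelet, axis=1).min() / np.sqrt(m)

def shapelet_transform(series_list, shapelets):
    """Represent each series by its vector of distances to a set of
    shapelets; a classifier can then operate on this representation."""
    return np.array([[shapelet_distance(s, sh) for sh in shapelets]
                     for s in series_list])

# toy usage with random series and shapelets extracted from them
rng = np.random.default_rng(0)
X = [rng.standard_normal(100) for _ in range(5)]
shapelets = [X[0][10:30], X[1][40:60]]
features = shapelet_transform(X, shapelets)  # shape (5, 2)
```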
Pop, Ionel. "Détection des événements rares dans des vidéos." Thesis, Lyon 2, 2010. http://www.theses.fr/2010LYO22023.
Full textThe growing amount of video data often makes it difficult, even impossible, to watch it entirely. In the context of automatic video analysis, a recurring request is to identify the moments in a video when something unusual happens. We propose several algorithms to identify unusual events, under the hypothesis that these events have a low probability. We address several types of events, from those generated by moving areas to the trajectories of tracked objects. In the first part of the study, we build a simple tracking system and propose several measures of similarity between trajectories. These measures estimate the similarity of trajectories by taking into account spatial and/or temporal aspects, making it possible to differentiate between objects moving along the same path but at different speeds. Based on these measures, we build models of trajectories representing the common behavior of objects, so that we can identify the abnormal ones. We noticed that tracking yields poor results, especially in crowded scenes. Therefore, we use optical flow vectors to build a movement model based on a codebook. This model stores the preferred movement directions for each pixel, making it possible to identify abnormal movement at pixel level without using a tracker. By exploiting temporal coherence, we can further improve the detection rate, which is otherwise affected by errors in optical flow estimation. In a second step, we change the way this model is built. With the new approach, we can extract higher-level features, equivalent to trajectories but still without the notion of object tracking. In this situation, we can reuse the partial trajectory analysis to detect rare events. All aspects presented in this study have been implemented. In addition, we have designed some applications, such as predicting the trajectories of visible objects or storing and retrieving tracked objects in a database
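The pixel-level motion codebook lends itself to a short sketch. The snippet below is a simplified reading of the idea rather than the thesis' implementation: it quantizes optical flow directions per pixel, accumulates per-pixel direction histograms during training, and flags directions rarely observed before. The bin count and magnitude threshold are assumptions.

```python
import numpy as np

N_BINS = 8  # quantize flow direction into 8 sectors (assumption)

def flow_to_bins(flow):
    """Quantize per-pixel optical flow (H, W, 2) into direction bins;
    pixels with negligible motion get bin -1."""
    mag = np.linalg.norm(flow, axis=-1)
    angle = np.arctan2(flow[..., 1], flow[..., 0])        # in [-pi, pi]
    bins = ((angle + np.pi) / (2 * np.pi) * N_BINS).astype(int) % N_BINS
    bins[mag < 0.5] = -1                                   # motion threshold
    return bins

class MotionCodebook:
    """Per-pixel histogram of observed motion directions."""
    def __init__(self, h, w):
        self.counts = np.zeros((h, w, N_BINS))

    def update(self, flow):
        bins = flow_to_bins(flow)
        for b in range(N_BINS):
            self.counts[..., b] += (bins == b)

    def abnormal(self, flow, min_prob=0.05):
        """Flag pixels whose current direction was rarely seen in training."""
        bins = flow_to_bins(flow)
        total = self.counts.sum(axis=-1) + 1e-9
        prob = np.take_along_axis(
            self.counts, np.clip(bins, 0, N_BINS - 1)[..., None], axis=-1
        )[..., 0] / total
        return (bins >= 0) & (prob < min_prob)
```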
Barthelemy, Quentin. "Représentations parcimonieuses pour les signaux multivariés." Thesis, Grenoble, 2013. http://www.theses.fr/2013GRENU008/document.
Full textIn this thesis, we study approximation and learning methods which provide sparse representations. These methods allow analyzing highly redundant databases thanks to learned atom dictionaries. Being adapted to the studied data, they achieve better representation quality than classical dictionaries with analytically defined atoms. We consider more particularly multivariate signals coming from the simultaneous acquisition of several quantities, such as EEG signals or 2D and 3D motion signals. We extend sparse representation methods to the multivariate model, to take into account the interactions between the different components acquired simultaneously. This model is more flexible than the common multichannel one, which imposes a rank-1 hypothesis. We study models of invariant representations: invariance to temporal shift, invariance to rotation, etc. By adding supplementary degrees of freedom, each kernel is potentially replicated into a family of atoms, translated to all samples, rotated to all orientations, etc. A dictionary of invariant kernels thus generates a very redundant atom dictionary, ideal for representing the redundant data under study. All these invariances require methods adapted to the corresponding models. Temporal shift-invariance is an essential property for the study of temporal signals having a natural temporal variability. In the 2D and 3D rotation-invariant case, we observe the superiority of the non-oriented approach over the oriented one, even when the data are not rotated. Indeed, the non-oriented model allows detecting invariants in the data and ensures robustness to rotation when the data are rotated. We also observe the reproducibility of sparse decompositions on a learned dictionary. This generative property comes from the fact that dictionary learning is a generalization of K-means. Moreover, our representations have many invariances, which is ideal for classification. We thus study how to perform classification adapted to the shift-invariant model, using shift-consistent pooling functions
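For readers unfamiliar with sparse approximation over a dictionary, the following sketch shows single-channel Orthogonal Matching Pursuit, the greedy baseline that the thesis' multivariate and invariant variants generalize; it is not the multivariate algorithm itself, and assumes a column-normalized dictionary.

```python
import numpy as np

def omp(D, x, k):
    """Orthogonal Matching Pursuit: greedily select k atoms of the
    (column-normalized) dictionary D to approximate the signal x."""
    residual, support = x.copy(), []
    for _ in range(k):
        corr = D.T @ residual                    # correlations with all atoms
        support.append(int(np.argmax(np.abs(corr))))
        Ds = D[:, support]                       # currently selected atoms
        coef, *_ = np.linalg.lstsq(Ds, x, rcond=None)
        residual = x - Ds @ coef                 # re-orthogonalized residual
    z = np.zeros(D.shape[1])
    z[support] = coef
    return z

# toy usage: 64-dimensional signal, 256-atom random dictionary
rng = np.random.default_rng(0)
D = rng.standard_normal((64, 256))
D /= np.linalg.norm(D, axis=0)
z = omp(D, rng.standard_normal(64), k=5)         # 5-sparse code
```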
Hugueney, Bernard. "Représentations symboliques de longues séries temporelles." Paris 6, 2003. http://www.theses.fr/2003PA066161.
Full text
Kaâniche, Mohamed-Bécha. "Reconnaissance de gestes à partir de séquences vidéos." Phd thesis, Université de Nice Sophia-Antipolis, 2009. http://tel.archives-ouvertes.fr/tel-00428690.
Full text
Delanoy, Johanna. "Interprétation et génération de représentations artistiques : applications à la modélisation par le dessin et à la stylisation de vidéos." Thesis, Université Côte d'Azur (ComUE), 2019. http://www.theses.fr/2019AZUR4036.
Full textDigital tools bring new ways of creating, for accomplished artists as well as for any individual willing to create. In this thesis, I am interested in two different ways of helping artists: interpreting their creations and generating new content. I first study how to interpret a sketch as a 3D object. We propose a data-driven approach that tackles this challenge by training deep convolutional neural networks (CNNs) to predict the occupancy of a voxel grid from a line drawing. We integrate our CNNs in an interactive modeling system that allows users to seamlessly draw an object, rotate it to see its 3D reconstruction, and refine it by re-drawing from another vantage point using the 3D reconstruction as guidance. We then complement this technique with a geometric method that refines the quality of the final object. To do so, we train an additional CNN to predict higher-resolution normal maps from each input view, and fuse these normal maps with the voxel grid prediction by optimizing for the final surface. We train all of these networks by rendering synthetic contour drawings from procedurally generated abstract shapes. In a second part, I present a method to generate stylized videos with a look reminiscent of traditional 2D animation. Existing stylization methods often retain the 3D motion of the original video, making the result look like a 3D scene covered in paint rather than a 2D painting of a scene. Inspired by cut-out animation, we propose to modify the motion of the sequence so that it is composed of 2D rigid motions. To achieve this goal, our approach applies motion segmentation and optimization to best approximate the input optical flow with piecewise-rigid transforms, and re-renders the video so that its content follows the simplified motion. Applying existing stylization algorithms to the new sequence produces a stylized video closer to 2D animation. Although the two parts of my thesis lean on different methods, they both rely on traditional techniques used by artists: either by understanding how they draw objects or by taking inspiration from how they simplify motion in 2D animation
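The piecewise-rigid motion approximation can be grounded with a small example: fitting a single 2D rigid transform (rotation plus translation) to the optical flow of one motion segment via the standard Kabsch/Procrustes solution. This is a minimal sketch of one ingredient, assuming per-segment point correspondences derived from the flow, not the full segmentation-and-optimization pipeline.

```python
import numpy as np

def fit_rigid_2d(src, dst):
    """Least-squares 2D rigid transform (rotation + translation) mapping
    point set src (N, 2) to dst (N, 2), via the Kabsch method."""
    c_src, c_dst = src.mean(0), dst.mean(0)
    H = (src - c_src).T @ (dst - c_dst)      # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = c_dst - R @ c_src
    return R, t

# approximate the flow of one segment by a single rigid motion
h, w = 4, 5
ys, xs = np.mgrid[0:h, 0:w]
pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
flow = np.tile([1.0, 0.5], (pts.shape[0], 1))   # toy constant flow
R, t = fit_rigid_2d(pts, pts + flow)            # R ~ identity, t ~ (1, 0.5)
```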
Negin, Farhood. "Vers une reconnaissance des activités humaines non supervisées et des gestes dans les vidéos." Thesis, Université Côte d'Azur (ComUE), 2018. http://www.theses.fr/2018AZUR4246/document.
Full textThe main goal of this thesis is to propose a complete framework for automatic discovery, modeling and recognition of human activities in videos. In order to model and recognize activities in long-term videos, we propose a framework that combines global and local perceptual information from the scene and accordingly constructs hierarchical activity models. In the first variant of the framework, a supervised classifier based on Fisher vectors is trained and the predicted semantic labels are embedded in the constructed hierarchical models. In the second variant, to obtain a completely unsupervised framework, the trained visual codebooks are stored in the models instead of the semantic labels. Finally, we evaluate the proposed frameworks on two realistic Activities of Daily Living datasets recorded from patients in a hospital environment. Furthermore, to model fine motions of the human body, we propose four different gesture recognition frameworks, each accepting one data modality or a combination of modalities as input. We evaluate the developed frameworks in the context of a medical diagnostic test, namely Praxis. The Praxis test is a gesture-based diagnostic test, accepted as diagnostically indicative of cortical pathologies such as Alzheimer's disease. We suggest a new challenge in gesture recognition: obtaining an objective opinion about correct and incorrect performances of very similar gestures. The experiments show the effectiveness of our deep-learning-based approach in gesture recognition and performance assessment tasks
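As background for the supervised variant, the sketch below computes a simplified Fisher vector (gradients with respect to the GMM means only, a common simplification) from a set of local descriptors, using scikit-learn's GaussianMixture. Descriptor dimensions and component counts are illustrative; the thesis' actual pipeline is not reproduced here.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fisher_vector(descriptors, gmm):
    """Means-only Fisher vector for one video; descriptors: (N, D)
    local features, gmm: fitted diagonal-covariance GaussianMixture."""
    q = gmm.predict_proba(descriptors)                    # (N, K) soft assignments
    diff = descriptors[:, None, :] - gmm.means_[None]     # (N, K, D)
    diff /= np.sqrt(gmm.covariances_)[None]               # whiten per component
    fv = (q[..., None] * diff).sum(axis=0)                # (K, D)
    fv /= descriptors.shape[0] * np.sqrt(gmm.weights_)[:, None]
    return fv.ravel()

# toy usage: fit the vocabulary, then encode one video's descriptors
rng = np.random.default_rng(0)
gmm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(rng.standard_normal((1000, 16)))
fv = fisher_vector(rng.standard_normal((200, 16)), gmm)   # length 8 * 16
```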
Isaac, Yoann. "Représentations redondantes pour les signaux d’électroencéphalographie." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112072/document.
Full textElectroencephalography measures brain activity by recording variations of the electric field at the surface of the skull. This measurement is useful in various applications such as medical diagnosis, the analysis of brain functioning, or brain-computer interfaces. Numerous studies have tried to develop methods for analyzing these signals in order to extract various components of interest; however, none of them extracts these components with sufficient reliability. This thesis focuses on the development of approaches considering redundant (overcomplete) representations for these signals. In recent years, such representations have proven particularly efficient at describing various classes of signals, thanks to their flexibility. Obtaining them for EEG presents some difficulties due to the low signal-to-noise ratio of these signals. We propose to overcome these difficulties by guiding the considered methods toward physiologically plausible representations thanks to well-suited regularizations, built from prior knowledge about the spatial and temporal properties of the signals. For each regularization, an algorithm is proposed to solve the optimization problem yielding the targeted representations. The evaluation of the proposed approaches on EEG signals highlights their effectiveness in representing them
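A minimal sketch of the underlying sparse-coding machinery: ISTA solves the ℓ1-regularized representation problem by proximal gradient descent. The thesis replaces or complements the plain ℓ1 penalty with physiologically motivated spatio-temporal regularizers; the version below is only the generic ℓ1 baseline that such regularizers extend.

```python
import numpy as np

def ista(D, x, lam, n_iter=200):
    """ISTA: solve min_z 0.5 * ||x - D z||^2 + lam * ||z||_1
    by proximal gradient descent (soft thresholding)."""
    L = np.linalg.norm(D, 2) ** 2        # Lipschitz constant of the gradient
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ z - x)         # gradient of the smooth term
        z = z - grad / L
        z = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # prox of lam*||.||_1
    return z

# toy usage on a random overcomplete dictionary
rng = np.random.default_rng(0)
D = rng.standard_normal((32, 128))
z = ista(D, rng.standard_normal(32), lam=0.5)
```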
Maâmatou, Houda. "Apprentissage semi-supervisé pour la détection multi-objets dans des séquences vidéos : Application à l'analyse de flux urbains." Thesis, Université Clermont Auvergne (2017-2020), 2017. http://www.theses.fr/2017CLFAC015/document.
Full textSince 2000, significant progress has been made in research work proposing to learn object detectors from large, manually labeled, publicly available databases. However, when a generic object detector is applied to images of a specific scene, detection performance decreases considerably. This decrease may be explained by differences between the test samples and the training ones in camera viewpoints, resolution, illumination and background. In addition, the growing storage capacity of computer systems, the democratization of video surveillance and the development of automatic video-analysis tools have encouraged research in the road-traffic domain. The ultimate aims are to evaluate current and future traffic demands, to develop road infrastructures based on real needs, to carry out maintenance interventions in time, and to monitor roads continuously. Moreover, traffic analysis is a problem in which several scientific obstacles must be overcome, due to the great variability of traffic flow, the various types of users, and the multiple weather and lighting conditions. Developing automatic, real-time tools to analyze road-traffic videos has thus become an indispensable task. These tools should retrieve rich traffic data from the video sequences, and they must be precise and easy to use. This is the context of our thesis work, which proposes to use prior knowledge and combine it with information extracted from the new scene in order to specialize an object detector to the situations of the target scene. In this thesis, we propose to automatically specialize a generic object classifier/detector to a road-traffic scene monitored by a fixed camera. We present two main contributions. The first one is an original formalization of transductive transfer learning based on a sequential Monte Carlo (SMC) filter for automatic classifier specialization. This formalization iteratively approximates the previously unknown target distribution as a set of samples composing the specialized dataset of the target scene. The samples of this dataset are selected from both the source dataset and the target scene after a weighting step that uses prior information about the scene. The obtained specialized dataset allows training a classifier specialized to the target scene without human intervention. The second contribution consists in proposing two observation strategies to be used in the SMC filter's update step. These strategies are based on a set of spatio-temporal cues specific to the video-surveillance scene and are used to weight the target samples. The various experiments carried out show that the proposed specialization approach is efficient and generic: we were able to integrate multiple observation strategies, and it can be applied to any classifier/detector. In addition, we implemented in the Logiroad OD SOFT software the ability to load and use a detector produced by our approach. We also showed the advantages of the specialized detectors by comparing their results to those of Logiroad's Vu-meter method
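The resampling idea at the heart of the SMC formalization can be sketched as follows, under the assumption that each candidate sample already carries a scalar score from the spatio-temporal cues. This is a schematic single iteration, not the full filter with its prediction and update steps.

```python
import numpy as np

def smc_iteration(samples, cue_scores, n_keep, rng):
    """One SMC-style specialization step: weight candidate samples
    (drawn from the source dataset and the target scene) by scene-specific
    cue scores, then resample proportionally to form the specialized
    dataset used to retrain the detector."""
    weights = cue_scores / cue_scores.sum()    # normalized importance weights
    idx = rng.choice(len(samples), size=n_keep, replace=True, p=weights)
    return [samples[i] for i in idx]

rng = np.random.default_rng(0)
candidates = [f"sample_{i}" for i in range(100)]  # stand-ins for image windows
scores = rng.random(100)                          # hypothetical cue scores
specialized = smc_iteration(candidates, scores, n_keep=50, rng=rng)
```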
Potapov, Danila. "Supervised Learning Approaches for Automatic Structuring of Videos." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAM023/document.
Full textAutomatic interpretation and understanding of videos remains at the frontier of computer vision. The core challenge is to lift the expressive power of current visual features (as well as features from other modalities, such as audio or text) so as to automatically recognize typical video sections with low temporal saliency yet high semantic expression. Examples of such long events include video sections where someone is fishing (TRECVID Multimedia Event Detection), or where the hero argues with a villain in a Hollywood action movie (Inria Action Movies). In this manuscript, we present several contributions towards this goal, focusing on three video analysis tasks: summarization, classification and localization. First, we propose an automatic video summarization method, yielding a short and highly informative summary of potentially long videos, tailored to specified categories of videos. We also introduce a new dataset for the evaluation of video summarization methods, called MED-Summaries, which contains complete importance-scoring annotations of the videos, along with a complete set of evaluation tools. Second, we introduce a new dataset, called Inria Action Movies, consisting of long movies annotated with non-exclusive semantic categories (called beat-categories), whose definition is broad enough to cover most of the movie footage. Categories such as "pursuit" or "romance" in action movies are examples of beat-categories. We propose an approach for localizing beat-events based on classifying shots into beat-categories and learning the temporal constraints between shots. Third, we overview the Inria event classification system developed within the TRECVID Multimedia Event Detection competition and highlight the contributions made during the work on this thesis from 2011 to 2014
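A toy version of budget-constrained summary selection may help fix ideas: given per-segment importance scores (the kind of annotation MED-Summaries provides), a greedy knapsack-style pass keeps the highest-value segments within a duration budget. The selection rule is a generic stand-in, not the method proposed in the thesis.

```python
def summarize(segments, budget):
    """Greedy knapsack-style selection: keep the segments with the best
    importance-per-second ratio until the duration budget is reached.
    segments: list of (start_time, duration, importance_score)."""
    ranked = sorted(segments, key=lambda s: s[2] / s[1], reverse=True)
    chosen, used = [], 0.0
    for seg in ranked:
        if used + seg[1] <= budget:
            chosen.append(seg)
            used += seg[1]
    return sorted(chosen)  # back to chronological order

# toy usage: three scored segments, 7-second budget
summary = summarize([(0, 5, 0.2), (5, 3, 0.9), (8, 4, 0.6)], budget=7)
```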
Thomas, Hugues. "Apprentissage de nouvelles représentations pour la sémantisation de nuages de points 3D." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEM048/document.
Full textIn recent years, new technologies have allowed the acquisition of large and precise 3D scenes as point clouds. They have opened up new applications, such as self-driving vehicles or infrastructure monitoring, that rely on efficient large-scale point cloud processing. Convolutional deep learning methods cannot be directly used with point clouds. In the case of images, convolutional filters brought the ability to learn new representations, which were previously hand-crafted in older computer vision methods. Following the same line of thought, we present in this thesis a study of the hand-crafted representations previously used for point cloud processing. We propose several contributions, to serve as a basis for the design of a new convolutional representation for point cloud processing. They include a new definition of multiscale radius neighborhoods, a comparison with multiscale k-nearest neighbors, a new active learning strategy, the semantic segmentation of large-scale point clouds, and a study of the influence of density in multiscale representations. Following these contributions, we introduce the Kernel Point Convolution (KPConv), which uses radius neighborhoods and a set of kernel points to play the role of the kernel pixels in image convolution. Our convolutional networks outperform state-of-the-art semantic segmentation approaches in almost any situation. In addition to these strong results, we designed KPConv to be highly flexible and proposed a deformable version. To conclude, we offer several insights into the representations our method is able to learn
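The KPConv operator itself is compact enough to sketch: each neighbor's feature is weighted by a linear correlation with each kernel point (decreasing with distance, clipped at zero), then mixed by that kernel point's weight matrix. The NumPy version below follows the rigid variant for a single output point; shapes and the influence radius sigma are illustrative.

```python
import numpy as np

def kpconv_point(center, neighbors, feats, kernel_pts, weights, sigma):
    """KPConv-style (rigid) convolution at one point.
    neighbors: (N, 3), feats: (N, Cin), kernel_pts: (K, 3),
    weights: (K, Cin, Cout); returns an output feature of size Cout."""
    rel = neighbors - center                                         # (N, 3)
    dist = np.linalg.norm(rel[:, None] - kernel_pts[None], axis=-1)  # (N, K)
    corr = np.maximum(0.0, 1.0 - dist / sigma)                       # linear influence
    out = np.zeros(weights.shape[2])
    for k in range(kernel_pts.shape[0]):
        out += (corr[:, k:k + 1] * feats).sum(0) @ weights[k]
    return out

# toy usage: 16 neighbors, 4 input channels, 8 kernel points, 2 output channels
rng = np.random.default_rng(0)
out = kpconv_point(center=np.zeros(3),
                   neighbors=rng.standard_normal((16, 3)) * 0.1,
                   feats=rng.standard_normal((16, 4)),
                   kernel_pts=rng.standard_normal((8, 3)) * 0.1,
                   weights=rng.standard_normal((8, 4, 2)),
                   sigma=0.1)  # shape (2,)
```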
Paquier, Williams. "Apprentissage ouvert de représentations et de fonctionalités en robotique : analyse, modèles et implémentation." Toulouse 3, 2004. http://www.theses.fr/2004TOU30233.
Full textAutonomous acquisition of representations and functionalities by a machine raises several theoretical questions. Today's autonomous robots are developed around a set of functionalities. Their representations of the world are deduced from the analysis and modeling of a given problem, and are initially provided by the developers. This limits the learning capabilities of robots. In this thesis, we propose an approach and a system able to build open-ended representations and functionalities. This system learns through its experimentation with the environment and aims to increase a value function. Its objective is to act so as to reactivate the representations it has already learned to connote positively. An analysis of the generalization capabilities required to produce appropriate actions enables us to define a minimal set of properties needed by such a system. The open-ended representation system is composed of a network of homogeneous processing units and is based on position coding: the meaning of a processing unit depends on its position in the global network. This representation system presents similarities with the principle of numeration by position. A representation is given by a set of active units. The system is implemented in a software suite called NeuSter, which is able to simulate networks of millions of units with billions of connections on heterogeneous clusters of POSIX machines
Caron, Stéphane. "Détection d'anomalies basée sur les représentations latentes d'un autoencodeur variationnel." Master's thesis, Université Laval, 2021. http://hdl.handle.net/20.500.11794/69185.
Full textIn this master's thesis, we propose a methodology that aims to detect anomalies among complex data, such as images. In order to do so, we use a specific type of neural network called the variational autoencoder (VAE). This unsupervised deep learning approach allows us to obtain a simple representation of our data, on which we then use the Kullback-Leibler divergence to discriminate between anomalies and "normal" observations. To determine whether an image should be considered "abnormal", our approach is based on a proportion of observations to be filtered, which is easier and more intuitive to establish than a threshold on the value of a distance metric. By using our methodology on real complex images, we obtain superior anomaly detection performance in terms of area under the ROC curve (AUC), precision and recall compared to other unsupervised methods. Moreover, we demonstrate that the simplicity of our filtering level allows us to easily adapt the method to datasets with different levels of anomaly contamination.
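The scoring step lends itself to a direct sketch: for a trained VAE with Gaussian encoder outputs, the closed-form KL divergence to the standard-normal prior scores each observation, and a filtering proportion (rather than a raw distance threshold) sets the cutoff. The encoder outputs are assumed given; this mirrors the described approach only in outline.

```python
import numpy as np

def kl_to_prior(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) per observation;
    mu, logvar: (N, d) encoder outputs of a trained VAE (assumed given)."""
    return 0.5 * (np.exp(logvar) + mu**2 - 1.0 - logvar).sum(axis=1)

def flag_anomalies(mu, logvar, proportion=0.05):
    """Flag the `proportion` of observations with the largest KL scores,
    mirroring the proportion-based filtering idea."""
    scores = kl_to_prior(mu, logvar)
    cutoff = np.quantile(scores, 1.0 - proportion)
    return scores >= cutoff

# toy usage on random latent parameters for 1000 observations
rng = np.random.default_rng(0)
flags = flag_anomalies(rng.standard_normal((1000, 8)),
                       rng.standard_normal((1000, 8)) * 0.1)
```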
Gaillard, Audrey. "Développement des représentations conceptuelles chez l'enfant : une approche transversale." Paris 8, 2011. http://www.theses.fr/2011PA083972.
Full textIn recent years, many studies in developmental psychology have focused on concept formation in children, i.e. object categorization. This thesis aimed, first, to study the influence of several contextual factors (experimental instructions, number of repetitions, category membership) on the stability of representations, studied with a sorting task and a property-generation task with adult participants. Second, in order to study conceptual representations in children, we analyzed the categorical organization of various object names and its temporal stability in children aged 6 to 11, according to different factors: children's age, experimental task and category membership. Overall, our results show the influence of the task on the temporal stability of representations, in adults as well as in children. It therefore seems to be the type of task that induces variability, not the contextual factors tested (instructions, repetitions, category membership). In children, our results show that the stability of representations depends on age and on the category membership of objects (natural objects or artifacts). We discuss these results in relation to theories of categorization and conceptual development
Bucher, Maxime. "Apprentissage et exploitation de représentations sémantiques pour la classification et la recherche d'images." Thesis, Normandie, 2018. http://www.theses.fr/2018NORMC250/document.
Full textIn this thesis, we examine some practical difficulties of deep learning models. Indeed, despite promising results in computer vision, implementing them in some situations raises some questions. For example, in classification tasks where thousands of categories have to be recognised, it is sometimes difficult to gather enough training data for each category. We propose two new approaches for this learning scenario, called "zero-shot learning". We use semantic information to model classes, which allows us to define models by description, as opposed to modelling from a set of examples. In the first chapter, we propose to optimize a metric in order to transform the distribution of the original data and obtain an optimal attribute distribution. In the following chapter, unlike the standard approaches of the literature that rely on learning a common embedding space, we propose to generate visual features from a conditional generator. The artificial examples can be used in addition to real data to learn a discriminative classifier. In the second part of this thesis, we address the question of computational intelligibility for computer vision tasks. Due to the many complex transformations of deep learning algorithms, it is difficult for a user to interpret the returned prediction. Our proposition is to introduce what we call a "semantic bottleneck" in the processing pipeline, which is a crossing point at which the representation of the image is entirely expressed in natural language, while retaining the efficiency of numerical representations. This semantic bottleneck allows detecting failure cases in the prediction process, so as to accept or reject the decision
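The feature-generation idea can be sketched with a toy conditional generator: class attributes concatenated with noise are mapped to synthetic visual features, which can then complement real data when training a classifier for unseen classes. The architecture and all dimensions below are assumptions for illustration, not the thesis' model.

```python
import torch
import torch.nn as nn

class CondFeatureGenerator(nn.Module):
    """Toy conditional generator: concatenate a class-attribute vector
    with noise and output a synthetic visual feature. All dimensions
    (attr_dim, noise_dim, feat_dim) are illustrative assumptions."""
    def __init__(self, attr_dim=85, noise_dim=32, feat_dim=2048):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(attr_dim + noise_dim, 512),
            nn.ReLU(),
            nn.Linear(512, feat_dim),
        )

    def forward(self, attrs):
        z = torch.randn(attrs.size(0), self.noise_dim)  # fresh noise per sample
        return self.net(torch.cat([attrs, z], dim=1))

# synthetic features for an unseen class (described only by its attributes)
gen = CondFeatureGenerator()
fake_feats = gen(torch.rand(16, 85))  # 16 samples to train a classifier on
```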
Bourigault, Simon. "Apprentissage de représentations pour la prédiction de propagation d'information dans les réseaux sociaux." Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066368/document.
Full textIn this thesis, we study information diffusion in online social networks. Websites like Facebook or Twitter have indeed become information media, on which users create and share a lot of data. Most existing models of the information diffusion phenomenon rely on strong hypotheses about the structure and dynamics of diffusion. In this document, we study the problem of diffusion prediction in the context where the social graph is unknown and only user actions are observed (a schematic example follows the list below).
- We propose a learning algorithm for the independent cascade model that does not take time into account. Experimental results show that this approach obtains better results than time-based learning schemes.
- We then propose several representation learning methods for this task of diffusion prediction, which let us define more compact and faster models.
- Finally, we apply our representation learning approach to the source detection task, where it obtains much better results than graph-based approaches
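As a schematic of embedding-based diffusion prediction, the sketch below ranks users by distance to the source in a learned latent space and predicts the nearest ones as cascade targets. The embeddings here are random stand-ins for learned representations, and the distance-ranking rule is a generic simplification, not the thesis' exact model.

```python
import numpy as np

def predict_spread(source, user_emb, top_k):
    """Rank users by proximity to the source in a learned latent space;
    the closest users are predicted as the most likely cascade targets."""
    d = np.linalg.norm(user_emb - user_emb[source], axis=1)
    order = [int(u) for u in np.argsort(d) if u != source]
    return order[:top_k]

emb = np.random.randn(100, 16)  # stand-in for learned user embeddings
likely = predict_spread(source=0, user_emb=emb, top_k=5)
```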