Theses on the topic "Traitement des vidéos faciales"
Consult the 50 best theses for your research on the topic "Traitement des vidéos faciales".
Ouzar, Yassine. "Reconnaissance automatique sans contact de l'état affectif de la personne par fusion physio-visuelle à partir de vidéo du visage". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0076.
Human affective state recognition remains a challenging topic due to the complexity of emotions, which involve experiential, behavioral, and physiological elements. Since it is difficult to comprehensively describe emotion through a single modality, recent studies have focused on fusion strategies that exploit the complementarity of multimodal signals using artificial intelligence approaches. The main objective is to study the feasibility of physio-visual fusion for recognizing a person's affective state (emotions/stress) from facial videos. Fusing facial expressions and physiological signals takes advantage of each modality: facial expressions are easy to acquire and provide an external view of the affective state, while physiological signals improve reliability and address the problem of falsified facial expressions. The research developed in this thesis lies at the intersection of artificial intelligence, affective computing, and biomedical engineering. Our contribution focuses on two points. First, we propose a new end-to-end approach for instantaneous pulse rate estimation directly from facial video recordings using the principle of imaging photoplethysmography (iPPG). This method is based on a deep spatio-temporal network (X-iPPGNet) that learns the iPPG concept from scratch, without incorporating prior knowledge or relying on manual iPPG signal extraction. The second contribution focuses on physio-visual fusion for spontaneous emotion and stress recognition from facial videos. The proposed model includes two pipelines to extract the features of each modality. The physiological pipeline is common to both the emotion and stress recognition systems; it is based on MTTS-CAN, a recent method for estimating the iPPG signal. Two distinct neural models predict the person's emotions and stress from the visual information contained in the video (e.g. facial expressions): a spatio-temporal network combining the Squeeze-Excitation module and the Xception architecture for estimating the emotional state, and a transfer learning approach for estimating the stress level. This approach reduces development effort and overcomes the lack of data. A fusion of physiological and facial features is then performed to predict the emotional or stress state.
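The first contribution above estimates pulse rate from facial video via imaging photoplethysmography. X-iPPGNet learns this end-to-end; as a point of reference, the classical signal-processing baseline it departs from can be sketched in a few lines (a hypothetical `pulse_rate_ippg` helper, not the thesis's network): average the green channel over the face crop per frame, detrend, and pick the dominant frequency within a plausible pulse band.

```python
import numpy as np

def pulse_rate_ippg(frames, fps, min_bpm=40, max_bpm=180):
    """Naive iPPG baseline: mean green-channel intensity per frame,
    detrended, then dominant FFT frequency within a plausible pulse band.
    `frames` is an array of shape (T, H, W, 3) of RGB face crops."""
    green = frames[..., 1].reshape(len(frames), -1).mean(axis=1)
    green = green - green.mean()                      # remove the DC component
    spectrum = np.abs(np.fft.rfft(green))
    freqs = np.fft.rfftfreq(len(green), d=1.0 / fps)  # in Hz
    band = (freqs >= min_bpm / 60.0) & (freqs <= max_bpm / 60.0)
    peak = freqs[band][np.argmax(spectrum[band])]
    return peak * 60.0                                # beats per minute

# Synthetic check: a 1.2 Hz (72 bpm) intensity oscillation in the green channel
fps, T = 30, 300
t = np.arange(T) / fps
frames = np.zeros((T, 8, 8, 3))
frames[..., 1] = 100 + 5 * np.sin(2 * np.pi * 1.2 * t)[:, None, None]
print(round(pulse_rate_ippg(frames, fps)))  # → 72
```

Real face videos would of course need detection, cropping and robust detrending first; the deep network in the thesis learns all of this jointly.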
Guerrero, Isabelle. "Évaluation économique du protocole de traitement des fentes faciales". Montpellier 1, 1986. http://www.theses.fr/1986MON10053.
Cleft lip and palate treatment may be considered a good whose economic value depends on its ability to satisfy a need and on the efficiency of the unit where it is produced: the hospital. From the research carried out at the regional hospital of Montpellier on 166 children treated for clefts, it appears that the clinical production is adapted to the need for treatment. Nevertheless, the hospital as a whole does not seem to function in the best economic way. The results obtained do not confirm the case-mix analysis by which the cost by D.R.G. should be used as the new basis of hospital tariffs.
Precioso, Frédéric. "Contours actifs paramétriques pour la segmentation d'images et vidéos". Nice, 2004. http://www.theses.fr/2004NICE4078.
Active contour modelling is the main framework of this thesis. Active contours are dynamic methods applied to the segmentation of still images and video, with the goal of extracting regions corresponding to semantic objects. Image and video segmentation can be cast in a minimization framework by choosing a criterion which includes region and boundary functionals; the minimization is achieved through the propagation of a region-based active contour. The efficiency of these methods lies in their robustness and accuracy. The aim of this thesis is threefold: (i) to develop a parametric curve model providing a smooth active contour, (ii) to specify conditions for the stable evolution of such curves, and (iii) to reduce the computational cost of the algorithm in order to provide an efficient solution for real-time applications. We mainly consider constraints on contour regularity, which provide better robustness to noisy data. Within the active contour framework, we focus on the stability of the propagation force, on handling topology changes, and on convergence conditions. We chose cubic spline curves: such curves have strong regularity properties, allow exact computation of the analytic expressions involved in the functional, and greatly reduce the computational cost. Furthermore, we extended the well-known model based on interpolating splines to an approximating model based on smoothing splines. The latter converts the interpolation error into increased smoothness, i.e. a smaller second-derivative energy. The flexibility of this new model provides a tunable balance between accuracy and robustness. The efficiency of implementing such spline-based parametric active contour models is illustrated on several segmentation applications.
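The approximating spline model discussed above trades interpolation error for smoothness. A minimal sketch of the underlying idea, a closed uniform cubic B-spline contour that approximates rather than interpolates its control polygon (illustrative code, not the thesis's implementation):

```python
import numpy as np

def cubic_bspline_contour(control_pts, samples_per_span=10):
    """Closed uniform cubic B-spline: approximates (rather than
    interpolates) its control polygon, trading fidelity for smoothness."""
    # Standard uniform cubic B-spline basis matrix
    M = np.array([[-1,  3, -3, 1],
                  [ 3, -6,  3, 0],
                  [-3,  0,  3, 0],
                  [ 1,  4,  1, 0]]) / 6.0
    n = len(control_pts)
    t = np.linspace(0, 1, samples_per_span, endpoint=False)
    T = np.stack([t**3, t**2, t, np.ones_like(t)], axis=1)  # (s, 4)
    curve = []
    for i in range(n):  # one span per control point (closed curve)
        P = control_pts[[(i - 1) % n, i, (i + 1) % n, (i + 2) % n]]
        curve.append(T @ M @ P)
    return np.concatenate(curve)

# A square control polygon yields a smooth, rounded, closed contour
square = np.array([[0, 0], [2, 0], [2, 2], [0, 2]], dtype=float)
pts = cubic_bspline_contour(square)
```

The smoothing splines of the thesis generalize this by tuning how strongly the curve is pulled toward the data, rather than fixing it by the basis alone.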
Francis, Danny. "Représentations sémantiques d'images et de vidéos". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS605.
Recent research in Deep Learning has sent the quality of results in multimedia tasks rocketing: thanks to new big datasets of annotated images and videos, Deep Neural Networks (DNN) have outperformed other models in most cases. In this thesis, we aim at developing DNN models for automatically deriving semantic representations of images and videos. In particular, we focus on two main tasks: vision-text matching and image/video automatic captioning. The matching task can be addressed by comparing visual objects and texts in a visual space, a textual space or a multimodal space. Based on recent works on capsule networks, we define two novel models to address the vision-text matching problem: Recurrent Capsule Networks and Gated Recurrent Capsules. In image and video captioning, we tackle a challenging task where a visual object has to be analyzed and translated into a textual description in natural language. For that purpose, we propose two novel curriculum learning methods. Moreover, regarding video captioning, analyzing videos requires not only parsing still images, but also drawing correspondences through time. We propose a novel Learned Spatio-Temporal Adaptive Pooling method for video captioning that combines spatial and temporal analysis. Extensive experiments on standard datasets assess the interest of our models and methods with respect to existing works.
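Vision-text matching in a shared space, as addressed above, reduces at inference time to comparing embeddings. A toy sketch with the encoders abstracted away (random vectors stand in for the capsule-network outputs; none of this is the thesis's code):

```python
import numpy as np

def match_scores(image_embs, text_embs):
    """Vision-text matching in a shared multimodal space: cosine
    similarity between L2-normalised image and text embeddings."""
    I = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    T = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return I @ T.T  # (n_images, n_texts) similarity matrix

# Toy check: each image should retrieve its own (noisy) caption embedding
rng = np.random.default_rng(0)
images = rng.standard_normal((5, 64))
texts = images + 0.1 * rng.standard_normal((5, 64))  # matched captions, perturbed
S = match_scores(images, texts)
print((S.argmax(axis=1) == np.arange(5)).all())  # → True
```

Training the encoders so that matched pairs land close together (e.g. with ranking losses) is where the thesis's contribution lies; the retrieval step itself stays this simple.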
Hugard, Daniel. "Prévention et traitement des lésions maxillo-faciales dues aux radiations ionisantes". Montpellier 1, 1988. http://www.theses.fr/1988MON11001.
Khalid, Musaab. "Analyse de vidéos de cours d'eau pour l'estimation de la vitesse surfacique". Thesis, Rennes 1, 2018. http://www.theses.fr/2018REN1S019/document.
This thesis applies computer vision findings to river velocimetry research. Hydraulic research scientists already use various image processing techniques to process image sequences of rivers, with the ultimate goal of estimating the free-surface velocity of rivers remotely; many risks related to intrusive river gauging techniques could thereby be avoided. Towards this goal, two major issues need to be addressed. Firstly, the motion of the river in image space needs to be estimated. The second issue is how to transform this image velocity into real-world velocity. Until recently, image-based velocimetry methods imposed many requirements on images and still needed a considerable amount of field work to estimate river velocity with good accuracy. We extend the perimeter of this field by including amateur videos of rivers and provide better solutions for the aforementioned issues. We propose a motion estimation model based on so-called optical flow, a well-developed method for rigid motion estimation in image sequences. Contrary to the conventional techniques used before, the optical flow formulation is flexible enough to incorporate the physics equations that govern river motion. Our optical flow is based on the scalar transport equation and is augmented with a weighted diffusion term to compensate for small-scale (non-captured) contributions. Additionally, since there is no ground truth data for this type of image sequence, we present a new evaluation method to assess the results. It is based on the trajectory reconstruction of a few Lagrangian particles of interest and a direct comparison against their manually reconstructed trajectories. The new motion estimation technique outperformed traditional methods in image space. Finally, we propose a specialized geometric modeling of river sites that allows a complete and accurate passage from 2D velocity to world velocity under mild assumptions. This modeling considerably reduces the field work previously needed to deploy Ground Reference Points (GRPs). We show the results of two case studies in which world velocity is estimated from raw videos.
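As background for the transport-based optical flow described above, the classical Horn–Schunck formulation, brightness constancy plus a quadratic smoothness (diffusion) term, can be sketched as follows. The thesis replaces brightness constancy with a scalar transport equation, which this minimal sketch does not implement:

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, n_iter=100):
    """Minimal Horn–Schunck optical flow: brightness constancy plus a
    quadratic smoothness term weighted by alpha, solved by the classic
    Jacobi-style fixed-point iteration on the flow field (u, v)."""
    Ix = np.gradient(I1, axis=1)          # spatial image gradients
    Iy = np.gradient(I1, axis=0)
    It = I2 - I1                          # temporal gradient
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    avg = lambda f: (np.roll(f, 1, 0) + np.roll(f, -1, 0) +
                     np.roll(f, 1, 1) + np.roll(f, -1, 1)) / 4.0
    for _ in range(n_iter):
        u_bar, v_bar = avg(u), avg(v)
        num = Ix * u_bar + Iy * v_bar + It
        den = alpha**2 + Ix**2 + Iy**2
        u = u_bar - Ix * num / den
        v = v_bar - Iy * num / den
    return u, v

# A horizontal intensity ramp translated right by one pixel: u ≈ 1, v ≈ 0
x = np.tile(np.arange(32, dtype=float), (32, 1))
u, v = horn_schunck(x, x - 1.0, alpha=0.5)
```

On real river imagery the data term would be swapped for the transport equation with diffusion, but the alternating data/smoothness structure of the solver stays the same.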
Kijak, Ewa. "Structuration multimodale des vidéos de sports par modèles stochastiques". Rennes 1, 2003. https://tel.archives-ouvertes.fr/tel-00532944.
Naturel, Xavier. "Structuration automatique de flux vidéos de télévision". Phd thesis, Université Rennes 1, 2007. http://tel.archives-ouvertes.fr/tel-00524584.
Lefebvre-Albaret, François. "Traitement automatique de vidéos en LSF : modélisation et exploitation des contraintes phonologiques du mouvement". Phd thesis, Université Paul Sabatier - Toulouse III, 2010. http://tel.archives-ouvertes.fr/tel-00608768.
Denoulet, Julien. "Architectures massivement parallèles de systèmes sur circuits (SoC) pour le traitement de flux vidéos". Paris 11, 2004. http://www.theses.fr/2004PA112223.
This thesis describes the evolution of the associative mesh, a massively parallel SIMD architecture dedicated to image processing. This design is drawn from a theoretical model called associative nets, which implements a large number of image processing algorithms in an efficient way. In the prospect of a system-on-chip (SoC) implementation of the associative mesh, this study presents the various possibilities of evolution for this architecture and evaluates their consequences in terms of hardware costs and algorithmic performance. We show that a reorganisation of the structure based on the virtualisation of its elementary processors reduces the design's area in substantial proportions and opens new prospects in terms of calculation and memory management. Using an evaluation environment based on a programming library of associative nets and a parameterized description of the architecture in the SystemC language, we show that a virtualised associative mesh achieves real-time processing for a great number of algorithms: low-level operations such as convolution filters, statistical algorithms and mathematical morphology, as well as more complex treatments such as split-and-merge segmentation, watershed segmentation, and motion detection using Markovian relaxation.
Dellandréa, Emmanuel. "Analyse de signaux vidéos et sonores : application à l'étude de signaux médicaux". Tours, 2003. http://www.theses.fr/2003TOUR4031.
This work deals with the study of multimedia sequences containing images and sounds. The analysis of image sequences consists in tracking moving objects in order to study their properties; the investigations aim to enable the understanding of sounds when correlated with events in the image sequence. One generic method, based on combined region and contour tracking, and one method adapted to homogeneous objects, based on level-set theory, are proposed. The analysis of audio data consists in the development of an identification system based on the study of the structure of signals through their coding and Zipf-law modeling. These methods have been evaluated on medical sequences within the framework of a gastro-oesophageal reflux pathology study, in collaboration with the Acoustique et Motricité Digestive research team of the University of Tours.
Boltz, Sylvain. "Un cadre statistique en traitement d'images et vidéos par approche variationnelle avec modélisation haute dimension". Phd thesis, Université de Nice Sophia-Antipolis, 2008. http://tel.archives-ouvertes.fr/tel-00507488.
Lu, Hua. "Video Analysis for Micro-Expression Spotting and Recognition". Thesis, Rennes, INSA, 2018. http://www.theses.fr/2018ISAR0005/document.
In recent years, there has been increasing interest in computer vision in automatic facial micro-expression algorithms, driven by applications in high-stakes contexts such as criminal investigations, airport and mass transit checkpoints, counter-terrorism, and so on. Micro-expression approaches in computer vision consist of detecting and classifying micro-expressions in videos. Compared to a macro-expression, a micro-expression involves a rapid change lasting less than half a second; moreover, its subtle appearance in part of the face makes detection and recognition difficult to achieve. Effective facial features play a crucial role in micro-expression analysis. This thesis focuses on feature extraction, developing various feature extraction methods for micro-expression detection and recognition tasks. The detection of micro-expressions is the first step of their analysis, and this thesis aims to spot micro-expressions in videos. Existing detection methods based on features such as local binary patterns, the histogram of gradients or the optical flow are computationally expensive, which hinders real-time implementation. Thus, a spotting method based on integral projection is proposed to address this problem. However, all the above features are extracted from cropped faces, which usually causes residual mis-registration between images. To deal with this issue, another detection method based on geometrical features is proposed: it involves the geometrical distances between facial key-points without the need to crop the face. This captures subtle geometric displacements along sequences and proves suitable for facial analysis tasks that require high computational speed. For micro-expression recognition, motion features based on the optical flow have advantages over the existing recognition features in characterizing subtle movements of the face. It remains difficult, however, for optical flow to determine accurate mappings of each facial feature between different images, even when the face images have been aligned; this may give rise to wrong orientation and magnitude estimates in the optical flow field. To address this problem, motion boundary histograms are considered: they remove unexpected motions caused by residual mis-registration between images cropped from different frames, while relative motion is still captured. Based on the motion boundary, a new descriptor, the fusion motion boundary histograms, is introduced. This feature is generated by combining both the horizontal and the vertical components of the differential of the optical flow, as inspired by the motion boundary histograms. The main contributions of this thesis lie in the study of features for micro-expression spotting and recognition. Experiments on micro-expression databases show the effectiveness of the presented contributions.
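The integral-projection spotting idea above can be illustrated with a toy sketch: compute row/column intensity projections per frame and score each frame by the change in those profiles. Helper names are hypothetical and the thesis's actual method is more elaborate:

```python
import numpy as np

def integral_projections(frame):
    """Row and column intensity sums of a grayscale face image: a cheap
    1-D summary of where brightness sits in the frame."""
    return frame.sum(axis=1), frame.sum(axis=0)  # horizontal, vertical

def spotting_score(frames):
    """Per-frame change score: L1 distance between consecutive
    projection profiles.  Sudden peaks flag rapid facial movements."""
    scores = [0.0]
    prev_h, prev_v = integral_projections(frames[0])
    for f in frames[1:]:
        h, v = integral_projections(f)
        scores.append(np.abs(h - prev_h).sum() + np.abs(v - prev_v).sum())
        prev_h, prev_v = h, v
    return np.array(scores)

# A brief local brightening (frames 5-6) peaks the score at its onset
frames = np.zeros((10, 16, 16))
frames[5:7, 4:8, 4:8] = 1.0
s = spotting_score(frames)
print(int(np.argmax(s)))  # → 5
```

Because each frame is reduced to two short vectors before comparison, this kind of score runs comfortably in real time, which is the motivation stated in the abstract.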
Barland, Rémi. "Évaluation objective sans référence de la qualité perçue : applications aux images et vidéos compressées". Nantes, 2007. http://www.theses.fr/2007NANT2028.
The conversion to all-digital and the development of multimedia communications produce an ever-increasing flow of information, and this massive increase in the quantity of data exchanged generates a progressive saturation of the transmission networks. To deal with this situation, compression standards seek to exploit more and more the spatial and/or temporal correlation to reduce the bit rate. The resulting reduction of information creates visual artefacts which can degrade the visual content of the scene and thus disturb the end-user. In order to propose the best broadcasting service, assessment of the perceived quality is necessary. Subjective tests, the reference method for quantifying the perception of distortions, are expensive, difficult to implement and remain inappropriate for on-line quality assessment. In this thesis, we are interested in the most widely used compression standards (image or video) and have designed no-reference quality metrics based on the most annoying visual artefacts, such as the blocking, blurring and ringing effects. The proposed approach is modular and adapts to the considered coder and to the required ratio between computational cost and performance. For low complexity, the metric quantifies the distortions specific to the considered coder, exploiting only the properties of the image signal. To improve performance, at the cost of some complexity, the metric additionally integrates cognitive models simulating the mechanisms of visual attention; the generated saliency maps are then used to refine the distortion measures based purely on the image signal.
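A minimal no-reference blockiness measure in the spirit described above (an illustrative sketch, not the thesis's metric) compares luminance jumps across 8x8 block boundaries with jumps inside blocks:

```python
import numpy as np

def blockiness(img, block=8):
    """Mean absolute luminance jump across block boundaries minus the
    mean jump inside blocks.  A clearly positive value suggests visible
    blocking artefacts; zero or negative suggests none."""
    dh = np.abs(np.diff(img, axis=1))           # horizontal neighbour differences
    cols = np.arange(dh.shape[1])
    on_boundary = (cols % block) == block - 1   # differences straddling a boundary
    return dh[:, on_boundary].mean() - dh[:, ~on_boundary].mean()

# A flat-per-block image (constant 8x8 tiles) is maximally blocky,
# while a smooth horizontal ramp is not blocky at all.
rng = np.random.default_rng(0)
levels = rng.integers(0, 255, size=(4, 4)).astype(float)
img = np.kron(levels, np.ones((8, 8)))          # 32x32, constant 8x8 blocks
smooth = np.tile(np.linspace(0, 255, 32), (32, 1))
print(blockiness(img) > blockiness(smooth))  # → True
```

A full metric would also look at vertical boundaries and weight the result perceptually, e.g. by the saliency maps the abstract mentions.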
Precioso, Frédéric. "Contours actifs paramétriques pour la segmentationd'images et vidéos". Phd thesis, Université de Nice Sophia-Antipolis, 2004. http://tel.archives-ouvertes.fr/tel-00327411.
Soladié, Catherine. "Représentation Invariante des Expressions Faciales". Phd thesis, Université Rennes 1, 2013. http://tel.archives-ouvertes.fr/tel-00935973.
Liozon, Patrick. "Une nouvelle technique de traitement chirurgical des paralysies faciales : la retension du muscle de Horner". Limoges, 1989. http://www.theses.fr/1989LIMO0185.
Chan-Hon-Tong, Adrien. "Segmentation supervisée d'actions à partir de primitives haut niveau dans des flux vidéos". Thesis, Paris 6, 2014. http://www.theses.fr/2014PA066226/document.
This thesis focuses on the supervised segmentation of video streams within the application context of daily action recognition. A segmentation algorithm is obtained from the Implicit Shape Model by optimising the votes of this polling method. We prove that this optimisation can be linked to the sliding-window-plus-SVM framework; more precisely, it is equivalent to standard training with an added temporal constraint, or with the data encoded through a dense pyramidal decomposition. This algorithm is evaluated on a public segmentation database, where it outperforms other Implicit-Shape-Model-like methods and the standard linear SVM. The algorithm is then integrated into an action segmentation system. Specific features are extracted from skeletons obtained from the video by standard software; these features are then clustered and given to the polling method. This system, combining our features and our algorithm, obtains the best published performance on a human daily action segmentation dataset.
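The vote optimisation above starts from Implicit-Shape-Model-style polling. A toy temporal version (illustrative, not the thesis's optimised algorithm): each local feature casts weighted votes for the centre of the action that produced it, and accumulator peaks are candidate action centres.

```python
import numpy as np

def vote_action_centers(feature_times, offsets, weights, length):
    """Temporal Implicit-Shape-Model-style voting: each local feature
    occurrence casts weighted votes for the centre of the action it was
    learned from (offsets are relative times).  Peaks of the accumulator
    are candidate action centres."""
    acc = np.zeros(length)
    for t in feature_times:
        for off, w in zip(offsets, weights):
            c = t + off
            if 0 <= c < length:
                acc[c] += w
    return acc

# Features observed at t = 9, 10, 11, each voting for a centre at offset -1/0/+1
acc = vote_action_centers([9, 10, 11], offsets=[-1, 0, 1],
                          weights=[0.25, 0.5, 0.25], length=20)
print(int(np.argmax(acc)))  # → 10
```

The thesis's contribution is precisely to learn these vote weights discriminatively (linking them to an SVM objective) rather than fixing them by hand as done here.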
Brangoulo, Sébastien. "Codage d'images fixes et de vidéos par ondelette de seconde génération : théorie et applications". Rennes 1, 2005. http://www.theses.fr/2005REN1S003.
Léonard, Isabelle. "Reconnaissance des objets manufacturés dans des vidéos sous-marines". Phd thesis, Université de Bretagne occidentale - Brest, 2012. http://tel.archives-ouvertes.fr/tel-00780647.
Hervieu, Alexandre. "Analyse de trajectoires vidéos à l'aide de modélisations markoviennes pour l'interprétation de contenus". Rennes 1, 2009. ftp://ftp.irisa.fr/techreports/theses/2009/hervieu.pdf.
This thesis deals with the use of trajectories extracted from videos. The approach is invariant to translation, rotation and scaling, and takes into account both shape- and dynamics-related information on the trajectories. A hidden Markov model (HMM) is proposed to handle missing observations, and its parameters are properly estimated. A similarity measure between HMMs is used to tackle three dynamic video content understanding tasks: recognition, clustering and detection of unexpected events. Hierarchical semi-Markov chains are developed to process interacting trajectories; the interactions between trajectories are taken into account to recognize activity phases. Our method has been evaluated on sets of trajectories extracted from squash and handball videos. Such interaction-based models have also been extended to 3D gesture and action recognition and clustering. The results show that taking the interactions into account may be of great interest for such applications.
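The HMM similarity machinery above builds on sequence likelihoods. A sketch of the discrete-observation forward algorithm with per-step scaling for numerical stability (the thesis works with continuous trajectory features rather than discrete symbols, so this is background only):

```python
import numpy as np

def forward_loglik(obs, pi, A, B):
    """Log-likelihood of a discrete observation sequence under an HMM
    with initial distribution pi, transition matrix A and emission
    matrix B, computed with the scaled forward algorithm."""
    alpha = pi * B[:, obs[0]]
    ll = np.log(alpha.sum())
    alpha /= alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]   # predict then weight by emission
        ll += np.log(alpha.sum())
        alpha /= alpha.sum()            # rescale to avoid underflow
    return ll

# Two-state HMM: state 0 mostly emits symbol 0, state 1 mostly emits symbol 1
pi = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1], [0.1, 0.9]])
B = np.array([[0.9, 0.1], [0.1, 0.9]])
typical = [0, 0, 0, 1, 1, 1]    # slow switching, as the model expects
atypical = [0, 1, 0, 1, 0, 1]   # rapid switching, poorly explained
print(forward_loglik(typical, pi, A, B) > forward_loglik(atypical, pi, A, B))  # → True
```

Cross log-likelihoods of this kind (score trajectory X under model Y and vice versa) are one standard way to build the HMM-to-HMM similarity measures the abstract mentions.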
Mahboubi, Amal Kheira. "Méthodes d'extraction, de suivi temporel et de caractérisation des objets dans les vidéos basées sur des modèles polygonaux et triangulés". Nantes, 2003. http://www.theses.fr/2003NANT2036.
Landais, Rémi. "Compréhension de systèmes d'extraction d'objets dans la vidéo sous l'angle de l'adaptation". Lyon, INSA, 2006. http://theses.insa-lyon.fr/publication/2006ISAL0019/these.pdf.
At the French Institut National de l'Audiovisuel, extracting meaningful objects, such as texts or faces, from video streams is a task of great importance for automating the documentation process. These objects may take many different forms, and such variations require adapting extraction systems to maintain their performance across different documents. This PhD presents an autonomous adaptation methodology for these systems: it does not require the acquisition of any expert knowledge concerning the functioning of the system. The methodology is based on the fusion of two analyses: the first extracts the different categories of performance obtained by the system and, especially, the different types of errors it produces; the second, called "diagnosis of responsibility", aims at determining automatically which module of the system is responsible for each error category, in order to tune its parameters. Experiments were carried out on the text object.
Ravaut, Frédéric. "Analyse automatique des manifestations faciales cliniques par techniques de traitement d'images : application aux manifestations de l'épilepsie". Paris 5, 1999. http://www.theses.fr/1999PA05S027.
Dollion, Nicolas. "Le traitement des expressions faciales au cours de la première année : développement et rôle de l'olfaction". Thesis, Dijon, 2015. http://www.theses.fr/2015DIJOS085/document.
The first year of life is critical for the development of the ability to process facial expressions. Olfaction and expressions are strongly linked to each other, and it is well known that infants are able to integrate their environment multisensorially as early as birth. However, most of the studies interested in the multisensory processing of facial expressions are restricted to the investigation of audio-visual interactions. In this thesis, we first aimed to resolve different issues concerning the ontogenesis of infants' ability to process facial expressions. Our results specify the development of visual exploratory strategies of facial emotions along the first year of life, and demonstrate a progressive distinction of expressions according to their emotional meaning. Using EEG, we were also able to specify the nature and the time course of facial expression distinction in 3-month-old infants. The second objective of our studies was to expand the knowledge concerning the multisensory processing of facial expressions, and more specifically the influence of olfacto-visual interactions on this processing. Our event-related potential experiments specify the time course of the cerebral integration of olfaction in the visual processing of emotional faces in adults, and demonstrate that similar interactions are present in infants as young as 3 months. We also demonstrate that at 7 months of age, odors trigger the search for specific facial expressions. Our results suggest that olfaction might contribute to the development of infants' ability to process facially displayed emotions.
Baccouche, Moez. "Apprentissage neuronal de caractéristiques spatio-temporelles pour la classification automatique de séquences vidéo". Phd thesis, INSA de Lyon, 2013. http://tel.archives-ouvertes.fr/tel-00932662.
Guilmart, Christophe. "Filtrage de segments informatifs dans des vidéos". Phd thesis, École normale supérieure de Cachan - ENS Cachan, 2011. http://tel.archives-ouvertes.fr/tel-00668307.
Chan Wai Tim, Stefen. "Apprentissage supervisé d'une représentation multi-couches à base de dictionnaires pour la classification d'images et de vidéos". Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAT089/document.
In recent years, numerous works have been published on dictionary learning and sparse coding. They were initially used in image reconstruction and restoration tasks. Recently, researchers became interested in the use of dictionaries for classification tasks because of their capability to represent underlying patterns in images. Good results have been obtained under specific conditions: centered objects of interest, homogeneous sizes and points of view; without these constraints, performance drops. In this thesis, we are interested in finding good dictionaries for classification. The learning methods classically used for dictionaries rely on unsupervised learning; here, we study how to perform supervised dictionary learning. To push performance further, we introduce a multilayer architecture for dictionaries. The proposed architecture is based on the local description of an input image and its transformation through a succession of encoding and processing steps, and outputs a vector of features effective for classification. The learning method we developed is based on the backpropagation algorithm, which allows a joint learning of the different dictionaries and an optimization solely with respect to the classification cost. The proposed architecture has been tested on the MNIST, CIFAR-10 and STL-10 datasets, with good results compared to other dictionary-based methods. The proposed architecture can be extended to video analysis.
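Sparse coding over a dictionary, the building block of the architecture above, can be sketched with ISTA for a fixed dictionary; in the thesis the dictionaries themselves are then learned by backpropagating the classification cost, which this sketch omits:

```python
import numpy as np

def ista_sparse_code(D, x, lam=0.1, n_iter=200):
    """Sparse coding of signal x over dictionary D (columns = atoms)
    by ISTA: minimises 0.5 * ||x - D a||^2 + lam * ||a||_1 with
    gradient steps followed by soft thresholding."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        grad = D.T @ (D @ a - x)
        a = a - grad / L
        a = np.sign(a) * np.maximum(np.abs(a) - lam / L, 0.0)  # soft threshold
    return a

# x is exactly 2 * atom_0; the code should concentrate on that single atom
D = np.eye(4)[:, :3]          # trivial 4x3 dictionary with orthonormal atoms
x = 2.0 * D[:, 0]
a = ista_sparse_code(D, x)
```

With the l1 penalty, the recovered coefficient is shrunk slightly below 2 (to 1.9 with these settings), which is the usual bias of soft thresholding; classification then operates on such sparse codes.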
Matta, Federico. "Video person recognition strategies using head motion and facial appearance". Nice, 2008. http://www.theses.fr/2008NICE4038.
In this thesis we mainly explore the use of the temporal information of video sequences for person and gender recognition; in particular, we focus on the analysis of head and facial motion and their potential use as biometric identifiers. We also seek to exploit most of the information contained in the video for automatic recognition; more precisely, we examine how to integrate, in a multimodal biometric system, head and mouth motion information with facial appearance, and we study the extraction of new spatio-temporal features for face recognition. We first present a person recognition system that exploits spontaneous head motion. This information is extracted by tracking, in the image plane, a set of characteristic facial landmarks. In particular, we detail how the face is first detected semi-automatically in each video sequence, and how selected facial features are then tracked automatically over time using a template-matching approach. We then describe the geometric normalisation of the obtained signals, the computation of the feature vectors, and the way they are used to estimate client models, approximated by Gaussian mixture models. Finally, we identify and verify the person's identity by applying probability theory and the Bayesian decision rule (Bayesian inference).
We then propose a multimodal extension of our person recognition system; more precisely, within a unified probabilistic framework we integrate head motion information with mouth motion and facial appearance. We develop a new temporal subsystem with an extended feature space, enriched with additional parameters describing mouth motion; at the same time we introduce a complementary spatial subsystem, based on a probabilistic extension of the original Eigenfaces approach. An integration step then combines the similarity scores of the two parallel subsystems through an appropriate opinion-fusion strategy. Finally, we investigate a practical method for extracting new spatio-temporal facial features from video sequences, with the aim of distinguishing a person's identity and gender. To this end we develop a recognition system called tomofaces, which applies the video tomography technique to summarise, in a single image, the information related to the motion and appearance of a person's face. We then detail the linear projection from the X-ray image space to a low-dimensional feature space, the estimation of user models by computing the representatives of the corresponding clusters, and the recognition of identity and gender by a nearest-neighbour classifier using subspace distances.
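The GMM-plus-Bayesian-decision pipeline described above can be illustrated with a toy identification step: score head-motion feature vectors under each client's Gaussian mixture and pick the most likely client (diagonal covariances, equal priors; helper names are illustrative, not the thesis's code):

```python
import numpy as np

def gmm_loglik(X, w, mu, var):
    """Total log-likelihood of the rows of X under a diagonal-covariance
    Gaussian mixture (weights w, means mu, variances var)."""
    d = X[:, None, :] - mu[None, :, :]                   # (n, k, dim)
    comp = (np.log(w)[None, :]
            - 0.5 * np.sum(np.log(2 * np.pi * var) + d**2 / var, axis=2))
    m = comp.max(axis=1, keepdims=True)                  # log-sum-exp trick
    return float(np.sum(m[:, 0] + np.log(np.exp(comp - m).sum(axis=1))))

def identify(X, client_models):
    """Bayesian identification with equal priors: pick the client whose
    GMM gives the observed motion features the highest likelihood."""
    return int(np.argmax([gmm_loglik(X, *m) for m in client_models]))

# Two single-component "clients" with different mean head-motion features
client_a = (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]]))
client_b = (np.array([1.0]), np.array([[3.0, 3.0]]), np.array([[1.0, 1.0]]))
X = np.array([[2.8, 3.1], [3.2, 2.9]])   # observations near client_b's mean
print(identify(X, [client_a, client_b]))  # → 1
```

Verification works the same way, except the client's score is compared against a threshold (or a background model) instead of against the other clients.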
In questa tesi di dottorato esploriamo la possibilità di riconoscere l'identità e il sesso di una persona attraverso l'utilizzo dell'informazione temporale disponibile in alcune sequenze video, in particolare ci concentriamo sull'analisi del movimento della testa e del viso, nonché del loro potenziale utilizzo come identificatiori biometrici. Esaminiamo inoltre la problematica relativa al fatto di sfruttare la maggior parte dell'informazione presente nei video per effettuare il riconoscimento automatico della persona; più precisamente, analizziamo la possibilità di integrare in un sistema biometrico multimodale l'informazione relativa al movimento della testa e della bocca con quella dell'aspetto del viso, e studiamo il calcolo di nuovi parametri spazio-temporali che siano utilizzabili per il riconoscimento stesso. In primo luogo presentiamo un sistema di riconoscimento biometrico della persona che sfrutti l'informazione legata al movimento naturale della testa, il quale è estratto seguendo la posizione nel piano immagine di alcuni elementi caratteristici del viso. In particolare descriviamo come in una sequenza video il volto venga dapprima individuato semiautomaticamente, e come poi alcuni suoi elementi caratteristici siano localizzati nel tempo tramite un algoritmo automatico di messa in corrispondenza di modelli (template matching) permettendo di seguirne la posizione. Spieghiamo quindi le normalizzazioni geometriche dei segnali che abbiamo ricavato, il calcolo dei vettori caratteristici, ed il modo in cui questi sono utilizzati per stimare i modelli degli utilizzatori, approssimandoli tramite delle misture di distribuzioni gaussiane (Gaussian mixture models). Alla fine otteniamo l'identificazione e la verifica dell'identità della persona applicando la teoria delle probabilità e la regola di decisione o inferenza bayesiana. 
Yao, Xu. "Latent representations for facial images and video editing". Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAT019.
Full text
Learning to edit facial images and videos is one of the most popular tasks in both academic and industrial research. This thesis addresses the problem of face editing for the special case of high-resolution images and videos. We develop deep learning-based methods to perform facial image editing; specifically, we explore the task using the latent representations obtained from two types of deep neural networks: autoencoder-based models and generative adversarial networks (GANs). For each type of method, we consider a specific image editing problem and propose an effective solution that outperforms the state of the art. The thesis contains two parts. In part I, we explore image editing tasks via the latent space of autoencoders. We first consider the style transfer task between photos and propose an effective algorithm built on a pair of autoencoder-based networks. Second, we study the face age editing task for high-resolution images, using an encoder-decoder architecture. The proposed network encodes a face image to age-invariant feature representations and learns a modulation vector corresponding to a target age. Our approach allows for fine-grained age editing on high-resolution images in a single unified model. In part II, we explore the editing task via the latent space of generative adversarial models. First, we consider the problem of disentangled facial attribute editing on synthetic and real images, proposing a latent transformation network that acts in the latent space of a pre-trained GAN model. We also propose a video manipulation pipeline to generalize the editing results to videos. Second, we investigate the problem of GAN inversion, the projection of a real image into the latent space of a pretrained GAN. In particular, we propose a feed-forward encoder, which encodes a given image to a feature code and a latent code in one pass.
The proposed encoder is shown to be more accurate and stable for image and video inversion, while maintaining good editing capacities
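The simplest form of latent-space editing, shifting a code along a fixed attribute direction, gives the flavour of what a latent transformation network learns. The direction and strength here are placeholders; the thesis's network predicts an input-dependent transformation rather than applying a fixed linear step.

```python
def normalize(v):
    """Scale a vector to unit length."""
    norm = sum(x * x for x in v) ** 0.5
    return [x / norm for x in v]

def edit_latent(z, direction, strength):
    """Shift a latent code along a unit attribute direction: z' = z + strength * d.

    A learned latent transformation network replaces this fixed linear step
    with one conditioned on the input code itself."""
    d = normalize(direction)
    return [zi + strength * di for zi, di in zip(z, d)]
```

Moving a code by `strength` along a (hypothetical) "smile" direction and then by `-strength` returns it to its starting point, which is one reason linear edits are a convenient baseline.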
Hudon-ven der Buhs, Isabelle. "Les expressions faciales aptes à susciter un traitement favorable de la part d'autrui au sein de différentes relations interpersonnelles". Thesis, Université d'Ottawa / University of Ottawa, 2016. http://hdl.handle.net/10393/34490.
Full text
Gastaud, Muriel. "Modèles de contours actifs pour la segmentation d'images et de vidéos". Phd thesis, Université de Nice Sophia-Antipolis, 2005. http://tel.archives-ouvertes.fr/tel-00089384.
Full text
The contribution of this thesis lies in the design and study of several region descriptors. For each criterion, we compute its derivative using shape gradients and derive the corresponding evolution equation of the active contour.
The first descriptor defines a geometric prior without any parametric constraint: it minimizes the distance between the active contour and a reference contour. We applied it to curve deformation, segmentation and target tracking.
The second descriptor characterizes the object's motion by a motion model. The associated criterion jointly defines a region and its motion over several consecutive frames. We applied this criterion to joint motion estimation and segmentation and to the tracking of moving objects.
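A discrete stand-in for the contour-to-reference distance that the first descriptor minimizes can be written as a symmetric mean closest-point distance between two point-sampled contours. This is an illustrative sketch only; the thesis works with continuous contours and shape gradients, not point lists.

```python
def contour_distance(contour_a, contour_b):
    """Symmetric mean closest-point distance between two discrete contours.

    Each contour is a list of (x, y) points.  For every point of one contour
    we take the distance to its nearest point on the other, average, and
    symmetrize by swapping the roles of the two contours."""
    def one_way(src, dst):
        total = 0.0
        for (x, y) in src:
            total += min(((x - u) ** 2 + (y - v) ** 2) ** 0.5 for (u, v) in dst)
        return total / len(src)
    return 0.5 * (one_way(contour_a, contour_b) + one_way(contour_b, contour_a))
```

An active contour driven by such a criterion would be deformed, point by point, so as to decrease this value toward the reference contour.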
Bertolino, Pascal. "Algorithmes pour la segmentation et l'amélioration de la qualité des images et des vidéos". Habilitation à diriger des recherches, Université de Grenoble, 2012. http://tel.archives-ouvertes.fr/tel-00798440.
Full text
Dexter, Émilie. "Modélisation de l'auto-similarité dans les vidéos : applications à la synchronisation de scènes et à la reconnaissance d'actions". Rennes 1, 2009. ftp://ftp.irisa.fr/techreports/theses/2009/dexter.pdf.
Full text
This PhD work deals with action recognition and image sequence synchronization. We propose to compute temporal similarities within image sequences to build self-similarity matrices. Although these matrices are not strictly view-invariant, they remain stable across views, providing temporal descriptors of image sequences that are useful for synchronization as well as discriminative for action recognition. Synchronization is achieved with a dynamic programming algorithm known as Dynamic Time Warping. We opt for "Bag-of-Features" methods for recognizing actions, such that actions are represented either as unordered sets of descriptors or as normalized histograms of quantized descriptor occurrences. Classification is performed by well-known methods such as the Nearest Neighbor classifier or Support Vector Machines. The proposed methods are characterized by their simplicity and flexibility: they do not require point correspondences between views
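The two tools named in the abstract, self-similarity matrices and Dynamic Time Warping, can both be sketched briefly. This is the generic textbook formulation, not the thesis's code; the frame descriptors and distance function are placeholders.

```python
def self_similarity_matrix(seq, dist):
    """Pairwise distance matrix of a sequence of frame descriptors with itself."""
    n = len(seq)
    return [[dist(seq[i], seq[j]) for j in range(n)] for i in range(n)]

def dtw(a, b, dist):
    """Classic Dynamic Time Warping cost between two descriptor sequences.

    Fills the cumulative-cost table; the cell (n, m) is the cost of the best
    monotonic alignment between the two sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

In the thesis's setting, the rows of the self-similarity matrix serve as temporal descriptors, and DTW aligns two such descriptor sequences for synchronization.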
Kornreich, Charles. "Contribution à l'étude du traitement de l'information émotionnelle dans les assuétudes: exemple de la reconnaissance des expressions faciales émotionnelles". Doctoral thesis, Universite Libre de Bruxelles, 2003. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/211264.
Full text
Bourdis, Nicolas. "Détection de changements entre vidéos aériennes avec trajectoires arbitraires". Electronic Thesis or Diss., Paris, ENST, 2013. http://www.theses.fr/2013ENST0028.
Full text
Business activities based on the use of video data have developed at a dazzling speed these last few years: not only has the market for some of these activities expanded widely (video surveillance), but the operational applications have also greatly diversified (natural resources monitoring, intelligence, etc.). However, the volume of generated data has become overwhelming, and the efficiency of these activities is now limited by the cost and time required for human interpretation of this video data. Automatic analysis of video streams has hence become a critical problem for numerous applications. The semi-automatic approach developed in this thesis focuses on the automatic analysis of aerial videos and assists the image analyst by suggesting areas of potential interest identified through change detection. For that purpose, our approach builds a three-dimensional model of the appearances observed in reference videos. This modeling then enables the online detection of significant changes in a new video, by identifying appearance deviations with respect to the reference models. Specific techniques have also been developed to estimate the acquisition parameters and to attenuate illumination effects. Moreover, we developed several consolidation techniques that exploit a priori knowledge about the targeted changes in order to improve detection accuracy. The interest and good performance of our change detection approach have been demonstrated on both real and synthetic data
Herbulot, Ariane. "Mesures statistiques non-paramétriques pour la segmentation d'images et de vidéos et minimisation par contours actifs". Phd thesis, Université de Nice Sophia-Antipolis, 2007. http://tel.archives-ouvertes.fr/tel-00507087.
Full text
Hammal, Zakia. "Segmentation des traits du visage, analyse et reconnaissance d'expressions faciales par le modèle de croyance transférable". Université Joseph Fourier (Grenoble), 2006. http://www.theses.fr/2006GRE10059.
Full text
The aim of this work is the analysis and classification of facial expressions. Experiments in psychology show that humans are able to recognize emotions based on the visualization of the temporal evolution of some characteristic fiducial points. We therefore first propose an automatic system for the extraction of the permanent facial features (eyes, eyebrows and lips). In this work we are interested in the problem of segmenting the eyes and the eyebrows; the segmentation of lip contours is based on previous work developed in the laboratory. The proposed algorithm for eye and eyebrow contour segmentation consists of three steps: first, the definition of parametric models fitting the contour of each feature as accurately as possible; then, the detection of a set of characteristic points to initialize the selected models in the face; finally, the fitting of the initial models by taking into account luminance gradient information. The segmentation of the eye, eyebrow and lip contours leads to what we call skeletons of expressions. To measure the deformation of the characteristic features, five characteristic distances are defined on these skeletons. Based on the states of these distances, a set of logical rules is defined for each of the considered expressions: Smile, Surprise, Disgust, Anger, Fear, Sadness and Neutral. These rules are compatible with the MPEG-4 standard, which provides a description of the deformations undergone by each facial feature during the production of the six universal facial expressions. However, human behavior is not binary: a pure expression is rarely produced. To model doubt between several expressions and to handle unknown expressions, the Transferable Belief Model is used as a fusion process for facial expression classification. The classification system takes into account the evolution of facial feature deformations over time.
Towards an audio-visual system for emotional expression classification, a preliminary study on vocal expressions is also proposed
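A toy version of the distance-state rule table described above might look as follows. The thresholds, distance set and rules are invented for illustration; the thesis uses MPEG-4-compatible rules over five specific distances and fuses them with the Transferable Belief Model rather than returning a single hard label.

```python
def classify_expression(distances, neutral):
    """Toy rule table over characteristic-distance states.

    Each distance is coded against its neutral value as '+' (larger),
    '-' (smaller) or '=' (similar), then matched against hand-written rules.
    """
    def state(d, n, tol=0.1):
        if d > n * (1 + tol):
            return "+"
        if d < n * (1 - tol):
            return "-"
        return "="

    eye_open, brow_lift, mouth_width, mouth_open = (
        state(d, n) for d, n in zip(distances, neutral))
    if mouth_width == "+" and mouth_open in "+=":
        return "Smile"
    if eye_open == "+" and brow_lift == "+" and mouth_open == "+":
        return "Surprise"
    if eye_open == "=" and brow_lift == "=" and mouth_width == "=":
        return "Neutral"
    return "Unknown"
```

A belief-based fusion, as in the thesis, would instead assign mass to subsets of expressions, so that doubt between two labels can be represented explicitly.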
Bourdis, Nicolas. "Détection de changements entre vidéos aériennes avec trajectoires arbitraires". Phd thesis, Telecom ParisTech, 2013. http://tel.archives-ouvertes.fr/tel-00834717.
Full text
Song, Guanghan. "Effet du son dans les vidéos sur la direction du regard : contribution à la modélisation de la saillance audiovisuelle". Phd thesis, Université de Grenoble, 2013. http://tel.archives-ouvertes.fr/tel-00875651.
Texto completoMemmi, Paul Joseph. "Etude sémiolinguistique du sous-titrage pour une écriture concise assistée par ordinateur (ECAO) avec application à l'audiovisuel". Paris 10, 2005. http://www.theses.fr/2005PA100069.
Full text
Intelligentiæ pauca – to intelligence, little (is enough). Through its elliptic form, the pleasure it arouses and the wit it calls for, this phrase praised by Stendhal epitomizes concise writing. This thesis aims at designing a word processor, ÉCAO (écriture concise assistée par ordinateur, computer-assisted concise writing), which, in its audiovisual application, should also find uses for the Internet, subtitled translation and subtitling for the hearing-impaired. A semiolinguistic study of subtitling, an example of concise writing in a verbal and audiovisual environment, leads to a method for referencing and disambiguating the source information and to a set of phrastic concision operators. Some are programmable; others reveal the automaton's deficiencies when faced with constructions of sense that are nonetheless of capital importance. There lies the essential purpose of this research: the study of the cognitive integration of complex communications and of concision as a mode of representation
Souvannavong, Fabrice. "Indexation et recherche de plans videos par le contenu sémantique". Paris, ENST, 2005. http://www.theses.fr/2005ENST0018.
Full text
In this thesis, we address the fuzzy problem of video content indexing and retrieval, and in particular automatic semantic video content indexing. Indexing is the operation that consists in extracting a numerical or textual signature that describes the content in an accurate and concise manner. The objective is to allow an efficient search in a database. The automatic aspect of indexing is important, given the difficulty of annotating video shots in huge databases. Until now, systems have concentrated on the description and indexing of the visual content, with search mainly conducted on the colors and textures of video shots. The new challenge is to automatically add a semantic description of the content to these signatures. First, a range of indexing techniques is presented. Second, we introduce a method to compute an accurate and compact signature from key-frame regions; this method is an adaptation of latent semantic indexing, originally used to index text documents. Third, we address the difficult task of semantic content retrieval. Experiments are led within the framework of TRECVID, which provides a huge amount of videos and their labels. Fourth, we pursue the semantic classification task through the study of fusion mechanisms. Finally, this thesis concludes with the introduction of a new active learning approach to limit the annotation effort
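Before any latent-semantic projection, each shot can be described by a normalized histogram of quantized key-frame regions ("visual words") and ranked by cosine similarity, roughly as sketched below. The function names and toy vocabulary are assumptions for illustration, not the thesis's API; the latent semantic indexing step would additionally project these histograms into a lower-dimensional space via an SVD.

```python
import math

def signature(region_labels, vocab_size):
    """L2-normalized histogram of quantized key-frame regions (visual words)."""
    hist = [0.0] * vocab_size
    for label in region_labels:
        hist[label] += 1.0
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]

def cosine(u, v):
    """Cosine similarity of two already-normalized signatures."""
    return sum(a * b for a, b in zip(u, v))

def retrieve(query, database):
    """Rank (shot_id, signature) pairs by similarity to the query signature."""
    return sorted(database, key=lambda item: -cosine(query, item[1]))
```

Ranking a query signature against a small database returns the shot whose region distribution is closest first.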
Jehan-Besson, Stéphanie. "Modèles de contours actifs basés régions pour la segmentation d'images et de vidéos". Phd thesis, Université de Nice Sophia-Antipolis, 2003. http://tel.archives-ouvertes.fr/tel-00089867.
Full text
We propose to segment regions or objects by minimizing a functional composed of region integrals and contour integrals. In this framework, the functions characterizing the regions or contours are called "descriptors". The minimum is sought through the propagation of a so-called region-based active contour. The associated evolution equation is computed using domain derivation tools. Furthermore, we take into account the case of region-dependent descriptors that evolve during the propagation of the contour, and we show that this dependency induces additional terms in the evolution equation.
The developed framework is then applied to various segmentation tasks. First, statistical descriptors based on the determinant of the covariance matrix are studied for face segmentation; the statistical parameters are estimated jointly with the segmentation. We then propose statistical descriptors using a distance to a reference histogram. Finally, moving objects are detected in sequences from both static and moving cameras through the hierarchical use of motion-based and spatial descriptors.
Stoiber, Nicolas. "Modélisation des expressions faciales émotionnelles et de leurs dynamiques pour l'animation réaliste et interactive de personnages virtuels". Phd thesis, Université Rennes 1, 2010. http://tel.archives-ouvertes.fr/tel-00558851.
Texto completoYang, Yu-Fang. "Contribution des caractéristiques diagnostiques dans la reconnaissance des expressions faciales émotionnelles : une approche neurocognitive alliant oculométrie et électroencéphalographie". Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLS099/document.
Full text
Proficient recognition of facial expressions is crucial for social interaction. Behavioural measures, event-related potentials (ERPs) and eye-tracking techniques can be used to investigate the brain mechanisms underlying this seemingly effortless processing of facial expression. Facial expression recognition involves not only the extraction of expressive information from diagnostic facial features, known as part-based processing, but also the integration of featural information, known as configural processing. Despite the critical role of diagnostic features in emotion recognition and extensive research in this area, it is still not known how the brain decodes configural information for emotion recognition. The complexity of facial information integration becomes evident when comparing performance between healthy subjects and individuals with schizophrenia, because those patients tend to process featural information on emotional faces. The different ways of examining faces possibly impact social-cognitive ability in recognizing emotions. This thesis therefore investigates the role of diagnostic features and face configuration in the recognition of facial expression. In addition to behaviour, we examined both the spatiotemporal dynamics of fixations using eye-tracking, and early neurocognitive sensitivity to faces as indexed by the P100 and N170 ERP components. To address these questions, we built a new set of sketch face stimuli by transforming photographed faces from the Radboud Faces Database, removing facial texture and retaining only the diagnostic features (e.g., eyes, nose, mouth) with neutral and four facial expressions: anger, sadness, fear, happiness. Sketch faces supposedly impair configural processing in comparison with photographed faces, resulting in increased sensitivity to diagnostic features through part-based processing.
The direct comparison of neurocognitive measures between sketch and photographed faces expressing basic emotions had never been tested. In this thesis, we examined (i) eye fixations as a function of stimulus type, and (ii) neuroelectric responses to experimental manipulations such as face inversion and deconfiguration. The use of these methods aimed to reveal which face processing drives emotion recognition and to establish neurocognitive markers of the processing of emotional sketch and photographed faces. Overall, the behavioural results showed that sketch faces convey sufficient expressive information (content of diagnostic features), as photographed faces do, for emotion recognition. There was a clear recognition advantage for happy expressions compared to other emotions; in contrast, recognizing sad and angry faces was more difficult. Concomitantly, eye-tracking results showed that participants employed more part-based processing on sketch and photographed faces during the second fixation. Extracting information from the eyes is needed when the expression conveys more complex emotional information and when stimuli are impoverished (e.g., sketches). Using electroencephalography (EEG), the P100 and N170 components were used to study the effects of stimulus type (sketch, photographed), orientation (inverted, upright) and deconfiguration, and their possible interactions. Results also suggest that sketch faces evoked more part-based processing. The cues conveyed by diagnostic features might have been subjected to early processing, likely driven by low-level information during the P100 time window, followed by a later decoding of facial structure and its emotional content in the N170 time window.
In sum, this thesis helped elucidate elements of the debate about configural and part-based face processing for emotion recognition, and extends our current understanding of the role of diagnostic features and configural information during the neurocognitive processing of facial expressions of emotion
Vidal, Eloïse. "Étude et implémentation d'une architecture temps réel pour l'optimisation de la compression H.264/AVC de vidéos SD/HD". Thesis, Valenciennes, 2014. http://www.theses.fr/2014VALE0011/document.
Full text
The use of digital video over IP has increased exponentially over the last years, due to the development of high-speed networks dedicated to high-quality TV transmission as well as the wide development of non-professional video webcasts. Optimization of the H.264/AVC encoding process allows manufacturers to offer differentiating encoding solutions, by reducing the bandwidth necessary for transmitting a video sequence at a given quality level, or by improving the quality perceived by end users at a fixed bit rate. This thesis was carried out at the company Digigram in a context of professional high-quality video. We propose two preprocessing solutions that take the characteristics of the human visual system into account by exploiting a JND (Just Noticeable Distortion) profile. A JND model defines perceptual thresholds, below which a distortion cannot be seen, according to the video content. The first solution is an adaptive pre-filter, independent of the encoder and controlled by a JND profile, which removes perceptually irrelevant content and so reduces the bitrate while maintaining the perceived quality. From the state-of-the-art literature, the AWA (Adaptive Weighted Averaging) and bilateral filters were selected. We then define two new filters using a large convolution mask, which better exploit the correlations present in high-definition video content. Through subjective tests, we show that the proposed perceptual prefilters give an average bitrate reduction of 20% at the same visual quality in VBR (variable bitrate) H.264/AVC intra and inter encoding. Finally, the second solution improves the perceived quality in CBR (constant bitrate) encoding by integrating the JND profile into x264, one of the best implementations of the H.264/AVC standard. We thus propose a perceptual adaptive quantization which enhances x264 performance by improving the coding of edge information in low and middle bitrate applications
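The principle of a JND-controlled prefilter, smooth, but never beyond the perceptual threshold, can be illustrated in one dimension. This sketch is not the AWA or bilateral filter used in the thesis; the box window and clamping rule are simplifications invented for the example.

```python
def jnd_prefilter(signal, jnd, radius=2):
    """1-D sketch of a JND-controlled smoothing prefilter.

    Each sample is pulled toward its local window average, but the change is
    clamped to that sample's JND threshold, so the filter only removes
    detail that is assumed to be perceptually invisible."""
    out = []
    n = len(signal)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        window = signal[lo:hi]
        mean = sum(window) / len(window)
        delta = max(-jnd[i], min(jnd[i], mean - signal[i]))
        out.append(signal[i] + delta)
    return out
```

With a zero JND profile the signal passes through untouched; with a permissive profile, isolated detail is attenuated, which is what lowers the bitrate downstream.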
Pierre, Fabien. "Méthodes variationnelles pour la colorisation d’images, de vidéos, et la correction des couleurs". Thesis, Bordeaux, 2016. http://www.theses.fr/2016BORD0250/document.
Full text
This thesis deals with problems related to color; in particular, we are interested in problems arising in image and video colorization and in contrast enhancement. Color images can be viewed as carrying two complementary pieces of information, one achromatic (without color) and the other chromatic (color); the applications studied in this thesis are based on processing one of these while preserving its complement. In colorization, the challenge is to compute a color image while constraining its gray-scale channel; contrast enhancement aims to modify the intensity channel of an image while preserving its hue. These related problems require a formal study of the geometry of the RGB space. In this work, we show that the classical color spaces of the literature designed to solve these classes of problems lead to errors. A novel algorithm, called luminance-hue specification, which computes a color with a given hue and luminance, is described in this thesis, and its extension to a variational framework is proposed. This model has been used successfully to enhance color images, using well-known assumptions about the human visual system. The state-of-the-art methods for image colorization fall into two categories: those that diffuse color scribbles drawn by the user (manual colorization), and those that exploit a reference color image, or a base of reference images, to transfer colors from the reference to the grayscale image (exemplar-based colorization). Both approaches have their advantages and drawbacks. In this thesis, we design a variational model for exemplar-based colorization, which is then extended to a method unifying manual and exemplar-based colorization. Finally, we describe two variational models to colorize videos in interaction with the user
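The core idea behind specifying luminance while preserving hue can be illustrated naively: scaling an RGB colour preserves its channel ratios, hence its hue, while reaching a target luminance. This is a deliberately simple sketch, not the thesis's algorithm, which analyses the geometry of the RGB cube instead of clipping at the gamut boundary.

```python
def luminance(rgb):
    """Rec. 601 luma of an RGB triple with channels in [0, 1]."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b

def specify_luminance(rgb, target_y):
    """Scale an RGB colour so its luminance matches `target_y`.

    Uniform scaling preserves channel ratios and therefore hue; clipping to
    [0, 1] breaks this near the gamut boundary, which is exactly where a
    careful treatment of the RGB cube geometry is needed."""
    y = luminance(rgb)
    if y == 0.0:
        return (target_y, target_y, target_y)  # black: fall back to grey
    k = target_y / y
    return tuple(min(1.0, max(0.0, c * k)) for c in rgb)
```

Away from the gamut boundary, the output hits the target luminance exactly and keeps the original hue.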
Boukadida, Haykel. "Création automatique de résumés vidéo par programmation par contraintes". Thesis, Rennes 1, 2015. http://www.theses.fr/2015REN1S074/document.
Full text
This thesis focuses on automatic video summarization. The idea is to create an adaptive video summary that takes into account a set of rules defined on the audiovisual content on the one hand, and that adapts to the user's preferences on the other. We propose a novel approach that casts automatic video summarization as a constraint satisfaction problem, solved with constraint satisfaction programming (CSP) as the programming paradigm. A set of general production rules is first defined by an expert; these rules relate to the multimedia content of the input video and are expressed as constraints to be satisfied. The final user can then define additional constraints (such as the desired duration of the summary) or set high-level parameters mapped to the constraints already defined by the expert. This approach has several advantages: it clearly separates the summary production rules (the problem modeling) from the summary generation algorithm (the problem solving by the CSP solver), so the summary can be adapted without revising the whole generation process. For instance, our approach enables users to adapt the summary to the target application and to their preferences by adding a constraint or modifying an existing one, without changing the summary generation algorithm. We propose three models of video representation, distinguished by their flexibility and their efficiency. Besides the originality of each of the three proposed models, an additional contribution of this thesis is an extensive comparative study of their performance and of the quality of the resulting summaries, using objective and subjective measures. Finally, to assess the quality of automatically generated summaries, the proposed approach was evaluated in a large-scale user study involving more than 60 people.
All these experiments have been performed on the challenging application of automatic tennis match summarization
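The separation the abstract emphasises, production rules expressed as constraints versus a generic solving step, can be mimicked with a brute-force solver over toy shots. The shot data, interest scores and predicate-style constraints below are invented for illustration; a real CSP solver replaces the exhaustive search.

```python
from itertools import combinations

def summarize(shots, max_duration, constraints):
    """Exhaustive search over shot subsets: keep those satisfying every
    constraint and the duration budget, return the subset with the highest
    total interest.

    `shots` is a list of (name, duration, interest) triples; `constraints`
    is a list of predicates over a candidate subset (the "production rules").
    """
    best, best_score = (), float("-inf")
    indices = list(range(len(shots)))
    for r in range(len(shots) + 1):
        for subset in combinations(indices, r):
            chosen = [shots[i] for i in subset]
            if sum(s[1] for s in chosen) > max_duration:
                continue
            if not all(c(chosen) for c in constraints):
                continue
            score = sum(s[2] for s in chosen)
            if score > best_score:
                best, best_score = subset, score
    return [shots[i][0] for i in best]
```

Adapting the summary, e.g. tightening the duration or adding a rule, only changes the inputs, never the solver, which is the adaptability argument made in the abstract.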
Grigoras, Romulus. "Supervision de flux pour les contenus hypermédia : optimisation de politiques de préchargement et ordonnancement causal". Toulouse, INPT, 2003. http://www.theses.fr/2003INPT025H.
Full text
Weinzaepfel, Philippe. "Le mouvement en action : estimation du flot optique et localisation d'actions dans les vidéos". Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAM013/document.
Full text
With the recent overwhelming growth of digital video content, automatic video understanding has become an increasingly important issue. This thesis introduces several contributions on two automatic video understanding tasks: optical flow estimation and human action localization. Optical flow estimation consists in computing the displacement of every pixel in a video and faces several challenges, including large non-rigid displacements, occlusions and motion boundaries. We first introduce an optical flow approach based on a variational model that incorporates a new matching method. The proposed matching algorithm is built upon a hierarchical multi-layer correlational architecture and effectively handles non-rigid deformations and repetitive textures. It improves the flow estimation in the presence of significant appearance changes and large displacements. We also introduce a novel scheme for estimating optical flow based on a sparse-to-dense interpolation of matches while respecting edges. This method leverages an edge-aware geodesic distance tailored to respect motion boundaries and to handle occlusions. Furthermore, we propose a learning-based approach for detecting motion boundaries: motion boundary patterns are predicted at the patch level using structured random forests. We experimentally show that our approach outperforms the flow-gradient baseline on both synthetic data and real-world videos, including a newly introduced dataset of consumer videos. Human action localization consists in recognizing the actions that occur in a video, such as "drinking" or "phoning", as well as their temporal and spatial extent. We first propose a novel approach based on deep convolutional neural networks. The method extracts class-specific tubes leveraging recent advances in detection and tracking; tube description is enhanced by spatio-temporal local features, and temporal detection is performed using a sliding-window scheme inside each tube. Our approach outperforms the state of the art on challenging action localization benchmarks. Second, we introduce a weakly-supervised action localization method, i.e., one that does not require bounding-box annotation. Action proposals are computed by extracting tubes around the humans, using a human detector robust to unusual poses and occlusions, learned on a human pose benchmark. A high recall is reached with only a few human tubes, allowing Multiple Instance Learning to be applied effectively. Furthermore, we introduce a new dataset for human action localization that overcomes the limitations of existing benchmarks, such as the diversity and duration of the videos. Our weakly-supervised approach obtains results close to fully-supervised ones while significantly reducing the required amount of annotations
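The sparse-to-dense interpolation step can be sketched with plain inverse-distance weighting over a few matches. The thesis replaces the Euclidean distance used here with an edge-aware geodesic distance, so that interpolation does not bleed across motion boundaries; the grid size and match positions below are invented for the example.

```python
def interpolate_flow(matches, width, height, power=2.0):
    """Densify sparse matches by inverse-distance weighting.

    `matches` maps (x, y) pixel positions to (dx, dy) flow vectors; every
    other pixel receives a weighted average of all match vectors, with
    weights decaying as 1 / distance**power."""
    dense = {}
    for y in range(height):
        for x in range(width):
            if (x, y) in matches:
                dense[(x, y)] = matches[(x, y)]
                continue
            wsum, fx, fy = 0.0, 0.0, 0.0
            for (mx, my), (dx, dy) in matches.items():
                w = 1.0 / (((x - mx) ** 2 + (y - my) ** 2) ** (power / 2))
                wsum += w
                fx += w * dx
                fy += w * dy
            dense[(x, y)] = (fx / wsum, fy / wsum)
    return dense
```

A pixel halfway between two matches receives the average of their flow vectors; with a geodesic distance, a pixel separated from a match by a strong image edge would instead ignore that match almost entirely.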