Thèses sur le sujet « Fully- and weakly-Supervised learning »
Créez une référence correcte selon les styles APA, MLA, Chicago, Harvard et plusieurs autres
Consultez les 50 meilleures thèses pour votre recherche sur le sujet « Fully- and weakly-Supervised learning ».
À côté de chaque source dans la liste de références il y a un bouton « Ajouter à la bibliographie ». Cliquez sur ce bouton, et nous générerons automatiquement la référence bibliographique pour la source choisie selon votre style de citation préféré : APA, MLA, Harvard, Vancouver, Chicago, etc.
Vous pouvez aussi télécharger le texte intégral de la publication scolaire au format pdf et consulter son résumé en ligne lorsque ces informations sont inclues dans les métadonnées.
Parcourez les thèses sur diverses disciplines et organisez correctement votre bibliographie.
Ma, Qixiang. « Deep learning based segmentation and detection of aorta structures in CT images involving fully and weakly supervised learning ». Electronic Thesis or Diss., Université de Rennes (2023-....), 2024. http://www.theses.fr/2024URENS029.
Texte intégralEndovascular aneurysm repair (EVAR) and transcatheter aortic valve implantation (TAVI) are endovascular interventions where preoperative CT image analysis is a prerequisite for planning and navigation guidance. In the case of EVAR procedures, the focus is specifically on the challenging issue of aortic segmentation in non-contrast-enhanced CT (NCCT) imaging, which remains unresolved. For TAVI procedures, attention is directed toward detecting anatomical landmarks to predict the risk of complications and select the bioprosthesis. To address these challenges, we propose automatic methods based on deep learning (DL). Firstly, a fully-supervised model based on 2D-3D features fusion is proposed for vascular segmentation in NCCTs. Subsequently, a weakly-supervised framework based on Gaussian pseudo labels is considered to reduce and facilitate manual annotation during the training phase. Finally, hybrid weakly- and fully-supervised methods are proposed to extend segmentation to more complex vascular structures beyond the abdominal aorta. When it comes to aortic valve in cardiac CT scans, a two-stage fully-supervised DL method is proposed for landmarks detection. The results contribute to enhancing preoperative imaging and the patient's digital model for computer-assisted endovascular interventions
Hlynur, Davíð Hlynsson. « Predicting expert moves in the game of Othello using fully convolutional neural networks ». Thesis, KTH, Robotik, perception och lärande, RPL, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-210914.
Texte intégralNoggrann funktionsteknik är en viktig faktor för artificiell intelligens för spel. I dennaavhandling undersöker jag fördelarna med att delegera teknikarbetet till modellen i ställetför de funktioner, som använder brädspelet Othello som en fallstudie. Konvolutionellaneurala nätverk av varierande djup är utbildade att spela på ett mänskligt sätt genom attlära sig att förutsäga handlingar från turneringar. Mitt främsta resultat är att ett nätverkkan utbildas för att uppnå 57,4% prediktionsnoggrannhet på en testuppsättning, vilketöverträffar tidigare toppmoderna i den här uppgiften. Noggrannheten ökar till 58.3% genomatt lägga till flera vanliga handgjorda funktioner som inmatning till nätverket, tillkostnaden för mer än hälften så mycket beräknatid.
Durand, Thibaut. « Weakly supervised learning for visual recognition ». Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066142/document.
Texte intégralThis thesis studies the problem of classification of images, where the goal is to predict if a semantic category is present in the image, based on its visual content. To analyze complex scenes, it is important to learn localized representations. To limit the cost of annotation during training, we have focused on weakly supervised learning approaches. In this thesis, we propose several models that simultaneously classify and localize objects, using only global labels during training. The weak supervision significantly reduces the cost of full annotation, but it makes learning more challenging. The key issue is how to aggregate local scores - e.g. regions - into global score - e.g. image. The main contribution of this thesis is the design of new pooling functions for weakly supervised learning. In particular, we propose a “max + min” pooling function, which unifies many pooling functions. We describe how to use this pooling in the Latent Structured SVM framework as well as in convolutional networks. To solve the optimization problems, we present several solvers, some of which allow to optimize a ranking metric such as Average Precision. We experimentally show the interest of our models with respect to state-of-the-art methods, on ten standard image classification datasets, including the large-scale dataset ImageNet
Durand, Thibaut. « Weakly supervised learning for visual recognition ». Electronic Thesis or Diss., Paris 6, 2017. http://www.theses.fr/2017PA066142.
Texte intégralThis thesis studies the problem of classification of images, where the goal is to predict if a semantic category is present in the image, based on its visual content. To analyze complex scenes, it is important to learn localized representations. To limit the cost of annotation during training, we have focused on weakly supervised learning approaches. In this thesis, we propose several models that simultaneously classify and localize objects, using only global labels during training. The weak supervision significantly reduces the cost of full annotation, but it makes learning more challenging. The key issue is how to aggregate local scores - e.g. regions - into global score - e.g. image. The main contribution of this thesis is the design of new pooling functions for weakly supervised learning. In particular, we propose a “max + min” pooling function, which unifies many pooling functions. We describe how to use this pooling in the Latent Structured SVM framework as well as in convolutional networks. To solve the optimization problems, we present several solvers, some of which allow to optimize a ranking metric such as Average Precision. We experimentally show the interest of our models with respect to state-of-the-art methods, on ten standard image classification datasets, including the large-scale dataset ImageNet
Raisi, Elaheh. « Weakly Supervised Machine Learning for Cyberbullying Detection ». Diss., Virginia Tech, 2019. http://hdl.handle.net/10919/89100.
Texte intégralDoctor of Philosophy
Social media has become an inevitable part of individuals social and business lives. Its benefits, however, come with various negative consequences such as online harassment, cyberbullying, hate speech, and online trolling especially among the younger population. According to the American Academy of Child and Adolescent Psychiatry,1 victims of bullying can suffer interference to social and emotional development and even be drawn to extreme behavior such as attempted suicide. Any widespread bullying enabled by technology represents a serious social health threat. In this research, we develop automated, data-driven methods for harassment-based cyberbullying detection. The availability of tools such as these can enable technologies that reduce the harm and toxicity created by these detrimental behaviors. Our general framework is based on consistency of two detectors that co-train one another. One learner identifies bullying incidents by examining the language content in the message; another learner considers social structure to discover bullying. When designing the general framework, we address three tasks: First, we use machine learning with weak supervision, which significantly alleviates the need for human experts to perform tedious data annotation. Second, we incorporate the efficacy of distributed representations of words and nodes such as deep, nonlinear models in the framework to improve the predictive power of models. Finally, we decrease the sensitivity of the framework to language describing particular social groups including race, gender, religion, and sexual orientation. This research represents important steps toward improving technological capability for automatic cyberbullying detection.
Hanwell, David. « Weakly supervised learning of visual semantic attributes ». Thesis, University of Bristol, 2014. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.687063.
Texte intégralKumar, M. Pawan. « Weakly Supervised Learning for Structured Output Prediction ». Habilitation à diriger des recherches, École normale supérieure de Cachan - ENS Cachan, 2013. http://tel.archives-ouvertes.fr/tel-00943602.
Texte intégralNodet, Pierre. « Biquality learning : from weakly supervised learning to distribution shifts ». Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG030.
Texte intégralThe field of Learning with weak supervision is called Weakly Supervised Learning and aggregates a variety of situations where the collected ground truth is imperfect. The collected labels may suffer from bad quality, non-adaptability, or insufficient quantity. In this report, we propose a novel taxonomy of Weakly Supervised Learning as a continuous cube called the Weak Supervision Cube that encompasses all of the weaknesses of supervision. To design algorithms capable of handling any weak supervisions, we suppose the availability of a small trusted dataset, without bias and corruption, in addition to the potentially corrupted dataset. The trusted dataset allows the definition of a generic learning framework named Biquality Learning. We review the state-of-the-art of these algorithms that assumed the availability of a small trusted dataset. Under this framework, we propose an algorithm based on Importance Reweighting for Biquality Learning (IRBL). This classifier-agnostic approach is based on the empirical estimation of the Radon-Nikodym derivative (RND), to build a risk-consistent estimator on reweighted untrusted data. Then we extend the proposed framework to dataset shifts. Dataset shifts happen when the data distribution observed at training time is different from what is expected from the data distribution at testing time. So we propose an improved version of IRBL named IRBL2, capable of handling such dataset shifts. Additionally, we propose another algorithm named KPDR based on the same theory but focused on covariate shift instead of the label noise formulation. To diffuse and democratize the Biquality Learning Framework, we release an open-source Python library à la Scikit-Learn for Biquality Learning named biquality-learn
Ruiz, Ovejero Adrià. « Weakly-supervised learning for automatic facial behaviour analysis ». Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/457708.
Texte intégralAquesta tesi doctoral se centra en el problema de l'Anàlisi Automàtic del Comportament Facial, on l'objectiu és desenvolupar sistemes autònoms capaços de reconèixer i entendre les expressions facials humanes. Donada la quantitat d'informació que es pot extreure d'aquestes expressions, sistemes d'aquest tipus tenen multitud d'aplicacions en camps com la Interacció Home-Màquina, el Marketing o l'Assistència Clínica. Per aquesta raó, investigadors en Visió per Computador i Aprenentatge Automàtic han destinat molts esforços en les últimes dècades per tal d'aconseguir avenços en aquest sentit. Malgrat això, la majoria de problemes relacionats amb l'anàlisi automàtic d'expressions facials encara estan lluny de ser conisderats com a resolts. En aquest context, aquesta tesi està motivada pel fet que la majoria de mètodes proposats fins ara han seguit el paradigma d'aprenentatge supervisat, on els models són entrenats mitjançant dades anotades explícitament en funció del problema a resoldre. Desafortunadament, aquesta estratègia té grans limitacions donat que l'anotació d'expressions en bases de dades és una tasca molt costosa i lenta. Per tal d'afrontar aquest repte, aquesta tesi proposa encarar l'Anàlisi Automàtic del Comportament Facial mitjançant el paradigma d'aprenentatge dèbilment supervisat. A diferència del cas anterior, aquests models poden ser entrenats utilitzant etiquetes que són fàcils d'anotar però que només donen informació parcial sobre la tasca que es vol aprendre. Seguint aquesta idea, desenvolupem un conjunt de mètodes per tal de resoldre problemes típics en el camp com el reconeixement d' "Action Units", l'Estimació d'Intensitat d'Expressions Facials o l'Anàlisi Emocional. Els resultats obtinguts avaluant els mètodes presentats en aquestes tasques, demostren que l'aprenentatge dèbilment supervisat pot ser una solució per tal de reduir l'esforç d'anotació en l'Anàlisi Automàtic del Comportament Facial. De la mateixa manera, aquests mètodes es mostren útils a l'hora de facilitar el procés d'etiquetatge de bases de dades creades per aquest propòsit.
Siva, Parthipan. « Automatic annotation for weakly supervised learning of detectors ». Thesis, Queen Mary, University of London, 2012. http://qmro.qmul.ac.uk/xmlui/handle/123456789/3359.
Texte intégralTorcinovich, Alessandro <1992>. « Using Contextual Information In Weakly Supervised Learning : Toward the integration of contextual and deep learningapproaches, to address weakly supervised tasks ». Doctoral thesis, Università Ca' Foscari Venezia, 2021. http://hdl.handle.net/10579/20596.
Texte intégralValvano, Gabriele. « Semi-supervised and weakly-supervised learning with spatio-temporal priors in medical image segmentation ». Thesis, IMT Alti Studi Lucca, 2021. http://e-theses.imtlucca.it/344/1/Valvano_phdthesis.pdf.
Texte intégralStella, Federico. « Learning a Local Reference Frame for Point Clouds using Spherical CNNs ». Master's thesis, Alma Mater Studiorum - Università di Bologna, 2020. http://amslaurea.unibo.it/20197/.
Texte intégralChiaroni, Florent. « Weakly supervised learning for image classification and potentially moving obstacles analysis ». Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASC006.
Texte intégralIn the context of autonomous vehicle perception, the interest of the research community for deep learning approaches has continuously grown since the last decade. This can be explained by the fact that deep learning techniques provide nowadays state-of-the-art prediction performances for several computer vision challenges. More specifically, deep learning techniques can provide rich semantic information concerning the complex visual patterns encountered in autonomous driving scenarios. However, such approaches require, as their name implies, to learn on data. In particular, state-of-the-art prediction performances on discriminative tasks often demand hand labeled data of the target application domain. Hand labeling has a significant cost, while, conversely, unlabeled data can be easily obtained in the autonomous driving context. It turns out that a category of learning strategies, referred to as weakly supervised learning, enables to exploit partially labeled data. Therefore, we aim in this thesis at reducing as much as possible the hand labeling requirement by proposing weakly supervised learning techniques.We start by presenting a type of learning methods which are self-supervised. They consist of substituting hand-labels by upstream techniques able to automatically generate exploitable training labels. Self-supervised learning (SSL) techniques have proven their usefulness in the past for offroad obstacles avoidance and path planning through changing environments. However, SSL techniques still leave the door open for detection, segmentation, and classification of static potentially moving obstacles.Consequently, we propose in this thesis three novel weakly supervised learning methods with the final goal to deal with such road users through an SSL framework. The first two proposed contributions of this work aim at dealing with partially labeled image classification datasets, such that the labeling effort can be only focused on our class of interest, the positive class. Then, we propose an approach which deals with training data containing a high fraction of wrong labels, referred to as noisy labels. Next, we demonstrate the potential of such weakly supervised strategies for detection and segmentation of potentially moving obstacles
He, Fengxiang. « Instance-Dependent Positive-Unlabelled Learning ». Thesis, The University of Sydney, 2018. http://hdl.handle.net/2123/20115.
Texte intégralLerousseau, Marvin. « Weakly Supervised Segmentation and Context-Aware Classification in Computational Pathology ». Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG015.
Texte intégralAnatomic pathology is the medical discipline responsible for the diagnosis and characterization of diseases through the macroscopic, microscopic, molecular and immunologic inspection of tissues. Modern technologies have made possible the digitization of tissue glass slides into whole slide images, which can themselves be processed by artificial intelligence to enhance the capabilities of pathologists. This thesis presented several novel and powerful approaches that tackle pan-cancer segmentation and classification of whole slide images. Learning segmentation models for whole slide images is challenged by an annotation bottleneck which arises from (i) a shortage of pathologists, (ii) an intense cumbersomeness and boring annotation process, and (iii) major inter-annotators discrepancy. My first line of work tackled pan-cancer tumor segmentation by designing two novel state-of-the-art weakly supervised approaches that exploit slide-level annotations that are fast and easy to obtain. In particular, my second segmentation contribution was a generic and highly powerful algorithm that leverages percentage annotations on a slide basis, without needing any pixelbased annotation. Extensive large-scale experiments showed the superiority of my approaches over weakly supervised and supervised methods for pan-cancer tumor segmentation on a dataset of more than 15,000 unfiltered and extremely challenging whole slide images from snap-frozen tissues. My results indicated the robustness of my approaches to noise and systemic biases in annotations. Digital slides are difficult to classify due to their colossal sizes, which range from millions of pixels to billions of pixels, often weighing more than 500 megabytes. The straightforward use of traditional computer vision is therefore not possible, prompting the use of multiple instance learning, a machine learning paradigm consisting in assimilating a whole slide image as a set of patches uniformly sampled from it. Up to my works, the greater majority of multiple instance learning approaches considered patches as independently and identically sampled, i.e. discarded the spatial relationship of patches extracted from a whole slide image. Some approaches exploited such spatial interconnection by leveraging graph-based models, although the true domain of whole slide images is specifically the image domain which is more suited with convolutional neural networks. I designed a highly powerful and modular multiple instance learning framework that leverages the spatial relationship of patches extracted from a whole slide image by building a sparse map from the patches embeddings, which is then further processed into a whole slide image embedding by a sparse-input convolutional neural network, before being classified by a generic classifier model. My framework essentially bridges the gap between multiple instance learning, and fully convolutional classification. I performed extensive experiments on three whole slide image classification tasks, including the golden task of cancer pathologist of subtyping tumors, on a dataset of more than 20,000 whole slide images from public data. Results highlighted the superiority of my approach over all other widespread multiple instance learning methods. Furthermore, while my experiments only investigated my approach with sparse-input convolutional neural networks with two convolutional layers, the results showed that my framework works better as the number of parameters increases, suggesting that more sophisticated convolutional neural networks can easily obtain superior results
Sahasrabudhe, Mihir. « Unsupervised and weakly supervised deep learning methods for computer vision and medical imaging ». Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASC010.
Texte intégralThe first two contributions of this thesis (Chapter 2 and 3) are models for unsupervised 2D alignment and learning 3D object surfaces, called Deforming Autoencoders (DAE) and Lifting Autoencoders (LAE). These models are capable of identifying canonical space in order to represent different object properties, for example, appearance in a canonical space, deformation associated with this appearance that maps it to the image space, and for human faces, a 3D model for a face, its facial expression, and the angle of the camera. We further illustrate applications of models to other domains_ alignment of lung MRI images in medical image analysis, and alignment of satellite images for remote sensing imagery. In Chapter 4, we concentrate on a problem in medical image analysis_ diagnosis of lymphocytosis. We propose a convolutional network to encode images of blood smears obtained from a patient, followed by an aggregation operation to gather information from all images in order to represent them in one feature vector which is used to determine the diagnosis. Our results show that the performance of the proposed models is at-par with biologists and can therefore augment their diagnosis
Sanchez, Eduardo Hugo. « Learning disentangled representations of satellite image time series in a weakly supervised manner ». Thesis, Toulouse 3, 2021. http://www.theses.fr/2021TOU30032.
Texte intégralThis work focuses on learning data representations of satellite image time series via an unsupervised learning approach. The main goal is to enforce the data representation to capture the relevant information from the time series to perform other applications of satellite imagery. However, extracting information from satellite data involves many challenges since models need to deal with massive amounts of images provided by Earth observation satellites. Additionally, it is impossible for human operators to label such amount of images manually for each individual task (e.g. classification, segmentation, change detection, etc.). Therefore, we cannot use the supervised learning framework which achieves state-of-the-art results in many tasks.To address this problem, unsupervised learning algorithms have been proposed to learn the data structure instead of performing a specific task. Unsupervised learning is a powerful approach since no labels are required during training and the knowledge acquired can be transferred to other tasks enabling faster learning with few labels.In this work, we investigate the problem of learning disentangled representations of satellite image time series where a shared representation captures the spatial information across the images of the time series and an exclusive representation captures the temporal information which is specific to each image. We present the benefits of disentangling the spatio-temporal information of time series, e.g. the spatial information is useful to perform time-invariant image classification or segmentation while the knowledge about the temporal information is useful for change detection. To accomplish this, we analyze some of the most prevalent unsupervised learning models such as the variational autoencoder (VAE) and the generative adversarial networks (GANs) as well as the extensions of these models to perform representation disentanglement. Encouraged by the successful results achieved by generative and reconstructive models, we propose a novel framework to learn spatio-temporal representations of satellite data. We prove that the learned disentangled representations can be used to perform several computer vision tasks such as classification, segmentation, information retrieval and change detection outperforming other state-of-the-art models. Nevertheless, our experiments suggest that generative and reconstructive models present some drawbacks related to the dimensionality of the data representation, architecture complexity and the lack of disentanglement guarantees. In order to overcome these limitations, we explore a recent method based on mutual information estimation and maximization for representation learning without relying on image reconstruction or image generation. We propose a new model that extends the mutual information maximization principle to disentangle the representation domain into two parts. In addition to the experiments performed on satellite data, we show that our model is able to deal with different kinds of datasets outperforming the state-of-the-art methods based on GANs and VAEs. Furthermore, we show that our mutual information based model is less computationally demanding yet more effective. Finally, we show that our model is useful to create a data representation that only captures the class information between two images belonging to the same category. Disentangling the class or category of an image from other factors of variation provides a powerful tool to compute the similarity between pixels and perform image segmentation in a weakly-supervised manner
Tang, Yuxing. « Weakly supervised learning of deformable part models and convolutional neural networks for object detection ». Thesis, Lyon, 2016. http://www.theses.fr/2016LYSEC062/document.
Texte intégralIn this dissertation we address the problem of weakly supervised object detection, wherein the goal is to recognize and localize objects in weakly-labeled images where object-level annotations are incomplete during training. To this end, we propose two methods which learn two different models for the objects of interest. In our first method, we propose a model enhancing the weakly supervised Deformable Part-based Models (DPMs) by emphasizing the importance of location and size of the initial class-specific root filter. We first compute a candidate pool that represents the potential locations of the object as this root filter estimate, by exploring the generic objectness measurement (region proposals) to combine the most salient regions and “good” region proposals. We then propose learning of the latent class label of each candidate window as a binary classification problem, by training category-specific classifiers used to coarsely classify a candidate window into either a target object or a non-target class. Furthermore, we improve detection by incorporating the contextual information from image classification scores. Finally, we design a flexible enlarging-and-shrinking post-processing procedure to modify the DPMs outputs, which can effectively match the approximate object aspect ratios and further improve final accuracy. Second, we investigate how knowledge about object similarities from both visual and semantic domains can be transferred to adapt an image classifier to an object detector in a semi-supervised setting on a large-scale database, where a subset of object categories are annotated with bounding boxes. We propose to transform deep Convolutional Neural Networks (CNN)-based image-level classifiers into object detectors by modeling the differences between the two on categories with both image-level and bounding box annotations, and transferring this information to convert classifiers to detectors for categories without bounding box annotations. We have evaluated both our approaches extensively on several challenging detection benchmarks, e.g. , PASCAL VOC, ImageNet ILSVRC and Microsoft COCO. Both our approaches compare favorably to the state-of-the-art and show significant improvement over several other recent weakly supervised detection methods
Doersch, Carl. « Supervision Beyond Manual Annotations for Learning Visual Representations ». Research Showcase @ CMU, 2016. http://repository.cmu.edu/dissertations/787.
Texte intégralGötz, Michael [Verfasser], et R. [Akademischer Betreuer] Dillmann. « Variability-Aware and Weakly Supervised Learning for Semantic Tissue Segmentation / Michael Götz ; Betreuer : R. Dillmann ». Karlsruhe : KIT-Bibliothek, 2017. http://d-nb.info/1137265000/34.
Texte intégralHrabovszki, Dávid. « Classification of brain tumors in weakly annotated histopathology images with deep learning ». Thesis, Linköpings universitet, Statistik och maskininlärning, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-177271.
Texte intégralWang, Xin. « Gaze based weakly supervised localization for image classification : application to visual recognition in a food dataset ». Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066577/document.
Texte intégralIn this dissertation, we discuss how to use the human gaze data to improve the performance of the weak supervised learning model in image classification. The background of this topic is in the era of rapidly growing information technology. As a consequence, the data to analyze is also growing dramatically. Since the amount of data that can be annotated by the human cannot keep up with the amount of data itself, current well-developed supervised learning approaches may confront bottlenecks in the future. In this context, the use of weak annotations for high-performance learning methods is worthy of study. Specifically, we try to solve the problem from two aspects: One is to propose a more time-saving annotation, human eye-tracking gaze, as an alternative annotation with respect to the traditional time-consuming annotation, e.g. bounding box. The other is to integrate gaze annotation into a weakly supervised learning scheme for image classification. This scheme benefits from the gaze annotation for inferring the regions containing the target object. A useful property of our model is that it only exploits gaze for training, while the test phase is gaze free. This property further reduces the demand of annotations. The two isolated aspects are connected together in our models, which further achieve competitive experimental results
Wang, Xin. « Gaze based weakly supervised localization for image classification : application to visual recognition in a food dataset ». Electronic Thesis or Diss., Paris 6, 2017. http://www.theses.fr/2017PA066577.
Texte intégralIn this dissertation, we discuss how to use the human gaze data to improve the performance of the weak supervised learning model in image classification. The background of this topic is in the era of rapidly growing information technology. As a consequence, the data to analyze is also growing dramatically. Since the amount of data that can be annotated by the human cannot keep up with the amount of data itself, current well-developed supervised learning approaches may confront bottlenecks in the future. In this context, the use of weak annotations for high-performance learning methods is worthy of study. Specifically, we try to solve the problem from two aspects: One is to propose a more time-saving annotation, human eye-tracking gaze, as an alternative annotation with respect to the traditional time-consuming annotation, e.g. bounding box. The other is to integrate gaze annotation into a weakly supervised learning scheme for image classification. This scheme benefits from the gaze annotation for inferring the regions containing the target object. A useful property of our model is that it only exploits gaze for training, while the test phase is gaze free. This property further reduces the demand of annotations. The two isolated aspects are connected together in our models, which further achieve competitive experimental results
Abou-Moustafa, Karim. « Metric learning revisited : new approaches for supervised and unsupervised metric learning with analysis and algorithms ». Thesis, McGill University, 2012. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=106370.
Texte intégralDans cette thèse, je propose deux algorithmes pour l'apprentissage de la métrique dX; le premier pour l'apprentissage supervisé, et le deuxième pour l'apprentissage non-supervisé, ainsi que pour l'apprentissage supervisé et semi-supervisé. En particulier, je propose des algorithmes qui prennent en considération la structure et la géométrie de X d'une part, et les caractéristiques des ensembles de données du monde réel d'autre part. Cependant, si on cherche également la réduction de dimension, donc sous certaines hypothèses légères sur la topologie de X, et en même temps basé sur des informations disponibles a priori, on peut apprendre une intégration de X dans un espace Euclidien de petite dimension Rp0 p0 << p, où la distance Euclidienne révèle mieux les ressemblances entre les éléments de X et leurs groupements (clusters). Alors, comme un sous-produit, on obtient simultanément une réduction de dimension et un apprentissage métrique. Pour l'apprentissage supervisé, je propose PARDA, ou Pareto discriminant analysis, pour la discriminante réduction linéaire de dimension. PARDA est basé sur le mécanisme d'optimisation à multi-objectifs; optimisant simultanément plusieurs fonctions objectives, éventuellement des fonctions contradictoires. Cela permet à PARDA de s'adapter à la topologie de classe dans un espace dimensionnel plus petit, et naturellement gère le problème de masquage de classe associé au discriminant Fisher dans le cadre d'analyse de problèmes à multi-classes. En conséquence, PARDA permet des meilleurs résultats de classification par rapport aux techniques modernes de réduction discriminante de dimension. Pour l'apprentissage non-supervisés, je propose un cadre algorithmique, noté par ??, qui encapsule les algorithmes spectraux d'apprentissage formant an algorithme d'apprentissage de métrique. Le cadre ?? capture la structure locale et la densité locale d'information de chaque point dans un ensemble de données, et donc il porte toutes les informations sur la densité d'échantillon différente dans l'espace d'entrée. La structure de ?? induit deux métriques de distance pour ses éléments: la métrique Bhattacharyya-Riemann dBR et la métrique Jeffreys-Riemann dJR. Les deux mesures réorganisent la proximité entre les points de X basé sur la structure locale et la densité autour de chaque point. En conséquence, lorsqu'on combine l'espace métrique (??, dBR) ou (??, dJR) avec les algorithmes de "spectral clustering" et "Euclidean embedding", ils donnent des améliorations significatives dans les précisions de regroupement et les taux d'erreur pour une grande variété de tâches de clustering et de classification.
De, La Bourdonnaye François. « Learning sensori-motor mappings using little knowledge : application to manipulation robotics ». Thesis, Université Clermont Auvergne (2017-2020), 2018. http://www.theses.fr/2018CLFAC037/document.
Texte intégralThe thesis is focused on learning a complex manipulation robotics task using little knowledge. More precisely, the concerned task consists in reaching an object with a serial arm and the objective is to learn it without camera calibration parameters, forward kinematics, handcrafted features, or expert demonstrations. Deep reinforcement learning algorithms suit well to this objective. Indeed, reinforcement learning allows to learn sensori-motor mappings while dispensing with dynamics. Besides, deep learning allows to dispense with handcrafted features for the state spacerepresentation. However, it is difficult to specify the objectives of the learned task without requiring human supervision. Some solutions imply expert demonstrations or shaping rewards to guiderobots towards its objective. The latter is generally computed using forward kinematics and handcrafted visual modules. Another class of solutions consists in decomposing the complex task. Learning from easy missions can be used, but this requires the knowledge of a goal state. Decomposing the whole complex into simpler sub tasks can also be utilized (hierarchical learning) but does notnecessarily imply a lack of human supervision. Alternate approaches which use several agents in parallel to increase the probability of success can be used but are costly. In our approach,we decompose the whole reaching task into three simpler sub tasks while taking inspiration from the human behavior. Indeed, humans first look at an object before reaching it. The first learned task is an object fixation task which is aimed at localizing the object in the 3D space. This is learned using deep reinforcement learning and a weakly supervised reward function. The second task consists in learning jointly end-effector binocular fixations and a hand-eye coordination function. This is also learned using a similar set-up and is aimed at localizing the end-effector in the 3D space. The third task uses the two prior learned skills to learn to reach an object and uses the same requirements as the two prior tasks: it hardly requires supervision. In addition, without using additional priors, an object reachability predictor is learned in parallel. The main contribution of this thesis is the learning of a complex robotic task with weak supervision
Rocco, Ignacio. « Neural architectures for estimating correspondences between images ». Electronic Thesis or Diss., Université Paris sciences et lettres, 2020. http://www.theses.fr/2020UPSLE060.
Texte intégralThe goal of this thesis is to develop methods for establishing correspondences between pairs of images in challenging situations, such as extreme illumination changes, scenes with little texture or with repetitive structures, and matching parts of objects which belong to the same class, but which may have large intra-class appearance differences. In summary, our contributions are the following: (i) we develop a trainable approach for parametric image alignment by means of a siamese network model, (ii) we devise a weakly-supervised training approach, which allow training from real image pairs having only annotation at the level of image-pairs, (iii) we propose the Neighbourhood Consensus Networks which can be used to robustly estimate correspondences in tasks where discrete correspondences are required, and (iv) because the dense formulation of the Neighbourhood Consensus Networks is memory and computationally intensive, we develop a more efficient variant that can reduce the memory requirements and run-time by more than ten times
Peyre, Julia. « Learning to detect visual relations ». Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEE016.
Texte intégralIn this thesis, we study the problem of detection of visual relations of the form (subject, predicate, object) in images, which are intermediate level semantic units between objects and complex scenes. Our work addresses two main challenges in visual relation detection: (1) the difficulty of obtaining box-level annotations to train fully-supervised models, (2) the variability of appearance of visual relations. We first propose a weakly-supervised approach which, given pre-trained object detectors, enables us to learn relation detectors using image-level labels only, maintaining a performance close to fully-supervised models. Second, we propose a model that combines different granularities of embeddings (for subject, object, predicate and triplet) to better model appearance variation and introduce an analogical reasoning module to generalize to unseen triplets. Experimental results demonstrate the improvement of our hybrid model over a purely compositional model and validate the benefits of our transfer by analogy to retrieve unseen triplets
Jacobzon, Gustaf. « Multi-site Organ Detection in CT Images using Deep Learning ». Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279290.
Texte intégralVid optimering av en kontrollerad dos inom strålbehandling krävs det information om friska organ, så kallade riskorgan, i närheten av de maligna cellerna för att minimera strålningen i dessa organ. Denna information kan tillhandahållas av djupa volymetriskta segmenteringsnätverk, till exempel 3D U-Net. Begränsningar i minnesstorleken hos moderna grafikkort gör att det inte är möjligt att träna ett volymetriskt segmenteringsnätverk på hela bildvolymen utan att först nedsampla volymen. Detta leder dock till en lågupplöst segmentering av organen som inte är tillräckligt precis för att kunna användas vid optimeringen. Ett alternativ är att endast behandla en intresseregion som innesluter ett eller ett fåtal organ från bildvolymen och träna ett regionspecifikt nätverk på denna mindre volym. Detta tillvägagångssätt kräver dock information om vilket område i bildvolymen som ska skickas till det regionspecifika segmenteringsnätverket. Denna information kan tillhandahållas av ett 3Dobjektdetekteringsnätverk. I regel är även detta nätverk regionsspecifikt, till exempel thorax-regionen, och kräver mänsklig assistans för att välja rätt nätverk för en viss region i kroppen. Vi föreslår istället ett multiregions-detekteringsnätverk baserat påYOLOv3 som kan detektera 43 olika organ och fungerar på godtyckligt valda axiella fönster i kroppen. Vår modell identifierar närvarande organ (hela eller trunkerade) i bilden och kan automatiskt ge information om vilken region som ska behandlas av varje regionsspecifikt segmenteringsnätverk. Vi tränar vår modell på fyra små (så lågt som 20 bilder) platsspecifika datamängder med svag övervakning för att hantera den delvis icke-annoterade egenskapen hos datamängderna. Vår modell genererar en organ-specifik intresseregion för 92 % av organen som finns i testmängden.
Miech, Antoine. « Large-scale learning from video and natural language ». Electronic Thesis or Diss., Université Paris sciences et lettres, 2020. http://www.theses.fr/2020UPSLE059.
Texte intégralThe goal of this thesis is to build and train machine learning models capable of understanding the content of videos. Current video understanding approaches mainly rely on large-scale manually annotated video datasets for training. However, collecting and annotating such dataset is cumbersome, expensive and time-consuming. To address this issue, this thesis focuses on leveraging large amounts of readily-available, but noisy annotations in the form of natural language. In particular, we exploit a diverse corpus of textual metadata such as movie scripts, web video titles and descriptions or automatically transcribed speech obtained from narrated videos. Training video models on such readily-available textual data is challenging as such annotation is often imprecise or wrong. In this thesis, we introduce learning approaches to deal with weak annotation and design specialized training objectives and neural network architectures
Zhukov, Dimitri. « Learning to localize goal-oriented actions with weak supervision ». Electronic Thesis or Diss., Université Paris sciences et lettres, 2021. http://www.theses.fr/2021UPSLE105.
Texte intégralThe goal of this thesis is to develop methods for automatic understanding of video content. We focus on instructional videos that demonstrate how to perform complex tasks, such as making an omelette or hanging a picture. First, we investigate learning visual models for the steps of tasks, using only a list of steps for each task, instead of costly and time consuming human annotations. Our model allows us to share the information between the tasks on the substep level, effectively multiplying the amount of available training data. We demonstrate the benefits of our method on a newly collected dataset of instructional videos, CrossTask. Next, we present a method for isolating taskrelated actions from the surrounding background, that doesn’t rely on human supervision. Finally, we learn to associate natural language instructions with the corresponding objects within the 3D scene, reconstructed from the videos
Fathi, Alireza. « Learning descriptive models of objects and activities from egocentric video ». Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/48738.
Texte intégralBoyraz, Hakan. « Human Action Localization and Recognition in Unconstrained Videos ». Doctoral diss., University of Central Florida, 2013. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5910.
Texte intégralPh.D.
Doctorate
Electrical Engineering and Computer Science
Engineering and Computer Science
Electrical Engineering
Spreyer, Kathrin. « Does it have to be trees ? : Data-driven dependency parsing with incomplete and noisy training data ». Phd thesis, Universität Potsdam, 2011. http://opus.kobv.de/ubp/volltexte/2012/5749/.
Texte intégralWir präsentieren eine neuartige Herangehensweise an das Trainieren von daten-gesteuerten Dependenzparsern auf unvollständigen Annotationen. Unsere Parser sind einfache Varianten von zwei bekannten Dependenzparsern, nämlich des transitions-basierten Malt-Parsers sowie des graph-basierten MST-Parsers. Während frühere Arbeiten zum Parsing mit unvollständigen Daten die Aufgabe meist in Frameworks für unüberwachtes oder schwach überwachtes maschinelles Lernen gebettet haben, behandeln wir sie im Wesentlichen mit überwachten Lernverfahren. Insbesondere schlagen wir "agnostische" Parser vor, die jegliche Fragmentierung der Trainingsdaten vor ihren daten-gesteuerten Lernkomponenten verbergen. Wir stellen Versuchsergebnisse mit Trainingsdaten vor, die mithilfe von Annotationsprojektion gewonnen wurden. Annotationsprojektion ist ein Verfahren, das es uns erlaubt, innerhalb eines Parallelkorpus Annotationen von einer Sprache auf eine andere zu übertragen. Bedingt durch begrenzten crosslingualen Parallelismus und fehleranfällige Wortalinierung ist die Ausgabe des Projektionsschrittes jedoch üblicherweise verrauscht und unvollständig. Gerade dies macht projizierte Annotationen zu einer angemessenen Testumgebung für unsere fragment-fähigen Parser. Unsere Ergebnisse belegen, dass (i) Dependenzparser, die auf großen Mengen von projizierten Annotationen trainiert wurden, größere Genauigkeit erzielen als die zugrundeliegenden direkten Projektionen, und dass (ii) die Genauigkeit unserer agnostischen, fragment-fähigen Parser der Genauigkeit der Originalparser (trainiert auf streng gefilterten, komplett projizierten Bäumen) annähernd gleichgestellt ist. Schließlich zeigen wir mit künstlich fragmentierten Gold-Standard-Daten, dass (iii) der Verlust an Genauigkeit selbst dann bescheiden bleibt, wenn bis zu 50% aller Kanten in den Trainingsdaten fehlen.
Giraldo, Zuluaga Jhony Heriberto. « Graph-based Algorithms in Computer Vision, Machine Learning, and Signal Processing ». Electronic Thesis or Diss., La Rochelle, 2022. http://www.theses.fr/2022LAROS037.
Texte intégralGraph representation learning and its applications have gained significant attention in recent years. Notably, Graph Neural Networks (GNNs) and Graph Signal Processing (GSP) have been extensively studied. GNNs extend the concepts of convolutional neural networks to non-Euclidean data modeled as graphs. Similarly, GSP extends the concepts of classical digital signal processing to signals supported on graphs. GNNs and GSP have numerous applications such as semi-supervised learning, point cloud semantic segmentation, prediction of individual relations in social networks, modeling proteins for drug discovery, image, and video processing. In this thesis, we propose novel approaches in video and image processing, GNNs, and recovery of time-varying graph signals. Our main motivation is to use the geometrical information that we can capture from the data to avoid data hungry methods, i.e., learning with minimal supervision. All our contributions rely heavily on the developments of GSP and spectral graph theory. In particular, the sampling and reconstruction theory of graph signals play a central role in this thesis. The main contributions of this thesis are summarized as follows: 1) we propose new algorithms for moving object segmentation using concepts of GSP and GNNs, 2) we propose a new algorithm for weakly-supervised semantic segmentation using hypergraph neural networks, 3) we propose and analyze GNNs using concepts from GSP and spectral graph theory, and 4) we introduce a novel algorithm based on the extension of a Sobolev smoothness function for the reconstruction of time-varying graph signals from discrete samples
Chen, Mickaël. « Learning with weak supervision using deep generative networks ». Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS024.
Texte intégralMany successes of deep learning rely on the availability of massive annotated datasets that can be exploited by supervised algorithms. Obtaining those labels at a large scale, however, can be difficult, or even impossible in many situations. Designing methods that are less dependent on annotations is therefore a major research topic, and many semi-supervised and weakly supervised methods have been proposed. Meanwhile, the recent introduction of deep generative networks provided deep learning methods with the ability to manipulate complex distributions, allowing for breakthroughs in tasks such as image edition and domain adaptation. In this thesis, we explore how these new tools can be useful to further alleviate the need for annotations. Firstly, we tackle the task of performing stochastic predictions. It consists in designing systems for structured prediction that take into account the variability in possible outputs. We propose, in this context, two models. The first one performs predictions on multi-view data with missing views, and the second one predicts possible futures of a video sequence. Then, we study adversarial methods to learn a factorized latent space, in a setting with two explanatory factors but only one of them is annotated. We propose models that aim to uncover semantically consistent latent representations for those factors. One model is applied to the conditional generation of motion capture data, and another one to multi-view data. Finally, we focus on the task of image segmentation, which is of crucial importance in computer vision. Building on previously explored ideas, we propose a model for object segmentation that is entirely unsupervised
Masood, Syed Zain. « A Study of Localization and Latency Reduction for Action Recognition ». Doctoral diss., University of Central Florida, 2012. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5426.
Texte intégralPh.D.
Doctorate
Computer Science
Engineering and Computer Science
Computer Science
Yu, Lu. « Semantic representation : from color to deep embeddings ». Doctoral thesis, Universitat Autònoma de Barcelona, 2019. http://hdl.handle.net/10803/669458.
Texte intégralUno de los problemas fundamentales de la visión por computador es representar imágenes con descripciones compactas semánticamente relevantes. Estas descripciones podrían utilizarse en una amplia variedad de aplicaciones, como la comparación de imágenes, la detección de objetos y la búsqueda de vídeos. El objetivo principal de esta tesis es estudiar las representaciones de imágenes desde dos aspectos: las descripciones de color y las descripciones profundas con redes neuronales. En la primera parte de la tesis partimos de descripciones de color modeladas a mano. Existen nombres comunes en varias lenguas para los colores básicos, y proponemos un método para extender los nombres de colores adicionales de acuerdo con su naturaleza complementaria a los básicos. Esto nos permite calcular representaciones de nombres de colores de longitud arbitraria con un alto poder discriminatorio. Los experimentos psicofísicos confirman que el método propuesto supera a los marcos de referencia existentes. En segundo lugar, al agregar estrategias de atención, aprendemos descripciones de colores profundos con redes neuronales a partir de datos con anotaciones para la imagen en vez de para cada uno de los píxeles. La estrategia de atención logra identificar correctamente las regiones relevantes para cada clase que queremos evaluar. La ventaja del enfoque propuesto es que los nombres de colores a usar se pueden aprender específicamente para dominios de los que no existen anotaciones a nivel de píxel. En la segunda parte de la tesis, nos centramos en las descripciones profundas con redes neuronales. En primer lugar, abordamos el problema de comprimir grandes redes de descriptores en redes más pequeñas, manteniendo un rendimiento similar. Proponemos destilar las métricas de una red maestro a una red estudiante. Se introducen dos nuevas funciones de coste para modelar la comunicación de la red maestro a una red estudiante más pequeña: una basada en un maestro absoluto, donde el estudiante pretende producir los mismos descriptores que el maestro, y otra basada en un maestro relativo, donde las distancias entre pares de puntos de datos son comunicadas del maestro al alumno. Además, se han investigado diversos aspectos de la destilación para las representaciones, incluidas las capas de atención, el aprendizaje semi-supervisado y la destilación de calidad cruzada. Finalmente, se estudia otro aspecto del aprendizaje por métrica profundo, el aprendizaje continuado. Observamos que se produce una variación del conocimiento aprendido durante el entrenamiento de nuevas tareas. En esta tesis se presenta un método para estimar la variación semántica en función de la variación que experimentan los datos de la tarea actual durante su aprendizaje. Teniendo en cuenta esta estimación, las tareas anteriores pueden ser compensadas, mejorando así su rendimiento. Además, mostramos que las redes de descripciones profundas sufren significativamente menos olvidos catastróficos en comparación con las redes de clasificación cuando aprenden nuevas tareas.
One of the fundamental problems of computer vision is to represent images with compact semantically relevant embeddings. These embeddings could then be used in a wide variety of applications, such as image retrieval, object detection, and video search. The main objective of this thesis is to study image embeddings from two aspects: color embeddings and deep embeddings. In the first part of the thesis we start from hand-crafted color embeddings. We propose a method to order the additional color names according to their complementary nature with the basic eleven color names. This allows us to compute color name representations with high discriminative power of arbitrary length. Psychophysical experiments confirm that our proposed method outperforms baseline approaches. Secondly, we learn deep color embeddings from weakly labeled data by adding an attention strategy. The attention branch is able to correctly identify the relevant regions for each class. The advantage of our approach is that it can learn color names for specific domains for which no pixel-wise labels exists. In the second part of the thesis, we focus on deep embeddings. Firstly, we address the problem of compressing large embedding networks into small networks, while maintaining similar performance. We propose to distillate the metrics from a teacher network to a student network. Two new losses are introduced to model the communication of a deep teacher network to a small student network: one based on an absolute teacher, where the student aims to produce the same embeddings as the teacher, and one based on a relative teacher, where the distances between pairs of data points is communicated from the teacher to the student. In addition, various aspects of distillation have been investigated for embeddings, including hint and attention layers, semi-supervised learning and cross quality distillation. Finally, another aspect of deep metric learning, namely lifelong learning, is studied. We observed some drift occurs during training of new tasks for metric learning. A method to estimate the semantic drift based on the drift which is experienced by data of the current task during its training is introduced. Having this estimation, previous tasks can be compensated for this drift, thereby improving their performance. Furthermore, we show that embedding networks suffer significantly less from catastrophic forgetting compared to classification networks when learning new tasks.
Gonthier, Nicolas. « Transfer learning of convolutional neural networks for texture synthesis and visual recognition in artistic images ». Thesis, université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG024.
Texte intégralIn this thesis, we study the transfer of Convolutional Neural Networks (CNN) trained on natural images to related tasks. We follow two axes: texture synthesis and visual recognition in artworks. The first one consists in synthesizing a new image given a reference sample. Most methods are based on enforcing the Gram matrices of ImageNet-trained CNN features. We develop a multi-resolution strategy to take into account large scale structures. This strategy can be coupled with long-range constraints either through a Fourier frequency constraint, or the use of feature maps autocorrelation. This scheme allows excellent high-resolution synthesis especially for regular textures. We compare our methods to alternatives ones with quantitative and perceptual evaluations. In a second axis, we focus on transfer learning of CNN for artistic image classification. CNNs can be used as off-the-shelf feature extractors or fine-tuned. We illustrate the advantage of the last solution. Second, we use feature visualization techniques, CNNs similarity indexes and quantitative metrics to highlight some characteristics of the fine-tuning process. Another possibility is to transfer a CNN trained for object detection. We propose a simple multiple instance method using off-the-shelf deep features and box proposals, for weakly supervised object detection. At training time, only image-level annotations are needed. We experimentally show the interest of our models on six non-photorealistic
Caye, Daudt Rodrigo. « Convolutional neural networks for change analysis in earth observation images with noisy labels and domain shifts ». Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT033.
Texte intégralThe analysis of satellite and aerial Earth observation images allows us to obtain precise information over large areas. A multitemporal analysis of such images is necessary to understand the evolution of such areas. In this thesis, convolutional neural networks are used to detect and understand changes using remote sensing images from various sources in supervised and weakly supervised settings. Siamese architectures are used to compare coregistered image pairs and to identify changed pixels. The proposed method is then extended into a multitask network architecture that is used to detect changes and perform land cover mapping simultaneously, which permits a semantic understanding of the detected changes. Then, classification filtering and a novel guided anisotropic diffusion algorithm are used to reduce the effect of biased label noise, which is a concern for automatically generated large-scale datasets. Weakly supervised learning is also achieved to perform pixel-level change detection using only image-level supervision through the usage of class activation maps and a novel spatial attention layer. Finally, a domain adaptation method based on adversarial training is proposed, which succeeds in projecting images from different domains into a common latent space where a given task can be performed. This method is tested not only for domain adaptation for change detection, but also for image classification and semantic segmentation, which proves its versatility
Oquab, Maxime. « Convolutional neural networks : towards less supervision for visual recognition ». Thesis, Paris Sciences et Lettres (ComUE), 2018. http://www.theses.fr/2018PSLEE061.
Texte intégralConvolutional Neural Networks are flexible learning algorithms for computer vision that scale particularly well with the amount of data that is provided for training them. Although these methods had successful applications already in the ’90s, they were not used in visual recognition pipelines because of their lesser performance on realistic natural images. It is only after the amount of data and the computational power both reached a critical point that these algorithms revealed their potential during the ImageNet challenge of 2012, leading to a paradigm shift in visual recogntion. The first contribution of this thesis is a transfer learning setup with a Convolutional Neural Network for image classification. Using a pre-training procedure, we show that image representations learned in a network generalize to other recognition tasks, and their performance scales up with the amount of data used in pre-training. The second contribution of this thesis is a weakly supervised setup for image classification that can predict the location of objects in complex cluttered scenes, based on a dataset indicating only with the presence or absence of objects in training images. The third contribution of this thesis aims at finding possible paths for progress in unsupervised learning with neural networks. We study the recent trend of Generative Adversarial Networks and propose two-sample tests for evaluating models. We investigate possible links with concepts related to causality, and propose a two-sample test method for the task of causal discovery. Finally, building on a recent connection with optimal transport, we investigate what these generative algorithms are learning from unlabeled data
Dufraux, Adrien. « Exploitation de transcriptions bruitées pour la reconnaissance automatique de la parole ». Electronic Thesis or Diss., Université de Lorraine, 2022. http://www.theses.fr/2022LORR0032.
Texte intégralUsual methods to design automatic speech recognition systems require speech datasets with high quality transcriptions. These datasets are composed of the acoustic signals uttered by speakers and the corresponding word-level transcripts representing what is being said. It takes several thousand hours of transcribed speech to build a good speech recognition model. The dataset must include a variety of speakers recorded in different situations in order to cover the wide variability of speech and language. To create such a system, human annotators are asked to listen to audio tracks and to write down the corresponding text. This process is costly and can lead to errors. What is beeing said in realistic settings is indeed not always easy to understand. Poorly transcribed signals cause a drop of performance of the acoustic model. To improve the quality of the transcripts, the same utterances may be transcribed by several people, but this leads to an even more expensive process.This thesis takes the opposite view. We design algorithms which can exploit datasets with “noisy” transcriptions i.e., which contain errors. The main goal of this thesis is to reduce the costs of building an automatic speech recognition system by limiting the performance drop induced by these errors.We first introduce the Lead2Gold algorithm. Lead2Gold is based on a cost function that is tolerant to datasets with noisy transcriptions. We model transcription errors at the letter level with a noise model. For each transcript in the dataset, the algorithm searches for a set of likely better transcripts relying on a beam search in a graph. This technique is usually not used to design cost functions. We show that it is possible to explicitly add new elements (here a noise model) to design complex cost functions.We then express the Lead2Gold loss in the wFST formalism. wFSTs are graphs whose edges are weighted and represent symbols. To build flexible cost functions we can compose several graphs. With our proposal, it becomes easier to add new elements, such as a lexicon, to better characterize good transcriptions. We show that using wFSTs is a good alternative to using Lead2Gold's explicit beam search. The modular formulation allows us to design a new variety of cost functions that model transcription errors.Finally, we conduct a data collection experiment in real conditions. We observe different types of annotator profiles. Annotators do not have the same perception of acoustic signals and hence can produce different types of errors. The explicit goal of this experiment is to collect transcripts with errors and to prove the usefulness of modeling these errors
Cinbis, Ramazan Gokberk. « Classification d'images et localisation d'objets par des méthodes de type noyau de Fisher ». Phd thesis, Université de Grenoble, 2014. http://tel.archives-ouvertes.fr/tel-01071581.
Texte intégralPécheux, Nicolas. « Modèles exponentiels et contraintes sur les espaces de recherche en traduction automatique et pour le transfert cross-lingue ». Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS242/document.
Texte intégralMost natural language processing tasks are modeled as prediction problems where one aims at finding the best scoring hypothesis from a very large pool of possible outputs. Even if algorithms are designed to leverage some kind of structure, the output space is often too large to be searched exaustively. This work aims at understanding the importance of the search space and the possible use of constraints to reduce it in size and complexity. We report in this thesis three case studies which highlight the risk and benefits of manipulating the seach space in learning and inference.When information about the possible outputs of a sequence labeling task is available, it may seem appropriate to include this knowledge into the system, so as to facilitate and speed-up learning and inference. A case study on type constraints for CRFs however shows that using such constraints at training time is likely to drastically reduce performance, even when these constraints are both correct and useful at decoding.On the other side, we also consider possible relaxations of the supervision space, as in the case of learning with latent variables, or when only partial supervision is available, which we cast as ambiguous learning. Such weakly supervised methods, together with cross-lingual transfer and dictionary crawling techniques, allow us to develop natural language processing tools for under-resourced languages. Word order differences between languages pose several combinatorial challenges to machine translation and the constraints on word reorderings have a great impact on the set of potential translations that is explored during search. We study reordering constraints that allow to restrict the factorial space of permutations and explore the impact of the reordering search space design on machine translation performance. However, we show that even though it might be desirable to design better reordering spaces, model and search errors seem yet to be the most important issues
Guillaumin, Matthieu. « Données multimodales pour l'analyse d'image ». Phd thesis, Grenoble, 2010. http://www.theses.fr/2010GRENM048.
Texte intégralThis dissertation delves into the use of textual metadata for image understanding. We seek to exploit this additional textual information as weak supervision to improve the learning of recognition models. There is a recent and growing interest for methods that exploit such data because they can potentially alleviate the need for manual annotation, which is a costly and time-consuming process. We focus on two types of visual data with associated textual information. First, we exploit news images that come with descriptive captions to address several face related tasks, including face verification, which is the task of deciding whether two images depict the same individual, and face naming, the problem of associating faces in a data set to their correct names. Second, we consider data consisting of images with user tags. We explore models for automatically predicting tags for new images, i. E. Image auto-annotation, which can also used for keyword-based image search. We also study a multimodal semi-supervised learning scenario for image categorisation. In this setting, the tags are assumed to be present in both labelled and unlabelled training data, while they are absent from the test data. Our work builds on the observation that most of these tasks can be solved if perfectly adequate similarity measures are used. We therefore introduce novel approaches that involve metric learning, nearest neighbour models and graph-based methods to learn, from the visual and textual data, task-specific similarities. For faces, our similarities focus on the identities of the individuals while, for images, they address more general semantic visual concepts. Experimentally, our approaches achieve state-of-the-art results on several standard and challenging data sets. On both types of data, we clearly show that learning using additional textual information improves the performance of visual recognition systems
Guillaumin, Matthieu. « Données multimodales pour l'analyse d'image ». Phd thesis, Grenoble, 2010. http://tel.archives-ouvertes.fr/tel-00522278/en/.
Texte intégralPatrini, Giorgio. « Weakly supervised learning via statistical sufficiency ». Phd thesis, 2016. http://hdl.handle.net/1885/117067.
Texte intégralHuang, Gary B. « Weakly supervised learning for unconstrained face processing ». 2012. https://scholarworks.umass.edu/dissertations/AAI3518242.
Texte intégralShen, Tong. « Context Learning and Weakly Supervised Learning for Semantic Segmentation ». Thesis, 2018. http://hdl.handle.net/2440/120354.
Texte intégralThesis (Ph.D.) -- University of Adelaide, School of Computer Science, 2018
Liu, Jen-Yu, et 劉任瑜. « Weakly-supervised Event Detection for Music Audios andVideos Using Fully-convolutional Networks ». Thesis, 2018. http://ndltd.ncl.edu.tw/handle/3n7ebz.
Texte intégral國立臺灣大學
電機工程學研究所
106
With the growing of audio and video streaming services, music audios and videos are among the most popular sources for entertainment in recent days. There are rich information in music and music playing. In order to automatically analyze these audios and videos for further retrieval or pedagogical purpose, we may want to use machine learning to help with detecting audio and visual events. However, learning-based methods usually require a large amount of training data. In audios and videos, annotating these data are not easy because the process is time-consuming and tedious. In this work, we will see how to train such detection models with only clip-level annotations with weakly-supervised learning. We will use fully-convolutional networks (FCNs) for event detection in music audios and videos. First, we will develop FCNs for temporally detecting music audio events such as genres, instruments, and moods, which will be evaluated on an instrument dataset. Second, we will develop a weakly-supervised framework for detecting instrument-playing actions in videos. The learning framework involves two auxiliary models, a sound model and an object model, which are trained using clip-level annotations only. They will provide supervisions temporally and spatially for the action model. In total 5,400 annotated frames will be used to evaluate the performance of the proposed framework. The proposed framework largely improves the performance temporally and spatially.