Theses on the topic "Representation learning (artificial intelligence)"
Consult the top 50 theses for your research on the topic "Representation learning (artificial intelligence)".
Li, Hao. « Towards Fast and Efficient Representation Learning ». Thesis, University of Maryland, College Park, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10845690.
The success of deep learning and convolutional neural networks in many fields is accompanied by a significant increase in computation cost. With increasing model complexity and the pervasive use of deep neural networks, there is a surge of interest in fast and efficient model training and inference on both cloud and embedded devices. Meanwhile, understanding the reasons for trainability and generalization is fundamental to further development. This dissertation explores approaches for fast and efficient representation learning, together with a better understanding of trainability and generalization. In particular, we ask the following questions and provide our solutions: 1) How can the computation cost be reduced for fast inference? 2) How can low-precision models be trained on resource-constrained devices? 3) What does the loss surface of neural nets look like, and how does it affect generalization?
To reduce the computation cost for fast inference, we propose to prune filters from CNNs that are identified as having a small effect on the prediction accuracy. By removing filters with small norms together with their connected feature maps, the computation cost can be reduced accordingly without special software or hardware. We show that a simple filter-pruning approach can reduce the inference cost while regaining close to the original accuracy by retraining the networks.
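The norm-based criterion described above can be sketched in a few lines (a minimal illustration, not the dissertation's actual implementation; the layer shape and pruning ratio are invented for the example): rank filters by their L1 norm and drop the smallest ones together with their output feature maps.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical conv layer: 8 filters over 3 input channels, 3x3 kernels
weights = rng.normal(size=(8, 3, 3, 3))

# L1 norm of each filter, used as a proxy for its effect on the output
norms = np.abs(weights.reshape(weights.shape[0], -1)).sum(axis=1)

# Keep the 6 filters with the largest norms (indices sorted to preserve order)
keep = np.sort(np.argsort(norms)[-6:])
pruned = weights[keep]
print(pruned.shape)  # (6, 3, 3, 3)
```

In a real network the matching input channels of the next layer's kernels must be removed as well, and accuracy is then recovered by retraining.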
To further reduce the inference cost, quantizing model parameters with low-precision representations has shown significant speedups, especially for edge devices with limited computing resources, memory capacity, and power budgets. To enable on-device learning on low-power systems, removing the dependency on a full-precision model during training is the key challenge. We study various quantized training methods with the goal of understanding the differences in behavior, and the reasons for success or failure. We address the question of why algorithms that maintain floating-point representations work so well, while fully quantized training methods stall before training is complete. We show that training algorithms that exploit high-precision representations have an important greedy search phase that purely quantized training methods lack, which explains the difficulty of training with low-precision arithmetic.
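A generic symmetric uniform quantizer illustrates the low-precision representations discussed above (a sketch with invented weights and bit-width, not the dissertation's training algorithms): small values round to zero, which hints at why a high-precision accumulator matters during training.

```python
import numpy as np

def quantize(x, bits):
    """Symmetric uniform quantization of an array to the given bit-width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.round(x / scale) * scale

w = np.array([0.31, -0.82, 0.05, 0.77])
w_q = quantize(w, bits=4)

# The smallest weight collapses to zero: with a high-precision accumulator,
# tiny gradient updates can add up before being rounded away; fully
# quantized training loses them immediately.
print(w_q)
```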
Finally, we explore the structure of neural loss functions and the effect of loss landscapes on generalization, using a range of visualization methods. We introduce a simple filter normalization method that helps us visualize loss function curvature and make meaningful side-by-side comparisons between loss functions. When this visualization is used, the sharpness of minimizers correlates well with generalization error. Then, using a variety of visualizations, we explore how training hyper-parameters affect the shape of minimizers, and how network architecture affects the loss landscape.
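The filter normalization idea can be sketched as follows (an illustrative reconstruction with invented shapes; the thesis applies it to full networks): each filter of a random direction is rescaled to the norm of the corresponding trained filter, so that loss plots along the direction are comparable across networks.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = rng.normal(size=(4, 9))   # trained filters, flattened (4 filters)
d = rng.normal(size=(4, 9))       # random direction in parameter space

# Filter-wise normalization: match each direction filter's norm to the
# norm of the corresponding trained filter
d_hat = d / np.linalg.norm(d, axis=1, keepdims=True)
d_hat = d_hat * np.linalg.norm(theta, axis=1, keepdims=True)

# A 1-D loss slice would then evaluate loss(theta + alpha * d_hat)
alphas = np.linspace(-1.0, 1.0, 5)
slice_points = [theta + a * d_hat for a in alphas]
```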
Denize, Julien. « Self-supervised representation learning and applications to image and video analysis ». Electronic Thesis or Diss., Normandie, 2023. http://www.theses.fr/2023NORMIR37.
In this thesis, we develop approaches to self-supervised learning for image and video analysis. Self-supervised representation learning makes it possible to pretrain neural networks to learn general concepts without labels before specializing in downstream tasks faster and with few annotations. We present three contributions to self-supervised image and video representation learning. First, we introduce the theoretical paradigm of soft contrastive learning and its practical implementation, Similarity Contrastive Estimation (SCE), connecting contrastive and relational learning for image representation. Second, SCE is extended to global temporal video representation learning. Lastly, we propose COMEDIAN, a pipeline for local-temporal video representation learning for transformers. These contributions achieved state-of-the-art results on multiple benchmarks and led to several academic and technical publications.
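The soft-contrastive idea behind SCE can be caricatured in a few lines (a loose sketch with invented numbers, not the published formulation): the training target mixes a hard one-hot contrastive target with a relational similarity distribution over candidates.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Similarities between a query and candidate embeddings (positive at index 0)
sim = np.array([0.9, 0.7, 0.1, -0.2])
one_hot = np.array([1.0, 0.0, 0.0, 0.0])

lam, tau = 0.5, 0.1
relational = softmax(sim / tau)               # soft relational distribution
target = lam * one_hot + (1 - lam) * relational

pred = softmax(sim / tau)
loss = -(target * np.log(pred)).sum()         # cross-entropy vs. soft target
```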
Aboul-Enien, Hisham Abdel-Ghaffer. « Neural network learning and knowledge representation in a multi-agent system ». Thesis, Imperial College London, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.252040.
Carvalho, Micael. « Deep representation spaces ». Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS292.
In recent years, Deep Learning techniques have swept the state of the art in many applications of Machine Learning, becoming the new standard approach. The architectures issued from these techniques have been used for transfer learning, which extended the power of deep models to tasks that did not have enough data to train them from scratch. This thesis studies the representation spaces created by deep architectures. First, we study properties inherent to them, with particular interest in the dimensionality, redundancy, and precision of their features. Our findings reveal a strong degree of robustness, pointing the way to simple and powerful compression schemes. Then, we focus on refining these representations. We adopt a cross-modal multi-task problem and design a loss function capable of taking advantage of data coming from multiple modalities, while also taking into account different tasks associated with the same dataset. In order to correctly balance these losses, we also develop a new sampling scheme that only takes into account examples contributing to the learning phase, i.e. those having a positive loss. Finally, we test our approach on a large-scale dataset of cooking recipes and associated pictures. Our method achieves a 5-fold improvement over the state of the art, and we show that the multi-task aspect of our approach promotes a semantically meaningful organization of the representation space, allowing it to perform subtasks never seen during training, such as ingredient exclusion and selection. The results we present in this thesis open many possibilities, including feature compression for remote applications, robust multi-modal and multi-task learning, and feature space refinement.
For the cooking application in particular, many of our findings are directly applicable in a real-world context, especially for detecting allergens, finding alternative recipes under dietary restrictions, and menu planning.
Newman-Griffis, Denis R. « Capturing Domain Semantics with Representation Learning : Applications to Health and Function ». The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587658607378958.
Cao, Xi Hang. « On Leveraging Representation Learning Techniques for Data Analytics in Biomedical Informatics ». Diss., Temple University Libraries, 2019. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/586006.
Ph.D.
Representation Learning is ubiquitous in the state-of-the-art machine learning workflow, including data exploration/visualization, data preprocessing, data model learning, and model interpretation. However, the majority of newly proposed Representation Learning methods are more suitable for problems with a large amount of data, and applying them to problems with limited data may lead to unsatisfactory performance. Therefore, there is a need for Representation Learning methods tailored to problems with “small data”, such as clinical and biomedical data analytics. In this dissertation, we describe our studies tackling challenging clinical and biomedical data analytics problems from four perspectives: data preprocessing, temporal data representation learning, output representation learning, and joint input-output representation learning. Data scaling is an important component of data preprocessing. The objective in data scaling is to scale/transform the raw features into reasonable ranges such that each feature of an instance is equally exploited by the machine learning model. For example, in a credit-fraud detection task, a machine learning model may use a person's credit score and annual income as features, but because the ranges of these two features differ, the model may weigh one more heavily than the other. In this dissertation, I thoroughly introduce the data scaling problem and describe an approach that intrinsically handles outliers and leads to better model prediction performance. Learning new representations for data in unstandardized form is a common task in data analytics and data science applications. Usually, data come in tabular form: the data is represented by a table in which each row is a feature vector of an instance.
However, it is also common that the data are not in this form, for example texts, images, and video/audio records. In this dissertation, I describe the challenge of analyzing imperfect multivariate time series data in healthcare and biomedical research, and show that the proposed method can learn a powerful representation that accommodates various imperfections and improves prediction performance. Learning output representations is a newer aspect of Representation Learning, and its applications have shown promising results in complex tasks, including computer vision and recommendation systems. The main objective of an output representation algorithm is to explore the relationships among the target variables, such that a prediction model can efficiently exploit the similarities and potentially improve prediction performance. In this dissertation, I describe a learning framework that incorporates output representation learning into time-to-event estimation. In particular, the approach learns the model parameters and time vectors simultaneously. Experimental results not only show the effectiveness of this approach but also its interpretability, through visualizations of the time vectors in 2-D space. Learning the input (feature) representation, the output representation, and the predictive model are closely related, so it is a natural extension of the state of the art to consider them together in a joint framework. In this dissertation, I describe a large-margin ranking-based learning framework for time-to-event estimation with joint input embedding learning, output embedding learning, and model parameter learning. In the framework, I cast the functional learning problem as a kernel learning problem and, adopting the theory of Multiple Kernel Learning, propose an efficient optimization algorithm. Empirical results also show its effectiveness on several benchmark datasets.
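The outlier-robust scaling idea mentioned above can be illustrated with a median/IQR scaler (a generic sketch, not the dissertation's proposed method; the income figures are invented):

```python
import numpy as np

def robust_scale(x):
    """Center by the median and scale by the interquartile range,
    so a few extreme values barely affect the transform."""
    med = np.median(x)
    q1, q3 = np.percentile(x, [25, 75])
    return (x - med) / (q3 - q1)

incomes = np.array([38_000, 42_000, 45_000, 51_000, 2_000_000.0])
scaled = robust_scale(incomes)

# The extreme outlier no longer dominates the scale of ordinary values
print(scaled)
```

A min-max or mean/std scaler on the same data would squash the four ordinary incomes into a tiny interval; the median/IQR version keeps them spread out.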
Temple University--Theses
Panesar, Kulvinder. « Conversational artificial intelligence - demystifying statistical vs linguistic NLP solutions ». Universitat Politècnica de València, 2020. http://hdl.handle.net/10454/18121.
This paper aims to demystify the hype and attention around chatbots and their association with conversational artificial intelligence. Both are slowly emerging as a real presence in our lives, driven by impressive technological developments in machine learning, deep learning and natural language understanding. However, what is under the hood, and how far and to what extent can chatbot and conversational-AI solutions work, is our question. Natural language is the knowledge representation most easily understood by people, but certainly not the best for computers, because of its inherently ambiguous, complex and dynamic nature. We critique the knowledge representation of heavily statistical chatbot solutions against linguistic alternatives. In order to react intelligently to the user, natural language solutions must critically consider other factors such as context, memory, intelligent understanding, previous experience, and personalized knowledge of the user. We delve into the spectrum of conversational interfaces and focus on a strong artificial intelligence concept. This is explored via a text-based conversational software agent with a deep strategic role: to hold a conversation, enable the mechanisms needed to plan and decide what to do next, and manage the dialogue to achieve a goal. To demonstrate this, a deep linguistically aware and knowledge-aware text-based conversational agent (LING-CSA) presents a proof of concept of a non-statistical conversational AI solution.
Tamaazousti, Youssef. « Vers l’universalité des représentations visuelle et multimodales ». Thesis, Université Paris-Saclay (ComUE), 2018. http://www.theses.fr/2018SACLC038/document.
Because of its key societal, economic and cultural stakes, Artificial Intelligence (AI) is a hot topic. One of its main goals is to develop systems that facilitate the daily life of humans, with applications such as household robots, industrial robots, autonomous vehicles and much more. The rise of AI is largely due to the emergence of tools based on deep neural networks, which make it possible to simultaneously learn the representation of the data (traditionally hand-crafted) and the task to solve (traditionally learned with statistical models). This resulted from the conjunction of theoretical advances, growing computational capacity, and the availability of much annotated data. A long-standing goal of AI is to design machines inspired by humans, capable of perceiving the world and interacting with humans in an evolutionary way. In this thesis, we categorize work on AI into the two following learning approaches: (i) Specialization: learn representations from a few specific tasks with the goal of carrying out very specific tasks (specialized in a certain field) with a very good level of performance; (ii) Universality: learn representations from several general tasks with the goal of performing as many tasks as possible in different contexts. While specialization has been extensively explored by the deep-learning community, only a few implicit attempts have been made towards universality. Thus, the goal of this thesis is to explicitly address the problem of improving universality with deep-learning methods, for image and text data. We address this topic of universality in two forms: through the implementation of methods to improve universality (“universalizing methods”); and through the establishment of a protocol to quantify universality.
Concerning universalizing methods, we propose three technical contributions: (i) in the context of large semantic representations, a method to reduce redundancy between detectors through adaptive thresholding and the relations between concepts; (ii) in the context of neural-network representations, an approach that increases the number of detectors without increasing the amount of annotated data; (iii) in the context of multimodal representations, a method to preserve the semantics of unimodal representations in multimodal ones. Regarding the quantification of universality, we propose to evaluate universalizing methods in a transfer-learning scheme, since this scheme is well suited to assessing the universal ability of representations. This also led us to propose a new framework as well as new quantitative evaluation criteria for universalizing methods.
Liu, Xudong. « MODELING, LEARNING AND REASONING ABOUT PREFERENCE TREES OVER COMBINATORIAL DOMAINS ». UKnowledge, 2016. http://uknowledge.uky.edu/cs_etds/43.
Cleland, Benjamin George. « Reinforcement Learning for Racecar Control ». The University of Waikato, 2006. http://hdl.handle.net/10289/2507.
Kim, Seungyeon. « Novel document representations based on labels and sequential information ». Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/53946.
Kilinc, Ismail Ozsel. « Graph-based Latent Embedding, Annotation and Representation Learning in Neural Networks for Semi-supervised and Unsupervised Settings ». Scholar Commons, 2017. https://scholarcommons.usf.edu/etd/7415.
Koga, Marcelo Li. « Relational transfer across reinforcement learning tasks via abstract policies ». Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/3/3141/tde-04112014-103827/.
In building intelligent agents for solving sequential decision problems, reinforcement learning is necessary when the agent does not have enough knowledge to build a complete model of the problem. However, learning an optimal policy is in general very slow, since it must be achieved through trial and error and repeated interactions of the agent with the environment. One technique to accelerate this process is transfer learning, i.e., using the knowledge acquired in solving past tasks when learning new tasks. Thus, if the tasks share similarities, the prior knowledge guides the agent toward faster learning. This work explores the use of a relational representation, which makes relations between objects and their properties explicit. This representation makes it possible to exploit abstraction and structural similarities between tasks, enabling the generalization of action policies for use in different but related tasks. This work contributes two model-free algorithms for the online construction of abstract policies: AbsSarsa(λ) and AbsProb-RL. The first builds a deterministic abstract policy through value functions, while the second builds a stochastic abstract policy through direct search in policy space. We also propose the S2L-RL agent architecture, with two levels of learning: the abstract level and the concrete level. A concrete policy is built simultaneously with an abstract policy, which can be used both to guide the agent in the current problem and to guide it in a new, future problem. Experiments with robotic navigation tasks show that these techniques are effective in improving the agent's performance, especially in the early stages of learning, when the agent is completely unfamiliar with the new problem.
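The trial-and-error loop underlying these algorithms can be sketched with plain tabular Q-learning on a tiny chain task (a generic illustration, not the thesis's AbsSarsa(λ) or AbsProb-RL; states, rewards, and hyper-parameters are invented):

```python
import numpy as np

rng = np.random.default_rng(4)
n_states, n_actions = 5, 2          # chain: move left (0) or right (1)
Q = np.zeros((n_states, n_actions))
alpha, gamma = 0.5, 0.9

for _ in range(300):                # repeated interactions with the environment
    s = 0
    while s != n_states - 1:        # rightmost state is terminal
        a = int(rng.integers(n_actions))          # random exploration policy
        s2 = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s2 == n_states - 1 else 0.0    # reward only at the goal
        target = r + gamma * Q[s2].max() * (s2 != n_states - 1)
        Q[s, a] += alpha * (target - Q[s, a])     # temporal-difference update
        s = s2

# The greedy policy learned from experience heads right in every state
policy = Q.argmax(axis=1)
```

An abstract policy in the thesis's sense would be defined over relational descriptions of states rather than over this concrete state index, which is what allows transfer between related tasks.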
Yaner, Patrick William. « From Shape to Function : Acquisition of Teleological Models from Design Drawings by Compositional Analogy ». Diss., Atlanta, Ga. : Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/19791.
Committee Chair: Goel, Ashok; Committee Member: Eastman, Charles; Committee Member: Ferguson, Ronald; Committee Member: Glasgow, Janice; Committee Member: Nersessian, Nancy; Committee Member: Ram, Ashwin.
Azizpour, Hossein. « Visual Representations and Models : From Latent SVM to Deep Learning ». Doctoral thesis, KTH, Datorseende och robotik, CVAP, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-192289.
QC 20160908
Chennupati, Nikhil. « Recommending Collaborations Using Link Prediction ». Wright State University / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=wright1621899961924795.
El-Shaer, Mennat Allah. « An Experimental Evaluation of Probabilistic Deep Networks for Real-time Traffic Scene Representation using Graphical Processing Units ». The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1546539166677894.
Jones, Joshua K. « Empirically-based self-diagnosis and repair of domain knowledge ». Diss., Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/33931.
Texte intégralTerreau, Enzo. « Apprentissage de représentations d'auteurs et d'autrices à partir de modèles de langue pour l'analyse des dynamiques d'écriture ». Electronic Thesis or Diss., Lyon 2, 2024. http://www.theses.fr/2024LYO20001.
The recent and massive democratization of digital tools has empowered individuals to generate and share information on the web through various means such as blogs, social networks, and sharing platforms. The exponential growth of available information, mostly textual data, requires the development of Natural Language Processing (NLP) models to represent it mathematically and subsequently classify, sort, or recommend it. This is the essence of representation learning, which aims to construct a low-dimensional space where the distances between projected objects (words, texts) reflect real-world distances, whether semantic, stylistic, or otherwise. The proliferation of available data, coupled with the rise in computing power and deep learning, has led to the creation of highly effective language models for word and document embeddings. These models incorporate complex semantic and linguistic concepts while remaining accessible to everyone and easily adaptable to specific tasks or corpora. They can be used to create author embeddings. However, it is challenging to determine which aspects a model will focus on to bring authors closer together or move them apart. In a literary context, it is preferable for similarities to relate primarily to writing style, which raises several issues: the definition of literary style is vague, and assessing the stylistic difference between two texts and their embeddings is complex. In computational linguistics, approaches aiming to characterize style are mainly statistical, relying on language markers. In light of this, our first contribution is a framework to evaluate the ability of language models to grasp writing style. Beforehand, we elaborate on text embedding models in machine learning and deep learning, at the word, document, and author levels, and present the treatment of the notion of literary style in Natural Language Processing, which forms the basis of our method.
Transferring knowledge between black-box large language models and these methods derived from linguistics remains a complex task. Our second contribution aims to reconcile these approaches through a representation learning model focused on style, VADES (Variational Author and Document Embedding with Style). We compare our model to state-of-the-art ones and analyze their limitations in this context. Finally, we delve into dynamic author and document embeddings. Temporal information is crucial here, allowing for a more fine-grained representation of writing dynamics. After presenting the state of the art, we elaborate on our last contribution, B²ADE (Brownian Bridge Author and Document Embedding), which models authors as trajectories. We conclude by outlining several leads for improving our methods and highlighting potential research directions for the future.
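The trajectory view behind B²ADE can be sketched with a textbook Brownian bridge (a generic illustration, not the thesis's model; dimensions, endpoints, and noise scale are invented): a path pinned at two author states, with variance that vanishes at both ends.

```python
import numpy as np

rng = np.random.default_rng(3)

def brownian_bridge(x0, xT, T, sigma=0.1):
    """Sample a Brownian bridge pinned at x0 (t=0) and xT (t=T)."""
    ts = np.arange(T + 1)
    mean = np.outer(1 - ts / T, x0) + np.outer(ts / T, xT)
    var = sigma ** 2 * ts * (T - ts) / T      # zero at both endpoints
    return mean + rng.normal(size=mean.shape) * np.sqrt(var)[:, None]

# A 2-D "author" drifting from one embedding to another over 10 steps
path = brownian_bridge(np.zeros(2), np.ones(2), T=10)
```

Intermediate documents of an author would then be modeled as noisy observations along such a pinned path rather than as independent points.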
Martelli, Thérèse. « Modelisation objet pour la representation de connaissances complexes : application au decodage acoustico-phonetique de la parole continue ». Paris, ENST, 1988. http://www.theses.fr/1988ENST0005.
Mita, Graziano. « Toward interpretable machine learning, with applications to large-scale industrial systems data ». Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS112.
The contributions presented in this work are two-fold. We first provide a general overview of explanations and interpretable machine learning, making connections with different fields, including sociology, psychology, and philosophy, and introducing a taxonomy of popular explainability approaches and evaluation methods. We subsequently focus on rule learning, a specific family of transparent models, and propose a novel rule-based classification approach based on monotone Boolean function synthesis: LIBRE. LIBRE is an ensemble method that combines the candidate rules learned by multiple bottom-up learners with a simple union, in order to obtain a final interpretable rule set. Our method overcomes most of the limitations of state-of-the-art competitors: it successfully deals with both balanced and imbalanced datasets, efficiently achieving superior performance and higher interpretability on real datasets. Interpretability of data representations constitutes the second broad contribution of this work. We restrict our attention to disentangled representation learning and, in particular, VAE-based disentanglement methods that automatically learn representations consisting of semantically meaningful features. Recent contributions have demonstrated that disentanglement is impossible in purely unsupervised settings; nevertheless, incorporating inductive biases on models and data may overcome such limitations. We present a new disentanglement method, IDVAE, with theoretical guarantees on disentanglement, deriving from the use of an optimal exponential factorized prior, conditionally dependent on auxiliary variables complementing input observations. We additionally propose a semi-supervised version of our method. Our experimental campaign on well-established datasets in the literature shows that IDVAE often beats its competitors according to several disentanglement metrics.
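The union step in LIBRE-style rule ensembles can be caricatured as follows (a toy sketch with invented rules and data, not the actual synthesis algorithm): each bottom-up learner contributes candidate rules, the ensemble is their union, and an instance is classified positive if any rule fires.

```python
# Candidate rules from two hypothetical bottom-up learners, expressed as
# monotone conjunctions over feature dictionaries
learner_a = [lambda x: x["age"] > 60 and x["bp"] > 140]
learner_b = [lambda x: x["bmi"] > 35,
             lambda x: x["age"] > 60 and x["bmi"] > 30]

rule_set = learner_a + learner_b          # simple union of candidate rules

def predict(x):
    """Positive if any rule in the interpretable rule set fires."""
    return int(any(rule(x) for rule in rule_set))

print(predict({"age": 70, "bp": 150, "bmi": 24}))  # 1: first rule fires
print(predict({"age": 40, "bp": 120, "bmi": 22}))  # 0: no rule fires
```

The appeal of such a model is that every positive prediction comes with the specific rule that triggered it, which is the interpretability the abstract refers to.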
Trinh, Viet. « CONTEXTUALIZING OBSERVATIONAL DATA FOR MODELING HUMAN PERFORMANCE ». Doctoral diss., University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2747.
Texte intégralPh.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Engineering PhD
Seroussi, Brigitte. « Alex : resolution de problemes par analogie basee sur un apprentissage de strategies par la construction dynamique d'une memoire indexee des exemples ». Paris 7, 1988. http://www.theses.fr/1988PA077153.
Sha, Long. « Representing and predicting multi-agent data in adversarial team sports ». Thesis, Queensland University of Technology, 2018. https://eprints.qut.edu.au/116506/1/Long_Sha_Thesis.pdf.
Leitner, Jürgen. « From vision to actions : Towards adaptive and autonomous humanoid robots ». Thesis, Università della Svizzera Italiana, 2014. https://eprints.qut.edu.au/90178/2/2014INFO020.pdf.
Jernite, Yacine. « Learning Representations of Text through Language and Discourse Modeling| From Characters to Sentences ». Thesis, New York University, 2018. http://pqdtopen.proquest.com/#viewpdf?dispub=10680744.
In this thesis, we consider the problem of obtaining a representation of the meaning expressed in a text. How to do so correctly remains a largely open problem, combining a number of inter-related questions (e.g. what is the role of context in interpreting text? how should language understanding models handle compositionality?). In this work, after reflecting on the notion of meaning and describing the most common sequence modeling paradigms in use in recent work, we focus on two of these questions: at what level of granularity text should be read, and what training objectives can lead models to learn useful representations of a text's meaning.
In a first part, we argue for the use of sub-word information for that purpose, and present new neural network architectures which can either process words in a way that takes advantage of morphological information, or do away with word separations altogether while still being able to identify relevant units of meaning.
The second part starts by arguing for the use of language modeling as a learning objective, and provides algorithms that address its scalability issues and propose a globally rather than locally normalized probability distribution. It then explores what makes a good language learning objective, and introduces discriminative objectives, inspired by the notion of discourse coherence, which help learn a representation of the meaning of sentences.
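The language-modeling objective discussed here can be reduced to its simplest form (a toy bigram model over an invented corpus, far from the thesis's neural models): maximize the likelihood of each token given its context, i.e. minimize the average negative log-likelihood.

```python
import numpy as np

corpus = "abababab"
vocab = sorted(set(corpus))
idx = {c: i for i, c in enumerate(vocab)}

# Bigram counts with add-one smoothing form a minimal language model
counts = np.ones((len(vocab), len(vocab)))
for a, b in zip(corpus, corpus[1:]):
    counts[idx[a], idx[b]] += 1
probs = counts / counts.sum(axis=1, keepdims=True)

# Language-modeling objective: average negative log-likelihood of next char
nll = -np.mean([np.log(probs[idx[a], idx[b]])
                for a, b in zip(corpus, corpus[1:])])
```

Neural language models replace the count table with a parameterized distribution, but the training signal is this same per-token likelihood.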
Giusti, Rafael. « Classificação de séries temporais utilizando diferentes representações de dados e ensembles ». Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/55/55134/tde-05122017-170029/.
Temporal data are ubiquitous in nearly all areas of human knowledge. The research field known as machine learning has contributed to temporal data mining with algorithms for classification, clustering, anomaly or exception detection, and motif detection, among others. These algorithms often rely on a distance function that must be capable of expressing a similarity concept among the data. One of the most important classification models, 1-NN, employs a distance function when comparing a time series of interest against a reference set, and assigns to the former the label of the most similar reference time series. There are, however, several domains in which the temporal observations alone are insufficient to characterize neighbors according to the concepts associated with the classes. One possible approach to this problem is to transform the time series into a representation domain in which the attributes meaningful to the classifier are more clearly expressed. For instance, a time series may be decomposed into periodic components of different frequencies and amplitudes. For several applications, those components are much more meaningful in discriminating the classes than the temporal evolution of the original observations. In this work, we employ diversity of representations and distance functions for the classification of time series. By choosing a data representation that is better suited to expressing the discriminating characteristics of the domain, we are able to achieve classifications that are more faithful to the target concept. With this goal in mind, we promote a study of time series representation domains, and we evaluate how such domains can provide alternative decision spaces. Different models of the 1-NN classifier are evaluated both in isolation and combined in classification ensembles in order to construct more robust classifiers.
We also use distance functions and alternative representation domains to extract non-temporal attributes, known as distance features. Distance features reflect the neighborhood relations of instances to the training samples, and they may be used to induce classification models that are typically not as effective when trained on the original time series observations. We show that distance features allow for classification results compatible with the state of the art.
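The representation-diversity argument can be made concrete with a frequency-domain 1-NN classifier (a minimal sketch with synthetic signals, not the thesis's experimental setup): a phase shift that would confuse Euclidean 1-NN on the raw observations is invisible to Fourier magnitudes.

```python
import numpy as np

def fft_features(x):
    """Represent a series by the magnitudes of its Fourier coefficients."""
    return np.abs(np.fft.rfft(x))

t = np.linspace(0, 1, 64, endpoint=False)
train = np.stack([np.sin(2 * np.pi * 3 * t),      # class 0: 3 Hz signal
                  np.sin(2 * np.pi * 9 * t)])     # class 1: 9 Hz signal
labels = np.array([0, 1])

query = np.sin(2 * np.pi * 9 * t + 0.8)           # phase-shifted 9 Hz signal

# 1-NN in the frequency domain is invariant to the phase shift
dists = np.linalg.norm(fft_features(train) - fft_features(query), axis=1)
pred = labels[np.argmin(dists)]
print(pred)  # 1
```

The vector `dists` is itself a tiny example of distance features: the distances from an instance to the training samples can be fed to any standard classifier.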
Zhou, Bolei. « Interpretable representation learning for visual intelligence ». Thesis, Massachusetts Institute of Technology, 2018. http://hdl.handle.net/1721.1/117837.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 131-140).
Recent progress in deep neural networks for computer vision and machine learning has enabled transformative applications across robotics, healthcare, and security. However, despite their superior performance, it remains challenging to understand the inner workings of deep neural networks and explain their output predictions. This thesis investigates several novel approaches for opening up the "black box" of neural networks used in visual recognition tasks and understanding their inner working mechanism. I first show that objects and other meaningful concepts emerge as a consequence of recognizing scenes. A network-dissection approach is further introduced to automatically identify internal units as emergent concept detectors and quantify their interpretability. Then I describe an approach that can efficiently explain the output prediction for any given image, shedding light on the decision-making process of the networks and why their predictions succeed or fail. Finally, I show ongoing efforts toward learning efficient and interpretable deep representations for video event understanding, and some future directions.
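The per-image explanation approach can be sketched in the spirit of class activation mapping (a generic illustration with invented shapes and random values, not necessarily the thesis's exact method): a spatial map is obtained as the classifier-weighted sum of the final convolutional feature maps, highlighting the regions that drove the class score.

```python
import numpy as np

rng = np.random.default_rng(2)
# Hypothetical final conv feature maps (channels, h, w) and the linear
# classifier weights for one output class
fmaps = rng.random(size=(4, 7, 7))
w_class = rng.random(size=4)

# Class activation map: channel-wise weighted sum of the feature maps
cam = np.tensordot(w_class, fmaps, axes=1)
print(cam.shape)  # (7, 7)
```

Upsampled to the input resolution, such a map can be overlaid on the image to show which regions support or contradict the prediction.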
by Bolei Zhou.
Ph. D.
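The unit-interpretability score in the network-dissection line of work is commonly an intersection-over-union between a unit's thresholded activation map and a concept's segmentation mask. The sketch below is a minimal, hedged reconstruction of that idea; the quantile threshold and all names are assumptions, not the thesis's exact procedure.

```python
import numpy as np

def dissection_iou(unit_activation, concept_mask, quantile=0.99):
    """Score how well a unit's top activations align with a labelled
    concept region: threshold the activation map at a high quantile,
    then take intersection-over-union with the concept mask."""
    threshold = np.quantile(unit_activation, quantile)
    unit_mask = unit_activation >= threshold
    union = np.logical_or(unit_mask, concept_mask).sum()
    if union == 0:
        return 0.0
    return np.logical_and(unit_mask, concept_mask).sum() / union

# Synthetic unit that fires exactly on a 5x5 'concept' region.
activation = np.zeros((10, 10))
activation[:5, :5] = 1.0
concept = activation.astype(bool)
iou = dissection_iou(activation, concept)  # -> 1.0 for a perfect detector
```

A unit whose IoU with some concept exceeds a chosen cutoff is then labelled an emergent detector for that concept.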
Boots, Byron. « Spectral Approaches to Learning Predictive Representations ». Research Showcase @ CMU, 2012. http://repository.cmu.edu/dissertations/131.
Cribier-Delande, Perrine. « Contexts and user modeling through disentangled representations learning ». Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS407.
The recent, sometimes very publicised, successes have drawn a lot of attention to Deep Learning (DL). Many questions are asked about the limitations of these techniques. The great strength of DL is its ability to learn representations of complex objects. Renault, as a car manufacturer, has a vested interest in discovering how their cars are used. Learning representations of drivers is one of their long-term goals. Renault's strength partly lies in their knowledge of cars and the data they use and produce. This data is almost entirely contained in the Controller Area Network (CAN). However, the CAN data only contains the inner workings of a car and not its surroundings. As many factors exterior to the driver and the car (such as weather, other road users, road condition...) can affect driving, we must find a way to disentangle them. Seeing the user (or driver) as just another context allowed us to use context modelling approaches. By transferring disentanglement approaches used in computer vision, we were able to develop models that learn disentangled representations of contexts. We tested these models with a few public datasets of time series with clearly labelled contexts. Using only forecasting as supervision during training, our models are able to generate data only from the learned representations of contexts. They even learn to represent new contexts, only seen after training. We then transferred the developed models to CAN data and were able to confirm that information about driving contexts (including driver's identity) is indeed contained in the CAN
Paudel, Subodh. « Methodology to estimate building energy consumption using artificial intelligence ». Thesis, Nantes, Ecole des Mines, 2016. http://www.theses.fr/2016EMNA0237/document.
High-energy-efficiency building standards (such as the Low Energy Building, LEB) intended to improve building consumption have drawn significant attention. Building standards basically focus on improving the thermal performance of the envelope and on high heat capacity, thus creating a higher thermal inertia. However, the LEB concept introduces a large time constant as well as a large heat capacity, resulting in a slower rate of heat transfer between the interior of the building and the outdoor environment. Therefore, it is challenging to estimate and predict thermal energy demand for such LEBs. This work focuses on artificial intelligence (AI) models to predict the energy consumption of LEBs. We consider two kinds of AI modeling approaches: “all data” and “relevant data”. The “all data” approach uses all available data, while “relevant data” uses a small representative day dataset and addresses the complexity of building non-linear dynamics by introducing the impact of past days' climatic behavior. This extraction is based either on simple physical understanding: Heating Degree Day (HDD) and modified HDD, or on pattern recognition methods: Fréchet Distance and Dynamic Time Warping (DTW). Four AI techniques have been considered: Artificial Neural Network (ANN), Support Vector Machine (SVM), Boosted Ensemble Decision Tree (BEDT) and Random Forest (RF). In a first part, numerical simulations for six buildings (heat demand in the range [25 – 85 kWh/m².yr]) have been performed. The “relevant data” approach with (DTW, SVM) shows the best results. Real data from the building “Ecole des Mines de Nantes” proves the approach is still relevant
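Of the physical day-selection indices mentioned above, the classical Heating Degree Day is the simplest to state. A minimal sketch, assuming the common 18 °C base temperature; the thesis's modified HDD will differ.

```python
def heating_degree_days(daily_mean_temps, base_temp=18.0):
    """Classical HDD index: sum of the positive gaps between a base
    (comfort) temperature and each daily mean outdoor temperature."""
    return sum(max(base_temp - t, 0.0) for t in daily_mean_temps)

# Days at 10, 20 and 15 degrees C against an 18 degree base give
# gaps of 8, 0 and 3 degree-days.
hdd = heating_degree_days([10.0, 20.0, 15.0])  # -> 11.0
```

Days with similar HDD values can then serve as the "relevant" representative days for training.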
Kim, Joo-Kyung. « Linguistic Knowledge Transfer for Enriching Vector Representations ». The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1500571436042414.
Zaiem, Mohamed Salah. « Informed Speech Self-supervised Representation Learning ». Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAT009.
Feature learning has been driving machine learning advances, with recently proposed methods progressively getting rid of handcrafted parts within the transformations from inputs to desired labels. Self-supervised learning has emerged within this context, allowing the processing of unlabeled data towards better performance on low-labeled tasks. The first part of my doctoral work is aimed at motivating the choices in the speech self-supervised pipelines learning the unsupervised representations. In this thesis, I first show how conditional-independence-based scoring can be used to efficiently and optimally select pretraining tasks tailored for the best performance on a target task. The second part of my doctoral work studies the evaluation and usage of pretrained self-supervised representations. I explore, first, the robustness of current speech self-supervision benchmarks to changes in the downstream modeling choices. I propose, second, fine-tuning approaches for better efficiency and generalization
Ceccon, Stefano. « Extending Bayesian network models for mining and classification of glaucoma ». Thesis, Brunel University, 2013. http://bura.brunel.ac.uk/handle/2438/8051.
Hernández-Vela, Antonio. « From pixels to gestures : learning visual representations for human analysis in color and depth data sequences ». Doctoral thesis, Universitat de Barcelona, 2015. http://hdl.handle.net/10803/292488.
The visual analysis of people in images is a very important research topic, given its relevance to a large number of computer vision applications, such as pedestrian detection, monitoring and surveillance, human-computer interaction, e-health or content-based image retrieval systems, among others. In this thesis we want to learn different visual representations of the human body that are useful for the visual analysis of people in images and videos. To that end, we analyze different image modalities, namely RGB color images and depth images, and address the problem at different levels of abstraction, from pixels to gestures: person segmentation, human pose estimation and gesture recognition. First, we show how binary (object vs. background) segmentation of the human body in image sequences helps remove noise belonging to the background of the scene. The presented method, based on graph-cuts optimization, enforces spatio-temporal consistency on the segmentation masks obtained in consecutive frames. Second, we present a methodological framework for multi-class segmentation, with which we can obtain a more detailed description of the human body: instead of a simple binary representation separating the human body from the background, we can obtain more detailed segmentation masks, separating and categorizing the different parts of the body. At a higher level of abstraction, we aim to obtain simpler yet sufficiently descriptive representations of the human body. Human pose estimation methods are often based on skeletal models of the human body, formed by segments (or rectangles) that represent the body limbs, connected to one another following the kinematic constraints of the human body.
In practice, these skeletal models must satisfy certain constraints in order to apply inference methods that find the optimal solution efficiently, yet at the same time these constraints impose a strong limitation on the expressiveness the models are able to capture. To address this problem, we propose a top-down approach to predict the position of the body parts of the skeletal model, introducing a mid-level part representation based on Poselets. Finally, we propose a methodological framework for gesture recognition based on bag-of-visual-words. We exploit the advantages of RGB and depth images by combining modality-specific visual vocabularies through late fusion. We propose a new rotation-invariant depth descriptor that improves on the state of the art, and we use spatio-temporal pyramids to capture some of the spatial and temporal structure of the gestures. Additionally, we present a probabilistic reformulation of the Dynamic Time Warping method for gesture recognition in image sequences. More specifically, we model gestures with a Gaussian probabilistic model that implicitly encodes possible deformations in both the spatial and the temporal domains.
Ben-Younes, Hedi. « Multi-modal representation learning towards visual reasoning ». Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS173.
The quantity of images that populate the Internet is dramatically increasing. It becomes of critical importance to develop the technology for a precise and automatic understanding of visual contents. As image recognition systems are becoming more and more capable, researchers in artificial intelligence now seek next-generation vision systems that can perform high-level scene understanding. In this thesis, we are interested in Visual Question Answering (VQA), which consists in building models that answer any natural language question about any image. Because of its nature and complexity, VQA is often considered as a proxy for visual reasoning. Classically, VQA architectures are designed as trainable systems that are provided with images, questions about them and their answers. To tackle this problem, typical approaches involve modern Deep Learning (DL) techniques. In the first part, we focus on developing multi-modal fusion strategies to model the interactions between image and question representations. More specifically, we explore bilinear fusion models and exploit concepts from tensor analysis to provide tractable and expressive factorizations of parameters. These fusion mechanisms are studied under the widely used visual attention framework: the answer to the question is provided by focusing only on the relevant image regions. In the last part, we move away from the attention mechanism and build a more advanced scene understanding architecture where we consider objects and their spatial and semantic relations. All models are thoroughly experimentally evaluated on standard datasets and the results are competitive with the literature
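The bilinear-fusion-with-tensor-factorization idea can be made concrete. The sketch below is a generic low-rank factorization in the spirit of what the abstract describes, not the thesis's exact model; all dimensions are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
dq, dv, dout, rank = 6, 5, 4, 3   # toy embedding sizes

# Low-rank factors standing in for a full (dout, dq, dv) bilinear tensor.
U = rng.normal(size=(dout, rank, dq))
V = rng.normal(size=(dout, rank, dv))

def bilinear_fuse(q, v):
    """Rank-constrained bilinear fusion of a question embedding q and an
    image embedding v: output dim k is sum_r (U[k,r] . q) * (V[k,r] . v),
    a factorized form of the full bilinear score q^T W_k v."""
    return ((U @ q) * (V @ v)).sum(axis=1)

q, v = rng.normal(size=dq), rng.normal(size=dv)
z = bilinear_fuse(q, v)

# Sanity check: the factorized form matches the full bilinear tensor.
W = np.einsum('krd,kre->kde', U, V)
z_full = np.einsum('kde,d,e->k', W, q, v)
assert np.allclose(z, z_full)
```

The factorized form needs `dout * rank * (dq + dv)` parameters instead of `dout * dq * dv` for the full tensor, which is what makes such fusions tractable.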
Boufoy-Bastick, Zacharyas Amaury. « Internet democracy : the political science and computer science of direct democracy at the large scale ». Thesis, Paris 4, 2014. http://www.theses.fr/2014PA040182.
Representative democracy suffers from numerous shortcomings that are so significant they bring into question the very legitimacy of modern democratic governments. While direct representation might theoretically eliminate these multiple defects, it has until now been considered unworkable due to limitations of space and of time. This thesis addresses these deficiencies by introducing Internet Democracy, which is distinct from existing e-democracy and e-government. Internet Democracy is an operational, computational formulation of democratic representation. To support this contribution, this thesis first derives the problems of democracy and indirect representation from first principles. It then proposes a new approach (the symbiotic structural approach) which applies the Internet to democracy. It then supports the proposition that Internet Democracy can operate through the analysis of passively collected data on information access and on information production (for instance, using sentiment analysis). Finally, it makes numerous topical contributions to computer science based on the observation that sentiment analysis hits a ceiling of accuracy which cannot currently be transcended. These contributions range from suggesting an Asymmetric Opinion Proposition (AOP) and applying this to a Sentiment Space describing the computational structure of sentiment; developing the first extremely fine-grained dataset for sentiment analysis; and applying Sentiment Space to develop the original ‘Split-Fit’ computing method which increases the accuracy of machine learning based Sentiment Analysis
Adapa, Supriya. « TensorFlow Federated Learning : Application to Decentralized Data ». Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.
Duminy, Willem H. « A learning framework for zero-knowledge game playing agents ». Pretoria : [s.n.], 2006. http://upetd.up.ac.za/thesis/available/etd-10172007-153836.
Texte intégralWilhelmi, Roca Francesc. « Towards spatial reuse in future wireless local area networks : a sequential learning approach ». Doctoral thesis, Universitat Pompeu Fabra, 2020. http://hdl.handle.net/10803/669970.
The Spatial Reuse (SR) operation is gaining momentum in the latest family of IEEE 802.11 standards, owing to the overwhelming requirements of next-generation wireless networks. In particular, the growing traffic demand and the number of concurrent devices compromise the efficiency of increasingly crowded wireless local area networks (WLANs) and call their decentralized nature into question. The SR operation, initially introduced by the IEEE 802.11ax-2021 standard and later studied in IEEE 802.11be-2024, aims to increase the number of concurrent transmissions in an overlapping basic service set (OBSS) by adjusting sensitivity and transmit power control, thereby improving spectral efficiency. Our study of the SR operation shows outstanding potential for increasing the number of simultaneous transmissions in crowded deployments, thus contributing to the development of low-latency next-generation applications. However, the potential benefits of SR are currently limited by the rigidity of the mechanism introduced for 11ax and the lack of coordination among the BSSs that implement it. The SR operation is evolving towards coordinated schemes in which different BSSs cooperate. Coordination, however, entails communication and synchronization overhead, which impacts WLAN performance. Moreover, the coordinated scheme is incompatible with devices using earlier IEEE 802.11 versions, which could degrade the performance of already deployed networks. For these reasons, this thesis assesses the feasibility of decentralized mechanisms for SR and thoroughly analyzes the main impediments and shortcomings that may arise.
Our goal is to shed light on the future shape of WLANs with respect to SR optimization: whether their decentralized nature should be kept, or whether it is preferable to evolve towards coordinated, centralized deployments. To address SR in a decentralized manner, we focus on Artificial Intelligence (AI) and propose using a class of sequential learning-based methods called Multi-Armed Bandits (MABs). The MAB framework suits the decentralized SR problem because it addresses the uncertainty caused by the concurrent operation of multiple devices (i.e., a multi-player setting) and the resulting lack of information. MABs can cope with the complexity behind the spatial interactions among devices that result from modifying their sensitivity and transmit power. In this regard, our results indicate significant performance gains (up to 100%) in highly dense deployments. Nevertheless, applying multi-agent machine learning raises several issues that may compromise the performance of the devices in a network (definition of joint objectives, convergence horizon, scalability aspects, or non-stationarity). In addition, our study of multi-agent learning for SR includes infrastructure aspects for next-generation networks that intrinsically integrate AI.
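A minimal sketch of the MAB setting described above, with epsilon-greedy action selection standing in for whichever bandit policies the thesis actually evaluates. Here each arm abstractly represents one (sensitivity, transmit power) configuration, and the reward function is a toy stand-in for observed throughput.

```python
import numpy as np

def epsilon_greedy_bandit(reward_fn, n_arms, n_rounds, eps=0.1, seed=0):
    """Minimal epsilon-greedy multi-armed bandit: with probability eps
    explore a random arm, otherwise exploit the best empirical mean."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for _ in range(n_rounds):
        if rng.random() < eps:
            arm = int(rng.integers(n_arms))
        else:
            arm = int(np.argmax(means))
        r = reward_fn(arm, rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # running average
    return int(np.argmax(means))

# Toy environment: arm 2 yields the highest mean 'throughput'.
best = epsilon_greedy_bandit(
    lambda a, rng: rng.normal([0.2, 0.5, 0.9][a], 0.05), 3, 2000)
```

Each device running such a loop independently is what makes the scheme decentralized; the non-stationarity issues the abstract mentions arise because every device's reward depends on the others' current arms.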
Feutry, Clément. « Two sides of relevant information : anonymized representation through deep learning and predictor monitoring ». Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS479.
The work presented here lies, for its first part, at the intersection of deep learning and anonymization. A full framework was developed in order to identify and remove, to a certain extent and in an automated manner, the features linked to an identity in the context of image data. Two different kinds of data processing were explored. They both share the same Y-shaped network architecture, although components of this network vary according to the final purpose. The first one was about building from the ground up an anonymized representation that allowed a trade-off between keeping relevant features and tampering with private features. This framework led to a new loss. The second kind of data processing specified no relevant information about the data, only private information, meaning that everything not related to private features is assumed relevant. Therefore the anonymized representation shares the same nature as the initial data (e.g. an image is transformed into an anonymized image). This task led to another type of architecture (still Y-shaped) and provided results strongly dependent on the type of data. The second part of the work is relative to another kind of relevant information: it focuses on the monitoring of predictor behavior. In the context of black-box analysis, we only have access to the probabilities outputted by the predictor (without any knowledge of the type of structure/architecture producing these probabilities). This monitoring is done in order to detect abnormal behavior that is an indicator of a potential mismatch between the data statistics and the model statistics. Two methods are presented using different tools. The first one is based on comparing the empirical cumulative distributions of known data and of the data to be tested. The second one introduces two tools: one relying on the classifier uncertainty and the other relying on the confusion matrix. These methods produce conclusive results
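The first monitoring tool described above, comparing empirical cumulative distributions, is essentially a two-sample Kolmogorov-Smirnov distance. A hedged sketch; the thesis's exact statistic and decision thresholds may differ.

```python
import numpy as np

def ks_statistic(reference, observed):
    """Kolmogorov-Smirnov distance between the empirical CDFs of two
    score samples, e.g. a predictor's confidences on training-like
    data vs. on incoming data."""
    grid = np.sort(np.concatenate([reference, observed]))
    ecdf = lambda s, x: np.searchsorted(np.sort(s), x, side='right') / len(s)
    return float(np.max(np.abs(ecdf(reference, grid) - ecdf(observed, grid))))

rng = np.random.default_rng(0)
in_dist = rng.uniform(0.8, 1.0, 500)   # confident, in-distribution scores
same = rng.uniform(0.8, 1.0, 500)      # more of the same -> small distance
shifted = rng.uniform(0.3, 0.7, 500)   # drifted scores -> large distance
```

A large statistic on incoming scores flags a potential mismatch between data and model statistics without any access to the predictor's internals.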
Maxwell, Tricia Lesley. « Factors affecting the representation of objects in distributed attention ». Thesis, University of Sussex, 2011. http://sro.sussex.ac.uk/id/eprint/7478/.
Texte intégralMoulouel, Koussaila. « Hybrid AI approaches for context recognition : application to activity recognition and anticipation and context abnormalities handling in Ambient Intelligence environments ». Electronic Thesis or Diss., Paris Est, 2023. http://www.theses.fr/2023PESC0014.
Ambient Intelligence (AmI) systems aim to provide users with assistance services intended to improve the quality of their lives in terms of autonomy, safety, and well-being. The design of AmI systems capable of accurate, fine-grained and consistent recognition of the spatial and/or temporal user context, taking into account the uncertainty and partial observability of AmI environments, poses several challenges for better adapting the assistance services to the user's context. The purpose of this thesis is to propose a set of contributions that address these challenges. Firstly, a context ontology is proposed to model contextual knowledge in AmI environments. The purpose of this ontology is to model the user's context taking into account different context attributes and to define the commonsense reasoning axioms necessary to infer and update the user's context. The second contribution is an ontology-based hybrid framework that combines probabilistic commonsense reasoning and probabilistic planning to recognize the user's context, in particular context abnormalities, and provide context-aware assistance services in the presence of uncertainty and partial observability of the environments. This framework exploits context attribute predictions, namely the user's activity and the user's location, provided by deep learning models. In this framework, the probabilistic commonsense reasoning is based on the proposed context ontology to define the axiomatization of context inference and planning under uncertainty. Probabilistic planning is used to characterize abnormal contexts by coping with the incompleteness of contextual knowledge due to the partial observability of AmI environments. The proposed framework was evaluated using transformer and CNN-LSTM models on the Orange4Home and SIMADL datasets.
The results show the effectiveness of the framework to recognize user's contexts, in terms of user's activity and location, along with context abnormalities. Thirdly, a hybrid framework combining deep learning and probabilistic commonsense reasoning for anticipating human activities based on egocentric videos is proposed. The probabilistic commonsense reasoning exploited in this framework is based on abductive reasoning to anticipate both human atomic and composite activities, and temporal reasoning to capture context attribute changes. Deep learning models were exploited to recognize context attributes, such as objects, human hands, and human locations. The context ontology is used to model the relationships between atomic activities and composite activities. The evaluation of the framework shows its ability to anticipate composite activities over a time horizon of minutes, in contrast to state-of-the-art approaches that can only anticipate atomic activities over a time horizon of seconds. It also showed good performance in terms of accuracy of classification of anticipated activities and computation time. Lastly, a stream reasoning-based framework is proposed to anticipate atomic and composite human activities from data streams of context attributes collected on-the-fly. Deep learning models were used to recognize context attributes, such as objects used in activities, hands and user locations. The stream reasoning system performs causal, abductive and temporal reasoning with contextual knowledge obtained at run-time. Dynamic effect axioms were introduced to anticipate composite activities that can be subject to unforeseen events, such as skipping an atomic activity and delay an atomic activity. The proposed framework was validated through experiments conducted in a kitchen environment. 
The remarkably high performance in terms of the number of activity anticipations shows the ability of the framework to take into account the contextual knowledge of past episodes needed to anticipate composite activities
Feng, Qianli. « Modeling Action Intentionality in Humans and Machines ». The Ohio State University, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=osu1616769653536292.
Texte intégralSerban, Iulian Vlad. « Representation learning for dialogue systems ». Thèse, 2019. http://hdl.handle.net/1866/23440.
This thesis presents a series of steps taken towards investigating representation learning (e.g. deep learning) for building dialogue systems and conversational agents. The thesis is split into two general parts. The first part of the thesis investigates representation learning for generative dialogue models. Conditioned on a sequence of turns from a text-based dialogue, these models are tasked with generating the next, appropriate response in the dialogue. This part of the thesis focuses on sequence-to-sequence models, a class of generative deep neural networks. First, we propose the Hierarchical Recurrent Encoder-Decoder model, which is an extension of the vanilla sequence-to-sequence model incorporating the turn-taking structure of dialogues. Second, we propose the Multiresolution Recurrent Neural Network model, which is a stacked sequence-to-sequence model with an intermediate, stochastic representation (a "coarse representation") capturing the abstract semantic content communicated between the dialogue speakers. Third, we propose the Latent Variable Recurrent Encoder-Decoder model, which is a variant of the Hierarchical Recurrent Encoder-Decoder model with latent, stochastic normally-distributed variables. The latent, stochastic variables are intended for modelling the ambiguity and uncertainty occurring naturally in human language communication. The three models are evaluated and compared on two dialogue response generation tasks: a Twitter response generation task and the Ubuntu technical response generation task. The second part of the thesis investigates representation learning for a real-world reinforcement learning dialogue system. Specifically, this part focuses on the Milabot system built by the Quebec Artificial Intelligence Institute (Mila) for the Amazon Alexa Prize 2017 competition. Milabot is a system capable of conversing with humans on popular small talk topics through both speech and text.
The system consists of an ensemble of natural language retrieval and generation models, including template-based models, bag-of-words models, and variants of the models discussed in the first part of the thesis. This part of the thesis focuses on the response selection task. Given a sequence of turns from a dialogue and a set of candidate responses, the system must select an appropriate response to give the user. A model-based reinforcement learning approach, called the Bottleneck Simulator, is proposed for selecting the appropriate candidate response. The Bottleneck Simulator learns an approximate model of the environment based on observed dialogue trajectories and human crowdsourcing, while utilizing an abstract (bottleneck) state representing high-level discourse semantics. The learned environment model is then employed to learn a reinforcement learning policy through rollout simulations. The learned policy has been evaluated and compared to competing approaches through A/B testing with real-world users, where it was found to yield excellent performance.
« Multimodal Representation Learning for Visual Reasoning and Text-to-Image Translation ». Master's thesis, 2018. http://hdl.handle.net/2286/R.I.51644.
Dissertation/Thesis
Masters Thesis Computer Engineering 2018
« Knowledge Representation, Reasoning and Learning for Non-Extractive Reading Comprehension ». Doctoral diss., 2019. http://hdl.handle.net/2286/R.I.55482.
Dissertation/Thesis
Doctoral Dissertation Computer Science 2019
Racah, Evan. « Unsupervised representation learning in interactive environments ». Thèse, 2019. http://hdl.handle.net/1866/23788.
Extracting a representation of all the high-level factors of an agent's state from low-level sensory information is an important, but challenging task in machine learning. In this thesis, we will explore several unsupervised approaches for learning these state representations. We apply and analyze existing unsupervised representation learning methods in reinforcement learning environments, as well as contribute our own evaluation benchmark and our own novel state representation learning method. In the first chapter, we will overview and motivate unsupervised representation learning for machine learning in general and for reinforcement learning. We will then introduce a relatively new subfield of representation learning: self-supervised learning. We will then cover two core representation learning approaches, generative methods and discriminative methods. Specifically, we will focus on a collection of discriminative representation learning methods called contrastive unsupervised representation learning (CURL) methods. We will close the first chapter by detailing various approaches for evaluating the usefulness of representations. In the second chapter, we will present a workshop paper, where we evaluate a handful of off-the-shelf self-supervised methods in reinforcement learning problems. We discover that the performance of these representations depends heavily on the dynamics and visual structure of the environment. As such, we determine that a more systematic study of environments and methods is required. Our third chapter covers our second article, Unsupervised State Representation Learning in Atari, where we try to execute a more thorough study of representation learning methods in RL as motivated by the second chapter. To facilitate a more thorough evaluation of representations in RL we introduce a benchmark of 22 fully labelled Atari games.
In addition, we choose the representation learning methods for comparison in a more systematic way by focusing on comparing generative methods with contrastive methods, instead of the less systematically chosen off-the-shelf methods from the second chapter. Finally, we introduce a new contrastive method, ST-DIM, which excels at the 22 Atari games.
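Contrastive (CURL-style) methods of the kind compared above typically optimize an InfoNCE objective; below is a minimal numpy sketch of that loss, not the exact ST-DIM formulation, with all names illustrative.

```python
import numpy as np

def infonce_loss(anchors, positives, temperature=0.1):
    """InfoNCE objective behind contrastive representation learning:
    each anchor must pick out its own positive among the batch, with
    cosine similarity as the score; true pairs sit on the diagonal."""
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (a @ p.T) / temperature
    logits -= logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # cross-entropy on diag

rng = np.random.default_rng(0)
states = rng.normal(size=(8, 16))
# Positives close to their anchors yield a much lower loss than
# unrelated positives.
aligned = infonce_loss(states, states + 0.01 * rng.normal(size=(8, 16)))
mismatched = infonce_loss(states, rng.normal(size=(8, 16)))
```

In the RL setting, anchors and positives are typically embeddings of temporally adjacent frames or of different patches of the same observation.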
Dumoulin, Vincent. « Representation Learning for Visual Data ». Thèse, 2018. http://hdl.handle.net/1866/21140.
Hosseini, Seyedarian. « Towards learning sentence representation with self-supervision ». Thèse, 2019. http://hdl.handle.net/1866/23784.
In chapter 1, we introduce the basics of feed forward neural networks and recurrent neural networks. The chapter continues with the discussion of the backpropagation algorithm to train feed forward neural networks, and the backpropagation through time algorithm to train recurrent neural networks. We also discuss three different approaches to learning representations, namely supervised learning, unsupervised learning, and a relatively new approach called self-supervised learning. In chapter 2, we talk about the fundamentals of deep natural language processing. Specifically, we cover word representations, sentence representations, and language modelling. We focus on the evaluation and current state of the literature for these concepts. We close the chapter by discussing large-scale pre-training and transfer learning in language. In chapter 3, we investigate a set of self-supervised tasks that take advantage of noise contrastive estimation in order to learn sentence representations using unlabeled data. We train our model on a large corpus and evaluate our learned sentence representations on a set of downstream natural language tasks from the SentEval framework. Our model trained on the proposed tasks outperforms unsupervised methods on a subset of tasks from SentEval. In chapter 4, we introduce a memory-augmented model called Ordered Memory with several improvements over traditional stack-augmented recurrent neural networks. We introduce a new stick-breaking attention mechanism inspired by Ordered Neurons [Shen et al., 2019] to write into and erase from the memory. A new Gated Recursive Cell is also introduced to compose low-level representations into higher-level ones. We show that this model performs well on the logical inference task and the ListOps task, and it also shows strong generalization properties in these tasks.
Finally, we evaluate our model on the SST (Stanford Sentiment Treebank) tasks (binary and fine-grained) and report results that are comparable with state-of-the-art on these tasks.
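The stick-breaking attention mentioned above admits a compact illustration: slot i takes a fraction beta_i of whatever part of the probability "stick" the earlier slots left over. A hedged sketch of just the weight computation; the full Ordered Memory cell is far richer.

```python
import numpy as np

def stick_breaking_weights(betas):
    """Stick-breaking attention weights: p_i = beta_i * prod_{j<i} (1 - beta_j),
    so earlier slots claim their share of the stick before later ones."""
    betas = np.asarray(betas, dtype=float)
    # Fraction of the stick still available before each slot.
    leftover = np.concatenate([[1.0], np.cumprod(1.0 - betas[:-1])])
    return betas * leftover

w = stick_breaking_weights([0.5, 0.5, 1.0])
# -> [0.5, 0.25, 0.25]; a beta of 1 consumes everything that remains.
```

Because the weights are monotonically gated, the mechanism naturally mimics push and pop operations on a stack-like memory.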