A ready-made bibliography on the topic "Apprentissage par renforcement factorisé"
Create accurate citations in APA, MLA, Chicago, Harvard, and many other styles
See the lists of current journal articles, books, dissertations, abstracts, and other scholarly sources on the topic "Apprentissage par renforcement factorisé".
An "Add to bibliography" button is available next to every work in the list. Use it, and we will automatically create a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication as a .pdf file and read its abstract online, whenever these are available in the work's metadata.
Journal articles on the topic "Apprentissage par renforcement factorisé"
Degris, Thomas, Olivier Sigaud, and Pierre-Henri Wuillemin. "Apprentissage par renforcement factorisé pour le comportement de personnages non joueurs". Revue d'intelligence artificielle 23, no. 2-3 (May 13, 2009): 221–51. http://dx.doi.org/10.3166/ria.23.221-251.
Laurent, Guillaume J., and Emmanuel Piat. "Apprentissage par renforcement dans le cadre des processus décisionnels de Markov factorisés observables dans le désordre. Etude expérimentale du Q-Learning parallèle appliqué aux problèmes du labyrinthe et du New York Driving". Revue d'intelligence artificielle 20, no. 2-3 (June 1, 2006): 275–310. http://dx.doi.org/10.3166/ria.20.275-310.
Griffon, L., M. Chennaoui, D. Leger, and M. Strauss. "Apprentissage par renforcement dans la narcolepsie de type 1". Médecine du Sommeil 15, no. 1 (March 2018): 60. http://dx.doi.org/10.1016/j.msom.2018.01.164.
Garcia, Pascal. "Exploration guidée en apprentissage par renforcement. Connaissances a priori et relaxation de contraintes". Revue d'intelligence artificielle 20, no. 2-3 (June 1, 2006): 235–75. http://dx.doi.org/10.3166/ria.20.235-275.
Host, Shirley, and Nicolas Sabouret. "Apprentissage par renforcement d'actes de communication dans un système multi-agent". Revue d'intelligence artificielle 24, no. 2 (April 17, 2010): 159–88. http://dx.doi.org/10.3166/ria.24.159-188.
Chiali, Ramzi. "Le texte littéraire comme référentiel préférentiel dans le renforcement de la compétence interculturelle en contexte institutionnel. Réflexion et dynamique didactique." Revue plurilingue : Études des Langues, Littératures et Cultures 7, no. 1 (July 14, 2023): 70–78. http://dx.doi.org/10.46325/ellic.v7i1.99.
Altintas, Gulsun, and Isabelle Royer. "Renforcement de la résilience par un apprentissage post-crise : une étude longitudinale sur deux périodes de turbulence". M@n@gement 12, no. 4 (2009): 266. http://dx.doi.org/10.3917/mana.124.0266.
Dutech, Alain, and Manuel Samuelides. "Apprentissage par renforcement pour les processus décisionnels de Markov partiellement observés. Apprendre une extension sélective du passé". Revue d'intelligence artificielle 17, no. 4 (August 1, 2003): 559–89. http://dx.doi.org/10.3166/ria.17.559-589.
Bouchet, N., L. Frenillot, M. Delahaye, M. Gaillard, P. Mesthe, E. Escourrou, and L. Gimenez. "Gestion des émotions vécues par les étudiants en 3e cycle de médecine générale de Toulouse au cours de la prise en charge des patients : étude qualitative". Exercer 34, no. 192 (April 1, 2023): 184–90. http://dx.doi.org/10.56746/exercer.2023.192.184.
Zossou, Espérance, Seth Graham-Acquaah, John Manful, Simplice D. Vodouhe, and Rigobert C. Tossou. "Les petits exploitants agricoles à l'école inclusive : cas de l'apprentissage collectif par la vidéo et la radio sur la post-récolte du riz local au Bénin". International Journal of Biological and Chemical Sciences 15, no. 4 (November 19, 2021): 1678–97. http://dx.doi.org/10.4314/ijbcs.v15i4.29.
Pełny tekst źródłaRozprawy doktorskie na temat "Apprentissage par renforcement factorisé"
Kozlova, Olga. "Apprentissage par renforcement hiérarchique et factorisé". PhD thesis, Université Pierre et Marie Curie - Paris VI, 2010. http://tel.archives-ouvertes.fr/tel-00632968.
Degris, Thomas. "Apprentissage par renforcement dans les processus de décision Markoviens factorisés". Paris 6, 2007. http://www.theses.fr/2007PA066594.
Pełny tekst źródłaTournaire, Thomas. "Model-based reinforcement learning for dynamic resource allocation in cloud environments". Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAS004.
Pełny tekst źródłaThe emergence of new technologies (Internet of Things, smart cities, autonomous vehicles, health, industrial automation, ...) requires efficient resource allocation to satisfy the demand. These new offers are compatible with new 5G network infrastructure since it can provide low latency and reliability. However, these new needs require high computational power to fulfill the demand, implying more energy consumption in particular in cloud infrastructures and more particularly in data centers. Therefore, it is critical to find new solutions that can satisfy these needs still reducing the power usage of resources in cloud environments. In this thesis we propose and compare new AI solutions (Reinforcement Learning) to orchestrate virtual resources in virtual network environments such that performances are guaranteed and operational costs are minimised. We consider queuing systems as a model for clouds IaaS infrastructures and bring learning methodologies to efficiently allocate the right number of resources for the users.Our objective is to minimise a cost function considering performance costs and operational costs. We go through different types of reinforcement learning algorithms (from model-free to relational model-based) to learn the best policy. Reinforcement learning is concerned with how a software agent ought to take actions in an environment to maximise some cumulative reward. We first develop queuing model of a cloud system with one physical node hosting several virtual resources. On this first part we assume the agent perfectly knows the model (dynamics of the environment and the cost function), giving him the opportunity to perform dynamic programming methods for optimal policy computation. Since the model is known in this part, we also concentrate on the properties of the optimal policies, which are threshold-based and hysteresis-based rules. This allows us to integrate the structural property of the policies into MDP algorithms. After providing a concrete cloud model with exponential arrivals with real intensities and energy data for cloud provider, we compare in this first approach efficiency and time computation of MDP algorithms against heuristics built on top of the queuing Markov Chain stationary distributions.In a second part we consider that the agent does not have access to the model of the environment and concentrate our work with reinforcement learning techniques, especially model-based reinforcement learning. We first develop model-based reinforcement learning methods where the agent can re-use its experience replay to update its value function. We also consider MDP online techniques where the autonomous agent approximates environment model to perform dynamic programming. This part is evaluated in a larger network environment with two physical nodes in tandem and we assess convergence time and accuracy of different reinforcement learning methods, mainly model-based techniques versus the state-of-the-art model-free methods (e.g. Q-Learning).The last part focuses on model-based reinforcement learning techniques with relational structure between environment variables. As these tandem networks have structural properties due to their infrastructure shape, we investigate factored and causal approaches built-in reinforcement learning methods to integrate this information. We provide the autonomous agent with a relational knowledge of the environment where it can understand how variables are related to each other. 
The main goal is to accelerate convergence by: first, obtaining a more compact representation through factorisation, for which we devise an online factored-MDP algorithm that we evaluate and compare with model-free and model-based reinforcement learning algorithms; and second, integrating causal and counterfactual reasoning, which can tackle environments with partial observations and unobserved confounders.
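A small sketch may help make the threshold and hysteresis structure mentioned in this abstract concrete. The following minimal Python example is illustrative only (the class name and the threshold values are invented, not taken from the thesis): a controller activates an extra virtual resource when the backlog crosses an upper threshold and releases one when it falls below a lower one.

    # Illustrative sketch, not code from the thesis: a hysteresis scaling
    # policy for virtual resources on a single cloud node. Thresholds are
    # hypothetical.
    from dataclasses import dataclass

    @dataclass
    class HysteresisPolicy:
        up: tuple    # up[k]: backlog at which resource k+1 is activated
        down: tuple  # down[k]: backlog at which resource k+1 is released

        def act(self, backlog: int, active: int) -> int:
            """Return the new number of active virtual resources."""
            if active < len(self.up) and backlog >= self.up[active]:
                return active + 1  # scale up: crossed the upper threshold
            if active > 0 and backlog <= self.down[active - 1]:
                return active - 1  # scale down: fell below the lower threshold
            return active

    policy = HysteresisPolicy(up=(3, 6, 9), down=(1, 3, 5))
    print(policy.act(backlog=7, active=1))  # -> 2, since 7 >= up[1] == 6

Because each up[k] exceeds the corresponding down[k], small backlog fluctuations do not trigger constant switching, which is what makes such policies attractive for energy-aware scaling.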
Lesaint, Florian. "Modélisation du conditionnement animal par représentations factorisées dans un système d'apprentissage dual : explication des différences inter-individuelles aux niveaux comportemental et neurophysiologique". Electronic Thesis or Diss., Paris 6, 2014. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2014PA066287.pdf.
Pavlovian conditioning, the acquisition of responses to neutral stimuli previously paired with rewards, and instrumental conditioning, the acquisition of goal-oriented responses, are central to our learning capacities. However, despite some evidence of entanglement, they are mainly studied separately. Reinforcement learning (RL), learning by trial and error to reach goals, is central to models of instrumental conditioning, whereas models of Pavlovian conditioning rely on more dedicated and often incompatible architectures, which complicates the study of their interactions. We aim at finding concepts which, combined with RL models, may provide a unifying architecture to allow such a study. We develop a model that combines a classical RL system, learning values over states, with a revised RL system, learning values over individual stimuli and biasing behaviour towards reward-related ones. It explains maladaptive behaviours in pigeons by the detrimental interaction of the two systems, and inter-individual differences in rats by a simple population-level variation in the contribution of each system to the overall behaviour. It explains unexpected dopaminergic patterns, with regard to the dominant hypothesis that dopamine parallels a reward prediction error signal, by computing such a signal over features rather than states, and makes it compatible with an alternative hypothesis that dopamine also contributes to the acquisition of incentive salience, making reward-related stimuli wanted for themselves. The present model shows promising properties for the investigation of Pavlovian conditioning, instrumental conditioning, and their interactions.
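The dual-system architecture described in this abstract can be caricatured in a few lines of Python. This is a minimal sketch under assumed interfaces (the weighting parameter omega and the update rules are simplifications, not the authors' implementation): one temporal-difference learner maintains values over whole states, a second maintains values over individual stimuli, and behaviour blends the two.

    # Minimal sketch of a dual RL system, not the authors' code. A state is
    # a tuple of visible stimuli; omega is a hypothetical per-individual
    # weight on the feature-based system.
    from collections import defaultdict

    ALPHA, GAMMA = 0.1, 0.95
    V_state = defaultdict(float)    # classical system: values over states
    V_feature = defaultdict(float)  # revised system: values over stimuli

    def update(state, reward, next_state):
        # Standard state-based temporal-difference (TD) update.
        V_state[state] += ALPHA * (reward + GAMMA * V_state[next_state] - V_state[state])
        # The same TD rule computed over individual stimuli: a reward
        # prediction error over features rather than states.
        next_best = max((V_feature[g] for g in next_state), default=0.0)
        for f in state:
            V_feature[f] += ALPHA * (reward + GAMMA * next_best - V_feature[f])

    def value(state, omega=0.5):
        # Blending the systems; varying omega across a population produces
        # inter-individual differences in behaviour.
        best_feature = max((V_feature[f] for f in state), default=0.0)
        return (1 - omega) * V_state[state] + omega * best_feature

    update(("lever", "light"), 1.0, ("food",))
    print(value(("lever", "light")))  # both systems now credit the stimuli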
Magnan, Jean-Christophe. "Représentations graphiques de fonctions et processus décisionnels Markoviens factorisés". Thesis, Paris 6, 2016. http://www.theses.fr/2016PA066042/document.
In decision-theoretic planning, the factored framework (Factored Markov Decision Process, FMDP) has produced several efficient algorithms for solving large sequential decision-making problems under uncertainty. The efficiency of these algorithms relies on data structures such as decision trees or algebraic decision diagrams (ADDs). These planning techniques are exploited in reinforcement learning by the SDyna architecture in order to solve large and unknown problems. However, the state-of-the-art learning and planning algorithms used in SDyna require the problem to be specified using only binary variables and/or rely on data structures that leave room for improvement in terms of compactness. In this thesis, we present our research towards elaborating and using a new data structure that is more efficient and less restrictive, and integrating it into a new instance of the SDyna architecture. In a first part, we present the state-of-the-art modeling tools used in algorithms that tackle large sequential decision-making problems under uncertainty. We detail modeling with decision trees and ADDs, then introduce the Ordered and Reduced Graphical Representation of Functions (ORGRF), a new data structure proposed in this thesis to deal with the various shortcomings of ADDs, and demonstrate that ORGRFs improve on ADDs for modeling large problems. In a second part, we address the resolution of large sequential decision problems under uncertainty using dynamic programming. After introducing the main algorithms, we examine the factored alternatives in detail, point out where these factored versions can be improved, and describe our new algorithm that improves on these points and exploits the previously introduced ORGRFs. In a last part, we turn to the use of FMDPs in reinforcement learning and introduce a new algorithm to learn the proposed data structure. Thanks to this algorithm, a new instance of the SDyna architecture based on ORGRFs is proposed: the SPIMDDI instance. We test its efficiency on several standard problems from the literature. Finally, we present some work around this new instance: a new algorithm for efficiently managing the exploration-exploitation trade-off, aiming to simplify F-RMax, and an application of SPIMDDI to the management of units in a real-time strategy video game.
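To make the factored representation concrete, here is a small illustration (the variables, dependencies, and probabilities are invented; the ORGRF structure itself is not reproduced here): an FMDP stores one local conditional model per post-action variable, so the joint transition probability is a product of small factors rather than one monolithic matrix.

    # Illustrative factored transition model: each next-state variable
    # depends only on a few parents, so the joint transition probability
    # factorises. All names and numbers are invented for the example.
    parents = {"load": ("load", "action"), "temp": ("load",)}  # dependency structure

    def p_load(load, action):
        # P(load' = 1 | load, action): a small local table.
        return {(0, "idle"): 0.1, (0, "run"): 0.7,
                (1, "idle"): 0.4, (1, "run"): 0.9}[(load, action)]

    def p_temp(load):
        # P(temp' = 1 | load): high load tends to raise temperature.
        return 0.8 if load else 0.2

    def transition_prob(state, action, next_state):
        """Joint P(next_state | state, action) as a product of local factors."""
        pl, pt = p_load(state["load"], action), p_temp(state["load"])
        prob_load = pl if next_state["load"] else 1 - pl
        prob_temp = pt if next_state["temp"] else 1 - pt
        return prob_load * prob_temp

    s = {"load": 1, "temp": 0}
    print(transition_prob(s, "run", {"load": 1, "temp": 1}))  # 0.9 * 0.8 = 0.72

With n binary variables a flat model stores on the order of 2^n entries per action, whereas the factored form stores only each variable's local table; decision trees, ADDs, and the ORGRFs proposed in this thesis compress those local tables further.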
Zimmer, Matthieu. "Apprentissage par renforcement développemental". Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0008/document.
Reinforcement learning allows an agent to learn a behavior that has never been previously defined by humans. The agent discovers the environment and the different consequences of its actions through interaction: it learns from its own experience, without pre-established knowledge of the goals or effects of its actions. This thesis examines how deep learning can help reinforcement learning handle continuous spaces and environments with many degrees of freedom, in order to solve problems closer to reality. Indeed, neural networks scale well and have good representational power: they make it possible to approximate functions over continuous spaces and allow a developmental approach, because they require little a priori knowledge of the domain. We seek to reduce the amount of interaction the agent needs to achieve acceptable behavior. To do so, we propose the Neural Fitted Actor-Critic framework, which defines several data-efficient actor-critic algorithms. We examine how the agent can fully exploit the transitions generated by previous behaviors by integrating off-policy data into the proposed framework. Finally, we study how the agent can learn faster by taking advantage of the development of its body, in particular by proceeding with a gradual increase in the dimensionality of its sensorimotor space.
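As a rough illustration of the data-efficiency lever discussed above, the sketch below shows generic experience replay for a critic (this is not the Neural Fitted Actor-Critic algorithms themselves; a tabular value function stands in for the neural approximators used in the thesis): each stored transition is reused for many value updates instead of being consumed once.

    # Generic replay-based critic sketch, not the thesis's NFAC algorithms.
    import random

    ALPHA, GAMMA = 0.05, 0.99
    replay, V = [], {}

    def store(s, a, r, s2):
        replay.append((s, a, r, s2))

    def critic_updates(n_batches=10, batch_size=4):
        for _ in range(n_batches):
            batch = random.sample(replay, min(batch_size, len(replay)))
            for s, a, r, s2 in batch:
                td = r + GAMMA * V.get(s2, 0.0) - V.get(s, 0.0)
                V[s] = V.get(s, 0.0) + ALPHA * td  # transition reused many times

    store("s0", "right", 0.0, "s1")
    store("s1", "right", 1.0, "s2")
    critic_updates()
    print(V)  # values learned from only two environment interactions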
Mangin, Olivier. "Emergence de concepts multimodaux : de la perception de mouvements primitifs à l'ancrage de mots acoustiques". Thesis, Bordeaux, 2014. http://www.theses.fr/2014BORD0002/document.
This thesis focuses on learning recurring patterns in multimodal perception. For that purpose it develops cognitive systems that model the mechanisms providing such capabilities to infants, a methodology that fits into the field of developmental robotics. More precisely, this thesis revolves around two main topics: on the one hand, the ability of infants or robots to imitate and understand human behaviors, and on the other, the acquisition of language. At the crossing of these topics, we study the question of how a developmental cognitive agent can discover a dictionary of primitive patterns from its multimodal perceptual flow. We specify this problem and formulate its links with Quine's indeterminacy of translation and with blind source separation, as studied in acoustics. We sequentially study four sub-problems and provide an experimental formulation of each of them. We then describe and test computational models of agents solving these problems. These models are particularly based on bag-of-words techniques, matrix factorization algorithms, and inverse reinforcement learning approaches. We first go in depth into the three separate problems of learning primitive sounds, such as phonemes or words; learning primitive dance motions; and learning the primitive objectives that compose complex tasks. Finally, we study the problem of learning multimodal primitive patterns, which corresponds to solving several of the aforementioned problems simultaneously. We also detail how this last problem models the grounding of acoustic words.
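One of the techniques cited above, matrix factorization, is easy to demonstrate on toy data. The sketch below uses scikit-learn's NMF on random non-negative data standing in for multimodal histograms (assumptions: scikit-learn is available, and the data are placeholders rather than the thesis's corpora): the learned dictionary columns play the role of primitive patterns.

    # Toy non-negative matrix factorization: X ~ W @ H, with W a dictionary
    # of primitive patterns and H their activations per observation.
    import numpy as np
    from sklearn.decomposition import NMF

    rng = np.random.default_rng(0)
    X = rng.random((20, 30))  # 20 features x 30 observations, non-negative

    model = NMF(n_components=4, init="random", random_state=0, max_iter=500)
    W = model.fit_transform(X)  # 20 x 4 dictionary of primitive patterns
    H = model.components_       # 4 x 30 activations

    print(W.shape, H.shape)
    print(np.linalg.norm(X - W @ H))  # reconstruction error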
Filippi, Sarah. "Stratégies optimistes en apprentissage par renforcement". PhD thesis, Ecole nationale supérieure des telecommunications - ENST, 2010. http://tel.archives-ouvertes.fr/tel-00551401.
Books on the topic "Apprentissage par renforcement factorisé"
Sutton, Richard S., and Andrew G. Barto. Reinforcement Learning: An Introduction. Cambridge, Mass.: MIT Press, 1998.
Ontario. Esquisse de cours 12e année: Sciences de l'activité physique pse4u cours préuniversitaire. Vanier, Ont: CFORP, 2002.
Ontario. Esquisse de cours 12e année: Technologie de l'information en affaires btx4e cours préemploi. Vanier, Ont: CFORP, 2002.
Ontario. Esquisse de cours 12e année: Études informatiques ics4m cours préuniversitaire. Vanier, Ont: CFORP, 2002.
Ontario. Esquisse de cours 12e année: Mathématiques de la technologie au collège mct4c cours précollégial. Vanier, Ont: CFORP, 2002.
Ontario. Esquisse de cours 12e année: Sciences snc4m cours préuniversitaire. Vanier, Ont: CFORP, 2002.
Ontario. Esquisse de cours 12e année: English eae4e cours préemploi. Vanier, Ont: CFORP, 2002.
Ontario. Esquisse de cours 12e année: Le Canada et le monde: une analyse géographique cgw4u cours préuniversitaire. Vanier, Ont: CFORP, 2002.
Ontario. Esquisse de cours 12e année: Environnement et gestion des ressources cgr4e cours préemploi. Vanier, Ont: CFORP, 2002.
Ontario. Esquisse de cours 12e année: Histoire de l'Occident et du monde chy4c cours précollégial. Vanier, Ont: CFORP, 2002.
Book chapters on the topic "Apprentissage par renforcement factorisé"
Tazdaït, Tarik, and Rabia Nessah. "5. Vote et apprentissage par renforcement". In Le paradoxe du vote, 157–77. Éditions de l'École des hautes études en sciences sociales, 2013. http://dx.doi.org/10.4000/books.editionsehess.1931.
Bendella, Mohammed Salih, and Badr Benmammar. "Impact de la radio cognitive sur le green networking : approche par apprentissage par renforcement". In Gestion du niveau de service dans les environnements émergents. ISTE Group, 2020. http://dx.doi.org/10.51926/iste.9002.ch8.
Reports on the topic "Apprentissage par renforcement factorisé"
Melloni, Gian. Le leadership des autorités locales en matière d'assainissement et d'hygiène : expériences et apprentissage de l'Afrique de l'Ouest. Institute of Development Studies (IDS), January 2022. http://dx.doi.org/10.19088/slh.2022.002.