Selected scientific literature on the topic "Apprentissage par renforcement profond multi-agent"
Consult the list of current articles, books, theses, conference proceedings and other relevant scientific sources on the topic "Apprentissage par renforcement profond multi-agent".
Journal articles on the topic "Apprentissage par renforcement profond multi-agent"
Host, Shirley, and Nicolas Sabouret. "Apprentissage par renforcement d'actes de communication dans un système multi-agent". Revue d'intelligence artificielle 24, no. 2 (April 17, 2010): 159–88. http://dx.doi.org/10.3166/ria.24.159-188.
Theses / dissertations on the topic "Apprentissage par renforcement profond multi-agent"
Pageaud, Simon. "SmartGov : architecture générique pour la co-construction de politiques urbaines basée sur l'apprentissage par renforcement multi-agent". Thesis, Lyon, 2019. http://www.theses.fr/2019LYSE1128.
In this thesis, we propose the SmartGov model, coupling multi-agent simulation and multi-agent deep reinforcement learning, to help co-construct urban policies and integrate all stakeholders in the decision process. Smart Cities provide sensor data from urban areas to increase the realism of the simulation in SmartGov. Our first contribution is a generic architecture for multi-agent simulation of the city, used to study the emergence of global behavior from realistic agents reacting to political decisions. With multi-level modeling and a coupling of different dynamics, our tool learns environment specificities and suggests relevant policies. Our second contribution improves the autonomy and adaptation of the decision function through multi-agent, multi-level reinforcement learning. A set of clustered agents is distributed over the studied area to learn local specificities without any prior knowledge of the environment. Trust score assignment and individual rewards help reduce the impact of non-stationarity on experience replay in deep reinforcement learning. These contributions form a complete system for co-constructing urban policies in the Smart City. We compare our model with different approaches from the literature on a parking fee policy to highlight the benefits and limits of our contributions.
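The trust-score mechanism described in this abstract can be illustrated with a minimal sketch: a replay buffer in which each stored transition carries a trust weight and is sampled in proportion to it, so that experience generated while neighboring agents' policies were drifting (the non-stationarity problem) is replayed less often. Class and field names are illustrative assumptions, not the thesis' actual implementation.

```python
import random

class TrustReplayBuffer:
    """Experience replay where each transition carries a trust score.

    Transitions recorded while co-learning agents were drifting can be
    given a low trust score, so they are replayed less often.
    """

    def __init__(self, capacity=1000):
        self.capacity = capacity
        self.buffer = []  # list of (transition, trust) pairs

    def add(self, transition, trust):
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)  # drop the oldest transition
        self.buffer.append((transition, trust))

    def sample(self, batch_size):
        transitions, trusts = zip(*self.buffer)
        # Sample with probability proportional to each transition's trust
        return random.choices(transitions, weights=trusts, k=batch_size)

buf = TrustReplayBuffer(capacity=4)
buf.add(("s0", "a0", 1.0, "s1"), trust=0.9)     # trusted experience
buf.add(("s1", "a1", 0.0, "s2"), trust=0.1)     # drifted neighbor, low trust
batch = buf.sample(8)
```

Sampling is with replacement, so a small buffer can still fill a large batch, with high-trust transitions dominating.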
Tréca, Maxime. "Designing traffic signal control systems using reinforcement learning". Electronic Thesis or Diss., université Paris-Saclay, 2022. http://www.theses.fr/2022UPASG043.
This thesis studies the problem of traffic optimization through traffic light signals on road networks. Traffic optimization is achieved in our case through reinforcement learning, a branch of machine learning in which an agent solves a given task in an environment by maximizing its reward signal. First, we present the fields of traffic signal control (TSC) and reinforcement learning (RL) separately, before presenting how the latter is applied to the former (RL-TSC). Then, we define a mathematical model of traffic based on graph theory, before introducing the reinforcement learning model, traffic simulator and deep reinforcement learning library created for our research work. Finally, these definitions allow us to build an efficient traffic signal control method based on reinforcement learning. We first study multiple classical reinforcement learning techniques on an isolated traffic intersection. Multiple classes of RL algorithms are compared (e.g. Q-learning, LRP, actor-critic) to deterministic TSC methods used as a baseline. We then introduce function approximation methods using deep neural networks, allowing for significant performance improvements on isolated intersections. These experiments allow us to single out dueling deep Q-learning as the best isolated RL-TSC method for our model. On this basis, we introduce the concept of agent coordination in multi-agent reinforcement learning (MARL) systems. We compare multiple modes of coordination to the isolated baseline that we previously defined. These experiments allow us to define the DEC-DQN coordination method, which allows multiple agents of a POMDP to communicate in order to better optimize traffic. DEC-DQN uses a deep neural network shared by all agents of the network, allowing them to learn a common communication protocol from scratch.
In order to correctly reward communication actions, which are entirely distinct from the traffic optimization actions taken by agents, DEC-DQN defines a special reward function allowing each agent to directly estimate the impact of its communications on neighboring agents of the network. Communication action rewards are estimated directly on the traffic optimization neural networks of neighboring intersections. Finally, this novel coordination method is compared to other methods from the literature on a large-scale simulation. The DEC-DQN algorithm results in faster agent learning, as well as increased performance and stability thanks to agent coordination.
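The dueling deep Q-learning architecture singled out in this abstract splits the Q-function into a state value V(s) and per-action advantages A(s, a). A minimal sketch of the standard aggregation step, not tied to this thesis' code:

```python
def dueling_q_values(value, advantages):
    """Dueling DQN aggregation: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a').

    Subtracting the mean advantage makes the value/advantage split
    identifiable, which stabilizes learning.
    """
    mean_adv = sum(advantages) / len(advantages)
    return [value + adv - mean_adv for adv in advantages]

# One state value and advantages for three hypothetical signal phases
q = dueling_q_values(value=2.0, advantages=[1.0, 0.0, -1.0])  # -> [3.0, 2.0, 1.0]
```

A useful sanity check on this aggregation: the mean of the resulting Q-values always equals the state value.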
Nguyen, Van-Thai. "AI-based maintenance planning for multi-component systems considering different kinds of dependencies". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0070.
Maintenance planning for systems consisting of multiple components remains a challenging problem. In particular, mathematically describing dependencies between components is usually complicated, yet omitting component dependencies in maintenance modeling might result in suboptimal plans. Moreover, the number of maintenance decision variables to optimize grows rapidly with the number of components, making optimization computationally expensive. To address these issues, this PhD thesis proposes an artificial-intelligence-based maintenance optimization approach that considers different kinds of dependencies between components (i.e., economic, stochastic, and structural dependence). In particular, the approach integrates a deep maintenance cost model, which computes maintenance costs at the system level without requiring individual costs at the component level (e.g., setup costs, labor costs and the cost of maintaining each component), into a multi-agent deep reinforcement learning framework applicable to large sequential decision-making problems. Moreover, a novel degradation interaction model for discrete-state components is developed and integrated into the proposed approach. Numerical studies are conducted on multi-component systems with different configurations under different observability scenarios to investigate the performance, advantages and limits of the proposed maintenance approach.
Tran, Trung-Minh. "Contributions to Agent-Based Modeling and Its Application in Financial Market". Electronic Thesis or Diss., Université Paris sciences et lettres, 2023. http://www.theses.fr/2023UPSLP022.
The analysis of complex models such as financial markets helps managers make reasonable policies and traders choose effective trading strategies. Agent-based modeling is a computational methodology for modeling complex systems and analyzing the influence of different assumptions on agent behavior. In this thesis, we consider a financial market model that includes three types of agents: technical agents, fundamental agents and noise agents. We start with the technical agent and the challenge of optimizing a trading strategy based on technical analysis through an automated trading system. The proposed optimization methods are then applied, with suitable objective functions, to optimize the parameters of the ABM model. The study was first conducted with a simple ABM model including only noise agents; the model was then extended to include the different types of agents. The first part of the thesis investigates the trading behavior of technical agents. Different approaches are introduced, such as Genetic Algorithms, Bayesian Optimization and Deep Reinforcement Learning. The trading strategies are built on a leading indicator, the Relative Strength Index, and two lagging indicators, Bollinger Bands and Moving Average Convergence-Divergence. Multiple experiments are performed in different markets, including the cryptocurrency market, the stock market and the crypto futures market. The results show that the optimized strategies from the proposed approaches can generate higher returns than their typical form and than a Buy and Hold strategy. Using the results from the optimization of trading strategies, we propose a new approach to optimize the parameters of the agent-based model. The second part of the thesis presents an application of agent-based modeling to the stock market. We show that ABM models can be optimized using Bayesian Optimization with multiple objective functions.
The stylized facts of the actual market can be reproduced by carefully constructing the agents' objective functions. Our work includes the development of an environment, the behaviors of different agents and their interactions. Bayesian optimization with the Kolmogorov-Smirnov test as the objective function has shown advantages and potential in estimating an optimal set of parameters for an artificial financial market model. The model we propose is capable of reproducing the stylized facts of the real market. Furthermore, a new stylized fact about the proportion of traders in the market is presented. Using empirical data from the Dow Jones Industrial Average index, we found that fundamental traders account for 9%-11% of all traders in the stock market. In the future, further research will improve the model and optimization methods, for example by applying machine learning models or multi-agent reinforcement learning, or by considering different markets and traded instruments.
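The Kolmogorov-Smirnov objective mentioned in this abstract can be sketched in a few lines: the two-sample KS statistic measures the largest gap between the empirical distribution of simulated returns and that of real returns, and a Bayesian optimizer would minimize it over the ABM parameters. This standalone implementation is illustrative, not the thesis' code.

```python
import bisect

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the two empirical CDFs (0 = identical, 1 = fully separated)."""
    a, b = sorted(sample_a), sorted(sample_b)

    def ecdf(sorted_xs, x):
        # Fraction of observations less than or equal to x
        return bisect.bisect_right(sorted_xs, x) / len(sorted_xs)

    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in set(a + b))

# A calibration loop would minimize this statistic between simulated and
# empirical return series over the ABM parameters.
d_same = ks_statistic([1, 2, 3], [1, 2, 3])   # identical samples
d_far = ks_statistic([0, 0, 0], [5, 5, 5])    # disjoint samples
```

Because the statistic is bounded in [0, 1] and distribution-free, it makes a convenient single-number objective for comparing simulated and real markets.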
Alliche, Abderrahmane Redha. "Contrôle du réseau cloud basé intelligence artificielle". Electronic Thesis or Diss., Université Côte d'Azur, 2024. http://www.theses.fr/2024COAZ4022.
The exponential growth of Internet traffic in recent decades has prompted the emergence of Content Delivery Networks (CDNs) as a solution for managing high traffic volumes through data caching in cloud servers located near end users. However, challenges persist, particularly for non-cacheable services, necessitating the use of cloud overlay networks. Owing to their lack of knowledge about the underlay network, cloud overlay networks introduce complexities such as triangle inequality violations (TIV) and dynamic traffic routing challenges. Leveraging the Software Defined Networks (SDN) paradigm, Deep Reinforcement Learning (DRL) techniques offer the possibility of exploiting collected data to better adapt to network changes. Furthermore, the increasing number of cloud edge servers presents scalability challenges, motivating the exploration of Multi-Agent DRL (MA-DRL) solutions. Despite its suitability for the distributed packet routing problem in cloud overlay networks, MA-DRL faces unaddressed challenges, such as the need for realistic network simulators, handling communication overhead, and addressing the multi-objective nature of the routing problem. This PhD thesis delves into distributed Multi-Agent Deep Reinforcement Learning (MA-DRL) methods, specifically targeting the distributed packet routing problem in cloud overlay networks. Throughout the thesis, we address these challenges by developing realistic network simulators, studying communication overhead in the non-overlay general setting, and proposing a distributed MA-DRL framework tailored to cloud overlay networks, focusing on communication overhead, convergence, and model stability.
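A triangle inequality violation (TIV), one of the overlay complexities this abstract mentions, occurs when relaying traffic through an intermediate overlay node is faster than the direct path. A small illustrative check over a latency matrix (hypothetical data, not from the thesis):

```python
def triangle_violations(latency):
    """List triangle inequality violations in an overlay latency matrix:
    triples (i, k, j) where relaying i -> k -> j beats the direct path i -> j."""
    n = len(latency)
    violations = []
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for k in range(n):
                if k in (i, j):
                    continue
                if latency[i][k] + latency[k][j] < latency[i][j]:
                    violations.append((i, k, j))
    return violations

# Hypothetical 3-node latency matrix (ms): relaying through node 2
# (2 + 3 = 5 ms) beats the direct 0 <-> 1 link (10 ms).
latency = [
    [0, 10, 2],
    [10, 0, 3],
    [2, 3, 0],
]
tivs = triangle_violations(latency)
```

Overlay routing systems exploit exactly these detours; a learned routing policy effectively discovers them without enumerating all triples.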
Younes, Walid. "Un système multi-agent pour la composition logicielle opportuniste en environnement ambiant et dynamique". Thesis, Toulouse 3, 2021. http://www.theses.fr/2021TOU30025.
Cyber-physical and ambient systems consist of fixed or mobile devices connected through communication networks. These devices host software components that provide services and may require other services to operate. These software components are usually developed, installed, and activated independently of each other and, with the mobility of users and devices, they may appear or disappear unpredictably. This gives cyber-physical and ambient systems an open and changing character. Software components are bricks that can be assembled to form applications. But in such a dynamic and open context, component assemblies are difficult to design, maintain and adapt. Applications are used by humans, who are at the heart of these systems. Ambient intelligence aims to offer them a personalized environment adapted to the situation, i.e. to provide the right application at the right time by anticipating their needs, which may also vary and evolve over time. To address these problems, our team is exploring an original approach called "opportunistic software composition", which consists in automatically building applications on the fly from the components currently available in the environment, without relying on explicit user needs or predefined assembly plans. In this way, applications emerge from the environment, taking advantage of opportunities as they arise. This thesis defines a software architecture for opportunistic software composition and proposes an intelligent system, called the "opportunistic composition engine", to automatically build relevant applications adapted both to the user and to the surrounding environment. The opportunistic composition engine periodically detects the components and services present in the ambient environment, builds assemblies of components, and proposes them to the user. It automatically learns the user's preferences according to the situation in order to maximize user satisfaction over time.
Learning is done online by reinforcement. It is decentralized within a multi-agent system in which agents interact via a protocol that supports dynamic service discovery and selection. To learn from and for the user, the latter is put in the loop. In this way, users keep control over their ambient environment and decide on the relevance of an emerging application before it is deployed. The solution has been implemented and tested. It works in conjunction with an interface that describes the emerging applications to the user and allows them to be edited. The user's actions on this interface are sources of feedback for the engine and serve as input to the reinforcement learning mechanism.
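The user-in-the-loop feedback described in this abstract can be sketched as a simple tabular reinforcement learner: accepting or rejecting a proposed assembly yields a 0/1 reward that updates a per-situation estimate of user satisfaction. All names below are illustrative assumptions, not the engine's actual design.

```python
class PreferenceLearner:
    """Tabular estimate of user satisfaction per (situation, assembly).

    User feedback on a proposed assembly (accept = 1, reject = 0) is the
    reinforcement signal driving the estimates.
    """

    def __init__(self, step_size=0.5):
        self.q = {}           # (situation, assembly) -> estimated satisfaction
        self.alpha = step_size

    def update(self, situation, assembly, feedback):
        key = (situation, assembly)
        old = self.q.get(key, 0.0)
        # Incremental update pulled towards the latest feedback
        self.q[key] = old + self.alpha * (feedback - old)

    def best_assembly(self, situation, candidates):
        return max(candidates, key=lambda a: self.q.get((situation, a), 0.0))

learner = PreferenceLearner()
learner.update("morning", "lamp+sensor", 1)      # user accepted
learner.update("morning", "lamp+sensor", 1)
learner.update("morning", "speaker+sensor", 0)   # user rejected
best = learner.best_assembly("morning", ["lamp+sensor", "speaker+sensor"])
```

The step size trades off responsiveness to recent feedback against stability; the real engine is decentralized across agents rather than a single table.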
Robaglia, Benoît-Marie. "Reinforcement Learning for Uncoordinated Multiple Access". Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAT010.
Distributed Medium Access Control (MAC) protocols are fundamental in wireless communication, yet traditional random-access-based protocols face significant limitations in Internet-of-Things (IoT) use cases. In particular, they struggle to provide latency guarantees, making them unsuitable for Ultra Reliable Low Latency Communications (URLLC). This thesis addresses these challenges by leveraging Deep Reinforcement Learning (DRL), a paradigm in which decision-makers optimize their actions by interacting with an environment. The thesis tackles key challenges in the Medium Access (MA) problem for URLLC networks, including the latency of centralized protocols, the collision and retransmission issues of Grant-Free (GF) protocols, and the complexity of handling device heterogeneity and dynamic environments. Furthermore, it explores the integration of new physical layer techniques such as Non-Orthogonal Multiple Access (NOMA). Our methodology applies DRL, which has already shown effectiveness in IoT applications, to develop intelligent protocols. Initially, we model the URLLC problem within a centralized paradigm, where the Base Station (BS) orchestrates device transmissions. This setup has the benefit of ensuring collision-free communication but introduces partial observability, as the BS does not have access to the users' buffer and channel state. We tackle this problem by introducing two algorithms: FilteredPPO and NOMA-PPO. While the former outperforms the benchmarks in scenarios with periodic traffic patterns, the latter demonstrates superior performance over state-of-the-art baselines in scenarios with sporadic traffic. The third and fourth contributions, SeqDQN and MCA-PPO, study the application of Multi-Agent Reinforcement Learning (MARL) to URLLC, where each device is equipped with a DRL algorithm.
While SeqDQN explores a method to reduce non-stationarity and enhance scalability and training efficiency, MCA-PPO presents a theoretically robust solution to the Dynamic Multi-Channel Access (DMCA) challenge, allowing users to optimize bandwidth utilization and thus enhance URLLC performance.
Bono, Guillaume. "Deep multi-agent reinforcement learning for dynamic and stochastic vehicle routing problems". Thesis, Lyon, 2020. http://www.theses.fr/2020LYSEI096.
Routing delivery vehicles in dynamic and uncertain environments like dense city centers is a challenging task that requires robustness and flexibility. Such logistic problems are usually formalized as Dynamic and Stochastic Vehicle Routing Problems (DS-VRPs) with a variety of additional operational constraints, such as capacitated vehicles or time windows (DS-CVRPTWs). The main heuristic approaches to dynamic and stochastic problems simply restart the optimization process on a frozen (static and deterministic) version of the problem given the new information. Instead, Reinforcement Learning (RL) offers models such as Markov Decision Processes (MDPs), which naturally describe the evolution of stochastic and dynamic systems. Their application to more complex problems has been facilitated by recent progress in Deep Neural Networks, which can learn to represent a large class of functions in high-dimensional spaces and approximate solutions with high performance. Finding a compact and sufficiently expressive state representation is the key challenge in applying RL to VRPs. Recent work exploring this novel approach has demonstrated the capability of attention mechanisms to represent sets of customers and to learn policies that generalize to different customer configurations. However, all existing work using DNNs reframes the VRP as a single-vehicle problem and cannot provide online decision rules for a fleet of vehicles. In this thesis, we study how to apply Deep RL methods to rich DS-VRPs as multi-agent systems. We first explore the class of policy-based approaches in Multi-Agent RL and Actor-Critic methods for Decentralized, Partially Observable MDPs in the Centralized Training for Decentralized Control (CTDC) paradigm. To address DS-VRPs, we then introduce a new sequential multi-agent model we call sMMDP. This fully observable model is designed to capture the fact that the consequences of decisions can be predicted in isolation.
We then use it to model a rich DS-VRP and propose a new modular policy network, called MARDAM, to represent the state of the customers and the vehicles in this new model. It provides online decision rules adapted to the information contained in the state and takes advantage of the structural properties of the model. Finally, we develop a set of artificial benchmarks to evaluate the flexibility, robustness and generalization capabilities of MARDAM. We report promising results in the dynamic and stochastic case, which demonstrate MARDAM's capacity to address varying scenarios with no re-optimization, adapting to new customers and to unexpected delays caused by stochastic travel times. We also implement an additional benchmark based on micro-traffic simulation to better capture the dynamics of a real city and its road infrastructure. We report preliminary results as a proof of concept that MARDAM can learn to represent different scenarios and handle varying traffic conditions and customer configurations.
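The attention-based representation of customer sets mentioned in this abstract is, at its core, permutation-invariant pooling: a query (e.g. a vehicle state) scores every customer feature vector, and a softmax over the scores weights them, so the output does not depend on customer ordering. A minimal single-head sketch, unrelated to MARDAM's actual code:

```python
import math

def attention_pool(query, customers):
    """Single-head attention pooling over a set of customer feature vectors.

    Returns a weighted average of the customers, with weights given by a
    softmax of scaled dot-product scores against the query. Dimensions
    here are purely illustrative.
    """
    dim = len(query)
    scores = [sum(q * c for q, c in zip(query, cust)) / math.sqrt(dim)
              for cust in customers]
    peak = max(scores)
    exp_scores = [math.exp(s - peak) for s in scores]  # numerically stable softmax
    total = sum(exp_scores)
    weights = [e / total for e in exp_scores]
    return [sum(w * cust[i] for w, cust in zip(weights, customers))
            for i in range(dim)]

pooled = attention_pool([1.0, 0.0], [[1.0, 2.0], [3.0, 4.0]])
permuted = attention_pool([1.0, 0.0], [[3.0, 4.0], [1.0, 2.0]])
```

Because the softmax weights travel with their customers, shuffling the set leaves the pooled vector unchanged, which is what lets such policies generalize across customer configurations.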
Basso, Gillian. "Approche à base d'agents pour l'ingénierie et le contrôle de micro-réseaux". PhD thesis, Université de Technologie de Belfort-Montbeliard, 2013. http://tel.archives-ouvertes.fr/tel-00982342.
Texto completo da fonteAjmi, Faiza. "Optimisation collaborative par des agents auto-adaptatifs pour résoudre les problèmes d'ordonnancement des patients en inter-intra urgences hospitalières". Thesis, Centrale Lille Institut, 2021. http://www.theses.fr/2021CLIL0019.
This thesis addresses the scheduling of patients in the emergency department (ED) considering downstream constraints, using collaborative optimization approaches to minimize the total waiting time of patients. These approaches integrate, into the behavior of each agent, a metaheuristic that evolves efficiently thanks to two interaction protocols, "friends" and "enemies". In addition, each agent self-adapts using a reinforcement learning algorithm adapted to the studied problem. This self-adaptation considers the agents' experiences and their knowledge of the ED environment. The agents' learning accelerates convergence by guiding the search for good solutions towards more promising areas of the search space. To ensure continuity of quality patient care, we also propose a joint approach for scheduling patients and assigning them downstream beds. We illustrate the proposed collaborative approaches and demonstrate their effectiveness on real data from the ED of the Lille University Hospital Center, obtained in the framework of the ANR OIILH project. The results show that the collaborative learning approach outperforms scenarios in which agents work individually or without learning. The algorithms that manage patient care in downstream services provide results in the form of a dashboard containing static and dynamic information. This information is updated in real time and allows emergency staff to assign patients more quickly to adequate structures. The simulation results show that the proposed AI algorithms can significantly improve the efficiency of the emergency chain by reducing the total waiting time of patients in inter- and intra-emergency care.