
Journal articles on the topic "MADDPG"


Listed below are the top 50 journal articles for research on the topic "MADDPG".


1

Yang, Jianfeng, Xinwei Yang, and Tianqi Yu. "Multi-Unmanned Aerial Vehicle Confrontation in Intelligent Air Combat: A Multi-Agent Deep Reinforcement Learning Approach". Drones 8, no. 8 (7 August 2024): 382. http://dx.doi.org/10.3390/drones8080382.

Abstract:
Multiple unmanned aerial vehicle (multi-UAV) confrontation is becoming an increasingly important combat mode in intelligent air combat. The confrontation highly relies on the intelligent collaboration and real-time decision-making of the UAVs. Thus, a decomposed and prioritized experience replay (PER)-based multi-agent deep deterministic policy gradient (DP-MADDPG) algorithm has been proposed in this paper for the moving and attacking decisions of UAVs. Specifically, the confrontation is formulated as a partially observable Markov game. To solve the problem, the DP-MADDPG algorithm is proposed by integrating the decomposed and PER mechanisms into the traditional MADDPG. To overcome the technical challenges of the convergence to a local optimum and a single dominant policy, the decomposed mechanism is applied to modify the MADDPG framework with local and global dual critic networks. Furthermore, to improve the convergence rate of the MADDPG training process, the PER mechanism is utilized to optimize the sampling efficiency from the experience replay buffer. Simulations have been conducted based on the Multi-agent Combat Arena (MaCA) platform, wherein the traditional MADDPG and independent learning DDPG (ILDDPG) algorithms are benchmarks. Simulation results indicate that the proposed DP-MADDPG improves the convergence rate and the convergent reward value. During confrontations against the vanilla distance-prioritized rule-empowered and intelligent ILDDPG-empowered blue parties, the DP-MADDPG-empowered red party can improve the win rate to 96% and 80.5%, respectively.
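
To make the prioritized experience replay (PER) mechanism referenced above concrete, here is a minimal, library-free sketch of proportional prioritization (sampling probability proportional to priority^alpha, with importance-sampling weights). It illustrates the generic PER idea only; it is not the DP-MADDPG implementation, and all names and default values are assumptions.

```python
import numpy as np

class PrioritizedReplayBuffer:
    """Minimal proportional PER sketch (illustrative only, not the DP-MADDPG code)."""

    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha                      # how strongly priorities skew sampling
        self.storage = []                       # transition tuples (s, a, r, s', done)
        self.priorities = np.zeros(capacity)
        self.pos = 0

    def add(self, transition):
        # New transitions get the current maximum priority so they are sampled at least once.
        max_prio = self.priorities.max() if self.storage else 1.0
        if len(self.storage) < self.capacity:
            self.storage.append(transition)
        else:
            self.storage[self.pos] = transition
        self.priorities[self.pos] = max_prio
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        prios = self.priorities[:len(self.storage)] ** self.alpha
        probs = prios / prios.sum()
        idx = np.random.choice(len(self.storage), batch_size, p=probs)
        # Importance-sampling weights correct the bias from non-uniform sampling.
        weights = (len(self.storage) * probs[idx]) ** (-beta)
        weights /= weights.max()
        return [self.storage[i] for i in idx], idx, weights

    def update_priorities(self, idx, td_errors, eps=1e-6):
        # After a learning step, priority is typically |TD error| plus a small epsilon.
        self.priorities[idx] = np.abs(td_errors) + eps
```
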
2

Zhang, Xiaoping, Yuanpeng Zheng, Li Wang, Arsen Abdulali, and Fumiya Iida. "Multi-Agent Collaborative Target Search Based on the Multi-Agent Deep Deterministic Policy Gradient with Emotional Intrinsic Motivation". Applied Sciences 13, no. 21 (1 November 2023): 11951. http://dx.doi.org/10.3390/app132111951.

Abstract:
Multi-agent collaborative target search is one of the main challenges in the multi-agent field, and deep reinforcement learning (DRL) is a good way to learn such a task. However, DRL always faces the problem of sparse rewards, which to some extent reduces its efficiency in task learning. Introducing intrinsic motivation has proved to be a useful way to mitigate the sparse reward problem in DRL. Therefore, based on the multi-agent deep deterministic policy gradient (MADDPG) structure, a new MADDPG algorithm with emotional intrinsic motivation, named MADDPG-E, is proposed in this paper for multi-agent collaborative target search. In MADDPG-E, a new emotional intrinsic motivation module with three emotions, joy, sadness, and fear, is designed. The three emotions are defined by mapping corresponding psychological knowledge onto the situations the agents encounter in the environment. An emotional steady-state variable function H is then designed to help judge the goodness of the emotions, and based on H an emotion-based intrinsic reward function is finally proposed. With the designed emotional intrinsic motivation module, the multi-agent system always tries to keep itself in the joy state, which means it keeps learning to search for the target. To show the effectiveness of the proposed MADDPG-E algorithm, two kinds of simulation experiments, with a fixed initial position and a random initial position, respectively, are carried out, and comparisons are performed with MADDPG as well as MADDPG-ICM (MADDPG with an intrinsic curiosity module). The results show that with the designed emotional intrinsic motivation module, MADDPG-E achieves a higher learning speed and better learning stability, and its advantage is more obvious in complex situations.
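
As a rough illustration of how an emotion-based intrinsic reward can be layered on top of the environment reward, the sketch below assumes a scalar steady-state variable H (larger meaning closer to "joy") and hypothetical thresholds; the paper's actual definitions of H and of the emotion-based reward are not reproduced here.

```python
def emotional_reward(extrinsic_r, H, joy_thr=0.5, fear_thr=-0.5, bonus=0.1):
    """Toy emotion-based reward shaping: an assumed scalar steady-state variable H
    (larger = closer to 'joy') adds a small intrinsic bonus, while 'sadness'/'fear'
    states add a penalty. Thresholds and bonus size are placeholders, not the
    paper's definitions."""
    if H >= joy_thr:        # 'joy': the search is progressing
        intrinsic = bonus
    elif H <= fear_thr:     # 'sadness'/'fear': stagnation or danger
        intrinsic = -bonus
    else:
        intrinsic = 0.0
    return extrinsic_r + intrinsic
```
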
3

Wilk, Patrick, Ning Wang, and Jie Li. "Multi-Agent Reinforcement Learning for Smart Community Energy Management". Energies 17, no. 20 (20 October 2024): 5211. http://dx.doi.org/10.3390/en17205211.

Abstract:
This paper investigates a Local Strategy-Driven Multi-Agent Deep Deterministic Policy Gradient (LSD-MADDPG) method for demand-side energy management systems (EMS) in smart communities. LSD-MADDPG modifies the conventional MADDPG framework by limiting data sharing during centralized training to only discretized strategic information. During execution, it relies solely on local information, eliminating post-training data exchange. This approach addresses critical challenges commonly faced by EMS solutions serving dynamic, increasing-scale communities, such as communication delays, single-point failures, scalability, and nonstationary environments. By leveraging and sharing only strategic information among agents, LSD-MADDPG optimizes decision-making while enhancing training efficiency and safeguarding data privacy—a critical concern in the community EMS. The proposed LSD-MADDPG has proven to be capable of reducing energy costs and flattening the community demand curve by coordinating indoor temperature control and electric vehicle charging schedules across multiple buildings. Comparative case studies reveal that LSD-MADDPG excels in both cooperative and competitive settings by ensuring fair alignment between individual buildings’ energy management actions and community-wide goals, highlighting its potential for advancing future smart community energy management.
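
One simple way to realize the "discretized strategic information" idea described above is to quantize each agent's continuous local strategy value into a small number of bins before it is shared during centralized training. The sketch below is an illustration under stated assumptions, not the LSD-MADDPG code; the bin count and range are placeholders.

```python
import numpy as np

def discretize_strategy(strategy_value, low=0.0, high=1.0, n_bins=8):
    """Quantize a continuous local strategy value into one of n_bins levels so that
    only coarse strategic information is shared with other agents' critics during
    centralized training. Range and bin count are illustrative assumptions."""
    edges = np.linspace(low, high, n_bins + 1)[1:-1]    # interior bin edges
    return int(np.digitize(strategy_value, edges))       # level index in [0, n_bins - 1]
```
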
4

Wang, Lixing, and Huirong Jiao. "Multi-Agent Reinforcement Learning-Based Computation Offloading for Unmanned Aerial Vehicle Post-Disaster Rescue". Sensors 24, no. 24 (15 December 2024): 8014. https://doi.org/10.3390/s24248014.

Abstract:
Natural disasters cause significant losses. Unmanned aerial vehicles (UAVs) are valuable in rescue missions but need to offload tasks to edge servers due to their limited computing power and battery life. This study proposes a task offloading decision algorithm called the multi-agent deep deterministic policy gradient with cooperation and experience replay (CER-MADDPG), which is based on multi-agent reinforcement learning for UAV computation offloading. CER-MADDPG emphasizes collaboration between UAVs and uses historical UAV experiences to classify and obtain optimal strategies. It enables collaboration among edge devices through the design of the ’critic’ network. Additionally, by defining good and bad experiences for UAVs, experiences are classified into two separate buffers, allowing UAVs to learn from them, seek benefits, avoid harm, and reduce system overhead. The performance of CER-MADDPG was verified through simulations in two aspects. First, the influence of key hyperparameters on performance was examined, and the optimal values were determined. Second, CER-MADDPG was compared with other baseline algorithms. The results show that compared with MADDPG and stochastic game-based resource allocation with prioritized experience replay, CER-MADDPG achieves the lowest system overhead and superior stability and scalability.
5

Petrenko, V. I., F. B. Tebueva, M. M. Gurchinsky, and A. S. Pavlov. "Method of Multi-Agent Reinforcement Learning in Systems with a Variable Number of Agents". Mekhatronika, Avtomatizatsiya, Upravlenie 23, no. 10 (9 October 2022): 507–14. http://dx.doi.org/10.17587/mau.23.507-514.

Abstract:
Multi-agent reinforcement learning methods are one of the newest and most actively developing areas of machine learning. Among the methods of multi-agent reinforcement learning, one of the most promising is the MADDPG method, whose advantage is the high convergence of the learning process. The disadvantage of the MADDPG method is the need to ensure that the number of agents N at the training stage equals the number of agents K at the functioning stage. At the same time, target multi-agent systems (MAS), such as groups of UAVs or mobile ground robots, are systems with a variable number of agents, which does not allow the MADDPG method to be used in them. To solve this problem, the article proposes an improved MADDPG method for multi-agent reinforcement learning in systems with a variable number of agents. The improved MADDPG method is based on the hypothesis that, to perform its functions, an agent needs information about the state of only a few nearest neighbors rather than all other MAS agents. Based on this hypothesis, a method of hybrid joint/independent learning of a MAS with a variable number of agents is proposed, which involves training a small number of agents N to ensure the functioning of an arbitrary number of agents K, K > N. The experiments have shown that the improved MADDPG method provides an efficiency of MAS functioning comparable to the original method while varying the number of agents K at the functioning stage within wide limits.
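
The nearest-neighbor hypothesis described above can be pictured with a fixed-size observation builder such as the sketch below, which lets a policy trained with N agents run unchanged for any number K of agents as long as K > k. The function name, k, and the 2-D position layout are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def neighbor_observation(agent_idx, positions, k=3):
    """Fixed-size observation built from the k nearest other agents, so a policy
    trained with N agents can be executed with any number K of agents (K > k).
    Illustrative sketch with 2-D positions; not the paper's implementation."""
    positions = np.asarray(positions, dtype=float)          # shape (num_agents, 2)
    own = positions[agent_idx]
    others = np.delete(positions, agent_idx, axis=0)
    dists = np.linalg.norm(others - own, axis=1)
    nearest = others[np.argsort(dists)[:k]]                 # the k closest neighbors
    return np.concatenate([own, (nearest - own).ravel()])   # own state + relative neighbor states
```
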
6

Chen, Zhisheng. "DQN–MADDPG Coordinating the Multi-agent Cooperation". Highlights in Science, Engineering and Technology 39 (1 April 2023): 1141–45. http://dx.doi.org/10.54097/hset.v39i.6720.

Abstract:
Multi-agent coordination aims for many agents to accomplish one or more mutual goals. There are two main ways to achieve this: the first is to have the agents communicate with each other and acquire information about the other agents; the second is to share the same environment, divide the work, and cooperate toward the goal. "Gym-cooking" is an excellent shared-environment model for testing algorithms' performance in network coordination and game theory. Based on shared information, each agent has two networks (policy and target) and tries to act more efficiently. This article increases the complexity of the environment and uses different algorithms in the experiments. Specifically, this paper uses the MADDPG model as the primary model to show its performance in a complex environment and contrasts it with other models such as DQN. The MADDPG model in this experiment differs from the traditional MADDPG: the work trains the MADDPG network to deal with emergencies and accidents.
7

Liu, Bo, Shulei Wang, Qinghua Li, Xinyang Zhao, Yunqing Pan, and Changhong Wang. "Task Assignment of UAV Swarms Based on Deep Reinforcement Learning". Drones 7, no. 5 (29 April 2023): 297. http://dx.doi.org/10.3390/drones7050297.

Abstract:
UAV swarm applications are critical for the future, and their mission-planning and decision-making capabilities have a direct impact on their performance. However, creating a dynamic and scalable assignment algorithm that can be applied to various groups and tasks is a significant challenge. To address this issue, we propose the Extensible Multi-Agent Deep Deterministic Policy Gradient (Ex-MADDPG) algorithm, which builds on the MADDPG framework. The Ex-MADDPG algorithm improves the robustness and scalability of the assignment algorithm by incorporating local communication, mean simulation observation, a synchronous parameter-training mechanism, and a scalable multiple-decision mechanism. Our approach has been validated for effectiveness and scalability through both simulation experiments in the Multi-Agent Particle Environment (MPE) and a real-world experiment. Overall, our results demonstrate that the Ex-MADDPG algorithm is effective in handling various groups and tasks and can scale well as the swarm size increases. Therefore, our algorithm holds great promise for mission planning and decision-making in UAV swarm applications.
8

Wei, Juyao, Zhenggang Lu, Zheng Yin, and Zhipeng Jing. "Multiagent Reinforcement Learning for Active Guidance Control of Railway Vehicles with Independently Rotating Wheels". Applied Sciences 14, no. 4 (19 February 2024): 1677. http://dx.doi.org/10.3390/app14041677.

Abstract:
This paper presents a novel data-driven multiagent reinforcement learning (MARL) controller for enhancing the running stability of independently rotating wheels (IRW) and reducing wheel–rail wear. We base our active guidance controller on the multiagent deep deterministic policy gradient (MADDPG) algorithm. In this framework, each IRW controller is treated as an independent agent, facilitating localized control of individual wheelsets and reducing the complexity of the required observations. Furthermore, we enhance the MADDPG algorithm with prioritized experience replay (PER), resulting in the PER-MADDPG algorithm, which optimizes training convergence and stability by prioritizing informative experience samples. In this paper, we compare the PER-MADDPG algorithm against existing controllers, demonstrating the superior simulation performance of the proposed algorithm, particularly in terms of self-centering capability and curve-negotiation behavior, effectively reducing the wear number. We also develop a scaled IRW vehicle for active guidance experiments. The experimental results validate the enhanced running performance of IRW vehicles using our proposed controller.
9

Li, Xilun, Zhan Li, Xiaolong Zheng, Xuebo Yang, and Xinghu Yu. "The Study of Crash-Tolerant, Multi-Agent Offensive and Defensive Games Using Deep Reinforcement Learning". Electronics 12, no. 2 (8 January 2023): 327. http://dx.doi.org/10.3390/electronics12020327.

Abstract:
In the multi-agent offensive and defensive game (ODG), each agent achieves its goal by cooperating or competing with other agents. The multi-agent deep reinforcement learning (MADRL) method is applied in similar scenarios to help agents make decisions. In various situations, the agents of both sides may crash due to collisions. However, the existing algorithms cannot deal with the situation where the number of agents reduces. Based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, we study a method to deal with a reduction in the number of agents in the training process without changing the structure of the neural network (NN), which is called the frozen agent method for the MADDPG (FA-MADDPG) algorithm. In addition, we design a distance–collision reward function to help agents learn strategies better. Through the experiments in four scenarios with different numbers of agents, it is verified that the algorithm we proposed can not only successfully deal with the problem of agent number reduction in the training stage but also show better performance and higher efficiency than the MADDPG algorithm in simulation.
10

Hu, Weichao, Hongzhang Mu, Yanyan Chen, Yixin Liu, and Xiaosong Li. "Modeling Interactions of Autonomous/Manual Vehicles and Pedestrians with a Multi-Agent Deep Deterministic Policy Gradient". Sustainability 15, no. 7 (3 April 2023): 6156. http://dx.doi.org/10.3390/su15076156.

Abstract:
This article focuses on the development of a stable pedestrian crash avoidance mitigation system for autonomous vehicles (AVs). Previous works have only used simple AV–pedestrian models, which do not reflect the actual interaction and risk status of intelligent intersections with manual vehicles. The paper presents a model that simulates the interaction between automatic driving vehicles and pedestrians on unsignalized crosswalks using the multi-agent deep deterministic policy gradient (MADDPG) algorithm. The MADDPG algorithm optimizes the PCAM strategy through the continuous interaction of multiple independent agents and effectively captures the inherent uncertainty in continuous learning and human behavior. Experimental results show that the MADDPG model can fully mitigate collisions in different scenarios and outperforms the DDPG and DRL algorithms.
11

Zhu, Zixiong, Nianhao Xie, Kang Zong, and Lei Chen. "Building a Connected Communication Network for UAV Clusters Using DE-MADDPG". Symmetry 13, no. 8 (20 August 2021): 1537. http://dx.doi.org/10.3390/sym13081537.

Abstract:
Clusters of unmanned aerial vehicles (UAVs) are often used to perform complex tasks. In such clusters, the reliability of the communication network connecting the UAVs is an essential factor in their collective efficiency. Due to the complex wireless environment, however, communication malfunctions within the cluster are likely during the flight of UAVs. In such cases, it is important to control the cluster and rebuild the connected network. The asymmetry of the cluster topology also increases the complexity of the control mechanisms. The traditional control methods based on cluster consistency often rely on the motion information of the neighboring UAVs. The motion information, however, may become unavailable because of the interrupted communications. UAV control algorithms based on deep reinforcement learning have achieved outstanding results in many fields. Here, we propose a cluster control method based on the Decomposed Multi-Agent Deep Deterministic Policy Gradient (DE-MADDPG) to rebuild a communication network for UAV clusters. The DE-MADDPG improves the framework of the traditional multi-agent deep deterministic policy gradient (MADDPG) algorithm by decomposing the reward function. We further introduce the reward reshaping function to facilitate the convergence of the algorithm in sparse reward environments. To address the instability of the state-space in the reinforcement learning framework, we also propose the notion of the virtual leader–follower model. Extensive simulations show that the success rate of the DE-MADDPG is higher than that of the MADDPG algorithm, confirming the effectiveness of the proposed method.
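
The abstract mentions a "reward reshaping function" for sparse-reward environments without specifying it; a standard construction for such settings is potential-based shaping, sketched below. The potential Φ (for example, the negative number of disconnected UAVs) and γ are placeholders, and this is not necessarily the exact form used in the paper.

```python
def reshaped_reward(r, phi_s, phi_s_next, gamma=0.95):
    """Potential-based reward shaping: a dense term gamma * Phi(s') - Phi(s) is added
    to the sparse environment reward r. The potential Phi, gamma, and this exact form
    are assumptions, not the paper's reshaping function."""
    return r + gamma * phi_s_next - phi_s
```
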
12

Bachiri, Khalil, Ali Yahyaouy, Hamid Gualous, Maria Malek, Younes Bennani, Philippe Makany, and Nicoleta Rogovschi. "Multi-Agent DDPG Based Electric Vehicles Charging Station Recommendation". Energies 16, no. 16 (19 August 2023): 6067. http://dx.doi.org/10.3390/en16166067.

Abstract:
Electric vehicles (EVs) are a sustainable transportation solution with environmental benefits and energy efficiency. However, their popularity has raised challenges in locating appropriate charging stations, especially in cities with limited infrastructure and dynamic charging demands. To address this, we propose a multi-agent deep deterministic policy gradient (MADDPG) method for optimal EV charging station recommendations, considering real-time traffic conditions. Our approach aims to minimize total travel time in a stochastic environment for efficient smart transportation management. We adopt a centralized learning and decentralized execution strategy, treating each region of charging stations as an individual agent. Agents cooperate to recommend optimal charging stations based on various incentive functions and competitive contexts. The problem is modeled as a Markov game, suitable for analyzing multi-agent decisions in stochastic environments. Intelligent transportation systems provide us with traffic information, and each charging station feeds relevant data to the agents. Our MADDPG method is challenged with a substantial number of EV requests, enabling efficient handling of dynamic charging demands. Simulation experiments compare our method with DDPG and deterministic approaches, considering different distributions and EV numbers. The results highlight MADDPG’s superiority, emphasizing its value for sustainable urban mobility and efficient EV charging station scheduling.
13

Xue, Junjie, Jie Zhu, Jiangtao Du, Weijie Kang, and Jiyang Xiao. "Dynamic Path Planning for Multiple UAVs with Incomplete Information". Electronics 12, no. 4 (16 February 2023): 980. http://dx.doi.org/10.3390/electronics12040980.

Abstract:
To address the dynamic path planning for multiple UAVs using incomplete information, this paper studies real-time conflict detection and intelligent resolution methods. When the UAVs execute the task under the condition of incomplete information, the mission strategy of different UAVs may conflict with each other due to the difference in target, departure place, time and other factors. Based on the multi-agent deep deterministic policy gradient algorithm (MADDPG), we designed new global reward and partial local reward functions for the UAVs’ path planning and named the improved algorithm as a complex memory driver-MADDPG (CMD-MADDPG). Thus, the trained UAVs can effectively and efficiently perform path planning tasks in conditions of incomplete information (each UAV does not know its reward function and so on). Finally, the simulation verifies that the proposed method can realize fast and accurate dynamic path planning for multiple UAVs.
14

Lin, Xudong, and Mengxing Huang. "An Autonomous Cooperative Navigation Approach for Multiple Unmanned Ground Vehicles in a Variable Communication Environment". Electronics 13, no. 15 (1 August 2024): 3028. http://dx.doi.org/10.3390/electronics13153028.

Abstract:
Robots assist emergency responders by collecting critical information remotely. Deploying multiple cooperative unmanned ground vehicles (UGVs) for a response can reduce the response time, improve situational awareness, and minimize costs. Reliable communication is critical for multiple UGVs for environmental response because multiple robots need to share information for cooperative navigation and data collection. In this work, we investigate a control policy for optimal communication among multiple UGVs and base stations (BSs). A multi-agent deep deterministic policy gradient (MADDPG) algorithm is proposed to update the control policy for the maximum signal-to-interference ratio. The UGVs communicate with both the fixed BSs and a mobile BS. The proposed control policy can navigate the UGVs and mobile BS to optimize communication and signal strength. Finally, a genetic algorithm (GA) is proposed to optimize the hyperparameters of the MADDPG-based training. Simulation results demonstrate the computational efficiency and robustness of the GA-based MADDPG algorithm for the control of multiple UGVs.
15

Zheng, Siying, Jie Wu, Zhaolong Wang, Liping Qu, and Yikai He. "Research on Cooperative Tracking of Multiple Agents on Heterogeneous Ground". Journal of Physics: Conference Series 2872, no. 1 (1 October 2024): 012001. http://dx.doi.org/10.1088/1742-6596/2872/1/012001.

Abstract:
Pursuit-evasion scenarios are a typical and significant area of study in multi-agent behaviours. While previous research has primarily focused on training pursuit strategies on idealized flat ground, this paper explores pursuit strategies on heterogeneous ground, where motion depends on ground features through the effects of friction and viscous forces. Since an agent interacting with heterogeneous ground changes its acceleration, we perform multi-agent modelling. To address the multi-agent deep deterministic policy gradient (MADDPG) algorithm's low training efficiency and slow convergence speed in pursuit learning, we use a prioritized experience selection mechanism for MADDPG (PES-MADDPG), which improves the experience extraction mechanism based on the policy evaluation function error and the experience extraction training frequency. With this approach, the reward design results in faster convergence of reward values. The experimental findings confirm that the suggested method delivers better performance and effectiveness for pursuers and evaders on heterogeneous ground, and both can acquire the relevant movement strategies through training.
16

Dake, Delali Kwasi, James Dzisi Gadze, Griffith Selorm Klogo, and Henry Nunoo-Mensah. "Multi-Agent Reinforcement Learning Framework in SDN-IoT for Transient Load Detection and Prevention". Technologies 9, no. 3 (29 June 2021): 44. http://dx.doi.org/10.3390/technologies9030044.

Abstract:
The fast emergence of IoT devices and its accompanying big and complex data has necessitated a shift from the traditional networking architecture to software-defined networks (SDNs) in recent times. Routing optimization and DDoS protection in the network has become a necessity for mobile network operators in maintaining a good QoS and QoE for customers. Inspired by the recent advancement in Machine Learning and Deep Reinforcement Learning (DRL), we propose a novel MADDPG integrated Multiagent framework in SDN for efficient multipath routing optimization and malicious DDoS traffic detection and prevention in the network. The two MARL agents cooperate within the same environment to accomplish network optimization task within a shorter time. The state, action, and reward of the proposed framework were further modelled mathematically using the Markov Decision Process (MDP) and later integrated into the MADDPG algorithm. We compared the proposed MADDPG-based framework to DDPG for network metrics: delay, jitter, packet loss rate, bandwidth usage, and intrusion detection. The results show a significant improvement in network metrics with the two agents.
17

Qin, Pinpin, Hongyun Tan, Hao Li, and Xuguang Wen. "Deep Reinforcement Learning Car-Following Model Considering Longitudinal and Lateral Control". Sustainability 14, no. 24 (13 December 2022): 16705. http://dx.doi.org/10.3390/su142416705.

Abstract:
The lateral control of the vehicle is significant for reducing the rollover risk of high-speed cars and improving the stability of the following vehicle. However, the existing car-following (CF) models rarely consider lateral control. Therefore, a CF model with combined longitudinal and lateral control is constructed based on the three degrees of freedom vehicle dynamics model and reinforcement learning method. First, 100 CF segments were selected from the OpenACC database, including 50 straight and 50 curved road trajectories. Afterward, the deep deterministic policy gradient (DDPG) car-following model and multi-agent deep deterministic policy gradient (MADDPG) car-following model were constructed based on the deterministic policy gradient theory. Finally, the models are trained with the extracted trajectory data and verified by comparison with the observed data. The results indicate that the vehicle under the control of the MADDPG model and the vehicle under the control of the DDPG model are both safer and more comfortable than the human-driven vehicle (HDV) on straight roads and curved roads. Under the premise of safety, the vehicle under the control of the MADDPG model has the highest road traffic flow efficiency. The maximum lateral offset of the vehicle under the control of the MADDPG model and the vehicle under the control of the DDPG model in straight road conditions is respectively reduced by 80.86% and 71.92%, compared with the HDV, and the maximum lateral offset in the curved road conditions is lessened by 83.67% and 78.95%. The proposed car following model can provide a reference for developing an adaptive cruise control system considering lateral stability.
18

Wan, Kaifang, Dingwei Wu, Yiwei Zhai, Bo Li, Xiaoguang Gao, and Zijian Hu. "An Improved Approach towards Multi-Agent Pursuit–Evasion Game Decision-Making Using Deep Reinforcement Learning". Entropy 23, no. 11 (29 October 2021): 1433. http://dx.doi.org/10.3390/e23111433.

Abstract:
A pursuit–evasion game is a classical maneuver confrontation problem in the multi-agent systems (MASs) domain. An online decision technique based on deep reinforcement learning (DRL) was developed in this paper to address the problem of environment sensing and decision-making in pursuit–evasion games. A control-oriented framework developed from the DRL-based multi-agent deep deterministic policy gradient (MADDPG) algorithm was built to implement multi-agent cooperative decision-making to overcome the limitation of the tedious state variables required for the traditionally complicated modeling process. To address the effects of errors between a model and a real scenario, this paper introduces adversarial disturbances. It also proposes a novel adversarial attack trick and adversarial learning MADDPG (A2-MADDPG) algorithm. By introducing an adversarial attack trick for the agents themselves, uncertainties of the real world are modeled, thereby optimizing robust training. During the training process, adversarial learning was incorporated into our algorithm to preprocess the actions of multiple agents, which enabled them to properly respond to uncertain dynamic changes in MASs. Experimental results verified that the proposed approach provides superior performance and effectiveness for pursuers and evaders, and both can learn the corresponding confrontational strategy during training.
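
As a loose, gradient-free stand-in for the "adversarial attack trick" described above, the sketch below perturbs an agent's action with the worst of a few random bounded disturbances (judged by a critic-like value function), so the policy can then be trained against it. This is a simplification under stated assumptions, not the A2-MADDPG attack; the action bounds, eps, and the random search are all placeholders.

```python
import numpy as np

def worst_case_perturbation(action, value_fn, eps=0.05, n_trials=16, rng=None):
    """Gradient-free stand-in for an adversarial action perturbation: try a few random
    bounded disturbances and keep the one that minimizes a critic-like value, so the
    policy is then trained against it. Bounds, eps, and the search itself are
    simplifying assumptions, not the A2-MADDPG attack trick."""
    rng = rng or np.random.default_rng()
    action = np.asarray(action, dtype=float)
    worst, worst_val = action, value_fn(action)
    for _ in range(n_trials):
        cand = np.clip(action + rng.uniform(-eps, eps, size=action.shape), -1.0, 1.0)
        val = value_fn(cand)
        if val < worst_val:
            worst, worst_val = cand, val
    return worst
```
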
19

Yang, Yang, Jiang Li, Jinyong Hou, Ye Wang, and Huadong Zhao. "A Policy Gradient Algorithm to Alleviate the Multi-Agent Value Overestimation Problem in Complex Environments". Sensors 23, no. 23 (30 November 2023): 9520. http://dx.doi.org/10.3390/s23239520.

Abstract:
Multi-agent reinforcement learning excels at addressing group intelligent decision-making problems involving sequential decision-making. In particular, in complex, high-dimensional state and action spaces, it imposes higher demands on the reliability, stability, and adaptability of decision algorithms. The reinforcement learning algorithm based on the multi-agent deep strategy gradient incorporates a function approximation method using discriminant networks. However, this can lead to estimation errors when agents evaluate action values, thereby reducing model reliability and stability and resulting in challenging convergence. With the increasing complexity of the environment, there is a decline in the quality of experience collected by the experience playback pool, resulting in low efficiency of the sampling stage and difficulties in algorithm convergence. To address these challenges, we propose an innovative approach called the empirical clustering layer-based multi-agent dual dueling policy gradient (ECL-MAD3PG) algorithm. Experimental results demonstrate that our ECL-MAD3PG algorithm outperforms other methods in various complex environments, demonstrating a remarkable 9.1% improvement in mission completion compared to MADDPG within the context of complex UAV cooperative combat scenarios.
20

Liu, Muchen. "Integrating Multi-Agent Deep Deterministic Policy Gradient and Go-Explore for Enhanced Reward Optimization". Highlights in Science, Engineering and Technology 85 (13 March 2024): 403–10. http://dx.doi.org/10.54097/znrt8d63.

Abstract:
The field of Multi-Agent Reinforcement Learning (MARL) continues to advance with the development of new and effective methods. This research is centered on two prominent approaches within this field: Multi-Agent Deep Deterministic Policy Gradient (MADDPG) and Go-Explore. The study explores the synergistic potential of combining these two methodologies to enhance rewards for individual agents as well as for agent groups. In the course of this research, MADDPG is introduced into the experimental environment, providing agents with both actor networks (policy networks) and critic networks (Q networks) to implement the actor-critic model. Additionally, each individual agent is equipped with a Go-Explore network, empowering them to conduct deeper explorations of the environment and accumulate rewards at an accelerated rate, often resulting in higher overall rewards. This novel approach emphasizes achieving a balance between individual and collaborative rewards, offering a promising avenue for optimizing multi-agent systems. The results of this study demonstrate that the combined method exhibits notable advantages in certain scenarios. Specifically, it showcases a higher rate of reward accumulation and improved overall performance. This research contributes to the MARL domain by highlighting the potential of combining MADDPG and Go-Explore to enhance the efficiency and effectiveness of multi-agent systems.
21

Ye, Xianfeng, Zhiyun Deng, Yanjun Shi, and Weiming Shen. "Toward Energy-Efficient Routing of Multiple AGVs with Multi-Agent Reinforcement Learning". Sensors 23, no. 12 (15 June 2023): 5615. http://dx.doi.org/10.3390/s23125615.

Abstract:
This paper presents a multi-agent reinforcement learning (MARL) algorithm to address the scheduling and routing problems of multiple automated guided vehicles (AGVs), with the goal of minimizing overall energy consumption. The proposed algorithm is developed based on the multi-agent deep deterministic policy gradient (MADDPG) algorithm, with modifications made to the action and state space to fit the setting of AGV activities. While previous studies overlooked the energy efficiency of AGVs, this paper develops a well-designed reward function that helps to optimize the overall energy consumption required to fulfill all tasks. Moreover, we incorporate the ε-greedy exploration strategy into the proposed algorithm to balance exploration and exploitation during training, which helps it converge faster and achieve better performance. The proposed MARL algorithm is equipped with carefully selected parameters that aid in avoiding obstacles, speeding up path planning, and achieving minimal energy consumption. To demonstrate the effectiveness of the proposed algorithm, three types of numerical experiments, using the ε-greedy MADDPG, MADDPG, and Q-Learning methods, were conducted. The results show that the proposed algorithm can effectively solve the multi-AGV task assignment and path planning problems, and the energy consumption results show that the planned routes can effectively improve energy efficiency.
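
The ε-greedy exploration mentioned above can be adapted to MADDPG's continuous actions by occasionally replacing the deterministic policy action with a uniformly random one, as in the hedged sketch below; the bounds, the epsilon schedule, and the function name are assumptions rather than the authors' settings.

```python
import numpy as np

def epsilon_greedy_action(policy_action, low, high, epsilon, rng=None):
    """With probability epsilon take a uniformly random action inside the bounds,
    otherwise keep the deterministic policy action. A generic adaptation of
    ε-greedy to continuous actions; bounds and epsilon schedule are assumptions."""
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return rng.uniform(low, high, size=np.shape(policy_action))
    return np.asarray(policy_action, dtype=float)
```
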
22

Wu, Tianhao, Mingzhi Jiang, and Lin Zhang. "Cooperative Multiagent Deep Deterministic Policy Gradient (CoMADDPG) for Intelligent Connected Transportation with Unsignalized Intersection". Mathematical Problems in Engineering 2020 (22 July 2020): 1–12. http://dx.doi.org/10.1155/2020/1820527.

Abstract:
Unsignalized intersection control is one of the most critical issues in intelligent transportation systems, which requires connected and automated vehicles to support more frequent information interaction and on-board computing. It is very promising to introduce reinforcement learning in the unsignalized intersection control. However, the existing multiagent reinforcement learning algorithms, such as multiagent deep deterministic policy gradient (MADDPG), hardly handle a dynamic number of vehicles, which cannot meet the need of the real road condition. Thus, this paper proposes a Cooperative MADDPG (CoMADDPG) for connected vehicles at unsignalized intersection to solve this problem. Firstly, the scenario of multiple vehicles passing through an unsignalized intersection is formulated as a multiagent reinforcement learning (RL) problem. Secondly, MADDPG is redefined to adapt to the dynamic quantity agents, where each vehicle selects reference vehicles to construct a partial stationary environment, which is necessary for RL. Thirdly, this paper incorporates a novel vehicle selection method, which projects the reference vehicles on a virtual lane and selects the largest impact vehicles to construct the environment. At last, an intersection simulation platform is developed to evaluate the proposed method. According to the simulation result, CoMADDPG can reduce average travel time by 39.28% compared with the other optimization-based methods, which indicates that CoMADDPG has an excellent prospect in dealing with the scenario of unsignalized intersection control.
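
A minimal way to picture the reference-vehicle selection described above is to project every vehicle onto a virtual lane by its distance to the conflict point and keep the few vehicles closest to the ego vehicle, so the agent's state stays fixed-size as vehicles enter and leave. In the sketch below the "largest impact" criterion is reduced to projected distance, which is an assumption; the paper's selection rule is richer.

```python
import numpy as np

def select_reference_vehicles(ego_dist, other_dists, m=4):
    """Project vehicles onto a virtual lane by distance to the conflict point and keep
    the m vehicles nearest the ego vehicle as reference vehicles, giving a fixed-size
    state for a dynamic number of vehicles. 'Impact' is reduced to projected distance
    here (an assumption); the paper's criterion is richer."""
    other_dists = np.asarray(other_dists, dtype=float)
    gaps = np.abs(other_dists - ego_dist)
    idx = np.argsort(gaps)[:m]
    return idx, other_dists[idx] - ego_dist   # indices and relative positions on the virtual lane
```
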
23

Wei, Jingjing, Yinsheng Wei, Lei Yu, and Rongqing Xu. "Radar Anti-Jamming Decision-Making Method Based on DDPG-MADDPG Algorithm". Remote Sensing 15, no. 16 (16 August 2023): 4046. http://dx.doi.org/10.3390/rs15164046.

Abstract:
In the face of smart and varied jamming, intelligent radar anti-jamming technologies are urgently needed. Due to the variety of radar electronic counter-countermeasures (ECCMs), it is necessary to efficiently optimize ECCMs in the high-dimensional knowledge base to ensure that the radar achieves the optimal anti-jamming effect. Therefore, an intelligent radar anti-jamming decision-making method based on the deep deterministic policy gradient (DDPG) and the multi-agent deep deterministic policy gradient (MADDPG) (DDPG-MADDPG) algorithm is proposed. Firstly, by establishing a typical working scenario of radar and jamming, we designed the intelligent radar anti-jamming decision-making model, and the anti-jamming decision-making process was formulated. Then, aiming at different jamming modes, we designed the anti-jamming improvement factor and the correlation matrix of jamming and ECCM. They were used to evaluate the jamming suppression performance of ECCMs and to provide feedback for the decision-making algorithm. The decision-making constraints and four different decision-making objectives were designed to verify the performance of the decision-making algorithm. Finally, we designed a DDPG-MADDPG algorithm to generate the anti-jamming strategy. The simulation results showed that the proposed method has excellent robustness and generalization performance. At the same time, it has a shorter convergence time and higher anti-jamming decision making accuracy.
24

Budiyanto, Almira, Keisuke Azetsu, and Nobutomo Matsunaga. "Accelerated Transfer Learning for Cooperative Transportation Formation Change via SDPA-MAPPO (Scaled Dot Product Attention-Multi-Agent Proximal Policy Optimization)". Automation 5, no. 4 (27 November 2024): 597–612. http://dx.doi.org/10.3390/automation5040034.

Abstract:
Methods for cooperative transportation that require formation changes in the traveling environment are gaining interest. Deep reinforcement learning is used for formation changes in multi-robot cases. The MADDPG (Multi-Agent Deep Deterministic Policy Gradient) method is popularly used for known environments; in unfamiliar circumstances, however, re-learning may be required when using MADDPG. Although MADDPG variants using model-based learning and imitation learning have been applied to reduce learning time, it is unclear how the learning results transfer when the number of robots changes. For example, in the GASIL-MADDPG (Generative adversarial self-imitation learning and Multi-agent Deep Deterministic Policy Gradient) method, how the results of training three robots can be transferred to the neural networks of four robots is uncertain. Scaled Dot Product Attention (SDPA) has recently attracted attention for its speed and accuracy in natural language processing. When transfer learning is combined with fast computation, the efficiency of edge-level re-learning improves. This paper proposes a formation change algorithm that allows easier and faster multi-robot knowledge transfer than other methods by using SDPA combined with MAPPO (Multi-Agent Proximal Policy Optimization). The algorithm applies SDPA to multi-robot formation learning and performs fast learning by transferring the acquired knowledge of formation changes to a different number of robots. The proposed algorithm is verified by simulating robot formation changes and achieves dramatically faster learning. The proposed SDPA-MAPPO (Scaled Dot Product Attention-Multi-Agent Proximal Policy Optimization) learned 20.83 times faster than the Deep Dyna-Q method. Furthermore, using transfer learning from a three-robot to a five-robot case, the method shows that the learning time can be reduced by about 56.57 percent. The three-robot to five-robot scenario is chosen based on the number of robots typically used in cooperative transportation.
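
For reference, scaled dot-product attention itself is the standard operation softmax(QKᵀ/√d_k)V; the NumPy sketch below shows this generic formulation only (single head, no masking), not the authors' SDPA-MAPPO network.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V for 2-D arrays Q:(n_q, d_k), K:(n_k, d_k), V:(n_k, d_v).
    Single head, no masking; generic formulation rather than the authors' network."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```
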
25

Lin, Yuanmo, Yuxun Ai, Zhiyong Xu, Jingyuan Wang, and Jianhua Li. "Adaptive Resource Allocation for Emergency Communications with Unmanned Aerial Vehicle-Assisted Free Space Optical/Radio Frequency Relay System". Photonics 11, no. 8 (13 August 2024): 754. http://dx.doi.org/10.3390/photonics11080754.

Abstract:
This paper investigates the problem of coordinated resource allocation for multiple unmanned aerial vehicles (UAVs) to address the scarcity of communication resources in disaster-affected areas. UAVs carrying modules of free space optical (FSO) and radio frequency (RF) serve as relay nodes and edge offloading nodes, presenting an FSO/RF dual-hop framework. Considering the varying urgency levels of tasks, we assign task priorities and transform the proposed problem into distributed collaborative optimization problem. Based on the K-means algorithm and the multi-agent deep deterministic policy gradient (MADDPG) algorithm, we propose a UAV-coordinated K-means MADDPG (KMADDPG) to maximize the number of completed tasks while prioritizing high-priority tasks. Simulation results show that KMADDPG is 5% to 10% better than the benchmark DRL methods in convergence performance.
26

Yu, Sheng, Wei Zhu, and Yong Wang. "Research on Wargame Decision-Making Method Based on Multi-Agent Deep Deterministic Policy Gradient". Applied Sciences 13, no. 7 (4 April 2023): 4569. http://dx.doi.org/10.3390/app13074569.

Abstract:
Wargames are essential simulators for various war scenarios. However, the increasing pace of warfare has rendered traditional wargame decision-making methods inadequate. To address this challenge, wargame-assisted decision-making methods that leverage artificial intelligence techniques, notably reinforcement learning, have emerged as a promising solution. The current wargame environment is beset by a large decision space and sparse rewards, presenting obstacles to optimizing decision-making methods. To overcome these hurdles, a Multi-Agent Deep Deterministic Policy Gradient (MADDPG) based wargame decision-making method is presented. The Partially Observable Markov Decision Process (POMDP), joint action-value function, and the Gumbel-Softmax estimator are applied to optimize MADDPG in order to adapt to the wargame environment. Furthermore, a wargame decision-making method based on the improved MADDPG algorithm is proposed. Using supervised learning in the proposed approach, the training efficiency is improved and the space for manipulation before the reinforcement learning phase is reduced. In addition, a policy gradient estimator is incorporated to reduce the action space and to obtain the global optimal solution. Furthermore, an additional reward function is designed to address the sparse reward problem. The experimental results demonstrate that our proposed wargame decision-making method outperforms the pre-optimization algorithm and other algorithms based on the AC framework in the wargame environment. Our approach offers a promising solution to the challenging problem of decision-making in wargame scenarios, particularly given the increasing speed and complexity of modern warfare.
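
The Gumbel-Softmax estimator mentioned above draws a continuous, approximately one-hot relaxation of a categorical sample, which is what makes discrete action selection usable inside an actor-critic update. The NumPy sketch below shows only the forward sampling step of the generic estimator (the temperature value is an assumption), not the paper's wargame policy.

```python
import numpy as np

def gumbel_softmax_sample(logits, temperature=1.0, rng=None):
    """Forward pass of the Gumbel-Softmax estimator: perturb logits with Gumbel noise
    and apply a temperature-controlled softmax, yielding a relaxed (approximately
    one-hot) categorical sample."""
    rng = rng or np.random.default_rng()
    u = rng.uniform(size=np.shape(logits))
    gumbel = -np.log(-np.log(u + 1e-20) + 1e-20)    # Gumbel(0, 1) noise
    y = (np.asarray(logits, dtype=float) + gumbel) / temperature
    y -= y.max()                                    # numerical stability
    expy = np.exp(y)
    return expy / expy.sum()
```
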
27

Arain, Zulfiqar Ali, Xuesong Qiu, Changqiao Xu, Mu Wang, and Mussadiq Abdul Rahim. "Energy-Aware MPTCP Scheduling in Heterogeneous Wireless Networks Using Multi-Agent Deep Reinforcement Learning Techniques". Electronics 12, no. 21 (1 November 2023): 4496. http://dx.doi.org/10.3390/electronics12214496.

Abstract:
This paper proposes an energy-efficient scheduling scheme for multi-path TCP (MPTCP) in heterogeneous wireless networks, aiming to minimize energy consumption while ensuring low latency and high throughput. Each MPTCP sub-flow is controlled by an agent that cooperates with other agents using the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm. This approach enables the agents to learn decentralized policies through centralized training and decentralized execution. The scheduling problem is modeled as a multi-agent decision-making task. The proposed energy-efficient scheduling scheme, referred to as EE-MADDPG, demonstrates significant energy savings while maintaining lower latency and higher throughput compared to other state-of-the-art scheduling techniques. By adopting a multi-agent deep reinforcement learning approach, the agents can learn efficient scheduling policies that optimize various performance metrics in heterogeneous wireless networks.
28

Zhou, Xiao, Song Zhou, Xingang Mou, and Yi He. "Multirobot Collaborative Pursuit Target Robot by Improved MADDPG". Computational Intelligence and Neuroscience 2022 (25 February 2022): 1–10. http://dx.doi.org/10.1155/2022/4757394.

Abstract:
Policy formulation is one of the main problems in multirobot systems, especially in multirobot pursuit-evasion scenarios, where both sparse rewards and random environment changes make it difficult to find a better strategy. Existing multirobot decision-making methods mostly use environmental rewards to push robots to complete the target task, which cannot achieve good results. This paper proposes a multirobot pursuit method based on an improved multiagent deep deterministic policy gradient (MADDPG), which solves the problem of sparse rewards in multirobot pursuit-evasion scenarios by combining an intrinsic reward with the external environment reward. A state-similarity module based on a threshold constraint forms part of the intrinsic reward signal output by the intrinsic curiosity module and is used to balance over-exploration and insufficient exploration, so that the agent can use the intrinsic reward more effectively to learn better strategies. The simulation experiment results show that the proposed method can significantly improve the reward value of the robots and the success rate of the pursuit task. The improvement is intuitively reflected in the real-time distance between the pursuer and the escapee: pursuers trained with the improved algorithm can close in on the escapee more quickly, and the average following distance also decreases.
29

Zhang, Lu, Junwei Li, Qianwen Yang, Chenglin Xu, and Feng Zhao. "MADDPG-Based Deployment Algorithm for 5G Network Slicing". Electronics 13, no. 16 (12 August 2024): 3189. http://dx.doi.org/10.3390/electronics13163189.

Abstract:
One of the core features of 5G networks is the ability to support multiple services on the same infrastructure, with network slicing being a key technology. However, existing network slicing architectures have limitations in efficiently handling slice requests with different requirements, particularly when addressing high-reliability and high-demand services, where many issues remain unresolved. For example, predicting whether actual physical resources can meet network slice request demands and achieving flexible, on-demand resource allocation for different types of slice requests are significant challenges. To address the need for more flexible and efficient service demands, this paper proposes a 5G network slicing deployment algorithm based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG). Firstly, a new 5G network slicing deployment system framework is established, which measures resources for three typical 5G network slicing scenarios (eMBB, mMTC, uRLLC) and processes different types of slice requests by predicting slice request traffic. Secondly, by adopting the multi-agent approach of MADDPG, the algorithm enhances cooperation between multiple service requests, decentralizes action selection for requests, and schedules resources separately for the three types of slice requests, thereby optimizing resource allocation. Finally, simulation results demonstrate that the proposed algorithm significantly outperforms existing algorithms in terms of resource efficiency and slice request acceptance rate, showcasing the advantages of multi-agent approaches in slice request handling.
30

Bildik, Enver, and Antonios Tsourdos. "Clustering and Cooperative Guidance of Multiple Decoys for Defending a Naval Platform against Salvo Threats". Aerospace 11, no. 10 (27 September 2024): 799. http://dx.doi.org/10.3390/aerospace11100799.

Abstract:
The threat to naval platforms from missile systems is increasing due to recent advancements in radar seeker technology, which have significantly enhanced the accuracy and effectiveness of missile targeting. In scenarios where a naval platform with limited maneuverability faces salvo attacks, the importance of an effective defense strategy becomes crucial to ensuring the protection of the platform. In this study, we present a multi-agent reinforcement learning-based decoy deployment approach that employs six decoys to increase the survival likelihood of a naval platform against salvo missile strikes. Our approach entails separating the decoys into two clusters, each consisting of three decoys. Subsequently, every cluster is allocated to a related missile threat. This is accomplished by training the decoys with the multi-agent deep reinforcement learning algorithm. To compare the proposed approach across different algorithms, we use two distinct algorithms to train the decoys; multi-agent deep deterministic policy gradient (MADDPG) and multi-agent twin-delayed deep deterministic policy gradient (MATD3). Following training, the decoys learn to form groups and establish effective formation configurations within each group to ensure optimal coordination. We assess the proposed decoy deployment strategy using parameters including decoy deployment angle and maximum decoy speed. Our findings indicate that decoys positioned on the same side outperform those positioned on different sides relative to the target platform. In general, MATD3 performs slightly better than MADDPG. Decoys trained with MATD3 succeed in more successful formation configurations than those trained with the MADDPG method, which accounts for this enhancement.
31

Wang, Guangcheng, Fenglin Wei, Yu Jiang, Minghao Zhao, Kai Wang, and Hong Qi. "A Multi-AUV Maritime Target Search Method for Moving and Invisible Objects Based on Multi-Agent Deep Reinforcement Learning". Sensors 22, no. 21 (7 November 2022): 8562. http://dx.doi.org/10.3390/s22218562.

Abstract:
Target search for moving and invisible objects has always been considered a challenge, as the floating objects drift with the flows. This study focuses on target search by multiple autonomous underwater vehicles (AUV) and investigates a multi-agent target search method (MATSMI) for moving and invisible objects. In the MATSMI algorithm, based on the multi-agent deep deterministic policy gradient (MADDPG) method, we add spatial and temporal information to the reinforcement learning state and set up specialized rewards in conjunction with a maritime target search scenario. Additionally, we construct a simulation environment to simulate a multi-AUV search for the floating object. The simulation results show that the MATSMI method has about 20% higher search success rate and about 70 steps shorter search time than the traditional search method. In addition, the MATSMI method converges faster than the MADDPG method. This paper provides a novel and effective method for solving the maritime target search problem.
32

Zhang, Hao, Yu Du, Shixin Zhao, Ying Yuan, and Qiuqi Gao. "VN-MADDPG: A Variable-Noise-Based Multi-Agent Reinforcement Learning Algorithm for Autonomous Vehicles at Unsignalized Intersections". Electronics 13, no. 16 (11 August 2024): 3180. http://dx.doi.org/10.3390/electronics13163180.

Abstract:
The decision-making performance of autonomous vehicles tends to be unstable at unsignalized intersections, making it difficult for them to make optimal decisions. We propose a decision-making model based on the Variable-Noise Multi-Agent Deep Deterministic Policy Gradient (VN-MADDPG) algorithm to address these issues. The variable-noise mechanism reduces noise dynamically, enabling the agent to utilize the learned policy more effectively to complete tasks. This significantly improves the stability of the decision-making model in making optimal decisions. The importance sampling module addresses the inconsistency between outdated experience in the replay buffer and current environmental features. This enhances the model’s learning efficiency and improves the robustness of the decision-making model. Experimental results on the CARLA simulation platform show that the success rate of decision making at unsignalized intersections by autonomous vehicles has significantly increased, and the pass time has been reduced. The decision-making model based on the VN-MADDPG algorithm demonstrates stable and excellent decision-making performance.
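
The abstract does not spell out the variable-noise mechanism; a common realization is simply an exploration-noise scale that decays with training progress, as in the hypothetical schedule below (the constants are placeholders, not the paper's values).

```python
def variable_noise_scale(step, sigma_start=0.3, sigma_end=0.02, decay_steps=50_000):
    """Hypothetical variable-noise schedule: exploration noise shrinks linearly with
    training progress so a nearly converged policy is exploited rather than perturbed."""
    frac = min(step / decay_steps, 1.0)
    return sigma_start + frac * (sigma_end - sigma_start)
```
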
33

Wu, Liangshun, Peilin Liu, Junsuo Qu, Cong Zhang, and Bin Zhang. "Duty Cycle Scheduling in Wireless Sensor Networks Using an Exploratory Strategy-Directed MADDPG Algorithm". International Journal of Sensors and Sensor Networks 12, no. 1 (28 February 2024): 1–12. http://dx.doi.org/10.11648/j.ijssn.20241201.11.

Abstract:
This paper presents an in-depth study of the application of Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithms with an exploratory strategy for duty cycle scheduling (DCS) in the wireless sensor networks (WSNs). The focus is on optimizing the performance of sensor nodes in terms of energy efficiency and event detection rates under varying environmental conditions. Through a series of simulations, we investigate the impact of key parameters such as the sensor specificity constant α and the Poisson rate of events on the learning and operational efficacy of sensor nodes. Our results demonstrate that the MADDPG algorithm with an exploratory strategy outperforms traditional reinforcement learning algorithms, particularly in environments characterized by high event rates and the need for precise energy management. The exploratory strategy enables a more effective balance between exploration and exploitation, leading to improved policy learning and adaptation in dynamic and uncertain environments. Furthermore, we explore the sensitivity of different algorithms to the tuning of the sensor specificity constant α, revealing that lower values generally yield better performance by reducing energy consumption without significantly compromising event detection. The study also examines the algorithms' robustness against the variability introduced by different event Poisson rates, emphasizing the importance of algorithm selection and parameter tuning in practical WSN applications. The insights gained from this research provide valuable guidelines for the deployment of sensor networks in real-world scenarios, where the trade-off between energy consumption and event detection is critical. Our findings suggest that the integration of exploratory strategies in MADDPG algorithms can significantly enhance the performance and reliability of sensor nodes in WSNs.
APA, Harvard, Vancouver, ISO, and other styles
34

Intelligence and Neuroscience, Computational. "Retracted: Multirobot Collaborative Pursuit Target Robot by Improved MADDPG". Computational Intelligence and Neuroscience 2023 (26 July 2023): 1. http://dx.doi.org/10.1155/2023/9839345.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Ai, Ling, Shaozhen Tang, and Jie Yu. "Multi-agent cooperative encirclement based on improved MADDPG algorithm". Journal of Physics: Conference Series 2898, no. 1 (1 November 2024): 012033. http://dx.doi.org/10.1088/1742-6596/2898/1/012033.

Full text
Abstract:
In this paper, we propose an improved Multi-agent Deep Deterministic Policy Gradient algorithm with a Priority Experience Replay mechanism (PER-MADDPG) to address the baseline algorithm’s high-dimensional state space challenges in multi-agent encirclement scenarios. The PER mechanism effectively mitigates the issue of non-stationary experience data distribution. By incorporating the Apollonian circle theory, we design an effective encirclement reward function that enables the multi-agent system to complete encirclement tasks in environments with static obstacles. Comparative simulation results show that the improved algorithm achieves faster reward value growth and higher average rewards.
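A minimal sketch (Python) of proportional prioritized experience replay of the kind commonly paired with MADDPG is given below; the capacity, alpha, beta, and epsilon values are illustrative, and the plain list stands in for the sum-tree structure usually used for efficiency.

    import numpy as np

    class PERBuffer:
        """Proportional prioritized experience replay (list-based sketch)."""
        def __init__(self, capacity=10000, alpha=0.6, eps=1e-6):
            self.capacity, self.alpha, self.eps = capacity, alpha, eps
            self.data, self.priorities = [], []

        def add(self, transition):
            # New transitions get the current maximum priority so they are sampled at least once.
            p = max(self.priorities, default=1.0)
            if len(self.data) >= self.capacity:
                self.data.pop(0)
                self.priorities.pop(0)
            self.data.append(transition)
            self.priorities.append(p)

        def sample(self, batch_size, beta=0.4):
            prios = np.array(self.priorities) ** self.alpha
            probs = prios / prios.sum()
            idx = np.random.choice(len(self.data), batch_size, p=probs)
            # Importance-sampling weights correct the bias of non-uniform sampling.
            weights = (len(self.data) * probs[idx]) ** (-beta)
            weights /= weights.max()
            return idx, [self.data[i] for i in idx], weights

        def update_priorities(self, idx, td_errors):
            # Priorities track the magnitude of the TD error of each sampled transition.
            for i, err in zip(idx, td_errors):
                self.priorities[i] = abs(err) + self.eps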
APA, Harvard, Vancouver, ISO, and other styles
36

Zhang, Demu, Jing Zhang, Yu He, Tao Shen, and Xingyan Liu. "Adaptive Control of VSG Inertia Damping Based on MADDPG". Energies 17, no. 24 (20 December 2024): 6421. https://doi.org/10.3390/en17246421.

Full text
Abstract:
As renewable energy sources become more integrated into the power grid, traditional virtual synchronous generator (VSG) control strategies have become inadequate for the current low-damping, low-inertia power systems. Therefore, this paper proposes a VSG inertia and damping adaptive control method based on multi-agent deep deterministic policy gradient (MADDPG). The paper first introduces the working principles of virtual synchronous generators and establishes a corresponding VSG model. Based on this model, the influence of variations in virtual inertia (J) and damping (D) coefficients on fluctuations in active power output is examined, defining the action space for J and D. The proposed method is mainly divided into two phases: “centralized training and decentralized execution”. In the centralized training phase, each agent’s critic network shares global observation and action information to guide the actor network in policy optimization. In the decentralized execution phase, agents observe frequency deviations and the rate at which angular frequency changes, using reinforcement learning algorithms to adjust the virtual inertia J and damping coefficient D in real time. Finally, the effectiveness of the proposed MADDPG control strategy is validated through comparison with adaptive control and DDPG control methods.
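The centralized-training, decentralized-execution split described in this abstract can be sketched as follows (assuming PyTorch): each actor maps its local observation, here a frequency deviation and the rate of change of angular frequency, to a (J, D) adjustment, while the critic used only during training sees every agent's observations and actions. Layer sizes, the number of agents, and the weight sharing in the usage lines are illustrative assumptions.

    import torch
    import torch.nn as nn

    class Actor(nn.Module):
        """Decentralized actor: local observation -> normalized (J, D) adjustment."""
        def __init__(self, obs_dim=2, act_dim=2, hidden=64):
            super().__init__()
            self.net = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, act_dim), nn.Tanh())
        def forward(self, obs):
            # Output in [-1, 1]; rescaled to the admissible (J, D) ranges outside the network.
            return self.net(obs)

    class CentralizedCritic(nn.Module):
        """Critic used during training only: sees all agents' observations and actions."""
        def __init__(self, n_agents=2, obs_dim=2, act_dim=2, hidden=64):
            super().__init__()
            joint = n_agents * (obs_dim + act_dim)
            self.net = nn.Sequential(nn.Linear(joint, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1))
        def forward(self, all_obs, all_acts):
            return self.net(torch.cat([all_obs, all_acts], dim=-1))

    actor, critic = Actor(), CentralizedCritic()
    obs = torch.randn(4, 2)                          # batch of local observations
    acts = torch.cat([actor(obs), actor(obs)], -1)   # both agents' actions (weights shared only for brevity)
    q = critic(torch.cat([obs, obs], -1), acts)
    print(q.shape)                                   # torch.Size([4, 1])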
APA, Harvard, Vancouver, ISO, and other styles
37

Suanpang, Pannee, and Pitchaya Jamjuntr. "Optimizing Electric Vehicle Charging Recommendation in Smart Cities: A Multi-Agent Reinforcement Learning Approach". World Electric Vehicle Journal 15, no. 2 (14 February 2024): 67. http://dx.doi.org/10.3390/wevj15020067.

Full text
Abstract:
As global awareness for preserving natural energy sustainability rises, electric vehicles (EVs) are increasingly becoming a preferred choice for transportation because of their ability to emit zero emissions, conserve energy, and reduce pollution, especially in smart cities with sustainable development. Nonetheless, the lack of adequate EV charging infrastructure remains a significant problem that has resulted in varying charging demands at different locations and times, particularly in developing countries. As a consequence, this inadequacy has posed a challenge for EV drivers, particularly those in smart cities, as they face difficulty in locating suitable charging stations. Nevertheless, the recent development of deep reinforcement learning is a promising technology that has the potential to improve the charging experience in several ways over the long term. This paper proposes a novel approach for recommending EV charging stations using multi-agent reinforcement learning (MARL) algorithms by comparing several popular algorithms, including the deep deterministic policy gradient, deep Q-network, multi-agent DDPG (MADDPG), Real, and Random, in optimizing the placement and allocation of the EV charging stations. The results demonstrated that MADDPG outperformed other algorithms in terms of the Mean Charge Waiting Time, CFT, and Total Saving Fee, thus indicating its superiority in addressing the EV charging station problem in a multi-agent setting. The collaborative and communicative nature of the MADDPG algorithm played a key role in achieving these results. Hence, this approach could provide a better user experience, increase the adoption of EVs, and be extended to other transportation-related problems. Overall, this study highlighted the potential of MARL as a powerful approach for solving complex optimization problems in transportation and beyond. This would also contribute to the development of more efficient and sustainable transportation systems in smart cities for sustainable development.
APA, Harvard, Vancouver, ISO, and other styles
38

Li, Yan, Mengyu Zhao, Huazhi Zhang, Yuanyuan Qu, and Suyu Wang. "A Multi-Agent Motion Prediction and Tracking Method Based on Non-Cooperative Equilibrium". Mathematics 10, no. 1 (5 January 2022): 164. http://dx.doi.org/10.3390/math10010164.

Full text
Abstract:
A Multi-Agent Motion Prediction and Tracking method based on non-cooperative equilibrium (MPT-NCE) is proposed, motivated by the fact that some multi-agent intelligent evolution methods, such as MADDPG, lack adaptability when facing unfamiliar environments and cannot achieve multi-agent motion prediction and tracking, despite their advantages in multi-agent intelligence. Featuring a performance discrimination module that uses the time difference function together with a random mutation module that applies predictive learning, the MPT-NCE is capable of improving the prediction and tracking ability of the agents in intelligent game confrontation. Two groups of multi-agent prediction and tracking experiments are conducted, and the results show that, compared with the MADDPG method, in terms of prediction ability the MPT-NCE achieves a prediction rate of more than 90%, which is 23.52% higher, and increases the overall evolution efficiency by 16.89%; in terms of tracking ability, the MPT-NCE improves the convergence speed by 11.76% while facilitating target tracking by 25.85%. The proposed MPT-NCE method shows impressive environmental adaptability as well as prediction and tracking ability.
APA, Harvard, Vancouver, ISO, and other styles
39

Fan, Dongyu, Haikuo Shen, and Lijing Dong. "Multi-Agent Distributed Deep Deterministic Policy Gradient for Partially Observable Tracking". Actuators 10, no. 10 (14 October 2021): 268. http://dx.doi.org/10.3390/act10100268.

Full text
Abstract:
In many existing multi-agent reinforcement learning tasks, each agent observes all the other agents from its own perspective. In addition, the training process is centralized, namely the critic of each agent can access the policies of all the agents. This scheme has certain limitations since every single agent can only obtain the information of its neighbor agents due to the communication range in practical applications. Therefore, in this paper, a multi-agent distributed deep deterministic policy gradient (MAD3PG) approach is presented with decentralized actors and distributed critics to realize multi-agent distributed tracking. The distinguishing feature of the proposed framework is that we adopted the multi-agent distributed training with decentralized execution, where each critic only takes the agent’s and the neighbor agents’ policies into account. Experiments were conducted in the distributed tracking tasks based on multi-agent particle environments where N (N = 3, 5) agents track a target agent with partial observation. The results showed that the proposed method achieves a higher reward with a shorter training time compared to other methods, including MADDPG, DDPG, PPO, and DQN. The proposed novel method leads to more efficient and effective multi-agent tracking.
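A minimal sketch of the neighbour-restricted critic idea follows (assuming PyTorch): the critic input is assembled only from agents inside a communication range and zero-padded to a fixed size. The range threshold, dimensions, and padding scheme are illustrative assumptions rather than the paper's exact construction.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    def neighbors(positions, i, comm_range=1.0):
        # Indices of agents within comm_range of agent i (including agent i itself).
        d = torch.norm(positions - positions[i], dim=-1)
        return torch.nonzero(d <= comm_range).squeeze(-1)

    class DistributedCritic(nn.Module):
        """Critic built only over an agent and its in-range neighbors."""
        def __init__(self, max_neighbors=3, obs_dim=4, act_dim=2, hidden=64):
            super().__init__()
            self.in_dim = max_neighbors * (obs_dim + act_dim)
            self.net = nn.Sequential(nn.Linear(self.in_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, 1))
        def forward(self, neigh_obs, neigh_acts):
            x = torch.cat([neigh_obs.flatten(), neigh_acts.flatten()])
            # Zero-pad when fewer than max_neighbors agents are within range.
            x = F.pad(x, (0, self.in_dim - x.numel()))
            return self.net(x)

    pos = torch.tensor([[0.0, 0.0], [0.5, 0.0], [5.0, 5.0]])
    idx = neighbors(pos, i=0)                 # the third agent is out of range and is excluded
    obs, acts = torch.randn(3, 4), torch.randn(3, 2)
    print(idx, DistributedCritic()(obs[idx], acts[idx]))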
APA, Harvard, Vancouver, ISO, and other styles
40

Wen, Jiayi, Shaoman Liu, and Yejin Lin. "Dynamic Navigation and Area Assignment of Multiple USVs Based on Multi-Agent Deep Reinforcement Learning". Sensors 22, no. 18 (14 September 2022): 6942. http://dx.doi.org/10.3390/s22186942.

Full text
Abstract:
The unmanned surface vehicle (USV) has attracted more and more attention because of its basic ability to perform complex maritime tasks autonomously in constrained environments. However, the level of autonomy of one single USV is still limited, especially when deployed in a dynamic environment to perform multiple tasks simultaneously. Thus, a multi-USV cooperative approach can be adopted to obtain the desired success rate in the presence of multi-mission objectives. In this paper, we propose a cooperative navigating approach by enabling multiple USVs to automatically avoid dynamic obstacles and allocate target areas. To be specific, we propose a multi-agent deep reinforcement learning (MADRL) approach, i.e., a multi-agent deep deterministic policy gradient (MADDPG), to maximize the autonomy level by jointly optimizing the trajectory of USVs, as well as obstacle avoidance and coordination, which is a complex optimization problem usually solved separately. In contrast to other works, we combined dynamic navigation and area assignment to design a task management system based on the MADDPG learning framework. Finally, the experiments were carried out on the Gym platform to verify the effectiveness of the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
41

Pan, Lei, Tong Zhang, and Yuan Gao. "Real-Time Control of Gas Supply System for a PEMFC Cold-Start Based on the MADDPG Algorithm". Energies 16, no. 12 (12 June 2023): 4655. http://dx.doi.org/10.3390/en16124655.

Full text
Abstract:
During the cold-start process of a PEMFC, the supply of air and hydrogen in the gas supply system has a great influence on the cold-start performance. The cold-start of a PEMFC is a complex nonlinear coupling process, and the traditional control strategy is not sensitive to the real-time characteristics of the system. Inspired by the strong perception and decision-making abilities of deep reinforcement learning, this paper proposes a cold-start control strategy for a gas supply system based on the MADDPG algorithm, and designs an air supply controller and a hydrogen supply controller based on this algorithm. The proposed strategy can optimize the control parameters of the gas supply system in real time according to the temperature rise rate of the stack during the cold-start process, the fluctuation of the OER, and the voltage output characteristics. After the strategy is trained offline according to the designed reward function, the detailed in-loop simulation experiment results are given and compared with the traditional control strategy for the gas supply system. From the results, it can be seen that the proposed MADDPG control strategy has a more effective coordination control effect.
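Purely as an assumed illustration of the kind of reward this abstract alludes to, the sketch below combines the stack temperature rise rate with penalties on oxygen excess ratio (OER) fluctuation and voltage deviation; the weights, reference values, and functional form are guesses for illustration only and are not the paper's reward function.

    def cold_start_reward(dT_dt, oer, voltage, oer_ref=2.0, v_ref=0.65,
                          w_temp=1.0, w_oer=0.5, w_volt=0.5):
        # Reward rises with the temperature rise rate and falls with OER and voltage deviations.
        return (w_temp * dT_dt
                - w_oer * abs(oer - oer_ref)
                - w_volt * abs(voltage - v_ref))

    print(cold_start_reward(dT_dt=0.8, oer=2.3, voltage=0.60))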
APA, Harvard, Vancouver, ISO, and other styles
42

Wang, Yizheng, Enhao Shi, Yang Xu, Jiahua Hu, and Changsen Feng. "Short-Term Electricity Futures Investment Strategies for Power Producers Based on Multi-Agent Deep Reinforcement Learning". Energies 17, no. 21 (28 October 2024): 5350. http://dx.doi.org/10.3390/en17215350.

Full text
Abstract:
The global development and enhancement of electricity financial markets aim to mitigate price risk in the electricity spot market. Power producers utilize financial derivatives for both hedging and speculation, necessitating careful selection of portfolio strategies. Current research on investment strategies for power financial derivatives primarily emphasizes risk management, resulting in a lack of a comprehensive investment framework. This study analyzes six short-term electricity futures contracts: base day, base week, base weekend, peak day, peak week, and peak weekend. A multi-agent deep reinforcement learning algorithm, Dual-Q MADDPG, is employed to learn from interactions with both the spot and futures market environments, considering the hedging and speculative behaviors of power producers. Upon completion of model training, the algorithm enables power producers to derive optimal portfolio strategies. Numerical experiments conducted in the Nordic electricity spot and futures markets indicate that the proposed Dual-Q MADDPG algorithm effectively reduces price risk in the spot market while generating substantial speculative returns. This study contributes to lowering barriers for power generators in the power finance market, thereby facilitating the widespread adoption of financial instruments, which enhances market liquidity and stability.
APA, Harvard, Vancouver, ISO, and other styles
43

Cao, Zhengyang, and Gang Chen. "Advanced Cooperative Formation Control in Variable-Sweep Wing UAVs via the MADDPG–VSC Algorithm". Applied Sciences 14, no. 19 (7 October 2024): 9048. http://dx.doi.org/10.3390/app14199048.

Full text
Abstract:
UAV technology is advancing rapidly, and variable-sweep wing UAVs are increasingly valuable because they can adapt to different flight conditions. However, conventional control methods often struggle with managing continuous action spaces and responding to dynamic environments, making them inadequate for complex multi-UAV cooperative formation control tasks. To address these challenges, this study presents an innovative framework that integrates dynamic modeling with morphing control, optimized by the multi-agent deep deterministic policy gradient for two-sweep control (MADDPG–VSC) algorithm. This approach enables real-time sweep angle adjustments based on current flight states, significantly enhancing aerodynamic efficiency and overall UAV performance. The precise motion state model for wing morphing developed in this study underpins the MADDPG–VSC algorithm’s implementation. The algorithm not only optimizes multi-UAV formation control efficiency but also improves obstacle avoidance, attitude stability, and decision-making speed. Extensive simulations and real-world experiments consistently demonstrate that the proposed algorithm outperforms contemporary methods in multiple aspects, underscoring its practical applicability in complex aerial systems. This study advances control technologies for morphing-wing UAV formation and offers new insights into multi-agent cooperative control, with substantial potential for real-world applications.
APA, Harvard, Vancouver, ISO, and other styles
44

Wei, Xiaolong, Lifang Yang, Gang Cao, Tao Lu, and Bing Wang. "Recurrent MADDPG for Object Detection and Assignment in Combat Tasks". IEEE Access 8 (2020): 163334–43. http://dx.doi.org/10.1109/access.2020.3022638.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Jiang, Changxu, Zheng Lin, Chenxi Liu, Feixiong Chen, and Zhenguo Shao. "MADDPG-Based Active Distribution Network Dynamic Reconfiguration with Renewable Energy". Protection and Control of Modern Power Systems 9, no. 6 (November 2024): 143–55. http://dx.doi.org/10.23919/pcmp.2023.000283.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Wang, Yuchen, Zishan Huang, Zhongcheng Wei, and Jijun Zhao. "MADDPG-Based Offloading Strategy for Timing-Dependent Tasks in Edge Computing". Future Internet 16, no. 6 (21 May 2024): 181. http://dx.doi.org/10.3390/fi16060181.

Full text
Abstract:
With the increasing popularity of the Internet of Things (IoT), the proliferation of computation-intensive and timing-dependent applications has put serious load pressure on terrestrial networks. To solve the problems of computing resource conflicts and long response delays caused by concurrent service requests from multiple users, this paper proposes an improved timing-dependent task-offloading scheme for edge computing based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG), which aims to shorten the offloading delay and improve resource utilization through resource prediction and collaboration among multiple agents. First, to coordinate global computing resources, a gated recurrent unit is used to predict the next computing resource requirements of the timing-dependent tasks from historical information. Second, the predicted information, the historical offloading decisions, and the current state are used as inputs, and the training process of the reinforcement learning algorithm is improved to obtain a MADDPG-based task-offloading algorithm. The simulation results show that, compared with the suboptimal benchmark algorithm, the proposed algorithm reduces response latency by 6.7% and improves resource utilization by 30.6%, and it needs nearly 500 fewer training rounds during learning, which effectively improves the timeliness of the offloading strategy.
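A minimal sketch of the prediction step, assuming PyTorch, is given below: a GRU summarizes the recent resource-demand history of a task, and its forecast is concatenated with the current state to form the actor's observation. The feature dimensions, window length, and output head are illustrative assumptions.

    import torch
    import torch.nn as nn

    class ResourcePredictor(nn.Module):
        """GRU that forecasts the next resource demand from a history window."""
        def __init__(self, feat_dim=3, hidden=32):
            super().__init__()
            self.gru = nn.GRU(input_size=feat_dim, hidden_size=hidden, batch_first=True)
            self.head = nn.Linear(hidden, feat_dim)   # predicted next resource demand
        def forward(self, history):                   # history: (batch, window, feat_dim)
            _, h = self.gru(history)
            return self.head(h[-1])

    predictor = ResourcePredictor()
    history = torch.randn(8, 10, 3)         # 8 tasks, 10 past steps, 3 resource features
    current_state = torch.randn(8, 6)       # illustrative current environment state
    actor_input = torch.cat([current_state, predictor(history)], dim=-1)
    print(actor_input.shape)                # torch.Size([8, 9]): state plus predicted demand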
APA, Harvard, Vancouver, ISO, and other styles
47

Lei, Wenxin, Hong Wen, Jinsong Wu, and Wenjing Hou. "MADDPG-Based Security Situational Awareness for Smart Grid with Intelligent Edge". Applied Sciences 11, no. 7 (31 March 2021): 3101. http://dx.doi.org/10.3390/app11073101.

Full text
Abstract:
Advanced communication and information technologies enable smart grids to be more intelligent and automated, although many security issues are emerging. Security situational awareness (SSA) has been envisioned as a potential approach to provide safe services for power systems’ operation. However, in the power cloud master station mode, massive heterogeneous power terminals make SSA complicated, and failure information cannot be delivered promptly. Moreover, the dynamic and continuous situational space also increases the challenges of SSA. By taking advantage of edge intelligence, this paper introduces edge computing between terminals and the cloud to address the drawbacks of the traditional power cloud paradigm. On this basis, a deep reinforcement learning algorithm based on the multi-agent deep deterministic policy gradient (MADDPG) under the edge computing paradigm is proposed. The smart grid’s SSA is analyzed by minimizing the processing cost under the premise of a minimum detection error rate. Performance evaluations show that the algorithm under this paradigm achieves faster convergence and the optimal goal, namely the provision of real-time protection for smart grids.
APA, Harvard, Vancouver, ISO, and other styles
48

Zhu, Zhidong, Xiaoying Deng, Jian Dong, Cheng Feng, and Xiongjun Fu. "AK-MADDPG-Based Antijamming Strategy Design Method for Frequency Agile Radar". Sensors 24, no. 11 (27 May 2024): 3445. http://dx.doi.org/10.3390/s24113445.

Full text
Abstract:
Frequency agility refers to the rapid variation of the carrier frequency of adjacent pulses, which is an effective radar active antijamming method against frequency spot jamming. Variation patterns of traditional pseudo-random frequency hopping methods are susceptible to analysis and decryption, rendering them ineffective against increasingly sophisticated jamming strategies. Although existing reinforcement learning-based methods can adaptively optimize frequency hopping strategies, they are limited in adapting to the diversity and dynamics of jamming strategies, resulting in poor performance in the face of complex unknown jamming strategies. This paper proposes an AK-MADDPG (Adaptive K-th order history-based Multi-Agent Deep Deterministic Policy Gradient) method for designing frequency hopping strategies in frequency agile radar. Signal pulses within a coherent processing interval are treated as agents, learning to optimize their hopping strategies in the case of unknown jamming strategies. Agents dynamically adjust their carrier frequencies to evade jamming and collaborate with others to enhance antijamming efficacy. This approach exploits cooperative relationships among the pulses, providing additional information for optimized frequency hopping strategies. In addition, an adaptive K-th order history method has been introduced into the algorithm to capture long-term dependencies in sequential data. Simulation results demonstrate the superior performance of the proposed method.
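The K-th order history mechanism this abstract builds on can be sketched as a simple observation stacker: each pulse agent's input is its last K (observation, action) pairs. How K is adapted online is specific to the paper and is not reproduced here; the dimensions below are illustrative.

    from collections import deque
    import numpy as np

    class KHistory:
        """Concatenates the last K (observation, action) pairs into one input vector."""
        def __init__(self, k, obs_dim, act_dim):
            self.k = k
            self.buf = deque([np.zeros(obs_dim + act_dim)] * k, maxlen=k)

        def push(self, obs, act):
            self.buf.append(np.concatenate([obs, act]))

        def state(self):
            return np.concatenate(self.buf)   # shape: k * (obs_dim + act_dim)

    hist = KHistory(k=4, obs_dim=3, act_dim=1)
    hist.push(np.array([0.1, 0.2, 0.3]), np.array([5.0]))   # e.g. the chosen carrier-frequency index
    print(hist.state().shape)                               # (16,)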
APA, Harvard, Vancouver, ISO, and other styles
49

Lu, Junsong, Zongsheng Wang, Kang Pan, and Hanshuo Zhang. "Research on the influence of multi-agent deep deterministic policy gradient algorithm key parameters in typical scenarios". Journal of Physics: Conference Series 2858, no. 1 (1 October 2024): 012037. http://dx.doi.org/10.1088/1742-6596/2858/1/012037.

Full text
Abstract:
The MADDPG algorithm is widely used and relatively mature, but there is little intuitive data to support the values chosen for some of its key parameters. Therefore, this paper studies how key parameters influence MADDPG in typical scenarios. First, three typical experimental scenarios were identified (collaborative cooperation, collaborative opposition, and collaborative pursuit), together with their basic parameters and hyperparameters. Then, a research plan was formulated using the control variable method to study the influence of the learning rate, reward discount coefficient, and reward function coefficient on algorithm performance. Based on a large number of experimental comparisons, the optimal value of each parameter was obtained for the three scenarios. The results showed that the optimal reward discount coefficient was the same across all three scenarios, indicating that it is relatively insensitive to scene complexity. For the learning rate, the general trend was that the lower-complexity collaborative cooperation scenario had a lower optimal value than the higher-complexity collaborative pursuit and collaborative opposition scenarios. As for the reward coefficient, when it was large in the collaborative cooperation and collaborative opposition scenarios, the convergence and speed of the reward curve deteriorated, whereas in the collaborative pursuit scenario the reward coefficient had a less significant impact on algorithm performance.
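The control-variable protocol described above amounts to a one-factor-at-a-time sweep; a minimal sketch follows. The baseline values, candidate grids, and scenario names are placeholders, and run_experiment stands in for a full MADDPG training run.

    # One-factor-at-a-time sweep: vary a single hyperparameter, keep the rest at the baseline.
    baseline = {"lr": 1e-3, "gamma": 0.95, "reward_coef": 1.0}
    grids = {"lr": [1e-4, 5e-4, 1e-3, 5e-3],
             "gamma": [0.90, 0.95, 0.99],
             "reward_coef": [0.5, 1.0, 2.0, 5.0]}

    def run_experiment(params, scenario):
        # Placeholder: train MADDPG in the given scenario and return the converged average reward.
        return 0.0

    results = {}
    for scenario in ["cooperation", "opposition", "pursuit"]:
        for name, values in grids.items():
            for v in values:
                params = dict(baseline, **{name: v})   # vary one factor, fix the others
                results[(scenario, name, v)] = run_experiment(params, scenario)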
APA, Harvard, Vancouver, ISO, and other styles
50

Cai, He, Xingsheng Li, Yibo Zhang, and Huanli Gao. "Interception of a Single Intruding Unmanned Aerial Vehicle by Multiple Missiles Using the Novel EA-MADDPG Training Algorithm". Drones 8, no. 10 (26 September 2024): 524. http://dx.doi.org/10.3390/drones8100524.

Full text
Abstract:
This paper proposes an improved multi-agent deep deterministic policy gradient algorithm called the equal-reward and action-enhanced multi-agent deep deterministic policy gradient (EA-MADDPG) algorithm to solve the guidance problem of multiple missiles cooperating to intercept a single intruding UAV in three-dimensional space. The key innovations of EA-MADDPG include the implementation of the action filter with additional reward functions, optimal replay buffer, and equal reward setting. The additional reward functions and the action filter are set to enhance the exploration performance of the missiles during training. The optimal replay buffer and the equal reward setting are implemented to improve the utilization efficiency of exploration experiences obtained through the action filter. In order to prevent over-learning from certain experiences, a special storage mechanism is established, where experiences obtained through the action filter are stored only in the optimal replay buffer, while normal experiences are stored in both the optimal replay buffer and normal replay buffer. Meanwhile, we gradually reduce the selection probability of the action filter and the sampling ratio of the optimal replay buffer. Finally, comparative experiments show that the algorithm enhances the agents’ exploration capabilities, allowing them to learn policies more quickly and stably, which enables multiple missiles to complete the interception task more rapidly and with a higher success rate.
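The replay bookkeeping described in this abstract can be sketched as below: experiences produced through the action filter are stored only in the optimal buffer, ordinary experiences are stored in both buffers, and both the probability of invoking the action filter and the optimal-buffer share of each sampled batch are annealed over training. The decay schedules and the initial split are illustrative assumptions.

    import random

    optimal_buffer, normal_buffer = [], []

    def store(transition, from_action_filter):
        if from_action_filter:
            optimal_buffer.append(transition)    # filtered experience: optimal buffer only
        else:
            optimal_buffer.append(transition)    # ordinary experience: stored in both buffers
            normal_buffer.append(transition)

    def sample(batch_size, episode, total_episodes=5000):
        # Share of the batch drawn from the optimal buffer decays as training proceeds.
        ratio = max(0.5 * (1 - episode / total_episodes), 0.0)
        n_opt = min(int(batch_size * ratio), len(optimal_buffer))
        batch = random.sample(optimal_buffer, n_opt)
        batch += random.sample(normal_buffer, min(batch_size - n_opt, len(normal_buffer)))
        return batch

    def use_action_filter(episode, p0=0.8, total_episodes=5000):
        # Probability of routing exploration through the action filter, annealed toward zero.
        return random.random() < p0 * (1 - episode / total_episodes)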
APA, Harvard, Vancouver, ISO, and other styles