A selection of academic literature on the topic "Multi-Objective Reinforcement Learning"
Browse lists of current articles, books, dissertations, conference abstracts, and other academic sources on the topic "Multi-Objective Reinforcement Learning".
Journal articles on the topic "Multi-Objective Reinforcement Learning"
Horie, Naoto, Tohgoroh Matsui, Koichi Moriyama, Atsuko Mutoh, and Nobuhiro Inuzuka. "Multi-objective safe reinforcement learning: the relationship between multi-objective reinforcement learning and safe reinforcement learning." Artificial Life and Robotics 24, no. 3 (February 8, 2019): 352–59. http://dx.doi.org/10.1007/s10015-019-00523-3.
Kim, Man-Je, Hyunsoo Park, and Chang Wook Ahn. "Nondominated Policy-Guided Learning in Multi-Objective Reinforcement Learning." Electronics 11, no. 7 (March 28, 2022): 1069. http://dx.doi.org/10.3390/electronics11071069.
Drugan, Madalina, Marco Wiering, Peter Vamplew, and Madhu Chetty. "Special issue on multi-objective reinforcement learning." Neurocomputing 263 (November 2017): 1–2. http://dx.doi.org/10.1016/j.neucom.2017.06.020.
Perez, Julien, Cécile Germain-Renaud, Balazs Kégl, and Charles Loomis. "Multi-objective Reinforcement Learning for Responsive Grids." Journal of Grid Computing 8, no. 3 (June 8, 2010): 473–92. http://dx.doi.org/10.1007/s10723-010-9161-0.
Nguyen, Thanh Thi, Ngoc Duy Nguyen, Peter Vamplew, Saeid Nahavandi, Richard Dazeley, and Chee Peng Lim. "A multi-objective deep reinforcement learning framework." Engineering Applications of Artificial Intelligence 96 (November 2020): 103915. http://dx.doi.org/10.1016/j.engappai.2020.103915.
García, Javier, Rubén Majadas, and Fernando Fernández. "Learning adversarial attack policies through multi-objective reinforcement learning." Engineering Applications of Artificial Intelligence 96 (November 2020): 104021. http://dx.doi.org/10.1016/j.engappai.2020.104021.
Yamamoto, Hiroyuki, Tomohiro Hayashida, Ichiro Nishizaki, and Shinya Sekizaki. "Hypervolume-Based Multi-Objective Reinforcement Learning: Interactive Approach." Advances in Science, Technology and Engineering Systems Journal 4, no. 1 (2019): 93–100. http://dx.doi.org/10.25046/aj040110.
García, Javier, Roberto Iglesias, Miguel A. Rodríguez, and Carlos V. Regueiro. "Incremental reinforcement learning for multi-objective robotic tasks." Knowledge and Information Systems 51, no. 3 (September 22, 2016): 911–40. http://dx.doi.org/10.1007/s10115-016-0992-2.
Schneider, Stefan, Ramin Khalili, Adnan Manzoor, Haydar Qarawlus, Rafael Schellenberg, Holger Karl, and Artur Hecker. "Self-Learning Multi-Objective Service Coordination Using Deep Reinforcement Learning." IEEE Transactions on Network and Service Management 18, no. 3 (September 2021): 3829–42. http://dx.doi.org/10.1109/tnsm.2021.3076503.
Ferreira, Leonardo Anjoletto, Carlos Henrique Costa Ribeiro, and Reinaldo Augusto da Costa Bianchi. "Heuristically accelerated reinforcement learning modularization for multi-agent multi-objective problems." Applied Intelligence 41, no. 2 (May 1, 2014): 551–62. http://dx.doi.org/10.1007/s10489-014-0534-0.
Повний текст джерелаДисертації з теми "Multi-Objective Reinforcement Learning"
Pinder, J. M. "Multi-objective reinforcement learning framework for unknown stochastic & uncertain environments." Thesis, University of Salford, 2016. http://usir.salford.ac.uk/39978/.
Wang, Weijia. "Multi-objective sequential decision making." PhD thesis, Université Paris Sud - Paris XI, 2014. http://tel.archives-ouvertes.fr/tel-01057079.
Bouzid, Salah Eddine. "Optimisation multicritères des performances de réseau d’objets communicants par méta-heuristiques hybrides et apprentissage par renforcement." Thesis, Le Mans, 2020. http://cyberdoc-int.univ-lemans.fr/Theses/2020/2020LEMA1026.pdf.
Повний текст джерелаThe deployment of Communicating Things Networks (CTNs), with continuously increasing densities, needs to be optimal in terms of quality of service, energy consumption and lifetime. Determining the optimal placement of the nodes of these networks, relative to the different quality criteria, is an NP-Hard problem. Faced to this NP-Hardness, especially for indoor environments, existing approaches focus on the optimization of one single objective while neglecting the other criteria, or adopt an expensive manual solution. Finding new approaches to solve this problem is required. Accordingly, in this thesis, we propose a new approach which automatically generates the deployment that guarantees optimality in terms of performance and robustness related to possible topological failures and instabilities. The proposed approach is based, on the first hand, on the modeling of the deployment problem as a multi-objective optimization problem under constraints, and its resolution using a hybrid algorithm combining genetic multi-objective optimization with weighted sum optimization and on the other hand, the integration of reinforcement learning to guarantee the optimization of energy consumption and the extending the network lifetime. To apply this approach, two tools are developed. A first called MOONGA (Multi-Objective Optimization of wireless Network approach based on Genetic Algorithm) which automatically generates the placement of nodes while optimizing the metrics that define the QoS of the CTN: connectivity, m-connectivity, coverage, k-coverage, coverage redundancy and cost. MOONGA tool considers constraints related to the architecture of the deployment space, the network topology, the specifies of the application and the preferences of the network designer. 
The second optimization tool is named R2LTO (Reinforcement Learning for Life-Time Optimization), which is a new routing protocol for CTNs, based on distributed reinforcement learning that allows to determine the optimal rooting path in order to guarantee energy-efficiency and to extend the network lifetime while maintaining the required QoS
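To illustrate the kind of distributed reinforcement learning used in energy-aware routing protocols of this sort, the following is a minimal Python sketch of Q-routing with an energy cost. The class name, reward shape and all parameters are illustrative assumptions, not the thesis implementation.

```python
import random

# Hypothetical sketch of distributed, energy-aware Q-routing in the spirit
# of an R2LTO-style protocol. Each node keeps a cost estimate per neighbor.
class EnergyAwareQRouter:
    def __init__(self, neighbors, alpha=0.5, gamma=0.9):
        # Estimated cost of reaching the sink via each neighbor.
        self.q = {n: 0.0 for n in neighbors}
        self.alpha = alpha  # learning rate
        self.gamma = gamma  # discount factor

    def choose_next_hop(self, epsilon=0.1):
        # Epsilon-greedy: usually forward to the lowest-cost neighbor.
        if random.random() < epsilon:
            return random.choice(list(self.q))
        return min(self.q, key=self.q.get)

    def update(self, neighbor, link_energy_cost, neighbor_best_cost):
        # Q-routing backup: local transmission energy plus the neighbor's
        # own best estimated cost toward the sink.
        target = link_energy_cost + self.gamma * neighbor_best_cost
        self.q[neighbor] += self.alpha * (target - self.q[neighbor])

# Example: after one update each, neighbor "B" looks cheaper than "C".
router = EnergyAwareQRouter(["B", "C"])
router.update("B", link_energy_cost=1.0, neighbor_best_cost=2.0)
router.update("C", link_energy_cost=3.0, neighbor_best_cost=2.0)
assert router.choose_next_hop(epsilon=0.0) == "B"
```

Because each node only needs its neighbors' cost estimates, such an update rule can run fully distributed, which is what makes it attractive for network-lifetime optimization.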
Ho, Dinh Khanh. "Gestion des ressources et de l’énergie orientée qualité de service pour les systèmes robotiques mobiles autonomes." Thesis, Université Côte d'Azur, 2020. http://www.theses.fr/2020COAZ4000.
Повний текст джерелаMobile robotic systems are becoming more and more complex with the integration of advanced sensing and acting components and functionalities to perform the real required missions. For these technical systems, the requirements are divided into two categories: functional and non-functional requirements. While functional requirements represent what the robot must do to accomplish the mission, non-functional requirements represent how the robot performs the mission. Thus, the quality of service and energy efficiency of a robotic mission are classified in this category. The autonomy of these systems is fully achieved when both functional and non-functional requirements are guaranteed without any human intervention or any external control. However, these mobile systems are naturally confronted with resource availability and energy capacity constraints, particularly in the context of long-term missions, these constraints become more critical. In addition, the performance of these systems is also influenced by unexpected and unstructured environmental conditions in which they interact. The management of resources and energy during operation is therefore a challenge for autonomous mobile robots in order to guarantee the desired performance objectives while respecting constraints. In this context, the ability of the robotic system to become aware of its own internal behaviors and physical environment and to adapt to these dynamic circumstances becomes important.This thesis focuses on the quality of service and energy efficiency of mobile robotic systems and proposes a hierarchical run-time management in order to guarantee these non-functional objectives of each robotic mission. 
At the local management level of each robotic mission, a Mission Manager employs a reinforcement learning-based decision-making mechanism to automatically reconfigure certain key mission-specific parameters to minimize the level of violation of required performance and energy objectives. At the global management level of the whole system, a Multi-Mission Manager leveraged rule-based decision-making and case-based reasoning techniques monitors the system's resources and the responses of Mission Managers in order to decide to reallocate the energy budget, regulate the quality of service and trigger the online learning for each robotic mission.The proposed methodology has been successfully prototyped and validated in a simulation environment and the run-time management framework is also integrated into our real mobile robotic system based on a Pioneer-3DX mobile base equipped with an embedded NVIDIA Jetson Xavier platform
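The local decision-making loop described above can be sketched as tabular Q-learning over a small set of candidate parameter configurations, with a reward that penalizes objective violations. This is an illustrative sketch under assumed names and parameters, not the thesis code.

```python
import random
from collections import defaultdict

# Illustrative Mission Manager sketch: tabular Q-learning picks a mission
# parameter configuration; the reward penalizes violation of the
# performance and energy objectives. All names and values are assumptions.
class MissionManager:
    def __init__(self, configs, alpha=0.3, gamma=0.9, epsilon=0.1):
        self.configs = configs                 # candidate reconfigurations
        self.q = defaultdict(float)            # Q[(state, config)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def act(self, state):
        # Epsilon-greedy choice among configurations.
        if random.random() < self.epsilon:
            return random.choice(self.configs)
        return max(self.configs, key=lambda c: self.q[(state, c)])

    def learn(self, state, config, perf_violation, energy_violation, next_state):
        # Reward is the negated total level of objective violation.
        reward = -(perf_violation + energy_violation)
        best_next = max(self.q[(next_state, c)] for c in self.configs)
        td_target = reward + self.gamma * best_next
        self.q[(state, config)] += self.alpha * (td_target - self.q[(state, config)])

mgr = MissionManager(["low_power", "high_qos"])
mgr.learn("nominal", "high_qos", perf_violation=0.0, energy_violation=0.5,
          next_state="nominal")
mgr.learn("nominal", "low_power", perf_violation=0.2, energy_violation=0.0,
          next_state="nominal")
```

After these two updates the manager prefers the configuration with the smaller accumulated violation, which is the behavior a run-time reconfiguration loop needs.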
Pereira, Tiago Oliveira. "Multi-Objective Deep Reinforcement Learning in Drug Discovery." Master's thesis, 2020. http://hdl.handle.net/10316/92570.
Повний текст джерелаO longo período de tempo, os enormes custos financeiros inerentes à introdução de um novo medicamento no mercado e a incerteza em relação à possibilidade de este vir a ser ou não aceite pelas autoridades responsáveis são claros obstáculos ao desenvolvimento de novos fármacos. A aplicação de técnicas de aprendizagem profunda em fases precoces do processo de descoberta de fármacos pode contribuir para facilitar a identificação de potenciais fármacos com propriedades biológicas promissoras. Nesse sentido, ao utilizar métodos computacionais, é possível reduzir o enorme espaço de pesquisa de possíveis fármacos e minimizar os problemas inerentes às fases subsequentes do processo. Não obstante, a maioria dos estudos que aplicam estas técnicas têm-se focado na otimização de apenas uma propriedade específica das moléculas, o que é insuficiente para o desenvolvimento de fármacos, uma vez que este é um problema que requer uma solução mais abrangente.Este trabalho propõe uma estratégia para a geração orientada de moléculas com o intuito de otimizar propriedades biológicas e físico-químicas. O propósito é gerar um conjunto promissor de moléculas que consigam desempenhar a função biológica desejada e ter efeitos inócuos para o organismo, para posteriormente ser investigada a possibilidade de encontrar possíveis fármacos. O modelo gerador computacional foi conseguido através da implementação de uma rede neuronal recorrente, por sua vez, contendo células de memória de longa duração. Este modelo foi treinado para aprender as regras fundamentais de construção de moléculas através de SMILES. O modelo gerador é depois treinado novamente através de aprendizagem por reforço para produzir moléculas com propriedades previamente determinadas. Para avaliar as novas moléculas geradas, é implementado um modelo regressivo que relaciona matematicamente a estrutura das moléculas com a sua atividade biológica em estudo. 
A novidade introduzida neste trabalho é a estratégia exploratória que garante, durante o processo de treino, um compromisso entre a necessidade de descobrir todo o espaço químico mais detalhadamente e a necessidade de utilizar a informação previamente aprendida para a construção de moléculas que otimizem a propriedade em estudo. Para demonstrar a eficácia deste método, o modelo gerador foi modificado para abordar objetivos individuais como, por exemplo, a afinidade da ligação entre o fármaco-recetor, e a estimativa quantitativa de um conjunto de propriedades típicas de fármacos. Os resultados demonstram a versatilidade do modelo uma vez que este garante a otimização de diferentes propriedades, mantendo as percentagens de diversidade e validade química nas moléculas geradas a níveis aceitáveis. Para além disso, o modelo gerador foi posteriormente melhorado através do seu alargamento à otimização simultânea de mais do que uma propriedade. Para fazer isso, foram exploradas diversas técnicas para implementar a otimização multiobjectivo com o intuito de aumentar a aplicabilidade dos novos potenciais fármacos através da otimização das suas propriedades físicas, químicas e biológicas. No contexto de aprendizagem por reforço, a abordagem geral foi combinar diferentes recompensas num único valor de recompensa. Neste sentido, foram aplicados diferentes métodos de escalarização para obter uma única recompensa que ponderasse os diferentes objetivos. Os resultados mostram que é possível encontrar moléculas que satisfaçam ambas as propriedades e, simultaneamente, com percentagens de validade a rondar os 90\%.
The long development time, the enormous financial cost of bringing a new drug to market, and the uncertainty about whether it will be accepted by the responsible authorities are clear obstacles to the development of new drugs. Applying deep learning techniques in the early stages of the drug discovery process can facilitate the identification of drug candidates with promising biological properties. By employing computational methods, it is possible to reduce the enormous search space of drug-like compounds and minimize the issues inherent in the subsequent stages of the process. Nevertheless, most studies that employ these techniques focus on optimizing a single molecular property, which is insufficient for drug development, since this is a problem that requires a more far-reaching solution. This work proposes a framework for the targeted generation of molecules designed to optimize biological and physicochemical properties. The purpose is to create a promising set of molecules that can perform the desired biological function and have harmless effects on the organism, to be further investigated as candidate drugs. The generative model was implemented as a recurrent neural network containing long short-term memory (LSTM) cells. This model was trained to learn the building rules of valid molecules represented as SMILES strings. The generator model is then re-trained through reinforcement learning to produce molecules with bespoke properties. To evaluate the newly generated molecules, a structure-activity relationship model is implemented that maps the molecular structure to the desired biological property. The novelty of this approach is the exploratory strategy that ensures, throughout the training process, a compromise between the need to explore the entire chemical space in more detail and the need to exploit the already learned information when constructing molecules that optimize the property under study.
To demonstrate the effectiveness of the method, the generator model was biased to address single objectives, such as drug-target binding affinity or the quantitative estimate of drug-likeness. The results show the versatility of the proposed model, since it guaranteed the optimization of different properties while keeping the diversity and validity of the generated molecules at acceptable levels. Furthermore, we improved the generative model by extending the framework to optimize more than one objective simultaneously. To do so, different techniques for multi-objective optimization were explored, with the goal of increasing the applicability of new potential drugs through the optimization of their physical, chemical and biological properties. Our general approach combines the different rewards into a single reward: different scalarization methods were applied to obtain a single reward that weights the different objectives. The results demonstrate that it is possible to find molecules that satisfy both proposed objectives and, simultaneously, achieve synthesizability rates of approximately 90%.
Other: This research was funded by the Portuguese research agency FCT, through D4 - Deep Drug Discovery and Deployment (CENTRO-01-0145-FEDER-029266). This work is funded by national funds through the FCT - Foundation for Science and Technology, I.P., within the scope of the project CISUC - UID/CEC/00326/2020, and by the European Social Fund through the Regional Operational Program Centro 2020.
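The scalarization step mentioned in the abstract above, combining several reward signals into one scalar for the reinforcement learning agent, can be sketched in a few lines. The weights, utopia point and example reward values below are illustrative assumptions, not values from the thesis.

```python
# Minimal sketch of two common scalarization schemes for collapsing a
# reward vector into a single scalar reward. Weights and utopia point
# are illustrative assumptions.

def weighted_sum(rewards, weights):
    # Linear scalarization: simple, but cannot reach solutions lying on
    # non-convex regions of the Pareto front.
    return sum(w * r for w, r in zip(weights, rewards))

def chebyshev(rewards, weights, utopia):
    # Chebyshev scalarization: penalizes the largest weighted distance to
    # a utopia (ideal) point; negated so that larger is better.
    return -max(w * abs(u - r) for w, r, u in zip(weights, rewards, utopia))

# E.g. combining a binding-affinity reward with a drug-likeness reward:
rewards = (0.8, 0.6)
linear = weighted_sum(rewards, (0.5, 0.5))          # ~0.7
cheb = chebyshev(rewards, (0.5, 0.5), (1.0, 1.0))   # ~-0.2
```

The choice of scalarization matters: a weighted sum is easy to tune but misses non-convex trade-offs, while Chebyshev-style scalarization can reach them at the cost of a reference point to maintain.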
Hasan, Md Mahmudul. "An Intelligent Decision-making Scheme in a Dynamic Multi-objective Environment using Deep Reinforcement Learning." Thesis, 2020. https://arro.anglia.ac.uk/id/eprint/705890/1/Hasan_2020.pdf.
Book chapters on the topic "Multi-Objective Reinforcement Learning"
Van Moffaert, Kristof, Madalina M. Drugan, and Ann Nowé. "Hypervolume-Based Multi-Objective Reinforcement Learning." In Lecture Notes in Computer Science, 352–66. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-37140-0_28.
Moustafa, Ahmed, and Minjie Zhang. "Multi-Objective Service Composition Using Reinforcement Learning." In Service-Oriented Computing, 298–312. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-45005-1_21.
Méndez-Hernández, Beatriz M., Erick D. Rodríguez-Bazan, Yailen Martinez-Jimenez, Pieter Libin, and Ann Nowé. "A Multi-objective Reinforcement Learning Algorithm for JSSP." In Artificial Neural Networks and Machine Learning – ICANN 2019: Theoretical Neural Computation, 567–84. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-30487-4_44.
Videau, Mathurin, Alessandro Leite, Olivier Teytaud, and Marc Schoenauer. "Multi-objective Genetic Programming for Explainable Reinforcement Learning." In Lecture Notes in Computer Science, 278–93. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-02056-8_18.
Xu, Jiangjiao, Ke Li, and Mohammad Abusara. "Multi-objective Reinforcement Learning Based Multi-microgrid System Optimisation Problem." In Lecture Notes in Computer Science, 684–96. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-72062-9_54.
Yamaguchi, Tomohiro, Shota Nagahama, Yoshihiro Ichikawa, and Keiki Takadama. "Model-Based Multi-objective Reinforcement Learning with Unknown Weights." In Human Interface and the Management of Information. Information in Intelligent Systems, 311–21. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-22649-7_25.
Yu, Yemin, Kun Kuang, Jiangchao Yang, Zeke Wang, Kunyang Jia, Weiming Lu, Hongxia Yang, and Fei Wu. "Multi-objective Meta-return Reinforcement Learning for Sequential Recommendation." In Artificial Intelligence, 95–111. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-20500-2_8.
Iwamura, Koji, and Nobuhiro Sugimura. "Distributed Real-Time Scheduling by Using Multi-agent Reinforcement Learning." In Multi-objective Evolutionary Optimisation for Product Design and Manufacturing, 325–42. London: Springer London, 2011. http://dx.doi.org/10.1007/978-0-85729-652-8_11.
Yan, Jiaxin, Hua Wang, Xiaole Li, Shanwen Yi, and Yao Qin. "Multi-objective Disaster Backup in Inter-datacenter Using Reinforcement Learning." In Wireless Algorithms, Systems, and Applications, 590–601. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-59016-1_49.
Liu, Jun, Yi Zhou, Yimin Qiu, and Zhongfeng Li. "An Improved Multi-objective Optimization Algorithm Based on Reinforcement Learning." In Lecture Notes in Computer Science, 501–13. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-09677-8_42.
Повний текст джерелаТези доповідей конференцій з теми "Multi-Objective Reinforcement Learning"
Skalse, Joar, Lewis Hammond, Charlie Griffin, and Alessandro Abate. "Lexicographic Multi-Objective Reinforcement Learning." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/476.
Chen, Xi, Ali Ghadirzadeh, Marten Bjorkman, and Patric Jensfelt. "Meta-Learning for Multi-objective Reinforcement Learning." In 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019. http://dx.doi.org/10.1109/iros40897.2019.8968092.
Wiering, Marco A., Maikel Withagen, and Madalina M. Drugan. "Model-based multi-objective reinforcement learning." In 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). IEEE, 2014. http://dx.doi.org/10.1109/adprl.2014.7010622.
Liao, H. L., and Q. H. Wu. "Multi-objective optimisation by reinforcement learning." In 2010 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2010. http://dx.doi.org/10.1109/cec.2010.5585972.
Ferreira, Leonardo A., Reinaldo A. C. Bianchi, and Carlos H. C. Ribeiro. "Multi-agent Multi-objective Learning Using Heuristically Accelerated Reinforcement Learning." In 2012 Brazilian Robotics Symposium and Latin American Robotics Symposium (SBR-LARS). IEEE, 2012. http://dx.doi.org/10.1109/sbr-lars.2012.10.
Yahyaa, Saba Q., Madalina M. Drugan, and Bernard Manderick. "Annealing-pareto multi-objective multi-armed bandit algorithm." In 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). IEEE, 2014. http://dx.doi.org/10.1109/adprl.2014.7010619.
Ravichandran, Naresh Balaji, Fangkai Yang, Christopher Peters, Anders Lansner, and Pawel Herman. "Pedestrian simulation as multi-objective reinforcement learning." In IVA '18: International Conference on Intelligent Virtual Agents. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3267851.3267914.
Van Moffaert, Kristof, Tim Brys, and Ann Nowe. "Risk-sensitivity through multi-objective reinforcement learning." In 2015 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2015. http://dx.doi.org/10.1109/cec.2015.7257098.
Van Moffaert, Kristof, Madalina M. Drugan, and Ann Nowe. "Scalarized multi-objective reinforcement learning: Novel design techniques." In 2013 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). IEEE, 2013. http://dx.doi.org/10.1109/adprl.2013.6615007.
Liu, Fei-Yu, and Chao Qian. "Prediction Guided Meta-Learning for Multi-Objective Reinforcement Learning." In 2021 IEEE Congress on Evolutionary Computation (CEC). IEEE, 2021. http://dx.doi.org/10.1109/cec45853.2021.9504972.