Journal articles on the topic 'Multi-Objective Markov Decision Processes'

To see the other types of publications on this topic, follow the link: Multi-Objective Markov Decision Processes.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Multi-Objective Markov Decision Processes.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Wakuta, K., and K. Togawa. "Solution procedures for multi-objective Markov decision processes." Optimization 43, no. 1 (January 1998): 29–46. http://dx.doi.org/10.1080/02331939808844372.

2

Ahn, Hyun-Soo, and Rhonda Righter. "Multi-Actor Markov Decision Processes." Journal of Applied Probability 42, no. 1 (March 2005): 15–26. http://dx.doi.org/10.1239/jap/1110381367.

Abstract:
We give a very general reformulation of multi-actor Markov decision processes and show that there is a tendency for the actors to take the same action whenever possible. This considerably reduces the complexity of the problem, either facilitating numerical computation of the optimal policy or providing a basis for a heuristic.
3

Ahn, Hyun-Soo, and Rhonda Righter. "Multi-Actor Markov Decision Processes." Journal of Applied Probability 42, no. 01 (March 2005): 15–26. http://dx.doi.org/10.1017/s0021900200000024.

Abstract:
We give a very general reformulation of multi-actor Markov decision processes and show that there is a tendency for the actors to take the same action whenever possible. This considerably reduces the complexity of the problem, either facilitating numerical computation of the optimal policy or providing a basis for a heuristic.
4

LIU, QIU-SHENG, KATSUHISA OHNO, and HIROTAKA NAKAYAMA. "Multi-objective discounted Markov decision processes with expectation and variance criteria." International Journal of Systems Science 23, no. 6 (June 1992): 903–14. http://dx.doi.org/10.1080/00207729208949257.

5

Fujita, Toshiharu, and Akifumi Kira. "Mutually Dependent Markov Decision Processes." Journal of Advanced Computational Intelligence and Intelligent Informatics 18, no. 6 (November 20, 2014): 992–98. http://dx.doi.org/10.20965/jaciii.2014.p0992.

Abstract:
In this paper, we introduce a basic framework for mutually dependent Markov decision processes (MDMDP) showing recursive mutual dependence. Our model is structured upon two types of finite-stage Markov decision processes. At each stage, the reward in one process is given by the optimal value of the alternative process problem, whose initial state is determined by the current state and decision in the original process. We formulate the MDMDP model and derive mutually dependent recursive equations by dynamic programming. Furthermore, MDMDP is illustrated in a numerical example. The model enables easier treatment of some classes of complex multi-stage decision processes.
6

Mandow, L., J. L. Perez-de-la-Cruz, and N. Pozas. "Multi-objective dynamic programming with limited precision." Journal of Global Optimization 82, no. 3 (November 2, 2021): 595–614. http://dx.doi.org/10.1007/s10898-021-01096-x.

Abstract:
This paper addresses the problem of approximating the set of all solutions for Multi-objective Markov Decision Processes. We show that in the vast majority of interesting cases, the number of solutions is exponential or even infinite. In order to overcome this difficulty we propose to approximate the set of all solutions by means of a limited precision approach based on White's multi-objective value-iteration dynamic programming algorithm. We prove that the number of calculated solutions is tractable and show experimentally that the solutions obtained are a good approximation of the true Pareto front.
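The abstract above centres on value iteration over sets of value vectors with Pareto pruning at limited precision. As a rough, self-contained illustration only (not the authors' algorithm or code), the following Python sketch runs multi-objective value iteration on a made-up two-state, two-objective MOMDP with deterministic transitions, rounding value vectors before pruning so the fronts stay finite, loosely echoing the limited-precision idea. All model parameters are invented.

import numpy as np

def pareto_prune(vectors, decimals=1):
    # Round value vectors to limited precision, then drop Pareto-dominated ones.
    pts = np.unique(np.round(np.asarray(vectors, dtype=float), decimals), axis=0)
    return [p for p in pts
            if not np.any(np.all(pts >= p, axis=1) & np.any(pts > p, axis=1))]

# Toy deterministic MOMDP (all numbers invented): two states, two actions,
# two reward objectives; next_state[s][a] and reward[s][a] define the model.
next_state = {0: {"a": 1, "b": 1}, 1: {"a": 0, "b": 0}}
reward = {0: {"a": np.array([1.0, 0.0]), "b": np.array([0.0, 1.0])},
          1: {"a": np.array([2.0, 0.0]), "b": np.array([0.0, 2.0])}}
gamma = 0.9

# Multi-objective value iteration: V[s] is a set of non-dominated value vectors.
V = {s: [np.zeros(2)] for s in next_state}
for _ in range(40):
    V = {s: pareto_prune([reward[s][a] + gamma * v
                          for a in next_state[s]
                          for v in V[next_state[s][a]]])
         for s in next_state}

for s in V:
    print(f"state {s}: {len(V[s])} non-dominated value vectors")

With stochastic transitions the backup additionally has to combine the fronts of all successor states, which is where the exponential (or infinite) growth discussed in the paper becomes acute.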
7

Wernz, Christian. "Multi-time-scale Markov decision processes for organizational decision-making." EURO Journal on Decision Processes 1, no. 3-4 (November 2013): 299–324. http://dx.doi.org/10.1007/s40070-013-0020-7.

8

Bouyer, Patricia, Mauricio González, Nicolas Markey, and Mickael Randour. "Multi-weighted Markov Decision Processes with Reachability Objectives." Electronic Proceedings in Theoretical Computer Science 277 (September 7, 2018): 250–64. http://dx.doi.org/10.4204/eptcs.277.18.

9

White, D. J. "An Heuristic for Multi-Dimensional Markov Decision Processes." Journal of Information and Optimization Sciences 14, no. 2 (May 1993): 203–19. http://dx.doi.org/10.1080/02522667.1993.10699150.

10

Randour, Mickael, Jean-François Raskin, and Ocan Sankur. "Percentile queries in multi-dimensional Markov decision processes." Formal Methods in System Design 50, no. 2-3 (January 5, 2017): 207–48. http://dx.doi.org/10.1007/s10703-016-0262-7.

11

Pianosi, F., A. Castelletti, and M. Restelli. "Tree-based fitted Q-iteration for multi-objective Markov decision processes in water resource management." Journal of Hydroinformatics 15, no. 2 (January 2, 2013): 258–70. http://dx.doi.org/10.2166/hydro.2013.169.

Abstract:
Multi-objective Markov decision processes (MOMDPs) provide an effective modeling framework for decision-making problems involving water systems. The traditional approach is to define many single-objective problems (resulting from different combinations of the objectives), each solvable by standard optimization. This paper presents an approach based on reinforcement learning (RL) that can learn the operating policies for all combinations of objectives in a single training process. The key idea is to enlarge the approximation of the action-value function, which is performed by single-objective RL over the state-action space, to the space of the objectives' weights. The batch-mode nature of the algorithm allows for enriching the training dataset without further interaction with the controlled system. The approach is demonstrated on a numerical test case study and evaluated on a real-world application, the Hoa Binh reservoir, Vietnam. Experimental results on the test case show that the proposed approach (multi-objective fitted Q-iteration; MOFQI) becomes computationally preferable over the repeated application of its single-objective version (fitted Q-iteration; FQI) when evaluating more than five weight combinations. In the Hoa Binh case study, the operating policies computed with MOFQI and FQI have comparable efficiency, while MOFQI provides a continuous approximation of the Pareto frontier with no additional computing costs.
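As a hedged sketch of the weight-augmentation idea described above (not the authors' implementation), the Python fragment below extends the input of a tree-based regressor with the scalarization weights, so that a single batch-mode fitted Q-iteration covers a whole family of linear weightings of two objectives; the toy dynamics, rewards and hyperparameters are invented for illustration.

import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

rng = np.random.default_rng(0)
n_actions, gamma, n = 3, 0.95, 2000

# Hypothetical batch of transitions from a toy 1-d system with a 2-objective reward.
s = rng.uniform(-1.0, 1.0, size=(n, 1))
a = rng.integers(0, n_actions, size=n)
s_next = np.clip(s[:, 0] + 0.1 * (a - 1) + 0.05 * rng.normal(size=n), -1, 1)[:, None]
r = np.stack([-np.abs(s_next[:, 0] - 0.8),      # objective 1: be near +0.8
              -np.abs(s_next[:, 0] + 0.8)], 1)  # objective 2: be near -0.8

# Weight-augmented fitted Q-iteration: learn Q(s, a, w) so that one regressor
# covers every linear scalarization w . r of the two objectives.
w = rng.dirichlet(np.ones(2), size=n)            # a random weight attached to each sample
X = np.column_stack([s, a, w])
Q = None
for _ in range(20):                              # batch-mode Q-iteration sweeps
    if Q is None:
        target = np.sum(w * r, axis=1)
    else:
        next_q = np.stack([Q.predict(np.column_stack([s_next, np.full(n, b), w]))
                           for b in range(n_actions)], 1)
        target = np.sum(w * r, axis=1) + gamma * next_q.max(axis=1)
    Q = ExtraTreesRegressor(n_estimators=30, random_state=0).fit(X, target)

def greedy(state, weight):
    # Greedy action for any state and any preference weight, without retraining.
    q = [Q.predict(np.array([[state, b, *weight]]))[0] for b in range(n_actions)]
    return int(np.argmax(q))

print(greedy(0.0, (1.0, 0.0)), greedy(0.0, (0.0, 1.0)))

After training, greedy(state, weight) answers queries for any preference weight without re-running the iteration, which is the practical payoff the abstract highlights.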
12

Abdelfattah, Sherif, Kathryn Kasmarik, and Jiankun Hu. "A robust policy bootstrapping algorithm for multi-objective reinforcement learning in non-stationary environments." Adaptive Behavior 28, no. 4 (August 15, 2019): 273–92. http://dx.doi.org/10.1177/1059712319869313.

Abstract:
Multi-objective Markov decision processes are a special kind of multi-objective optimization problem that involves sequential decision making while satisfying the Markov property of stochastic processes. Multi-objective reinforcement learning methods address this kind of problem by fusing the reinforcement learning paradigm with multi-objective optimization techniques. One major drawback of these methods is the lack of adaptability to non-stationary dynamics in the environment. This is because they adopt optimization procedures that assume stationarity in order to evolve a coverage set of policies that can solve the problem. This article introduces a developmental optimization approach that can evolve the policy coverage set while exploring the preference space over the defined objectives in an online manner. We propose a novel multi-objective reinforcement learning algorithm that can robustly evolve a convex coverage set of policies in an online manner in non-stationary environments. We compare the proposed algorithm with two state-of-the-art multi-objective reinforcement learning algorithms in stationary and non-stationary environments. Results showed that the proposed algorithm significantly outperforms the existing algorithms in non-stationary environments while achieving comparable results in stationary environments.
13

Feinberg, Eugene A., and Aleksey B. Piunovskiy. "Multiple Objective Nonatomic Markov Decision Processes with Total Reward Criteria." Journal of Mathematical Analysis and Applications 247, no. 1 (July 2000): 45–66. http://dx.doi.org/10.1006/jmaa.2000.6819.

14

Luo, Fei, Bo Feng, and Huazhong Wang. "Automatic first-arrival picking method via intelligent Markov optimal decision processes." Journal of Geophysics and Engineering 18, no. 3 (June 2021): 406–17. http://dx.doi.org/10.1093/jge/gxab026.

Abstract:
Picking the first arrival is an important step in seismic processing. The large volume of seismic data calls for automatic and objective picking. In this paper, we formulate first-arrival picking as an intelligent Markov decision process in the multi-dimensional feature attribute space. By designing a reasonable model, global optimization is carried out in the reward function space to obtain the path with the largest cumulative reward value, so as to pick the first arrival automatically. The state-value function contains a distance-related discount factor γ, which enables the Markov decision process to take the lateral continuity of the seismic data into account and to avoid bad-trace information. On this basis, the method further introduces an optimized model, namely a fuzzy clustering-based multi-dimensional attribute reward function and a structure-based Gaussian stochastic policy, thereby reducing the difficulty of model design and making the picking more accurate and automatic. Testing this approach on field seismic data reveals its properties and shows it can automatically pick more reasonable first arrivals and has a certain quality-control ability, especially when the first-arrival energy is weak (the signal-to-noise ratio is low) or there are adjacent complex waveforms in the shallow layer.
15

Becker, R., S. Zilberstein, V. Lesser, and C. V. Goldman. "Solving Transition Independent Decentralized Markov Decision Processes." Journal of Artificial Intelligence Research 22 (December 1, 2004): 423–55. http://dx.doi.org/10.1613/jair.1497.

Abstract:
Formal treatment of collaborative multi-agent systems has been lagging behind the rapid progress in sequential decision making by individual agents. Recent work in the area of decentralized Markov Decision Processes (MDPs) has contributed to closing this gap, but the computational complexity of these models remains a serious obstacle. To overcome this complexity barrier, we identify a specific class of decentralized MDPs in which the agents' transitions are independent. The class consists of independent collaborating agents that are tied together through a structured global reward function that depends on all of their histories of states and actions. We present a novel algorithm for solving this class of problems and examine its properties, both as an optimal algorithm and as an anytime algorithm. To our best knowledge, this is the first algorithm to optimally solve a non-trivial subclass of decentralized MDPs. It lays the foundation for further work in this area on both exact and approximate algorithms.
16

Belousov, Boris, and Jan Peters. "Entropic Regularization of Markov Decision Processes." Entropy 21, no. 7 (July 10, 2019): 674. http://dx.doi.org/10.3390/e21070674.

Abstract:
An optimal feedback controller for a given Markov decision process (MDP) can in principle be synthesized by value or policy iteration. However, if the system dynamics and the reward function are unknown, a learning agent must discover an optimal controller via direct interaction with the environment. Such interactive data gathering commonly leads to divergence towards dangerous or uninformative regions of the state space unless additional regularization measures are taken. Prior works proposed bounding the information loss measured by the Kullback–Leibler (KL) divergence at every policy improvement step to eliminate instability in the learning dynamics. In this paper, we consider a broader family of f-divergences, and more concretely α-divergences, which inherit the beneficial property of providing the policy improvement step in closed form while at the same time yielding a corresponding dual objective for policy evaluation. Such an entropic proximal policy optimization view gives a unified perspective on compatible actor-critic architectures. In particular, common least-squares value function estimation coupled with advantage-weighted maximum likelihood policy improvement is shown to correspond to the Pearson χ²-divergence penalty. Other actor-critic pairs arise for various choices of the penalty-generating function f. On a concrete instantiation of our framework with the α-divergence, we carry out asymptotic analysis of the solutions for different values of α and demonstrate the effects of the divergence function choice on common standard reinforcement learning problems.
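For orientation, the closed-form policy improvement that the abstract alludes to is easiest to state for the Kullback–Leibler special case of the penalty (the paper itself treats general α-divergences); the following is the standard textbook form, not a quotation from the paper, with η denoting the penalty temperature:

% KL-penalized policy improvement and its closed-form solution.
\[
\pi_{k+1} \in \arg\max_{\pi}\;
  \mathbb{E}_{a \sim \pi(\cdot \mid s)}\!\bigl[Q^{\pi_k}(s,a)\bigr]
  \;-\; \eta\, D_{\mathrm{KL}}\!\bigl(\pi(\cdot \mid s)\,\|\,\pi_k(\cdot \mid s)\bigr)
\quad\Longrightarrow\quad
\pi_{k+1}(a \mid s) \;\propto\; \pi_k(a \mid s)\,
  \exp\!\Bigl(\tfrac{Q^{\pi_k}(s,a)}{\eta}\Bigr).
\]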
17

Ortega-Gutiérrez, R. Israel, and H. Cruz-Suárez. "A Moreau-Yosida regularization for Markov decision processes." Proyecciones (Antofagasta) 40, no. 1 (February 1, 2020): 117–37. http://dx.doi.org/10.22199/issn.0717-6279-2021-01-0008.

Abstract:
This paper addresses a class of sequential optimization problems known as Markov decision processes. These processes are considered on Euclidean state and action spaces with the total expected discounted cost as the objective function. The main goal of the paper is to provide conditions that guarantee an adequate Moreau-Yosida regularization for Markov decision processes (named the original process). In this way, a new Markov decision process is established that conforms to the Markov control model of the original process except for the cost function, which is induced via the Moreau-Yosida regularization. Compared to the original process, this new discounted Markov decision process has richer properties, such as differentiability and strict convexity of its optimal value function and uniqueness of the optimal policy, while the optimal value function and the optimal policy of both processes are the same. To complement the theory presented, an example is provided.
19

Patek, Stephen D. "On terminating Markov decision processes with a risk-averse objective function." Automatica 37, no. 9 (September 2001): 1379–86. http://dx.doi.org/10.1016/s0005-1098(01)00084-x.

20

Zhou, Yuyang, Guang Cheng, Shanqing Jiang, Yuyu Zhao, and Zihan Chen. "Cost-effective moving target defense against DDoS attacks using trilateral game and multi-objective Markov decision processes." Computers & Security 97 (October 2020): 101976. http://dx.doi.org/10.1016/j.cose.2020.101976.

21

Wan, Chaowei, Xiaodao Chen, and Dongbo Liu. "A Multi-Objective-Driven Placement Technique for Digital Microfluidic Biochips." Journal of Circuits, Systems and Computers 28, no. 05 (May 2019): 1950076. http://dx.doi.org/10.1142/s0218126619500762.

Abstract:
Microfluidic biochips are extensively utilized in biochemistry procedures due to their low cost, high precision and efficiency when compared to traditional laboratory procedures. Recently, computer-aided design (CAD) techniques have enabled high performance in digital microfluidic biochip design. A key part of digital microfluidic biochip CAD design is the biochip placement procedure, which determines the physical location for biological reactions during the physical design. For the biochip physical design, multiple objectives need to be considered, such as the size of the chip and the total operation time. In this paper, a multi-objective optimization is proposed based on Markov decision processes (MDPs). The proposed method is evaluated on a set of standard biochip benchmarks. Compared to existing works, experimental results show that the total operation time, the capacity for routing and the chip size can be optimized simultaneously.
22

Messias, João, Matthijs Spaan, and Pedro Lima. "GSMDPs for Multi-Robot Sequential Decision-Making." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 29, 2013): 1408–14. http://dx.doi.org/10.1609/aaai.v27i1.8550.

Abstract:
Markov Decision Processes (MDPs) provide an extensive theoretical background for problems of decision-making under uncertainty. In order to maintain computational tractability, however, real-world problems are typically discretized in states and actions as well as in time. Assuming synchronous state transitions and actions at fixed rates may result in models which are not strictly Markovian, or where agents are forced to idle between actions, losing their ability to react to sudden changes in the environment. In this work, we explore the application of Generalized Semi-Markov Decision Processes (GSMDPs) to a realistic multi-robot scenario. A case study will be presented in the domain of cooperative robotics, where real-time reactivity must be preserved, and synchronous discrete-time approaches are therefore sub-optimal. This case study is tested on a team of real robots, and also in realistic simulation. By allowing asynchronous events to be modeled over continuous time, the GSMDP approach is shown to provide greater solution quality than its discrete-time counterparts, while still being approximately solvable by existing methods.
23

Compare, Michele, Paolo Marelli, Piero Baraldi, and Enrico Zio. "A Markov decision process framework for optimal operation of monitored multi-state systems." Proceedings of the Institution of Mechanical Engineers, Part O: Journal of Risk and Reliability 232, no. 6 (February 16, 2018): 677–89. http://dx.doi.org/10.1177/1748006x18757077.

Abstract:
We develop a decision support framework based on Markov decision processes to maximize the profit from the operation of a multi-state system. This framework enables a comprehensive management of the multi-state system, which considers the maintenance decisions together with those on the multi-state system operation setting, that is, its loading condition and configuration. The decisions are informed by a condition monitoring system, which estimates the health state of the multi-state system components. The approach is shown with reference to a mechanical system made up of components affected by fatigue.
24

Hartmanns, Arnd, Sebastian Junges, Joost-Pieter Katoen, and Tim Quatmann. "Multi-cost Bounded Tradeoff Analysis in MDP." Journal of Automated Reasoning 64, no. 7 (July 28, 2020): 1483–522. http://dx.doi.org/10.1007/s10817-020-09574-9.

Abstract:
We provide a memory-efficient algorithm for multi-objective model checking problems on Markov decision processes (MDPs) with multiple cost structures. The key problem at hand is to check whether there exists a scheduler for a given MDP such that all objectives over cost vectors are fulfilled. We cover multi-objective reachability and expected cost objectives, and combinations thereof. We further transfer approaches for computing quantiles over single cost bounds to the multi-cost case and highlight the ensuing challenges. An empirical evaluation shows the scalability of our new approach both in terms of memory consumption and runtime. We discuss the need for more detailed visual presentations of results beyond Pareto curves and present a first visualisation approach that exploits all the available information from the algorithm to support decision makers.
25

Tilak, Omkar, and Snehasis Mukhopadhyay. "Partially decentralized reinforcement learning in finite, multi-agent Markov decision processes." AI Communications 24, no. 4 (2011): 293–309. http://dx.doi.org/10.3233/aic-2011-0505.

26

Chang, Hyeong Soo. "Random search for constrained Markov decision processes with multi-policy improvement." Automatica 58 (August 2015): 127–30. http://dx.doi.org/10.1016/j.automatica.2015.05.016.

27

Altman, Eitan, and Flos Spieksma. "The Linear Program approach in multi-chain Markov Decision Processes revisited." ZOR Zeitschrift für Operations Research Methods and Models of Operations Research 42, no. 2 (June 1995): 169–88. http://dx.doi.org/10.1007/bf01415752.

28

Dufour, François, and Alexei B. Piunovskiy. "Impulsive Control for Continuous-Time Markov Decision Processes." Advances in Applied Probability 47, no. 1 (March 2015): 106–27. http://dx.doi.org/10.1239/aap/1427814583.

Abstract:
In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite time horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The so-called Bellman equation associated with this control problem is studied. Sufficient conditions ensuring the existence and the uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure on the one hand the existence of an optimal control strategy, and on the other hand the existence of an ε-optimal control strategy. The decomposition of the state space into two disjoint subsets is exhibited where, roughly speaking, one should apply a gradual or an impulsive action, respectively, to obtain an optimal or ε-optimal strategy. An interesting consequence of our previous results is as follows: the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.
29

Dufour, François, and Alexei B. Piunovskiy. "Impulsive Control for Continuous-Time Markov Decision Processes." Advances in Applied Probability 47, no. 01 (March 2015): 106–27. http://dx.doi.org/10.1017/s0001867800007722.

Abstract:
In this paper our objective is to study continuous-time Markov decision processes on a general Borel state space with both impulsive and continuous controls for the infinite time horizon discounted cost. The continuous-time controlled process is shown to be nonexplosive under appropriate hypotheses. The so-called Bellman equation associated with this control problem is studied. Sufficient conditions ensuring the existence and the uniqueness of a bounded measurable solution to this optimality equation are provided. Moreover, it is shown that the value function of the optimization problem under consideration satisfies this optimality equation. Sufficient conditions are also presented to ensure on the one hand the existence of an optimal control strategy, and on the other hand the existence of an ε-optimal control strategy. The decomposition of the state space into two disjoint subsets is exhibited where, roughly speaking, one should apply a gradual or an impulsive action, respectively, to obtain an optimal or ε-optimal strategy. An interesting consequence of our previous results is as follows: the set of strategies that allow interventions at time t = 0 and only immediately after natural jumps is a sufficient set for the control problem under consideration.
30

Roijers, Diederik, Erwin Walraven, and Matthijs Spaan. "Bootstrapping LPs in Value Iteration for Multi-Objective and Partially Observable MDPs." Proceedings of the International Conference on Automated Planning and Scheduling 28 (June 15, 2018): 218–26. http://dx.doi.org/10.1609/icaps.v28i1.13903.

Abstract:
Iteratively solving a set of linear programs (LPs) is a common strategy for solving various decision-making problems in Artificial Intelligence, such as planning in multi-objective or partially observable Markov Decision Processes (MDPs). A prevalent feature is that the solutions to these LPs become increasingly similar as the solving algorithm converges, because the solution computed by the algorithm approaches the fixed point of a Bellman backup operator. In this paper, we propose to speed up the solving process of these LPs by bootstrapping based on similar LPs solved previously. We use these LPs to initialize a subset of relevant LP constraints, before iteratively generating the remaining constraints. The resulting algorithm is the first to consider such information sharing across iterations. We evaluate our approach on planning in Multi-Objective MDPs (MOMDPs) and Partially Observable MDPs (POMDPs), showing that it solves fewer LPs than the state of the art, which leads to a significant speed-up. Moreover, for MOMDPs we show that our method scales better in both the number of states and the number of objectives, which is vital for multi-objective planning.
31

Matignon, Laetitia, Laurent Jeanpierre, and Abdel-Illah Mouaddib. "Coordinated Multi-Robot Exploration Under Communication Constraints Using Decentralized Markov Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (September 20, 2021): 2017–23. http://dx.doi.org/10.1609/aaai.v26i1.8380.

Abstract:
Recent works on multi-agent sequential decision making using decentralized partially observable Markov decision processes have been concerned with interaction-oriented resolution techniques and provide promising results. These techniques take advantage of local interactions and coordination. In this paper, we propose an approach based on an interaction-oriented resolution of decentralized decision makers. To this end, distributed value functions (DVF) have been used by decoupling the multi-agent problem into a set of individual agent problems. However existing DVF techniques assume permanent and free communication between the agents. In this paper, we extend the DVF methodology to address full local observability, limited share of information and communication breaks. We apply our new DVF in a real-world application consisting of multi-robot exploration where each robot computes locally a strategy that minimizes the interactions between the robots and maximizes the space coverage of the team even under communication constraints. Our technique has been implemented and evaluated in simulation and in real-world scenarios during a robotic challenge for the exploration and mapping of an unknown environment. Experimental results from real-world scenarios and from the challenge are given where our system was vice-champion.
32

Bernstein, D. S., C. Amato, E. A. Hansen, and S. Zilberstein. "Policy Iteration for Decentralized Control of Markov Decision Processes." Journal of Artificial Intelligence Research 34 (March 1, 2009): 89–132. http://dx.doi.org/10.1613/jair.2667.

Abstract:
Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems.
33

Lacerda, Bruno, David Parker, and Nick Hawes. "Multi-Objective Policy Generation for Mobile Robots under Probabilistic Time-Bounded Guarantees." Proceedings of the International Conference on Automated Planning and Scheduling 27 (June 5, 2017): 504–12. http://dx.doi.org/10.1609/icaps.v27i1.13865.

Abstract:
We present a methodology for the generation of mobile robot controllers which offer probabilistic time-bounded guarantees on successful task completion, whilst also trying to satisfy soft goals. The approach is based on a stochastic model of the robot’s environment and action execution times, a set of soft goals, and a formal task specification in co-safe linear temporal logic, which are analysed using multi-objective model checking techniques for Markov decision processes. For efficiency, we propose a novel two-step approach. First, we explore policies on the Pareto front for minimising expected task execution time whilst optimising the achievement of soft goals. Then, we use this to prune a model with more detailed timing information, yielding a time-dependent policy for which more fine-grained probabilistic guarantees can be provided. We illustrate and evaluate the generation of policies on a delivery task in a care home scenario, where the robot also tries to engage in entertainment activities with the patients.
34

Linzner, Dominik, and Heinz Koeppl. "A Variational Perturbative Approach to Planning in Graph-Based Markov Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 7203–10. http://dx.doi.org/10.1609/aaai.v34i05.6210.

Abstract:
Coordinating multiple interacting agents to achieve a common goal is a difficult task with huge applicability. This problem remains hard to solve, even when limiting interactions to be mediated via a static interaction-graph. We present a novel approximate solution method for multi-agent Markov decision problems on graphs, based on variational perturbation theory. We adopt the strategy of planning via inference, which has been explored in various prior works. We employ a non-trivial extension of a novel high-order variational method that allows for approximate inference in large networks and has been shown to surpass the accuracy of existing variational methods. To compare our method to two state-of-the-art methods for multi-agent planning on graphs, we apply the method to different standard GMDP problems. We show that in cases where the goal is encoded as a non-local cost function, our method performs well, while state-of-the-art methods approach the performance of a random guess. In a final experiment, we demonstrate that our method brings significant improvement for synchronization tasks.
35

Ahluwalia, Vinayak S., Lauren N. Steimle, and Brian T. Denton. "Policy-based branch-and-bound for infinite-horizon Multi-model Markov decision processes." Computers & Operations Research 126 (February 2021): 105108. http://dx.doi.org/10.1016/j.cor.2020.105108.

36

OSADA, H. "CHQ: A Multi-Agent Reinforcement Learning Scheme for Partially Observable Markov Decision Processes." IEICE Transactions on Information and Systems E88-D, no. 5 (May 1, 2005): 1004–11. http://dx.doi.org/10.1093/ietisy/e88-d.5.1004.

37

Brázdil, Tomáš, Krishnendu Chatterjee, Petr Novotný, and Jiří Vahala. "Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 9794–801. http://dx.doi.org/10.1609/aaai.v34i06.6531.

Abstract:
Markov decision processes (MDPs) are the de facto framework for sequential decision making in the presence of stochastic uncertainty. A classical optimization criterion for MDPs is to maximize the expected discounted-sum payoff, which ignores low probability catastrophic events with highly negative impact on the system. On the other hand, risk-averse policies require the probability of undesirable events to be below a given threshold, but they do not account for optimization of the expected payoff. We consider MDPs with discounted-sum payoff with failure states which represent catastrophic outcomes. The objective of risk-constrained planning is to maximize the expected discounted-sum payoff among risk-averse policies that ensure the probability to encounter a failure state is below a desired threshold. Our main contribution is an efficient risk-constrained planning algorithm that combines UCT-like search with a predictor learned through interaction with the MDP (in the style of AlphaZero) and with a risk-constrained action selection via linear programming. We demonstrate the effectiveness of our approach with experiments on classical MDPs from the literature, including benchmarks with on the order of 10⁶ states.
38

Omidshafiei, Shayegan, Ali-Akbar Agha-Mohammadi, Christopher Amato, Shih-Yuan Liu, Jonathan P. How, and John Vian. "Decentralized control of multi-robot partially observable Markov decision processes using belief space macro-actions." International Journal of Robotics Research 36, no. 2 (February 2017): 231–58. http://dx.doi.org/10.1177/0278364917692864.

Abstract:
This work focuses on solving general multi-robot planning problems in continuous spaces with partial observability given a high-level domain description. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems. However, representing and solving Dec-POMDPs is often intractable for large problems. This work extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) to take advantage of the high-level representations that are natural for multi-robot problems and to facilitate scalable solutions to large discrete and continuous problems. The Dec-POSMDP formulation uses task macro-actions created from lower-level local actions that allow for asynchronous decision-making by the robots, which is crucial in multi-robot domains. This transformation from Dec-POMDPs to Dec-POSMDPs with a finite set of automatically-generated macro-actions allows use of efficient discrete-space search algorithms to solve them. The paper presents algorithms for solving Dec-POSMDPs, which are more scalable than previous methods since they can incorporate closed-loop belief space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed algorithms are then evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent realistic problems and provide high-quality solutions for large-scale problems.
39

Rigter, Marc, Bruno Lacerda, and Nick Hawes. "Minimax Regret Optimisation for Robust Planning in Uncertain Markov Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 13 (May 18, 2021): 11930–38. http://dx.doi.org/10.1609/aaai.v35i13.17417.

Abstract:
The parameters for a Markov Decision Process (MDP) often cannot be specified exactly. Uncertain MDPs (UMDPs) capture this model ambiguity by defining sets which the parameters belong to. Minimax regret has been proposed as an objective for planning in UMDPs to find robust policies which are not overly conservative. In this work, we focus on planning for Stochastic Shortest Path (SSP) UMDPs with uncertain cost and transition functions. We introduce a Bellman equation to compute the regret for a policy. We propose a dynamic programming algorithm that utilises the regret Bellman equation, and show that it optimises minimax regret exactly for UMDPs with independent uncertainties. For coupled uncertainties, we extend our approach to use options to enable a trade-off between computation and solution quality. We evaluate our approach on both synthetic and real-world domains, showing that it significantly outperforms existing baselines.
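For readers new to the objective, the minimax-regret formulation for an uncertain MDP referred to in the abstract is standard and can be stated as follows (context only, not quoted from the paper); Ξ is the uncertainty set of admissible models, V^π_ξ the value of policy π in model ξ from the initial state s₀:

% Regret of a policy and the minimax-regret objective over the uncertainty set \Xi.
\[
\operatorname{reg}(\pi) \;=\; \max_{\xi \in \Xi}
  \Bigl( \max_{\pi'} V^{\pi'}_{\xi}(s_0) \;-\; V^{\pi}_{\xi}(s_0) \Bigr),
\qquad
\pi^{\ast} \in \arg\min_{\pi}\, \operatorname{reg}(\pi).
\]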
40

Dufour, François, and A. B. Piunovskiy. "The Expected Total Cost Criterion for Markov Decision Processes under Constraints." Advances in Applied Probability 45, no. 3 (September 2013): 837–59. http://dx.doi.org/10.1239/aap/1377868541.

Abstract:
In this work, we study discrete-time Markov decision processes (MDPs) with constraints when all the objectives have the same form of expected total cost over the infinite time horizon. Our objective is to analyze this problem by using the linear programming approach. Under some technical hypotheses, it is shown that if there exists an optimal solution for the associated linear program then there exists a randomized stationary policy which is optimal for the MDP, and that the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies provides a sufficient set for solving this MDP. It is important to note that, in contrast with the classical results of the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not impose the cost functions to be nonnegative or to be bounded below. Several examples are presented to illustrate our results.
41

Dufour, François, and A. B. Piunovskiy. "The Expected Total Cost Criterion for Markov Decision Processes under Constraints." Advances in Applied Probability 45, no. 03 (September 2013): 837–59. http://dx.doi.org/10.1017/s0001867800006601.

Abstract:
In this work, we study discrete-time Markov decision processes (MDPs) with constraints when all the objectives have the same form of expected total cost over the infinite time horizon. Our objective is to analyze this problem by using the linear programming approach. Under some technical hypotheses, it is shown that if there exists an optimal solution for the associated linear program then there exists a randomized stationary policy which is optimal for the MDP, and that the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies provides a sufficient set for solving this MDP. It is important to note that, in contrast with the classical results of the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not impose the cost functions to be nonnegative or to be bounded below. Several examples are presented to illustrate our results.
42

Kariotoglou, Nikolaos, Maryam Kamgarpour, Tyler H. Summers, and John Lygeros. "The Linear Programming Approach to Reach-Avoid Problems for Markov Decision Processes." Journal of Artificial Intelligence Research 60 (October 4, 2017): 263–85. http://dx.doi.org/10.1613/jair.5500.

Abstract:
One of the most fundamental problems in Markov decision processes is analysis and control synthesis for safety and reachability specifications. We consider the stochastic reach-avoid problem, in which the objective is to synthesize a control policy to maximize the probability of reaching a target set at a given time, while staying in a safe set at all prior times. We characterize the solution to this problem through an infinite dimensional linear program. We then develop a tractable approximation to the infinite dimensional linear program through finite dimensional approximations of the decision space and constraints. For a large class of Markov decision processes modeled by Gaussian mixture kernels we show that through a proper selection of the finite dimensional space, one can further reduce the computational complexity of the resulting linear program. We validate the proposed method and analyze its potential with numerical case studies.
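As background for the abstract above, one standard dynamic-programming characterization of the terminal-time reach-avoid probability is the following backward recursion (stated for orientation only; the paper's contribution is the infinite-dimensional linear program and its finite-dimensional approximation). Here T is the target set, S the safe set, U the action space, Q(dy | x, u) the transition kernel, and N the given terminal time:

% Reach-avoid value recursion: V_0(x) is the maximal probability of being in T
% at time N while remaining in S at all earlier times.
\[
V_N(x) = \mathbf{1}_{T}(x), \qquad
V_k(x) = \mathbf{1}_{S}(x)\,\max_{u \in U} \int V_{k+1}(y)\, Q(dy \mid x, u),
\quad k = N-1, \dots, 0.
\]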
43

Salari, Nooshin, and Viliam Makis. "Application of Markov renewal theory and semi-Markov decision processes in maintenance modeling and optimization of multi-unit systems." Naval Research Logistics (NRL) 67, no. 7 (August 3, 2020): 548–58. http://dx.doi.org/10.1002/nav.21932.

44

Bouton, Maxime, Jana Tumova, and Mykel J. Kochenderfer. "Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10061–68. http://dx.doi.org/10.1609/aaai.v34i06.6563.

Abstract:
Autonomous systems are often required to operate in partially observable environments. They must reliably execute a specified objective even with incomplete information about the state of the environment. We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP). By formulating a planning problem, we show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula and compute the associated belief state policy. We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.
45

Zhang, Zhicong, Shuai Li, and Xiaohui Yan. "Online Self-Organizing Network Control with Time Averaged Weighted Throughput Objective." Discrete Dynamics in Nature and Society 2018 (2018): 1–11. http://dx.doi.org/10.1155/2018/4184805.

Abstract:
We study an online multisource multisink queueing network control problem characterized with self-organizing network structure and self-organizing job routing. We decompose the self-organizing queueing network control problem into a series of interrelated Markov Decision Processes and construct a control decision model for them based on the coupled reinforcement learning (RL) architecture. To maximize the mean time averaged weighted throughput of the jobs through the network, we propose a reinforcement learning algorithm with time averaged reward to deal with the control decision model and obtain a control policy integrating the jobs routing selection strategy and the jobs sequencing strategy. Computational experiments verify the learning ability and the effectiveness of the proposed reinforcement learning algorithm applied in the investigated self-organizing network control problem.
46

Park, Bumjin, Cheongwoong Kang, and Jaesik Choi. "Cooperative Multi-Robot Task Allocation with Reinforcement Learning." Applied Sciences 12, no. 1 (December 28, 2021): 272. http://dx.doi.org/10.3390/app12010272.

Abstract:
This paper deals with the concept of multi-robot task allocation, referring to the assignment of multiple robots to tasks such that an objective function is maximized. The performance of existing meta-heuristic methods worsens as the number of robots or tasks increases. To tackle this problem, a novel Markov decision process formulation for multi-robot task allocation is presented for reinforcement learning. The proposed formulation sequentially allocates robots to tasks to minimize the total time taken to complete them. Additionally, we propose a deep reinforcement learning method to find the best allocation schedule for each problem. Our method adopts the cross-attention mechanism to compute the preference of robots to tasks. The experimental results show that the proposed method finds better solutions than meta-heuristic methods, especially when solving large-scale allocation problems.
47

Lachikhin, A. V. "ALGORITHMS FOR FINDING OPTIMAL POLICY FOR INTELLIGENT AGENTS BASED ON MARKOV DECISION-MAKING PROCESSES." Issues of radio electronics, no. 11 (November 20, 2018): 29–32. http://dx.doi.org/10.21778/2218-5453-2018-11-29-32.

Abstract:
Currently, the paradigm of intelligent agents and multi-agent systems is actively developing. The policy of agents' actions can be represented as a Markov decision-making process, and such agents need methods for developing optimal policies. The purpose of this study is to review existing techniques and to determine the possibility and conditions of their application. The main approaches, based on linear and dynamic programming, are considered, and the specific algorithms used to find the extreme value of utility are given: the linear programming approach (the simplex method) and the dynamic programming approach (value iteration). The equations necessary to find the optimal policy of intelligent agent actions are given, and restrictions on the application of the various algorithms are considered. In conclusion, the most suitable method for finding the optimal policy is value iteration.
48

Giuliani, M., S. Galelli, and R. Soncini-Sessa. "A dimensionality reduction approach for many-objective Markov Decision Processes: Application to a water reservoir operation problem." Environmental Modelling & Software 57 (July 2014): 101–14. http://dx.doi.org/10.1016/j.envsoft.2014.02.011.

49

Ahmed, Asrar, Pradeep Varakantham, Meghna Lowalekar, Yossiri Adulyasak, and Patrick Jaillet. "Sampling Based Approaches for Minimizing Regret in Uncertain Markov Decision Processes (MDPs)." Journal of Artificial Intelligence Research 59 (July 1, 2017): 229–64. http://dx.doi.org/10.1613/jair.5242.

Abstract:
Markov Decision Processes (MDPs) are an effective model to represent decision processes in the presence of transitional uncertainty and reward tradeoffs. However, due to the difficulty in exactly specifying the transition and reward functions in MDPs, researchers have proposed uncertain MDP models and robustness objectives in solving those models. Most approaches for computing robust policies have focused on the computation of maximin policies which maximize the value in the worst case amongst all realisations of uncertainty. Given the overly conservative nature of maximin policies, recent work has proposed minimax regret as an ideal alternative to the maximin objective for robust optimization. However, existing algorithms for handling minimax regret are restricted to models with uncertainty over rewards only and they are also limited in their scalability. Therefore, we provide a general model of uncertain MDPs that considers uncertainty over both transition and reward functions. Furthermore, we also consider dependence of the uncertainty across different states and decision epochs. We also provide a mixed integer linear program formulation for minimizing regret given a set of samples of the transition and reward functions in the uncertain MDP. In addition, we provide two myopic variants of regret, namely Cumulative Expected Myopic Regret (CEMR) and One Step Regret (OSR), that can be optimized in a scalable manner. Specifically, we provide dynamic programming and policy iteration based algorithms to optimize CEMR and OSR, respectively. Finally, to demonstrate the effectiveness of our approaches, we provide comparisons on two benchmark problems from the literature. We observe that optimizing the myopic variants of regret, OSR and CEMR, is better than directly optimizing the regret.
50

Zhang, Xing, Qiuhong Zhao, and Guoping Xia. "Research on Integrated Optimization Problem in a Multi-product Supply Chain Based on Markov Decision Processes." Journal of Convergence Information Technology 7, no. 1 (January 31, 2012): 45–53. http://dx.doi.org/10.4156/jcit.vol7.issue1.6.

