Journal articles on the topic 'POMDP planning'

To see other types of publications on this topic, follow the link: POMDP planning.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'POMDP planning.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Rafferty, Anna N., Emma Brunskill, Thomas L. Griffiths, and Patrick Shafto. "Faster Teaching via POMDP Planning." Cognitive Science 40, no. 6 (September 24, 2015): 1290–332. http://dx.doi.org/10.1111/cogs.12290.

2

Luo, Yuanfu, Haoyu Bai, David Hsu, and Wee Sun Lee. "Importance sampling for online planning under uncertainty." International Journal of Robotics Research 38, no. 2-3 (June 19, 2018): 162–81. http://dx.doi.org/10.1177/0278364918780322.

Abstract:
The partially observable Markov decision process (POMDP) provides a principled general framework for robot planning under uncertainty. Leveraging the idea of Monte Carlo sampling, recent POMDP planning algorithms have scaled up to various challenging robotic tasks, including real-time online planning for autonomous vehicles. To further improve online planning performance, this paper presents IS-DESPOT, which introduces importance sampling to DESPOT, a state-of-the-art sampling-based POMDP algorithm for planning under uncertainty. Importance sampling improves DESPOT’s performance when there are critical but rare events that are difficult to sample. We prove that IS-DESPOT retains the theoretical guarantee of DESPOT. We demonstrate empirically that importance sampling significantly improves the performance of online POMDP planning for suitable tasks. We also present a general method for learning the importance sampling distribution.
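The importance-sampling idea that IS-DESPOT builds on can be illustrated outside the planner itself: samples are drawn from a proposal distribution q and reweighted by p/q so that rare but critical outcomes under the target distribution p still contribute to the estimate. The sketch below is a generic illustration of that idea, not the IS-DESPOT algorithm; all function names are placeholders.

```python
import math
import random

def importance_sampling_estimate(f, p, q, sample_q, n=100000):
    """Estimate E_p[f(X)] using samples drawn from a proposal distribution q.

    f: quantity of interest; p, q: density functions; sample_q: draws one sample from q.
    Each sample is reweighted by p(x)/q(x) so the estimate targets p rather than q.
    """
    total = 0.0
    for _ in range(n):
        x = sample_q()
        total += (p(x) / q(x)) * f(x)
    return total / n

# Example: probability of the rare event {X >= 4} for X ~ N(0, 1),
# using a proposal N(4, 1) shifted toward the rare region.
p = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
q = lambda x: math.exp(-(x - 4) ** 2 / 2) / math.sqrt(2 * math.pi)
print(importance_sampling_estimate(lambda x: float(x >= 4), p, q, lambda: random.gauss(4, 1)))
```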
3

Khalvati, Koosha, and Alan Mackworth. "A Fast Pairwise Heuristic for Planning under Uncertainty." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 30, 2013): 503–9. http://dx.doi.org/10.1609/aaai.v27i1.8672.

Abstract:
POMDP (Partially Observable Markov Decision Process) is a mathematical framework that models planning under uncertainty. Solving a POMDP is an intractable problem, and even state-of-the-art POMDP solvers are too computationally expensive for large domains. This is a major bottleneck. In this paper, we propose a new heuristic, called the pairwise heuristic, that can be used in a one-step greedy strategy to find a near-optimal solution for POMDP problems very quickly. This approach is a good candidate for large problems where a real-time solution is a necessity but exact optimality of the solution is not vital. The pairwise heuristic uses the optimal solutions for pairs of states. For each pair of states in the POMDP, we find the optimal sequence of actions to resolve the uncertainty and to maximize the reward, given that the agent is uncertain about which state of the pair it is in. Then we use these sequences as a heuristic and find the optimal action in each step of the greedy strategy using this heuristic. We have tested our method on the available large classical test benchmarks in various domains. The resulting total reward is close to, if not greater than, the total reward obtained by other state-of-the-art POMDP solvers, while the time required to find the solution is always much less.
4

Ross, S., J. Pineau, S. Paquet, and B. Chaib-draa. "Online Planning Algorithms for POMDPs." Journal of Artificial Intelligence Research 32 (July 29, 2008): 663–704. http://dx.doi.org/10.1613/jair.2567.

Abstract:
Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that state-of-the-art online heuristic search methods can handle large POMDP domains efficiently.
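For reference, the belief that such online lookahead searches branch on is maintained with the standard Bayes filter; in common POMDP notation (not specific to this survey), taking action a and receiving observation o updates the belief b as:

```latex
b'(s') \;=\; \eta \, O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)
```

where T is the transition model, O the observation model, and η a normalizing constant.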
5

Ye, Nan, Adhiraj Somani, David Hsu, and Wee Sun Lee. "DESPOT: Online POMDP Planning with Regularization." Journal of Artificial Intelligence Research 58 (January 26, 2017): 231–66. http://dx.doi.org/10.1613/jair.5328.

Abstract:
The partially observable Markov decision process (POMDP) provides a principled general framework for planning under uncertainty, but solving POMDPs optimally is computationally intractable, due to the "curse of dimensionality" and the "curse of history". To overcome these challenges, we introduce the Determinized Sparse Partially Observable Tree (DESPOT), a sparse approximation of the standard belief tree, for online planning under uncertainty. A DESPOT focuses online planning on a set of randomly sampled scenarios and compactly captures the "execution" of all policies under these scenarios. We show that the best policy obtained from a DESPOT is near-optimal, with a regret bound that depends on the representation size of the optimal policy. Leveraging this result, we give an anytime online planning algorithm, which searches a DESPOT for a policy that optimizes a regularized objective function. Regularization balances the estimated value of a policy under the sampled scenarios and the policy size, thus avoiding overfitting. The algorithm demonstrates strong experimental results, compared with some of the best online POMDP algorithms available. It has also been incorporated into an autonomous driving system for real-time vehicle control. The source code for the algorithm is available online.
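The regularized objective mentioned here trades the value estimated from the sampled scenarios against policy size; in common notation it has the general form below (the exact weighting and notation in the paper may differ):

```latex
\nu(\pi) \;=\; \hat{V}_{b_0}(\pi) \;-\; \lambda \,\lvert\pi\rvert
```

where \hat{V}_{b_0}(\pi) is the policy value estimated under the sampled scenarios, |π| the policy size, and λ ≥ 0 the regularization constant.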
6

Zhang, N. L., and W. Liu. "A Model Approximation Scheme for Planning in Partially Observable Stochastic Domains." Journal of Artificial Intelligence Research 7 (November 1, 1997): 199–230. http://dx.doi.org/10.1613/jair.419.

Abstract:
Partially observable Markov decision processes (POMDPs) are a natural model for planning problems where effects of actions are nondeterministic and the state of the world is not completely observable. It is difficult to solve POMDPs exactly. This paper proposes a new approximation scheme. The basic idea is to transform a POMDP into another one where additional information is provided by an oracle. The oracle informs the planning agent that the current state of the world is in a certain region. The transformed POMDP is consequently said to be region observable. It is easier to solve than the original POMDP. We propose to solve the transformed POMDP and use its optimal policy to construct an approximate policy for the original POMDP. By controlling the amount of additional information that the oracle provides, it is possible to find a proper tradeoff between computational time and approximation quality. In terms of algorithmic contributions, we study in detail how to exploit region observability in solving the transformed POMDP. To facilitate the study, we also propose a new exact algorithm for general POMDPs. The algorithm is conceptually simple and yet is significantly more efficient than all previous exact algorithms.
7

Brafman, Ronen, Guy Shani, and Shlomo Zilberstein. "Qualitative Planning under Partial Observability in Multi-Agent Domains." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 30, 2013): 130–37. http://dx.doi.org/10.1609/aaai.v27i1.8643.

Abstract:
Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains with a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more “classical” in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDPs solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms.
8

He, Ruijie, Emma Brunskill, and Nicholas Roy. "PUMA: Planning Under Uncertainty with Macro-Actions." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 4, 2010): 1089–95. http://dx.doi.org/10.1609/aaai.v24i1.7749.

Abstract:
Planning in large, partially observable domains is challenging, especially when a long-horizon lookahead is necessary to obtain a good policy. Traditional POMDP planners that plan a different potential action for each future observation can be prohibitively expensive when planning many steps ahead. An efficient solution for planning far into the future in fully observable domains is to use temporally-extended sequences of actions, or "macro-actions." In this paper, we present a POMDP algorithm for planning under uncertainty with macro-actions (PUMA) that automatically constructs and evaluates open-loop macro-actions within forward-search planning, where the planner branches on observations only at the end of each macro-action. Additionally, we show how to incrementally refine the plan over time, resulting in an anytime algorithm that provably converges to an epsilon-optimal policy. In experiments on several large POMDP problems which require a long horizon lookahead, PUMA outperforms existing state-of-the-art solvers.
9

Yang, Qiming, Jiancheng Xu, Haibao Tian, and Yong Wu. "Decision Modeling of UAV On-Line Path Planning Based on IMM." Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 36, no. 2 (April 2018): 323–31. http://dx.doi.org/10.1051/jnwpu/20183620323.

Abstract:
In order to enhance the capability of a UAV to track targets autonomously, a model for UAV on-line path planning is established based on the theoretical framework of the partially observable Markov decision process (POMDP). The elements of the POMDP model are analyzed and described. To account for the diversity of target motion in the real world, the law of state transition in the POMDP model is described by the method of the Interactive Multiple Model (IMM), so as to adapt to changes in target maneuvering. The action strategy of the UAV is calculated through the nominal belief-state optimization (NBO) algorithm, which is designed to search for the optimal action policy that minimizes the cumulative cost of actions. The generated action strategy controls the UAV flight. The simulation results show that the established POMDP model can achieve autonomous planning of the UAV route and can control the UAV to track the target effectively. The planned path is more reasonable and efficient than the result of using a single state-transition law.
10

WANG, YI, SHIQI ZHANG, and JOOHYUNG LEE. "Bridging Commonsense Reasoning and Probabilistic Planning via a Probabilistic Action Language." Theory and Practice of Logic Programming 19, no. 5-6 (September 2019): 1090–106. http://dx.doi.org/10.1017/s1471068419000371.

Abstract:
To be responsive to dynamically changing real-world environments, an intelligent agent needs to perform complex sequential decision-making tasks that are often guided by commonsense knowledge. The previous work on this line of research led to the framework called interleaved commonsense reasoning and probabilistic planning (icorpp), which used P-log for representing commonsense knowledge and Markov Decision Processes (MDPs) or Partially Observable MDPs (POMDPs) for planning under uncertainty. A main limitation of icorpp is that its implementation requires non-trivial engineering efforts to bridge the commonsense reasoning and probabilistic planning formalisms. In this paper, we present a unified framework to integrate icorpp’s reasoning and planning components. In particular, we extend the probabilistic action language pBC+ to express utility, belief states, and observations as in POMDP models. Inheriting the advantages of action languages, the new action language provides an elaboration-tolerant representation of POMDPs that reflects commonsense knowledge. The idea led to the design of the system pbcplus2pomdp, which compiles a pBC+ action description into a POMDP model that can be directly processed by off-the-shelf POMDP solvers to compute an optimal policy of the pBC+ action description. Our experiments show that it retains the advantages of icorpp while avoiding the manual efforts in bridging the commonsense reasoner and the probabilistic planner.
11

Sonu, Ekhlas, Yingke Chen, and Prashant Doshi. "Decision-Theoretic Planning Under Anonymity in Agent Populations." Journal of Artificial Intelligence Research 59 (August 29, 2017): 725–70. http://dx.doi.org/10.1613/jair.5449.

Abstract:
We study the problem of self-interested planning under uncertainty in settings shared with more than a thousand other agents, each of which plans at its own individual level. We refer to such large numbers of agents as an agent population. The decision-theoretic formalism of interactive partially observable Markov decision process (I-POMDP) is used to model the agent's self-interested planning. The first contribution of this article is a method for drastically scaling the finitely-nested I-POMDP to certain agent populations for the first time. Our method exploits two types of structure that are often exhibited by agent populations -- anonymity and context-specific independence. We present a variant called the many-agent I-POMDP that models both these types of structure to plan efficiently under uncertainty in multiagent settings. In particular, the complexity of the belief update and solution in the many-agent I-POMDP is polynomial in the number of agents compared with the exponential growth that challenges the original framework. While exploiting structure helps mitigate the curse of many agents, the well-known curse of history that afflicts I-POMDPs continues to challenge scalability in terms of the planning horizon. The second contribution of this article is an application of the branch-and-bound scheme to reduce the exponential growth of the search tree for lookahead. For this, we introduce new fast-computing upper and lower bounds for the exact value function of the many-agent I-POMDP. This speeds up the look-ahead computations without trading off optimality, and reduces both memory and run time complexity. The third contribution is a comprehensive empirical evaluation of the methods on three new problem domains -- policing large protests, controlling traffic congestion at a busy intersection, and improving the AI for the popular Clash of Clans multiplayer game. We demonstrate the feasibility of exact self-interested planning in these large problems, and that our methods for speeding up the planning are effective. Altogether, these contributions represent a principled and significant advance toward moving self-interested planning under uncertainty to real-world applications.
12

Zhang, Zongzhang, Michael Littman, and Xiaoping Chen. "Covering Number as a Complexity Measure for POMDP Planning and Learning." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (September 20, 2021): 1853–59. http://dx.doi.org/10.1609/aaai.v26i1.8360.

Abstract:
Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number for the reachable belief space, which is a set of belief points that are reachable from the initial belief point, has interesting links with the complexity of POMDP planning, theoretically. In this paper, we present empirical evidence that the covering number for the reachable belief space (or just "covering number", for brevity) is a far better complexity measure than the state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent learning algorithm for POMDPs without reset given knowledge of the covering number.
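For context, the covering number of a reachable belief space is typically defined in the standard metric-space sense; a generic formulation (not quoted from this paper) is:

```latex
C(\delta) \;=\; \min \bigl\{\, \lvert B \rvert \;:\; \forall\, b \in \mathcal{R}(b_0)\ \exists\, b' \in B \ \text{with}\ \lVert b - b' \rVert_1 \le \delta \,\bigr\}
```

i.e., the smallest number of points needed so that every belief reachable from the initial belief b_0 lies within distance δ of one of them.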
13

Zhao, Wenrui, and Weidong Chen. "Hierarchical POMDP planning for object manipulation in clutter." Robotics and Autonomous Systems 139 (May 2021): 103736. http://dx.doi.org/10.1016/j.robot.2021.103736.

14

Feng, Qi, Xuezhong Zhou, Houkuan Huang, Xiaoping Zhang, and Runshun Zhang. "Modelling traditional Chinese medicine therapy planning with POMDP." International Journal of Functional Informatics and Personalised Medicine 4, no. 2 (2013): 145. http://dx.doi.org/10.1504/ijfipm.2013.057405.

15

Yali, Zhang. "Path Planning Algorithm based on Partially POMDP Model." International Journal of Hybrid Information Technology 9, no. 11 (November 30, 2016): 315–22. http://dx.doi.org/10.14257/ijhit.2016.9.11.27.

16

Lee, Taekhee, and Young J. Kim. "Massively parallel motion planning algorithms under uncertainty using POMDP." International Journal of Robotics Research 35, no. 8 (August 21, 2015): 928–42. http://dx.doi.org/10.1177/0278364915594856.

Abstract:
We present new parallel algorithms that solve continuous-state partially observable Markov decision process (POMDP) problems using the GPU (gPOMDP) and a hybrid of the GPU and CPU (hPOMDP). We choose the Monte Carlo value iteration (MCVI) method as our base algorithm and parallelize this algorithm using the multi-level parallel formulation of MCVI. For each parallel level, we propose efficient algorithms to utilize the massive data parallelism available on modern GPUs. Our GPU-based method uses two workload distribution techniques, compute/data interleaving and workload balancing, in order to obtain the maximum parallel performance at the highest level. Here we also present a CPU–GPU hybrid method that takes advantage of both CPU and GPU parallelism in order to solve highly complex POMDP planning problems. The CPU is responsible for data preparation, while the GPU performs Monte Carlo simulations; these operations are performed concurrently using the compute/data overlap technique between the CPU and GPU. To the best of the authors’ knowledge, our algorithms are the first parallel algorithms that efficiently execute POMDP planning in a massively parallel fashion utilizing the GPU or a hybrid of the GPU and CPU. Our algorithms outperform the existing CPU-based algorithm by a factor of 75–99 based on the chosen benchmark.
17

Huang, Xin, Sungkweon Hong, Andreas Hofmann, and Brian C. Williams. "Online Risk-Bounded Motion Planning for Autonomous Vehicles in Dynamic Environments." Proceedings of the International Conference on Automated Planning and Scheduling 29 (May 25, 2021): 214–22. http://dx.doi.org/10.1609/icaps.v29i1.3479.

Abstract:
A crucial challenge to efficient and robust motion planning for autonomous vehicles is understanding the intentions of the surrounding agents. Ignoring the intentions of the other agents in dynamic environments can lead to risky or overconservative plans. In this work, we model the motion planning problem as a partially observable Markov decision process (POMDP) and propose an online system that combines an intent recognition algorithm and a POMDP solver to generate risk-bounded plans for the ego vehicle navigating with a number of dynamic agent vehicles. The intent recognition algorithm predicts the probabilistic hybrid motion states of each agent vehicle over a finite horizon using Bayesian filtering and a library of pre-learned maneuver motion models. We update the POMDP model with the intent recognition results in real time and solve it using a heuristic search algorithm which produces policies with upper-bound guarantees on the probability of near-collision with other dynamic agents. We demonstrate that our system is able to generate better motion plans in terms of efficiency and safety in a number of challenging environments, including unprotected intersection left turns and lane changes, as compared to the baseline methods.
18

Amato, Christopher, George Konidaris, Ariel Anders, Gabriel Cruz, Jonathan P. How, and Leslie P. Kaelbling. "Policy search for multi-robot coordination under uncertainty." International Journal of Robotics Research 35, no. 14 (December 2016): 1760–78. http://dx.doi.org/10.1177/0278364916679611.

Abstract:
We introduce a principled method for multi-robot coordination based on a general model (termed a MacDec-POMDP) of multi-robot cooperative planning in the presence of stochasticity, uncertain sensing, and communication limitations. A new MacDec-POMDP planning algorithm is presented that searches over policies represented as finite-state controllers, rather than the previous policy tree representation. Finite-state controllers can be much more concise than trees, are much easier to interpret, and can operate over an infinite horizon. The resulting policy search algorithm requires a substantially simpler simulator that models only the outcomes of executing a given set of motor controllers, not the details of the executions themselves and can solve significantly larger problems than existing MacDec-POMDP planners. We demonstrate significant performance improvements over previous methods and show that our method can be used for actual multi-robot systems through experiments on a cooperative multi-robot bartending domain.
19

Amato, Christopher, George Konidaris, Leslie P. Kaelbling, and Jonathan P. How. "Modeling and Planning with Macro-Actions in Decentralized POMDPs." Journal of Artificial Intelligence Research 64 (March 25, 2019): 817–59. http://dx.doi.org/10.1613/jair.1.11418.

Abstract:
Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized multi-agent decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent's actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions that may require different amounts of time to execute. We model macro-actions as options in a Dec-POMDP, focusing on actions that depend only on information directly available to the agent during execution. Therefore, we model systems where coordination decisions only occur at the level of deciding which macro-actions to execute. The core technical difficulty in this setting is that the options chosen by each agent no longer terminate at the same time. We extend three leading Dec-POMDP algorithms for policy generation to the macro-action case, and demonstrate their effectiveness in both standard benchmarks and a multi-robot coordination problem. The results show that our new algorithms retain agent coordination while allowing high-quality solutions to be generated for significantly longer horizons and larger state-spaces than previous Dec-POMDP methods. Furthermore, in the multi-robot domain, we show that, in contrast to most existing methods that are specialized to a particular problem class, our approach can synthesize control policies that exploit opportunities for coordination while balancing uncertainty, sensor information, and information about other agents.
20

Bubnov, Yakov. "DNS Data Exfiltration Detection Using Online Planning for POMDP." European Journal of Engineering Research and Science 4, no. 9 (September 10, 2019): 22–25. http://dx.doi.org/10.24018/ejers.2019.4.9.1500.

Abstract:
This paper addresses the problem of blocking Domain Name System (DNS) exfiltration in a computer network. DNS exfiltration implies unauthorized transfer of sensitive data from the organization network to a remote adversary. Given a detector of data exfiltration in DNS lookup queries, this paper proposes an approach to automate query blocking decisions. More precisely, it defines an L-parametric Partially Observable Markov Decision Process (POMDP) formulation to enforce a query blocking strategy on each network egress point, where L is a hyper-parameter that defines the necessary level of network security. The efficiency of the approach is based on (i) the absence of interactions between distributed detectors, as blocking decisions are taken individually by each detector; and (ii) a blocking strategy applied to each particular query, thereby minimizing potentially incorrect blocking decisions.
21

Mern, John, Anil Yildiz, Lawrence Bush, Tapan Mukerji, and Mykel J. Kochenderfer. "Improved POMDP Tree Search Planning with Prioritized Action Branching." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 13 (May 18, 2021): 11888–94. http://dx.doi.org/10.1609/aaai.v35i13.17412.

Abstract:
Online solvers for partially observable Markov decision processes have difficulty scaling to problems with large action spaces. This paper proposes a method called PA-POMCPOW to sample a subset of the action space that provides varying mixtures of exploitation and exploration for inclusion in a search tree. The proposed method first evaluates the action space according to a score function that is a linear combination of expected reward and expected information gain. The actions with the highest score are then added to the search tree during tree expansion. Experiments show that PA-POMCPOW is able to outperform existing state-of-the-art solvers on problems with large discrete action spaces.
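A minimal sketch of the prioritized branching described above, assuming a linear score of estimated reward plus estimated information gain: only the top-scoring actions are added to the search tree. The weight and estimator names below are illustrative placeholders, not taken from the paper.

```python
def prioritized_actions(actions, reward_estimate, info_gain_estimate, k, weight=1.0):
    """Return the k highest-scoring actions to include during tree expansion.

    Score(a) = reward_estimate(a) + weight * info_gain_estimate(a).
    reward_estimate and info_gain_estimate are callables mapping an action to a float.
    """
    scored = sorted(
        actions,
        key=lambda a: reward_estimate(a) + weight * info_gain_estimate(a),
        reverse=True,
    )
    return scored[:k]
```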
22

Burks, Luke, Nisar Ahmed, Ian Loefgren, Luke Barbier, Jeremy Muesing, Jamison McGinley, and Sousheel Vunnam. "Collaborative human-autonomy semantic sensing through structured POMDP planning." Robotics and Autonomous Systems 140 (June 2021): 103753. http://dx.doi.org/10.1016/j.robot.2021.103753.

23

Bubnov, Yakov. "DNS Data Exfiltration Detection Using Online Planning for POMDP." European Journal of Engineering and Technology Research 4, no. 9 (September 10, 2019): 22–25. http://dx.doi.org/10.24018/ejeng.2019.4.9.1500.

Abstract:
This paper addresses the problem of blocking Domain Name System (DNS) exfiltration in a computer network. DNS exfiltration implies unauthorized transfer of sensitive data from the organization network to a remote adversary. Given a detector of data exfiltration in DNS lookup queries, this paper proposes an approach to automate query blocking decisions. More precisely, it defines an L-parametric Partially Observable Markov Decision Process (POMDP) formulation to enforce a query blocking strategy on each network egress point, where L is a hyper-parameter that defines the necessary level of network security. The efficiency of the approach is based on (i) the absence of interactions between distributed detectors, as blocking decisions are taken individually by each detector; and (ii) a blocking strategy applied to each particular query, thereby minimizing potentially incorrect blocking decisions.
24

Zhang, Zongzhang, Qiming Fu, Xiaofang Zhang, and Quan Liu. "Reasoning and predicting POMDP planning complexity via covering numbers." Frontiers of Computer Science 10, no. 4 (January 19, 2016): 726–40. http://dx.doi.org/10.1007/s11704-015-5038-5.

25

Wu, Chenyang, Rui Kong, Guoyu Yang, Xianghan Kong, Zongzhang Zhang, Yang Yu, Dong Li, and Wulong Liu. "LB-DESPOT: Efficient Online POMDP Planning Considering Lower Bound in Action Selection (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (May 18, 2021): 15927–28. http://dx.doi.org/10.1609/aaai.v35i18.17960.

Abstract:
Partially observable Markov decision process (POMDP) is an extension to MDP. It handles the state uncertainty by specifying the probability of getting a particular observation given the current state. DESPOT is one of the most popular scalable online planning algorithms for POMDPs, which manages to significantly reduce the size of the decision tree while deriving a near-optimal policy by considering only K scenarios. Nevertheless, there is a gap in action selection criteria between planning and execution in DESPOT. During the planning stage, it keeps choosing the action with the highest upper bound, whereas when the planning ends, the action with the highest lower bound is chosen for execution. Here, we propose LB-DESPOT to alleviate this issue, which utilizes the lower bound in selecting an action branch to expand. Empirically, our method has attained better performance than DESPOT and POMCP, another state-of-the-art solver, on several challenging POMDP benchmark tasks.
26

HU, Xiangtao. "POMDP-based Planning Model of Driving Force during Shield Tunneling." Journal of Mechanical Engineering 50, no. 21 (2014): 84. http://dx.doi.org/10.3901/jme.2014.21.084.

27

Lauri, Mikko, and Risto Ritala. "Planning for multiple measurement channels in a continuous-state POMDP." Annals of Mathematics and Artificial Intelligence 67, no. 3-4 (March 2013): 283–317. http://dx.doi.org/10.1007/s10472-013-9361-y.

28

Eck, Adam, Leen-Kiat Soh, Sam Devlin, and Daniel Kudenko. "Potential-based reward shaping for finite horizon online POMDP planning." Autonomous Agents and Multi-Agent Systems 30, no. 3 (March 5, 2015): 403–45. http://dx.doi.org/10.1007/s10458-015-9292-6.

29

Zhang, Zongzhang, David Hsu, Wee Sun Lee, Zhan Wei Lim, and Aijun Bai. "PLEASE: Palm Leaf Search for POMDPs with Large Observation Spaces." Proceedings of the International Symposium on Combinatorial Search 6, no. 1 (September 1, 2021): 238–39. http://dx.doi.org/10.1609/socs.v6i1.18339.

Abstract:
This paper provides a novel POMDP planning method, called Palm LEAf SEarch (PLEASE), which allows the selection of more than one outcome when their potential impacts are close to the highest one during its forward exploration. Compared with existing trial-based algorithms, PLEASE can save considerable time to propagate the bound improvements of beliefs in deep levels of the search tree to the root belief because of fewer backup operations. Experiments showed that PLEASE scales up SARSOP, one of the fastest algorithms, by orders of magnitude on some POMDP tasks with large observation spaces.
30

Tottori, Takehiro, and Tetsuya J. Kobayashi. "Forward and Backward Bellman Equations Improve the Efficiency of the EM Algorithm for DEC-POMDP." Entropy 23, no. 5 (April 29, 2021): 551. http://dx.doi.org/10.3390/e23050551.

Abstract:
The decentralized partially observable Markov decision process (DEC-POMDP) models sequential decision-making problems faced by a team of agents. Since planning in a DEC-POMDP can be interpreted as maximum likelihood estimation for a latent variable model, a DEC-POMDP can be solved by the EM algorithm. However, in EM for DEC-POMDP, the forward–backward algorithm needs to be calculated up to the infinite horizon, which impairs the computational efficiency. In this paper, we propose the Bellman EM algorithm (BEM) and the modified Bellman EM algorithm (MBEM) by introducing the forward and backward Bellman equations into EM. BEM can be more efficient than EM because BEM calculates the forward and backward Bellman equations instead of the forward–backward algorithm up to the infinite horizon. However, BEM cannot always be more efficient than EM when the size of problems is large because BEM calculates an inverse matrix. We circumvent this shortcoming in MBEM by calculating the forward and backward Bellman equations without the inverse matrix. Our numerical experiments demonstrate that the convergence of MBEM is faster than that of EM.
31

Nguyen, Hoa Van, Hamid Rezatofighi, Ba-Ngu Vo, and Damith C. Ranasinghe. "Multi-Objective Multi-Agent Planning for Jointly Discovering and Tracking Mobile Objects." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 7227–35. http://dx.doi.org/10.1609/aaai.v34i05.6213.

Abstract:
We consider the challenging problem of online planning for a team of agents to autonomously search and track a time-varying number of mobile objects under the practical constraint of detection-range-limited onboard sensors. A standard POMDP with a value function that either encourages discovery or accurate tracking of mobile objects is inadequate to simultaneously meet the conflicting goals of searching for undiscovered mobile objects whilst keeping track of discovered objects. The planning problem is further complicated by misdetections or false detections of objects caused by range-limited sensors and noise inherent to sensor measurements. We formulate a novel multi-objective POMDP based on information-theoretic criteria, and an online multi-object tracking filter for the problem. Since controlling multiple agents is a well-known combinatorial optimization problem, assigning control actions to agents necessitates a greedy algorithm. We prove that our proposed multi-objective value function is a monotone submodular set function; consequently, the greedy algorithm can achieve a (1-1/e) approximation for maximizing the submodular multi-objective function.
32

Pineau, J., G. Gordon, and S. Thrun. "Anytime Point-Based Approximations for Large POMDPs." Journal of Artificial Intelligence Research 27 (November 26, 2006): 335–80. http://dx.doi.org/10.1613/jair.2078.

Abstract:
The Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points which work well in practice. The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim of this paper is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks.
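The point-based backup that PBVI performs at each selected belief point b can be written in the usual α-vector notation (the standard formulation; presentation details in the paper may differ):

```latex
\alpha_{a,o}^{i}(s) = \gamma \sum_{s'} T(s' \mid s, a)\, O(o \mid s', a)\, \alpha^{i}(s'), \quad
\alpha_{a}^{b} = r_a + \sum_{o} \operatorname*{arg\,max}_{\alpha_{a,o}^{i}} \bigl(\alpha_{a,o}^{i} \cdot b\bigr), \quad
\mathrm{backup}(b) = \operatorname*{arg\,max}_{a \in A} \bigl(\alpha_{a}^{b} \cdot b\bigr)
```

where the α^i are the current value-function vectors, r_a the immediate reward vector for action a, and γ the discount factor.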
33

Yang, Qiming, Jiandong Zhang, and Guoqing Shi. "Modeling of UAV path planning based on IMM under POMDP framework." Journal of Systems Engineering and Electronics 30, no. 03 (June 25, 2019): 545–54. http://dx.doi.org/10.21629/jsee.2019.03.12.

34

Zheng, Wei, and Hai Lin. "Vector Autoregressive POMDP Model Learning and Planning for Human–Robot Collaboration." IEEE Control Systems Letters 3, no. 3 (July 2019): 775–80. http://dx.doi.org/10.1109/lcsys.2019.2917805.

35

Bai, Haoyu, David Hsu, and Wee Sun Lee. "Integrated perception and planning in the continuous space: A POMDP approach." International Journal of Robotics Research 33, no. 9 (June 12, 2014): 1288–302. http://dx.doi.org/10.1177/0278364914528255.

36

Burks, Luke, Ian Loefgren, and Nisar R. Ahmed. "Optimal Continuous State POMDP Planning With Semantic Observations: A Variational Approach." IEEE Transactions on Robotics 35, no. 6 (December 2019): 1488–507. http://dx.doi.org/10.1109/tro.2019.2933720.

37

Bouton, Maxime, Jana Tumova, and Mykel J. Kochenderfer. "Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10061–68. http://dx.doi.org/10.1609/aaai.v34i06.6563.

Abstract:
Autonomous systems are often required to operate in partially observable environments. They must reliably execute a specified objective even with incomplete information about the state of the environment. We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP). By formulating a planning problem, we show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula and compute the associated belief state policy. We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.
38

Kumar, Akshat, Hala Mostafa, and Shlomo Zilberstein. "Dual Formulations for Optimizing Dec-POMDP Controllers." Proceedings of the International Conference on Automated Planning and Scheduling 26 (March 30, 2016): 202–10. http://dx.doi.org/10.1609/icaps.v26i1.13759.

Abstract:
Decentralized POMDP is an expressive model for multi-agent planning. Finite-state controllers (FSCs)---often used to represent policies for infinite-horizon problems---offer a compact, simple-to-execute policy representation. We exploit novel connections between optimizing decentralized FSCs and the dual linear program for MDPs. Consequently, we describe a dual mixed integer linear program (MIP) for optimizing deterministic FSCs. We exploit the Dec-POMDP structure to devise a compact MIP and formulate constraints that result in policies executable in partially-observable decentralized settings. We show analytically that the dual formulation can also be exploited within the expectation maximization (EM) framework to optimize stochastic FSCs. The resulting EM algorithm can be implemented by solving a sequence of linear programs, without requiring expensive message-passing over the Dec-POMDP DBN. We also present an efficient technique for policy improvement based on a weighted entropy measure. Compared with state-of-the-art FSC methods, our approach offers over an order-of-magnitude speedup, while producing similar or better solutions.
39

Galvez-Serna, Julian, Fernando Vanegas, Shahzad Brar, Juan Sandino, David Flannery, and Felipe Gonzalez. "UAV4PE: An Open-Source Framework to Plan UAV Autonomous Missions for Planetary Exploration." Drones 6, no. 12 (December 2, 2022): 391. http://dx.doi.org/10.3390/drones6120391.

Abstract:
Autonomous Unmanned Aerial Vehicles (UAV) for planetary exploration missions require increased onboard mission-planning and decision-making capabilities to reach their full operational potential in remote environments (e.g., Antarctica, Mars or Titan). However, the uncertainty introduced by the environment and the limitation of available sensors have presented challenges for planning such missions. Partially Observable Markov Decision Processes (POMDPs) are commonly used to enable decision-making and mission-planning processes that account for environmental, perceptual (extrinsic) and actuation (intrinsic) uncertainty. Here, we propose the UAV4PE framework, a testing framework for autonomous UAV missions using POMDP formulations. This framework integrates modular components for simulation, emulation, UAV guidance, navigation and mission planning. State-of-the-art tools such as python, C++, ROS, PX4 and JuliaPOMDP are employed by the framework, and we used python data-science libraries for the analysis of the experimental results. The source code and the experiment data are included in the UAV4PE framework. The POMDP formulation proposed here was able to plan and command a UAV-based planetary exploration mission in simulation, emulation and real-world experiments. The experiments evaluated key indicators such as the mission success rate, the surface area explored and the number of commands (actions) executed. We also discuss future work aimed at improving the UAV4PE framework, and the autonomous UAV mission planning formulation for planetary exploration.
40

Walker, Ory, Fernando Vanegas, and Felipe Gonzalez. "A Framework for Multi-Agent UAV Exploration and Target-Finding in GPS-Denied and Partially Observable Environments." Sensors 20, no. 17 (August 21, 2020): 4739. http://dx.doi.org/10.3390/s20174739.

Abstract:
The problem of multi-agent remote sensing for the purposes of finding survivors or surveying points of interest in GPS-denied and partially observable environments remains a challenge. This paper presents a framework for multi-agent target-finding using a combination of online POMDP based planning and Deep Reinforcement Learning based control. The framework is implemented considering planning and control as two separate problems. The planning problem is defined as a decentralised multi-agent graph search problem and is solved using a modern online POMDP solver. The control problem is defined as a local continuous-environment exploration problem and is solved using modern Deep Reinforcement Learning techniques. The proposed framework combines the solution to both of these problems and testing shows that it enables multiple agents to find a target within large, simulated test environments in the presence of unknown obstacles and obstructions. The proposed approach could also be extended or adapted to a number of time sensitive remote-sensing problems, from searching for multiple survivors during a disaster to surveying points of interest in a hazardous environment by adjusting the individual model definitions.
41

Torrado, Ruben, Jesus Rios, and Gerald Tesauro. "Optimal Sequential Drilling for Hydrocarbon Field Development Planning." Proceedings of the AAAI Conference on Artificial Intelligence 31, no. 2 (February 11, 2017): 4734–39. http://dx.doi.org/10.1609/aaai.v31i2.19103.

Abstract:
We present a novel approach for planning the development of hydrocarbon fields, taking into account the sequential nature of well drilling decisions and the possibility to react to future information. In a dynamic fashion, we want to optimally decide where to drill each well conditional on every possible piece of information that could be obtained from previous wells. We formulate this sequential drilling optimization problem as a POMDP, and propose an algorithm to search for an optimal drilling policy. We show that our new approach leads to better results compared to the current standard in the oil and gas (O&G) industry.
42

Sandino, Juan, Frederic Maire, Peter Caccetta, Conrad Sanderson, and Felipe Gonzalez. "Drone-Based Autonomous Motion Planning System for Outdoor Environments under Object Detection Uncertainty." Remote Sensing 13, no. 21 (November 8, 2021): 4481. http://dx.doi.org/10.3390/rs13214481.

Abstract:
Recent advances in autonomy of unmanned aerial vehicles (UAVs) have increased their use in remote sensing applications, such as precision agriculture, biosecurity, disaster monitoring, and surveillance. However, onboard UAV cognition capabilities for understanding and interacting in environments with imprecise or partial observations, for objects of interest within complex scenes, are limited, and have not yet been fully investigated. This limitation of onboard decision-making under uncertainty has delegated the motion planning strategy in complex environments to human pilots, which rely on communication subsystems and real-time telemetry from ground control stations. This paper presents a UAV-based autonomous motion planning and object finding system under uncertainty and partial observability in outdoor environments. The proposed system architecture follows a modular design, which allocates most of the computationally intensive tasks to a companion computer onboard the UAV to achieve high-fidelity results in simulated environments. We demonstrate the system with a search and rescue (SAR) case study, where a lost person (victim) in bushland needs to be found using a sub-2 kg quadrotor UAV. The navigation problem is mathematically formulated as a partially observable Markov decision process (POMDP). A motion strategy (or policy) is obtained once a POMDP is solved mid-flight and in real time using augmented belief trees (ABT) and the TAPIR toolkit. The system’s performance was assessed using three flight modes: (1) mission mode, which follows a survey plan and used here as the baseline motion planner; (2) offboard mode, which runs the POMDP-based planner across the flying area; and (3) hybrid mode, which combines mission and offboard modes for improved coverage in outdoor scenarios. Results suggest the increased cognitive power added by the proposed motion planner and flight modes allow UAVs to collect more accurate victim coordinates compared to the baseline planner. Adding the proposed system to UAVs results in improved robustness against potential false positive readings of detected objects caused by data noise, inaccurate detections, and elevated complexity to navigate in time-critical applications, such as SAR.
43

Erez, Tom, Julian Tramper, William Smart, and Stan Gielen. "A POMDP Model of Eye-Hand Coordination." Proceedings of the AAAI Conference on Artificial Intelligence 25, no. 1 (August 4, 2011): 952–57. http://dx.doi.org/10.1609/aaai.v25i1.8007.

Abstract:
This paper presents a generative model of eye-hand coordination. We use numerical optimization to solve for the joint behavior of an eye and two hands, deriving a predicted motion pattern from first principles, without imposing heuristics. We model the planar scene as a POMDP with 17 continuous state dimensions. Belief-space optimization is facilitated by using a nominal-belief heuristic, whereby we assume (during planning) that the maximum likelihood observation is always obtained. Since a globally-optimal solution for such a high-dimensional domain is computationally intractable, we employ local optimization in the belief domain. By solving for a locally-optimal plan through belief space, we generate a motion pattern of mutual coordination between hands and eye: the eye's saccades disambiguate the scene in a task-relevant manner, and the hands' motions anticipate the eye's saccades. Finally, the model is validated through a behavioral experiment, in which human subjects perform the same eye-hand coordination task. We show how simulation is congruent with the experimental results.
44

Ko, Li Ling, David Hsu, Wee Sun Lee, and Sylvie Ong. "Structured Parameter Elicitation." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 4, 2010): 1102–7. http://dx.doi.org/10.1609/aaai.v24i1.7744.

Abstract:
The behavior of a complex system often depends on parameters whose values are unknown in advance. To operate effectively, an autonomous agent must actively gather information on the parameter values while progressing towards its goal. We call this problem parameter elicitation. Partially observable Markov decision processes (POMDPs) provide a principled framework for such uncertainty planning tasks, but they suffer from high computational complexity. However, POMDPs for parameter elicitation often possess special structural properties, specifically, factorization and symmetry. This work identifies these properties and exploits them for efficient solution through a factored belief representation. The experimental results show that our new POMDP solvers outperform SARSOP and MOMDP, two of the fastest general-purpose POMDP solvers available, and can handle significantly larger problems.
45

Lee, Jongmin, Jungpyo Hong, Jaeyoung Park, Kanghoon Lee, Kee-Eung Kim, Il-Chul Moon, and Jae-Hyun Park. "Case Studies on Planning and Learning for Large-Scale CGFs with POMDPs through Counterfire and Mechanized Infantry Scenarios." KIISE Transactions on Computing Practices 23, no. 6 (June 30, 2017): 343–49. http://dx.doi.org/10.5626/ktcp.2017.23.6.343.

46

Hoffmann, Joerg, and Alan Fern. "Journal Track Paper Abstracts." Proceedings of the International Conference on Automated Planning and Scheduling 25 (April 8, 2015): 357–58. http://dx.doi.org/10.1609/icaps.v25i1.13688.

Abstract:
This edited compilation contains abstracts of the journal paper summaries presented at the 25th ICAPS conference Journal Paper Track. Papers include Simple Regret Optimization in Online Planning for Markov Decision Processes, by Zohar Feldman and Carmel Domshlak; Energy Efficient Execution of POMDP Policies by Marek Grzes, Pascal Poupart, Xiao Yang, and Jesse Hoey; Envisioning the Qualitative Effects of Robot Manipulation Actions Using Simulation-Based Projections by Lars Kunze and Michael Beetz; and Distributed Heuristic Forward Search for Multi-Agent Planning by Raz Nissim and Ronen Brafman.
47

Shmaryahu, Dorin, Guy Shani, Joerg Hoffmann, and Marcel Steinmetz. "Simulated Penetration Testing as Contingent Planning." Proceedings of the International Conference on Automated Planning and Scheduling 28 (June 15, 2018): 241–49. http://dx.doi.org/10.1609/icaps.v28i1.13902.

Abstract:
In penetration testing (pentesting), network administrators attack their own network to identify and fix vulnerabilities. Planning-based simulated pentesting can achieve much higher testing coverage than manual pentesting. A key challenge is for the attack planning to imitate human hackers as faithfully as possible. POMDP models have been proposed to this end, yet they are computationally very hard, and it is unclear how to acquire the models in practice. At the other extreme, classical planning models are scalable and simple to obtain, yet completely ignore the incomplete knowledge characteristic of hacking. We propose contingent planning as a new middle ground, feasible in both computation burden and model acquisition effort while allowing for a representation of incomplete knowledge. We design the model, show how to adapt available solvers, and show how to acquire the model from real network scans in practice. We experiment on real networks and show that our approach scales to practical input sizes.
48

Roy, N., G. Gordon, and S. Thrun. "Finding Approximate POMDP solutions Through Belief Compression." Journal of Artificial Intelligence Research 23 (January 1, 2005): 1–40. http://dx.doi.org/10.1613/jair.1496.

Abstract:
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional subspace embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this subspace can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta & Schapire, 2002) to represent sparse, high-dimensional belief spaces using small sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
49

Capitan, Jesus, Matthijs Spaan, Luis Merino, and Anibal Ollero. "Decentralized Multi-Robot Cooperation with Auctioned POMDPs." Proceedings of the International Conference on Automated Planning and Scheduling 24 (May 11, 2014): 515–18. http://dx.doi.org/10.1609/icaps.v24i1.13658.

Abstract:
Planning under uncertainty faces a scalability problem when considering multi-robot teams, as the information space scales exponentially with the number of robots. To address this issue, this paper proposes to decentralize multi-robot Partially Observable Markov Decision Processes (POMDPs) while maintaining cooperation between robots by using POMDP policy auctions. Auctions provide a flexible way of coordinating individual policies modeled by POMDPs and have low communication requirements. Additionally, communication models in the multi-agent POMDP literature severely mismatch with real inter-robot communication. We address this issue by exploiting a decentralized data fusion method in order to efficiently maintain a joint belief state among the robots. The paper presents results in two different applications: environmental monitoring with Unmanned Aerial Vehicles (UAVs); and cooperative tracking, in which several robots have to jointly track a moving target of interest.
50

Walraven, Erwin, and Matthijs T. J. Spaan. "Column Generation Algorithms for Constrained POMDPs." Journal of Artificial Intelligence Research 62 (July 17, 2018): 489–533. http://dx.doi.org/10.1613/jair.1.11216.

Abstract:
In several real-world domains it is required to plan ahead while there are finite resources available for executing the plan. The limited availability of resources imposes constraints on the plans that can be executed, which need to be taken into account while computing a plan. A Constrained Partially Observable Markov Decision Process (Constrained POMDP) can be used to model resource-constrained planning problems which include uncertainty and partial observability. Constrained POMDPs provide a framework for computing policies which maximize expected reward, while respecting constraints on a secondary objective such as cost or resource consumption. Column generation for linear programming can be used to obtain Constrained POMDP solutions. This method incrementally adds columns to a linear program, in which each column corresponds to a POMDP policy obtained by solving an unconstrained subproblem. Column generation requires solving a potentially large number of POMDPs, as well as exact evaluation of the resulting policies, which is computationally difficult. We propose a method to solve subproblems in a two-stage fashion using approximation algorithms. First, we use a tailored point-based POMDP algorithm to obtain an approximate subproblem solution. Next, we convert this approximate solution into a policy graph, which we can evaluate efficiently. The resulting algorithm is a new approximate method for Constrained POMDPs in single-agent settings, but also in settings in which multiple independent agents share a global constraint. Experiments based on several domains show that our method outperforms the current state of the art.
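The Constrained POMDP problem solved by column generation here can be summarized in the standard constrained form (a generic statement, not quoted from the paper):

```latex
\max_{\pi} \; \mathbb{E}\Bigl[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\Bigr]
\quad \text{subject to} \quad
\mathbb{E}\Bigl[\sum_{t=0}^{\infty} \gamma^{t} C(s_t, a_t)\Bigr] \le c
```

where R is the reward function, C the cost associated with the secondary objective (e.g., resource consumption), and c the available budget.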