Journal articles on the topic 'POMDPs'



Consult the top 50 journal articles for your research on the topic 'POMDPs.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Zhang, N. L., and W. Liu. "A Model Approximation Scheme for Planning in Partially Observable Stochastic Domains." Journal of Artificial Intelligence Research 7 (November 1, 1997): 199–230. http://dx.doi.org/10.1613/jair.419.

Abstract:
Partially observable Markov decision processes (POMDPs) are a natural model for planning problems where effects of actions are nondeterministic and the state of the world is not completely observable. It is difficult to solve POMDPs exactly. This paper proposes a new approximation scheme. The basic idea is to transform a POMDP into another one where additional information is provided by an oracle. The oracle informs the planning agent that the current state of the world is in a certain region. The transformed POMDP is consequently said to be region observable. It is easier to solve than the original POMDP. We propose to solve the transformed POMDP and use its optimal policy to construct an approximate policy for the original POMDP. By controlling the amount of additional information that the oracle provides, it is possible to find a proper tradeoff between computational time and approximation quality. In terms of algorithmic contributions, we study in detail how to exploit region observability in solving the transformed POMDP. To facilitate the study, we also propose a new exact algorithm for general POMDPs. The algorithm is conceptually simple and yet is significantly more efficient than all previous exact algorithms.
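As background for the abstracts in this list, the belief update that all of these POMDP planners build on fits in a few lines. The sketch below is a generic discrete-state Bayes filter, not the region-observable transformation from this paper; the transition model, observation model, and numbers are illustrative assumptions.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Bayes filter for a discrete POMDP.

    belief: array of shape (S,), current distribution over states.
    T: array of shape (A, S, S), T[a, s, s'] = P(s' | s, a).
    O: array of shape (A, S, O), O[a, s', o] = P(o | s', a).
    Returns the normalized posterior belief over states.
    """
    predicted = belief @ T[action]                 # P(s' | b, a)
    unnormalized = predicted * O[action][:, observation]
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("Observation has zero probability under this belief.")
    return unnormalized / total

# Tiny two-state example (all numbers are made up for illustration).
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])          # one action
O = np.array([[[0.7, 0.3], [0.4, 0.6]]])          # two observations
b = np.array([0.5, 0.5])
print(belief_update(b, action=0, observation=1, T=T, O=O))
```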
2

Aras, R., and A. Dutech. "An Investigation into Mathematical Programming for Finite Horizon Decentralized POMDPs." Journal of Artificial Intelligence Research 37 (March 26, 2010): 329–96. http://dx.doi.org/10.1613/jair.2915.

Abstract:
Decentralized planning in uncertain environments is a complex task generally dealt with by using a decision-theoretic approach, mainly through the framework of Decentralized Partially Observable Markov Decision Processes (DEC-POMDPs). Although DEC-POMDPs are a general and powerful modeling tool, solving them is a task with an overwhelming complexity that can be doubly exponential. In this paper, we study an alternate formulation of DEC-POMDPs relying on a sequence-form representation of policies. From this formulation, we show how to derive Mixed Integer Linear Programming (MILP) problems that, once solved, give exact optimal solutions to the DEC-POMDPs. We show that these MILPs can be derived either by using some combinatorial characteristics of the optimal solutions of the DEC-POMDPs or by using concepts borrowed from game theory. Through an experimental validation on classical test problems from the DEC-POMDP literature, we compare our approach to existing algorithms. Results show that mathematical programming outperforms dynamic programming but is less efficient than forward search, except for some particular problems. The main contributions of this work are the use of mathematical programming for DEC-POMDPs and a better understanding of DEC-POMDPs and of their solutions. Besides, we argue that our alternate representation of DEC-POMDPs could be helpful for designing novel algorithms looking for approximate solutions to DEC-POMDPs.
3

Tennenholtz, Guy, Uri Shalit, and Shie Mannor. "Off-Policy Evaluation in Partially Observable Environments." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10276–83. http://dx.doi.org/10.1609/aaai.v34i06.6590.

Abstract:
This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments. Off-policy evaluation under partial observability is inherently prone to bias, with risk of arbitrarily large errors. We define the problem of off-policy evaluation for Partially Observable Markov Decision Processes (POMDPs) and establish what we believe is the first off-policy evaluation result for POMDPs. In addition, we formulate a model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP. We show how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to general POMDPs. We demonstrate the pitfalls of off-policy evaluation in POMDPs using a well-known off-policy method, Importance Sampling, and compare it with our result on synthetic medical data.
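For contrast with the decoupled-POMDP estimator proposed in the paper, here is a minimal sketch of the per-trajectory importance sampling baseline it is compared against; the policy functions, data layout, and numbers are hypothetical.

```python
import numpy as np

def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=0.99):
    """Per-trajectory importance sampling estimate of the target policy's return.

    trajectories: list of lists of (observation, action, reward) tuples
                  collected under the behavior policy.
    pi_e, pi_b:   functions (observation, action) -> probability under the
                  evaluation and behavior policies respectively.
    Returns the IS estimate of the expected discounted return of pi_e.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (obs, act, rew) in enumerate(traj):
            weight *= pi_e(obs, act) / pi_b(obs, act)
            ret += (gamma ** t) * rew
        estimates.append(weight * ret)
    return float(np.mean(estimates))

# Toy demo with a single observation and two actions (numbers are illustrative).
behavior = lambda o, a: 0.5
target = lambda o, a: 0.8 if a == 1 else 0.2
data = [[(0, 1, 1.0), (0, 0, 0.0)], [(0, 1, 1.0), (0, 1, 1.0)]]
print(importance_sampling_ope(data, target, behavior))
```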
4

Walraven, Erwin, and Matthijs T. J. Spaan. "Column Generation Algorithms for Constrained POMDPs." Journal of Artificial Intelligence Research 62 (July 17, 2018): 489–533. http://dx.doi.org/10.1613/jair.1.11216.

Abstract:
In several real-world domains it is required to plan ahead while there are finite resources available for executing the plan. The limited availability of resources imposes constraints on the plans that can be executed, which need to be taken into account while computing a plan. A Constrained Partially Observable Markov Decision Process (Constrained POMDP) can be used to model resource-constrained planning problems which include uncertainty and partial observability. Constrained POMDPs provide a framework for computing policies which maximize expected reward, while respecting constraints on a secondary objective such as cost or resource consumption. Column generation for linear programming can be used to obtain Constrained POMDP solutions. This method incrementally adds columns to a linear program, in which each column corresponds to a POMDP policy obtained by solving an unconstrained subproblem. Column generation requires solving a potentially large number of POMDPs, as well as exact evaluation of the resulting policies, which is computationally difficult. We propose a method to solve subproblems in a two-stage fashion using approximation algorithms. First, we use a tailored point-based POMDP algorithm to obtain an approximate subproblem solution. Next, we convert this approximate solution into a policy graph, which we can evaluate efficiently. The resulting algorithm is a new approximate method for Constrained POMDPs in single-agent settings, but also in settings in which multiple independent agents share a global constraint. Experiments based on several domains show that our method outperforms the current state of the art.
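A rough sketch of the master linear program used in column generation may help: given a set of already-generated policy columns with known expected values and costs, it picks a distribution over them. The column (subproblem) generation itself, which is the paper's focus, is omitted; the names and numbers are illustrative, and SciPy is assumed to be available.

```python
import numpy as np
from scipy.optimize import linprog

def solve_master_lp(values, costs, budget):
    """Master LP of column generation for a Constrained POMDP.

    values: (K,) expected reward of each candidate policy (column).
    costs:  (K,) expected resource consumption of each column.
    budget: scalar limit on expected consumption.
    Returns a probability distribution over the K columns that maximizes
    expected reward subject to the cost constraint.
    """
    K = len(values)
    res = linprog(
        c=-np.asarray(values),                    # linprog minimizes, so negate
        A_ub=np.asarray(costs).reshape(1, K),     # expected cost <= budget
        b_ub=[budget],
        A_eq=np.ones((1, K)),                     # probabilities sum to 1
        b_eq=[1.0],
        bounds=[(0, None)] * K,
        method="highs",
    )
    return res.x

# Three hypothetical columns: expected values and costs are made up.
print(solve_master_lp(values=[10.0, 6.0, 2.0], costs=[5.0, 2.0, 1.0], budget=3.0))
```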
5

Doshi, P., and P. J. Gmytrasiewicz. "Monte Carlo Sampling Methods for Approximating Interactive POMDPs." Journal of Artificial Intelligence Research 34 (March 24, 2009): 297–337. http://dx.doi.org/10.1613/jair.2630.

Abstract:
Partially observable Markov decision processes (POMDPs) provide a principled framework for sequential planning in uncertain single agent settings. An extension of POMDPs to multiagent settings, called interactive POMDPs (I-POMDPs), replaces POMDP belief spaces with interactive hierarchical belief systems which represent an agent’s belief about the physical world, about beliefs of other agents, and about their beliefs about others’ beliefs. This modification makes the difficulties of obtaining solutions due to complexity of the belief and policy spaces even more acute. We describe a general method for obtaining approximate solutions of I-POMDPs based on particle filtering (PF). We introduce the interactive PF, which descends the levels of the interactive belief hierarchies and samples and propagates beliefs at each level. The interactive PF is able to mitigate the belief space complexity, but it does not address the policy space complexity. To mitigate the policy space complexity – sometimes also called the curse of history – we utilize a complementary method based on sampling likely observations while building the look ahead reachability tree. While this approach does not completely address the curse of history, it beats back the curse’s impact substantially. We provide experimental results and chart future work.
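A minimal sketch of the flat (single-level) bootstrap particle filter that the interactive particle filter generalizes; the simulator and sensor-likelihood callables, and the toy dynamics in the demo, are assumptions for illustration.

```python
import random

def particle_filter_update(particles, action, observation,
                           transition_sample, obs_likelihood):
    """One step of a bootstrap particle filter over POMDP states.

    particles:         list of sampled states representing the current belief.
    transition_sample: function (state, action) -> sampled next state.
    obs_likelihood:    function (next_state, action, observation) -> P(o | s', a).
    Returns a new list of particles of the same size, resampled by weight.
    """
    propagated = [transition_sample(s, action) for s in particles]
    weights = [obs_likelihood(s, action, observation) for s in propagated]
    if sum(weights) == 0:
        raise ValueError("All particles have zero likelihood; belief is inconsistent.")
    return random.choices(propagated, weights=weights, k=len(particles))

# Toy two-state chain (illustrative dynamics and sensor model).
trans = lambda s, a: s if random.random() < 0.8 else 1 - s
obs_lik = lambda s, a, o: 0.9 if o == s else 0.1
belief = [0, 1] * 50                # 100 particles, uniform over the two states
belief = particle_filter_update(belief, action=0, observation=1,
                                transition_sample=trans, obs_likelihood=obs_lik)
print(sum(belief) / len(belief))    # approximate probability of state 1
```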
6

Walraven, Erwin, and Matthijs T. J. Spaan. "Point-Based Value Iteration for Finite-Horizon POMDPs." Journal of Artificial Intelligence Research 65 (July 11, 2019): 307–41. http://dx.doi.org/10.1613/jair.1.11324.

Abstract:
Partially Observable Markov Decision Processes (POMDPs) are a popular formalism for sequential decision making in partially observable environments. Since solving POMDPs to optimality is a difficult task, point-based value iteration methods are widely used. These methods compute an approximate POMDP solution, and in some cases they even provide guarantees on the solution quality, but these algorithms have been designed for problems with an infinite planning horizon. In this paper we discuss why state-of-the-art point-based algorithms cannot be easily applied to finite-horizon problems that do not include discounting. Subsequently, we present a general point-based value iteration algorithm for finite-horizon problems which provides solutions with guarantees on solution quality. Furthermore, we introduce two heuristics to reduce the number of belief points considered during execution, which lowers the computational requirements. In experiments we demonstrate that the algorithm is an effective method for solving finite-horizon POMDPs.
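The point-based backup that methods in this family repeat at sampled beliefs can be sketched as follows, here in its standard discounted form rather than the undiscounted finite-horizon variant developed in the paper; the array shapes are assumptions.

```python
import numpy as np

def point_based_backup(belief, Gamma, R, T, O, gamma=0.95):
    """One point-based value backup at a single belief.

    belief: (S,) distribution over states.
    Gamma:  (N, S) array of alpha-vectors representing the current value function.
    R: (A, S) rewards, T: (A, S, S) transitions, O: (A, S, Z) observation model.
    Returns the best new alpha-vector for this belief and its value there.
    """
    A, S, Z = O.shape
    best_alpha, best_value = None, -np.inf
    for a in range(A):
        alpha_a = R[a].astype(float).copy()
        for o in range(Z):
            # g[n, s] = sum_{s'} T[a, s, s'] * O[a, s', o] * Gamma[n, s']
            g = Gamma @ (T[a] * O[a][:, o]).T
            # add the back-projected vector that is best for this belief
            alpha_a += gamma * g[np.argmax(g @ belief)]
        value = float(alpha_a @ belief)
        if value > best_value:
            best_alpha, best_value = alpha_a, value
    return best_alpha, best_value
```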
7

Ross, S., J. Pineau, S. Paquet, and B. Chaib-draa. "Online Planning Algorithms for POMDPs." Journal of Artificial Intelligence Research 32 (July 29, 2008): 663–704. http://dx.doi.org/10.1613/jair.2567.

Abstract:
Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems due to their complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that state-of-the-art online heuristic search methods can handle large POMDP domains efficiently.
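A bare-bones version of the lookahead idea surveyed here is a depth-limited expectimax over beliefs; the online planners in the survey add bounds, heuristics, and sampling on top of this. The model arrays are assumed to be given as numpy tensors.

```python
import numpy as np

def lookahead(belief, depth, R, T, O, gamma=0.95):
    """Depth-limited expectimax search over POMDP beliefs.

    Returns (best value, best action) for the given belief.
    R: (A, S) rewards, T: (A, S, S) transitions, O: (A, S, Z) observation model.
    """
    A, S, Z = O.shape
    if depth == 0:
        return 0.0, None
    best_value, best_action = -np.inf, None
    for a in range(A):
        value = float(R[a] @ belief)
        predicted = belief @ T[a]                      # P(s' | b, a)
        for o in range(Z):
            p_obs = float(predicted @ O[a][:, o])      # P(o | b, a)
            if p_obs == 0.0:
                continue
            next_belief = predicted * O[a][:, o] / p_obs
            sub_value, _ = lookahead(next_belief, depth - 1, R, T, O, gamma)
            value += gamma * p_obs * sub_value
        if value > best_value:
            best_value, best_action = value, a
    return best_value, best_action
```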
8

Ni, Yaodong, and Zhi-Qiang Liu. "Bounded-Parameter Partially Observable Markov Decision Processes: Framework and Algorithm." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 21, no. 06 (December 2013): 821–63. http://dx.doi.org/10.1142/s0218488513500396.

Abstract:
Partially observable Markov decision processes (POMDPs) are powerful for planning under uncertainty. However, it is usually impractical to employ a POMDP with exact parameters to model the real-life situation precisely, due to various reasons such as limited data for learning the model, inability of exact POMDPs to model dynamic situations, etc. In this paper, assuming that the parameters of POMDPs are imprecise but bounded, we formulate the framework of bounded-parameter partially observable Markov decision processes (BPOMDPs). A modified value iteration is proposed as a basic strategy for tackling parameter imprecision in BPOMDPs. In addition, we design the UL-based value iteration algorithm, in which each value backup is based on two sets of vectors called U-set and L-set. We propose four strategies for computing U-set and L-set. We analyze theoretically the computational complexity and the reward loss of the algorithm. The effectiveness and robustness of the algorithm are shown empirically.
9

Oliehoek, F. A., M. T. J. Spaan, and N. Vlassis. "Optimal and Approximate Q-value Functions for Decentralized POMDPs." Journal of Artificial Intelligence Research 32 (May 28, 2008): 289–353. http://dx.doi.org/10.1613/jair.2447.

Abstract:
Decision-theoretic planning is a popular approach to sequential decision making problems, because it treats uncertainty in sensing and acting in a principled way. In single-agent frameworks like MDPs and POMDPs, planning can be carried out by resorting to Q-value functions: an optimal Q-value function Q* is computed in a recursive manner by dynamic programming, and then an optimal policy is extracted from Q*. In this paper we study whether similar Q-value functions can be defined for decentralized POMDP models (Dec-POMDPs), and how policies can be extracted from such value functions. We define two forms of the optimal Q-value function for Dec-POMDPs: one that gives a normative description as the Q-value function of an optimal pure joint policy and another one that is sequentially rational and thus gives a recipe for computation. This computation, however, is infeasible for all but the smallest problems. Therefore, we analyze various approximate Q-value functions that allow for efficient computation. We describe how they relate, and we prove that they all provide an upper bound to the optimal Q-value function Q*. Finally, unifying some previous approaches for solving Dec-POMDPs, we describe a family of algorithms for extracting policies from such Q-value functions, and perform an experimental evaluation on existing test problems, including a new firefighting benchmark problem.
10

Spaan, M. T. J., and N. Vlassis. "Perseus: Randomized Point-based Value Iteration for POMDPs." Journal of Artificial Intelligence Research 24 (August 1, 2005): 195–220. http://dx.doi.org/10.1613/jair.1659.

Abstract:
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems.
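One Perseus-style backup stage can be outlined as below, assuming a point-based `backup` routine (for instance along the lines of the sketch shown after entry 6) is supplied as a callable; this is an illustrative outline under those assumptions, not the authors' implementation.

```python
import random
import numpy as np

def perseus_stage(beliefs, Gamma, backup):
    """One Perseus value-improvement stage.

    beliefs: list of (S,) numpy arrays sampled in advance from the belief space.
    Gamma:   (N, S) array of alpha-vectors for the current value function.
    backup:  function (belief, Gamma) -> (new alpha-vector, its value at belief).
    Returns the alpha-vectors of the improved value function.
    """
    not_improved = list(beliefs)
    new_Gamma = []
    while not_improved:
        b = random.choice(not_improved)              # back up a randomly chosen belief
        alpha, value = backup(b, Gamma)
        if value < float(np.max(Gamma @ b)):
            # keep the old best vector if the backup did not improve this point
            alpha = Gamma[int(np.argmax(Gamma @ b))]
        new_Gamma.append(alpha)
        V_new = np.vstack(new_Gamma)
        # drop every belief whose value is already improved by the new vector set
        not_improved = [bb for bb in not_improved
                        if float(np.max(V_new @ bb)) < float(np.max(Gamma @ bb))]
    return np.vstack(new_Gamma)
```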
11

Victorio-Meza, Hermilo, Manuel Mejía-Lavalle, Alicia Martínez Rebollar, Andrés Blanco Ortega, Obdulia Pichardo Lagunas, and Grigori Sidorov. "Searching for Cerebrovascular Disease Optimal Treatment Recommendations Applying Partially Observable Markov Decision Processes." International Journal of Pattern Recognition and Artificial Intelligence 32, no. 01 (October 9, 2017): 1860015. http://dx.doi.org/10.1142/s0218001418600157.

Abstract:
Partially observable Markov decision processes (POMDPs) are mathematical models for the planning of action sequences under conditions of uncertainty. Uncertainty in POMDPs is manifested in two ways: uncertainty in the perception of model states and uncertainty in the effects of actions on states. The diagnosis and treatment of cerebral vascular diseases (CVD) present this double condition of uncertainty, so we think that POMDP is the most suitable method to model them. In this paper, we propose a model of CVD that is based on observations obtained from neuroimaging studies such as computed tomography, magnetic resonance and ultrasound. The model is designed as a POMDP because the health status of the patient is not directly observable, and can only be deduced, with some probability, from the observations in the cerebral images. The components of the model (states, observations, actions, etc.) were defined based on specialized literature. A diagnosis of the patient's health status is made and the most appropriate action for the recovery of health is recommended after introducing the observations when operating the model. Consultation of the probable state of health of the patient and alternative actions is also allowed.
12

Omidshafiei, Shayegan, Ali-Akbar Agha-Mohammadi, Christopher Amato, Shih-Yuan Liu, Jonathan P. How, and John Vian. "Decentralized control of multi-robot partially observable Markov decision processes using belief space macro-actions." International Journal of Robotics Research 36, no. 2 (February 2017): 231–58. http://dx.doi.org/10.1177/0278364917692864.

Abstract:
This work focuses on solving general multi-robot planning problems in continuous spaces with partial observability given a high-level domain description. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems. However, representing and solving Dec-POMDPs is often intractable for large problems. This work extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) to take advantage of the high-level representations that are natural for multi-robot problems and to facilitate scalable solutions to large discrete and continuous problems. The Dec-POSMDP formulation uses task macro-actions created from lower-level local actions that allow for asynchronous decision-making by the robots, which is crucial in multi-robot domains. This transformation from Dec-POMDPs to Dec-POSMDPs with a finite set of automatically-generated macro-actions allows use of efficient discrete-space search algorithms to solve them. The paper presents algorithms for solving Dec-POSMDPs, which are more scalable than previous methods since they can incorporate closed-loop belief space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed algorithms are then evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent realistic problems and provide high-quality solutions for large-scale problems.
13

Dibangoye, Jilles Steeve, Christopher Amato, Olivier Buffet, and François Charpillet. "Optimally Solving Dec-POMDPs as Continuous-State MDPs." Journal of Artificial Intelligence Research 55 (February 24, 2016): 443–97. http://dx.doi.org/10.1613/jair.4623.

Abstract:
Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-Complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be decentralized. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. To provide scalability, we refine this approach by combining heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to converge to an optimal solution. In particular, we introduce a feature-based heuristic search value iteration (FB-HSVI) algorithm that relies on feature-based compact representations, point-based updates and efficient action selection. A theoretical analysis demonstrates that FB-HSVI terminates in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.
14

Zhang, W., and N. L. Zhang. "Restricted Value Iteration: Theory and Algorithms." Journal of Artificial Intelligence Research 23 (February 1, 2005): 123–65. http://dx.doi.org/10.1613/jair.1379.

Abstract:
Value iteration is a popular algorithm for finding near optimal policies for POMDPs. It is inefficient due to the need to account for the entire belief space, which necessitates the solution of large numbers of linear programs. In this paper, we study value iteration restricted to belief subsets. We show that, together with properly chosen belief subsets, restricted value iteration yields near-optimal policies and we give a condition for determining whether a given belief subset would bring about savings in space and time. We also apply restricted value iteration to two interesting classes of POMDPs, namely informative POMDPs and near-discernible POMDPs.
15

Amato, Christopher, Daniel S. Bernstein, and Shlomo Zilberstein. "Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs." Autonomous Agents and Multi-Agent Systems 21, no. 3 (August 25, 2009): 293–320. http://dx.doi.org/10.1007/s10458-009-9103-z.

16

Roy, N., G. Gordon, and S. Thrun. "Finding Approximate POMDP solutions Through Belief Compression." Journal of Artificial Intelligence Research 23 (January 1, 2005): 1–40. http://dx.doi.org/10.1613/jair.1496.

Abstract:
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional subspace embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this subspace can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta & Schapire, 2002) to represent sparse, high-dimensional belief spaces using small sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
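To convey the idea of planning over compressed beliefs, the sketch below compresses sampled beliefs with ordinary PCA from scikit-learn; the paper instead uses exponential-family PCA, which respects the fact that beliefs are probability distributions, so this is only a stand-in, and the sampled beliefs here are random rather than collected from a controller.

```python
import numpy as np
from sklearn.decomposition import PCA

# Beliefs would normally be collected by running the controller in the domain;
# here they are random sparse distributions purely for illustration.
rng = np.random.default_rng(0)
beliefs = rng.dirichlet(alpha=np.ones(100) * 0.1, size=500)   # 500 beliefs over 100 states

# Compress the 100-dimensional beliefs to a handful of learned features.
pca = PCA(n_components=5)
features = pca.fit_transform(beliefs)            # low-dimensional belief features
reconstructed = pca.inverse_transform(features)  # map back to the full belief space

print(features.shape)                            # (500, 5)
print(np.abs(reconstructed - beliefs).mean())    # average reconstruction error
```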
17

Itoh, Hideaki, Hisao Fukumoto, Hiroshi Wakuya, and Tatsuya Furukawa. "Bottom-up learning of hierarchical models in a class of deterministic POMDP environments." International Journal of Applied Mathematics and Computer Science 25, no. 3 (September 1, 2015): 597–615. http://dx.doi.org/10.1515/amcs-2015-0044.

Abstract:
The theory of partially observable Markov decision processes (POMDPs) is a useful tool for developing various intelligent agents, and learning hierarchical POMDP models is one of the key approaches for building such agents when the environments of the agents are unknown and large. To learn hierarchical models, bottom-up learning methods in which learning takes place in a layer-by-layer manner from the lowest to the highest layer are already extensively used in some research fields such as hidden Markov models and neural networks. However, little attention has been paid to bottom-up approaches for learning POMDP models. In this paper, we present a novel bottom-up learning algorithm for hierarchical POMDP models and prove that, by using this algorithm, a perfect model (i.e., a model that can perfectly predict future observations) can be learned at least in a class of deterministic POMDP environments.
18

Amato, Christopher, George Konidaris, Leslie P. Kaelbling, and Jonathan P. How. "Modeling and Planning with Macro-Actions in Decentralized POMDPs." Journal of Artificial Intelligence Research 64 (March 25, 2019): 817–59. http://dx.doi.org/10.1613/jair.1.11418.

Abstract:
Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized multi-agent decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent's actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions that may require different amounts of time to execute. We model macro-actions as options in a Dec-POMDP, focusing on actions that depend only on information directly available to the agent during execution. Therefore, we model systems where coordination decisions only occur at the level of deciding which macro-actions to execute. The core technical difficulty in this setting is that the options chosen by each agent no longer terminate at the same time. We extend three leading Dec-POMDP algorithms for policy generation to the macro-action case, and demonstrate their effectiveness in both standard benchmarks and a multi-robot coordination problem. The results show that our new algorithms retain agent coordination while allowing high-quality solutions to be generated for significantly longer horizons and larger state-spaces than previous Dec-POMDP methods. Furthermore, in the multi-robot domain, we show that, in contrast to most existing methods that are specialized to a particular problem class, our approach can synthesize control policies that exploit opportunities for coordination while balancing uncertainty, sensor information, and information about other agents.
19

Chatterjee, Krishnendu, and Martin Chmelík. "POMDPs under probabilistic semantics." Artificial Intelligence 221 (April 2015): 46–72. http://dx.doi.org/10.1016/j.artint.2014.12.009.

20

Oliehoek, F. A., M. T. J. Spaan, C. Amato, and S. Whiteson. "Incremental Clustering and Expansion for Faster Optimal Planning in Dec-POMDPs." Journal of Artificial Intelligence Research 46 (March 29, 2013): 449–509. http://dx.doi.org/10.1613/jair.3804.

Abstract:
This article presents the state-of-the-art in optimal solution methods for decentralized partially observable Markov decision processes (Dec-POMDPs), which are general models for collaborative multiagent planning under uncertainty. Building off the generalized multiagent A* (GMAA*) algorithm, which reduces the problem to a tree of one-shot collaborative Bayesian games (CBGs), we describe several advances that greatly expand the range of Dec-POMDPs that can be solved optimally. First, we introduce lossless incremental clustering of the CBGs solved by GMAA*, which achieves exponential speedups without sacrificing optimality. Second, we introduce incremental expansion of nodes in the GMAA* search tree, which avoids the need to expand all children, the number of which is in the worst case doubly exponential in the node's depth. This is particularly beneficial when little clustering is possible. In addition, we introduce new hybrid heuristic representations that are more compact and thereby enable the solution of larger Dec-POMDPs. We provide theoretical guarantees that, when a suitable heuristic is used, both incremental clustering and incremental expansion yield algorithms that are both complete and search equivalent. Finally, we present extensive empirical results demonstrating that GMAA*-ICE, an algorithm that synthesizes these advances, can optimally solve Dec-POMDPs of unprecedented size.
21

Wang, Yi, Shiqi Zhang, and Joohyung Lee. "Bridging Commonsense Reasoning and Probabilistic Planning via a Probabilistic Action Language." Theory and Practice of Logic Programming 19, no. 5-6 (September 2019): 1090–106. http://dx.doi.org/10.1017/s1471068419000371.

Abstract:
To be responsive to dynamically changing real-world environments, an intelligent agent needs to perform complex sequential decision-making tasks that are often guided by commonsense knowledge. The previous work on this line of research led to the framework called interleaved commonsense reasoning and probabilistic planning (icorpp), which used P-log for representing commonsense knowledge and Markov Decision Processes (MDPs) or Partially Observable MDPs (POMDPs) for planning under uncertainty. A main limitation of icorpp is that its implementation requires non-trivial engineering efforts to bridge the commonsense reasoning and probabilistic planning formalisms. In this paper, we present a unified framework to integrate icorpp’s reasoning and planning components. In particular, we extend probabilistic action language pBC+ to express utility, belief states, and observation as in POMDP models. Inheriting the advantages of action languages, the new action language provides an elaboration tolerant representation of POMDP that reflects commonsense knowledge. The idea led to the design of the system pbcplus2pomdp, which compiles a pBC+ action description into a POMDP model that can be directly processed by off-the-shelf POMDP solvers to compute an optimal policy of the pBC+ action description. Our experiments show that it retains the advantages of icorpp while avoiding the manual efforts in bridging the commonsense reasoner and the probabilistic planner.
22

Zhang, N. L., and W. Zhang. "Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes." Journal of Artificial Intelligence Research 14 (February 1, 2001): 29–51. http://dx.doi.org/10.1613/jair.761.

Abstract:
Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: It enabled value iteration to converge after only a few iterations on all the test problems.
23

Phan, Thomy, Lenz Belzner, Marie Kiermeier, Markus Friedrich, Kyrill Schmid, and Claudia Linnhoff-Popien. "Memory Bounded Open-Loop Planning in Large POMDPs Using Thompson Sampling." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 7941–48. http://dx.doi.org/10.1609/aaai.v33i01.33017941.

Abstract:
State-of-the-art approaches to partially observable planning like POMCP are based on stochastic tree search. While these approaches are computationally efficient, they may still construct search trees of considerable size, which could limit the performance due to restricted memory resources. In this paper, we propose Partially Observable Stacked Thompson Sampling (POSTS), a memory bounded approach to open-loop planning in large POMDPs, which optimizes a fixed size stack of Thompson Sampling bandits. We empirically evaluate POSTS in four large benchmark problems and compare its performance with different tree-based approaches. We show that POSTS achieves competitive performance compared to tree-based open-loop planning and offers a performance-memory tradeoff, making it suitable for partially observable planning with highly restricted computational and memory resources.
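The bandit component can be illustrated with a plain Beta-Bernoulli Thompson Sampling bandit; POSTS stacks such bandits over planning depths and its returns are not Bernoulli, so treat this only as a sketch of the sampling idea, with made-up reward probabilities in the demo.

```python
import random

class ThompsonBandit:
    """Beta-Bernoulli Thompson Sampling over a discrete set of actions."""

    def __init__(self, n_actions):
        self.alpha = [1.0] * n_actions   # pseudo-counts of successes
        self.beta = [1.0] * n_actions    # pseudo-counts of failures

    def select(self):
        # Sample a plausible success rate per action and act greedily on the sample.
        samples = [random.betavariate(a, b) for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, action, reward):
        # reward is assumed to be 0 or 1 in this Bernoulli sketch.
        self.alpha[action] += reward
        self.beta[action] += 1 - reward

# Toy usage: action 1 pays off more often (probabilities are illustrative).
bandit = ThompsonBandit(n_actions=2)
for _ in range(1000):
    a = bandit.select()
    r = 1 if random.random() < (0.7 if a == 1 else 0.3) else 0
    bandit.update(a, r)
print(bandit.alpha, bandit.beta)
```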
24

Zhang, Zong-Zhang, and Xiao-Ping Chen. "Hybrid Heuristic Online Planning for POMDPs." Journal of Software 24, no. 7 (January 16, 2014): 1589–600. http://dx.doi.org/10.3724/sp.j.1001.2013.04318.

25

Shani, Guy. "Task-Based Decomposition of Factored POMDPs." IEEE Transactions on Cybernetics 44, no. 2 (February 2014): 208–16. http://dx.doi.org/10.1109/tcyb.2013.2252009.

26

Williams, Jason D., and Steve Young. "Scaling POMDPs for Spoken Dialog Management." IEEE Transactions on Audio, Speech and Language Processing 15, no. 7 (September 2007): 2116–29. http://dx.doi.org/10.1109/tasl.2007.902050.

27

Vargo, Erik P., and Randy Cogill. "Expectation-maximization for Bayes-adaptive POMDPs." Journal of the Operational Research Society 66, no. 10 (October 2015): 1605–23. http://dx.doi.org/10.1057/jors.2015.49.

28

Ni, Yaodong, and Zhi-Qiang Liu. "Policy iteration for bounded-parameter POMDPs." Soft Computing 17, no. 4 (September 27, 2012): 537–48. http://dx.doi.org/10.1007/s00500-012-0932-3.

29

Petric, Frano, Damjan Miklić, and Zdenko Kovačić. "POMDP-Based Coding of Child–Robot Interaction within a Robot-Assisted ASD Diagnostic Protocol." International Journal of Humanoid Robotics 15, no. 02 (April 2018): 1850011. http://dx.doi.org/10.1142/s0219843618500111.

Abstract:
The existing procedures for autism spectrum disorder (ASD) diagnosis are often time consuming and tiresome both for highly-trained human evaluators and children, which may be alleviated by using humanoid robots in the diagnostic process. Hence, this paper proposes a framework for robot-assisted ASD evaluation based on partially observable Markov decision process (POMDP) modeling, specifically POMDPs with mixed observability (MOMDPs). POMDP is broadly used for modeling optimal sequential decision making tasks under uncertainty. Spurred by the widely accepted autism diagnostic observation schedule (ADOS), we emulate ADOS through four tasks, whose models incorporate observations of multiple social cues such as eye contact, gestures and utterances. Relying only on those observations, the robot provides an assessment of the child’s ASD-relevant functioning level (which is partially observable) within a particular task and provides human evaluators with readable information by partitioning its belief space. Finally, we evaluate the proposed MOMDP task models and demonstrate that chaining the tasks provides fine-grained outcome quantification, which could also increase the appeal of robot-assisted diagnostic protocols in the future.
30

Lin, Yong, Xingjia Lu, and Fillia Makedon. "Approximate Planning in POMDPs with Weighted Graph Models." International Journal on Artificial Intelligence Tools 24, no. 04 (August 2015): 1550014. http://dx.doi.org/10.1142/s0218213015500141.

Abstract:
Markov decision process (MDP) based heuristic algorithms have been considered as simple, fast, but imprecise solutions for partially observable Markov decision processes (POMDPs). The main reason comes from how we approximate belief points. We use weighted graphs to model the state space and the belief space, in order for a detailed analysis of the MDP heuristic algorithm. As a result, we provide the prerequisite conditions to build up a robust belief graph. We further introduce a dynamic mechanism to manage belief space in the belief graph, so as to improve the efficiency and decrease the space complexity. Experimental results indicate our approach is fast and has high quality for POMDPs.
31

Shatkay, H., and L. P. Kaelbling. "Learning Geometrically-Constrained Hidden Markov Models for Robot Navigation: Bridging the Topological-Geometrical Gap." Journal of Artificial Intelligence Research 16 (March 1, 2002): 167–207. http://dx.doi.org/10.1613/jair.874.

Abstract:
Hidden Markov models (HMMs) and partially observable Markov decision processes (POMDPs) provide useful tools for modeling dynamical systems. They are particularly useful for representing the topology of environments such as road networks and office buildings, which are typical for robot navigation and planning. The work presented here describes a formal framework for incorporating readily available odometric information and geometrical constraints into both the models and the algorithm that learns them. By taking advantage of such information, learning HMMs/POMDPs can be made to generate better solutions and require fewer iterations, while being robust in the face of data reduction. Experimental results, obtained from both simulated and real robot data, demonstrate the effectiveness of the approach.
32

Ye, Nan, Adhiraj Somani, David Hsu, and Wee Sun Lee. "DESPOT: Online POMDP Planning with Regularization." Journal of Artificial Intelligence Research 58 (January 26, 2017): 231–66. http://dx.doi.org/10.1613/jair.5328.

Abstract:
The partially observable Markov decision process (POMDP) provides a principled general framework for planning under uncertainty, but solving POMDPs optimally is computationally intractable, due to the "curse of dimensionality" and the "curse of history". To overcome these challenges, we introduce the Determinized Sparse Partially Observable Tree (DESPOT), a sparse approximation of the standard belief tree, for online planning under uncertainty. A DESPOT focuses online planning on a set of randomly sampled scenarios and compactly captures the "execution" of all policies under these scenarios. We show that the best policy obtained from a DESPOT is near-optimal, with a regret bound that depends on the representation size of the optimal policy. Leveraging this result, we give an anytime online planning algorithm, which searches a DESPOT for a policy that optimizes a regularized objective function. Regularization balances the estimated value of a policy under the sampled scenarios and the policy size, thus avoiding overfitting. The algorithm demonstrates strong experimental results, compared with some of the best online POMDP algorithms available. It has also been incorporated into an autonomous driving system for real-time vehicle control. The source code for the algorithm is available online.
33

Francois-Lavet, Vincent, Guillaume Rabusseau, Joelle Pineau, Damien Ernst, and Raphael Fonteneau. "On Overfitting and Asymptotic Bias in Batch Reinforcement Learning with Partial Observability." Journal of Artificial Intelligence Research 65 (May 5, 2019): 1–30. http://dx.doi.org/10.1613/jair.1.11478.

Abstract:
This paper provides an analysis of the tradeoff between asymptotic bias (suboptimality with unlimited data) and overfitting (additional suboptimality due to limited data) in the context of reinforcement learning with partial observability. Our theoretical analysis formally characterizes that while potentially increasing the asymptotic bias, a smaller state representation decreases the risk of overfitting. This analysis relies on expressing the quality of a state representation by bounding L1 error terms of the associated belief states. Theoretical results are empirically illustrated when the state representation is a truncated history of observations, both on synthetic POMDPs and on a large-scale POMDP in the context of smartgrids, with real-world data. Finally, similarly to known results in the fully observable setting, we also briefly discuss and empirically illustrate how using function approximators and adapting the discount factor may enhance the tradeoff between asymptotic bias and overfitting in the partially observable context.
34

Pineau, J., G. Gordon, and S. Thrun. "Anytime Point-Based Approximations for Large POMDPs." Journal of Artificial Intelligence Research 27 (November 26, 2006): 335–80. http://dx.doi.org/10.1613/jair.2078.

Abstract:
The Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points which work well in practice. The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim of this paper is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks.
35

Zhou, Enlu, Michael C. Fu, and Steven I. Marcus. "Solving Continuous-State POMDPs via Density Projection." IEEE Transactions on Automatic Control 55, no. 5 (May 2010): 1101–16. http://dx.doi.org/10.1109/tac.2010.2042005.

36

Capitan, Jesus, Matthijs T. J. Spaan, Luis Merino, and Anibal Ollero. "Decentralized multi-robot cooperation with auctioned POMDPs." International Journal of Robotics Research 32, no. 6 (May 2013): 650–71. http://dx.doi.org/10.1177/0278364913483345.

37

Chatterjee, Krishnendu, Martin Chmelík, Raghav Gupta, and Ayush Kanodia. "Optimal cost almost-sure reachability in POMDPs." Artificial Intelligence 234 (May 2016): 26–48. http://dx.doi.org/10.1016/j.artint.2016.01.007.

38

Xiang, Yang, and Frank Hanshar. "Multiagent Expedition with Graphical Models." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 19, no. 06 (December 2011): 939–76. http://dx.doi.org/10.1142/s0218488511007416.

Abstract:
We investigate a class of multiagent planning problems termed multiagent expedition, where agents move around an open, unknown, partially observable, stochastic, and physical environment, in pursuit of multiple and alternative goals of different utility. Optimal planning in multiagent expedition is highly intractable. We introduce the notion of conditional optimality, decompose the task into a set of semi-independent optimization subtasks, and apply a decision-theoretic multiagent graphical model to solve each subtask optimally. A set of techniques are proposed to enhance modeling so that the resultant graphical model can be practically evaluated. Effectiveness of the framework and its scalability are demonstrated through experiments. Multiagent expedition can be characterized as decentralized partially observable Markov decision processes (Dec-POMDPs). Hence, this work contributes towards practical planning in Dec-POMDPs.
39

Han, Yanlin, and Piotr Gmytrasiewicz. "IPOMDP-Net: A Deep Neural Network for Partially Observable Multi-Agent Planning Using Interactive POMDPs." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 6062–69. http://dx.doi.org/10.1609/aaai.v33i01.33016062.

Abstract:
This paper introduces the IPOMDP-net, a neural network architecture for multi-agent planning under partial observability. It embeds an interactive partially observable Markov decision process (I-POMDP) model and a QMDP planning algorithm that solves the model in a neural network architecture. The IPOMDP-net is fully differentiable and allows for end-to-end training. In the learning phase, we train an IPOMDP-net on various fixed and randomly generated environments in a reinforcement learning setting, assuming observable reinforcements and unknown (randomly initialized) model functions. In the planning phase, we test the trained network on new, unseen variants of the environments under the planning setting, using the trained model to plan without reinforcements. Empirical results show that our model-based IPOMDP-net outperforms the other state-of-the-art model-free network and generalizes better to larger, unseen environments. Our approach provides a general neural computing architecture for multi-agent planning using I-POMDPs. It suggests that, in a multi-agent setting, having a model of other agents benefits our decision-making, resulting in a policy of higher quality and better generalizability.
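The classical QMDP approximation that the IPOMDP-net embeds (in differentiable form) can be sketched for the single-agent case as follows; the model arrays and shapes are illustrative assumptions.

```python
import numpy as np

def qmdp_policy(belief, R, T, gamma=0.95, iterations=200):
    """QMDP action selection: solve the underlying MDP, then weight Q by the belief.

    belief: (S,) distribution over states.
    R: (A, S) rewards, T: (A, S, S) transitions.
    Returns the index of the action maximizing the belief-weighted Q-value.
    """
    A, S, _ = T.shape
    Q = np.zeros((A, S))
    for _ in range(iterations):
        V = Q.max(axis=0)              # (S,) fully observable state values
        Q = R + gamma * (T @ V)        # Q[a, s] = R[a, s] + gamma * sum_s' T[a, s, s'] V[s']
    return int(np.argmax(Q @ belief))  # best action under the current belief
```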
40

Hauskrecht, M. "Value-Function Approximations for Partially Observable Markov Decision Processes." Journal of Artificial Intelligence Research 13 (August 1, 2000): 33–94. http://dx.doi.org/10.1613/jair.678.

Abstract:
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price -- exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
41

Nair, R., and M. Tambe. "Hybrid BDI-POMDP Framework for Multiagent Teaming." Journal of Artificial Intelligence Research 23 (April 1, 2005): 367–420. http://dx.doi.org/10.1613/jair.1549.

Abstract:
Many current large-scale multiagent team implementations can be characterized as following the "belief-desire-intention" (BDI) paradigm, with explicit representation of team plans. Despite their promise, current BDI team approaches lack tools for quantitative performance analysis under uncertainty. Distributed partially observable Markov decision problems (POMDPs) are well suited for such analysis, but the complexity of finding optimal policies in such models is highly intractable. The key contribution of this article is a hybrid BDI-POMDP approach, where BDI team plans are exploited to improve POMDP tractability and POMDP analysis improves BDI team plan performance. Concretely, we focus on role allocation, a fundamental problem in BDI teams: which agents to allocate to the different roles in the team. The article provides three key contributions. First, we describe a role allocation technique that takes into account future uncertainties in the domain; prior work in multiagent role allocation has failed to address such uncertainties. To that end, we introduce RMTDP (Role-based Markov Team Decision Problem), a new distributed POMDP model for analysis of role allocations. Our technique gains in tractability by significantly curtailing RMTDP policy search; in particular, BDI team plans provide incomplete RMTDP policies, and the RMTDP policy search fills the gaps in such incomplete policies by searching for the best role allocation. Our second key contribution is a novel decomposition technique to further improve RMTDP policy search efficiency. Even though limited to searching role allocations, there are still combinatorially many role allocations, and evaluating each in RMTDP to identify the best is extremely difficult. Our decomposition technique exploits the structure in the BDI team plans to significantly prune the search space of role allocations. Our third key contribution is a significantly faster policy evaluation algorithm suited for our BDI-POMDP hybrid approach. Finally, we also present experimental results from two domains: mission rehearsal simulation and RoboCupRescue disaster rescue simulation.
42

Wu, Bo, Min Wu, and Jin-Hua She. "Point-Based Online Value Iteration Algorithm for POMDPs." Journal of Software 24, no. 1 (January 14, 2014): 25–36. http://dx.doi.org/10.3724/sp.j.1001.2013.04258.

43

Wu, Bo, Yan Peng Feng, and Hong Yan Zheng. "Point-Based Monte Carto Online Planning in POMDPs." Advanced Materials Research 846-847 (November 2013): 1388–91. http://dx.doi.org/10.4028/www.scientific.net/amr.846-847.1388.

Abstract:
Online planning and learning in partially observable Markov decision processes are often intractable because the belief state space suffers from two curses: dimensionality and history. In order to address this problem, this paper proposes a point-based Monte Carlo online planning approach in POMDPs. This approach involves performing value backup at specific reachable belief points, rather than over the entire belief simplex, to speed up computation processes. Then a Monte Carlo tree search algorithm is exploited to share the value of actions across each subtree of the search tree so as to minimise the mean squared error. The experimental results show that the proposed algorithm is effective in real-time systems.
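The Monte Carlo building block combined here with point-based backups can be sketched as a rollout estimate of an action's value at a belief; the simulator and rollout-policy callables, and the parameter defaults, are hypothetical.

```python
import random

def rollout_q(belief_particles, action, simulator, rollout_policy,
              depth=20, n=100, gamma=0.95):
    """Monte Carlo estimate of Q(belief, action) by sampling states and simulating.

    belief_particles: list of states representing the belief.
    simulator:        function (state, action) -> (next_state, reward).
    rollout_policy:   function (state) -> action used after the first step.
    Returns the average discounted return over n simulated rollouts.
    """
    total = 0.0
    for _ in range(n):
        state = random.choice(belief_particles)
        state, reward = simulator(state, action)   # take the evaluated action first
        ret, discount = reward, gamma
        for _ in range(depth - 1):
            state, reward = simulator(state, rollout_policy(state))
            ret += discount * reward
            discount *= gamma
        total += ret
    return total / n
```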
44

Png, Shaowei, Joelle Pineau, and Brahim Chaib-Draa. "Building Adaptive Dialogue Systems Via Bayes-Adaptive POMDPs." IEEE Journal of Selected Topics in Signal Processing 6, no. 8 (December 2012): 917–27. http://dx.doi.org/10.1109/jstsp.2012.2229962.

45

Brooks, Alex, Alexei Makarenko, Stefan Williams, and Hugh Durrant-Whyte. "Parametric POMDPs for planning in continuous state spaces." Robotics and Autonomous Systems 54, no. 11 (November 2006): 887–97. http://dx.doi.org/10.1016/j.robot.2006.05.007.

46

Foka, Amalia, and Panos Trahanias. "Real-time hierarchical POMDPs for autonomous robot navigation." Robotics and Autonomous Systems 55, no. 7 (July 2007): 561–71. http://dx.doi.org/10.1016/j.robot.2007.01.004.

47

Szer, Daniel, François Charpillet, and Shlomo Zilberstein. "Résolution optimale de DEC-POMDPs par recherche heuristique" [Optimal solving of DEC-POMDPs by heuristic search]. Revue d'intelligence artificielle 21, no. 1 (February 15, 2007): 107–28. http://dx.doi.org/10.3166/ria.21.107-128.

48

Doshi, Prashant, Yifeng Zeng, and Qiongyu Chen. "Graphical models for interactive POMDPs: representations and solutions." Autonomous Agents and Multi-Agent Systems 18, no. 3 (September 25, 2008): 376–416. http://dx.doi.org/10.1007/s10458-008-9064-7.

49

Maliah, Shlomi, and Guy Shani. "Using POMDPs for learning cost sensitive decision trees." Artificial Intelligence 292 (March 2021): 103400. http://dx.doi.org/10.1016/j.artint.2020.103400.

50

Baxter, J., P. L. Bartlett, and L. Weaver. "Experiments with Infinite-Horizon, Policy-Gradient Estimation." Journal of Artificial Intelligence Research 15 (November 1, 2001): 351–81. http://dx.doi.org/10.1613/jair.807.

Abstract:
In this paper, we present algorithms that perform gradient ascent of the average reward in a partially observable Markov decision process (POMDP). These algorithms are based on GPOMDP, an algorithm introduced in a companion paper (Baxter & Bartlett, this volume), which computes biased estimates of the performance gradient in POMDPs. The algorithm's chief advantages are that it uses only one free parameter beta, which has a natural interpretation in terms of bias-variance trade-off, it requires no knowledge of the underlying state, and it can be applied to infinite state, control and observation spaces. We show how the gradient estimates produced by GPOMDP can be used to perform gradient ascent, both with a traditional stochastic-gradient algorithm, and with an algorithm based on conjugate-gradients that utilizes gradient information to bracket maxima in line searches. Experimental results are presented illustrating both the theoretical results of (Baxter & Bartlett, this volume) on a toy problem, and practical aspects of the algorithms on a number of more realistic problems.
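A minimal sketch of a GPOMDP-style estimator as described above, for a softmax policy over discrete observations; the environment callables (`env_reset`, `env_step`) are hypothetical, and beta is the bias-variance parameter mentioned in the abstract.

```python
import numpy as np

def softmax_policy(theta, obs, rng):
    """Sample an action and return grad log pi for a per-observation softmax policy.

    theta: (n_obs, n_actions) parameter matrix.
    """
    prefs = theta[obs] - theta[obs].max()          # stabilized preferences
    probs = np.exp(prefs) / np.exp(prefs).sum()
    action = rng.choice(len(probs), p=probs)
    grad = np.zeros_like(theta)
    grad[obs] = -probs
    grad[obs, action] += 1.0                       # d/dtheta log pi(action | obs)
    return action, grad

def gpomdp_gradient(env_reset, env_step, theta, beta=0.9, horizon=10000, seed=0):
    """GPOMDP-style estimate of the gradient of the average reward.

    env_reset: () -> initial observation.
    env_step:  (action) -> (next observation, reward).
    beta trades off bias against variance of the estimate, as in the paper.
    """
    rng = np.random.default_rng(seed)
    z = np.zeros_like(theta)                       # discounted eligibility trace
    delta = np.zeros_like(theta)                   # running gradient estimate
    obs = env_reset()
    for t in range(horizon):
        action, grad_log_pi = softmax_policy(theta, obs, rng)
        obs, reward = env_step(action)
        z = beta * z + grad_log_pi
        delta += (reward * z - delta) / (t + 1)    # running average of r * z
    return delta
```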