Journal articles on the topic 'POMDP'

Consult the top 50 journal articles for your research on the topic 'POMDP.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Zhang, N. L., and W. Liu. "A Model Approximation Scheme for Planning in Partially Observable Stochastic Domains." Journal of Artificial Intelligence Research 7 (November 1, 1997): 199–230. http://dx.doi.org/10.1613/jair.419.

Abstract:
Partially observable Markov decision processes (POMDPs) are a natural model for planning problems where effects of actions are nondeterministic and the state of the world is not completely observable. It is difficult to solve POMDPs exactly. This paper proposes a new approximation scheme. The basic idea is to transform a POMDP into another one where additional information is provided by an oracle. The oracle informs the planning agent that the current state of the world is in a certain region. The transformed POMDP is consequently said to be region observable. It is easier to solve than the original POMDP. We propose to solve the transformed POMDP and use its optimal policy to construct an approximate policy for the original POMDP. By controlling the amount of additional information that the oracle provides, it is possible to find a proper tradeoff between computational time and approximation quality. In terms of algorithmic contributions, we study in detail how to exploit region observability in solving the transformed POMDP. To facilitate the study, we also propose a new exact algorithm for general POMDPs. The algorithm is conceptually simple and yet is significantly more efficient than all previous exact algorithms.
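
The belief-state machinery this approximation builds on can be illustrated in a few lines. The sketch below is not the paper's algorithm: it shows the exact Bayesian belief update that POMDP planners rely on, plus the oracle step of conditioning a belief on the report that the true state lies in a given region; the array shapes and toy numbers are assumptions of the example.

```python
import numpy as np

def belief_update(b, a, o, T, Z):
    """Exact Bayesian belief update: b'(s') is proportional to
    Z[a, s', o] * sum_s T[a, s, s'] * b(s)."""
    b_pred = T[a].T @ b           # predict: push the belief through the dynamics
    b_new = Z[a, :, o] * b_pred   # correct: weight by the observation likelihood
    return b_new / b_new.sum()

def region_condition(b, region_mask):
    """Oracle step (sketch): condition the belief on the report that the
    true state lies inside the region given by the 0-1 mask."""
    b_r = b * region_mask
    return b_r / b_r.sum()

# Toy 2-state, 1-action, 2-observation example with made-up numbers.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])   # T[a, s, s']
Z = np.array([[[0.7, 0.3], [0.4, 0.6]]])   # Z[a, s', o]
b = belief_update(np.array([0.5, 0.5]), a=0, o=1, T=T, Z=Z)
b = region_condition(b, np.array([1.0, 1.0]))   # trivial region: the whole state space
print(b)
```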
2

Kim, Sung-Kyun, Oren Salzman, and Maxim Likhachev. "POMHDP: Search-Based Belief Space Planning Using Multiple Heuristics." Proceedings of the International Conference on Automated Planning and Scheduling 29 (May 25, 2021): 734–44. http://dx.doi.org/10.1609/icaps.v29i1.3542.

Abstract:
Robots operating in the real world encounter substantial uncertainty that cannot be modeled deterministically before the actual execution. This gives rise to the necessity of robust motion planning under uncertainty, also known as belief space planning. Belief space planning can be formulated as Partially Observable Markov Decision Processes (POMDPs). However, computing optimal policies for non-trivial POMDPs is computationally intractable. Building upon recent progress from the search community, we propose a novel anytime POMDP solver, Partially Observable Multi-Heuristic Dynamic Programming (POMHDP), that leverages multiple heuristics to efficiently compute high-quality solutions while guaranteeing asymptotic convergence to an optimal policy. Through iterative forward search, POMHDP utilizes domain knowledge to solve POMDPs with specific goals and an infinite horizon. We demonstrate the efficacy of our proposed framework on a real-world, highly complex truck-unloading application.
3

Lim, Michael H., Tyler J. Becker, Mykel J. Kochenderfer, Claire J. Tomlin, and Zachary N. Sunberg. "Optimality Guarantees for Particle Belief Approximation of POMDPs." Journal of Artificial Intelligence Research 77 (August 27, 2023): 1591–636. http://dx.doi.org/10.1613/jair.1.14525.

Abstract:
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of O(C), where C is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.
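
The particle-belief transition that the analysis centres on can be sketched compactly. This is a hedged illustration of the general idea, not the paper's implementation: `step`, `obs_sample`, and `obs_density` are assumed stand-ins for the POMDP's generative model and observation density.

```python
import numpy as np

def pf_belief_transition(particles, weights, action, step, obs_sample, obs_density, rng):
    """One transition of the particle belief MDP (sketch): propagate C particles
    through the generative model, sample an observation, then reweight every
    particle by the observation density -- the O(C) overhead noted in the abstract."""
    outcomes = [step(s, action, rng) for s in particles]           # (next_state, reward) pairs
    next_particles = [out[0] for out in outcomes]
    reward = float(np.dot(weights, [out[1] for out in outcomes]))  # belief-averaged reward
    idx = rng.choice(len(particles), p=weights)                    # pick a particle by weight
    obs = obs_sample(next_particles[idx], action, rng)             # sample its observation
    new_weights = np.array([w * obs_density(obs, s_next, action)
                            for w, s_next in zip(weights, next_particles)])
    new_weights /= new_weights.sum()
    return next_particles, new_weights, reward, obs
```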
4

Brafman, Ronen, Guy Shani, and Shlomo Zilberstein. "Qualitative Planning under Partial Observability in Multi-Agent Domains." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 30, 2013): 130–37. http://dx.doi.org/10.1609/aaai.v27i1.8643.

Abstract:
Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains, and have attracted a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more “classical” in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDP solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms.
5

Zhang, Zongzhang, Michael Littman, and Xiaoping Chen. "Covering Number as a Complexity Measure for POMDP Planning and Learning." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (September 20, 2021): 1853–59. http://dx.doi.org/10.1609/aaai.v26i1.8360.

Abstract:
Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number for the reachable belief space, which is the set of belief points that are reachable from the initial belief point, has interesting theoretical links with the complexity of POMDP planning. In this paper, we present empirical evidence that the covering number for the reachable belief space (or just "covering number", for brevity) is a far better complexity measure than the state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent learning algorithm for POMDPs without reset given knowledge of the covering number.
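
As a rough illustration of the quantity involved, the covering number of a set of sampled reachable beliefs can be estimated greedily. The sketch below uses an assumed L1 radius `delta` and random beliefs; it is not the paper's definition or experimental procedure.

```python
import numpy as np

def greedy_covering_number(beliefs, delta):
    """Greedy upper bound on how many delta-balls (L1 distance) are needed
    to cover a sampled set of reachable belief points."""
    centers = []
    for b in beliefs:
        if not any(np.abs(b - c).sum() <= delta for c in centers):
            centers.append(b)   # b is not covered yet, so it starts a new ball
    return len(centers)

rng = np.random.default_rng(0)
samples = rng.dirichlet(np.ones(3), size=200)   # stand-in for sampled reachable beliefs
print(greedy_covering_number(samples, delta=0.2))
```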
6

Wu, Chenyang, Rui Kong, Guoyu Yang, Xianghan Kong, Zongzhang Zhang, Yang Yu, Dong Li, and Wulong Liu. "LB-DESPOT: Efficient Online POMDP Planning Considering Lower Bound in Action Selection (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (May 18, 2021): 15927–28. http://dx.doi.org/10.1609/aaai.v35i18.17960.

Abstract:
The partially observable Markov decision process (POMDP) is an extension of the MDP. It handles state uncertainty by specifying the probability of getting a particular observation given the current state. DESPOT is one of the most popular scalable online planning algorithms for POMDPs; it manages to significantly reduce the size of the decision tree while deriving a near-optimal policy by considering only K scenarios. Nevertheless, there is a gap in action selection criteria between planning and execution in DESPOT. During the planning stage, it keeps choosing the action with the highest upper bound, whereas when planning ends, the action with the highest lower bound is chosen for execution. Here, we propose LB-DESPOT, which utilizes the lower bound in selecting an action branch to expand, to alleviate this issue. Empirically, our method has attained better performance than DESPOT and POMCP, another state-of-the-art solver, on several challenging POMDP benchmark tasks.
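
The planning/execution gap described in the abstract can be made concrete with a toy sketch of the two selection criteria; the per-action bound dictionaries are made up for the example, and this is not the paper's LB-DESPOT expansion rule.

```python
def action_to_expand(upper):
    """DESPOT-style exploration during planning: follow the most optimistic action."""
    return max(upper, key=upper.get)

def action_to_execute(lower):
    """At execution time: commit to the action with the best guaranteed value."""
    return max(lower, key=lower.get)

# Toy bounds at a tree node (made-up numbers): the two criteria disagree, which is
# the gap that LB-DESPOT narrows by letting the lower bound influence expansion.
upper = {"north": 9.0, "south": 8.5}
lower = {"north": 2.0, "south": 6.0}
print(action_to_expand(upper), action_to_execute(lower))   # north south
```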
7

Carvalho Chanel, Caroline, Florent Teichteil-Königsbuch, and Charles Lesire. "Multi-Target Detection and Recognition by UAVs Using Online POMDPs." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 29, 2013): 1381–87. http://dx.doi.org/10.1609/aaai.v27i1.8551.

Abstract:
This paper tackles high-level decision-making techniques for robotic missions, which involve both active sensing and symbolic goal reaching, under uncertain probabilistic environments and strong time constraints. Our case study is a POMDP model of an online multi-target detection and recognition mission by an autonomous UAV. The POMDP model of the multi-target detection and recognition problem is generated online from a list of areas of interest, which are automatically extracted at the beginning of the flight from a coarse-grained high altitude observation of the scene. The POMDP observation model relies on a statistical abstraction of an image processing algorithm's output used to detect targets. As the POMDP problem cannot be known and thus optimized before the beginning of the flight, our main contribution is an "optimize-while-execute" algorithmic framework: it drives a POMDP sub-planner to optimize and execute the POMDP policy in parallel under action duration constraints. We present new results from real outdoor flights and SAIL simulations, which highlight both the benefits of using POMDPs in multi-target detection and recognition missions, and of our "optimize-while-execute" paradigm.
8

Hoerger, Marcus, Joshua Song, Hanna Kurniawati, and Alberto Elfes. "POMDP-Based Candy Server: Lessons Learned from a Seven Day Demo." Proceedings of the International Conference on Automated Planning and Scheduling 29 (May 25, 2021): 698–706. http://dx.doi.org/10.1609/icaps.v29i1.3538.

Abstract:
An autonomous robot must decide on a good strategy to achieve its long-term goal, despite various types of uncertainty. The Partially Observable Markov Decision Process (POMDP) is a principled framework to address such a decision making problem. Despite the computational intractability of solving POMDPs, the past decade has seen substantial advancement in POMDP solvers. This paper presents our experience in enabling on-line POMDP solving to become the sole motion planner for a robot manipulation demo at IEEE SIMPAR and ICRA 2018. The demo scenario is a candy-serving robot: A 6-DOF robot arm must pick up a cup placed on a table by a user, use the cup to scoop candies from a box, and put the cup of candies back on the table. The average perception error is ∼3cm (≈ the radius of the cup), affecting the position of the cup and the surface level of the candies. This paper presents a strategy to alleviate the curse of history issue plaguing this scenario, the perception system and its integration with the planner, and lessons learned in enabling an online POMDP solver to become the sole motion planner of this entire task. The POMDP-based system was tested through a seven-day live demo at the two conferences. In this demo, 150 runs were attempted and 98% of them were successful. We also conducted further experiments to test the capability of our POMDP-based system when the environment is relatively cluttered by obstacles and when the user moves the cup while the robot tries to pick it up. In both cases, our POMDP-based system reaches a success rate of 90% or above.
9

Khonji, Majid, and Duoaa Khalifa. "Heuristic Search in Dual Space for Constrained Fixed-Horizon POMDPs with Durative Actions." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (June 26, 2023): 14927–36. http://dx.doi.org/10.1609/aaai.v37i12.26743.

Abstract:
The Partially Observable Markov Decision Process (POMDP) is widely used in probabilistic planning for stochastic domains. However, current extensions, such as constrained and chance-constrained POMDPs, have limitations in modeling real-world planning problems because they assume that all actions have a fixed duration. To address this issue, we propose a unified model that encompasses durative POMDP and its constrained extensions. To solve the durative POMDP and its constrained extensions, we first convert them into an Integer Linear Programming (ILP) formulation. This approach leverages existing solvers in the ILP literature and provides a foundation for solving these problems. We then introduce a heuristic search approach that prunes the search space, which is guided by solving successive partial ILP programs. Our empirical evaluation results show that our approach outperforms the current state-of-the-art fixed-horizon chance-constrained POMDP solver.
10

Meli, Daniele, Alberto Castellini, and Alessandro Farinelli. "Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach." Journal of Artificial Intelligence Research 79 (February 28, 2024): 725–76. http://dx.doi.org/10.1613/jair.1.15826.

Abstract:
Partially Observable Markov Decision Processes (POMDPs) are a powerful framework for planning under uncertainty. They allow state uncertainty to be modeled as a belief probability distribution. Approximate solvers based on Monte Carlo sampling have shown great success in relaxing the computational demand and performing online planning. However, scaling to complex realistic domains with many actions and long planning horizons is still a major challenge, and a key point to achieve good performance is guiding the action-selection process with domain-dependent policy heuristics which are tailored for the specific application domain. We propose to learn high-quality heuristics from POMDP traces of executions generated by any solver. We convert the belief-action pairs to a logical semantics, and exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications, which are then used as online heuristics. We thoroughly evaluate our methodology on two notoriously challenging POMDP problems, involving large action spaces and long planning horizons, namely, rocksample and pocman. Considering different state-of-the-art online POMDP solvers, including POMCP, DESPOT and AdaOPS, we show that learned heuristics expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics within lower computational time. Moreover, they generalize well to more challenging scenarios not experienced in the training phase (e.g., increasing the number of rocks and the grid size in rocksample, and increasing the size of the map and the aggressiveness of the ghosts in pocman).
11

Itoh, Hideaki, Hisao Fukumoto, Hiroshi Wakuya, and Tatsuya Furukawa. "Bottom-up learning of hierarchical models in a class of deterministic POMDP environments." International Journal of Applied Mathematics and Computer Science 25, no. 3 (September 1, 2015): 597–615. http://dx.doi.org/10.1515/amcs-2015-0044.

Abstract:
The theory of partially observable Markov decision processes (POMDPs) is a useful tool for developing various intelligent agents, and learning hierarchical POMDP models is one of the key approaches for building such agents when the environments of the agents are unknown and large. To learn hierarchical models, bottom-up learning methods in which learning takes place in a layer-by-layer manner from the lowest to the highest layer are already extensively used in some research fields such as hidden Markov models and neural networks. However, little attention has been paid to bottom-up approaches for learning POMDP models. In this paper, we present a novel bottom-up learning algorithm for hierarchical POMDP models and prove that, by using this algorithm, a perfect model (i.e., a model that can perfectly predict future observations) can be learned at least in a class of deterministic POMDP environments.
12

Walraven, Erwin, and Matthijs T. J. Spaan. "Column Generation Algorithms for Constrained POMDPs." Journal of Artificial Intelligence Research 62 (July 17, 2018): 489–533. http://dx.doi.org/10.1613/jair.1.11216.

Abstract:
In several real-world domains it is required to plan ahead while there are finite resources available for executing the plan. The limited availability of resources imposes constraints on the plans that can be executed, which need to be taken into account while computing a plan. A Constrained Partially Observable Markov Decision Process (Constrained POMDP) can be used to model resource-constrained planning problems which include uncertainty and partial observability. Constrained POMDPs provide a framework for computing policies which maximize expected reward, while respecting constraints on a secondary objective such as cost or resource consumption. Column generation for linear programming can be used to obtain Constrained POMDP solutions. This method incrementally adds columns to a linear program, in which each column corresponds to a POMDP policy obtained by solving an unconstrained subproblem. Column generation requires solving a potentially large number of POMDPs, as well as exact evaluation of the resulting policies, which is computationally difficult. We propose a method to solve subproblems in a two-stage fashion using approximation algorithms. First, we use a tailored point-based POMDP algorithm to obtain an approximate subproblem solution. Next, we convert this approximate solution into a policy graph, which we can evaluate efficiently. The resulting algorithm is a new approximate method for Constrained POMDPs in single-agent settings, but also in settings in which multiple independent agents share a global constraint. Experiments based on several domains show that our method outperforms the current state of the art.
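
To make the column-generation loop concrete, the master linear program over a small set of candidate policies can be written directly. This is a hedged sketch of the master step only, with each column's expected value and expected cost assumed as inputs; pricing new columns by solving unconstrained POMDPs, as the paper does, is not shown.

```python
import numpy as np
from scipy.optimize import linprog

def master_lp(values, costs, budget):
    """Master problem of column generation (sketch): choose a randomized mixture
    over candidate POMDP policies that maximizes expected value while keeping
    expected cost within the budget."""
    n = len(values)
    result = linprog(
        c=-np.asarray(values, dtype=float),                 # maximize value
        A_ub=np.asarray(costs, dtype=float).reshape(1, n),  # expected-cost constraint
        b_ub=[budget],
        A_eq=np.ones((1, n)), b_eq=[1.0],                   # mixture weights sum to one
        bounds=[(0, None)] * n,
    )
    return result.x, -result.fun

weights, value = master_lp(values=[3.0, 5.0, 4.0], costs=[1.0, 4.0, 2.0], budget=2.5)
print(weights, value)
```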
13

Carr, Steven, Nils Jansen, and Ufuk Topcu. "Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes." Journal of Artificial Intelligence Research 72 (November 18, 2021): 819–47. http://dx.doi.org/10.1613/jair.1.12963.

Abstract:
Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty and incomplete information. Machine learning methods typically train recurrent neural networks (RNN) as effective representations of POMDP policies that can efficiently process sequential data. However, it is hard to verify whether the POMDP driven by such RNN-based policies satisfies safety constraints, for instance, given by temporal logic specifications. We propose a novel method that combines techniques from machine learning with the field of formal methods: training an RNN-based policy and then automatically extracting a so-called finite-state controller (FSC) from the RNN. Such FSCs offer a convenient way to verify temporal logic constraints. Implemented on a POMDP, they induce a Markov chain, and probabilistic verification methods can efficiently check whether this induced Markov chain satisfies a temporal logic specification. Using such methods, if the Markov chain does not satisfy the specification, a byproduct of verification is diagnostic information about the states in the POMDP that are critical for the specification. The method exploits this diagnostic information to either adjust the complexity of the extracted FSC or improve the policy by performing focused retraining of the RNN. The method synthesizes policies that satisfy temporal logic specifications for POMDPs with up to millions of states, which are three orders of magnitude larger than comparable approaches.
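
The verification step mentioned above ultimately reduces to a reachability query on the induced Markov chain. Below is a minimal sketch of that query for a small explicit chain, assuming every non-target state can reach the target (real probabilistic model checkers first identify the probability-0 states); it is not the paper's FSC-extraction pipeline.

```python
import numpy as np

def reachability_probabilities(P, target):
    """Probability of eventually reaching a target state in a Markov chain with
    transition matrix P: solve (I - P_rr) x_r = P_rt * 1 over non-target states."""
    target = np.asarray(target, dtype=bool)
    rest = ~target
    A = np.eye(rest.sum()) - P[np.ix_(rest, rest)]
    b = P[np.ix_(rest, target)].sum(axis=1)
    x = np.ones(P.shape[0])        # target states reach themselves with probability 1
    x[rest] = np.linalg.solve(A, b)
    return x

# Three-state toy chain; state 2 is the target.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.0, 0.0, 1.0]])
print(reachability_probabilities(P, target=[False, False, True]))
```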
14

Ko, Li Ling, David Hsu, Wee Sun Lee, and Sylvie Ong. "Structured Parameter Elicitation." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 4, 2010): 1102–7. http://dx.doi.org/10.1609/aaai.v24i1.7744.

Abstract:
The behavior of a complex system often depends on parameters whose values are unknown in advance. To operate effectively, an autonomous agent must actively gather information on the parameter values while progressing towards its goal. We call this problem parameter elicitation. Partially observable Markov decision processes (POMDPs) provide a principled framework for such uncertainty planning tasks, but they suffer from high computational complexity. However, POMDPs for parameter elicitation often possess special structural properties, specifically, factorization and symmetry. This work identifies these properties and exploits them for efficient solution through a factored belief representation. The experimental results show that our new POMDP solvers outperform SARSOP and MOMDP, two of the fastest general-purpose POMDP solvers available, and can handle significantly larger problems.
15

Zhang, Chongjie, and Victor Lesser. "Coordinated Multi-Agent Reinforcement Learning in Networked Distributed POMDPs." Proceedings of the AAAI Conference on Artificial Intelligence 25, no. 1 (August 4, 2011): 764–70. http://dx.doi.org/10.1609/aaai.v25i1.7886.

Abstract:
In many multi-agent applications such as distributed sensor nets, a network of agents act collaboratively under uncertainty and local interactions. Networked Distributed POMDP (ND-POMDP) provides a framework to model such cooperative multi-agent decision making. Existing work on ND-POMDPs has focused on offline techniques that require accurate models, which are usually costly to obtain in practice. This paper presents a model-free, scalable learning approach that synthesizes multi-agent reinforcement learning (MARL) and distributed constraint optimization (DCOP). By exploiting structured interaction in ND-POMDPs, our approach distributes the learning of the joint policy and employs DCOP techniques to coordinate distributed learning to ensure the global learning performance. Our approach can learn a globally optimal policy for ND-POMDPs with a property called groupwise observability. Experimental results show that, with communication during learning and execution, our approach significantly outperforms the nearly-optimal non-communication policies computed offline.
16

Ross, S., J. Pineau, S. Paquet, and B. Chaib-draa. "Online Planning Algorithms for POMDPs." Journal of Artificial Intelligence Research 32 (July 29, 2008): 663–704. http://dx.doi.org/10.1613/jair.2567.

Abstract:
Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, solving a POMDP is often intractable except for small problems, due to its complexity. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that state-of-the-art online heuristic search methods can handle large POMDP domains efficiently.
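
The lookahead search these online methods share can be summarized in a short recursive sketch: a generic depth-limited expectimax over beliefs with assumed helper functions (belief update, belief reward, observation probability), rather than any specific algorithm from the survey.

```python
def lookahead_value(belief, depth, actions, observations,
                    update, reward, obs_prob, gamma=0.95):
    """Generic online lookahead (sketch): the value of a belief is the best
    action's expected immediate reward plus the discounted value of its
    successor beliefs, expanded only to a fixed depth."""
    if depth == 0:
        return 0.0   # real solvers plug in a heuristic or bound here
    best = float("-inf")
    for a in actions:
        q = reward(belief, a)
        for o in observations:
            p = obs_prob(belief, a, o)
            if p > 0.0:
                q += gamma * p * lookahead_value(update(belief, a, o), depth - 1,
                                                 actions, observations,
                                                 update, reward, obs_prob, gamma)
        best = max(best, q)
    return best
```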
17

Capitan, Jesus, Matthijs Spaan, Luis Merino, and Anibal Ollero. "Decentralized Multi-Robot Cooperation with Auctioned POMDPs." Proceedings of the International Conference on Automated Planning and Scheduling 24 (May 11, 2014): 515–18. http://dx.doi.org/10.1609/icaps.v24i1.13658.

Abstract:
Planning under uncertainty faces a scalability problem when considering multi-robot teams, as the information space scales exponentially with the number of robots. To address this issue, this paper proposes to decentralize multi-robot Partially Observable Markov Decision Processes (POMDPs) while maintaining cooperation between robots by using POMDP policy auctions. Auctions provide a flexible way of coordinating individual policies modeled by POMDPs and have low communication requirements. Additionally, communication models in the multi-agent POMDP literature severely mismatch with real inter-robot communication. We address this issue by exploiting a decentralized data fusion method in order to efficiently maintain a joint belief state among the robots. The paper presents results in two different applications: environmental monitoring with Unmanned Aerial Vehicles (UAVs); and cooperative tracking, in which several robots have to jointly track a moving target of interest.
18

Omidshafiei, Shayegan, Ali-Akbar Agha-Mohammadi, Christopher Amato, Shih-Yuan Liu, Jonathan P. How, and John Vian. "Decentralized control of multi-robot partially observable Markov decision processes using belief space macro-actions." International Journal of Robotics Research 36, no. 2 (February 2017): 231–58. http://dx.doi.org/10.1177/0278364917692864.

Abstract:
This work focuses on solving general multi-robot planning problems in continuous spaces with partial observability given a high-level domain description. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems. However, representing and solving Dec-POMDPs is often intractable for large problems. This work extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) to take advantage of the high-level representations that are natural for multi-robot problems and to facilitate scalable solutions to large discrete and continuous problems. The Dec-POSMDP formulation uses task macro-actions created from lower-level local actions that allow for asynchronous decision-making by the robots, which is crucial in multi-robot domains. This transformation from Dec-POMDPs to Dec-POSMDPs with a finite set of automatically-generated macro-actions allows use of efficient discrete-space search algorithms to solve them. The paper presents algorithms for solving Dec-POSMDPs, which are more scalable than previous methods since they can incorporate closed-loop belief space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed algorithms are then evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent realistic problems and provide high-quality solutions for large-scale problems.
19

Wang, Yi, Shiqi Zhang, and Joohyung Lee. "Bridging Commonsense Reasoning and Probabilistic Planning via a Probabilistic Action Language." Theory and Practice of Logic Programming 19, no. 5-6 (September 2019): 1090–106. http://dx.doi.org/10.1017/s1471068419000371.

Abstract:
To be responsive to dynamically changing real-world environments, an intelligent agent needs to perform complex sequential decision-making tasks that are often guided by commonsense knowledge. The previous work on this line of research led to the framework called interleaved commonsense reasoning and probabilistic planning (icorpp), which used P-log for representing commonsense knowledge and Markov Decision Processes (MDPs) or Partially Observable MDPs (POMDPs) for planning under uncertainty. A main limitation of icorpp is that its implementation requires non-trivial engineering efforts to bridge the commonsense reasoning and probabilistic planning formalisms. In this paper, we present a unified framework to integrate icorpp’s reasoning and planning components. In particular, we extend probabilistic action language pBC+ to express utility, belief states, and observation as in POMDP models. Inheriting the advantages of action languages, the new action language provides an elaboration tolerant representation of POMDP that reflects commonsense knowledge. The idea led to the design of the system pbcplus2pomdp, which compiles a pBC+ action description into a POMDP model that can be directly processed by off-the-shelf POMDP solvers to compute an optimal policy of the pBC+ action description. Our experiments show that it retains the advantages of icorpp while avoiding the manual efforts in bridging the commonsense reasoner and the probabilistic planner.
20

Folsom-Kovarik, Jeremiah, Gita Sukthankar, and Sae Schatz. "Integrating Learner Help Requests Using a POMDP in an Adaptive Training System." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 2 (October 6, 2021): 2287–92. http://dx.doi.org/10.1609/aaai.v26i2.18971.

Abstract:
This paper describes the development and empirical testing of an intelligent tutoring system (ITS) with two emerging methodologies: (1) a partially observable Markov decision process (POMDP) for representing the learner model and (2) inquiry modeling, which informs the learner model with questions learners ask during instruction. POMDPs have been successfully applied to non-ITS domains but, until recently, have seemed intractable for large-scale intelligent tutoring challenges. New, ITS-specific representations leverage common regularities in intelligent tutoring to make a POMDP practical as a learner model. Inquiry modeling is a novel paradigm for informing learner models by observing rich features of learners’ help requests such as categorical content, context, and timing. The experiment described in this paper demonstrates that inquiry modeling and planning with POMDPs can yield significant and substantive learning improvements in a realistic, scenario-based training task.
21

Nair, R., and M. Tambe. "Hybrid BDI-POMDP Framework for Multiagent Teaming." Journal of Artificial Intelligence Research 23 (April 1, 2005): 367–420. http://dx.doi.org/10.1613/jair.1549.

Abstract:
Many current large-scale multiagent team implementations can be characterized as following the "belief-desire-intention" (BDI) paradigm, with explicit representation of team plans. Despite their promise, current BDI team approaches lack tools for quantitative performance analysis under uncertainty. Distributed partially observable Markov decision problems (POMDPs) are well suited for such analysis, but the complexity of finding optimal policies in such models is highly intractable. The key contribution of this article is a hybrid BDI-POMDP approach, where BDI team plans are exploited to improve POMDP tractability and POMDP analysis improves BDI team plan performance. Concretely, we focus on role allocation, a fundamental problem in BDI teams: which agents to allocate to the different roles in the team. The article provides three key contributions. First, we describe a role allocation technique that takes into account future uncertainties in the domain; prior work in multiagent role allocation has failed to address such uncertainties. To that end, we introduce RMTDP (Role-based Markov Team Decision Problem), a new distributed POMDP model for analysis of role allocations. Our technique gains in tractability by significantly curtailing RMTDP policy search; in particular, BDI team plans provide incomplete RMTDP policies, and the RMTDP policy search fills the gaps in such incomplete policies by searching for the best role allocation. Our second key contribution is a novel decomposition technique to further improve RMTDP policy search efficiency. Even though limited to searching role allocations, there are still combinatorially many role allocations, and evaluating each in RMTDP to identify the best is extremely difficult. Our decomposition technique exploits the structure in the BDI team plans to significantly prune the search space of role allocations. Our third key contribution is a significantly faster policy evaluation algorithm suited for our BDI-POMDP hybrid approach. Finally, we also present experimental results from two domains: mission rehearsal simulation and RoboCupRescue disaster rescue simulation.
22

Ajdarów, Michal, Šimon Brlej, and Petr Novotný. "Shielding in Resource-Constrained Goal POMDPs." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (June 26, 2023): 14674–82. http://dx.doi.org/10.1609/aaai.v37i12.26715.

Abstract:
We consider partially observable Markov decision processes (POMDPs) modeling an agent that needs a supply of a certain resource (e.g., electricity stored in batteries) to operate correctly. The resource is consumed by the agent's actions and can be replenished only in certain states. The agent aims to minimize the expected cost of reaching some goal while preventing resource exhaustion, a problem we call resource-constrained goal optimization (RSGO). We take a two-step approach to the RSGO problem. First, using formal methods techniques, we design an algorithm computing a shield for a given scenario: a procedure that observes the agent and prevents it from using actions that might eventually lead to resource exhaustion. Second, we augment the POMCP heuristic search algorithm for POMDP planning with our shields to obtain an algorithm solving the RSGO problem. We implement our algorithm and present experiments showing its applicability to benchmarks from the literature.
23

Varakantham, Pradeep, Jun-young Kwak, Matthew Taylor, Janusz Marecki, Paul Scerri, and Milind Tambe. "Exploiting Coordination Locales in Distributed POMDPs via Social Model Shaping." Proceedings of the International Conference on Automated Planning and Scheduling 19 (October 16, 2009): 313–20. http://dx.doi.org/10.1609/icaps.v19i1.13369.

Abstract:
Distributed POMDPs provide an expressive framework for modeling multiagent collaboration problems, but NEXP-Complete complexity hinders their scalability and application in real-world domains. This paper introduces a subclass of distributed POMDPs, and TREMOR, an algorithm to solve such distributed POMDPs. The primary novelty of TREMOR is that agents plan individually with a single agent POMDP solver and use social model shaping to implicitly coordinate with other agents. Experiments demonstrate that TREMOR can provide solutions orders of magnitude faster than existing algorithms while achieving comparable, or even superior, solution quality.
24

Roy, N., G. Gordon, and S. Thrun. "Finding Approximate POMDP solutions Through Belief Compression." Journal of Artificial Intelligence Research 23 (January 1, 2005): 1–40. http://dx.doi.org/10.1613/jair.1496.

Abstract:
Standard value function approaches to finding policies for Partially Observable Markov Decision Processes (POMDPs) are generally considered to be intractable for large models. The intractability of these algorithms is to a large extent a consequence of computing an exact, optimal policy over the entire belief space. However, in real-world POMDP problems, computing the optimal policy for the full belief space is often unnecessary for good control even for problems with complicated policy classes. The beliefs experienced by the controller often lie near a structured, low-dimensional subspace embedded in the high-dimensional belief space. Finding a good approximation to the optimal value function for only this subspace can be much easier than computing the full value function. We introduce a new method for solving large-scale POMDPs by reducing the dimensionality of the belief space. We use Exponential family Principal Components Analysis (Collins, Dasgupta & Schapire, 2002) to represent sparse, high-dimensional belief spaces using small sets of learned features of the belief state. We then plan only in terms of the low-dimensional belief features. By planning in this low-dimensional space, we can find policies for POMDP models that are orders of magnitude larger than models that can be handled by conventional techniques. We demonstrate the use of this algorithm on a synthetic problem and on mobile robot navigation tasks.
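
The compression idea can be illustrated with ordinary linear PCA on a matrix of sampled belief vectors. Note the paper itself uses Exponential-family PCA, which respects the non-negativity and normalization of beliefs, so the plain-PCA code below is only a simplified stand-in with made-up data.

```python
import numpy as np

def compress_beliefs(B, k):
    """Project |S|-dimensional belief vectors (rows of B) onto the top-k principal
    directions, so planning can work with k-dimensional features instead of the
    full belief simplex; also return the approximate reconstruction."""
    mean = B.mean(axis=0)
    _, _, Vt = np.linalg.svd(B - mean, full_matrices=False)
    features = (B - mean) @ Vt[:k].T      # low-dimensional belief features
    approx = features @ Vt[:k] + mean     # approximate beliefs recovered from features
    return features, approx

rng = np.random.default_rng(1)
B = rng.dirichlet(np.ones(20), size=100)  # 100 sampled beliefs over 20 states
features, approx = compress_beliefs(B, k=3)
print(features.shape, float(np.abs(B - approx).max()))
```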
25

Dibangoye, Jilles Steeve, Christopher Amato, Olivier Buffet, and François Charpillet. "Optimally Solving Dec-POMDPs as Continuous-State MDPs." Journal of Artificial Intelligence Research 55 (February 24, 2016): 443–97. http://dx.doi.org/10.1613/jair.4623.

Abstract:
Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-Complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be decentralized. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. To provide scalability, we refine this approach by combining heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to converge to an optimal solution. In particular, we introduce a feature-based heuristic search value iteration (FB-HSVI) algorithm that relies on feature-based compact representations, point-based updates and efficient action selection. A theoretical analysis demonstrates that FB-HSVI terminates in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.
26

Victorio-Meza, Hermilo, Manuel Mejía-Lavalle, Alicia Martínez Rebollar, Andrés Blanco Ortega, Obdulia Pichardo Lagunas, and Grigori Sidorov. "Searching for Cerebrovascular Disease Optimal Treatment Recommendations Applying Partially Observable Markov Decision Processes." International Journal of Pattern Recognition and Artificial Intelligence 32, no. 01 (October 9, 2017): 1860015. http://dx.doi.org/10.1142/s0218001418600157.

Abstract:
Partially observable Markov decision processes (POMDPs) are mathematical models for the planning of action sequences under conditions of uncertainty. Uncertainty in POMDPs is manifested in two ways: uncertainty in the perception of model states and uncertainty in the effects of actions on states. The diagnosis and treatment of cerebral vascular diseases (CVD) present this double condition of uncertainty, so we think that POMDP is the most suitable method to model them. In this paper, we propose a model of CVD that is based on observations obtained from neuroimaging studies such as computed tomography, magnetic resonance and ultrasound. The model is designed as a POMDP because the health status of the patient is not directly observable, and can only be deduced, with some probability, from the observations in the cerebral images. The components of the model (states, observations, actions, etc.) were defined based on specialized literature. When the model is operated, the observations are introduced, a diagnosis of the patient’s health status is made, and the most appropriate action for the recovery of health is recommended. Consultation of the probable state of health of the patient and alternative actions is also allowed.
27

Theocharous, Georgios, and Sridhar Mahadevan. "Compressing POMDPs Using Locality Preserving Non-Negative Matrix Factorization." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 4, 2010): 1147–52. http://dx.doi.org/10.1609/aaai.v24i1.7750.

Abstract:
Partially Observable Markov Decision Processes (POMDPs) are a well-established and rigorous framework for sequential decision-making under uncertainty. POMDPs are well-known to be intractable to solve exactly, and there has been significant work on finding tractable approximation methods. One well-studied approach is to find a compression of the original POMDP by projecting the belief states to a lower-dimensional space. We present a novel dimensionality reduction method for POMDPs based on locality preserving non-negative matrix factorization. Unlike previous approaches, such as Krylov compression and regular non-negative matrix factorization, our approach preserves the local geometry of the belief space manifold. We present results on standard benchmark POMDPs showing improved performance over previously explored compression algorithms for POMDPs.
28

Kraemer, Landon, and Bikramjit Banerjee. "Informed Initial Policies for Learning in Dec-POMDPs." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (September 20, 2021): 2433–34. http://dx.doi.org/10.1609/aaai.v26i1.8426.

Abstract:
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators, and local information. Prevalent Dec-POMDP solution techniques have mostly been centralized and have assumed knowledge of the model. In real-world scenarios, however, solving centrally may not be an option and model parameters may be unknown. To address this, we propose a distributed, model-free algorithm for learning Dec-POMDP policies, in which agents take turns learning, with each agent not currently learning following a static policy. For agents that have not yet learned a policy, this static policy must be initialized. We propose a principled method for learning such initial policies through interaction with the environment. We show that by using such informed initial policies, our alternate learning algorithm can find near-optimal policies for two benchmark problems.
29

Ye, Nan, Adhiraj Somani, David Hsu, and Wee Sun Lee. "DESPOT: Online POMDP Planning with Regularization." Journal of Artificial Intelligence Research 58 (January 26, 2017): 231–66. http://dx.doi.org/10.1613/jair.5328.

Abstract:
The partially observable Markov decision process (POMDP) provides a principled general framework for planning under uncertainty, but solving POMDPs optimally is computationally intractable, due to the "curse of dimensionality" and the "curse of history". To overcome these challenges, we introduce the Determinized Sparse Partially Observable Tree (DESPOT), a sparse approximation of the standard belief tree, for online planning under uncertainty. A DESPOT focuses online planning on a set of randomly sampled scenarios and compactly captures the "execution" of all policies under these scenarios. We show that the best policy obtained from a DESPOT is near-optimal, with a regret bound that depends on the representation size of the optimal policy. Leveraging this result, we give an anytime online planning algorithm, which searches a DESPOT for a policy that optimizes a regularized objective function. Regularization balances the estimated value of a policy under the sampled scenarios and the policy size, thus avoiding overfitting. The algorithm demonstrates strong experimental results, compared with some of the best online POMDP algorithms available. It has also been incorporated into an autonomous driving system for real-time vehicle control. The source code for the algorithm is available online.
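
The scenario determinization at the heart of DESPOT can be sketched compactly. This is an illustrative sketch only: `b0_sampler` and `step_det` are assumed stand-ins for the POMDP's start-state distribution and a deterministic simulator mapping a state, action, and random number to a next state, observation, and reward.

```python
import numpy as np

def value_under_scenarios(policy, b0_sampler, step_det, K, depth, gamma, rng):
    """Evaluate a policy on K determinized scenarios (sketch): each scenario fixes
    a sampled start state and a stream of random numbers, so every candidate
    policy is scored on exactly the same sampled futures."""
    total = 0.0
    for _ in range(K):
        state = b0_sampler(rng)
        randoms = rng.random(depth)        # the scenario's fixed random stream
        history, ret = [], 0.0
        for t in range(depth):
            action = policy(history)       # policies map observation histories to actions
            state, obs, r = step_det(state, action, randoms[t])
            history.append((action, obs))
            ret += gamma**t * r
        total += ret
    return total / K
```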
30

Chatterjee, Krishnendu, Martin Chmelik, and Ufuk Topcu. "Sensor Synthesis for POMDPs with Reachability Objectives." Proceedings of the International Conference on Automated Planning and Scheduling 28 (June 15, 2018): 47–55. http://dx.doi.org/10.1609/icaps.v28i1.13875.

Abstract:
Partially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize “weakest” additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent that almost-surely (with probability 1) satisfies a reachability objective. We show that the problem is NP-complete, and present a symbolic algorithm by encoding the problem into SAT instances. We illustrate trade-offs between the amount of memory of the policy and the number of additional sensors on a simple example. We have implemented our approach and consider three classical POMDP examples from the literature, and show that in all the examples the number of sensors can be significantly decreased (as compared to the existing solutions in the literature) without increasing the complexity of the policies.
31

Dressel, Louis, and Mykel Kochenderfer. "Efficient Decision-Theoretic Target Localization." Proceedings of the International Conference on Automated Planning and Scheduling 27 (June 5, 2017): 70–78. http://dx.doi.org/10.1609/icaps.v27i1.13832.

Abstract:
Partially observable Markov decision processes (POMDPs) offer a principled approach to control under uncertainty. However, POMDP solvers generally require rewards to depend only on the state and action. This limitation is unsuitable for information-gathering problems, where rewards are more naturally expressed as functions of belief. In this work, we consider target localization, an information-gathering task where an agent takes actions leading to informative observations and a concentrated belief over possible target locations. By leveraging recent theoretical and algorithmic advances, we investigate offline and online solvers that incorporate belief-dependent rewards. We extend SARSOP — a state-of-the-art offline solver — to handle belief-dependent rewards, exploring different reward strategies and showing how they can be compactly represented. We present an improved lower bound that greatly speeds convergence. POMDP-lite, an online solver, is also evaluated in the context of information-gathering tasks. These solvers are applied to control a hexcopter UAV searching for a radio frequency source—a challenging real-world problem.
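
A typical belief-dependent reward of the kind discussed above is the negative entropy of the belief over candidate target locations. The snippet below illustrates only that reward, not the modified SARSOP machinery from the paper.

```python
import numpy as np

def negative_belief_entropy(belief):
    """Belief-dependent reward (sketch): higher when the belief over possible
    target locations is concentrated, lower when it is spread out."""
    b = np.asarray(belief, dtype=float)
    nonzero = b[b > 0]
    return float(np.sum(nonzero * np.log(nonzero)))   # equals -H(b)

print(negative_belief_entropy([0.97, 0.01, 0.01, 0.01]))   # near 0: target localized
print(negative_belief_entropy([0.25, 0.25, 0.25, 0.25]))   # -log(4): uninformative belief
```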
32

Sonu, Ekhlas, Yingke Chen, and Prashant Doshi. "Decision-Theoretic Planning Under Anonymity in Agent Populations." Journal of Artificial Intelligence Research 59 (August 29, 2017): 725–70. http://dx.doi.org/10.1613/jair.5449.

Abstract:
We study the problem of self-interested planning under uncertainty in settings shared with more than a thousand other agents, each of which plans at its own individual level. We refer to such large numbers of agents as an agent population. The decision-theoretic formalism of the interactive partially observable Markov decision process (I-POMDP) is used to model the agent's self-interested planning. The first contribution of this article is a method for drastically scaling the finitely-nested I-POMDP to certain agent populations for the first time. Our method exploits two types of structure that are often exhibited by agent populations: anonymity and context-specific independence. We present a variant called the many-agent I-POMDP that models both these types of structure to plan efficiently under uncertainty in multiagent settings. In particular, the complexity of the belief update and solution in the many-agent I-POMDP is polynomial in the number of agents, compared with the exponential growth that challenges the original framework. While exploiting structure helps mitigate the curse of many agents, the well-known curse of history that afflicts I-POMDPs continues to challenge scalability in terms of the planning horizon. The second contribution of this article is an application of the branch-and-bound scheme to reduce the exponential growth of the search tree for lookahead. For this, we introduce new fast-computing upper and lower bounds for the exact value function of the many-agent I-POMDP. This speeds up the look-ahead computations without trading off optimality, and reduces both memory and run time complexity. The third contribution is a comprehensive empirical evaluation of the methods on three new problem domains: policing large protests, controlling traffic congestion at a busy intersection, and improving the AI for the popular Clash of Clans multiplayer game. We demonstrate the feasibility of exact self-interested planning in these large problems, and that our methods for speeding up the planning are effective. Altogether, these contributions represent a principled and significant advance toward moving self-interested planning under uncertainty to real-world applications.
33

Ramachandran, Aditi, Sarah Strohkorb Sebo, and Brian Scassellati. "Personalized Robot Tutoring Using the Assistive Tutor POMDP (AT-POMDP)." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 8050–57. http://dx.doi.org/10.1609/aaai.v33i01.33018050.

Abstract:
Selecting appropriate tutoring help actions that account for both a student’s content mastery and engagement level is essential for effective human tutors, indicating the critical need for these skills in autonomous tutors. In this work, we formulate the robot-student tutoring help action selection problem as the Assistive Tutor partially observable Markov decision process (AT-POMDP). We designed the AT-POMDP and derived its parameters based on data from a prior robot-student tutoring study. The policy that results from solving the AT-POMDP allows a robot tutor to decide upon the optimal tutoring help action to give a student, while maintaining a belief of the student’s mastery of the material and engagement with the task. This approach is validated through a between-subjects field study, which involved 4th grade students (n=28) interacting with a social robot solving long division problems over five sessions. Students who received help from a robot using the AT-POMDP policy demonstrated significantly greater learning gains than students who received help from a robot with a fixed help action selection policy. Our results demonstrate that this robust computational framework can be used effectively to deliver diverse and personalized tutoring support over time for students.
34

Amato, Christopher, George Konidaris, Leslie P. Kaelbling, and Jonathan P. How. "Modeling and Planning with Macro-Actions in Decentralized POMDPs." Journal of Artificial Intelligence Research 64 (March 25, 2019): 817–59. http://dx.doi.org/10.1613/jair.1.11418.

Abstract:
Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized multi-agent decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent's actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions that may require different amounts of time to execute. We model macro-actions as options in a Dec-POMDP, focusing on actions that depend only on information directly available to the agent during execution. Therefore, we model systems where coordination decisions only occur at the level of deciding which macro-actions to execute. The core technical difficulty in this setting is that the options chosen by each agent no longer terminate at the same time. We extend three leading Dec-POMDP algorithms for policy generation to the macro-action case, and demonstrate their effectiveness in both standard benchmarks and a multi-robot coordination problem. The results show that our new algorithms retain agent coordination while allowing high-quality solutions to be generated for significantly longer horizons and larger state-spaces than previous Dec-POMDP methods. Furthermore, in the multi-robot domain, we show that, in contrast to most existing methods that are specialized to a particular problem class, our approach can synthesize control policies that exploit opportunities for coordination while balancing uncertainty, sensor information, and information about other agents.
35

Tennenholtz, Guy, Uri Shalit, and Shie Mannor. "Off-Policy Evaluation in Partially Observable Environments." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10276–83. http://dx.doi.org/10.1609/aaai.v34i06.6590.

Abstract:
This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments. Off-policy evaluation under partial observability is inherently prone to bias, with risk of arbitrarily large errors. We define the problem of off-policy evaluation for Partially Observable Markov Decision Processes (POMDPs) and establish what we believe is the first off-policy evaluation result for POMDPs. In addition, we formulate a model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP. We show how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to general POMDPs. We demonstrate the pitfalls of off-policy evaluation in POMDPs using a well-known off-policy method, Importance Sampling, and compare it with our result on synthetic medical data.
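
The Importance Sampling baseline referred to above is simple enough to sketch directly. This is the standard trajectory-wise estimator under assumed inputs (logged trajectories and the two policies' action probabilities given the observation), not the paper's Decoupled-POMDP estimator.

```python
import numpy as np

def importance_sampling_ope(trajectories, pi_e, pi_b, gamma=0.99):
    """Trajectory-wise importance sampling: reweight each logged discounted return
    by the likelihood ratio of the evaluation policy to the behavior policy,
    then average over trajectories."""
    estimates = []
    for traj in trajectories:                  # traj = [(obs, action, reward), ...]
        ratio, ret = 1.0, 0.0
        for t, (obs, action, reward) in enumerate(traj):
            ratio *= pi_e(action, obs) / pi_b(action, obs)
            ret += gamma**t * reward
        estimates.append(ratio * ret)
    return float(np.mean(estimates))
```

As the abstract notes, under partial observability this estimator can be arbitrarily biased, which is exactly the failure mode the paper analyzes.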
36

Petric, Frano, Damjan Miklić, and Zdenko Kovačić. "POMDP-Based Coding of Child–Robot Interaction within a Robot-Assisted ASD Diagnostic Protocol." International Journal of Humanoid Robotics 15, no. 02 (April 2018): 1850011. http://dx.doi.org/10.1142/s0219843618500111.

Abstract:
The existing procedures for autism spectrum disorder (ASD) diagnosis are often time consuming and tiresome both for highly-trained human evaluators and children, which may be alleviated by using humanoid robots in the diagnostic process. Hence, this paper proposes a framework for robot-assisted ASD evaluation based on partially observable Markov decision process (POMDP) modeling, specifically POMDPs with mixed observability (MOMDPs). POMDP is broadly used for modeling optimal sequential decision making tasks under uncertainty. Spurred by the widely accepted autism diagnostic observation schedule (ADOS), we emulate ADOS through four tasks, whose models incorporate observations of multiple social cues such as eye contact, gestures and utterances. Relying only on those observations, the robot provides an assessment of the child’s ASD-relevant functioning level (which is partially observable) within a particular task and provides human evaluators with readable information by partitioning its belief space. Finally, we evaluate the proposed MOMDP task models and demonstrate that chaining the tasks provides fine-grained outcome quantification, which could also increase the appeal of robot-assisted diagnostic protocols in the future.
APA, Harvard, Vancouver, ISO, and other styles
37

Lev-Yehudi, Idan, Moran Barenboim, and Vadim Indelman. "Simplifying Complex Observation Models in Continuous POMDP Planning with Probabilistic Guarantees and Practice." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 18 (March 24, 2024): 20176–84. http://dx.doi.org/10.1609/aaai.v38i18.29997.

Full text
Abstract:
Solving partially observable Markov decision processes (POMDPs) with high-dimensional and continuous observations, such as camera images, is required for many real-life robotics and planning problems. Recent research has suggested machine-learned probabilistic models as observation models, but their use is currently too computationally expensive for online deployment. We deal with the question of what the implications would be of using simplified observation models for planning, while retaining formal guarantees on the quality of the solution. Our main contribution is a novel probabilistic bound based on a statistical total variation distance of the simplified model. We show that it bounds the theoretical POMDP value w.r.t. the original model, from the empirical planned value with the simplified model, by generalizing recent results on particle-belief MDP concentration bounds. Our calculations can be separated into offline and online parts, and we arrive at formal guarantees without having to access the costly model at all during planning, which is also a novel result. Finally, we demonstrate in simulation how to integrate the bound into the routine of an existing continuous online POMDP solver.
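The bound above is driven by a total variation distance between the original and simplified observation models. As a reminder of the quantity involved, here is a toy sketch for explicit discrete distributions; the paper itself works with a statistical estimate of this distance, and the variable names are assumptions.

```python
def total_variation(p, q):
    """Total variation distance between two discrete observation distributions.

    p, q: dicts mapping observations to probabilities over a shared support.
    TV(p, q) = 0.5 * sum_o |p(o) - q(o)| measures how far the simplified
    observation model can deviate from the original one.
    """
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(o, 0.0) - q.get(o, 0.0)) for o in support)

# Example: original vs. simplified observation model for one state-action pair.
p = {"hit": 0.7, "miss": 0.3}
q = {"hit": 0.6, "miss": 0.4}
print(total_variation(p, q))  # 0.1
```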
APA, Harvard, Vancouver, ISO, and other styles
38

Wu, Bo, Hong Yan Zheng, and Yan Peng Feng. "A Novel Point-Based Incremental Pruning Algorithm for POMDP." Applied Mechanics and Materials 513-517 (February 2014): 1088–91. http://dx.doi.org/10.4028/www.scientific.net/amm.513-517.1088.

Full text
Abstract:
Partially Observable Markov Decision Processes (POMDPs) provide a natural and principled framework for sequential decision-making under uncertainty. However, large-scale POMDPs suffer from the exponential growth of the belief-point and policy-tree spaces. We present a new point-based incremental pruning algorithm based on the piecewise linearity and convexity of the value function. Instead of reasoning about the whole belief space when pruning the cross-sums in POMDP policy construction, our algorithm uses belief points to perform approximate pruning by generating policy trees, and obtains the optimal policy at the belief states encountered in real time. The empirical results indicate that point-based incremental pruning with heuristic search methods can handle large POMDP domains efficiently.
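As a hedged sketch of the point-based pruning idea, only value vectors that are maximal at some sampled belief are kept, rather than pruning over the entire belief simplex. The representations below are assumptions, not the paper's implementation.

```python
import numpy as np

def point_based_prune(alpha_vectors, belief_points):
    """Keep only the alpha-vectors that are optimal at at least one sampled belief.

    alpha_vectors: list of 1-D numpy arrays (one value per state).
    belief_points: list of 1-D numpy arrays (probability distributions over states).
    Exact pruning would instead require a linear program over the whole belief simplex.
    """
    keep = set()
    for b in belief_points:
        values = [float(np.dot(alpha, b)) for alpha in alpha_vectors]
        keep.add(int(np.argmax(values)))
    return [alpha_vectors[i] for i in sorted(keep)]

# Tiny example with a 2-state POMDP: the dominated third vector is dropped.
alphas = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.4, 0.4])]
beliefs = [np.array([0.9, 0.1]), np.array([0.2, 0.8])]
print(len(point_based_prune(alphas, beliefs)))  # 2
```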
APA, Harvard, Vancouver, ISO, and other styles
39

Simão, Thiago D., Marnix Suilen, and Nils Jansen. "Safe Policy Improvement for POMDPs via Finite-State Controllers." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (June 26, 2023): 15109–17. http://dx.doi.org/10.1609/aaai.v37i12.26763.

Full text
Abstract:
We study safe policy improvement (SPI) for partially observable Markov decision processes (POMDPs). SPI is an offline reinforcement learning (RL) problem that assumes access to (1) historical data about an environment, and (2) the so-called behavior policy that previously generated this data by interacting with the environment. SPI methods neither require access to a model nor the environment itself, and aim to reliably improve upon the behavior policy in an offline manner. Existing methods make the strong assumption that the environment is fully observable. In our novel approach to the SPI problem for POMDPs, we assume that a finite-state controller (FSC) represents the behavior policy and that finite memory is sufficient to derive optimal policies. This assumption allows us to map the POMDP to a finite-state fully observable MDP, the history MDP. We estimate this MDP by combining the historical data and the memory of the FSC, and compute an improved policy using an off-the-shelf SPI algorithm. The underlying SPI method constrains the policy space according to the available data, such that the newly computed policy only differs from the behavior policy when sufficient data is available. We show that this new policy, converted into a new FSC for the (unknown) POMDP, outperforms the behavior policy with high probability. Experimental results on several well-established benchmarks show the applicability of the approach, even in cases where finite memory is not sufficient.
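A rough sketch of the mapping described above: the pair (FSC memory node, observation) is treated as the state of a finite, fully observable history MDP, and its transition counts are estimated from the logged data. The episode format, the initial memory node, and the helper names are assumptions for illustration only.

```python
from collections import defaultdict

def estimate_history_mdp(episodes, fsc_update):
    """Estimate transition counts of a history MDP whose states are (memory node, observation).

    episodes: list of [(obs, action, reward, next_obs), ...] generated by the FSC behavior policy.
    fsc_update(node, action, next_obs) -> next memory node of the finite-state controller.
    Counts like these tell an off-the-shelf SPI method where enough data exists
    to safely deviate from the behavior policy.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for episode in episodes:
        node = 0  # assume the FSC starts in memory node 0
        for obs, action, reward, next_obs in episode:
            next_node = fsc_update(node, action, next_obs)
            counts[((node, obs), action)][(next_node, next_obs)] += 1
            node = next_node
    return counts
```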
APA, Harvard, Vancouver, ISO, and other styles
40

Warnquist, Håkan, Jonas Kvarnström, and Patrick Doherty. "Exploiting Fully Observable and Deterministic Structures in Goal POMDPs." Proceedings of the International Conference on Automated Planning and Scheduling 23 (June 2, 2013): 242–50. http://dx.doi.org/10.1609/icaps.v23i1.13554.

Full text
Abstract:
When parts of the states in a goal POMDP are fully observable and some actions are deterministic it is possible to take advantage of these properties to efficiently generate approximate solutions. Actions that deterministically affect the fully observable component of the world state can be abstracted away and combined into macro actions, permitting a planner to converge more quickly. This processing can be separated from the main search procedure, allowing us to leverage existing POMDP solvers. Theoretical results show how a POMDP can be analyzed to identify the exploitable properties and formal guarantees are provided showing that the use of macro actions preserves solvability. The efficiency of the method is demonstrated with examples when used in combination with existing POMDP solvers.
APA, Harvard, Vancouver, ISO, and other styles
41

Khalvati, Koosha, and Alan Mackworth. "A Fast Pairwise Heuristic for Planning under Uncertainty." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 30, 2013): 503–9. http://dx.doi.org/10.1609/aaai.v27i1.8672.

Full text
Abstract:
The POMDP (Partially Observable Markov Decision Process) is a mathematical framework that models planning under uncertainty. Solving a POMDP is an intractable problem, and even the state-of-the-art POMDP solvers are too computationally expensive for large domains. This is a major bottleneck. In this paper, we propose a new heuristic, called the pairwise heuristic, that can be used in a one-step greedy strategy to find a near-optimal solution for POMDP problems very quickly. This approach is a good candidate for large problems where a real-time solution is a necessity but exact optimality of the solution is not vital. The pairwise heuristic uses the optimal solutions for pairs of states. For each pair of states in the POMDP, we find the optimal sequence of actions to resolve the uncertainty and to maximize the reward, given that the agent is uncertain about which state of the pair it is in. We then use these sequences as a heuristic and find the optimal action in each step of the greedy strategy using this heuristic. We have tested our method on the available large classical benchmarks from various domains. The resulting total reward is close to, if not greater than, the total reward obtained by other state-of-the-art POMDP solvers, while the time required to find the solution is always much less.
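To make the one-step greedy use of pairwise solutions concrete, the sketch below scores each action by precomputed pairwise values weighted by the current belief over state pairs; the lookup interface and this particular combination rule are assumptions, since the paper's exact scheme is not reproduced here.

```python
def greedy_action(belief, actions, pairwise_value):
    """One-step greedy action choice using a pairwise heuristic.

    belief: dict mapping state -> probability.
    pairwise_value(s1, s2, a): precomputed value of starting, with action a, the optimal
        sequence that resolves the uncertainty between the pair (s1, s2).
    Each action is scored by the belief-weighted sum over state pairs.
    """
    def score(a):
        return sum(belief[s1] * belief[s2] * pairwise_value(s1, s2, a)
                   for s1 in belief for s2 in belief)
    return max(actions, key=score)
```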
APA, Harvard, Vancouver, ISO, and other styles
42

Luo, Yuanfu, Haoyu Bai, David Hsu, and Wee Sun Lee. "Importance sampling for online planning under uncertainty." International Journal of Robotics Research 38, no. 2-3 (June 19, 2018): 162–81. http://dx.doi.org/10.1177/0278364918780322.

Full text
Abstract:
The partially observable Markov decision process (POMDP) provides a principled general framework for robot planning under uncertainty. Leveraging the idea of Monte Carlo sampling, recent POMDP planning algorithms have scaled up to various challenging robotic tasks, including real-time online planning for autonomous vehicles. To further improve online planning performance, this paper presents IS-DESPOT, which introduces importance sampling into DESPOT, a state-of-the-art sampling-based POMDP algorithm for planning under uncertainty. Importance sampling improves DESPOT’s performance when there are critical but rare events, which are difficult to sample. We prove that IS-DESPOT retains the theoretical guarantee of DESPOT. We demonstrate empirically that importance sampling significantly improves the performance of online POMDP planning for suitable tasks. We also present a general method for learning the importance sampling distribution.
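The key idea above, over-sampling rare but critical outcomes and reweighting them, can be sketched independently of DESPOT itself. The proposal distribution and function interfaces below are assumptions, not the IS-DESPOT implementation.

```python
def is_weighted_return(transition_prob, proposal_prob, sample_next, rollout_value,
                       state, action, n=1000):
    """Estimate the value of (state, action) by importance sampling over next states.

    sample_next(state, action): draws a next state from a proposal distribution that
        over-samples rare but critical outcomes (e.g., collisions).
    transition_prob / proposal_prob: p(s'|s,a) and q(s'|s,a).
    Each sampled outcome is reweighted by p/q so the estimate remains unbiased.
    """
    total = 0.0
    for _ in range(n):
        nxt = sample_next(state, action)
        w = transition_prob(state, action, nxt) / proposal_prob(state, action, nxt)
        total += w * rollout_value(nxt)
    return total / n
```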
APA, Harvard, Vancouver, ISO, and other styles
43

Rozek, Brandon, Junkyu Lee, Harsha Kokel, Michael Katz, and Shirin Sohrabi. "Partially Observable Hierarchical Reinforcement Learning with AI Planning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (March 24, 2024): 23635–36. http://dx.doi.org/10.1609/aaai.v38i21.30504.

Full text
Abstract:
Partially observable Markov decision processes (POMDPs) challenge reinforcement learning agents due to incomplete knowledge of the environment. Even assuming monotonicity in uncertainty, it is difficult for an agent to know how and when to stop exploring for a given task. In this abstract, we discuss how to use hierarchical reinforcement learning (HRL) and AI Planning (AIP) to improve exploration when the agent knows possible valuations of unknown predicates and how to discover them. By encoding the uncertainty in an abstract planning model, the agent can derive a high-level plan which is then used to decompose the overall POMDP into a tree of semi-POMDPs for training. We evaluate our agent's performance on the MiniGrid domain and show how guided exploration may improve agent performance.
APA, Harvard, Vancouver, ISO, and other styles
44

BUI, TRUNG H., MANNES POEL, ANTON NIJHOLT, and JOB ZWIERS. "A tractable hybrid DDN–POMDP approach to affective dialogue modeling for probabilistic frame-based dialogue systems." Natural Language Engineering 15, no. 2 (April 2009): 273–307. http://dx.doi.org/10.1017/s1351324908005032.

Full text
Abstract:
We propose a novel approach to developing a tractable affective dialogue model for probabilistic frame-based dialogue systems. The affective dialogue model, based on Partially Observable Markov Decision Process (POMDP) and Dynamic Decision Network (DDN) techniques, is composed of two main parts: the slot-level dialogue manager and the global dialogue manager. It has two new features: (1) being able to deal with a large number of slots and (2) being able to take into account some aspects of the user's affective state in deriving the adaptive dialogue strategies. Our implemented prototype dialogue manager can handle hundreds of slots, where each individual slot might have hundreds of values. Our approach is illustrated through a route navigation example in the crisis management domain. We conducted various experiments to evaluate our approach and to compare it with approximate POMDP techniques and handcrafted policies. The experimental results showed that the DDN–POMDP policy outperforms three handcrafted policies when the user's action error is induced by stress as well as when the observation error increases. Further, performance of the one-step look-ahead DDN–POMDP policy after optimizing its internal reward is close to state-of-the-art approximate POMDP counterparts.
APA, Harvard, Vancouver, ISO, and other styles
45

Walraven, Erwin, and Matthijs T. J. Spaan. "Point-Based Value Iteration for Finite-Horizon POMDPs." Journal of Artificial Intelligence Research 65 (July 11, 2019): 307–41. http://dx.doi.org/10.1613/jair.1.11324.

Full text
Abstract:
Partially Observable Markov Decision Processes (POMDPs) are a popular formalism for sequential decision making in partially observable environments. Since solving POMDPs to optimality is a difficult task, point-based value iteration methods are widely used. These methods compute an approximate POMDP solution, and in some cases they even provide guarantees on the solution quality, but these algorithms have been designed for problems with an infinite planning horizon. In this paper we discuss why state-of-the-art point-based algorithms cannot be easily applied to finite-horizon problems that do not include discounting. Subsequently, we present a general point-based value iteration algorithm for finite-horizon problems which provides solutions with guarantees on solution quality. Furthermore, we introduce two heuristics to reduce the number of belief points considered during execution, which lowers the computational requirements. In experiments we demonstrate that the algorithm is an effective method for solving finite-horizon POMDPs.
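As a reminder of the point-based backup these methods build on, here is a minimal sketch of a single backup at one belief point for a small discrete POMDP; in the finite-horizon setting described above, the backup is repeated per stage without discounting (gamma = 1). Array shapes and names are assumptions, not the paper's code.

```python
import numpy as np

def point_based_backup(b, alphas, T, Z, R, gamma=1.0):
    """One point-based value-iteration backup at belief b.

    b: belief over states, shape (S,).
    alphas: non-empty list of next-stage alpha-vectors, each shape (S,).
    T[a, s, s']: transition probabilities; Z[a, s', o]: observation probabilities.
    R[a, s]: immediate reward. gamma=1.0 matches the undiscounted finite-horizon setting.
    Returns the best new alpha-vector at b and the action achieving it.
    """
    A, S, _ = T.shape
    O = Z.shape[2]
    best_alpha, best_action, best_value = None, None, -np.inf
    for a in range(A):
        new_alpha = R[a].astype(float)
        for o in range(O):
            # g_{a,o}(s) = sum_{s'} T(s,a,s') Z(s',a,o) alpha(s'); pick the vector best at b.
            candidates = [(T[a] * Z[a][:, o]) @ alpha for alpha in alphas]
            scores = [float(b @ g) for g in candidates]
            new_alpha = new_alpha + gamma * candidates[int(np.argmax(scores))]
        value = float(b @ new_alpha)
        if value > best_value:
            best_alpha, best_action, best_value = new_alpha, a, value
    return best_alpha, best_action
```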
APA, Harvard, Vancouver, ISO, and other styles
46

Park, Jaeyoung, Kee-Eung Kim, and Yoon-Kyu Song. "A POMDP-Based Optimal Control of P300-Based Brain-Computer Interfaces." Proceedings of the AAAI Conference on Artificial Intelligence 25, no. 1 (August 4, 2011): 1559–62. http://dx.doi.org/10.1609/aaai.v25i1.7956.

Full text
Abstract:
Most of the previous work on brain-computer interfaces (BCIs) exploiting the P300 in electroencephalography (EEG) has focused on low-level signal processing algorithms such as feature extraction and classification methods. Although a significant improvement has been made in the past, the accuracy of detecting P300 is limited by the inherently low signal-to-noise ratio in EEGs. In this paper, we present a systematic approach to optimize the interface using partially observable Markov decision processes (POMDPs). Through experiments involving human subjects, we show the P300 speller system that is optimized using the POMDP achieves a significant performance improvement in terms of the communication bandwidth in the interaction.
APA, Harvard, Vancouver, ISO, and other styles
47

Lee, Nian-Ze, and Jie-Hong R. Jiang. "Dependency Stochastic Boolean Satisfiability: A Logical Formalism for NEXPTIME Decision Problems with Uncertainty." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 5 (May 18, 2021): 3877–85. http://dx.doi.org/10.1609/aaai.v35i5.16506.

Full text
Abstract:
Stochastic Boolean Satisfiability (SSAT) is a logical formalism to model decision problems with uncertainty, such as Partially Observable Markov Decision Process (POMDP) for verification of probabilistic systems. SSAT, however, is limited by its descriptive power within the PSPACE complexity class. More complex problems, such as the NEXPTIME-complete Decentralized POMDP (Dec-POMDP), cannot be succinctly encoded with SSAT. To provide a logical formalism of such problems, we generalize the Dependency Quantified Boolean Formula (DQBF), a representative problem in the NEXPTIME-complete class, to its stochastic variant, named Dependency SSAT (DSSAT), and show that DSSAT is also NEXPTIME-complete. We demonstrate the potential applications of DSSAT to circuit synthesis of probabilistic and approximate design. Furthermore, to study the descriptive power of DSSAT, we establish a polynomial-time reduction from Dec-POMDP to DSSAT. With the theoretical foundations paved in this work, we hope to encourage the development of DSSAT solvers for potential broad applications.
APA, Harvard, Vancouver, ISO, and other styles
48

Wan, Xiao Ping, and Shu Yu Li. "SHP-VI Method of Solving DEC-POMDP Problem." Advanced Materials Research 926-930 (May 2014): 3245–49. http://dx.doi.org/10.4028/www.scientific.net/amr.926-930.3245.

Full text
Abstract:
The DEC-POMDP (Decentralized Partially Observable Markov Decision Process) is an important model for collaborative multi-agent decision making, but because of the enormous state space and the vast policy space of DEC-POMDP problems, solving them is very difficult. As the agents move from the initial state towards the goal state while interacting with the environment, the system's maximum benefit is often concentrated in a small number of high-reward states. This article searches for a shortest Hamiltonian path from the initial belief state to the goal state, performs a forward search along the corresponding action sequence to obtain a trajectory through the belief space, and then iterates the belief value function backwards along this trajectory, thereby forming the optimal policy corresponding to the belief trajectory with the largest gain. In this way, value iteration based on the shortest Hamiltonian path searches for the optimal belief path and can solve DEC-POMDP problems with larger state spaces.
APA, Harvard, Vancouver, ISO, and other styles
49

Yang, Qiming, Jiancheng Xu, Haibao Tian, and Yong Wu. "Decision Modeling of UAV On-Line Path Planning Based on IMM." Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 36, no. 2 (April 2018): 323–31. http://dx.doi.org/10.1051/jnwpu/20183620323.

Full text
Abstract:
In order to enhance the capability of UAVs to track targets autonomously, a model for UAV on-line path planning is established based on the theoretical framework of the partially observable Markov decision process (POMDP). The elements of the POMDP model are analyzed and described. According to the diversity of target motion in the real world, the state-transition law of the POMDP model is described by the Interactive Multiple Model (IMM) method in order to adapt to changes in target maneuvering. The action strategy of the UAV is calculated through the nominal belief-state optimization (NBO) algorithm, which is designed to search for the optimal action policy that minimizes the cumulative cost of actions. The generated action strategy controls the UAV's flight. The simulation results show that the established POMDP model achieves autonomous planning of the UAV route and can control the UAV to track the target effectively. The planned path is more reasonable and efficient than the result obtained with a single state-transition law.
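A hedged sketch of how an IMM-style transition could enter the belief prediction step: several candidate motion models are mixed according to model probabilities, and each mixed belief is propagated through its own transition model. The matrices and this particular prediction step are a standard IMM-flavoured illustration under our own assumptions, not the paper's code.

```python
import numpy as np

def imm_predict(beliefs_per_model, model_probs, mode_switch, transition_models):
    """One IMM-style prediction step for the target-state belief (no measurement update).

    beliefs_per_model: (M, S) beliefs over target states, one per motion model.
    model_probs: (M,) probabilities of each motion model (assumed strictly positive after mixing).
    mode_switch: (M, M) Markov matrix, mode_switch[i, j] = P(model j next | model i now).
    transition_models: (M, S, S) state-transition matrix of each motion model.
    Returns per-model predicted beliefs, updated model probabilities, and the combined belief.
    """
    # Mixing: probability of having been in model i given the next model is j.
    pred_model_probs = mode_switch.T @ model_probs                          # (M,)
    mix = (mode_switch * model_probs[:, None]) / pred_model_probs[None, :]  # (M, M)
    mixed = mix.T @ beliefs_per_model                                       # (M, S)
    # Propagate each mixed belief through its own motion model.
    predicted = np.stack([mixed[j] @ transition_models[j] for j in range(len(model_probs))])
    combined = pred_model_probs @ predicted                                 # (S,)
    return predicted, pred_model_probs, combined
```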
APA, Harvard, Vancouver, ISO, and other styles
50

Wang, Erli, Hanna Kurniawati, and Dirk Kroese. "An On-Line Planner for POMDPs with Large Discrete Action Space: A Quantile-Based Approach." Proceedings of the International Conference on Automated Planning and Scheduling 28 (June 15, 2018): 273–77. http://dx.doi.org/10.1609/icaps.v28i1.13906.

Full text
Abstract:
Making principled decisions in the presence of uncertainty is often facilitated by Partially Observable Markov Decision Processes (POMDPs). Despite tremendous advances in POMDP solvers, finding good policies with large action spaces remains difficult. To alleviate this difficulty, this paper presents an on-line approximate solver, called Quantile-Based Action Selector (QBASE). It uses quantile-statistics to adaptively evaluate a small subset of the action space without sacrificing the quality of the generated decision strategies by much. Experiments on four different robotics tasks with up to 10,000 actions indicate that QBASE can generate substantially better strategies than a state-of-the-art method.
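A rough sketch of the quantile idea: rather than evaluating every action, sample a small batch and keep only those whose estimated values clear a chosen quantile of what has been seen so far. The sampling scheme and threshold rule below are our assumptions; the actual QBASE algorithm is adaptive and more involved.

```python
import random

def quantile_action_subset(actions, estimate_value, batch_size=32, quantile=0.9):
    """Select a promising subset of a very large discrete action space.

    actions: list (possibly huge) of candidate actions.
    estimate_value(a): cheap Monte Carlo estimate of an action's value at the current belief.
    Keeps only sampled actions whose estimates reach the given quantile, so the planner
    spends its simulation budget on a small, promising subset.
    """
    sampled = random.sample(actions, min(batch_size, len(actions)))
    values = {a: estimate_value(a) for a in sampled}
    ordered = sorted(values.values())
    threshold = ordered[int(quantile * (len(ordered) - 1))]
    return [a for a, v in values.items() if v >= threshold]
```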
APA, Harvard, Vancouver, ISO, and other styles