Academic literature on the topic 'SMT, planning, POMDP, POMCP'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'SMT, planning, POMDP, POMCP.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Journal articles on the topic "SMT, planning, POMDP, POMCP"

1

Mazzi, Giulio, Alberto Castellini, and Alessandro Farinelli. "Rule-based Shielding for Partially Observable Monte-Carlo Planning." Proceedings of the International Conference on Automated Planning and Scheduling 31 (May 17, 2021): 243–51. http://dx.doi.org/10.1609/icaps.v31i1.15968.

Full text
Abstract:
Partially Observable Monte-Carlo Planning (POMCP) is a powerful online algorithm able to generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. The lack of an explicit representation however hinders policy interpretability and makes policy verification very complex. In this work, we propose two contributions. The first is a method for identifying unexpected actions selected by POMCP with respect to expert prior knowledge of the task. The second is a shielding approach that prevents POMCP from selecting unexpected actions. The first method is based on Satisfiability Modulo Theory (SMT). It inspects traces (i.e., sequences of belief-action-observation triplets) generated by POMCP to compute the parameters of logical formulas about policy properties defined by the expert. The second contribution is a module that uses online the logical formulas to identify anomalous actions selected by POMCP and substitute those actions with actions that satisfy the logical formulas fulfilling expert knowledge. We evaluate our approach on Tiger, a standard benchmark for POMDPs, and a real-world problem related to mobile robot navigation. Results show that the shielded POMCP outperforms the standard POMCP in a case study in which a wrong parameter of POMCP makes it select wrong actions from time to time. Moreover, we show that the approach keeps good performance also if the parameters of the logical formula are optimized using trajectories containing some wrong actions.
APA, Harvard, Vancouver, ISO, and other styles
2

Zhang, Zongzhang, Michael Littman, and Xiaoping Chen. "Covering Number as a Complexity Measure for POMDP Planning and Learning." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (September 20, 2021): 1853–59. http://dx.doi.org/10.1609/aaai.v26i1.8360.

Full text
Abstract:
Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number for the reachable belief space, which is a set of belief points that are reachable from the initial belief point, has interesting links with the complexity of POMDP planning, theoretically. In this paper, we present empirical evidence that the covering number for the reachable belief space (or just ``covering number", for brevity) is a far better complexity measure than the state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent learning algorithm for POMDPs without reset given knowledge of the covering number.
APA, Harvard, Vancouver, ISO, and other styles
3

Omidshafiei, Shayegan, Ali–Akbar Agha–Mohammadi, Christopher Amato, Shih–Yuan Liu, Jonathan P. How, and John Vian. "Decentralized control of multi-robot partially observable Markov decision processes using belief space macro-actions." International Journal of Robotics Research 36, no. 2 (February 2017): 231–58. http://dx.doi.org/10.1177/0278364917692864.

Full text
Abstract:
This work focuses on solving general multi-robot planning problems in continuous spaces with partial observability given a high-level domain description. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems. However, representing and solving Dec-POMDPs is often intractable for large problems. This work extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) to take advantage of the high-level representations that are natural for multi-robot problems and to facilitate scalable solutions to large discrete and continuous problems. The Dec-POSMDP formulation uses task macro-actions created from lower-level local actions that allow for asynchronous decision-making by the robots, which is crucial in multi-robot domains. This transformation from Dec-POMDPs to Dec-POSMDPs with a finite set of automatically-generated macro-actions allows use of efficient discrete-space search algorithms to solve them. The paper presents algorithms for solving Dec-POSMDPs, which are more scalable than previous methods since they can incorporate closed-loop belief space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed algorithms are then evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent realistic problems and provide high-quality solutions for large-scale problems.
APA, Harvard, Vancouver, ISO, and other styles
4

Ye, Nan, Adhiraj Somani, David Hsu, and Wee Sun Lee. "DESPOT: Online POMDP Planning with Regularization." Journal of Artificial Intelligence Research 58 (January 26, 2017): 231–66. http://dx.doi.org/10.1613/jair.5328.

Full text
Abstract:
The partially observable Markov decision process (POMDP) provides a principled general framework for planning under uncertainty, but solving POMDPs optimally is computationally intractable, due to the "curse of dimensionality" and the "curse of history". To overcome these challenges, we introduce the Determinized Sparse Partially Observable Tree (DESPOT), a sparse approximation of the standard belief tree, for online planning under uncertainty. A DESPOT focuses online planning on a set of randomly sampled scenarios and compactly captures the "execution" of all policies under these scenarios. We show that the best policy obtained from a DESPOT is near-optimal, with a regret bound that depends on the representation size of the optimal policy. Leveraging this result, we give an anytime online planning algorithm, which searches a DESPOT for a policy that optimizes a regularized objective function. Regularization balances the estimated value of a policy under the sampled scenarios and the policy size, thus avoiding overfitting. The algorithm demonstrates strong experimental results, compared with some of the best online POMDP algorithms available. It has also been incorporated into an autonomous driving system for real-time vehicle control. The source code for the algorithm is available online.
APA, Harvard, Vancouver, ISO, and other styles
5

Chatterjee, Krishnendu, Martin Chmelik, and Ufuk Topcu. "Sensor Synthesis for POMDPs with Reachability Objectives." Proceedings of the International Conference on Automated Planning and Scheduling 28 (June 15, 2018): 47–55. http://dx.doi.org/10.1609/icaps.v28i1.13875.

Full text
Abstract:
Partially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize “weakest” additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent that almost-surely (with probability 1) satisfies a reachability objective. We show that the problem is NP-complete, and present a symbolic algorithm by encoding the problem into SAT instances. We illustrate trade-offs between the amount of memory of the policy and the number of additional sensors on a simple example. We have implemented our approach and consider three classical POMDP examples from the literature, and show that in all the examples the number of sensors can be significantly decreased (as compared to the existing solutions in the literature) without increasing the complexity of the policies.
APA, Harvard, Vancouver, ISO, and other styles
6

NI, YAODONG, and ZHI-QIANG LIU. "BOUNDED-PARAMETER PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES: FRAMEWORK AND ALGORITHM." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 21, no. 06 (December 2013): 821–63. http://dx.doi.org/10.1142/s0218488513500396.

Full text
Abstract:
Partially observable Markov decision processes (POMDPs) are powerful for planning under uncertainty. However, it is usually impractical to employ a POMDP with exact parameters to model the real-life situation precisely, due to various reasons such as limited data for learning the model, inability of exact POMDPs to model dynamic situations, etc. In this paper, assuming that the parameters of POMDPs are imprecise but bounded, we formulate the framework of bounded-parameter partially observable Markov decision processes (BPOMDPs). A modified value iteration is proposed as a basic strategy for tackling parameter imprecision in BPOMDPs. In addition, we design the UL-based value iteration algorithm, in which each value backup is based on two sets of vectors called U-set and L-set. We propose four strategies for computing U-set and L-set. We analyze theoretically the computational complexity and the reward loss of the algorithm. The effectiveness and robustness of the algorithm are shown empirically.
APA, Harvard, Vancouver, ISO, and other styles
7

Spaan, M. T. J., and N. Vlassis. "Perseus: Randomized Point-based Value Iteration for POMDPs." Journal of Artificial Intelligence Research 24 (August 1, 2005): 195–220. http://dx.doi.org/10.1613/jair.1659.

Full text
Abstract:
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems.
APA, Harvard, Vancouver, ISO, and other styles
8

Amato, Christopher, George Konidaris, Ariel Anders, Gabriel Cruz, Jonathan P. How, and Leslie P. Kaelbling. "Policy search for multi-robot coordination under uncertainty." International Journal of Robotics Research 35, no. 14 (December 2016): 1760–78. http://dx.doi.org/10.1177/0278364916679611.

Full text
Abstract:
We introduce a principled method for multi-robot coordination based on a general model (termed a MacDec-POMDP) of multi-robot cooperative planning in the presence of stochasticity, uncertain sensing, and communication limitations. A new MacDec-POMDP planning algorithm is presented that searches over policies represented as finite-state controllers, rather than the previous policy tree representation. Finite-state controllers can be much more concise than trees, are much easier to interpret, and can operate over an infinite horizon. The resulting policy search algorithm requires a substantially simpler simulator that models only the outcomes of executing a given set of motor controllers, not the details of the executions themselves and can solve significantly larger problems than existing MacDec-POMDP planners. We demonstrate significant performance improvements over previous methods and show that our method can be used for actual multi-robot systems through experiments on a cooperative multi-robot bartending domain.
APA, Harvard, Vancouver, ISO, and other styles
9

Pineau, J., G. Gordon, and S. Thrun. "Anytime Point-Based Approximations for Large POMDPs." Journal of Artificial Intelligence Research 27 (November 26, 2006): 335–80. http://dx.doi.org/10.1613/jair.2078.

Full text
Abstract:
The Partially Observable Markov Decision Process has long been recognized as a rich framework for real-world planning and control problems, especially in robotics. However exact solutions in this framework are typically computationally intractable for all but the smallest problems. A well-known technique for speeding up POMDP solving involves performing value backups at specific belief points, rather than over the entire belief simplex. The efficiency of this approach, however, depends greatly on the selection of points. This paper presents a set of novel techniques for selecting informative belief points which work well in practice. The point selection procedure is combined with point-based value backups to form an effective anytime POMDP algorithm called Point-Based Value Iteration (PBVI). The first aim of this paper is to introduce this algorithm and present a theoretical analysis justifying the choice of belief selection technique. The second aim of this paper is to provide a thorough empirical comparison between PBVI and other state-of-the-art POMDP methods, in particular the Perseus algorithm, in an effort to highlight their similarities and differences. Evaluation is performed using both standard POMDP domains and realistic robotic tasks.
APA, Harvard, Vancouver, ISO, and other styles
10

Wu, Chenyang, Rui Kong, Guoyu Yang, Xianghan Kong, Zongzhang Zhang, Yang Yu, Dong Li, and Wulong Liu. "LB-DESPOT: Efficient Online POMDP Planning Considering Lower Bound in Action Selection (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (May 18, 2021): 15927–28. http://dx.doi.org/10.1609/aaai.v35i18.17960.

Full text
Abstract:
Partially observable Markov decision process (POMDP) is an extension to MDP. It handles the state uncertainty by specifying the probability of getting a particular observation given the current state. DESPOT is one of the most popular scalable online planning algorithms for POMDPs, which manages to significantly reduce the size of the decision tree while deriving a near-optimal policy by considering only $K$ scenarios. Nevertheless, there is a gap in action selection criteria between planning and execution in DESPOT. During the planning stage, it keeps choosing the action with the highest upper bound, whereas when the planning ends, the action with the highest lower bound is chosen for execution. Here, we propose LB-DESPOT to alleviate this issue, which utilizes the lower bound in selecting an action branch to expand. Empirically, our method has attained better performance than DESPOT and POMCP, which is another state-of-the-art, on several challenging POMDP benchmark tasks.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "SMT, planning, POMDP, POMCP"

1

Mazzi, Giulio, Alberto Castellini, and Alessandro Farinelli. "Rule-Based Policy Interpretation and Shielding for Partially Observable Monte Carlo Planning." Doctoral thesis, 2022. http://hdl.handle.net/11562/1067927.

Full text
Abstract:
Partially Observable Monte Carlo Planning (POMCP) is a powerful online algorithm that can generate approximate policies for large Partially Observable Markov Decision Processes. The online nature of this method supports scalability by avoiding complete policy representation. However, the lack of an explicit representation of the policy hinders interpretability. In this thesis, we propose a methodology based on Maximum Satisfiability Modulo Theory (MAX-SMT) for analyzing POMCP policies by inspecting their traces, namely, sequences of belief-action pairs generated by the algorithm. The proposed method explores local properties of the policy to build a compact and informative summary of the policy behaviour. This representation exploits a high-level description encoded using logical formulas that domain experts can provide. The final formula can be used to identify unexpected decisions, namely, decisions that violate the expert indications. We show that this identification process can be used offline (to improve the explainability of the policy and to identify anomalous behaviours) or online (to shield the decisions of the POMCP algorithm). We also present an active methodology that can effectively query a POMCP policy to build more reliable descriptions quickly. We extensively evaluate our methodologies on two standard benchmarks for POMDPs, namely, emph{tiger} and emph{rocksample}, and on a problem related to velocity regulation in mobile robot navigation. Results show that our approach achieves good performance due to its capability to exploit experts' knowledge of the domains. Specifically, our approach can be used both to identify anomalous behaviours in faulty POMCPs and to improve the performance of the system by using the shielding mechanism. In the first case, we test the methodology against a state-of-the-art anomaly detection algorithm, while in the second, we compared the performance of shielded and unshielded POMCPs. We implemented our methodology in CC, and the code is open-source and available at href{https://github.com/GiuMaz/XPOMCP}{https://github.com/GiuMaz/XPOMCP}.
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "SMT, planning, POMDP, POMCP"

1

Wang, Yunbo, Bo Liu, Jiajun Wu, Yuke Zhu, Simon S. Du, Li Fei-Fei, and Joshua B. Tenenbaum. "DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/579.

Full text
Abstract:
A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty. We cast POMDP filtering and planning problems as two closely related Sequential Monte Carlo (SMC) processes, one over the real states and the other over the future optimal trajectories, and combine the merits of these two parts in a new model named the DualSMC network. In particular, we first introduce an adversarial particle filter that leverages the adversarial relationship between its internal components. Based on the filtering results, we then propose a planning algorithm that extends the previous SMC planning approach [Piche et al., 2018] to continuous POMDPs with an uncertainty-dependent policy. Crucially, not only can DualSMC handle complex observations such as image input but also it remains highly interpretable. It is shown to be effective in three continuous POMDP domains: the floor positioning domain, the 3D light-dark navigation domain, and a modified Reacher domain.
APA, Harvard, Vancouver, ISO, and other styles
2

Yang, Shuo, Xinjun Mao, and Wanwei Liu. "Towards an Extended POMDP Planning Approach with Adjoint Action Model for Robotic Task." In 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC). IEEE, 2020. http://dx.doi.org/10.1109/smc42975.2020.9283277.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Phan, Thomy, Thomas Gabor, Robert Müller, Christoph Roch, and Claudia Linnhoff-Popien. "Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/778.

Full text
Abstract:
We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory bounded approach to partially observable open-loop planning. SYMBOL maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded by the planning horizon and can be automatically adapted according to the underlying domain without any prior domain knowledge beyond a generative model. We empirically test SYMBOL in four large POMDP benchmark problems to demonstrate its effectiveness and robustness w.r.t. the choice of hyperparameters and evaluate its adaptive memory consumption. We also compare its performance with other open-loop planning algorithms and POMCP.
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!

To the bibliography