Academic literature on the topic 'POMDP'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'POMDP.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "POMDP"

1

Zhang, N. L., and W. Liu. "A Model Approximation Scheme for Planning in Partially Observable Stochastic Domains." Journal of Artificial Intelligence Research 7 (November 1, 1997): 199–230. http://dx.doi.org/10.1613/jair.419.

Full text
Abstract:
Partially observable Markov decision processes (POMDPs) are a natural model for planning problems where effects of actions are nondeterministic and the state of the world is not completely observable. It is difficult to solve POMDPs exactly. This paper proposes a new approximation scheme. The basic idea is to transform a POMDP into another one where additional information is provided by an oracle. The oracle informs the planning agent that the current state of the world is in a certain region. The transformed POMDP is consequently said to be region observable. It is easier to solve than the original POMDP. We propose to solve the transformed POMDP and use its optimal policy to construct an approximate policy for the original POMDP. By controlling the amount of additional information that the oracle provides, it is possible to find a proper tradeoff between computational time and approximation quality. In terms of algorithmic contributions, we study in detail how to exploit region observability in solving the transformed POMDP. To facilitate the study, we also propose a new exact algorithm for general POMDPs. The algorithm is conceptually simple and yet is significantly more efficient than all previous exact algorithms.
APA, Harvard, Vancouver, ISO, and other styles
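Illustrative note (not from the cited paper): the planning problem in this and the following abstracts rests on the standard POMDP belief update. With transition model T, observation model O, belief b, action a and observation o, the updated belief is

b'(s') = \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)}{\Pr(o \mid b, a)}, \qquad \Pr(o \mid b, a) = \sum_{s' \in S} O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s).

Roughly speaking, the region-observable transformation described above would additionally restrict the support of b' to the region announced by the oracle before renormalizing.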
2

Kim, Sung-Kyun, Oren Salzman, and Maxim Likhachev. "POMHDP: Search-Based Belief Space Planning Using Multiple Heuristics." Proceedings of the International Conference on Automated Planning and Scheduling 29 (May 25, 2021): 734–44. http://dx.doi.org/10.1609/icaps.v29i1.3542.

Full text
Abstract:
Robots operating in the real world encounter substantial uncertainty that cannot be modeled deterministically before the actual execution. This gives rise to the necessity of robust motion planning under uncertainty, also known as belief space planning. Belief space planning can be formulated as Partially Observable Markov Decision Processes (POMDPs). However, computing optimal policies for non-trivial POMDPs is computationally intractable. Building upon recent progress from the search community, we propose a novel anytime POMDP solver, Partially Observable Multi-Heuristic Dynamic Programming (POMHDP), that leverages multiple heuristics to efficiently compute high-quality solutions while guaranteeing asymptotic convergence to an optimal policy. Through iterative forward search, POMHDP utilizes domain knowledge to solve POMDPs with specific goals and an infinite horizon. We demonstrate the efficacy of our proposed framework on a real-world, highly complex truck unloading application.
APA, Harvard, Vancouver, ISO, and other styles
3

Lim, Michael H., Tyler J. Becker, Mykel J. Kochenderfer, Claire J. Tomlin, and Zachary N. Sunberg. "Optimality Guarantees for Particle Belief Approximation of POMDPs." Journal of Artificial Intelligence Research 77 (August 27, 2023): 1591–636. http://dx.doi.org/10.1613/jair.1.14525.

Full text
Abstract:
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of O(C), where C is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.
APA, Harvard, Vancouver, ISO, and other styles
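Illustrative note (not from the cited paper or its code): the particle belief MDP transition described in the abstract can be sketched as follows, assuming a hypothetical generative simulator gen(s, a) -> (s', o, r) and an observation density obs_density(o, s', a); all names are illustrative.

```python
import random

def particle_belief_transition(particles, action, gen, obs_density):
    """One generative step of a particle belief MDP (PB-MDP) sketch:
    propagate every particle, take the observation generated by one of them,
    reweight all successors by that observation's likelihood, and resample."""
    C = len(particles)
    # Propagate each (equally weighted) particle through the POMDP simulator.
    propagated = [gen(s, action) for s in particles]          # list of (s', o, r)
    # Use one particle's sampled observation as the observation that "occurred".
    _, o_ref, _ = random.choice(propagated)
    # Likelihood-weight every successor state by that shared observation.
    weights = [obs_density(o_ref, s_next, action) for s_next, _, _ in propagated]
    total = sum(weights) or 1e-12
    probs = [w / total for w in weights]
    # Belief-level reward: likelihood-weighted average of the particle rewards.
    reward = sum(p * r for p, (_, _, r) in zip(probs, propagated))
    # Resample C particles to form the next (unweighted) particle belief.
    next_particles = random.choices([s_next for s_next, _, _ in propagated],
                                    weights=probs, k=C)
    return next_particles, reward
```

An MDP solver can then treat next_particles as an ordinary successor state, which is where the O(C) overhead per transition mentioned in the abstract comes from.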
4

Brafman, Ronen, Guy Shani, and Shlomo Zilberstein. "Qualitative Planning under Partial Observability in Multi-Agent Domains." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 30, 2013): 130–37. http://dx.doi.org/10.1609/aaai.v27i1.8643.

Full text
Abstract:
Decentralized POMDPs (Dec-POMDPs) provide a rich, attractive model for planning under uncertainty and partial observability in cooperative multi-agent domains, and have attracted a growing body of research. In this paper we formulate a qualitative, propositional model for multi-agent planning under uncertainty with partial observability, which we call Qualitative Dec-POMDP (QDec-POMDP). We show that the worst-case complexity of planning in QDec-POMDPs is similar to that of Dec-POMDPs. Still, because the model is more “classical” in nature, it is more compact and easier to specify. Furthermore, it eases the adaptation of methods used in classical and contingent planning to solve problems that challenge current Dec-POMDP solvers. In particular, in this paper we describe a method based on compilation to classical planning, which handles multi-agent planning problems significantly larger than those handled by current Dec-POMDP algorithms.
APA, Harvard, Vancouver, ISO, and other styles
5

Zhang, Zongzhang, Michael Littman, and Xiaoping Chen. "Covering Number as a Complexity Measure for POMDP Planning and Learning." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (September 20, 2021): 1853–59. http://dx.doi.org/10.1609/aaai.v26i1.8360.

Full text
Abstract:
Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number for the reachable belief space, which is a set of belief points that are reachable from the initial belief point, has interesting links with the complexity of POMDP planning, theoretically. In this paper, we present empirical evidence that the covering number for the reachable belief space (or just "covering number", for brevity) is a far better complexity measure than the state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent learning algorithm for POMDPs without reset given knowledge of the covering number.
APA, Harvard, Vancouver, ISO, and other styles
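Illustrative note (not from the cited paper): in the POMDP planning literature, the covering number of the reachable belief space R(b_0) is usually defined, for a resolution δ, as the size of the smallest δ-cover,

\mathcal{C}(\delta) = \min \{\, |B| : B \subseteq \Delta(S),\ \forall b \in \mathcal{R}(b_0)\ \exists b' \in B \text{ with } \lVert b - b' \rVert_1 \le \delta \,\},

so a small covering number means that the beliefs the agent can actually reach concentrate on a low-complexity subset of the belief simplex, regardless of how large the state space is.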
6

Wu, Chenyang, Rui Kong, Guoyu Yang, Xianghan Kong, Zongzhang Zhang, Yang Yu, Dong Li, and Wulong Liu. "LB-DESPOT: Efficient Online POMDP Planning Considering Lower Bound in Action Selection (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (May 18, 2021): 15927–28. http://dx.doi.org/10.1609/aaai.v35i18.17960.

Full text
Abstract:
A partially observable Markov decision process (POMDP) is an extension of the MDP. It handles the state uncertainty by specifying the probability of getting a particular observation given the current state. DESPOT is one of the most popular scalable online planning algorithms for POMDPs, which manages to significantly reduce the size of the decision tree while deriving a near-optimal policy by considering only K scenarios. Nevertheless, there is a gap in action selection criteria between planning and execution in DESPOT. During the planning stage, it keeps choosing the action with the highest upper bound, whereas when the planning ends, the action with the highest lower bound is chosen for execution. Here, we propose LB-DESPOT to alleviate this issue, which utilizes the lower bound in selecting an action branch to expand. Empirically, our method has attained better performance than DESPOT and POMCP, another state-of-the-art online solver, on several challenging POMDP benchmark tasks.
APA, Harvard, Vancouver, ISO, and other styles
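Illustrative note (not from the cited paper or its code): the planning/execution gap described in the abstract, and the LB-DESPOT remedy, can be sketched for a hypothetical belief-tree node that stores per-action lower and upper bounds.

```python
def select_branch_to_expand(node, use_lower_bound=False):
    """Vanilla DESPOT expands the action branch with the highest upper bound;
    the LB-DESPOT idea is to let the lower bound drive (or share in) this
    choice, so planning and execution use consistent criteria. Sketch only."""
    score = node.lower_bound if use_lower_bound else node.upper_bound
    return max(node.actions, key=lambda a: score[a])

def action_to_execute(node):
    """When planning stops, both variants act greedily on the lower bound."""
    return max(node.actions, key=lambda a: node.lower_bound[a])
```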
7

Carvalho Chanel, Caroline, Florent Teichteil-Königsbuch, and Charles Lesire. "Multi-Target Detection and Recognition by UAVs Using Online POMDPs." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 29, 2013): 1381–87. http://dx.doi.org/10.1609/aaai.v27i1.8551.

Full text
Abstract:
This paper tackles high-level decision-making techniques for robotic missions, which involve both active sensing and symbolic goal reaching, under uncertain probabilistic environments and strong time constraints. Our case study is a POMDP model of an online multi-target detection and recognition mission by an autonomous UAV. The POMDP model of the multi-target detection and recognition problem is generated online from a list of areas of interest, which are automatically extracted at the beginning of the flight from a coarse-grained high altitude observation of the scene. The POMDP observation model relies on a statistical abstraction of an image processing algorithm's output used to detect targets. As the POMDP problem cannot be known and thus optimized before the beginning of the flight, our main contribution is an "optimize-while-execute" algorithmic framework: it drives a POMDP sub-planner to optimize and execute the POMDP policy in parallel under action duration constraints. We present new results from real outdoor flights and SAIL simulations, which highlight both the benefits of using POMDPs in multi-target detection and recognition missions, and of our "optimize-while-execute" paradigm.
APA, Harvard, Vancouver, ISO, and other styles
8

Hoerger, Marcus, Joshua Song, Hanna Kurniawati, and Alberto Elfes. "POMDP-Based Candy Server: Lessons Learned from a Seven Day Demo." Proceedings of the International Conference on Automated Planning and Scheduling 29 (May 25, 2021): 698–706. http://dx.doi.org/10.1609/icaps.v29i1.3538.

Full text
Abstract:
An autonomous robot must decide on a good strategy to achieve its long term goal, despite various types of uncertainty. The Partially Observable Markov Decision Process (POMDP) is a principled framework to address such a decision making problem. Despite the computational intractability of solving POMDPs, the past decade has seen substantial advancement in POMDP solvers. This paper presents our experience in enabling on-line POMDP solving to become the sole motion planner for a robot manipulation demo at IEEE SIMPAR and ICRA 2018. The demo scenario is a candy-serving robot: A 6-DOFs robot arm must pick up a cup placed on a table by a user, use the cup to scoop candies from a box, and put the cup of candies back on the table. The average perception error is ∼3cm (≈ the radius of the cup), affecting the position of the cup and the surface level of the candies. This paper presents a strategy to alleviate the curse of history issue plaguing this scenario, the perception system and its integration with the planner, and lessons learned in enabling an online POMDP solver to become the sole motion planner of this entire task. The POMDP-based system was tested through a seven-day live demo at the two conferences. In this demo, 150 runs were attempted and 98% of them were successful. We also conducted further experiments to test the capability of our POMDP-based system when the environment is relatively cluttered by obstacles and when the user moves the cup while the robot tries to pick it up. In both cases, our POMDP-based system reaches a success rate of 90% and above.
APA, Harvard, Vancouver, ISO, and other styles
9

Khonji, Majid, and Duoaa Khalifa. "Heuristic Search in Dual Space for Constrained Fixed-Horizon POMDPs with Durative Actions." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (June 26, 2023): 14927–36. http://dx.doi.org/10.1609/aaai.v37i12.26743.

Full text
Abstract:
The Partially Observable Markov Decision Process (POMDP) is widely used in probabilistic planning for stochastic domains. However, current extensions, such as constrained and chance-constrained POMDPs, have limitations in modeling real-world planning problems because they assume that all actions have a fixed duration. To address this issue, we propose a unified model that encompasses durative POMDP and its constrained extensions. To solve the durative POMDP and its constrained extensions, we first convert them into an Integer Linear Programming (ILP) formulation. This approach leverages existing solvers in the ILP literature and provides a foundation for solving these problems. We then introduce a heuristic search approach that prunes the search space, which is guided by solving successive partial ILP programs. Our empirical evaluation results show that our approach outperforms the current state-of-the-art fixed-horizon chance-constrained POMDP solver.
APA, Harvard, Vancouver, ISO, and other styles
10

Meli, Daniele, Alberto Castellini, and Alessandro Farinelli. "Learning Logic Specifications for Policy Guidance in POMDPs: an Inductive Logic Programming Approach." Journal of Artificial Intelligence Research 79 (February 28, 2024): 725–76. http://dx.doi.org/10.1613/jair.1.15826.

Full text
Abstract:
Partially Observable Markov Decision Processes (POMDPs) are a powerful framework for planning under uncertainty. They allow state uncertainty to be modeled as a belief probability distribution. Approximate solvers based on Monte Carlo sampling have shown great success in relaxing the computational demand and performing online planning. However, scaling to complex realistic domains with many actions and long planning horizons is still a major challenge, and a key point to achieve good performance is guiding the action-selection process with domain-dependent policy heuristics which are tailored for the specific application domain. We propose to learn high-quality heuristics from POMDP traces of executions generated by any solver. We convert the belief-action pairs to a logical semantics, and exploit data- and time-efficient Inductive Logic Programming (ILP) to generate interpretable belief-based policy specifications, which are then used as online heuristics. We thoroughly evaluate our methodology on two notoriously challenging POMDP problems, involving large action spaces and long planning horizons, namely, rocksample and pocman. Considering different state-of-the-art online POMDP solvers, including POMCP, DESPOT and AdaOPS, we show that learned heuristics expressed in Answer Set Programming (ASP) yield performance superior to neural networks and similar to optimal handcrafted task-specific heuristics within lower computational time. Moreover, they generalize well to more challenging scenarios not experienced in the training phase (e.g., increasing the number of rocks and the grid size in rocksample, and increasing the size of the map and the aggressiveness of the ghosts in pocman).
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "POMDP"

1

Folsom-Kovarik, Jeremiah. "Leveraging Help Requests in POMDP Intelligent Tutors." Doctoral diss., University of Central Florida, 2012. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5210.

Full text
Abstract:
Intelligent tutoring systems (ITSs) are computer programs that model individual learners and adapt instruction to help each learner differently. One way ITSs differ from human tutors is that few ITSs give learners a way to ask questions. When learners can ask for help, their questions have the potential to improve learning directly and also act as a new source of model data to help the ITS personalize instruction. Inquiry modeling gives ITSs the ability to answer learner questions and refine their learner models with an inexpensive new input channel. In order to support inquiry modeling, an advanced planning formalism is applied to ITS learner modeling. Partially observable Markov decision processes (POMDPs) differ from more widely used ITS architectures because they can plan complex action sequences in uncertain situations with machine learning. Tractability issues have previously precluded POMDP use in ITS models. This dissertation introduces two improvements, priority queues and observation chains, to make POMDPs scale well and encompass the large problem sizes that real-world ITSs must confront. A new ITS was created to support trainees practicing a military task in a virtual environment. The development of the Inquiry Modeling POMDP Adaptive Trainer (IMP) began with multiple formative studies on human and simulated learners that explored inquiry modeling and POMDPs in intelligent tutoring. The studies suggest the new POMDP representations will be effective in ITS domains having certain common characteristics. Finally, a summative study evaluated IMP's ability to train volunteers in specific practice scenarios. IMP users achieved post-training scores averaging up to 4.5 times higher than users who practiced without support and up to twice as high as trainees who used an ablated version of IMP with no inquiry modeling. IMP's implementation and evaluation helped explore questions about how inquiry modeling and POMDP ITSs work, while empirically demonstrating their efficacy.
Ph.D.
Doctorate
Computer Science
Engineering and Computer Science
Computer Science
APA, Harvard, Vancouver, ISO, and other styles
2

Kaplow, Robert. "Point-based POMDP solvers: survey and comparative analysis." Thesis, McGill University, 2010. http://digitool.Library.McGill.CA:8881/R/?func=dbin-jump-full&object_id=92275.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Png, ShaoWei. "Bayesian reinforcement learning for POMDP-based dialogue systems." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=104830.

Full text
Abstract:
Spoken dialogue systems are gaining popularity with improvements in speech recognition technologies. Dialogue systems have been modeled effectively using Partially observable Markov decision processes (POMDPs), achieving improvements in robustness. However, past research on POMDP-based dialogue systems usually assumes that the model parameters are known. This limitation can be addressed through model-based Bayesian reinforcement learning, which offers a rich framework for simultaneous learning and planning. However, due to the high complexity of the framework, a major challenge is to scale up these algorithms for complex dialogue systems. In this work, we show that by exploiting certain known components of the system, such as knowledge of symmetrical properties, and using an approximate on-line planning algorithm, we are able to apply Bayesian RL on several realistic spoken dialogue system domains. We consider several experimental domains. First, a small synthetic data case, where we illustrate several properties of the approach. Second, a small dialogue manager based on the SACTI1 corpus which contains 144 dialogues between 36 users and 12 experts. Third, a dialogue manager aimed at patients with dementia, to assist them with activities of daily living. Finally, we consider a large dialogue manager designed to help patients to operate a wheelchair.
APA, Harvard, Vancouver, ISO, and other styles
4

Chinaei, Hamid Reza. "Learning Dialogue POMDP Model Components from Expert Dialogues." Thesis, Université Laval, 2013. http://www.theses.ulaval.ca/2013/29690/29690.pdf.

Full text
Abstract:
Spoken dialogue systems should realize the user intentions and maintain a natural and efficient dialogue with users. This is however a difficult task as spoken language is naturally ambiguous and uncertain, and further the automatic speech recognition (ASR) output is noisy. In addition, the human user may change his intention during the interaction with the machine. To tackle this difficult task, the partially observable Markov decision process (POMDP) framework has been applied in dialogue systems as a formal framework to represent uncertainty explicitly while supporting automated policy solving. In this context, estimating the dialogue POMDP model components is a signifficant challenge as they have a direct impact on the optimized dialogue POMDP policy. This thesis proposes methods for learning dialogue POMDP model components using noisy and unannotated dialogues. Speciffically, we introduce techniques to learn the set of possible user intentions from dialogues, use them as the dialogue POMDP states, and learn a maximum likelihood POMDP transition model from data. Since it is crucial to reduce the observation state size, we then propose two observation models: the keyword model and the intention model. Using these two models, the number of observations is reduced signifficantly while the POMDP performance remains high particularly in the intention POMDP. In addition to these model components, POMDPs also require a reward function. So, we propose new algorithms for learning the POMDP reward model from dialogues based on inverse reinforcement learning (IRL). In particular, we propose the POMDP-IRL-BT algorithm (BT for belief transition) that works on the belief states available in the dialogues. This algorithm learns the reward model by estimating a belief transition model, similar to MDP (Markov decision process) transition models. Ultimately, we apply the proposed methods on a healthcare domain and learn a dialogue POMDP essentially from real unannotated and noisy dialogues.
APA, Harvard, Vancouver, ISO, and other styles
5

Li, Xin. "POMDP compression and decomposition via belief state analysis." HKBU Institutional Repository, 2009. http://repository.hkbu.edu.hk/etd_ra/1012.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Zheltova, Ludmila. "Structured Maintenance Policies on Interior Sample Paths." Case Western Reserve University School of Graduate Studies / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=case1264627939.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Memarzadeh, Milad. "System-Level Adaptive Monitoring and Control of Infrastructures: A POMDP-Based Framework." Research Showcase @ CMU, 2015. http://repository.cmu.edu/dissertations/664.

Full text
Abstract:
Many infrastructure systems in the US such as road networks, bridges, water and wastewater pipelines, and wind farms are aging and their condition is deteriorating. Accurate risk analysis is crucial to extend the life span of these systems, and to guide decision making towards a sustainable use of resources. These systems are subjected to fatigue-induced degradation and need periodic inspections and repairs, which are usually performed through semi-annual, annual, or bi-annual scheduled maintenance. However, better maintenance can be achieved by flexible policies based on prior knowledge of the degradation process and on data collected in the field by sensors and visual inspections. Traditional methods to model the operation and maintenance (O&M) process, such as Markov decision processes (MDP) and partially observable MDP (POMDP), have limitations that do not allow the model to properly include the knowledge available and that may result in nonoptimal strategies for management of infrastructure systems. Specifically, the conditional probabilities for modeling the degradation process and the precision of the observations are usually affected by epistemic uncertainty: this cannot be captured by traditional methods. The goal of this dissertation is to propose a computational framework for adaptive monitoring and control of infrastructures at the system level and to connect different aspects of the management process together. The first research question we address is how to take optimal sequential decisions under model uncertainty. Second, we propose how to combine decision optimization with learning of the degradation of components and the precision of the monitoring system. Specifically, we address the issue of systems made of similar components, where transfer of knowledge across components is relevant. Finally, we propose how to assess the value of information in sequential decision making and whether it can be used as a heuristic for system-level inspection scheduling. In this dissertation, first a novel learning and planning method is proposed, called "Planning and Learning for Uncertain dynamic Systems" (PLUS), that can learn from the environment, update the distributions of parameters, and select the optimal strategy considering the uncertainty related to the model. Validating with synthetic data, the total management cost of operating a wind farm using PLUS is shown to be significantly less than costs achieved by a fixed policy or through the POMDP framework. Moreover, when the system is made up of similar components, data collected on one is also relevant in the management of others. This is typically the case of wind farms, which are made up of similar turbines. PLUS models the components as independent or identical and either learns the model for each component independently or learns a global model for all components. We extend that formulation, allowing for a weaker similarity among components. The proposed approach, called "Multiple Uncertain POMDP" (MU-POMDP), models the components as POMDPs, and assumes the corresponding model parameters to be dependent random variables. By using this framework, we can calibrate specific degradation and emission models for each component while, at the same time, processing observations at the level of the entire system. We evaluate the performance of MU-POMDP compared to PLUS and discuss its potentials and computational complexity.
Lastly, operation and maintenance of an infrastructure system rely on information collected on its components, which can provide the decision maker with an accurate assessment of their condition states. However, resources to be invested in data gathering are usually limited and observations should be collected based on their value of information (VoI). VoI is a key concept for directing explorative actions, and in the context of infrastructure operation and maintenance, it has application to decisions about inspecting and monitoring the condition states of the components. Assessing the VoI is computationally intractable for most applications involving sequential decisions, such as long-term infrastructure maintenance. The component-level VoI can be used as a heuristic for assigning priorities to system-level inspection scheduling. In this research, we propose two alternative models for integrating adaptive maintenance planning based on POMDP and inspection scheduling based on a tractable approximation of VoI: the stochastic allocation model (and its two limiting scenarios called pessimistic and optimistic) that assumes observations are collected with a given probability, and the fee-based allocation model that assumes observations are available at a given cost. We illustrate how these models can be used at component level and for system-level inspection scheduling. Furthermore, we evaluate the quality of solution provided by the pessimistic and optimistic approaches. Finally, we introduce analytical formulas based on the stochastic and fee-based allocation models to predict the impact of a monitoring system (or a piece of information) on the operation and maintenance cost of infrastructure systems.
APA, Harvard, Vancouver, ISO, and other styles
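Illustrative note (not quoted from the thesis): in its simplest one-step form, the value of information of an observation Y about an uncertain state X, for a decision maker choosing an action a to minimize an expected cost c(a, X), is the expected cost reduction obtained by observing Y before acting,

\mathrm{VoI}(Y) = \min_{a} \mathbb{E}[c(a, X)] - \mathbb{E}_{Y}\!\left[ \min_{a} \mathbb{E}[c(a, X) \mid Y] \right] \ge 0.

In the sequential setting studied in the dissertation, the inner minimizations become POMDP value functions, which is what makes exact VoI computation intractable and motivates the tractable approximations described above.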
8

Pinheiro, Paulo Gurgel 1983. "Planning for mobile robot localization using architectural design features on a hierarchical POMDP approach = Planejamento para localização de robôs móveis utilizando padrões arquitetônicos em um modelo hierárquico de POMDP." [s.n.], 2013. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275601.

Full text
Abstract:
Advisor: Jacques Wainer
Doctoral thesis - Universidade Estadual de Campinas, Instituto de Computação
Abstract: Mobile robot localization is one of the most explored areas in robotics due to its importance for solving problems such as navigation, mapping and SLAM. In this work, we are interested in solving global localization problems, where the initial pose of the robot is completely unknown. Several works have proposed solutions for localization focusing on robot cooperation, communication or environment exploration, where the robot's pose is often found by a certain amount of random actions or belief-oriented actions. In order to decrease the total steps performed, we introduce a model of planning for localization using POMDPs and Markov Localization that indicates the optimal action to be taken by the robot at each decision time. Our focus is on i) hard localization problems, where there are no special landmarks or extra features over the environment to help the robot, ii) critical performance situations, where the robot is required to avoid random actions and the waste of energy roaming over the environment, and iii) multiple-mission situations. Aware that the robot is designed to perform missions, we have proposed a model that runs missions and the localization process simultaneously. Also, since the robot can have different missions, the model computes the planning for localization as an offline process, but loads the missions at runtime. Planning for multiple environments is a challenge due to the amount of states we must consider. Thus, we also propose a solution to compress the original map, creating a smaller topological representation on which plans are easier and cheaper to compute. The map compression takes advantage of the similarity of rooms found especially in office and residential environments. Similar rooms have similar architectural design features that can be shared. To deal with the compressed map, we propose a hierarchical approach that uses light POMDP plans and the compressed map on the higher layer to find the coarse pose, and, on the lower layer, decomposed maps to find the precise pose. We have demonstrated the hierarchical approach with the map compression using both the V-REP simulator and a Pioneer 3-DX robot. Compared to other active localization models, the results show that our approach allowed the robot to perform both localization and the mission in a multiple-room environment with a significant reduction in the number of steps while keeping the pose accuracy.
Doctorate
Computer Science
Doctor of Computer Science
APA, Harvard, Vancouver, ISO, and other styles
9

Saldaña, Gadea Santiago Jesús. "The effectiveness of social plan sharing in online planning in POMDP-type domains." Winston-Salem, NC: Wake Forest University, 2009. http://dspace.zsr.wfu.edu/jspui/handle/10339/44699.

Full text
Abstract:
Thesis (M.S.)--Wake Forest University. Dept. of Computer Science, 2009.
Title from electronic thesis title page. Thesis advisor: William H. Turkett Jr. Vita. Includes bibliographical references (p. 47-48).
APA, Harvard, Vancouver, ISO, and other styles
10

Bravo, Raissa Zurli Bittencourt. "The Use of UAVs in Humanitarian Relief: A POMDP Based Methodology for Finding Victims." Pontifícia Universidade Católica do Rio de Janeiro, 2016. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=30364@1.

Full text
Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE SUPORTE À PÓS-GRADUAÇÃO DE INSTS. DE ENSINO
The use of Unmanned Aerial Vehicles (UAVs) in humanitarian relief has been proposed by researchers to locate victims in disaster-affected areas. The urgency of this type of operation is to find the affected people as quickly as possible, which means that determining the optimal routing for the UAVs is very important for saving lives. Since the UAVs have to cover the entire affected area to find victims, the routing operation becomes equivalent to a coverage problem. In this work, a methodology for solving the coverage problem is proposed, based on a Partially Observable Markov Decision Process (POMDP) heuristic, in which the observations made by the UAVs are taken into account. This heuristic chooses actions based on the information available, namely the previous actions and observations. The formulation of the UAV routing is based on the idea of assigning higher priorities to the areas most likely to contain victims. To apply this technique to real cases, a methodology consisting of four steps was created. First, the problem is modeled with respect to the affected area, the type of drone to be used, the camera resolution, the average flight altitude, the starting or take-off point, and the size and priority of the states. Next, in order to test the efficiency of the algorithm through simulations, groups of victims are distributed over the area to be flown. The algorithm is then started and, at each iteration, the drone changes state according to the POMDP heuristic until the entire affected area has been covered. Finally, the efficiency of the algorithm is evaluated through four statistics: distance traveled, operation time, coverage percentage, and time to find groups of victims. This methodology was applied to two illustrative examples: a tornado in Xanxerê, Brazil, a sudden-onset disaster in April 2015, and a refugee camp in South Sudan, a slow-onset disaster that began in 2013. After running simulations, it was shown that the solution covers the entire disaster-affected area within a reasonable time span. The distance traveled by the UAV and the duration of the operation, which depend on the number of states, did not show a significant standard deviation across the simulations, which means that, even though several paths are possible due to tied priorities, the algorithm produces homogeneous results. The time to find groups of victims, and therefore the success of the rescue operation, depends on the definition of the state priorities, which are set by a specialist. If the priorities are poorly defined, the UAV will begin overflying areas without victims, leading to the failure of the rescue operation, since the algorithm will not be saving lives as quickly as possible. A comparison of the proposed algorithm with the greedy method was also carried out. Initially, the greedy method did not cover 100 percent of the affected area, which made the comparison unfair. To work around this problem, the greedy algorithm was forced to cover 100 percent of the affected area, and the results show that the POMDP performs better with respect to the time to save victims. With respect to distance traveled and operation time, the results are equal or better for the POMDP. This happens because the greedy algorithm is biased toward optimizing the distance traveled and, therefore, optimizes the operation time.
The POMDP, in contrast, aims in this dissertation to save lives and does so dynamically, updating its probability distribution after each observation made. The novelty of this methodology is highlighted in chapter 3, where more than 139 works were read and classified in order to show the applications of drones in humanitarian logistics, how POMDPs are used in drones, and how simulation techniques are used in humanitarian logistics. Only one article proposes the u
The use of Unmanned Aerial Vehicles (UAVs) in humanitarian relief has been proposed by researchers for searching for victims in disaster-affected areas. The urgency of this type of operation is to find the affected people as soon as possible, which means that determining the optimal flight path for UAVs is very important to save lives. Since the UAVs have to search through the entire affected area to find victims, the path planning operation becomes equivalent to an area coverage problem. In this study, a methodology to solve the coverage problem is proposed, based on a Partially Observable Markov Decision Process (POMDP) heuristic, which considers the observations made from UAVs. The formulation of the UAV path planning is based on the idea of assigning higher priorities to the areas which are more likely to contain victims. The methodology was applied in two illustrative examples: a tornado in Xanxerê, Brazil, which was a rapid-onset disaster in April 2015, and a refugee camp in South Sudan, a slow-onset disaster that started in 2013. After simulations, it is demonstrated that this solution achieves full coverage of disaster-affected areas in a reasonable time span. The traveled distance and the operation's duration, which are dependent on the number of states, did not have a significant standard deviation between the simulations. It means that even if there were many possible paths, due to the tied priorities, the algorithm has homogeneous results. The time to find groups of victims, and so the success of the search and rescue operation, depends on the specialist's definition of state priorities. A comparison with a greedy algorithm showed that the POMDP is faster to find victims while the greedy algorithm's performance focuses on minimizing the traveled distance. Future research includes a practical application of the proposed methodology.
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "POMDP"

1

Braziunas, Darius. Stochastic local search for POMDP controllers. Ottawa: National Library of Canada, 2003.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Bayer, Valentina. A POMDP approximation algorithm that anticipates the need to observe. [Corvallis, OR]: Oregon State University, Dept. of Computer Science, 2000.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Southwark (England). Planning Department. Peckham pomp. London: Southwark Planning Department, 1990.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Pomp and circumstances. Toronto, Ont: M&S, 1989.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Pomp and circumstance. Alexandria, VA: Alexander Street Press, 2006.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

The pomp of man. Oke-Obere [Nigeria]: D' Virgo Publishers, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Youra, Paula Wilson. Pomp & circumstance: Ceremonial speaking. Greenwood, IN: Alistair Press, Educational Video Group, 2002.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Chinaei, Hamidreza, and Brahim Chaib-draa. Building Dialogue POMDPs from Expert Dialogues. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-26200-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Oliehoek, Frans A., and Christopher Amato. A Concise Introduction to Decentralized POMDPs. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-28929-8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Kusto, Zdzisław. Uwarunkowania ekonomicznej efektywności pomp ciepła. Gdańsk: Wydawn. IMP PAN, 2000.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "POMDP"

1

Beynier, Aurélie, François Charpillet, Daniel Szer, and Abdel-Illah Mouaddib. "DEC-MDP/POMDP." In Markov Decision Processes in Artificial Intelligence, 277–318. Hoboken, NJ USA: John Wiley & Sons, Inc., 2013. http://dx.doi.org/10.1002/9781118557426.ch9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Andriushchenko, Roman, Alexander Bork, Milan Češka, Sebastian Junges, Joost-Pieter Katoen, and Filip Macák. "Search and Explore: Symbiotic Policy Synthesis in POMDPs." In Computer Aided Verification, 113–35. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-37709-9_6.

Full text
Abstract:
This paper marries two state-of-the-art controller synthesis methods for partially observable Markov decision processes (POMDPs), a prominent model in sequential decision making under uncertainty. A central issue is to find a POMDP controller—that solely decides based on the observations seen so far—to achieve a total expected reward objective. As finding optimal controllers is undecidable, we concentrate on synthesising good finite-state controllers (FSCs). We do so by tightly integrating two modern, orthogonal methods for POMDP controller synthesis: a belief-based and an inductive approach. The former method obtains an FSC from a finite fragment of the so-called belief MDP, an MDP that keeps track of the probabilities of equally observable POMDP states. The latter is an inductive search technique over a set of FSCs, e.g., controllers with a fixed memory size. The key result of this paper is a symbiotic anytime algorithm that tightly integrates both approaches such that each profits from the controllers constructed by the other. Experimental results indicate a substantial improvement in the value of the controllers while significantly reducing the synthesis time and memory footprint.
APA, Harvard, Vancouver, ISO, and other styles
3

Oliehoek, Frans A., and Christopher Amato. "The Decentralized POMDP Framework." In SpringerBriefs in Intelligent Systems, 11–32. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-28929-8_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Borera, Eddy C., Larry D. Pyeatt, Arisoa S. Randrianasolo, and Madhi Naser-Moghadasi. "POMDP Filter: Pruning POMDP Value Functions with the Kaczmarz Iterative Method." In Advances in Artificial Intelligence, 254–65. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-16761-4_23.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Iwanari, Yuki, Yuichi Yabu, Makoto Tasaki, and Makoto Yokoo. "Network Distributed POMDP with Communication." In New Frontiers in Artificial Intelligence, 26–38. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-00609-8_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Shani, Guy, Ronen I. Brafman, and Solomon E. Shimony. "Prioritizing Point-Based POMDP Solvers." In Lecture Notes in Computer Science, 389–400. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006. http://dx.doi.org/10.1007/11871842_38.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Rafferty, Anna N., Emma Brunskill, Thomas L. Griffiths, and Patrick Shafto. "Faster Teaching by POMDP Planning." In Lecture Notes in Computer Science, 280–87. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-21869-9_37.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Spel, Jip, Svenja Stein, and Joost-Pieter Katoen. "POMDP Controllers with Optimal Budget." In Quantitative Evaluation of Systems, 107–30. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-16336-4_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Bork, Alexander, Joost-Pieter Katoen, and Tim Quatmann. "Under-Approximating Expected Total Rewards in POMDPs." In Tools and Algorithms for the Construction and Analysis of Systems, 22–40. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-99527-0_2.

Full text
Abstract:
We consider the problem: is the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) below a given threshold? We tackle this—generally undecidable—problem by computing under-approximations on these total expected rewards. This is done by abstracting finite unfoldings of the infinite belief MDP of the POMDP. The key issue is to find a suitable under-approximation of the value function. We provide two techniques: a simple (cut-off) technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probabilities between beliefs. We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well while providing tight lower bounds on the expected total reward.
APA, Harvard, Vancouver, ISO, and other styles
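Illustrative note (not from the chapter): the soundness of the cut-off technique mentioned in the abstract follows from the elementary fact that any concrete policy σ under-approximates the optimal value, i.e. for every belief b at which the unfolding is truncated,

V^{\sigma}(b) \le V^{*}(b) = \sup_{\pi} \mathbb{E}^{\pi}_{b}\Bigl[\textstyle\sum_{t} R_t\Bigr],

so substituting the efficiently computable value of a good fixed policy at cut-off beliefs can only lower, never raise, the resulting bound on the optimal expected total reward.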
10

Pyeatt, Larry D., and Adele E. Howe. "A Parallel Algorithm for POMDP Solution." In Recent Advances in AI Planning, 73–83. Berlin, Heidelberg: Springer Berlin Heidelberg, 2000. http://dx.doi.org/10.1007/10720246_6.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "POMDP"

1

Baisero, Andrea, and Christopher Amato. "Reconciling Rewards with Predictive State Representations." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/299.

Full text
Abstract:
Predictive state representations (PSRs) are models of controlled non-Markov observation sequences which exhibit the same generative process governing POMDP observations without relying on an underlying latent state. In that respect, a PSR is indistinguishable from the corresponding POMDP. However, PSRs notoriously ignore the notion of rewards, which undermines the general utility of PSR models for control, planning, or reinforcement learning. Therefore, we describe a sufficient and necessary accuracy condition which determines whether a PSR is able to accurately model POMDP rewards, we show that rewards can be approximated even when the accuracy condition is not satisfied, and we find that a non-trivial number of POMDPs taken from a well-known third-party repository do not satisfy the accuracy condition. We propose reward-predictive state representations (R-PSRs), a generalization of PSRs which accurately models both observations and rewards, and develop value iteration for R-PSRs. We show that there is a mismatch between optimal POMDP policies and the optimal PSR policies derived from approximate rewards. On the other hand, optimal R-PSR policies perfectly match optimal POMDP policies, reconfirming R-PSRs as accurate state-less generative models of observations and rewards.
APA, Harvard, Vancouver, ISO, and other styles
2

Carr, Steven, Nils Jansen, Ralf Wimmer, Alexandru Serban, Bernd Becker, and Ufuk Topcu. "Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/768.

Full text
Abstract:
We study strategy synthesis for partially observable Markov decision processes (POMDPs). The particular problem is to determine strategies that provably adhere to (probabilistic) temporal logic constraints. This problem is computationally intractable and theoretically hard. We propose a novel method that combines techniques from machine learning and formal verification. First, we train a recurrent neural network (RNN) to encode POMDP strategies. The RNN accounts for memory-based decisions without the need to expand the full belief space of a POMDP. Secondly, we restrict the RNN-based strategy to represent a finite-memory strategy and implement it on a specific POMDP. For the resulting finite Markov chain, efficient formal verification techniques provide provable guarantees against temporal logic specifications. If the specification is not satisfied, counterexamples supply diagnostic information. We use this information to improve the strategy by iteratively training the RNN. Numerical experiments show that the proposed method elevates the state of the art in POMDP solving by up to three orders of magnitude in terms of solving times and model sizes.
APA, Harvard, Vancouver, ISO, and other styles
3

Wang, Yunbo, Bo Liu, Jiajun Wu, Yuke Zhu, Simon S. Du, Li Fei-Fei, and Joshua B. Tenenbaum. "DualSMC: Tunneling Differentiable Filtering and Planning under Continuous POMDPs." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/579.

Full text
Abstract:
A major difficulty of solving continuous POMDPs is to infer the multi-modal distribution of the unobserved true states and to make the planning algorithm dependent on the perceived uncertainty. We cast POMDP filtering and planning problems as two closely related Sequential Monte Carlo (SMC) processes, one over the real states and the other over the future optimal trajectories, and combine the merits of these two parts in a new model named the DualSMC network. In particular, we first introduce an adversarial particle filter that leverages the adversarial relationship between its internal components. Based on the filtering results, we then propose a planning algorithm that extends the previous SMC planning approach [Piche et al., 2018] to continuous POMDPs with an uncertainty-dependent policy. Crucially, not only can DualSMC handle complex observations such as image input but also it remains highly interpretable. It is shown to be effective in three continuous POMDP domains: the floor positioning domain, the 3D light-dark navigation domain, and a modified Reacher domain.
APA, Harvard, Vancouver, ISO, and other styles
4

Khonji, Majid, Ashkan Jasour, and Brian Williams. "Approximability of Constant-horizon Constrained POMDP." In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/775.

Full text
Abstract:
Partially Observable Markov Decision Process (POMDP) is a fundamental framework for planning and decision making under uncertainty. POMDP is known to be intractable to solve or even approximate when the planning horizon is long (i.e., within a polynomial number of time steps). Constrained POMDP (C-POMDP) allows constraints to be specified on some aspects of the policy in addition to the objective function. When the constraints involve bounding the probability of failure, the problem is called Chance-Constrained POMDP (CC-POMDP). Our first contribution is a reduction from CC-POMDP to C-POMDP and a novel Integer Linear Programming (ILP) formulation. Thus, any algorithm for the latter problem can be utilized to solve any instance of the former. Second, we show that unlike POMDP, when the length of the planning horizon is constant, (C)C-POMDP is NP-Hard. Third, we present the first Fully Polynomial Time Approximation Scheme (FPTAS) that computes (near) optimal deterministic policies for constant-horizon (C)C-POMDP in polynomial time.
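To give a concrete flavor of chance-constrained planning as an ILP, here is a deliberately tiny, hypothetical one-step version (not the paper's reduction): choose one deterministic action to maximize expected reward while keeping the probability of failure below a bound delta. It assumes the PuLP library is available; the rewards and failure probabilities are made up.

```python
from pulp import LpProblem, LpMaximize, LpVariable, lpSum, LpBinary

# Hypothetical one-step data: expected reward and failure probability per action.
actions = ["a0", "a1", "a2"]
reward = {"a0": 5.0, "a1": 8.0, "a2": 10.0}
p_fail = {"a0": 0.01, "a1": 0.05, "a2": 0.30}
delta = 0.10                                   # chance-constraint bound

prob = LpProblem("toy_chance_constrained_step", LpMaximize)
x = {a: LpVariable(f"x_{a}", cat=LpBinary) for a in actions}

prob += lpSum(reward[a] * x[a] for a in actions)            # expected reward objective
prob += lpSum(x[a] for a in actions) == 1                   # pick exactly one action
prob += lpSum(p_fail[a] * x[a] for a in actions) <= delta   # failure probability bound

prob.solve()
chosen = [a for a in actions if x[a].value() == 1]
```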
APA, Harvard, Vancouver, ISO, and other styles
5

Hsiao, Chuck, and Richard Malak. "Modeling Information Gathering Decisions in Systems Engineering Projects." In ASME 2014 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2014. http://dx.doi.org/10.1115/detc2014-34854.

Full text
Abstract:
Decisions in systems engineering projects commonly are made under significant amounts of uncertainty. This uncertainty can exist in many areas such as the performance of subsystems, interactions between subsystems, or project resource requirements such as budget or personnel. System engineers often can choose to gather information that reduces uncertainty, which allows for potentially better decisions, but at the cost of resources expended in acquiring the information. However, our understanding of how to analyze situations involving gathering information is limited, and thus heuristics, intuition, or deadlines are often used to judge the amount of information gathering needed in a decision. System engineers would benefit from a better understanding of how to determine the amount of information gathering needed to support a decision. This paper introduces Partially Observable Markov Decision Processes (POMDPs) as a formalism for modeling information-gathering decisions in systems engineering. A POMDP can model different states, alternatives, outcomes, and probabilities of outcomes to represent a decision maker’s beliefs about his situation. It also can represent sequential decisions in a compact format, avoiding the combinatorial explosion of decision trees and similar representations. The solution of a POMDP, in the form of value functions, prescribes the best course of action based on a decision maker’s beliefs about his situation. The value functions also determine if more information gathering is needed. Sophisticated computational solvers for POMDPs have been developed in recent years, allowing for a straightforward analysis of different alternatives, and determining the optimal course of action in a given situation. This paper demonstrates using a POMDP to model a systems engineering problem, and compares this approach with other approaches that account for information gathering in decision making.
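A simplified way to see how a POMDP-style analysis prices information gathering is to compare the expected value of acting now on the current belief with the expected value after paying for a noisy test, using a Bayesian belief update. The sketch below is a hypothetical two-state, two-action example, not drawn from the paper.

```python
import numpy as np

# Belief over two states: the design concept works (s=1) or does not (s=0).
belief = np.array([0.4, 0.6])                 # P(s=0), P(s=1)

# Payoff matrix: rows = actions (abandon, commit), columns = states.
payoff = np.array([[0.0, 0.0],
                   [-100.0, 150.0]])

def best_value(b):
    """Value of acting immediately on belief b."""
    return max(payoff @ b)

# A noisy test: P(test says "good" | state), and the cost of running it.
p_good_given_state = np.array([0.2, 0.9])
test_cost = 10.0

p_good = float(belief @ p_good_given_state)
belief_good = belief * p_good_given_state / p_good
belief_bad = belief * (1 - p_good_given_state) / (1 - p_good)

value_now = best_value(belief)
value_after_test = (p_good * best_value(belief_good)
                    + (1 - p_good) * best_value(belief_bad) - test_cost)

gather_info = value_after_test > value_now    # is the test worth its cost?
```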
APA, Harvard, Vancouver, ISO, and other styles
6

Williams, J. D., and S. Young. "Scaling up POMDPs for Dialog Management: The 'Summary POMDP' Method." In IEEE Workshop on Automatic Speech Recognition and Understanding, 2005. IEEE, 2005. http://dx.doi.org/10.1109/asru.2005.1566498.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Bey, Henrik, Moritz Sackmann, Alexander Lange, and Jorn Thielecke. "POMDP Planning at Roundabouts." In 2021 IEEE Intelligent Vehicles Symposium Workshops (IV Workshops). IEEE, 2021. http://dx.doi.org/10.1109/ivworkshops54471.2021.9669232.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Phan, Thomy, Thomas Gabor, Robert Müller, Christoph Roch, and Claudia Linnhoff-Popien. "Adaptive Thompson Sampling Stacks for Memory Bounded Open-Loop Planning." In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/778.

Full text
Abstract:
We propose Stable Yet Memory Bounded Open-Loop (SYMBOL) planning, a general memory bounded approach to partially observable open-loop planning. SYMBOL maintains an adaptive stack of Thompson Sampling bandits, whose size is bounded by the planning horizon and can be automatically adapted according to the underlying domain without any prior domain knowledge beyond a generative model. We empirically test SYMBOL on four large POMDP benchmark problems to demonstrate its effectiveness and robustness with respect to the choice of hyperparameters, and to evaluate its adaptive memory consumption. We also compare its performance with other open-loop planning algorithms and POMCP.
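For intuition about a stack of Thompson Sampling bandits used for open-loop planning, here is a minimal sketch with Beta posteriors over binary returns and one bandit per planning depth; the adaptive stack-size mechanism from the paper is omitted, and the generative `simulate` function is a hypothetical stand-in for the domain's generative model.

```python
import numpy as np

rng = np.random.default_rng(2)

class BetaBandit:
    """Thompson Sampling over a discrete action set with Beta(1, 1) priors."""

    def __init__(self, n_actions):
        self.alpha = np.ones(n_actions)
        self.beta = np.ones(n_actions)

    def sample_action(self):
        return int(np.argmax(rng.beta(self.alpha, self.beta)))

    def update(self, action, success):
        self.alpha[action] += success
        self.beta[action] += 1 - success

def open_loop_plan(simulate, n_actions, horizon, n_iterations=1000):
    """Open-loop planning with one bandit per depth (a fixed-size stack)."""
    stack = [BetaBandit(n_actions) for _ in range(horizon)]
    for _ in range(n_iterations):
        plan = [bandit.sample_action() for bandit in stack]   # sample an action sequence
        success = simulate(plan)        # generative model returns 1 (good return) or 0
        for bandit, action in zip(stack, plan):
            bandit.update(action, success)
    # Report the posterior-mean-best action at each depth.
    return [int(np.argmax(b.alpha / (b.alpha + b.beta))) for b in stack]
```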
APA, Harvard, Vancouver, ISO, and other styles
9

Clark-Turner, Madison, and Christopher Amato. "COG-DICE: An Algorithm for Solving Continuous-Observation Dec-POMDPs." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/638.

Full text
Abstract:
The decentralized partially observable Markov decision process (Dec-POMDP) is a powerful model for representing multi-agent problems with decentralized behavior. Unfortunately, current Dec-POMDP solution methods cannot solve problems with continuous observations, which are common in many real-world domains. To that end, we present a framework for representing and generating Dec-POMDP policies that explicitly include continuous observations. We apply our algorithm to a novel tagging problem and an extended version of a common benchmark, where it generates policies that meet or exceed the values of equivalent discretized domains without the need for finding an adequate discretization.
APA, Harvard, Vancouver, ISO, and other styles
10

Vien, Ngo Anh, and Marc Toussaint. "POMDP manipulation via trajectory optimization." In 2015 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2015. http://dx.doi.org/10.1109/iros.2015.7353381.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "POMDP"

1

Yost, Kirk A., and Alan R. Washburn. The LP/POMDP Marriage: Optimization with Imperfect Information. Fort Belvoir, VA: Defense Technical Information Center, January 2000. http://dx.doi.org/10.21236/ada486565.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Srivastava, Siddharth, Xiang Cheng, Stuart J. Russell, and Avi Pfeffer. First-Order Open-Universe POMDPs: Formulation and Algorithms. Fort Belvoir, VA: Defense Technical Information Center, December 2013. http://dx.doi.org/10.21236/ada603645.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Theocharous, Georgios, Sridhar Mahadevan, and Leslie P. Kaelbling. Spatial and Temporal Abstractions in POMDPs Applied to Robot Navigation. Fort Belvoir, VA: Defense Technical Information Center, September 2005. http://dx.doi.org/10.21236/ada466737.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Banerjee, Bikramjit, and Landon Kraemer. Distributed Reinforcement Learning for Policy Synchronization in Infinite-Horizon Dec-POMDPs. Fort Belvoir, VA: Defense Technical Information Center, January 2012. http://dx.doi.org/10.21236/ada585093.

Full text
APA, Harvard, Vancouver, ISO, and other styles
