A selection of scientific literature on the topic "Partially Observable Markov Decision Processes (POMDPs)"

Format the source in APA, MLA, Chicago, Harvard, and other citation styles

Select the source type:

Browse the lists of current articles, books, dissertations, conference papers, and other scholarly sources on the topic "Partially Observable Markov Decision Processes (POMDPs)".

Next to each work in the list of references there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, and others.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, if the corresponding data are available in the metadata.

Journal articles on the topic "Partially Observable Markov Decision Processes (POMDPs)":

1

NI, YAODONG, and ZHI-QIANG LIU. "BOUNDED-PARAMETER PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES: FRAMEWORK AND ALGORITHM." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 21, no. 06 (December 2013): 821–63. http://dx.doi.org/10.1142/s0218488513500396.

Abstract:
Partially observable Markov decision processes (POMDPs) are powerful for planning under uncertainty. However, it is usually impractical to employ a POMDP with exact parameters to model a real-life situation precisely, for various reasons such as limited data for learning the model or the inability of exact POMDPs to model dynamic situations. In this paper, assuming that the parameters of POMDPs are imprecise but bounded, we formulate the framework of bounded-parameter partially observable Markov decision processes (BPOMDPs). A modified value iteration is proposed as a basic strategy for tackling parameter imprecision in BPOMDPs. In addition, we design the UL-based value iteration algorithm, in which each value backup is based on two sets of vectors called the U-set and the L-set. We propose four strategies for computing the U-set and the L-set. We theoretically analyze the computational complexity and the reward loss of the algorithm. The effectiveness and robustness of the algorithm are shown empirically.
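As background for this and the following entries, a minimal illustrative sketch (not code from the cited paper) of the exact Bayesian belief update for a discrete POMDP with known parameters is given below; in the bounded-parameter setting described above, the transition and observation matrices would be replaced by entrywise intervals, and the construction of the U-set and L-set is specific to the paper.

```python
# Illustrative sketch only: exact belief update for a discrete POMDP with known
# parameters. T[s, a, s2] = P(s2 | s, a), O[s2, a, o] = P(o | s2, a).
import numpy as np

def belief_update(belief, action, obs, T, O):
    """Posterior b'(s2) proportional to O[s2, a, o] * sum_s T[s, a, s2] * b(s)."""
    predicted = belief @ T[:, action, :]       # predictive distribution over next states
    posterior = O[:, action, obs] * predicted  # reweight by observation likelihood
    norm = posterior.sum()
    if norm == 0.0:
        raise ValueError("Observation has zero probability under this belief and action.")
    return posterior / norm

# Tiny two-state example with hypothetical numbers.
T = np.array([[[0.9, 0.1]], [[0.2, 0.8]]])     # shape (S=2, A=1, S=2)
O = np.array([[[0.8, 0.2]], [[0.3, 0.7]]])     # shape (S=2, A=1, Obs=2)
b = np.array([0.5, 0.5])
print(belief_update(b, action=0, obs=1, T=T, O=O))
```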
2

Tennenholtz, Guy, Uri Shalit, and Shie Mannor. "Off-Policy Evaluation in Partially Observable Environments." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10276–83. http://dx.doi.org/10.1609/aaai.v34i06.6590.

Abstract:
This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments. Off-policy evaluation under partial observability is inherently prone to bias, with risk of arbitrarily large errors. We define the problem of off-policy evaluation for Partially Observable Markov Decision Processes (POMDPs) and establish what we believe is the first off-policy evaluation result for POMDPs. In addition, we formulate a model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP. We show how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to general POMDPs. We demonstrate the pitfalls of off-policy evaluation in POMDPs using a well-known off-policy method, Importance Sampling, and compare it with our result on synthetic medical data.
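For orientation, the baseline the authors compare against can be sketched as ordinary per-trajectory importance sampling (this is not the paper's decoupled-POMDP estimator, and the callback names are hypothetical). As the abstract notes, such an estimator can be arbitrarily biased when the behavior policy depended on state information that is hidden in the logged observations.

```python
# Ordinary (per-trajectory) importance sampling for off-policy evaluation.
# episodes: list of trajectories, each a list of (obs, action, reward) tuples.
# behavior_prob(obs, action) and target_prob(obs, action) return action probabilities.
def importance_sampling_ope(episodes, behavior_prob, target_prob, gamma=0.99):
    estimates = []
    for trajectory in episodes:
        weight, ret, discount = 1.0, 0.0, 1.0
        for obs, action, reward in trajectory:
            weight *= target_prob(obs, action) / behavior_prob(obs, action)
            ret += discount * reward
            discount *= gamma
        estimates.append(weight * ret)
    return sum(estimates) / len(estimates)
```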
3

Carr, Steven, Nils Jansen, and Ufuk Topcu. "Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes." Journal of Artificial Intelligence Research 72 (November 18, 2021): 819–47. http://dx.doi.org/10.1613/jair.1.12963.

Abstract:
Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty and incomplete information. Machine learning methods typically train recurrent neural networks (RNN) as effective representations of POMDP policies that can efficiently process sequential data. However, it is hard to verify whether the POMDP driven by such RNN-based policies satisfies safety constraints, for instance, given by temporal logic specifications. We propose a novel method that combines techniques from machine learning with the field of formal methods: training an RNN-based policy and then automatically extracting a so-called finite-state controller (FSC) from the RNN. Such FSCs offer a convenient way to verify temporal logic constraints. Implemented on a POMDP, they induce a Markov chain, and probabilistic verification methods can efficiently check whether this induced Markov chain satisfies a temporal logic specification. Using such methods, if the Markov chain does not satisfy the specification, a byproduct of verification is diagnostic information about the states in the POMDP that are critical for the specification. The method exploits this diagnostic information to either adjust the complexity of the extracted FSC or improve the policy by performing focused retraining of the RNN. The method synthesizes policies that satisfy temporal logic specifications for POMDPs with up to millions of states, which are three orders of magnitude larger than comparable approaches.
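The key object here is the finite-state controller. A minimal sketch of how an extracted FSC induces a Markov chain on a POMDP (hypothetical data structures, not the authors' implementation) is shown below; probabilistic model checkers can then analyze the resulting chain against temporal logic specifications.

```python
# An FSC maps each memory node to an action and each (node, observation) pair to
# a successor node. Composing it with the POMDP yields a Markov chain over
# (POMDP state, FSC node) pairs.
from collections import defaultdict

def induced_markov_chain(T, O, action_of, next_node, init_state, init_node):
    """T[s][a]: dict mapping next state -> probability.
    O[s2][a]: dict mapping observation -> probability.
    action_of[n]: action chosen in FSC node n.
    next_node[(n, o)]: FSC successor node after observing o in node n."""
    chain = defaultdict(lambda: defaultdict(float))
    frontier, seen = [(init_state, init_node)], set()
    while frontier:
        s, n = frontier.pop()
        if (s, n) in seen:
            continue
        seen.add((s, n))
        a = action_of[n]
        for s2, p_t in T[s][a].items():
            for o, p_o in O[s2][a].items():
                target = (s2, next_node[(n, o)])
                chain[(s, n)][target] += p_t * p_o
                if target not in seen:
                    frontier.append(target)
    return chain
```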
4

Kim, Sung-Kyun, Oren Salzman, and Maxim Likhachev. "POMHDP: Search-Based Belief Space Planning Using Multiple Heuristics." Proceedings of the International Conference on Automated Planning and Scheduling 29 (May 25, 2021): 734–44. http://dx.doi.org/10.1609/icaps.v29i1.3542.

Abstract:
Robots operating in the real world encounter substantial uncertainty that cannot be modeled deterministically before the actual execution. This gives rise to the necessity of robust motion planning under uncertainty, also known as belief space planning. Belief space planning can be formulated as Partially Observable Markov Decision Processes (POMDPs). However, computing optimal policies for non-trivial POMDPs is computationally intractable. Building upon recent progress from the search community, we propose a novel anytime POMDP solver, Partially Observable Multi-Heuristic Dynamic Programming (POMHDP), that leverages multiple heuristics to efficiently compute high-quality solutions while guaranteeing asymptotic convergence to an optimal policy. Through iterative forward search, POMHDP utilizes domain knowledge to solve POMDPs with specific goals and an infinite horizon. We demonstrate the efficacy of our proposed framework on a real-world, highly complex truck-unloading application.
5

Wang, Chenggang, and Roni Khardon. "Relational Partially Observable MDPs." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 4, 2010): 1153–58. http://dx.doi.org/10.1609/aaai.v24i1.7742.

Abstract:
Relational Markov Decision Processes (MDPs) are a useful abstraction for stochastic planning problems since one can develop abstract solutions for them that are independent of domain size or instantiation. While there has been increased interest in developing relational fully observable MDPs, there has been very little work on relational partially observable MDPs (POMDPs), which deal with uncertainty in problem states in addition to stochastic action effects. This paper provides a concrete formalization of relational POMDPs, making several technical contributions toward their solution. First, we show that to maintain correctness one must distinguish between quantification over states and quantification over belief states; this implies that solutions based on value iteration are inherently limited to the finite horizon case. Second, we provide a symbolic dynamic programming algorithm for finite horizon relational POMDPs, solving them at an abstract level by lifting the propositional incremental pruning algorithm. Third, we show that this algorithm can be implemented using first-order decision diagrams, a compact representation for functions over relational structures that has recently been used to solve relational MDPs.
6

Hauskrecht, M. "Value-Function Approximations for Partially Observable Markov Decision Processes." Journal of Artificial Intelligence Research 13 (August 1, 2000): 33–94. http://dx.doi.org/10.1613/jair.678.

Abstract:
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price -- exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
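One of the simplest value-function approximations in this family is the well-known QMDP heuristic, which solves the underlying fully observable MDP and scores a belief by averaging the resulting Q-values. A minimal sketch is given below (illustrative only; the article surveys this and considerably more refined methods).

```python
# QMDP: value-iterate on the underlying (fully observable) MDP, then act greedily
# with respect to the belief-averaged Q-values. T: (S, A, S), R: (S, A).
import numpy as np

def qmdp_q_values(T, R, gamma=0.95, iterations=500):
    S, A, _ = T.shape
    Q = np.zeros((S, A))
    for _ in range(iterations):
        V = Q.max(axis=1)                 # V(s) = max_a Q(s, a)
        Q = R + gamma * (T @ V)           # Q(s, a) = R(s, a) + gamma * E[V(s')]
    return Q

def qmdp_action(belief, Q):
    return int(np.argmax(belief @ Q))     # greedy in the belief-averaged Q-values
```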
7

Victorio-Meza, Hermilo, Manuel Mejía-Lavalle, Alicia Martínez Rebollar, Andrés Blanco Ortega, Obdulia Pichardo Lagunas, and Grigori Sidorov. "Searching for Cerebrovascular Disease Optimal Treatment Recommendations Applying Partially Observable Markov Decision Processes." International Journal of Pattern Recognition and Artificial Intelligence 32, no. 01 (October 9, 2017): 1860015. http://dx.doi.org/10.1142/s0218001418600157.

Abstract:
Partially observable Markov decision processes (POMDPs) are mathematical models for the planning of action sequences under conditions of uncertainty. Uncertainty in POMDPs is manifested in two ways: uncertainty in the perception of model states and uncertainty in the effects of actions on states. The diagnosis and treatment of cerebrovascular diseases (CVD) present this double condition of uncertainty, so we think that a POMDP is the most suitable method to model them. In this paper, we propose a model of CVD that is based on observations obtained from neuroimaging studies such as computed tomography, magnetic resonance and ultrasound. The model is designed as a POMDP because the health status of the patient is not directly observable and can only be deduced, with some probability, from the observations in the cerebral images. The components of the model (states, observations, actions, etc.) were defined based on the specialized literature. When the model is operated, the observations are entered, a diagnosis of the patient's health status is made, and the most appropriate action for the recovery of health is recommended. The model also allows consulting the probable state of health of the patient and alternative actions.
8

Zhang, N. L., and W. Liu. "A Model Approximation Scheme for Planning in Partially Observable Stochastic Domains." Journal of Artificial Intelligence Research 7 (November 1, 1997): 199–230. http://dx.doi.org/10.1613/jair.419.

Abstract:
Partially observable Markov decision processes (POMDPs) are a natural model for planning problems where effects of actions are nondeterministic and the state of the world is not completely observable. It is difficult to solve POMDPs exactly. This paper proposes a new approximation scheme. The basic idea is to transform a POMDP into another one where additional information is provided by an oracle. The oracle informs the planning agent that the current state of the world is in a certain region. The transformed POMDP is consequently said to be region observable. It is easier to solve than the original POMDP. We propose to solve the transformed POMDP and use its optimal policy to construct an approximate policy for the original POMDP. By controlling the amount of additional information that the oracle provides, it is possible to find a proper tradeoff between computational time and approximation quality. In terms of algorithmic contributions, we study in detail how to exploit region observability in solving the transformed POMDP. To facilitate the study, we also propose a new exact algorithm for general POMDPs. The algorithm is conceptually simple and yet is significantly more efficient than all previous exact algorithms.
9

Omidshafiei, Shayegan, Ali-Akbar Agha-Mohammadi, Christopher Amato, Shih-Yuan Liu, Jonathan P. How, and John Vian. "Decentralized control of multi-robot partially observable Markov decision processes using belief space macro-actions." International Journal of Robotics Research 36, no. 2 (February 2017): 231–58. http://dx.doi.org/10.1177/0278364917692864.

Abstract:
This work focuses on solving general multi-robot planning problems in continuous spaces with partial observability given a high-level domain description. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems. However, representing and solving Dec-POMDPs is often intractable for large problems. This work extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) to take advantage of the high-level representations that are natural for multi-robot problems and to facilitate scalable solutions to large discrete and continuous problems. The Dec-POSMDP formulation uses task macro-actions created from lower-level local actions that allow for asynchronous decision-making by the robots, which is crucial in multi-robot domains. This transformation from Dec-POMDPs to Dec-POSMDPs with a finite set of automatically-generated macro-actions allows use of efficient discrete-space search algorithms to solve them. The paper presents algorithms for solving Dec-POSMDPs, which are more scalable than previous methods since they can incorporate closed-loop belief space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed algorithms are then evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent realistic problems and provide high-quality solutions for large-scale problems.
10

Rozek, Brandon, Junkyu Lee, Harsha Kokel, Michael Katz, and Shirin Sohrabi. "Partially Observable Hierarchical Reinforcement Learning with AI Planning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (March 24, 2024): 23635–36. http://dx.doi.org/10.1609/aaai.v38i21.30504.

Abstract:
Partially observable Markov decision processes (POMDPs) challenge reinforcement learning agents due to incomplete knowledge of the environment. Even assuming monotonicity in uncertainty, it is difficult for an agent to know how and when to stop exploring for a given task. In this abstract, we discuss how to use hierarchical reinforcement learning (HRL) and AI Planning (AIP) to improve exploration when the agent knows possible valuations of unknown predicates and how to discover them. By encoding the uncertainty in an abstract planning model, the agent can derive a high-level plan which is then used to decompose the overall POMDP into a tree of semi-POMDPs for training. We evaluate our agent's performance on the MiniGrid domain and show how guided exploration may improve agent performance.

Dissertations on the topic "Partially Observable Markov Decision Processes (POMDPs)":

1

Aberdeen, Douglas Alexander. "Policy-Gradient Algorithms for Partially Observable Markov Decision Processes." The Australian National University, Research School of Information Sciences and Engineering, 2003. http://thesis.anu.edu.au./public/adt-ANU20030410.111006.

Abstract:
Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches. One such class of algorithms are the so-called policy-gradient methods from reinforcement learning. They seek to adjust the parameters of an agent in the direction that maximises the long-term average of a reward signal. Policy-gradient methods are attractive as a scalable approach for controlling partially observable Markov decision processes (POMDPs).

In the most general case POMDP policies require some form of internal state, or memory, in order to act optimally. Policy-gradient methods have shown promise for problems admitting memory-less policies but have been less successful when memory is required. This thesis develops several improved algorithms for learning policies with memory in an infinite-horizon setting: directly, when the dynamics of the world are known, and via Monte-Carlo methods otherwise. The algorithms simultaneously learn how to act and what to remember.

Monte-Carlo policy-gradient approaches tend to produce gradient estimates with high variance. Two novel methods for reducing variance are introduced. The first uses high-order filters to replace the eligibility trace of the gradient estimator. The second uses a low-variance value-function method to learn a subset of the parameters and a policy-gradient method to learn the remainder.

The algorithms are applied to large domains including a simulated robot navigation scenario, a multi-agent scenario with 21,000 states, and the complex real-world task of large vocabulary continuous speech recognition. To the best of the author's knowledge, no other policy-gradient algorithms have performed well at such tasks.

The high variance of Monte-Carlo methods requires lengthy simulation and hence a super-computer to train agents within a reasonable time. The ANU "Bunyip" Linux cluster was built with such tasks in mind. It was used for several of the experimental results presented here. One chapter of this thesis describes an application written for the Bunyip cluster that won the international Gordon Bell prize for price/performance in 2001.
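As a point of reference for the class of methods developed in the thesis, a minimal memoryless REINFORCE-style policy-gradient sketch for a softmax policy over observations is shown below (illustrative only; the thesis itself focuses on memory-based policies and variance-reduced gradient estimators).

```python
# Memoryless REINFORCE for a softmax policy over discrete observations.
# theta: array of shape (num_observations, num_actions) holding action preferences.
import numpy as np

def softmax_policy(theta, obs):
    prefs = theta[obs]
    exp_prefs = np.exp(prefs - prefs.max())
    return exp_prefs / exp_prefs.sum()

def reinforce_update(theta, episode, alpha=0.01, gamma=0.99):
    """episode: list of (obs, action, reward) tuples from one rollout."""
    returns, G = [], 0.0
    for _, _, reward in reversed(episode):     # discounted return-to-go
        G = reward + gamma * G
        returns.append(G)
    returns.reverse()
    for (obs, action, _), G_t in zip(episode, returns):
        probs = softmax_policy(theta, obs)
        grad_log_pi = -probs                   # d log pi(a|o) / d prefs = 1{a} - pi(.|o)
        grad_log_pi[action] += 1.0
        theta[obs] += alpha * G_t * grad_log_pi
    return theta
```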
2

Olafsson, Björgvin. "Partially Observable Markov Decision Processes for Faster Object Recognition." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198632.

Abstract:
Object recognition in the real world is a big challenge in the field of computer vision. Given the potentially enormous size of the search space, it is essential to be able to make intelligent decisions about where in the visual field to obtain information from, in order to reduce the computational resources needed. In this report, a POMDP (Partially Observable Markov Decision Process) learning framework, using a policy gradient method and information rewards as a training signal, has been implemented and used to train fixation policies that aim to maximize the information gathered in each fixation. The purpose of such policies is to make object recognition faster by reducing the number of fixations needed. The trained policies are evaluated in simulation and compared with several fixed policies. Finally, it is shown that it is possible to use the framework to train policies that outperform the fixed policies for certain observation models.
3

Lusena, Christopher. "Finite Memory Policies for Partially Observable Markov Decision Processes." UKnowledge, 2001. http://uknowledge.uky.edu/gradschool_diss/323.

Abstract:
This dissertation makes contributions to areas of research on planning with POMDPs: complexity theoretic results and heuristic techniques. The most important contributions are probably the complexity of approximating the optimal history-dependent finite-horizon policy for a POMDP, and the idea of heuristic search over the space of FFTs.
4

Skoglund, Caroline. "Risk-aware Autonomous Driving Using POMDPs and Responsibility-Sensitive Safety." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-300909.

Abstract:
Autonomous vehicles promise to play an important role in increasing the efficiency and safety of road transportation. Although we have seen several examples of autonomous vehicles out on the road over the past years, how to ensure the safety of an autonomous vehicle in an uncertain and dynamic environment is still a challenging problem. This thesis studies this problem by developing a risk-aware decision-making framework. The system that integrates the dynamics of an autonomous vehicle and the uncertain environment is modelled as a Partially Observable Markov Decision Process (POMDP). A risk measure is proposed based on the Responsibility-Sensitive Safety (RSS) distance, which quantifies the minimum distance to other vehicles required for ensuring safety. This risk measure is incorporated into the reward function of the POMDP to achieve risk-aware decision making. The proposed risk-aware POMDP framework is evaluated in two case studies. In a single-lane car-following scenario, it is shown that the ego vehicle is able to successfully avoid a collision in an emergency event where a vehicle in front of it makes a full stop. In the merge scenario, the ego vehicle successfully enters the main road from a ramp with a satisfactory distance to other vehicles. In conclusion, the risk-aware POMDP framework is able to realize a trade-off between safety and usability by keeping a reasonable distance and adapting to other vehicles' behaviours.
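For reference, the commonly stated RSS minimum safe longitudinal distance that such a risk measure can build on looks roughly as follows (a sketch with hypothetical parameter values, not the thesis implementation).

```python
# RSS minimum safe longitudinal distance between a rear (ego) and a front vehicle.
# rho: response time [s]; a_max: max acceleration of the rear vehicle [m/s^2];
# b_min: its guaranteed braking deceleration; b_max: the front vehicle's maximum braking.
def rss_min_distance(v_rear, v_front, rho=1.0, a_max=3.0, b_min=4.0, b_max=8.0):
    d = (v_rear * rho
         + 0.5 * a_max * rho ** 2
         + (v_rear + rho * a_max) ** 2 / (2.0 * b_min)
         - v_front ** 2 / (2.0 * b_max))
    return max(d, 0.0)

# Hypothetical example: ego at 20 m/s following a vehicle travelling at 15 m/s.
print(rss_min_distance(v_rear=20.0, v_front=15.0))
```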
5

You, Yang. "Probabilistic Decision-Making Models for Multi-Agent Systems and Human-Robot Collaboration." Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0014.

Abstract:
In this thesis, using Markov decision models, we investigate high-level decision-making (task-level planning) for robotics in two aspects: robot-robot collaboration and human-robot collaboration. In robot-robot collaboration (RRC), we study the decision problems of multiple robots that must achieve a shared goal collaboratively, and we use the decentralized partially observable Markov decision process (Dec-POMDP) framework to model such RRC problems. We then propose two novel algorithms for solving Dec-POMDPs. The first algorithm (Inf-JESP) finds Nash equilibrium solutions by iteratively building the best-response policy for each agent until no improvement can be made. To handle infinite-horizon Dec-POMDPs, we represent each agent's policy using a finite-state controller. The second algorithm (MC-JESP) extends Inf-JESP with generative models, which enables us to scale up to large problems. Through experiments, we demonstrate that our methods are competitive with existing Dec-POMDP solvers. In human-robot collaboration (HRC), we can only control the robot, and the robot faces uncertain human objectives and induced behaviors. Therefore, we attempt to address the challenge of deriving robot policies in HRC that are robust to the uncertainties about human behaviors. In this direction, we discuss possible mental models that can be used to model humans in an HRC task. We propose a general approach to derive, automatically and without prior knowledge, a model of human behaviors based on the assumption that the human could also control the robot. From here, we then design two algorithms for computing robust robot policies relying on solving a robot POMDP whose state contains the human's internal state. The first algorithm operates offline and gives a complete robot policy that can be used during the robot's execution. The second algorithm is an online method, i.e., it plans the robot's action at each time step during execution. Compared with the offline approach, the online method only requires a generative model and thus can scale up to large problems. Experiments with synthetic and real humans are conducted in a simulated environment to evaluate these algorithms. We observe that our methods can provide robust robot decisions despite the uncertainties over human objectives and behaviors. In this thesis, our research on RRC provides a foundation for building best-response policies in a partially observable and multi-agent setting, which serves as an important intermediate step for addressing HRC problems. Moreover, we provide more flexible algorithms using generative models in each contribution, and we believe this will facilitate applying our contributions to real-world applications.
6

Cheng, Hsien-Te. "Algorithms for partially observable Markov decision processes." Thesis, University of British Columbia, 1988. http://hdl.handle.net/2429/29073.

Abstract:
The thesis develops methods to solve discrete-time finite-state partially observable Markov decision processes. For the infinite horizon problem, only the discounted reward case is considered. Several new algorithms for the finite horizon and the infinite horizon problems are developed. For the finite horizon problem, two new algorithms are developed. The first algorithm is called the relaxed region algorithm. For each support in the value function, this algorithm determines a region not smaller than its support region and modifies it implicitly in later steps until the exact support region is found. The second algorithm, called the linear support algorithm, systematically approximates the value function until all supports in the value function are found. The most important feature of this algorithm is that it can be modified to find an approximate value function. The number of regions determined explicitly by both algorithms is the same as the number of supports in the value function, which is much less than the number of regions generated by the one-pass algorithm. Since the vertices of each region have to be found, these two algorithms are more efficient than the one-pass algorithm. The limited numerical examples also show that both methods are more efficient than the existing algorithms. For the infinite horizon problem, it is first shown that the approximation version of the linear support algorithm can be used to substitute the policy improvement step in a standard successive approximation method to obtain an ε-optimal value function. Next, an iterative discretization procedure is developed which uses a small number of states to find new supports and improve the value function between two policy improvement steps. Since only a finite number of states are chosen in this process, some techniques developed for finite MDPs can be applied here. Finally, we prove that the policy improvement step in the iterative discretization procedure can be replaced by the approximation version of the linear support algorithm. The last part of the thesis deals with problems with continuous signals. We first show that if the signal processes are uniformly distributed, then the problem can be reformulated as a problem with a finite number of signals. The result is then extended to the case where the signal processes are step functions. Since step functions can easily be used to approximate most probability distributions, this method can be used to approximate most problems with continuous signals. Finally, we present conditions which guarantee that the linear support can be computed for any given state; the methods developed for the finite-signal case can then be easily modified and applied to problems for which these conditions hold.
7

Jaulmes, Robin. "Active learning in partially observable Markov decision processes." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=98733.

Abstract:
People are efficient when they make decisions under uncertainty, even when their decisions have long-term ramifications, or when their knowledge and their perception of the environment are uncertain. We are able to experiment with the environment and learn, improving our behavior as experience is gathered. Most of the problems we face in real life are of this kind, as are most of the problems an automated agent would face in robotics.
Our goal is to build Artificial Intelligence algorithms able to reproduce the reasoning of humans for these complex problems. We use the Reinforcement Learning framework, which allows optimal behaviors to be learned in dynamic environments. More precisely, we adapt Partially Observable Markov Decision Processes (POMDPs) to environments that are partially known.
We take inspiration from the field of Active Learning: we assume the existence of an oracle, who can, during a short learning phase, provide the agent with additional information about its environment. The agent actively learns everything that is useful in the environment, with a minimum use of the oracle.
After reviewing existing methods for solving learning problems in partially observable environments, we present a theoretical active learning setup. We propose an algorithm, MEDUSA, and provide theoretical and empirical evidence of its performance.
8

Aberdeen, Douglas Alexander. "Policy-gradient algorithms for partially observable Markov decision processes /." View thesis entry in Australian Digital Theses Program, 2003. http://thesis.anu.edu.au/public/adt-ANU20030410.111006/index.html.

9

Zawaideh, Zaid. "Eliciting preferences sequentially using partially observable Markov decision processes." Thesis, McGill University, 2008. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=18794.

Abstract:
Decision support systems have been gaining in importance recently. Yet one of the bottlenecks in designing such systems lies in understanding how the user values different decision outcomes, or more simply, what the user's preferences are. Preference elicitation promises to remove the guesswork from designing decision-making agents by providing more formal methods for measuring the 'goodness' of outcomes. This thesis aims to address some of the challenges of preference elicitation, such as the high dimensionality of the underlying problem. The problem is formulated as a partially observable Markov decision process (POMDP) using a factored representation to take advantage of the structure inherent to preference elicitation problems. Moreover, simple preference knowledge about problem attributes is used to acquire more accurate preferences without increasing the burden on the user. Sparse terminal actions are defined to allow a flexible trade-off between the speed and accuracy of the elicited preference function. Empirical simulations are used to validate the proposed methodology. The result is a framework that is flexible enough to be applied to a wide range of domains and that addresses some of the challenges facing preference elicitation methods.
10

Williams, Jason Douglas. "Partially observable Markov decision processes for spoken dialogue management." Thesis, University of Cambridge, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.612754.


Books on the topic "Partially Observable Markov Decision Processes (POMDPs)":

1

Howes, Andrew, Xiuli Chen, Aditya Acharya, and Richard L. Lewis. Interaction as an Emergent Property of a Partially Observable Markov Decision Process. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780198799603.003.0011.

Abstract:
In this chapter we explore the potential advantages of modeling the interaction between a human and a computer as a consequence of a Partially Observable Markov Decision Process (POMDP) that models human cognition. POMDPs can be used to model human perceptual mechanisms, such as human vision, as partial (uncertain) observers of a hidden state. In general, POMDPs permit a rigorous definition of interaction as the outcome of a reward-maximizing stochastic sequential decision process. They have been shown to explain interaction between a human and an environment in a range of scenarios, including visual search, interactive search and sense-making. The chapter uses these scenarios to illustrate the explanatory power of POMDPs in HCI. It also shows that POMDPs embrace the embodied, ecological and adaptive nature of human interaction.
2

Poupart, Pascal. Exploiting structure to efficiently solve large scale partially observable Markov decision processes. 2005.


Book chapters on the topic "Partially Observable Markov Decision Processes (POMDPs)":

1

Andriushchenko, Roman, Alexander Bork, Milan Češka, Sebastian Junges, Joost-Pieter Katoen, and Filip Macák. "Search and Explore: Symbiotic Policy Synthesis in POMDPs." In Computer Aided Verification, 113–35. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-37709-9_6.

Abstract:
This paper marries two state-of-the-art controller synthesis methods for partially observable Markov decision processes (POMDPs), a prominent model in sequential decision making under uncertainty. A central issue is to find a POMDP controller—that solely decides based on the observations seen so far—to achieve a total expected reward objective. As finding optimal controllers is undecidable, we concentrate on synthesising good finite-state controllers (FSCs). We do so by tightly integrating two modern, orthogonal methods for POMDP controller synthesis: a belief-based and an inductive approach. The former method obtains an FSC from a finite fragment of the so-called belief MDP, an MDP that keeps track of the probabilities of equally observable POMDP states. The latter is an inductive search technique over a set of FSCs, e.g., controllers with a fixed memory size. The key result of this paper is a symbiotic anytime algorithm that tightly integrates both approaches such that each profits from the controllers constructed by the other. Experimental results indicate a substantial improvement in the value of the controllers while significantly reducing the synthesis time and memory footprint.
2

Bork, Alexander, Joost-Pieter Katoen, and Tim Quatmann. "Under-Approximating Expected Total Rewards in POMDPs." In Tools and Algorithms for the Construction and Analysis of Systems, 22–40. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-99527-0_2.

Abstract:
We consider the problem: is the optimal expected total reward to reach a goal state in a partially observable Markov decision process (POMDP) below a given threshold? We tackle this—generally undecidable—problem by computing under-approximations on these total expected rewards. This is done by abstracting finite unfoldings of the infinite belief MDP of the POMDP. The key issue is to find a suitable under-approximation of the value function. We provide two techniques: a simple (cut-off) technique that uses a good policy on the POMDP, and a more advanced technique (belief clipping) that uses minimal shifts of probabilities between beliefs. We use mixed-integer linear programming (MILP) to find such minimal probability shifts and experimentally show that our techniques scale quite well while providing tight lower bounds on the expected total reward.
3

Bork, Alexander, Debraj Chakraborty, Kush Grover, Jan Křetínský, and Stefanie Mohr. "Learning Explainable and Better Performing Representations of POMDP Strategies." In Tools and Algorithms for the Construction and Analysis of Systems, 299–319. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-57249-4_15.

Abstract:
Strategies for partially observable Markov decision processes (POMDP) typically require memory. One way to represent this memory is via automata. We present a method to learn an automaton representation of a strategy using a modification of the L*-algorithm. Compared to the tabular representation of a strategy, the resulting automaton is dramatically smaller and thus also more explainable. Moreover, in the learning process, our heuristics may even improve the strategy's performance. We compare our approach to an existing approach that synthesizes an automaton directly from the POMDP, thereby solving it. Our experiments show that our approach can lead to significant improvements in the size and quality of the resulting strategy representations.
4

Bäuerle, Nicole, and Ulrich Rieder. "Partially Observable Markov Decision Processes." In Universitext, 147–74. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-18324-9_5.

5

Dutech, Alain, and Bruno Scherrer. "Partially Observable Markov Decision Processes." In Markov Decision Processes in Artificial Intelligence, 185–228. Hoboken, NJ USA: John Wiley & Sons, Inc., 2013. http://dx.doi.org/10.1002/9781118557426.ch7.

6

Zeugmann, Thomas, Pascal Poupart, James Kennedy, Xin Jin, Jiawei Han, Lorenza Saitta, Michele Sebag, et al. "Partially Observable Markov Decision Processes." In Encyclopedia of Machine Learning, 754–60. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-30164-8_629.

7

Sucar, Luis Enrique. "Partially Observable Markov Decision Processes." In Probabilistic Graphical Models, 249–66. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-61943-5_12.

8

Spaan, Matthijs T. J. "Partially Observable Markov Decision Processes." In Adaptation, Learning, and Optimization, 387–414. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-27645-3_12.

9

Poupart, Pascal. "Partially Observable Markov Decision Processes." In Encyclopedia of Machine Learning and Data Mining, 959–66. Boston, MA: Springer US, 2017. http://dx.doi.org/10.1007/978-1-4899-7687-1_629.

10

Besse, Camille, and Brahim Chaib-draa. "Quasi-Deterministic Partially Observable Markov Decision Processes." In Neural Information Processing, 237–46. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009. http://dx.doi.org/10.1007/978-3-642-10677-4_27.


Conference papers on the topic "Partially Observable Markov Decision Processes (POMDPs)":

1

Soh, Harold, and Yiannis Demiris. "Evolving policies for multi-reward partially observable Markov decision processes (MR-POMDPs)." In Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation (GECCO '11). New York, New York, USA: ACM Press, 2011. http://dx.doi.org/10.1145/2001576.2001674.

2

Castellini, Alberto, Georgios Chalkiadakis, and Alessandro Farinelli. "Influence of State-Variable Constraints on Partially Observable Monte Carlo Planning." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/769.

Abstract:
Online planning methods for partially observable Markov decision processes (POMDPs) have recently gained much interest. In this paper, we propose the introduction of prior knowledge in the form of (probabilistic) relationships among discrete state-variables, for online planning based on the well-known POMCP algorithm. In particular, we propose the use of hard constraint networks and probabilistic Markov random fields to formalize state-variable constraints, and we extend the POMCP algorithm to take advantage of these constraints. Results on a case study based on Rocksample show that the usage of this knowledge provides significant improvements to the performance of the algorithm. The extent of this improvement depends on the amount of knowledge encoded in the constraints and reaches 50% of the average discounted return in the most favorable cases that we analyzed.
3

Carr, Steven, Nils Jansen, Ralf Wimmer, Alexandru Serban, Bernd Becker, and Ufuk Topcu. "Counterexample-Guided Strategy Improvement for POMDPs Using Recurrent Neural Networks." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/768.

Abstract:
We study strategy synthesis for partially observable Markov decision processes (POMDPs). The particular problem is to determine strategies that provably adhere to (probabilistic) temporal logic constraints. This problem is computationally intractable and theoretically hard. We propose a novel method that combines techniques from machine learning and formal verification. First, we train a recurrent neural network (RNN) to encode POMDP strategies. The RNN accounts for memory-based decisions without the need to expand the full belief space of a POMDP. Secondly, we restrict the RNN-based strategy to represent a finite-memory strategy and implement it on a specific POMDP. For the resulting finite Markov chain, efficient formal verification techniques provide provable guarantees against temporal logic specifications. If the specification is not satisfied, counterexamples supply diagnostic information. We use this information to improve the strategy by iteratively training the RNN. Numerical experiments show that the proposed method elevates the state of the art in POMDP solving by up to three orders of magnitude in terms of solving times and model sizes.
4

Lim, Michael H., Claire Tomlin, and Zachary N. Sunberg. "Sparse Tree Search Optimality Guarantees in POMDPs with Continuous Observation Spaces." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/572.

Abstract:
Partially observable Markov decision processes (POMDPs) with continuous state and observation spaces have powerful flexibility for representing real-world decision and control problems but are notoriously difficult to solve. Recent online sampling-based algorithms that use observation likelihood weighting have shown unprecedented effectiveness in domains with continuous observation spaces. However, there has been no formal theoretical justification for this technique. This work offers such a justification, proving that a simplified algorithm, partially observable weighted sparse sampling (POWSS), will estimate Q-values accurately with high probability and can be made to perform arbitrarily near the optimal solution by increasing computational power.
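The core operation behind observation likelihood weighting can be sketched as follows (hypothetical names and signatures; this is not the POWSS implementation, only the particle-reweighting step such algorithms build on).

```python
# One simulation step with observation-likelihood weighting: propagate each
# particle through a generative model and reweight it by the probability of the
# observation that was actually received.
def weighted_particle_step(particles, action, obs, sample_next_state, obs_likelihood):
    """particles: list of (state, weight) pairs.
    sample_next_state(state, action) draws s' from the generative model.
    obs_likelihood(obs, next_state, action) returns Z(obs | s', a)."""
    updated = []
    for state, weight in particles:
        next_state = sample_next_state(state, action)
        updated.append((next_state, weight * obs_likelihood(obs, next_state, action)))
    total = sum(w for _, w in updated)
    if total == 0.0:
        return updated                      # degenerate case: all weights vanished
    return [(s, w / total) for s, w in updated]
```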
5

Hsiao, Chuck, and Richard Malak. "Modeling Information Gathering Decisions in Systems Engineering Projects." In ASME 2014 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2014. http://dx.doi.org/10.1115/detc2014-34854.

Abstract:
Decisions in systems engineering projects commonly are made under significant amounts of uncertainty. This uncertainty can exist in many areas such as the performance of subsystems, interactions between subsystems, or project resource requirements such as budget or personnel. System engineers often can choose to gather information that reduces uncertainty, which allows for potentially better decisions, but at the cost of resources expended in acquiring the information. However, our understanding of how to analyze situations involving gathering information is limited, and thus heuristics, intuition, or deadlines are often used to judge the amount of information gathering needed in a decision. System engineers would benefit from a better understanding of how to determine the amount of information gathering needed to support a decision. This paper introduces Partially Observable Markov Decision Processes (POMDPs) as a formalism for modeling information-gathering decisions in systems engineering. A POMDP can model different states, alternatives, outcomes, and probabilities of outcomes to represent a decision maker’s beliefs about his situation. It also can represent sequential decisions in a compact format, avoiding the combinatorial explosion of decision trees and similar representations. The solution of a POMDP, in the form of value functions, prescribes the best course of action based on a decision maker’s beliefs about his situation. The value functions also determine if more information gathering is needed. Sophisticated computational solvers for POMDPs have been developed in recent years, allowing for a straightforward analysis of different alternatives, and determining the optimal course of action in a given situation. This paper demonstrates using a POMDP to model a systems engineering problem, and compares this approach with other approaches that account for information gathering in decision making.
6

Horák, Karel, Branislav Bošanský, and Krishnendu Chatterjee. "Goal-HSVI: Heuristic Search Value Iteration for Goal POMDPs." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/662.

Abstract:
Partially observable Markov decision processes (POMDPs) are the standard models for planning under uncertainty with both finite and infinite horizon. Besides the well-known discounted-sum objective, the indefinite-horizon objective (aka Goal-POMDPs) is another classical objective for POMDPs. In this case, given a set of target states and a positive cost for each transition, the optimization objective is to minimize the expected total cost until a target state is reached. In the literature, RTDP-Bel or heuristic search value iteration (HSVI) have been used for solving Goal-POMDPs. Neither of these algorithms has theoretical convergence guarantees, and HSVI may even fail to terminate its trials. We give the following contributions: (1) We discuss the challenges introduced in Goal-POMDPs and illustrate how they prevent the original HSVI from converging. (2) We present a novel algorithm inspired by HSVI, termed Goal-HSVI, and show that our algorithm has convergence guarantees. (3) We show that Goal-HSVI outperforms RTDP-Bel on a set of well-known examples.
7

Koops, Wietze, Nils Jansen, Sebastian Junges, and Thiago D. Simão. "Recursive Small-Step Multi-Agent A* for Dec-POMDPs." In Thirty-Second International Joint Conference on Artificial Intelligence {IJCAI-23}. California: International Joint Conferences on Artificial Intelligence Organization, 2023. http://dx.doi.org/10.24963/ijcai.2023/600.

Abstract:
We present recursive small-step multi-agent A* (RS-MAA*), an exact algorithm that optimizes the expected reward in decentralized partially observable Markov decision processes (Dec-POMDPs). RS-MAA* builds on multi-agent A* (MAA*), an algorithm that finds policies by exploring a search tree, but tackles two major scalability concerns. First, we employ a modified, small-step variant of the search tree that avoids the double exponential outdegree of the classical formulation. Second, we use a tight and recursive heuristic that we compute on-the-fly, thereby avoiding an expensive precomputation. The resulting algorithm is conceptually simple, yet it shows superior performance on a rich set of standard benchmarks.
8

Carr, Steven, Nils Jansen, and Ufuk Topcu. "Verifiable RNN-Based Policies for POMDPs Under Temporal Logic Constraints." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/570.

Abstract:
Recurrent neural networks (RNNs) have emerged as an effective representation of control policies in sequential decision-making problems. However, a major drawback in the application of RNN-based policies is the difficulty in providing formal guarantees on the satisfaction of behavioral specifications, e.g. safety and/or reachability. By integrating techniques from formal methods and machine learning, we propose an approach to automatically extract a finite-state controller (FSC) from an RNN, which, when composed with a finite-state system model, is amenable to existing formal verification tools. Specifically, we introduce an iterative modification to the so-called quantized bottleneck insertion technique to create an FSC as a randomized policy with memory. For the cases in which the resulting FSC fails to satisfy the specification, verification generates diagnostic information. We utilize this information to either adjust the amount of memory in the extracted FSC or perform focused retraining of the RNN. While generally applicable, we detail the resulting iterative procedure in the context of policy synthesis for partially observable Markov decision processes (POMDPs), which is known to be notoriously hard. The numerical experiments show that the proposed approach outperforms traditional POMDP synthesis methods by 3 orders of magnitude within 2% of optimal benchmark values.
9

Chatterjee, Krishnendu, Adrián Elgyütt, Petr Novotný, and Owen Rouillé. "Expectation Optimization with Probabilistic Guarantees in POMDPs with Discounted-Sum Objectives." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/652.

Abstract:
Partially-observable Markov decision processes (POMDPs) with discounted-sum payoff are a standard framework to model a wide range of problems related to decision making under uncertainty. Traditionally, the goal has been to obtain policies that optimize the expectation of the discounted-sum payoff. A key drawback of the expectation measure is that even low-probability events with extreme payoff can significantly affect the expectation, and thus the obtained policies are not necessarily risk-averse. An alternate approach is to optimize the probability that the payoff is above a certain threshold, which makes it possible to obtain risk-averse policies but ignores optimization of the expectation. We consider the expectation optimization with probabilistic guarantee (EOPG) problem, where the goal is to optimize the expectation while ensuring that the payoff is above a given threshold with at least a specified probability. We present several results on the EOPG problem, including the first algorithm to solve it.
10

Dam, Tuan, Pascal Klink, Carlo D'Eramo, Jan Peters, and Joni Pajarinen. "Generalized Mean Estimation in Monte-Carlo Tree Search." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/332.

Abstract:
We consider Monte-Carlo Tree Search (MCTS) applied to Markov Decision Processes (MDPs) and Partially Observable MDPs (POMDPs), and the well-known Upper Confidence bound for Trees (UCT) algorithm. In UCT, a tree with nodes (states) and edges (actions) is incrementally built by the expansion of nodes, and the values of nodes are updated through a backup strategy based on the average value of child nodes. However, it has been shown that with enough samples the maximum operator yields more accurate node value estimates than averaging. Instead of settling for one of these value estimates, we go a step further, proposing a novel backup strategy which uses the power mean operator, which computes a value between the average and the maximum value. We call our new approach Power-UCT and argue how the use of the power mean operator helps to speed up learning in MCTS. We theoretically analyze our method, providing guarantees of convergence to the optimum. Finally, we empirically demonstrate the effectiveness of our method in well-known MDP and POMDP benchmarks, showing significant improvements in performance and convergence speed with respect to state-of-the-art algorithms.
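The backup operator at the heart of Power-UCT is the weighted power mean; a minimal sketch with hypothetical visit counts and values is shown below. With p = 1 it reduces to the weighted average used by plain UCT, and for non-negative values it approaches the maximum as p grows.

```python
# Weighted power mean of child-node value estimates (assumed non-negative here).
def power_mean(values, weights, p):
    assert len(values) == len(weights) and p >= 1.0
    total_weight = float(sum(weights))
    return (sum(w * (v ** p) for v, w in zip(values, weights)) / total_weight) ** (1.0 / p)

# Hypothetical child values weighted by visit counts.
children = [0.2, 0.5, 0.9]
visits = [10, 5, 1]
print(power_mean(children, visits, p=1.0))    # plain weighted average
print(power_mean(children, visits, p=16.0))   # pulled toward the maximum child value
```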
