Journal articles on the topic "Partially Observable Markov Decision Processes (POMDPs)"


Consult the top 50 journal articles for research on the topic "Partially Observable Markov Decision Processes (POMDPs)".

Next to every entry in the reference list there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of a publication as a .pdf file and read its abstract online, whenever these are available in the metadata.

Browse journal articles across a wide range of disciplines and compile your bibliography correctly.

1

NI, YAODONG, and ZHI-QIANG LIU. "BOUNDED-PARAMETER PARTIALLY OBSERVABLE MARKOV DECISION PROCESSES: FRAMEWORK AND ALGORITHM." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 21, no. 06 (December 2013): 821–63. http://dx.doi.org/10.1142/s0218488513500396.

Abstract:
Partially observable Markov decision processes (POMDPs) are powerful for planning under uncertainty. However, it is usually impractical to employ a POMDP with exact parameters to model a real-life situation precisely, for reasons such as limited data for learning the model or the inability of exact POMDPs to capture dynamic situations. In this paper, assuming that the parameters of POMDPs are imprecise but bounded, we formulate the framework of bounded-parameter partially observable Markov decision processes (BPOMDPs). A modified value iteration is proposed as a basic strategy for tackling parameter imprecision in BPOMDPs. In addition, we design the UL-based value iteration algorithm, in which each value backup is based on two sets of vectors called the U-set and L-set. We propose four strategies for computing the U-set and L-set, analyze the algorithm's computational complexity and reward loss theoretically, and show its effectiveness and robustness empirically.
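For orientation, the exact-parameter belief update that BPOMDP value iteration generalizes (by replacing point estimates with interval bounds) is compact enough to sketch. The arrays below are illustrative placeholders, not parameters from the paper:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """Exact POMDP belief update: b'(s') ∝ O[a, s', o] * sum_s T[a, s, s'] * b(s).

    b: belief over states, shape (S,)
    T: transition probabilities, shape (A, S, S), T[a, s, s'] = P(s' | s, a)
    O: observation probabilities, shape (A, S, nO), O[a, s', o] = P(o | s', a)
    """
    pred = b @ T[a]               # predicted next-state distribution, shape (S,)
    unnorm = O[a][:, o] * pred    # weight by observation likelihood
    return unnorm / unnorm.sum()

# Tiny two-state example with hypothetical numbers.
T = np.array([[[0.9, 0.1], [0.2, 0.8]]])   # one action
O = np.array([[[0.8, 0.2], [0.3, 0.7]]])   # two observations
b = np.array([0.5, 0.5])
print(belief_update(b, a=0, o=1, T=T, O=O))
```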
2

Tennenholtz, Guy, Uri Shalit, and Shie Mannor. "Off-Policy Evaluation in Partially Observable Environments." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10276–83. http://dx.doi.org/10.1609/aaai.v34i06.6590.

Abstract:
This work studies the problem of batch off-policy evaluation for Reinforcement Learning in partially observable environments. Off-policy evaluation under partial observability is inherently prone to bias, with risk of arbitrarily large errors. We define the problem of off-policy evaluation for Partially Observable Markov Decision Processes (POMDPs) and establish what we believe is the first off-policy evaluation result for POMDPs. In addition, we formulate a model in which observed and unobserved variables are decoupled into two dynamic processes, called a Decoupled POMDP. We show how off-policy evaluation can be performed under this new model, mitigating estimation errors inherent to general POMDPs. We demonstrate the pitfalls of off-policy evaluation in POMDPs using a well-known off-policy method, Importance Sampling, and compare it with our result on synthetic medical data.
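As context for the pitfall the authors demonstrate, here is a minimal sketch of trajectory-wise importance sampling, the baseline off-policy estimator they compare against; the `pi_e`/`pi_b` callable interface is our assumption, not the paper's code:

```python
import numpy as np

def is_estimate(trajectories, pi_e, pi_b, gamma=1.0):
    """Trajectory-wise importance sampling estimate of the target policy's value.

    trajectories: list of [(obs, action, reward), ...] collected under pi_b.
    pi_e, pi_b: callables (obs, action) -> probability under the evaluation /
                behavior policy (hypothetical interface).
    In POMDPs this is unbiased only when the behavior policy acted on the same
    observations; unobserved confounders break it, which is the paper's point.
    """
    values = []
    for traj in trajectories:
        rho, ret, disc = 1.0, 0.0, 1.0
        for obs, a, r in traj:
            rho *= pi_e(obs, a) / pi_b(obs, a)   # cumulative importance ratio
            ret += disc * r
            disc *= gamma
        values.append(rho * ret)
    return float(np.mean(values))
```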
3

Carr, Steven, Nils Jansen, and Ufuk Topcu. "Task-Aware Verifiable RNN-Based Policies for Partially Observable Markov Decision Processes." Journal of Artificial Intelligence Research 72 (November 18, 2021): 819–47. http://dx.doi.org/10.1613/jair.1.12963.

Abstract:
Partially observable Markov decision processes (POMDPs) are models for sequential decision-making under uncertainty and incomplete information. Machine learning methods typically train recurrent neural networks (RNNs) as effective representations of POMDP policies that can efficiently process sequential data. However, it is hard to verify whether the POMDP driven by such RNN-based policies satisfies safety constraints, for instance, given by temporal logic specifications. We propose a novel method that combines techniques from machine learning with the field of formal methods: training an RNN-based policy and then automatically extracting a so-called finite-state controller (FSC) from the RNN. Such FSCs offer a convenient way to verify temporal logic constraints. Implemented on a POMDP, they induce a Markov chain, and probabilistic verification methods can efficiently check whether this induced Markov chain satisfies a temporal logic specification. Using such methods, if the Markov chain does not satisfy the specification, a byproduct of verification is diagnostic information about the states in the POMDP that are critical for the specification. The method exploits this diagnostic information to either adjust the complexity of the extracted FSC or improve the policy by performing focused retraining of the RNN. The method synthesizes policies that satisfy temporal logic specifications for POMDPs with up to millions of states, three orders of magnitude larger than those handled by comparable approaches.
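The product construction behind this pipeline is standard: an FSC composed with a POMDP yields a finite Markov chain over (state, memory-node) pairs, which a probabilistic model checker can then analyze. A minimal sketch under assumed array encodings (not the authors' implementation):

```python
import numpy as np

def induced_markov_chain(T, O, act, delta):
    """Markov chain induced by composing a finite-state controller with a POMDP.

    T[a, s, s'] = P(s' | s, a); O[a, s', o] = P(o | s', a)
    act[n]: action the FSC plays in memory node n
    delta[n, o]: successor memory node after observing o
    Returns the transition matrix over product states (s, n),
    flattened as index s * N + n; rows sum to 1.
    """
    S, N, nO = T.shape[1], len(act), O.shape[2]
    P = np.zeros((S * N, S * N))
    for s in range(S):
        for n in range(N):
            a = act[n]
            for s2 in range(S):
                for o in range(nO):
                    P[s * N + n, s2 * N + delta[n, o]] += T[a, s, s2] * O[a, s2, o]
    return P  # hand this to a probabilistic model checker for verification
```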
4

Kim, Sung-Kyun, Oren Salzman, and Maxim Likhachev. "POMHDP: Search-Based Belief Space Planning Using Multiple Heuristics." Proceedings of the International Conference on Automated Planning and Scheduling 29 (May 25, 2021): 734–44. http://dx.doi.org/10.1609/icaps.v29i1.3542.

Abstract:
Robots operating in the real world encounter substantial uncertainty that cannot be modeled deterministically before the actual execution. This gives rise to the need for robust motion planning under uncertainty, also known as belief space planning. Belief space planning can be formulated as Partially Observable Markov Decision Processes (POMDPs). However, computing optimal policies for non-trivial POMDPs is computationally intractable. Building upon recent progress from the search community, we propose a novel anytime POMDP solver, Partially Observable Multi-Heuristic Dynamic Programming (POMHDP), that leverages multiple heuristics to efficiently compute high-quality solutions while guaranteeing asymptotic convergence to an optimal policy. Through iterative forward search, POMHDP utilizes domain knowledge to solve POMDPs with specific goals and an infinite horizon. We demonstrate the efficacy of the proposed framework on a real-world, highly complex truck-unloading application.
5

Wang, Chenggang, and Roni Khardon. "Relational Partially Observable MDPs." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 4, 2010): 1153–58. http://dx.doi.org/10.1609/aaai.v24i1.7742.

Abstract:
Relational Markov Decision Processes (MDP) are a useful abstraction for stochastic planning problems since one can develop abstract solutions for them that are independent of domain size or instantiation. While there has been an increased interest in developing relational fully observable MDPs, there has been very little work on relational partially observable MDPs (POMDP), which deal with uncertainty in problem states in addition to stochastic action effects. This paper provides a concrete formalization of relational POMDPs making several technical contributions toward their solution. First, we show that to maintain correctness one must distinguish between quantification over states and quantification over belief states; this implies that solutions based on value iteration are inherently limited to the finite horizon case. Second, we provide a symbolic dynamic programming algorithm for finite horizon relational POMDPs, solving them at an abstract level, by lifting the propositional incremental pruning algorithm. Third, we show that this algorithm can be implemented using first-order decision diagrams, a compact representation for functions over relational structures that has recently been used to solve relational MDPs.
6

Hauskrecht, M. "Value-Function Approximations for Partially Observable Markov Decision Processes." Journal of Artificial Intelligence Research 13 (August 1, 2000): 33–94. http://dx.doi.org/10.1613/jair.678.

Abstract:
Partially observable Markov decision processes (POMDPs) provide an elegant mathematical framework for modeling complex decision and planning problems in stochastic domains in which states of the system are observable only indirectly, via a set of imperfect or noisy observations. The modeling advantage of POMDPs, however, comes at a price -- exact methods for solving them are computationally very expensive and thus applicable in practice only to very simple problems. We focus on efficient approximation (heuristic) methods that attempt to alleviate the computational problem and trade off accuracy for speed. We have two objectives here. First, we survey various approximation methods, analyze their properties and relations and provide some new insights into their differences. Second, we present a number of new approximation methods and novel refinements of existing techniques. The theoretical results are supported by experiments on a problem from the agent navigation domain.
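Among the heuristics this survey analyzes, QMDP is perhaps the simplest: solve the underlying fully observable MDP, then act greedily on the Q-values averaged by the current belief. A minimal sketch, with illustrative array conventions:

```python
import numpy as np

def qmdp(T, R, gamma=0.95, iters=200):
    """QMDP approximation: value-iterate the underlying fully observable MDP.

    T[a, s, s'] = P(s' | s, a); R[a, s] = expected immediate reward.
    Returns Q of shape (A, S).
    """
    Q = np.zeros(R.shape)
    for _ in range(iters):
        V = Q.max(axis=0)          # greedy MDP value function
        Q = R + gamma * T @ V      # Bellman backup, shape (A, S)
    return Q

def qmdp_action(b, Q):
    """Greedy action at belief b: argmax_a sum_s b(s) Q(a, s)."""
    return int(np.argmax(Q @ b))
```

QMDP effectively assumes all state uncertainty vanishes after one step, so it is fast but never chooses purely information-gathering actions, a trade-off the survey discusses.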
7

Victorio-Meza, Hermilo, Manuel Mejía-Lavalle, Alicia Martínez Rebollar, Andrés Blanco Ortega, Obdulia Pichardo Lagunas, and Grigori Sidorov. "Searching for Cerebrovascular Disease Optimal Treatment Recommendations Applying Partially Observable Markov Decision Processes." International Journal of Pattern Recognition and Artificial Intelligence 32, no. 01 (October 9, 2017): 1860015. http://dx.doi.org/10.1142/s0218001418600157.

Abstract:
Partially observable Markov decision processes (POMDPs) are mathematical models for the planning of action sequences under conditions of uncertainty. Uncertainty in POMDPs is manifested in two ways: uncertainty in the perception of model states and uncertainty in the effects of actions on states. The diagnosis and treatment of cerebral vascular diseases (CVD) present this double condition of uncertainty, so we think that POMDP is the most suitable method to model them. In this paper, we propose a model of CVD that is based on observations obtained from neuroimaging studies such as computed tomography, magnetic resonance and ultrasound. The model is designed as a POMDP because the health status of the patient is not directly observable, and only can be deduced, with some probability, from the observations in the cerebral images. The components of the model (states, observations, actions, etc.) were defined based on specialized literature. A diagnosis of the patient’s health status is made and the most appropriate action for the recovery of health is recommended after introducing the observations when operating the model. Consultation of the probable state of health of the patient and alternative actions is also allowed.
8

Zhang, N. L., and W. Liu. "A Model Approximation Scheme for Planning in Partially Observable Stochastic Domains." Journal of Artificial Intelligence Research 7 (November 1, 1997): 199–230. http://dx.doi.org/10.1613/jair.419.

Abstract:
Partially observable Markov decision processes (POMDPs) are a natural model for planning problems where effects of actions are nondeterministic and the state of the world is not completely observable. It is difficult to solve POMDPs exactly. This paper proposes a new approximation scheme. The basic idea is to transform a POMDP into another one where additional information is provided by an oracle. The oracle informs the planning agent that the current state of the world is in a certain region. The transformed POMDP is consequently said to be region observable. It is easier to solve than the original POMDP. We propose to solve the transformed POMDP and use its optimal policy to construct an approximate policy for the original POMDP. By controlling the amount of additional information that the oracle provides, it is possible to find a proper tradeoff between computational time and approximation quality. In terms of algorithmic contributions, we study in detail how to exploit region observability in solving the transformed POMDP. To facilitate the study, we also propose a new exact algorithm for general POMDPs. The algorithm is conceptually simple and yet is significantly more efficient than all previous exact algorithms.
9

Omidshafiei, Shayegan, Ali-Akbar Agha-Mohammadi, Christopher Amato, Shih-Yuan Liu, Jonathan P. How, and John Vian. "Decentralized control of multi-robot partially observable Markov decision processes using belief space macro-actions." International Journal of Robotics Research 36, no. 2 (February 2017): 231–58. http://dx.doi.org/10.1177/0278364917692864.

Abstract:
This work focuses on solving general multi-robot planning problems in continuous spaces with partial observability given a high-level domain description. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems. However, representing and solving Dec-POMDPs is often intractable for large problems. This work extends the Dec-POMDP model to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP) to take advantage of the high-level representations that are natural for multi-robot problems and to facilitate scalable solutions to large discrete and continuous problems. The Dec-POSMDP formulation uses task macro-actions created from lower-level local actions that allow for asynchronous decision-making by the robots, which is crucial in multi-robot domains. This transformation from Dec-POMDPs to Dec-POSMDPs with a finite set of automatically-generated macro-actions allows use of efficient discrete-space search algorithms to solve them. The paper presents algorithms for solving Dec-POSMDPs, which are more scalable than previous methods since they can incorporate closed-loop belief space macro-actions in planning. These macro-actions are automatically constructed to produce robust solutions. The proposed algorithms are then evaluated on a complex multi-robot package delivery problem under uncertainty, showing that our approach can naturally represent realistic problems and provide high-quality solutions for large-scale problems.
10

Rozek, Brandon, Junkyu Lee, Harsha Kokel, Michael Katz, and Shirin Sohrabi. "Partially Observable Hierarchical Reinforcement Learning with AI Planning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 21 (March 24, 2024): 23635–36. http://dx.doi.org/10.1609/aaai.v38i21.30504.

Abstract:
Partially observable Markov decision processes (POMDPs) challenge reinforcement learning agents due to incomplete knowledge of the environment. Even assuming monotonicity in uncertainty, it is difficult for an agent to know how and when to stop exploring for a given task. In this abstract, we discuss how to use hierarchical reinforcement learning (HRL) and AI Planning (AIP) to improve exploration when the agent knows possible valuations of unknown predicates and how to discover them. By encoding the uncertainty in an abstract planning model, the agent can derive a high-level plan which is then used to decompose the overall POMDP into a tree of semi-POMDPs for training. We evaluate our agent's performance on the MiniGrid domain and show how guided exploration may improve agent performance.
11

Theocharous, Georgios, and Sridhar Mahadevan. "Compressing POMDPs Using Locality Preserving Non-Negative Matrix Factorization." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 4, 2010): 1147–52. http://dx.doi.org/10.1609/aaai.v24i1.7750.

Abstract:
Partially Observable Markov Decision Processes (POMDPs) are a well-established and rigorous framework for sequential decision-making under uncertainty. POMDPs are well-known to be intractable to solve exactly, and there has been significant work on finding tractable approximation methods. One well-studied approach is to find a compression of the original POMDP by projecting the belief states to a lower-dimensional space. We present a novel dimensionality reduction method for POMDPs based on locality preserving non-negative matrix factorization. Unlike previous approaches, such as Krylov compression and regular non-negative matrix factorization, our approach preserves the local geometry of the belief space manifold. We present results on standard benchmark POMDPs showing improved performance over previously explored compression algorithms for POMDPs.
12

Walraven, Erwin, and Matthijs T. J. Spaan. "Point-Based Value Iteration for Finite-Horizon POMDPs." Journal of Artificial Intelligence Research 65 (July 11, 2019): 307–41. http://dx.doi.org/10.1613/jair.1.11324.

Abstract:
Partially Observable Markov Decision Processes (POMDPs) are a popular formalism for sequential decision making in partially observable environments. Since solving POMDPs to optimality is a difficult task, point-based value iteration methods are widely used. These methods compute an approximate POMDP solution, and in some cases they even provide guarantees on the solution quality, but these algorithms have been designed for problems with an infinite planning horizon. In this paper we discuss why state-of-the-art point-based algorithms cannot be easily applied to finite-horizon problems that do not include discounting. Subsequently, we present a general point-based value iteration algorithm for finite-horizon problems which provides solutions with guarantees on solution quality. Furthermore, we introduce two heuristics to reduce the number of belief points considered during execution, which lowers the computational requirements. In experiments we demonstrate that the algorithm is an effective method for solving finite-horizon POMDPs.
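The core operation shared by point-based methods is the backup of a single belief point against a set of alpha-vectors. The sketch below shows the standard discounted, stationary form of this step; the finite-horizon algorithm in the paper maintains a separate vector set per stage:

```python
import numpy as np

def backup(b, Gamma, T, O, R, gamma=0.95):
    """Point-based backup: return the alpha-vector maximizing value at belief b.

    b: belief, shape (S,); Gamma: current alpha-vectors, shape (K, S), K >= 1
    T[a, s, s'] = P(s' | s, a); O[a, s', o] = P(o | s', a); R[a, s] = reward
    """
    best_val, best_alpha = -np.inf, None
    for a in range(T.shape[0]):
        alpha_a = R[a].astype(float)
        for o in range(O.shape[2]):
            # g[k, s] = sum_s' T[a, s, s'] * O[a, s', o] * Gamma[k, s']
            g = Gamma @ (T[a] * O[a][:, o]).T
            alpha_a = alpha_a + gamma * g[np.argmax(g @ b)]  # best vector at b
        if alpha_a @ b > best_val:
            best_val, best_alpha = float(alpha_a @ b), alpha_a
    return best_alpha
```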
13

Zhang, N. L., and W. Zhang. "Speeding Up the Convergence of Value Iteration in Partially Observable Markov Decision Processes." Journal of Artificial Intelligence Research 14 (February 1, 2001): 29–51. http://dx.doi.org/10.1613/jair.761.

Abstract:
Partially observable Markov decision processes (POMDPs) have recently become popular among many AI researchers because they serve as a natural model for planning under uncertainty. Value iteration is a well-known algorithm for finding optimal policies for POMDPs. It typically takes a large number of iterations to converge. This paper proposes a method for accelerating the convergence of value iteration. The method has been evaluated on an array of benchmark problems and was found to be very effective: It enabled value iteration to converge after only a few iterations on all the test problems.
14

Wang, Erli, Hanna Kurniawati, and Dirk Kroese. "An On-Line Planner for POMDPs with Large Discrete Action Space: A Quantile-Based Approach." Proceedings of the International Conference on Automated Planning and Scheduling 28 (June 15, 2018): 273–77. http://dx.doi.org/10.1609/icaps.v28i1.13906.

Abstract:
Making principled decisions in the presence of uncertainty is often facilitated by Partially Observable Markov Decision Processes (POMDPs). Despite tremendous advances in POMDP solvers, finding good policies with large action spaces remains difficult. To alleviate this difficulty, this paper presents an on-line approximate solver, called Quantile-Based Action Selector (QBASE). It uses quantile-statistics to adaptively evaluate a small subset of the action space without sacrificing the quality of the generated decision strategies by much. Experiments on four different robotics tasks with up to 10,000 actions indicate that QBASE can generate substantially better strategies than a state-of-the-art method.
15

Zhang, Zongzhang, Michael Littman, and Xiaoping Chen. "Covering Number as a Complexity Measure for POMDP Planning and Learning." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (September 20, 2021): 1853–59. http://dx.doi.org/10.1609/aaai.v26i1.8360.

Abstract:
Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number for the reachable belief space, which is a set of belief points that are reachable from the initial belief point, has interesting links with the complexity of POMDP planning, theoretically. In this paper, we present empirical evidence that the covering number for the reachable belief space (or just "covering number", for brevity) is a far better complexity measure than the state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent learning algorithm for POMDPs without reset given knowledge of the covering number.
16

Ross, S., J. Pineau, S. Paquet, and B. Chaib-draa. "Online Planning Algorithms for POMDPs." Journal of Artificial Intelligence Research 32 (July 29, 2008): 663–704. http://dx.doi.org/10.1613/jair.2567.

Abstract:
Partially Observable Markov Decision Processes (POMDPs) provide a rich framework for sequential decision-making under uncertainty in stochastic domains. However, due to their complexity, solving POMDPs is often intractable except for small problems. Here, we focus on online approaches that alleviate the computational complexity by computing good local policies at each decision step during the execution. Online algorithms generally consist of a lookahead search to find the best action to execute at each time step in an environment. Our objectives here are to survey the various existing online POMDP methods, analyze their properties, and discuss their advantages and disadvantages; and to thoroughly evaluate these online approaches in different environments under various metrics (return, error bound reduction, lower bound improvement). Our experimental results indicate that state-of-the-art online heuristic search methods can handle large POMDP domains efficiently.
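The common skeleton of the surveyed online algorithms is a depth-limited expectimax search over the belief tree; practical solvers add bounds, pruning, and heuristic node ordering on top. A minimal sketch:

```python
import numpy as np

def lookahead(b, depth, T, O, R, gamma=0.95):
    """Depth-limited expectimax over the belief tree. Returns (value, action).

    T[a, s, s'] = P(s' | s, a); O[a, s', o] = P(o | s', a); R[a, s] = reward.
    """
    if depth == 0:
        return 0.0, None          # real solvers plug a value bound in here
    best_v, best_a = -np.inf, None
    for a in range(T.shape[0]):
        v = float(b @ R[a])                    # expected immediate reward
        pred = b @ T[a]                        # predicted state distribution
        for o in range(O.shape[2]):
            p_o = float(pred @ O[a][:, o])     # P(o | b, a)
            if p_o > 1e-12:
                b2 = pred * O[a][:, o] / p_o   # Bayes belief update
                v += gamma * p_o * lookahead(b2, depth - 1, T, O, R, gamma)[0]
        if v > best_v:
            best_v, best_a = v, a
    return best_v, best_a
```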
17

Ko, Li Ling, David Hsu, Wee Sun Lee, and Sylvie Ong. "Structured Parameter Elicitation." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 4, 2010): 1102–7. http://dx.doi.org/10.1609/aaai.v24i1.7744.

Abstract:
The behavior of a complex system often depends on parameters whose values are unknown in advance. To operate effectively, an autonomous agent must actively gather information on the parameter values while progressing towards its goal. We call this problem parameter elicitation. Partially observable Markov decision processes (POMDPs) provide a principled framework for such uncertainty planning tasks, but they suffer from high computational complexity. However, POMDPs for parameter elicitation often possess special structural properties, specifically, factorization and symmetry. This work identifies these properties and exploits them for efficient solution through a factored belief representation. The experimental results show that our new POMDP solvers outperform SARSOP and MOMDP, two of the fastest general-purpose POMDP solvers available, and can handle significantly larger problems.
18

XIANG, YANG, and FRANK HANSHAR. "MULTIAGENT EXPEDITION WITH GRAPHICAL MODELS." International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 19, no. 06 (December 2011): 939–76. http://dx.doi.org/10.1142/s0218488511007416.

Abstract:
We investigate a class of multiagent planning problems termed multiagent expedition, where agents move around an open, unknown, partially observable, stochastic, and physical environment, in pursuit of multiple and alternative goals of different utility. Optimal planning in multiagent expedition is highly intractable. We introduce the notion of conditional optimality, decompose the task into a set of semi-independent optimization subtasks, and apply a decision-theoretic multiagent graphical model to solve each subtask optimally. A set of techniques are proposed to enhance modeling so that the resultant graphical model can be practically evaluated. Effectiveness of the framework and its scalability are demonstrated through experiments. Multiagent expedition can be characterized as decentralized partially observable Markov decision processes (Dec-POMDPs). Hence, this work contributes towards practical planning in Dec-POMDPs.
19

Sanner, Scott, and Kristian Kersting. "Symbolic Dynamic Programming for First-order POMDPs." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 1 (July 4, 2010): 1140–46. http://dx.doi.org/10.1609/aaai.v24i1.7747.

Abstract:
Partially-observable Markov decision processes (POMDPs) provide a powerful model for sequential decision-making problems with partially-observed state and are known to have (approximately) optimal dynamic programming solutions. Much work in recent years has focused on improving the efficiency of these dynamic programming algorithms by exploiting symmetries and factored or relational representations. In this work, we show that it is also possible to exploit the full expressive power of first-order quantification to achieve state, action, and observation abstraction in a dynamic programming solution to relationally specified POMDPs. Among the advantages of this approach are the ability to maintain compact value function representations, abstract over the space of potentially optimal actions, and automatically derive compact conditional policy trees that minimally partition relational observation spaces according to distinctions that have an impact on policy values. This is the first lifted relational POMDP solution that can optimally accommodate actions with a potentially infinite relational space of observation outcomes.
20

Capitan, Jesus, Matthijs Spaan, Luis Merino, and Anibal Ollero. "Decentralized Multi-Robot Cooperation with Auctioned POMDPs." Proceedings of the International Conference on Automated Planning and Scheduling 24 (May 11, 2014): 515–18. http://dx.doi.org/10.1609/icaps.v24i1.13658.

Abstract:
Planning under uncertainty faces a scalability problem when considering multi-robot teams, as the information space scales exponentially with the number of robots. To address this issue, this paper proposes to decentralize multi-robot Partially Observable Markov Decision Processes (POMDPs) while maintaining cooperation between robots by using POMDP policy auctions. Auctions provide a flexible way of coordinating individual policies modeled by POMDPs and have low communication requirements. Additionally, communication models in the multi-agent POMDP literature severely mismatch with real inter-robot communication. We address this issue by exploiting a decentralized data fusion method in order to efficiently maintain a joint belief state among the robots. The paper presents results in two different applications: environmental monitoring with Unmanned Aerial Vehicles (UAVs); and cooperative tracking, in which several robots have to jointly track a moving target of interest.
21

Lin, Yong, Xingjia Lu, and Fillia Makedon. "Approximate Planning in POMDPs with Weighted Graph Models." International Journal on Artificial Intelligence Tools 24, no. 04 (August 2015): 1550014. http://dx.doi.org/10.1142/s0218213015500141.

Abstract:
Markov decision process (MDP) based heuristic algorithms have been considered simple, fast, but imprecise solutions for partially observable Markov decision processes (POMDPs). The main reason lies in how belief points are approximated. We use weighted graphs to model the state space and the belief space, enabling a detailed analysis of the MDP heuristic algorithm. As a result, we provide the prerequisite conditions for building a robust belief graph. We further introduce a dynamic mechanism to manage the belief space within the belief graph, so as to improve efficiency and decrease space complexity. Experimental results indicate that our approach is fast and produces high-quality solutions for POMDPs.
22

Aras, R., and A. Dutech. "An Investigation into Mathematical Programming for Finite Horizon Decentralized POMDPs." Journal of Artificial Intelligence Research 37 (March 26, 2010): 329–96. http://dx.doi.org/10.1613/jair.2915.

Abstract:
Decentralized planning in uncertain environments is a complex task generally dealt with by using a decision-theoretic approach, mainly through the framework of Decentralized Partially Observable Markov Decision Processes (DEC-POMDPs). Although DEC-POMDPs are a general and powerful modeling tool, solving them is a task of overwhelming complexity that can be doubly exponential. In this paper, we study an alternate formulation of DEC-POMDPs relying on a sequence-form representation of policies. From this formulation, we show how to derive Mixed Integer Linear Programming (MILP) problems that, once solved, give exact optimal solutions to the DEC-POMDPs. We show that these MILPs can be derived either by using some combinatorial characteristics of the optimal solutions of the DEC-POMDPs or by using concepts borrowed from game theory. Through an experimental validation on classical test problems from the DEC-POMDP literature, we compare our approach to existing algorithms. Results show that mathematical programming outperforms dynamic programming but is less efficient than forward search, except for some particular problems. The main contributions of this work are the use of mathematical programming for DEC-POMDPs and a better understanding of DEC-POMDPs and of their solutions. Besides, we argue that our alternate representation of DEC-POMDPs could be helpful for designing novel algorithms looking for approximate solutions to DEC-POMDPs.
23

Wen, Xian, Haifeng Huo, and Jinhua Cui. "The optimal probability of the risk for finite horizon partially observable Markov decision processes." AIMS Mathematics 8, no. 12 (2023): 28435–49. http://dx.doi.org/10.3934/math.20231455.

Abstract:
This paper investigates the optimality of the risk probability for finite horizon partially observable discrete-time Markov decision processes (POMDPs). The probability of the risk is optimized based on the criterion of the total rewards not exceeding the preset goal value, which differs from the optimization of expected rewards. Based on the Bayes operator and the filter equations, the optimization problem of risk probability can be equivalently reformulated as a filtered Markov decision process. As an advantage of developing the value iteration technique, the optimality equation satisfied by the value function is established and the existence of the risk probability optimal policy is proven. Finally, an example is given to illustrate the effectiveness of using the value iteration algorithm to compute the value function and optimal policy.
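One common formalization of such a risk-probability criterion (our notation, not necessarily the paper's) is:

```latex
% Minimize over policies the probability that the accumulated reward
% fails to exceed the preset goal value \lambda:
V^{\pi}(s,\lambda) = \mathbb{P}^{\pi}\Bigl(\sum_{t=0}^{N-1} r(s_t,a_t) \le \lambda \,\Big|\, s_0 = s\Bigr),
\qquad V^{*}(s,\lambda) = \inf_{\pi} V^{\pi}(s,\lambda).
```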
24

Itoh, Hideaki, Hisao Fukumoto, Hiroshi Wakuya, and Tatsuya Furukawa. "Bottom-up learning of hierarchical models in a class of deterministic POMDP environments." International Journal of Applied Mathematics and Computer Science 25, no. 3 (September 1, 2015): 597–615. http://dx.doi.org/10.1515/amcs-2015-0044.

Abstract:
The theory of partially observable Markov decision processes (POMDPs) is a useful tool for developing various intelligent agents, and learning hierarchical POMDP models is one of the key approaches for building such agents when the environments of the agents are unknown and large. To learn hierarchical models, bottom-up learning methods in which learning takes place in a layer-by-layer manner from the lowest to the highest layer are already extensively used in some research fields such as hidden Markov models and neural networks. However, little attention has been paid to bottom-up approaches for learning POMDP models. In this paper, we present a novel bottom-up learning algorithm for hierarchical POMDP models and prove that, by using this algorithm, a perfect model (i.e., a model that can perfectly predict future observations) can be learned at least in a class of deterministic POMDP environments.
25

Dressel, Louis, and Mykel Kochenderfer. "Efficient Decision-Theoretic Target Localization." Proceedings of the International Conference on Automated Planning and Scheduling 27 (June 5, 2017): 70–78. http://dx.doi.org/10.1609/icaps.v27i1.13832.

Abstract:
Partially observable Markov decision processes (POMDPs) offer a principled approach to control under uncertainty. However, POMDP solvers generally require rewards to depend only on the state and action. This limitation is unsuitable for information-gathering problems, where rewards are more naturally expressed as functions of belief. In this work, we consider target localization, an information-gathering task where an agent takes actions leading to informative observations and a concentrated belief over possible target locations. By leveraging recent theoretical and algorithmic advances, we investigate offline and online solvers that incorporate belief-dependent rewards. We extend SARSOP — a state-of-the-art offline solver — to handle belief-dependent rewards, exploring different reward strategies and showing how they can be compactly represented. We present an improved lower bound that greatly speeds convergence. POMDP-lite, an online solver, is also evaluated in the context of information-gathering tasks. These solvers are applied to control a hexcopter UAV searching for a radio frequency source—a challenging real-world problem.
26

Chatterjee, Krishnendu, Martin Chmelik, and Ufuk Topcu. "Sensor Synthesis for POMDPs with Reachability Objectives." Proceedings of the International Conference on Automated Planning and Scheduling 28 (June 15, 2018): 47–55. http://dx.doi.org/10.1609/icaps.v28i1.13875.

Abstract:
Partially observable Markov decision processes (POMDPs) are widely used in probabilistic planning problems in which an agent interacts with an environment using noisy and imprecise sensors. We study a setting in which the sensors are only partially defined and the goal is to synthesize “weakest” additional sensors, such that in the resulting POMDP, there is a small-memory policy for the agent that almost-surely (with probability 1) satisfies a reachability objective. We show that the problem is NP-complete, and present a symbolic algorithm by encoding the problem into SAT instances. We illustrate trade-offs between the amount of memory of the policy and the number of additional sensors on a simple example. We have implemented our approach and consider three classical POMDP examples from the literature, and show that in all the examples the number of sensors can be significantly decreased (as compared to the existing solutions in the literature) without increasing the complexity of the policies.
27

Park, Jaeyoung, Kee-Eung Kim, and Yoon-Kyu Song. "A POMDP-Based Optimal Control of P300-Based Brain-Computer Interfaces." Proceedings of the AAAI Conference on Artificial Intelligence 25, no. 1 (August 4, 2011): 1559–62. http://dx.doi.org/10.1609/aaai.v25i1.7956.

Abstract:
Most of the previous work on brain-computer interfaces (BCIs) exploiting the P300 in electroencephalography (EEG) has focused on low-level signal processing algorithms such as feature extraction and classification methods. Although a significant improvement has been made in the past, the accuracy of detecting P300 is limited by the inherently low signal-to-noise ratio in EEGs. In this paper, we present a systematic approach to optimize the interface using partially observable Markov decision processes (POMDPs). Through experiments involving human subjects, we show the P300 speller system that is optimized using the POMDP achieves a significant performance improvement in terms of the communication bandwidth in the interaction.
28

Doshi, P., and P. J. Gmytrasiewicz. "Monte Carlo Sampling Methods for Approximating Interactive POMDPs." Journal of Artificial Intelligence Research 34 (March 24, 2009): 297–337. http://dx.doi.org/10.1613/jair.2630.

Abstract:
Partially observable Markov decision processes (POMDPs) provide a principled framework for sequential planning in uncertain single agent settings. An extension of POMDPs to multiagent settings, called interactive POMDPs (I-POMDPs), replaces POMDP belief spaces with interactive hierarchical belief systems which represent an agent’s belief about the physical world, about beliefs of other agents, and about their beliefs about others’ beliefs. This modification makes the difficulties of obtaining solutions due to complexity of the belief and policy spaces even more acute. We describe a general method for obtaining approximate solutions of I-POMDPs based on particle filtering (PF). We introduce the interactive PF, which descends the levels of the interactive belief hierarchies and samples and propagates beliefs at each level. The interactive PF is able to mitigate the belief space complexity, but it does not address the policy space complexity. To mitigate the policy space complexity – sometimes also called the curse of history – we utilize a complementary method based on sampling likely observations while building the lookahead reachability tree. While this approach does not completely address the curse of history, it beats back the curse’s impact substantially. We provide experimental results and chart future work.
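The flat, single-agent particle filter that the interactive PF applies recursively down the belief hierarchy follows a propagate-weight-resample pattern. A minimal sketch with an assumed generative-model interface:

```python
import numpy as np

def particle_filter_step(particles, a, o, step, obs_prob, rng):
    """Propagate-weight-resample update of a particle belief.

    particles: array of sampled states representing the current belief
    step(s, a, rng) -> s'        (generative transition model; assumed interface)
    obs_prob(o, s2, a) -> float  (observation likelihood; assumed interface)
    """
    prop = np.array([step(s, a, rng) for s in particles])   # propagate
    w = np.array([obs_prob(o, s2, a) for s2 in prop])       # weight by likelihood
    w = w / w.sum()
    idx = rng.choice(len(prop), size=len(prop), p=w)        # resample
    return prop[idx]

# Usage: belief = particle_filter_step(belief, a, o, step, obs_prob,
#                                      np.random.default_rng(0))
```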
29

Shatkay, H., and L. P. Kaelbling. "Learning Geometrically-Constrained Hidden Markov Models for Robot Navigation: Bridging the Topological-Geometrical Gap." Journal of Artificial Intelligence Research 16 (March 1, 2002): 167–207. http://dx.doi.org/10.1613/jair.874.

Abstract:
Hidden Markov models (HMMs) and partially observable Markov decision processes (POMDPs) provide useful tools for modeling dynamical systems. They are particularly useful for representing the topology of environments such as road networks and office buildings, which are typical for robot navigation and planning. The work presented here describes a formal framework for incorporating readily available odometric information and geometrical constraints into both the models and the algorithm that learns them. By taking advantage of such information, learning HMMs/POMDPs can be made to generate better solutions and require fewer iterations, while being robust in the face of data reduction. Experimental results, obtained from both simulated and real robot data, demonstrate the effectiveness of the approach.
30

Lim, Michael H., Tyler J. Becker, Mykel J. Kochenderfer, Claire J. Tomlin, and Zachary N. Sunberg. "Optimality Guarantees for Particle Belief Approximation of POMDPs." Journal of Artificial Intelligence Research 77 (August 27, 2023): 1591–636. http://dx.doi.org/10.1613/jair.1.14525.

Abstract:
Partially observable Markov decision processes (POMDPs) provide a flexible representation for real-world decision and control problems. However, POMDPs are notoriously difficult to solve, especially when the state and observation spaces are continuous or hybrid, which is often the case for physical systems. While recent online sampling-based POMDP algorithms that plan with observation likelihood weighting have shown practical effectiveness, a general theory characterizing the approximation error of the particle filtering techniques that these algorithms use has not previously been proposed. Our main contribution is bounding the error between any POMDP and its corresponding finite sample particle belief MDP (PB-MDP) approximation. This fundamental bridge between PB-MDPs and POMDPs allows us to adapt any sampling-based MDP algorithm to a POMDP by solving the corresponding particle belief MDP, thereby extending the convergence guarantees of the MDP algorithm to the POMDP. Practically, this is implemented by using the particle filter belief transition model as the generative model for the MDP solver. While this requires access to the observation density model from the POMDP, it only increases the transition sampling complexity of the MDP solver by a factor of O(C), where C is the number of particles. Thus, when combined with sparse sampling MDP algorithms, this approach can yield algorithms for POMDPs that have no direct theoretical dependence on the size of the state and observation spaces. In addition to our theoretical contribution, we perform five numerical experiments on benchmark POMDPs to demonstrate that a simple MDP algorithm adapted using PB-MDP approximation, Sparse-PFT, achieves performance competitive with other leading continuous observation POMDP solvers.
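The particle belief MDP idea can be sketched as a transition function that treats a weighted particle set as the MDP state and uses likelihood weighting inside the step; the interfaces below are our assumptions, not the paper's API:

```python
import numpy as np

def pb_step(belief, a, gen, obs_density, rng):
    """One particle-belief MDP transition, usable as a generative model
    by any sampling-based MDP solver.

    belief: (states, weights) arrays of length C (the particle count)
    gen(s, a, rng) -> (s', o, r): POMDP generative model
    obs_density(o, s, a, s') -> float: observation density model
    Returns (next_belief, expected_reward).
    """
    states, weights = belief
    C = len(states)
    s_ref = states[rng.choice(C, p=weights)]
    _, o, _ = gen(s_ref, a, rng)              # sample one shared observation
    nxt, new_w, rew = [], np.empty(C), 0.0
    for i, s in enumerate(states):
        s2, _, r = gen(s, a, rng)
        nxt.append(s2)
        new_w[i] = weights[i] * obs_density(o, s, a, s2)  # likelihood weighting
        rew += weights[i] * r
    new_w /= new_w.sum()
    return (np.array(nxt), new_w), rew
```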
31

Spaan, M. T. J., and N. Vlassis. "Perseus: Randomized Point-based Value Iteration for POMDPs." Journal of Artificial Intelligence Research 24 (August 1, 2005): 195–220. http://dx.doi.org/10.1613/jair.1659.

Abstract:
Partially observable Markov decision processes (POMDPs) form an attractive and principled framework for agent planning under uncertainty. Point-based approximate techniques for POMDPs compute a policy based on a finite set of points collected in advance from the agent's belief space. We present a randomized point-based value iteration algorithm called Perseus. The algorithm performs approximate value backup stages, ensuring that in each backup stage the value of each point in the belief set is improved; the key observation is that a single backup may improve the value of many belief points. Contrary to other point-based methods, Perseus backs up only a (randomly selected) subset of points in the belief set, sufficient for improving the value of each belief point in the set. We show how the same idea can be extended to dealing with continuous action spaces. Experimental results show the potential of Perseus in large scale POMDP problems.
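The distinguishing feature of Perseus is its backup stage: it keeps sampling unimproved belief points and backing them up until every point in the set has improved. A minimal sketch of that loop (the point-based `backup` routine, e.g. the one sketched under entry 12, is passed in as a callable):

```python
import numpy as np

def perseus_stage(B, Gamma, backup):
    """One Perseus backup stage: improve the value of every belief in B while
    backing up only a randomly chosen subset of points.

    B: belief points, shape (M, S); Gamma: alpha-vectors, shape (K, S)
    backup(b, Gamma) -> maximizing alpha-vector at b
    """
    rng = np.random.default_rng()
    old_vals = (B @ Gamma.T).max(axis=1)     # current value at each belief
    todo = list(range(len(B)))
    new_Gamma = []
    while todo:
        i = todo[rng.integers(len(todo))]    # random not-yet-improved belief
        alpha = backup(B[i], Gamma)
        if B[i] @ alpha < old_vals[i]:       # never decrease a point's value:
            alpha = Gamma[np.argmax(Gamma @ B[i])]  # keep the old best vector
        new_Gamma.append(alpha)
        vals = np.array([max(b @ a for a in new_Gamma) for b in B])
        todo = [j for j in todo if vals[j] < old_vals[j]]
    return np.array(new_Gamma)
```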
32

Kraemer, Landon, and Bikramjit Banerjee. "Informed Initial Policies for Learning in Dec-POMDPs." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (September 20, 2021): 2433–34. http://dx.doi.org/10.1609/aaai.v26i1.8426.

Abstract:
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a formal model for planning in cooperative multiagent systems where agents operate with noisy sensors and actuators, and local information. Prevalent Dec-POMDP solution techniques have mostly been centralized and have assumed knowledge of the model. In real-world scenarios, however, solving centrally may not be an option and model parameters may be unknown. To address this, we propose a distributed, model-free algorithm for learning Dec-POMDP policies, in which agents take turns learning, with each agent not currently learning following a static policy. For agents that have not yet learned a policy, this static policy must be initialized. We propose a principled method for learning such initial policies through interaction with the environment. We show that by using such informed initial policies, our alternate learning algorithm can find near-optimal policies for two benchmark problems.
33

Banerjee, Bikramjit, Jeremy Lyle, Landon Kraemer, and Rajesh Yellamraju. "Sample Bounded Distributed Reinforcement Learning for Decentralized POMDPs." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (September 20, 2021): 1256–62. http://dx.doi.org/10.1609/aaai.v26i1.8260.

Abstract:
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. We propose a distributed reinforcement learning approach, where agents take turns to learn best responses to each other’s policies. This promotes decentralization of the policy computation problem, and relaxes reliance on the full knowledge of the problem parameters. We derive the relation between the sample complexity of best response learning and error tolerance. Our key contribution is to show that sample complexity could grow exponentially with the problem horizon. We show empirically that even if the sample requirement is set lower than what theory demands, our learning approach can produce (near) optimal policies in some benchmark Dec-POMDP problems.
34

Bernstein, D. S., C. Amato, E. A. Hansen, and S. Zilberstein. "Policy Iteration for Decentralized Control of Markov Decision Processes." Journal of Artificial Intelligence Research 34 (March 1, 2009): 89–132. http://dx.doi.org/10.1613/jair.2667.

Abstract:
Coordination of distributed agents is required for problems arising in many areas, including multi-robot systems, networking and e-commerce. As a formal framework for such problems, we use the decentralized partially observable Markov decision process (DEC-POMDP). Though much work has been done on optimal dynamic programming algorithms for the single-agent version of the problem, optimal algorithms for the multiagent case have been elusive. The main contribution of this paper is an optimal policy iteration algorithm for solving DEC-POMDPs. The algorithm uses stochastic finite-state controllers to represent policies. The solution can include a correlation device, which allows agents to correlate their actions without communicating. This approach alternates between expanding the controller and performing value-preserving transformations, which modify the controller without sacrificing value. We present two efficient value-preserving transformations: one can reduce the size of the controller and the other can improve its value while keeping the size fixed. Empirical results demonstrate the usefulness of value-preserving transformations in increasing value while keeping controller size to a minimum. To broaden the applicability of the approach, we also present a heuristic version of the policy iteration algorithm, which sacrifices convergence to optimality. This algorithm further reduces the size of the controllers at each step by assuming that probability distributions over the other agents' actions are known. While this assumption may not hold in general, it helps produce higher quality solutions in our test problems.
35

Wu, Bo, Yan Peng Feng, and Hong Yan Zheng. "Point-Based Monte Carto Online Planning in POMDPs." Advanced Materials Research 846-847 (November 2013): 1388–91. http://dx.doi.org/10.4028/www.scientific.net/amr.846-847.1388.

Abstract:
Online planning and learning in partially observable Markov decision processes are often intractable because the belief space suffers from two curses: dimensionality and history. In order to address this problem, this paper proposes a point-based Monte Carlo online planning approach for POMDPs. This approach involves performing value backups at specific reachable belief points, rather than over the entire belief simplex, to speed up computation. A Monte Carlo tree search algorithm is then exploited to share the value of actions across each subtree of the search tree so as to minimise the mean squared error. The experimental results show that the proposed algorithm is effective in real-time systems.
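The value sharing happens through the standard UCB1 rule that Monte Carlo tree search uses to select actions within each subtree. A minimal sketch over an assumed node structure (visit counts N, mean returns Q):

```python
import math

def ucb_select(node, c=1.0):
    """UCB1 action selection inside Monte Carlo tree search.

    Assumed structure: node.children maps actions to child nodes carrying
    a visit count N and a running mean return Q.
    """
    total = sum(ch.N for ch in node.children.values())
    log_total = math.log(total + 1)
    def score(ch):
        if ch.N == 0:
            return float("inf")   # prefer untried actions first
        return ch.Q + c * math.sqrt(log_total / ch.N)   # exploit + explore
    return max(node.children, key=lambda a: score(node.children[a]))
```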
36

Ajdarów, Michal, Šimon Brlej, and Petr Novotný. "Shielding in Resource-Constrained Goal POMDPs." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (June 26, 2023): 14674–82. http://dx.doi.org/10.1609/aaai.v37i12.26715.

Abstract:
We consider partially observable Markov decision processes (POMDPs) modeling an agent that needs a supply of a certain resource (e.g., electricity stored in batteries) to operate correctly. The resource is consumed by the agent's actions and can be replenished only in certain states. The agent aims to minimize the expected cost of reaching some goal while preventing resource exhaustion, a problem we call resource-constrained goal optimization (RSGO). We take a two-step approach to the RSGO problem. First, using formal methods techniques, we design an algorithm computing a shield for a given scenario: a procedure that observes the agent and prevents it from using actions that might eventually lead to resource exhaustion. Second, we augment the POMCP heuristic search algorithm for POMDP planning with our shields to obtain an algorithm solving the RSGO problem. We implement our algorithm and present experiments showing its applicability to benchmarks from the literature.
37

Simão, Thiago D., Marnix Suilen, and Nils Jansen. "Safe Policy Improvement for POMDPs via Finite-State Controllers." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (June 26, 2023): 15109–17. http://dx.doi.org/10.1609/aaai.v37i12.26763.

Abstract:
We study safe policy improvement (SPI) for partially observable Markov decision processes (POMDPs). SPI is an offline reinforcement learning (RL) problem that assumes access to (1) historical data about an environment, and (2) the so-called behavior policy that previously generated this data by interacting with the environment. SPI methods neither require access to a model nor the environment itself, and aim to reliably improve upon the behavior policy in an offline manner. Existing methods make the strong assumption that the environment is fully observable. In our novel approach to the SPI problem for POMDPs, we assume that a finite-state controller (FSC) represents the behavior policy and that finite memory is sufficient to derive optimal policies. This assumption allows us to map the POMDP to a finite-state fully observable MDP, the history MDP. We estimate this MDP by combining the historical data and the memory of the FSC, and compute an improved policy using an off-the-shelf SPI algorithm. The underlying SPI method constrains the policy space according to the available data, such that the newly computed policy only differs from the behavior policy when sufficient data is available. We show that this new policy, converted into a new FSC for the (unknown) POMDP, outperforms the behavior policy with high probability. Experimental results on several well-established benchmarks show the applicability of the approach, even in cases where finite memory is not sufficient.
38

Zhang, Zongzhang, David Hsu, Wee Sun Lee, Zhan Wei Lim, and Aijun Bai. "PLEASE: Palm Leaf Search for POMDPs with Large Observation Spaces." Proceedings of the International Conference on Automated Planning and Scheduling 25 (April 8, 2015): 249–57. http://dx.doi.org/10.1609/icaps.v25i1.13706.

Abstract:
Trial-based asynchronous value iteration algorithms for large Partially Observable Markov Decision Processes (POMDPs), such as HSVI2, FSVI and SARSOP, have made impressive progress in the past decade. In the forward exploration phase of these algorithms, only the outcome that has the highest potential impact is searched. This paper provides a novel approach, called Palm LEAf SEarch (PLEASE), which allows the selection of more than one outcome when their potential impacts are close to the highest one. Compared with existing trial-based algorithms, PLEASE can save considerable time to propagate the bound improvements of beliefs in deep levels of the search tree to the root belief because of fewer point-based value backups. Experiments show that PLEASE scales up SARSOP, one of the fastest algorithms, by orders of magnitude on some POMDP tasks with large observation spaces.
39

Bouton, Maxime, Jana Tumova, and Mykel J. Kochenderfer. "Point-Based Methods for Model Checking in Partially Observable Markov Decision Processes." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10061–68. http://dx.doi.org/10.1609/aaai.v34i06.6563.

Abstract:
Autonomous systems are often required to operate in partially observable environments. They must reliably execute a specified objective even with incomplete information about the state of the environment. We propose a methodology to synthesize policies that satisfy a linear temporal logic formula in a partially observable Markov decision process (POMDP). By formulating a planning problem, we show how to use point-based value iteration methods to efficiently approximate the maximum probability of satisfying a desired logical formula and compute the associated belief state policy. We demonstrate that our method scales to large POMDP domains and provides strong bounds on the performance of the resulting policy.
40

Sonu, Ekhlas, Yingke Chen, and Prashant Doshi. "Individual Planning in Agent Populations: Exploiting Anonymity and Frame-Action Hypergraphs." Proceedings of the International Conference on Automated Planning and Scheduling 25 (April 8, 2015): 202–10. http://dx.doi.org/10.1609/icaps.v25i1.13712.

Abstract:
Interactive partially observable Markov decision processes (I-POMDPs) provide a formal framework for planning for a self-interested agent in multiagent settings. An agent operating in a multiagent environment must deliberate about the actions that other agents may take and the effect these actions have on the environment and the rewards it receives. Traditional I-POMDPs model this dependence on the actions of other agents using joint action and model spaces. Therefore, the solution complexity grows exponentially with the number of agents, thereby complicating scalability. In this paper, we model and extend anonymity and context-specific independence — problem structures often present in agent populations — for computational gain. We empirically demonstrate the efficiency gained from exploiting these problem structures by solving a new multiagent problem involving more than 1,000 agents.
41

Petrik, Marek, and Shlomo Zilberstein. "Linear Dynamic Programs for Resource Management." Proceedings of the AAAI Conference on Artificial Intelligence 25, no. 1 (August 4, 2011): 1377–83. http://dx.doi.org/10.1609/aaai.v25i1.7794.

Abstract:
Sustainable resource management in many domains presents large continuous stochastic optimization problems, which can often be modeled as Markov decision processes (MDPs). To solve such large MDPs, we identify and leverage linearity in state and action sets that is common in resource management. In particular, we introduce linear dynamic programs (LDPs) that generalize resource management problems and partially observable MDPs (POMDPs). We show that the LDP framework makes it possible to adapt point-based methods--the state of the art in solving POMDPs--to solving LDPs. The experimental results demonstrate the efficiency of this approach in managing the water level of a river reservoir. Finally, we discuss the relationship with dual dynamic programming, a method used to optimize hydroelectric systems.
42

Dibangoye, Jilles Steeve, Christopher Amato, Olivier Buffet, and François Charpillet. "Optimally Solving Dec-POMDPs as Continuous-State MDPs." Journal of Artificial Intelligence Research 55 (February 24, 2016): 443–97. http://dx.doi.org/10.1613/jair.4623.

Abstract:
Decentralized partially observable Markov decision processes (Dec-POMDPs) provide a general model for decision-making under uncertainty in decentralized settings, but are difficult to solve optimally (NEXP-Complete). As a new way of solving these problems, we introduce the idea of transforming a Dec-POMDP into a continuous-state deterministic MDP with a piecewise-linear and convex value function. This approach makes use of the fact that planning can be accomplished in a centralized offline manner, while execution can still be decentralized. This new Dec-POMDP formulation, which we call an occupancy MDP, allows powerful POMDP and continuous-state MDP methods to be used for the first time. To provide scalability, we refine this approach by combining heuristic search and compact representations that exploit the structure present in multi-agent domains, without losing the ability to converge to an optimal solution. In particular, we introduce a feature-based heuristic search value iteration (FB-HSVI) algorithm that relies on feature-based compact representations, point-based updates and efficient action selection. A theoretical analysis demonstrates that FB-HSVI terminates in finite time with an optimal solution. We include an extensive empirical analysis using well-known benchmarks, thereby demonstrating that our approach provides significant scalability improvements compared to the state of the art.
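
A compact sketch of the occupancy state at the heart of this reformulation: a distribution over (hidden state, joint observation history) that, once a decentralized decision rule is fixed, evolves deterministically. The transition and observation tables below are hypothetical placeholders.

    from collections import defaultdict

    def step_occupancy(occupancy, decision_rule, T, O):
        """occupancy: {(state, history): prob}; decision_rule: history -> joint action."""
        nxt = defaultdict(float)
        for (s, h), p in occupancy.items():
            a = decision_rule(h)
            for s2, t in T[(s, a)].items():      # T(s'|s,a)
                for o, w in O[(s2, a)].items():  # O(o|s',a)
                    nxt[(s2, h + (o,))] += p * t * w
        return dict(nxt)

    # Toy example: two states, one joint action, two joint observations.
    T = {(0, "a"): {0: 0.5, 1: 0.5}, (1, "a"): {1: 1.0}}
    O = {(0, "a"): {"g": 1.0}, (1, "a"): {"b": 1.0}}
    occ = step_occupancy({(0, ()): 1.0}, lambda h: "a", T, O)
    print(occ)  # {(0, ('g',)): 0.5, (1, ('b',)): 0.5}
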
43

Ng, Brenda, Carol Meyers, Kofi Boakye, and John Nitao. "Towards Applying Interactive POMDPs to Real-World Adversary Modeling." Proceedings of the AAAI Conference on Artificial Intelligence 24, no. 2 (October 7, 2021): 1814–20. http://dx.doi.org/10.1609/aaai.v24i2.18818.

Abstract:
We examine the suitability of using decision processes to model real-world systems of intelligent adversaries. Decision processes have long been used to study cooperative multiagent interactions, but their practical applicability to adversarial problems has received minimal study. We address the pros and cons of applying sequential decision-making in this area, using the crime of money laundering as a specific example. Motivated by case studies, we abstract out a model of the money laundering process, using the framework of interactive partially observable Markov decision processes (I-POMDPs). We address why this framework is well suited for modeling adversarial interactions. Particle filtering and value iteration are used to solve the model, with the application of different pruning and look-ahead strategies to assess the tradeoffs between solution quality and algorithmic run time. Our results show that there is a large gap in the level of realism that can currently be achieved by such decision models, largely due to computational demands that limit the size of problems that can be solved. While these results represent solutions to a simplified model of money laundering, they illustrate nonetheless the kinds of agent interactions that cannot be captured by standard approaches such as anomaly detection. This implies that I-POMDP methods may be valuable in the future, when algorithmic capabilities have further evolved.
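
A bare-bones bootstrap particle filter of the kind used to track beliefs in such models; the samplers below are hypothetical stand-ins rather than the paper's money-laundering model.

    import random

    def particle_filter(particles, action, obs, sample_next, obs_likelihood):
        """Propagate particles, weight by the observation, and resample."""
        propagated = [sample_next(s, action) for s in particles]
        weights = [obs_likelihood(obs, s, action) for s in propagated]
        if sum(weights) == 0:  # observation impossible under every particle
            return propagated
        return random.choices(propagated, weights=weights, k=len(particles))

    # Toy usage: two hidden states, sticky dynamics, noisy binary observation.
    step = lambda s, a: s if random.random() < 0.9 else 1 - s
    like = lambda o, s, a: 0.8 if o == s else 0.2
    belief = particle_filter([0] * 50 + [1] * 50, None, 1, step, like)
    print(sum(belief) / len(belief))  # posterior weight on state 1
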
44

Boots, Byron, and Geoffrey Gordon. "An Online Spectral Learning Algorithm for Partially Observable Nonlinear Dynamical Systems." Proceedings of the AAAI Conference on Artificial Intelligence 25, no. 1 (August 4, 2011): 293–300. http://dx.doi.org/10.1609/aaai.v25i1.7924.

Abstract:
Recently, a number of researchers have proposed spectral algorithms for learning models of dynamical systems — for example, Hidden Markov Models (HMMs), Partially Observable Markov Decision Processes (POMDPs), and Transformed Predictive State Representations (TPSRs). These algorithms are attractive since they are statistically consistent and not subject to local optima. However, they are batch methods: they need to store their entire training data set in memory at once and operate on it as a large matrix, and so they cannot scale to extremely large data sets (either many examples or many features per example). In turn, this restriction limits their ability to learn accurate models of complex systems. To overcome these limitations, we propose a new online spectral algorithm, which uses tricks such as incremental Singular Value Decomposition (SVD) and random projections to scale to much larger data sets and more complex systems than previous methods. We demonstrate the new method on an inertial measurement prediction task and a high-bandwidth video mapping task and we illustrate desirable behaviors such as "closing the loop," where the latent state representation changes suddenly as the learner recognizes that it has returned to a previously known place.
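
A sketch of the two scaling tricks combined here, run on synthetic noise purely to show the mechanics: a fixed random projection compresses raw features, and the co-occurrence matrix whose SVD yields the learned subspace is built by streaming rank-1 updates. The paper updates the SVD itself incrementally; recomputing it once at the end keeps this sketch short. All dimensions are illustrative.

    import numpy as np

    rng = np.random.default_rng(0)
    d_raw, d_proj, k, steps = 2000, 100, 5, 200

    # Random projection: inner products are approximately preserved.
    R = rng.normal(size=(d_proj, d_raw)) / np.sqrt(d_proj)

    Sigma = np.zeros((d_proj, d_proj))  # running estimate of E[future past^T]
    for _ in range(steps):              # streaming rank-1 updates
        past = rng.normal(size=d_raw)
        future = rng.normal(size=d_raw)
        Sigma += np.outer(R @ future, R @ past)

    U, s, Vt = np.linalg.svd(Sigma / steps)
    subspace = U[:, :k]                 # spectral basis for the latent state
    print(subspace.shape)               # (100, 5)
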
45

Banerjee, Bikramjit. "Pruning for Monte Carlo Distributed Reinforcement Learning in Decentralized POMDPs." Proceedings of the AAAI Conference on Artificial Intelligence 27, no. 1 (June 30, 2013): 88–94. http://dx.doi.org/10.1609/aaai.v27i1.8670.

Abstract:
Decentralized partially observable Markov decision processes (Dec-POMDPs) offer a powerful modeling technique for realistic multi-agent coordination problems under uncertainty. Prevalent solution techniques are centralized and assume prior knowledge of the model. Recently a Monte Carlo based distributed reinforcement learning approach was proposed, where agents take turns to learn best responses to each other’s policies. This promotes decentralization of the policy computation problem, and relaxes reliance on the full knowledge of the problem parameters. However, this Monte Carlo approach has a large sample complexity, which we address in this paper. In particular, we propose and analyze a modified version of the previous algorithm that adaptively eliminates parts of the experience tree from further exploration, thus requiring fewer samples while ensuring unchanged confidence in the learned value function. Experiments demonstrate significant reduction in sample complexity – the maximum reductions ranging from 61% to 91% over different benchmark Dec-POMDP problems – with the final policies being often better due to more focused exploration.
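
The pruning idea can be pictured with a Hoeffding-style test: a branch of the experience tree is dropped once its optimistic bound falls below the pessimistic bound of the best branch, leaving confidence in the value estimates unchanged. The confidence level and reward range below are invented for illustration.

    import math

    def radius(n, delta=0.05, reward_range=1.0):
        """Hoeffding confidence radius after n samples."""
        return reward_range * math.sqrt(math.log(2 / delta) / (2 * n))

    def prunable(mean_best, n_best, mean_other, n_other):
        """True if `other` cannot overtake `best` at this confidence level."""
        return mean_other + radius(n_other) < mean_best - radius(n_best)

    print(prunable(0.9, 500, 0.5, 400))  # True: the weaker branch is dropped
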
46

Amato, Christopher, George Konidaris, Leslie P. Kaelbling, and Jonathan P. How. "Modeling and Planning with Macro-Actions in Decentralized POMDPs." Journal of Artificial Intelligence Research 64 (March 25, 2019): 817–59. http://dx.doi.org/10.1613/jair.1.11418.

Abstract:
Decentralized partially observable Markov decision processes (Dec-POMDPs) are general models for decentralized multi-agent decision making under uncertainty. However, they typically model a problem at a low level of granularity, where each agent's actions are primitive operations lasting exactly one time step. We address the case where each agent has macro-actions: temporally extended actions that may require different amounts of time to execute. We model macro-actions as options in a Dec-POMDP, focusing on actions that depend only on information directly available to the agent during execution. Therefore, we model systems where coordination decisions only occur at the level of deciding which macro-actions to execute. The core technical difficulty in this setting is that the options chosen by each agent no longer terminate at the same time. We extend three leading Dec-POMDP algorithms for policy generation to the macro-action case, and demonstrate their effectiveness in both standard benchmarks and a multi-robot coordination problem. The results show that our new algorithms retain agent coordination while allowing high-quality solutions to be generated for significantly longer horizons and larger state-spaces than previous Dec-POMDP methods. Furthermore, in the multi-robot domain, we show that, in contrast to most existing methods that are specialized to a particular problem class, our approach can synthesize control policies that exploit opportunities for coordination while balancing uncertainty, sensor information, and information about other agents.
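
A minimal encoding of a macro-action as an option, i.e. the (initiation set, low-level policy, termination condition) triple the paper builds on; the "go to the door" example is invented.

    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Option:
        can_start: Callable[[list], bool]  # initiation set I
        act: Callable[[list], str]         # low-level policy pi(history)
        done: Callable[[list], bool]       # termination condition beta

    # "Go to the door": usable anywhere, walks right, terminates at x == 5.
    goto_door = Option(
        can_start=lambda h: True,
        act=lambda h: "right",
        done=lambda h: h[-1] == 5,
    )
    print(goto_door.act([3]), goto_door.done([5]))  # right True
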
47

WANG, YI, SHIQI ZHANG, and JOOHYUNG LEE. "Bridging Commonsense Reasoning and Probabilistic Planning via a Probabilistic Action Language." Theory and Practice of Logic Programming 19, no. 5-6 (September 2019): 1090–106. http://dx.doi.org/10.1017/s1471068419000371.

Abstract:
To be responsive to dynamically changing real-world environments, an intelligent agent needs to perform complex sequential decision-making tasks that are often guided by commonsense knowledge. Previous work on this line of research led to the framework called interleaved commonsense reasoning and probabilistic planning (icorpp), which used P-log for representing commonsense knowledge and Markov Decision Processes (MDPs) or Partially Observable MDPs (POMDPs) for planning under uncertainty. A main limitation of icorpp is that its implementation requires non-trivial engineering effort to bridge the commonsense reasoning and probabilistic planning formalisms. In this paper, we present a unified framework to integrate icorpp's reasoning and planning components. In particular, we extend the probabilistic action language pBC+ to express utility, belief states, and observations as in POMDP models. Inheriting the advantages of action languages, the new action language provides an elaboration-tolerant representation of POMDPs that reflects commonsense knowledge. The idea led to the design of the system pbcplus2pomdp, which compiles a pBC+ action description into a POMDP model that can be directly processed by off-the-shelf POMDP solvers to compute an optimal policy of the pBC+ action description. Our experiments show that it retains the advantages of icorpp while avoiding the manual effort of bridging the commonsense reasoner and the probabilistic planner.
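
The compilation target can be pictured as a plain-text POMDP model in Cassandra's .pomdp format, which many off-the-shelf solvers accept; the toy domain below is invented and is not the output of pbcplus2pomdp.

    import textwrap

    # A two-state toy model written in the .pomdp text format.
    pomdp = textwrap.dedent("""\
        discount: 0.95
        values: reward
        states: hidden visible
        actions: wait probe
        observations: none blip
        T: wait
        identity
        T: probe
        identity
        O: wait
        uniform
        O: probe : hidden : blip 0.8
        O: probe : hidden : none 0.2
        O: probe : visible : blip 0.1
        O: probe : visible : none 0.9
        R: probe : * : * : * -1.0
        """)
    with open("toy.pomdp", "w") as f:
        f.write(pomdp)
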
48

Sarraute, Carlos, Olivier Buffet, and Jörg Hoffmann. "POMDPs Make Better Hackers: Accounting for Uncertainty in Penetration Testing." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (September 20, 2021): 1816–24. http://dx.doi.org/10.1609/aaai.v26i1.8363.

Abstract:
Penetration Testing is a methodology for assessing network security by generating and executing possible hacking attacks. Doing so automatically allows for regular and systematic testing. A key question is how to generate the attacks. This is naturally formulated as planning under uncertainty, i.e., under incomplete knowledge about the network configuration. Previous work uses classical planning, and requires a costly pre-process that reduces this uncertainty through extensive application of scanning methods. By contrast, we herein model the attack planning problem in terms of partially observable Markov decision processes (POMDPs). This makes it possible to reason about the available knowledge, and to employ scanning actions intelligently as part of the attack. As one would expect, this accurate solution does not scale. We devise a method that relies on POMDPs to find good attacks on individual machines, which are then composed into an attack on the network as a whole. This decomposition exploits network structure to the extent possible, making targeted approximations (only) where needed. Evaluating this method on a suitably adapted industrial test suite, we demonstrate its effectiveness in both runtime and solution quality.
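
A schematic of the decomposition, with solve_machine_pomdp as a hypothetical stand-in for the single-machine POMDP solver and a toy composition rule over the network structure:

    def solve_machine_pomdp(machine):
        """Placeholder: would return the expected attack value for one machine."""
        return {"web01": 0.7, "mail01": 0.4, "db01": 0.9}[machine]

    def attack_value(network):
        """Compose per-machine values: best attack reachable in each subnet."""
        return sum(max(solve_machine_pomdp(m) for m in machines)
                   for machines in network.values())

    network = {"dmz": ["web01", "mail01"], "internal": ["db01"]}
    print(attack_value(network))  # 0.7 + 0.9
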
49

Shi, Weihao, Shanhong Guo, Xiaoyu Cong, Weixing Sheng, Jing Yan, and Jinkun Chen. "Frequency Agile Anti-Interference Technology Based on Reinforcement Learning Using Long Short-Term Memory and Multi-Layer Historical Information Observation." Remote Sensing 15, no. 23 (November 23, 2023): 5467. http://dx.doi.org/10.3390/rs15235467.

Abstract:
In modern electronic warfare, radar intelligence has become increasingly crucial when dealing with complex interference environments. This paper combines radar frequency agility with reinforcement learning to achieve adaptive frequency hopping for radar anti-jamming. Unlike traditional reinforcement learning over Markov decision processes (MDPs), the interaction between the radar and jammers takes place in a partially observable Markov decision process (POMDP), where the partial observations available to the agent do not strictly satisfy the Markov property. This paper addresses the problem by using multiple layers of historical observation information: the history of observations is treated as a time series, and time-sensitive networks are employed to extract the temporal information embedded in it. In addition, the reward function is optimized so that the agent learns faster in a sweep-jamming environment. Simulations show that optimizing the agent's state representation, network structure, and reward function effectively helps the radar resist jamming.
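
A small sketch of the multi-layer historical observation idea: the agent conditions on a sliding window of past observations rather than the latest one alone, and the flattened window (or its sequence form) is what a time-sensitive network such as an LSTM consumes. Window length and encoding are illustrative.

    from collections import deque

    class HistoryWindow:
        def __init__(self, length=8, obs_dim=4):
            self.buf = deque([(0,) * obs_dim] * length, maxlen=length)

        def push(self, obs):
            self.buf.append(tuple(obs))

        def state(self):
            """Flattened history, e.g. the input vector for an LSTM policy."""
            return [x for obs in self.buf for x in obs]

    w = HistoryWindow(length=3, obs_dim=2)
    w.push([1, 0]); w.push([0, 1])
    print(w.state())  # [0, 0, 1, 0, 0, 1]
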
50

Nababan, Maxtulus Junedy, Herman Mawengkang, Tulus Tulus, and Sutarman Sutarman. "Hidden Markov Model to Optimize Coordination Relationship for Learning Behaviour." International Journal of Religion 5, no. 9 (May 27, 2024): 459–69. http://dx.doi.org/10.61707/52exbt60.

Abstract:
School communities interact dynamically, much like the agents in a multi-agent system. For coordinated action, the relationships between agents in a multi-agent system must be managed. One technique for steering individuals toward coordinated behavior is to manage the role agents play in generating knowledge, attitudes, and practices. Managing these relationships is difficult because of the large number of unknowns, and modeling can help clarify them. Coordination mechanisms can be modeled with Markov models: agents can demonstrate, and reason about, how their actions affect other agents in order to achieve desired behavioral goals. This paper extends the state space of Partially Observable Markov Decision Processes (POMDPs) with an agent model to make them suitable for multi-agent settings.
