Theses on the topic "Partially Observable Markov Decision Processes (POMDPs)"
Consult the 35 best theses for your research on the topic "Partially Observable Markov Decision Processes (POMDPs)".
Aberdeen, Douglas Alexander. "Policy-Gradient Algorithms for Partially Observable Markov Decision Processes". The Australian National University. Research School of Information Sciences and Engineering, 2003. http://thesis.anu.edu.au./public/adt-ANU20030410.111006.
Olafsson, Björgvin. "Partially Observable Markov Decision Processes for Faster Object Recognition". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198632.
Lusena, Christopher. "Finite Memory Policies for Partially Observable Markov Decision Processes". UKnowledge, 2001. http://uknowledge.uky.edu/gradschool_diss/323.
Skoglund, Caroline. "Risk-aware Autonomous Driving Using POMDPs and Responsibility-Sensitive Safety". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-300909.
Autonomous vehicles are predicted to play a large role in the future, with the goals of improving the efficiency and safety of road transport. But even though we have seen several examples of autonomous vehicles on the roads in recent years, the question of how safety can be guaranteed remains a challenging problem. This thesis has studied this question by developing a framework for risk-aware decision-making. The autonomous vehicle's dynamics and its unpredictable surroundings are modeled as a Partially Observable Markov Decision Process (POMDP). A risk measure is proposed based on Responsibility-Sensitive Safety (RSS), a safety distance that quantifies the minimum distance to other vehicles required to guarantee safety. The risk measure is integrated into the POMDP reward function to achieve risk-aware behavior. The proposed risk-aware POMDP model is evaluated in two case studies. In a scenario where the ego vehicle follows another vehicle on a single-lane road, we show that the ego vehicle can avoid a collision when the vehicle in front brakes to a standstill. In a scenario where the ego vehicle merges onto a highway from a ramp, we show that it does so at a satisfactory distance to other vehicles. The conclusion is that the risk-aware POMDP model realizes a trade-off between safety and usability by keeping a reasonable safety distance and adapting to the behavior of other vehicles.
You, Yang. "Probabilistic Decision-Making Models for Multi-Agent Systems and Human-Robot Collaboration". Electronic Thesis or Diss., Université de Lorraine, 2023. http://www.theses.fr/2023LORR0014.
In this thesis, using Markov decision models, we investigate high-level decision-making (task-level planning) for robotics in two settings: robot-robot collaboration and human-robot collaboration. In robot-robot collaboration (RRC), we study the decision problems of multiple robots working collaboratively toward a shared goal, and we use the decentralized partially observable Markov decision process (Dec-POMDP) framework to model such RRC problems. We then propose two novel algorithms for solving Dec-POMDPs. The first algorithm (Inf-JESP) finds Nash equilibrium solutions by iteratively building the best-response policy for each agent until no improvement can be made. To handle infinite-horizon Dec-POMDPs, we represent each agent's policy using a finite-state controller. The second algorithm (MC-JESP) extends Inf-JESP with generative models, which enables us to scale up to large problems. Through experiments, we demonstrate that our methods are competitive with existing Dec-POMDP solvers. In human-robot collaboration (HRC), we can only control the robot, and the robot faces uncertain human objectives and the behaviors they induce. We therefore address the challenge of deriving robot policies for HRC that are robust to uncertainty about human behavior. In this direction, we discuss possible mental models that can be used to model humans in an HRC task. We propose a general approach to derive, automatically and without prior knowledge, a model of human behavior, based on the assumption that the human could also control the robot. From here, we design two algorithms for computing robust robot policies by solving a robot POMDP whose state contains the human's internal state. The first algorithm operates offline and yields a complete robot policy that can be used during execution. The second is an online method: it plans the robot's action at each time step during execution. Compared with the offline approach, the online method requires only a generative model and can thus scale up to large problems. Experiments with synthetic and real humans are conducted in a simulated environment to evaluate these algorithms. We observe that our methods provide robust robot decisions despite the uncertainty over human objectives and behaviors. Our research on RRC provides a foundation for building best-response policies in a partially observable multi-agent setting, which serves as an important intermediate step toward HRC problems. Moreover, each contribution includes more flexible algorithms based on generative models, which we believe will facilitate applying this work to real-world applications.
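The finite-state controllers used in the abstract above to represent infinite-horizon policies can be sketched generically. The following is a minimal illustration with a toy controller and made-up action and observation names, not code from the thesis:

```python
class FiniteStateController:
    """Policy for one agent of a (Dec-)POMDP: controller nodes replace beliefs.

    action_fn[node]            -> action to take in that node
    transition_fn[(node, obs)] -> next node after seeing observation obs
    """
    def __init__(self, action_fn, transition_fn, start_node=0):
        self.action_fn = action_fn
        self.transition_fn = transition_fn
        self.node = start_node

    def act(self):
        return self.action_fn[self.node]

    def observe(self, observation):
        # Bounded memory: the node set is finite, so the policy remains
        # well defined over an infinite horizon.
        self.node = self.transition_fn[(self.node, observation)]

# Tiny two-node controller in the style of the classic Tiger POMDP.
fsc = FiniteStateController(
    action_fn={0: "listen", 1: "open-door"},
    transition_fn={(0, "hear-left"): 1, (0, "hear-right"): 0,
                   (1, "hear-left"): 0, (1, "hear-right"): 0},
)
```

Because each node stores both an action and an observation-conditioned successor, the controller is a compact stand-in for a policy over the continuous belief space.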
Cheng, Hsien-Te. "Algorithms for partially observable Markov decision processes". Thesis, University of British Columbia, 1988. http://hdl.handle.net/2429/29073.
Sauder School of Business
Graduate
Jaulmes, Robin. "Active learning in partially observable Markov decision processes". Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=98733.
Our goal is to build artificial intelligence algorithms able to reproduce human reasoning on these complex problems. We use the reinforcement learning framework, which makes it possible to learn optimal behaviors in dynamic environments. More precisely, we adapt Partially Observable Markov Decision Processes (POMDPs) to environments that are only partially known.
We take inspiration from the field of active learning: we assume the existence of an oracle who can, during a short learning phase, provide the agent with additional information about its environment. The agent actively learns everything that is useful in the environment while making minimal use of the oracle.
After reviewing existing methods for solving learning problems in partially observable environments, we present a theoretical active learning setup. We propose an algorithm, MEDUSA, and provide theoretical and empirical evidence of its performance.
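The POMDP machinery underlying such learners rests on Bayesian belief updates. A minimal generic sketch, using a toy Tiger-style model rather than anything from the thesis, is:

```python
def belief_update(belief, action, observation, states, T, O):
    """One Bayes-filter step: b'(s') ∝ O(o | a, s') · Σ_s T(s' | s, a) · b(s).

    belief: dict state -> probability
    T[s][a][s2]: transition probability; O[s2][a][o]: observation probability
    """
    unnormalized = {}
    for s2 in states:
        predicted = sum(T[s][action][s2] * belief[s] for s in states)
        unnormalized[s2] = O[s2][action][observation] * predicted
    norm = sum(unnormalized.values())  # = P(o | b, a)
    return {s2: p / norm for s2, p in unnormalized.items()}

# Tiger-style example: listening leaves the state unchanged and reports the
# correct side with probability 0.85.
states = ["tiger-left", "tiger-right"]
T = {s: {"listen": {s2: 1.0 if s2 == s else 0.0 for s2 in states}}
     for s in states}
O = {"tiger-left":  {"listen": {"hear-left": 0.85, "hear-right": 0.15}},
     "tiger-right": {"listen": {"hear-left": 0.15, "hear-right": 0.85}}}
b0 = {"tiger-left": 0.5, "tiger-right": 0.5}
b1 = belief_update(b0, "listen", "hear-left", states, T, O)
# b1["tiger-left"] == 0.85
```

Active-learning approaches such as the one described above maintain exactly this kind of belief while also tracking uncertainty over the model parameters T and O themselves.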
Aberdeen, Douglas Alexander. "Policy-gradient algorithms for partially observable Markov decision processes /". View thesis entry in Australian Digital Theses Program, 2003. http://thesis.anu.edu.au/public/adt-ANU20030410.111006/index.html.
Zawaideh, Zaid. "Eliciting preferences sequentially using partially observable Markov decision processes". Thesis, McGill University, 2008. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=18794.
Decision support systems have gained importance in recent years. Yet one of the major problems in the design of such systems remains: understanding how the user values the different outcomes, or, more simply, determining what the user's preferences are. Preference elicitation aims to remove some of the arbitrariness from the design of decision agents by offering more formal methods for measuring the quality of outcomes. This thesis addresses some of the problems of preference elicitation, such as the high dimensionality of the underlying problem. The problem is formulated as a partially observable Markov decision process (POMDP) and uses a factored representation to exploit the structure inherent in preference elicitation problems. In addition, simple knowledge about the characteristics of these problems is exploited to obtain more precise preferences without increasing the burden on the user. Sparse terminal actions are defined so as to allow a flexible trade-off between speed and precision. The result is a system flexible enough to be applied to a wide range of domains that face the problems associated with preference elicitation methods.
Williams, Jason Douglas. "Partially observable Markov decision processes for spoken dialogue management". Thesis, University of Cambridge, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.612754.
Lusena, Christopher. "Finite memory policies for partially observable Markov decision processes". Lexington, Ky. : [University of Kentucky Libraries], 2001. http://lib.uky.edu/ETD/ukycosc2001d00021/lusena01.pdf.
Title from document title page. Document formatted into pages; contains viii, 89 p. : ill. Includes abstract. Includes bibliographical references (p. 81-86).
Yu, Huizhen. "Approximate solution methods for partially observable Markov and semi-Markov decision processes". Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/35299.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (p. 165-169).
We consider approximation methods for discrete-time infinite-horizon partially observable Markov and semi-Markov decision processes (POMDPs and POSMDPs). One of the main contributions of this thesis is a lower cost approximation method for finite-space POMDPs with the average cost criterion, and its extensions to semi-Markov partially observable problems and constrained POMDP problems, as well as to problems with the undiscounted total cost criterion. Our method is an extension of several lower cost approximation schemes, proposed individually by various authors, for discounted POMDP problems. We introduce a unified framework for viewing all of these schemes together with some new ones. In particular, we establish that, due to the special structure of hidden states in a POMDP, there is a class of approximating processes, which are either POMDPs or belief MDPs, that provide lower bounds to the optimal cost function of the original POMDP problem. Theoretically, POMDPs with the long-run average cost criterion are still not fully understood. The major difficulties relate to the structure of the optimal solutions, such as conditions for a constant optimal cost function, the existence of solutions to the optimality equations, and the existence of optimal policies that are stationary and deterministic. Thus, our lower bound result is useful not only in providing a computational method, but also in characterizing the optimal solution. We show that, regardless of these theoretical difficulties, lower bounds of the optimal liminf average cost function can be computed efficiently by solving modified problems using multichain MDP algorithms, and the approximating cost functions can also be used to obtain suboptimal stationary control policies. We prove the asymptotic convergence of the lower bounds under certain assumptions. For semi-Markov problems and total cost problems, we show that the same method can be applied to compute lower bounds of the optimal cost function. For constrained average cost POMDPs, we show that lower bounds of the constrained optimal cost function can be computed by solving finite-dimensional LPs. We also consider reinforcement learning methods for POMDPs and MDPs. We propose an actor-critic type policy gradient algorithm that uses a structured policy known as a finite-state controller, thus providing an alternative to the earlier actor-only algorithm GPOMDP. Our work also clarifies the relationship between the reinforcement learning methods for POMDPs and those for MDPs. For average cost MDPs, we provide a convergence and convergence rate analysis for a least squares temporal difference (TD) algorithm, called LSPE, previously proposed for discounted problems. We use this algorithm in the critic portion of the policy gradient algorithm for POMDPs with finite-state controllers. Finally, we investigate the properties of the limsup and liminf average cost functions of various types of policies. We show various convexity and concavity properties of these cost functions, and we give a new necessary condition for the optimal liminf average cost to be constant. Based on this condition, we prove the near-optimality of the class of finite-state controllers under the assumption of a constant optimal liminf average cost. This result provides a theoretical guarantee for the finite-state controller approach.
by Huizhen Yu.
Ph.D.
Tobin, Ludovic. "A Stochastic Point-Based Algorithm for Partially Observable Markov Decision Processes". Thesis, Université Laval, 2008. http://www.theses.ulaval.ca/2008/25194/25194.pdf.
Decision making under uncertainty is a popular topic in the field of artificial intelligence. One common way to attack such problems is with a sound mathematical model; notably, Partially Observable Markov Decision Processes (POMDPs) have been the subject of extensive research over roughly the last ten years. However, solving a POMDP is a very time-consuming task, and for this reason the model has not been used extensively. Our objective was to continue the tremendous progress made over the last several years, with the hope that our work will be a step toward applying POMDPs to large-scale problems. To do so, we combined different ideas to produce a new algorithm called SSVI (Stochastic Search Value Iteration). Three major accomplishments were achieved in this research. First, we developed a new offline POMDP algorithm which, on benchmark problems, proved to be more efficient than state-of-the-art algorithms. The originality of our method comes from the fact that it is a stochastic algorithm, in contrast with the usual deterministic algorithms. Second, the algorithm can also be applied in a particular type of online environment, in which it outperforms the competition by a significant margin. Finally, we also applied a basic version of our algorithm in a complex military simulation in the context of the Combat Identification project from DRDC-Valcartier.
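Point-based methods such as the one described above build on the standard point-based Bellman backup over alpha-vectors. A minimal generic sketch of that backup, with hypothetical model encodings and not the SSVI algorithm itself, is:

```python
def point_based_backup(b, Gamma, states, actions, observations, T, O, R,
                       gamma=0.95):
    """One Bellman backup at belief point b; returns a new alpha-vector (dict).

    Gamma: current set of alpha-vectors, each a dict state -> value
    T[s][a][s2], O[a][s2][o], R[s][a]: POMDP model as nested dicts
    """
    best_value, best_alpha = None, None
    for a in actions:
        alpha_a = {s: R[s][a] for s in states}          # immediate reward term
        for o in observations:
            # Back-project every alpha-vector through (a, o) ...
            backprojected = [
                {s: sum(T[s][a][s2] * O[a][s2][o] * alpha[s2] for s2 in states)
                 for s in states}
                for alpha in Gamma
            ]
            # ... and keep the one that is best at this particular belief b.
            g_best = max(backprojected,
                         key=lambda g: sum(b[s] * g[s] for s in states))
            for s in states:
                alpha_a[s] += gamma * g_best[s]
        value = sum(b[s] * alpha_a[s] for s in states)
        if best_value is None or value > best_value:
            best_value, best_alpha = value, alpha_a
    return best_alpha
```

A deterministic point-based solver applies this backup over a fixed set of belief points; a stochastic variant would instead sample which belief points to back up at each iteration.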
Olsen, Alan. "Pond-Hindsight: Applying Hindsight Optimization to Partially-Observable Markov Decision Processes". DigitalCommons@USU, 2011. https://digitalcommons.usu.edu/etd/1035.
Hudson, Joshua. "A Partially Observable Markov Decision Process for Breast Cancer Screening". Thesis, Linköpings universitet, Statistik och maskininlärning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-154437.
Texto completoCastro, Rivadeneira Pablo Samuel. "On planning, prediction and knowledge transfer in fully and partially observable Markov decision processes". Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=104525.
This thesis addresses the problem of sequential decision-making in large domains. The formalisms used to study this problem are fully and partially observable Markov decision processes (MDPs and POMDPs, respectively). The first contribution of this thesis is a theoretical analysis of the behavior of POMDPs when only subsets of the observation set are used. One of these subsets is used to update the agent's belief about its current state, while the other is used to measure the agent's performance. The behaviors are formalized with three kinds of equivalence relations. The first relation groups states according to their values under optimal or general policies; the second groups states according to their ability to predict observation sequences; the third is based on bisimulation, a well-known equivalence relation borrowed from concurrency theory. Bisimulation relations can be generalized to bisimulation metrics. This thesis presents bisimulation metrics for MDPs with temporally extended actions (formalized as options) and proposes a new bisimulation metric that provides tighter bounds on the difference in optimal values. A new proof is given for the convergence of an approximation method, based on statistical sampling, for computing the bisimulation metric. The new proof makes it possible to determine the minimum number of samples needed to reach the desired approximation quality with high probability. Although bisimulation metrics have previously been used for state-space compression, this thesis proposes using them to transfer policies from one MDP to another. Unlike existing transfer work, the mapping between the two systems is determined automatically by the bisimulation metrics. Theoretical results are presented that bound the loss of optimality incurred by the transferred policy. A number of algorithms are introduced and evaluated empirically in the context of planning and learning.
Horgan, Casey Vi. "Dealing with uncertainty : a comparison of robust optimization and partially observable Markov decision processes". Thesis, Massachusetts Institute of Technology, 2017. http://hdl.handle.net/1721.1/112410.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 131-132).
Uncertainty is often present in real-life problems, and deciding how to deal with it can be difficult. The proper formulation of a problem can be the larger part of the work required to solve it. This thesis is intended to be used by a decision maker to determine how best to formulate a problem. Robust optimization and partially observable Markov decision processes (POMDPs) are two methods of dealing with uncertainty in real-life problems. Robust optimization is used primarily in operations research, while engineers will be more familiar with POMDPs. For a decision maker who is unfamiliar with one or both of these methods, this thesis provides insight into a different way of problem solving in the presence of uncertainty. The formulation of each method is explained in detail, and the theory behind common solution methods is presented. In addition, several examples are given for each method. While a decision maker may try to solve an entire problem using one method, sometimes there are natural partitions to a problem that encourage using multiple solution methods. One such problem is presented here: a military planning problem consisting of two parts. The first part is best solved with POMDPs and the second with robust optimization. The reasoning behind this partition is explained and the formulation of each part is presented. Finally, a discussion of the problem types suitable for each method, including multiple applications, is provided.
by Casey Vi Horgan.
S.M.
Crook, Paul A. "Learning in a state of confusion : employing active perception and reinforcement learning in partially observable worlds". Thesis, University of Edinburgh, 2007. http://hdl.handle.net/1842/1471.
Texto completoOmidshafiei, Shayegan. "Decentralized control of multi-robot systems using partially observable Markov Decision Processes and belief space macro-actions". Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/101447.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 129-139).
Planning, control, perception, and learning for multi-robot systems present significant challenges. Transition dynamics of the robots may be stochastic, making it difficult to select the best action each robot should take at a given time. The observation model, a function of the robots' sensors, may be noisy or partial, meaning that deterministic knowledge of the team's state is often impossible to attain. Robots designed for real-world applications require careful consideration of such sources of uncertainty. This thesis contributes a framework for multi-robot planning in continuous spaces with partial observability. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems. However, representing and solving Dec-POMDPs is often intractable for large problems. This thesis extends the Dec-POMDP framework to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP), taking advantage of high-level representations that are natural for multi-robot problems. Dec-POSMDPs allow asynchronous decision-making, which is crucial in multi-robot domains. This thesis also presents algorithms for solving Dec-POSMDPs, which are more scalable than previous methods due to the use of closed-loop macro-actions in planning. The proposed framework's performance is evaluated in a constrained multi-robot package delivery domain, showing its ability to provide high-quality solutions for large problems. Due to the probabilistic nature of state transitions and observations, robots operate in belief space, the space of probability distributions over all of their possible states. This thesis also contributes a hardware platform called Measurable Augmented Reality for Prototyping Cyber-Physical Systems (MAR-CPS), which allows real-time visualization of the belief space in laboratory settings.
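The macro-action abstraction the abstract above relies on can be sketched generically as an option: a low-level policy run until its termination condition holds. The names and the toy 1-D navigation task below are illustrative assumptions, not code from the thesis:

```python
class MacroAction:
    """An option: a low-level policy run until a termination condition holds."""
    def __init__(self, name, policy, is_done):
        self.name = name
        self.policy = policy      # local observation -> primitive action
        self.is_done = is_done    # local observation -> bool

def run_macro_action(env_step, obs, macro, max_steps=100):
    """Execute primitive steps until the macro-action terminates.

    env_step(action) -> next observation. Asynchronous decision-making arises
    because different robots' macro-actions finish at different times.
    """
    steps = 0
    while not macro.is_done(obs) and steps < max_steps:
        obs = env_step(macro.policy(obs))
        steps += 1
    return obs, steps

# Toy 1-D navigation: the observation is the robot's position and the
# macro-action "go_to_5" steps right until position 5 is reached.
pos = {"x": 0}
def env_step(action):
    pos["x"] += action
    return pos["x"]

goto5 = MacroAction("go_to_5", policy=lambda o: 1, is_done=lambda o: o >= 5)
final_obs, steps = run_macro_action(env_step, 0, goto5)
# final_obs == 5 after 5 primitive steps
```

Planning then happens over whole macro-actions rather than primitive time steps, which is what makes the semi-Markov (Dec-POSMDP) view natural.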
by Shayegan Omidshafiei.
S.M.
Folsom-Kovarik, Jeremiah. "Leveraging Help Requests in POMDP Intelligent Tutors". Doctoral diss., University of Central Florida, 2012. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5210.
Ph.D.
Doctorate
Computer Science
Engineering and Computer Science
Pradhan, Neil. "Deep Reinforcement Learning for Autonomous Highway Driving Scenario". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-289444.
We present an autonomous driving agent in a simulated highway scenario with vehicles such as cars and trucks moving with stochastically variable speed profiles. The focus of the simulated environment is to test tactical decision-making in highway scenarios. When an agent (vehicle) maintains an optimal speed range, it is beneficial both for energy efficiency and for a greener environment. To maintain an optimal speed range, this thesis proposes two new reward structures: (a) a Gaussian reward structure and (b) an exponential rise-and-fall reward structure. Two deep reinforcement learning agents, one per reward structure, were trained to study their differences and evaluate their performance based on a set of parameters most relevant in highway scenarios. The algorithm implemented in this thesis is a double dueling deep Q-network with a prioritized replay buffer. Experiments were performed by adding noise to the inputs, simulating a partially observable Markov decision process, to compare the robustness of the different reward structures. A velocity occupancy grid proved to be a better input for the algorithm than a binary occupancy grid. In addition, a methodology for generating fuel-efficient policies is discussed and demonstrated with an example.
Murugesan, Sugumar. "Opportunistic Scheduling Using Channel Memory in Markov-modeled Wireless Networks". The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1282065836.
Texto completoIbrahim, Rita. "Utilisation des communications Device-to-Device pour améliorer l'efficacité des réseaux cellulaires". Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLC002/document.
This thesis considers Device-to-Device (D2D) communications as a promising technique for enhancing future cellular networks. Modeling, evaluating, and optimizing D2D features are the fundamental goals of this thesis and are mainly achieved using the following mathematical tools: queuing theory, Lyapunov optimization, and Partially Observable Markov Decision Processes (POMDPs). The findings of this study are presented in three parts. In the first part, we investigate a D2D mode selection scheme. We derive the queuing stability regions of two scenarios: pure cellular networks and D2D-enabled cellular networks. Comparing the two scenarios leads us to a D2D-versus-cellular mode selection design that improves the capacity of the network. In the second part, we develop a D2D resource allocation algorithm. We observe that D2D users are able to estimate their local Channel State Information (CSI), whereas the base station needs some signaling exchange to acquire this information. Based on the D2D users' knowledge of their local CSI, we provide an energy-efficient resource allocation framework that shows how distributed scheduling outperforms centralized scheduling. In the distributed approach, collisions may occur between the different CSI reports; we therefore propose a collision reduction algorithm. Moreover, we give a detailed description of how both the centralized and distributed algorithms can be implemented in practice. In the third part, we propose a mobile relay selection policy in a D2D relay-aided network. Relay mobility is a crucial challenge in defining a strategy for selecting the optimal D2D relays. The problem is formulated as a constrained POMDP which captures the dynamism of the relays and aims to find the optimal relay selection policy that maximizes the performance of the network under cost constraints.
Gonçalves, Luciano Vargas. "Uma arquitetura de Agentes BDI para auto-regulação de Trocas Sociais em Sistemas Multiagentes Abertos". Universidade Catolica de Pelotas, 2009. http://tede.ucpel.edu.br:8080/jspui/handle/tede/105.
The study and development of systems to control interactions in multiagent systems is an open problem in artificial intelligence. Piaget's system of social exchange values is a social approach that provides foundations for modeling interactions between agents, where interactions are seen as service exchanges between pairs of agents, with the evaluation of the services realized or received, that is, the investments and profits in the exchange, and the credits and debits to be charged or received, respectively, in future exchanges. This evaluation may be performed in different ways by the agents, considering that they may have different exchange personality traits. In an exchange process over time, the different ways of evaluating profits and losses may cause disequilibrium in the exchange balances, where some agents accumulate profits and others accumulate losses. To solve the exchange equilibrium problem, we use Partially Observable Markov Decision Processes (POMDPs) to guide the agent's choice of actions that can lead to the equilibrium of the social exchanges. Each agent thus has its own internal process to evaluate its current balance of the exchange process with the other agents, observing its internal state, and, by observing its partner's exchange behavior, it is able to deliberate on the best action to perform in order to reach the equilibrium of the exchanges. In an open multiagent system, a mechanism is needed to recognize the different personality traits in order to build the POMDPs that manage the exchanges between pairs of agents. This recognition task is performed with Hidden Markov Models (HMMs), which, from models of known personality traits, can approximate the personality traits of new partners simply by analyzing observations of the agents' behavior in exchanges. The aim of this work is to develop a hybrid agent architecture for the self-regulation of social exchanges between personality-based agents in an open multiagent system, based on the BDI (Beliefs, Desires, Intentions) architecture, where the agent plans are obtained from optimal policies of POMDPs, which model personality traits recognized by HMMs. To evaluate the proposed approach, simulations were performed considering different (known and new) personality traits.
Sachan, Mohit. "Learning in Partially Observable Markov Decision Processes". 2013. http://hdl.handle.net/1805/3451.
Learning in Partially Observable Markov Decision Processes (POMDPs) is motivated by the essential need to address a number of realistic problems. A number of methods exist for learning in POMDPs, but learning with a limited amount of information about the POMDP model remains a highly desirable capability. Learning with minimal information is desirable in complex systems, as methods requiring complete information among decision makers are impractical there due to the increase in problem dimensionality. In this thesis we address the problem of decentralized control of POMDPs with unknown transition probabilities and rewards. We suggest learning in a POMDP using a tree-based approach: states of the POMDP are guessed using this tree. Each node in the tree contains an automaton and acts as a decentralized decision maker for the POMDP. The start state of the POMDP is known as the landmark state. Each automaton in the tree uses a simple learning scheme to update its action choice and requires minimal information. The principal result derived is that, without proper knowledge of the transition probabilities and rewards, the tree of automata decision makers converges to a set of actions that maximizes the long-term expected reward per unit time obtained by the system. The analysis is based on learning in sequential stochastic games and on properties of ergodic Markov chains. Simulation results are presented to compare the long-term rewards of the system under different decision control algorithms.
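The per-node automaton described above can be illustrated with the classic linear reward-inaction (L_R-I) scheme. This is a generic sketch of that family of learners, not the thesis's exact learning rule:

```python
import random

class LearningAutomaton:
    """Linear reward-inaction (L_R-I) learning automaton over n actions.

    A favorable response (reward == 1) moves the chosen action's probability
    toward 1; an unfavorable response leaves the probabilities unchanged.
    """
    def __init__(self, n_actions, step=0.05):
        self.p = [1.0 / n_actions] * n_actions
        self.step = step
        self.last = None

    def choose(self, rng):
        self.last = rng.choices(range(len(self.p)), weights=self.p)[0]
        return self.last

    def update(self, reward):
        if reward:  # "inaction" on failure: only rewarded plays change p
            a, lam = self.last, self.step
            self.p = [pj + lam * (1 - pj) if j == a else (1 - lam) * pj
                      for j, pj in enumerate(self.p)]

# Environment in which action 0 is always rewarded and action 1 never is:
# the automaton's probability of choosing action 0 climbs toward 1.
rng = random.Random(0)
la = LearningAutomaton(n_actions=2)
for _ in range(1000):
    action = la.choose(rng)
    la.update(reward=1 if action == 0 else 0)
```

Each update needs only the automaton's own action and the binary environment response, which is what makes such schemes attractive when decision makers cannot share complete information.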
Koltunova, Veronika. "Active Sensing for Partially Observable Markov Decision Processes". Thesis, 2013. http://hdl.handle.net/10012/7222.
Aberdeen, Douglas. "Policy-Gradient Algorithms for Partially Observable Markov Decision Processes". Phd thesis, 2003. http://hdl.handle.net/1885/48180.
Kinathil, Shamin. "Closed-form Solutions to Sequential Decision Making within Markets". Phd thesis, 2018. http://hdl.handle.net/1885/186490.
Daswani, Mayank. "Generic Reinforcement Learning Beyond Small MDPs". Phd thesis, 2015. http://hdl.handle.net/1885/110545.
Poupart, Pascal. "Exploiting structure to efficiently solve large scale partially observable Markov decision processes". 2005. http://link.library.utoronto.ca/eir/EIRdetail.cfm?Resources__ID=232732&T=F.
Leung, Siu-Ki. "Exploring partially observable Markov decision processes by exploiting structure and heuristic information". Thesis, 1996. http://hdl.handle.net/2429/5772.
Poupart, Pascal. "Approximate value-directed belief state monitoring for partially observable Markov decision processes". Thesis, 2000. http://hdl.handle.net/2429/11462.
Amato, Christopher. "Increasing scalability in algorithms for centralized and decentralized partially observable Markov decision processes: Efficient decision-making and coordination in uncertain environments". 2010. https://scholarworks.umass.edu/dissertations/AAI3427492.
Texto completoGoswami, Anindya. "Semi-Markov Processes In Dynamic Games And Finance". Thesis, 2008. https://etd.iisc.ac.in/handle/2005/727.
Texto completoGoswami, Anindya. "Semi-Markov Processes In Dynamic Games And Finance". Thesis, 2008. http://hdl.handle.net/2005/727.