Dissertations / Theses on the topic 'POMDP'

Consult the top 50 dissertations / theses for your research on the topic 'POMDP.'

1

Folsom-Kovarik, Jeremiah. "Leveraging Help Requests in POMDP Intelligent Tutors." Doctoral diss., University of Central Florida, 2012. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/5210.

Abstract:
Intelligent tutoring systems (ITSs) are computer programs that model individual learners and adapt instruction to help each learner differently. One way ITSs differ from human tutors is that few ITSs give learners a way to ask questions. When learners can ask for help, their questions have the potential to improve learning directly and also act as a new source of model data to help the ITS personalize instruction. Inquiry modeling gives ITSs the ability to answer learner questions and refine their learner models with an inexpensive new input channel. In order to support inquiry modeling, an advanced planning formalism is applied to ITS learner modeling. Partially observable Markov decision processes (POMDPs) differ from more widely used ITS architectures because they can plan complex action sequences in uncertain situations with machine learning. Tractability issues have previously precluded POMDP use in ITS models. This dissertation introduces two improvements, priority queues and observation chains, to make POMDPs scale well and encompass the large problem sizes that real-world ITSs must confront. A new ITS was created to support trainees practicing a military task in a virtual environment. The development of the Inquiry Modeling POMDP Adaptive Trainer (IMP) began with multiple formative studies on human and simulated learners that explored inquiry modeling and POMDPs in intelligent tutoring. The studies suggest the new POMDP representations will be effective in ITS domains having certain common characteristics. Finally, a summative study evaluated IMP's ability to train volunteers in specific practice scenarios. IMP users achieved post-training scores averaging up to 4.5 times higher than users who practiced without support and up to twice as high as trainees who used an ablated version of IMP with no inquiry modeling. IMP's implementation and evaluation helped explore questions about how inquiry modeling and POMDP ITSs work, while empirically demonstrating their efficacy.
Ph.D., Computer Science, Engineering and Computer Science.
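For orientation, the operation underlying every POMDP-based system in the entries below is the Bayesian belief update, which turns the latest action and observation into a new distribution over hidden states. A minimal sketch (a generic textbook form with our own variable names, not code from any of these theses):

    import numpy as np

    def belief_update(belief, action, observation, T, O):
        # b'(s') is proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s)
        predicted = belief @ T[action]                   # predict step
        updated = predicted * O[action][:, observation]  # correct step
        return updated / updated.sum()                   # normalise; assumes P(o) > 0

    # toy 2-state model with one action and two observations
    T = np.array([[[0.9, 0.1], [0.2, 0.8]]])
    O = np.array([[[0.7, 0.3], [0.4, 0.6]]])
    print(belief_update(np.array([0.5, 0.5]), 0, 0, T, O))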
2

Kaplow, Robert. "Point-based POMDP solvers: survey and comparative analysis." Thesis, McGill University, 2010. http://digitool.Library.McGill.CA:8881/R/?func=dbin-jump-full&object_id=92275.

3

Png, ShaoWei. "Bayesian reinforcement learning for POMDP-based dialogue systems." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=104830.

Abstract:
Spoken dialogue systems are gaining popularity with improvements in speech recognition technologies. Dialogue systems have been modeled effectively using partially observable Markov decision processes (POMDPs), achieving improvements in robustness. However, past research on POMDP-based dialogue systems usually assumes that the model parameters are known. This limitation can be addressed through model-based Bayesian reinforcement learning, which offers a rich framework for simultaneous learning and planning. However, due to the high complexity of the framework, a major challenge is to scale up these algorithms to complex dialogue systems. In this work, we show that by exploiting certain known components of the system, such as knowledge of symmetrical properties, and using an approximate on-line planning algorithm, we are able to apply Bayesian RL to several realistic spoken dialogue system domains. We consider several experimental domains: first, a small synthetic data case, where we illustrate several properties of the approach; second, a small dialogue manager based on the SACTI1 corpus, which contains 144 dialogues between 36 users and 12 experts; third, a dialogue manager aimed at patients with dementia, to assist them with activities of daily living; finally, a large dialogue manager designed to help patients operate a wheelchair.
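The model-based Bayesian RL described here maintains a posterior over the unknown model parameters rather than assuming them known. A minimal sketch of the standard Dirichlet-count mechanism behind that idea (our own illustration; the class and variable names are invented, not the thesis's code):

    import numpy as np

    class DirichletTransitionModel:
        # Posterior over T(s'|s,a): one Dirichlet per (action, state) pair.
        def __init__(self, n_states, n_actions, prior=1.0):
            self.counts = np.full((n_actions, n_states, n_states), prior)

        def update(self, s, a, s_next):
            # learning: count the observed transition
            self.counts[a, s, s_next] += 1.0

        def mean(self):
            # planning: posterior-mean transition model
            return self.counts / self.counts.sum(axis=2, keepdims=True)

        def sample(self):
            # or draw one full model, Thompson-sampling style
            return np.apply_along_axis(np.random.dirichlet, 2, self.counts)

An online planner can then replan each turn against mean() or sample(), which is the simultaneous learning-and-planning loop the abstract refers to.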
4

Chinaei, Hamid Reza. "Learning Dialogue POMDP Model Components from Expert Dialogues." Thesis, Université Laval, 2013. http://www.theses.ulaval.ca/2013/29690/29690.pdf.

Abstract:
Spoken dialogue systems should recognize user intentions and maintain a natural and efficient dialogue with users. This is, however, a difficult task, as spoken language is naturally ambiguous and uncertain, and the automatic speech recognition (ASR) output is noisy. In addition, the human user may change his intention during the interaction with the machine. To tackle this difficult task, the partially observable Markov decision process (POMDP) framework has been applied in dialogue systems as a formal framework to represent uncertainty explicitly while supporting automated policy solving. In this context, estimating the dialogue POMDP model components is a significant challenge, as they have a direct impact on the optimized dialogue POMDP policy. This thesis proposes methods for learning dialogue POMDP model components using noisy and unannotated dialogues. Specifically, we introduce techniques to learn the set of possible user intentions from dialogues, use them as the dialogue POMDP states, and learn a maximum likelihood POMDP transition model from data. Since it is crucial to reduce the observation state size, we then propose two observation models: the keyword model and the intention model. Using these two models, the number of observations is reduced significantly while the POMDP performance remains high, particularly in the intention POMDP. In addition to these model components, POMDPs also require a reward function, so we propose new algorithms for learning the POMDP reward model from dialogues based on inverse reinforcement learning (IRL). In particular, we propose the POMDP-IRL-BT algorithm (BT for belief transition), which works on the belief states available in the dialogues. This algorithm learns the reward model by estimating a belief transition model, similar to MDP (Markov decision process) transition models. Ultimately, we apply the proposed methods to a healthcare domain and learn a dialogue POMDP essentially from real, unannotated and noisy dialogues.
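One of the components above, the maximum-likelihood transition model, reduces to normalised counts over (state, action, next state) triples once states have been assigned to the dialogues. A minimal sketch under that reading (function and variable names are ours, purely illustrative):

    from collections import Counter, defaultdict

    def ml_transition_model(triples):
        # triples: iterable of (state, action, next_state) tuples
        counts = defaultdict(Counter)
        for s, a, s_next in triples:
            counts[(s, a)][s_next] += 1
        # T(s'|s,a) = count(s, a, s') / count(s, a)
        return {sa: {s2: n / sum(c.values()) for s2, n in c.items()}
                for sa, c in counts.items()}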
5

Li, Xin. "POMDP compression and decomposition via belief state analysis." HKBU Institutional Repository, 2009. http://repository.hkbu.edu.hk/etd_ra/1012.

6

Zheltova, Ludmila. "Structured Maintenance Policies on Interior Sample Paths." Case Western Reserve University School of Graduate Studies / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=case1264627939.

7

Memarzadeh, Milad. "System-Level Adaptive Monitoring and Control of Infrastructures: A POMDP-Based Framework." Research Showcase @ CMU, 2015. http://repository.cmu.edu/dissertations/664.

Abstract:
Many infrastructure systems in the US, such as road networks, bridges, water and wastewater pipelines, and wind farms, are aging and their condition is deteriorating. Accurate risk analysis is crucial to extend the life span of these systems and to guide decision making towards a sustainable use of resources. These systems are subject to fatigue-induced degradation and need periodic inspections and repairs, which are usually performed through semi-annual, annual, or bi-annual scheduled maintenance. However, better maintenance can be achieved by flexible policies based on prior knowledge of the degradation process and on data collected in the field by sensors and visual inspections. Traditional methods for modeling the operation and maintenance (O&M) process, such as Markov decision processes (MDPs) and partially observable MDPs (POMDPs), have limitations that do not allow the model to properly include the knowledge available and that may result in non-optimal strategies for management of infrastructure systems. Specifically, the conditional probabilities for modeling the degradation process and the precision of the observations are usually affected by epistemic uncertainty, which cannot be captured by traditional methods. The goal of this dissertation is to propose a computational framework for adaptive monitoring and control of infrastructures at the system level and to connect different aspects of the management process together. The first research question we address is how to take optimal sequential decisions under model uncertainty. Second, we propose how to combine decision optimization with learning of the degradation of components and the precision of the monitoring system. Specifically, we address the issue of systems made up of similar components, where transfer of knowledge across components is relevant. Finally, we propose how to assess the value of information in sequential decision making and whether it can be used as a heuristic for system-level inspection scheduling. In this dissertation, first a novel learning and planning method is proposed, called "Planning and Learning for Uncertain dynamic Systems" (PLUS), that can learn from the environment, update the distributions of parameters, and select the optimal strategy considering the uncertainty related to the model. Validating with synthetic data, the total management cost of operating a wind farm using PLUS is shown to be significantly less than the costs achieved by a fixed policy or through the POMDP framework. Moreover, when the system is made up of similar components, data collected on one is also relevant to the management of the others. This is typically the case of wind farms, which are made up of similar turbines. PLUS models the components as independent or identical, and either learns the model for each component independently or learns a global model for all components. We extend that formulation, allowing for a weaker similarity among components. The proposed approach, called "Multiple Uncertain POMDP" (MU-POMDP), models the components as POMDPs and assumes the corresponding model parameters to be dependent random variables. By using this framework, we can calibrate specific degradation and emission models for each component while, at the same time, processing observations at the level of the entire system. We evaluate the performance of MU-POMDP compared to PLUS and discuss its potential and computational complexity.

Lastly, operation and maintenance of an infrastructure system rely on information collected on its components, which can provide the decision maker with an accurate assessment of their condition states. However, resources to be invested in data gathering are usually limited, and observations should be collected based on their value of information (VoI). VoI is a key concept for directing explorative actions; in the context of infrastructure operation and maintenance, it applies to decisions about inspecting and monitoring the condition states of the components. Assessing the VoI is computationally intractable for most applications involving sequential decisions, such as long-term infrastructure maintenance. The component-level VoI can be used as a heuristic for assigning priorities in system-level inspection scheduling. In this research, we propose two alternative models for integrating adaptive maintenance planning based on POMDPs and inspection scheduling based on a tractable approximation of the VoI: the stochastic allocation model (with its two limiting scenarios, called pessimistic and optimistic), which assumes observations are collected with a given probability, and the fee-based allocation model, which assumes observations are available at a given cost. We illustrate how these models can be used at component level and for system-level inspection scheduling. Furthermore, we evaluate the quality of the solutions provided by the pessimistic and optimistic approaches. Finally, we introduce analytical formulas based on the stochastic and fee-based allocation models to predict the impact of a monitoring system (or a piece of information) on the operation and maintenance cost of infrastructure systems.
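To make the VoI notion concrete: the one-step (myopic) value of an inspection is the expected reduction in cost from choosing an action after seeing the observation rather than before. A toy sketch under that standard definition (ours, not the dissertation's code):

    import numpy as np

    def myopic_voi(belief, cost, obs_model):
        # belief: P(s); cost[a, s]: cost of action a in state s
        # obs_model[s, o]: P(o | s) for the candidate inspection
        prior_cost = min(cost[a] @ belief for a in range(cost.shape[0]))
        p_o = belief @ obs_model                      # marginal P(o)
        expected_post_cost = 0.0
        for o in range(obs_model.shape[1]):
            if p_o[o] == 0.0:
                continue
            posterior = belief * obs_model[:, o] / p_o[o]   # Bayes update
            expected_post_cost += p_o[o] * min(
                cost[a] @ posterior for a in range(cost.shape[0]))
        return prior_cost - expected_post_cost        # expected cost saved

The quantity is never negative, and in a sequential setting it is only a heuristic, which is exactly why the dissertation resorts to tractable approximations for long-term scheduling.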
8

Pinheiro, Paulo Gurgel 1983. "Planning for mobile robot localization using architectural design features on a hierarchical POMDP approach = Planejamento para localização de robôs móveis utilizando padrões arquitetônicos em um modelo hierárquico de POMDP." [s.n.], 2013. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275601.

Abstract:
Advisor: Jacques Wainer
Doctoral thesis, Universidade Estadual de Campinas, Instituto de Computação
Mobile robot localization is one of the most explored areas in robotics due to its importance for solving problems such as navigation, mapping and SLAM. In this work, we are interested in solving global localization problems, where the initial pose of the robot is completely unknown. Several works have proposed solutions for localization focusing on robot cooperation, communication or environment exploration, where the robot's pose is often found through a certain number of random or belief-oriented actions. In order to decrease the total number of steps performed, we introduce a model of planning for localization, using POMDPs and Markov localization, that indicates the optimal action to be taken by the robot at each decision time. Our focus is on i) hard localization problems, where there are no special landmarks or extra features in the environment to help the robot; ii) critical performance situations, where the robot must avoid random actions and the waste of energy roaming over the environment; and iii) multiple-mission situations. Since the robot is designed to perform missions, we propose a model that runs missions and the localization process simultaneously. Also, since the robot can have different missions, the model computes the planning for localization as an offline process, but loads the missions at runtime. Planning for multiple environments is a challenge due to the number of states we must consider. Thus, we also propose a solution to compress the original map, creating a smaller topological representation on which plans are easier and cheaper to compute. The map compression takes advantage of the similarity of rooms found especially in office and residential environments: similar rooms have similar architectural design features that can be shared. To deal with the compressed map, we propose a hierarchical approach that uses light POMDP plans and the compressed map on the higher layer to find the gross pose, and decomposed maps on the lower layer to find the precise pose. We demonstrate the hierarchical approach with the map compression using both the V-REP simulator and a Pioneer 3-DX robot. Compared to other active localization models, the results show that our approach allowed the robot to perform both localization and the mission in a multiple-room environment with a significant reduction in the number of steps while keeping the pose accuracy.
Doctorate in Computer Science.
9

Saldaña, Gadea Santiago Jesús. "The effectiveness of social plan sharing in online planning in POMDP-type domains." Winston-Salem, NC : Wake Forest University, 2009. http://dspace.zsr.wfu.edu/jspui/handle/10339/44699.

Abstract:
Thesis (M.S.), Wake Forest University, Dept. of Computer Science, 2009. Thesis advisor: William H. Turkett Jr. Includes bibliographical references (p. 47-48).
10

Bravo, Raissa Zurli Bittencourt. "The Use of UAVs in Humanitarian Relief: A POMDP Based Methodology for Finding Victims." Pontifícia Universidade Católica do Rio de Janeiro, 2016. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=30364@1.

Abstract:
The use of Unmanned Aerial Vehicles (UAVs) in humanitarian relief has been proposed by researchers for locating victims in disaster-affected areas. The urgency of this type of operation is to find the affected people as soon as possible, which means that determining the optimal flight path for the UAVs is critical to saving lives. Since the UAVs have to sweep the entire affected area to find victims, path planning becomes equivalent to an area coverage problem. In this study, a methodology for solving the coverage problem is proposed, based on a Partially Observable Markov Decision Process (POMDP) heuristic that takes into account the observations made by the UAVs; the heuristic chooses actions based on the information available, namely the previous actions and observations. The formulation of the UAV path planning is based on the idea of assigning higher priorities to the areas most likely to contain victims. To apply this technique to real cases, a four-step methodology was created. First, the problem is modelled in terms of the affected area, the type of drone, the camera resolution, the average flight altitude, the take-off point, and the size and priority of the states. Next, to test the efficiency of the algorithm through simulation, groups of victims are distributed over the area to be overflown. The algorithm is then run and, at each iteration, the drone changes state according to the POMDP heuristic until the entire affected area has been covered. Finally, the efficiency of the algorithm is assessed through four statistics: distance travelled, operation time, coverage percentage, and time to find groups of victims. The methodology was applied to two illustrative examples: a tornado in Xanxerê, Brazil, a rapid-onset disaster in April 2015, and a refugee camp in South Sudan, a slow-onset disaster that started in 2013. The simulations demonstrated that the solution covers the entire disaster-affected area in a reasonable time span. The distance travelled by the UAV and the duration of the operation, which depend on the number of states, showed no significant standard deviation across simulations: even though tied priorities allow many possible paths, the algorithm produces homogeneous results. The time to find groups of victims, and hence the success of the rescue operation, depends on the state priorities defined by a specialist; if the priorities are poorly defined, the UAV will fly over areas without victims and the rescue operation will fail to save lives as quickly as possible. The proposed algorithm was also compared with a greedy method. At first the greedy method did not cover 100 per cent of the affected area, which made the comparison unfair; when forced to cover the full area, the results show that the POMDP approach is faster at finding victims, while its travelled distance and operation time are equal or better. This is because the greedy algorithm is biased towards optimising travelled distance, and hence operation time, whereas the POMDP heuristic aims at saving lives and does so dynamically, updating its probability distribution after each observation. The novelty of this methodology is highlighted in chapter 3, where more than 139 works were reviewed and classified to show the applications of drones in humanitarian logistics, how POMDPs are used with drones, and how simulation is used in humanitarian logistics; only one article proposes the use… Future research indicates a practical application of the proposed methodology.
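A hedged sketch of the priority-driven cell selection at the heart of such a coverage heuristic, where the UAV repeatedly moves to the highest-priority unvisited cell, breaking ties by distance (a simplification of the POMDP heuristic; all names are invented for illustration):

    def manhattan(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    def next_cell(current, cells, priority, visited):
        # highest priority first; among ties, prefer the nearest cell
        # (assumes at least one unvisited cell remains)
        candidates = [c for c in cells if c not in visited]
        return max(candidates,
                   key=lambda c: (priority[c], -manhattan(current, c)))

In the full method the priorities are not static: each observation updates the probability that a cell still contains victims, which is what the POMDP belief update provides.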
11

Corona, Gabriel. "Utilisation de croyances heuristiques pour la planification multi-agent dans le cadre des Dec-POMDP." Phd thesis, Université Henri Poincaré - Nancy I, 2011. http://tel.archives-ouvertes.fr/tel-00598689.

Abstract:
In this thesis, we are interested in planning for decentralised sequential decision-making problems under uncertainty. In the centralised setting, the MDP and POMDP formalisms have enabled efficient planning techniques. The Dec-POMDP framework formalises the decentralised problems. This type of problem belongs to a different complexity class than centralised problems; for this reason, until recently, only very small problems could be solved, and only for very short horizons. Heuristic algorithms have recently been proposed to handle larger problems, but they come with no theoretical guarantee on solution quality. We show how heuristic information about the problem to be solved, represented as a probability distribution over centralised beliefs, can guide the approximate search for a policy. This heuristic information allows each planning step to be formulated as a combinatorial optimisation problem. This formulation leads to policies of better quality than existing approaches.
12

Corona, Gabriel. "Utilisation de croyances heuristiques pour la planification multi-agent dans le cadre des Dec-POMDP." Electronic Thesis or Diss., Nancy 1, 2011. http://www.theses.fr/2011NAN10026.

Abstract:
In this thesis, we focus on planning for decentralised sequential decision-making under uncertainty. In the centralised case, the MDP and POMDP frameworks lead to efficient planning algorithms. The Dec-POMDP framework is used to model decentralised problems. This kind of problem is in a higher complexity class than the centralised one; for this reason, until recently, only very small problems could be solved, and only for very short horizons. Recently, heuristic algorithms have been proposed to handle problems of larger size, but there is no theoretical proof of solution quality. In this thesis, we show how to use heuristic information about the problem, modelled as a probability distribution over the centralised beliefs, to guide the search for a good approximate policy. Using this heuristic information, we formulate each time step of the planning procedure as a combinatorial optimisation problem. This formulation leads to policies of better quality than previously existing approaches.
13

Morere, Philippe. "Bayesian Optimisation for Planning And Reinforcement Learning." Thesis, The University of Sydney, 2019. https://hdl.handle.net/2123/21230.

Abstract:
This thesis addresses the problem of achieving efficient non-myopic decision making by explicitly balancing exploration and exploitation. Decision making, both in planning and reinforcement learning (RL), enables agents or robots to complete tasks by acting on their environments. Complexity arises when completing objectives requires sacrificing short-term performance in order to achieve better long-term performance. Decision making algorithms with this characteristic are known as non-myopic, and require long sequences of actions to be evaluated, thereby greatly increasing the size of the search space. Optimal behaviours need to balance two key quantities: exploration and exploitation. Exploitation takes advantage of previously acquired information or high-performing solutions, whereas exploration focuses on acquiring more informative data. The balance between these quantities is crucial in both RL and planning. This thesis brings the following contributions. Firstly, a reward function trading off exploration and exploitation of gradients for sequential planning is proposed; it is based on Bayesian optimisation (BO) and is combined with a non-myopic planner to achieve efficient spatial monitoring. Secondly, the algorithm is extended to continuous action spaces, called continuous belief tree search (CBTS), and uses BO to dynamically sample actions within a tree search, balancing high-performing actions and novelty. Finally, the framework is extended to RL, for which a multi-objective methodology for explicit exploration and exploitation balance is proposed. The two objectives are modelled explicitly and balanced at a policy level, as in BO. This allows for online exploration strategies, as well as a data-efficient model-free RL algorithm that achieves exploration by minimising the uncertainty of Q-values (EMU-Q). The proposed algorithms are evaluated on different simulated and real-world robotics problems, displaying superior performance in terms of sample efficiency and exploration.
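In BO, the exploration-exploitation trade-off this thesis builds on is usually made explicit through an acquisition function scored on a Gaussian-process posterior. A minimal upper-confidence-bound sketch (generic textbook form, not the thesis's reward function; gp_posterior is an assumed callable):

    import numpy as np

    def ucb(mu, sigma, kappa=2.0):
        # exploit the posterior mean, explore the posterior uncertainty
        return mu + kappa * sigma

    def pick_action(candidates, gp_posterior):
        # gp_posterior: assumed to return (mean, std) arrays over candidates
        mu, sigma = gp_posterior(candidates)
        return candidates[int(np.argmax(ucb(mu, sigma)))]

Larger kappa favours exploration; kappa = 0 collapses to pure exploitation, which is the balance the thesis lifts to the policy level.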
14

Marchant, Matus Roman. "Bayesian Optimisation for Planning in Dynamic Environments." Thesis, The University of Sydney, 2015. http://hdl.handle.net/2123/14497.

Abstract:
This thesis addresses the problem of trajectory planning for monitoring extreme values of an environmental phenomenon that changes in space and time. The most relevant case study corresponds to environmental monitoring using an autonomous mobile robot for air, water and land pollution monitoring. Since the dynamics of the phenomenon are initially unknown, the planning algorithm needs to satisfy two objectives simultaneously: 1) learn and predict spatio-temporal patterns, and 2) find areas of interest (e.g. high pollution), addressing the exploration-exploitation trade-off. Consequently, the thesis brings the following contributions. Firstly, it applies and formulates Bayesian optimisation (BO) for planning in robotics. By maintaining a Gaussian process (GP) model of the environmental phenomenon, the planning algorithms are able to learn the spatial and temporal patterns. A new family of acquisition functions which consider the position of the robot is proposed, allowing efficient trajectory planning. Secondly, BO is generalised for optimisation over continuous paths, not only determining where and when to sample, but also how to get there. Under these new circumstances, the optimisation of the acquisition function at each iteration of the BO algorithm becomes costly, so a second layer of BO is included in order to effectively reduce the number of iterations. Finally, this thesis presents sequential Bayesian optimisation (SBO), a generalisation of the plain BO algorithm with the goal of achieving non-myopic trajectory planning. SBO is formulated under a partially observable Markov decision process (POMDP) framework, which can find the optimal decision for a sequence of actions with their respective outcomes. An online solution of the POMDP based on Monte Carlo tree search (MCTS) allows an efficient search for the optimal action with multi-step lookahead. The proposed planning algorithms are evaluated under different scenarios. Experiments on large-scale ozone pollution monitoring and indoor light intensity monitoring are conducted for simulated and real robots. The results show the advantages of planning over continuous paths and also demonstrate the benefit of deeper search strategies using SBO.
15

Aberdeen, Douglas Alexander. "Policy-Gradient Algorithms for Partially Observable Markov Decision Processes." The Australian National University, Research School of Information Sciences and Engineering, 2003. http://thesis.anu.edu.au./public/adt-ANU20030410.111006.

Abstract:
Partially observable Markov decision processes are interesting because of their ability to model most conceivable real-world learning problems, for example, robot navigation, driving a car, speech recognition, stock trading, and playing games. The downside of this generality is that exact algorithms are computationally intractable. Such computational complexity motivates approximate approaches. One such class of algorithms are the so-called policy-gradient methods from reinforcement learning. They seek to adjust the parameters of an agent in the direction that maximises the long-term average of a reward signal. Policy-gradient methods are attractive as a scalable approach for controlling partially observable Markov decision processes (POMDPs).

In the most general case POMDP policies require some form of internal state, or memory, in order to act optimally. Policy-gradient methods have shown promise for problems admitting memory-less policies but have been less successful when memory is required. This thesis develops several improved algorithms for learning policies with memory in an infinite-horizon setting: directly, when the dynamics of the world are known, and via Monte-Carlo methods otherwise. The algorithms simultaneously learn how to act and what to remember.

Monte-Carlo policy-gradient approaches tend to produce gradient estimates with high variance. Two novel methods for reducing variance are introduced. The first uses high-order filters to replace the eligibility trace of the gradient estimator. The second uses a low-variance value-function method to learn a subset of the parameters and a policy-gradient method to learn the remainder.

The algorithms are applied to large domains including a simulated robot navigation scenario, a multi-agent scenario with 21,000 states, and the complex real-world task of large vocabulary continuous speech recognition. To the best of the author's knowledge, no other policy-gradient algorithms have performed well at such tasks.

The high variance of Monte-Carlo methods requires lengthy simulation and hence a super-computer to train agents within a reasonable time. The ANU "Bunyip" Linux cluster was built with such tasks in mind. It was used for several of the experimental results presented here. One chapter of this thesis describes an application written for the Bunyip cluster that won the international Gordon-Bell prize for price/performance in 2001.
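For context, the simplest member of the policy-gradient family this thesis improves on is the Monte-Carlo REINFORCE estimator, whose high variance is exactly the problem targeted above. A minimal memory-less sketch (our toy illustration; the thesis's algorithms add internal state and variance reduction):

    import numpy as np

    def reinforce_gradient(episodes, grad_log_pi, baseline=0.0):
        # episodes: list of (observations, actions, total_return) tuples
        # grad_log_pi(o, a): gradient of log pi(a | o) w.r.t. the parameters
        grads = []
        for obs_seq, act_seq, ret in episodes:
            g = sum(grad_log_pi(o, a) for o, a in zip(obs_seq, act_seq))
            grads.append(g * (ret - baseline))
        return np.mean(grads, axis=0)   # ascend this direction to improve J

The (ret - baseline) factor and the eligibility-trace replacement discussed in the abstract are both devices for taming the variance of exactly this estimate.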
16

Ferrari, Fabio Valerio. "Cooperative POMDPs for human-robot joint activities." Thesis, Normandie, 2017. http://www.theses.fr/2017NORMC257/document.

Abstract:
This thesis presents a novel method for ensuring cooperation between humans and robots in public spaces, under the constraint of human behavior uncertainty. The thesis introduces a hierarchical and flexible framework based on POMDPs. The framework partitions the overall joint activity into independent planning modules, each dealing with a specific aspect of the joint activity: either ensuring the human-robot cooperation, or proceeding with the task to achieve. The cooperation part can be solved independently from the task and executed as a finite state machine in order to limit the online planning effort. To do so, we introduce a belief shift function and describe how to use it to transform a POMDP policy into an executable finite state machine. The developed framework has been implemented in a real application scenario as part of the COACHES project. The thesis describes the Escort mission used as a testbed application and the details of the implementation on the real robots. This scenario was also used to carry out several experiments and to evaluate our contributions.
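A sketch of what executing a POMDP policy as a finite state machine can look like, with one stored action per machine node and transitions indexed by observation (our reading of the standard policy-graph idea; the belief shift function itself is specific to the thesis, and all names below are invented):

    def run_policy_fsm(nodes, start, act, get_observation, horizon=100):
        # nodes: {node_id: (action, {observation: next_node_id})}
        node = start
        for _ in range(horizon):
            action, transitions = nodes[node]
            act(action)                          # execute on the robot
            obs = get_observation()
            node = transitions.get(obs, start)   # unmodeled obs: fall back

Compared to full online POMDP planning, each step is a dictionary lookup, which is what makes the cooperation layer cheap to execute on the robot.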
17

Pinault, Florian. "Apprentissage par renforcement pour la généralisation des approches automatiques dans la conception des systèmes de dialogue oral." Phd thesis, Université d'Avignon, 2011. http://tel.archives-ouvertes.fr/tel-00933937.

Abstract:
The human-machine dialogue systems currently used in industry are strongly limited by a very rigid form of communication that forces the user to follow the logic of the system's designer. This limitation is partly due to the representation of the dialogue state as predefined forms. To address this difficulty, we propose using a semantic representation with a richer and more flexible structure, aiming to let the user formulate requests freely. A second difficulty that greatly handicaps dialogue systems is the high error rate of the speech recognition system. To treat these errors quantitatively, the goal of planning dialogue strategies in an uncertain environment has led to the use of reinforcement learning methods such as partially observable Markov decision processes (POMDPs). A drawback of the POMDP paradigm, however, is its excessive algorithmic complexity. Some recent proposals reduce the complexity of the model, but they use a form-based representation and cannot be applied directly to the rich semantic representation we propose to use. In order to apply the POMDP model in a system whose semantic model is complex, we propose a new way of controlling its complexity by introducing a new paradigm: the summary POMDP with double belief tracking. In our proposal, the complex master POMDP is transformed into a simpler summary POMDP. A first belief update is performed in the master space (integrating probabilistic observations in the form of n-best lists), and a second belief update is performed in the summary space, so the resulting strategies are optimised on a true POMDP. We propose two methods for defining the projection of the master POMDP onto a summary POMDP: manual rules, and automatic grouping by k nearest neighbours. For the latter, we propose using the graph edit distance, which we generalise to obtain a distance between n-best lists. Moreover, coupling a summary system based on a statistical POMDP model with an expert system based on ad hoc rules provides better control over the final strategy; this lack of control is one of the weaknesses preventing the adoption of POMDPs for dialogue in industry. In the domain of tourist information and hotel room booking, results on simulated dialogues show the effectiveness of the reinforcement approach combined with a rule-based system for adapting to a noisy environment. Real tests with human users show that a system optimised by reinforcement nevertheless obtains better performance on the criterion for which it was optimised.
18

Habachi, Oussama. "Optimisation des Systèmes Partiellement Observables dans les Réseaux Sans-fil : Théorie des jeux, Auto-adaptation et Apprentissage." Phd thesis, Université d'Avignon, 2012. http://tel.archives-ouvertes.fr/tel-00799903.

Abstract:
The last decade has seen the emergence of the Internet and of multimedia applications that require ever more bandwidth, along with users who demand a better quality of service. With this in mind, a great deal of work has been done to improve the use of the wireless spectrum. The subject of my PhD thesis is the application of game theory, queueing theory and learning in wireless networks, particularly in partially observable environments. We consider different layers of the OSI model. First, we study opportunistic access to the wireless spectrum at the MAC layer using cognitive radio (CR) technology. We then focus on congestion control at the transport layer, and we develop congestion control mechanisms for the TCP protocol.
19

Vanegas, Alvarez Fernando. "Uncertainty based online planning for UAV missions in GPS-denied and cluttered environments." Thesis, Queensland University of Technology, 2017. https://eprints.qut.edu.au/103846/1/Fernando_Vanegas%20Alvarez_Thesis.pdf.

Abstract:
This research is a novel approach to enabling Unmanned Aerial Vehicle (UAV) navigation and target finding and tracking missions under uncertainty in cluttered and GPS-denied environments. A novel framework, implemented as a modular system, formulates the missions as online Partially Observable Markov Decision Processes (POMDPs). The online POMDP computes a motion policy that optimally balances multiple mission objectives; the motion policy is updated in flight based on onboard sensor observations. This research provides an enabling technology for UAV missions such as search and rescue, biodiversity assessment, underground mining and infrastructure inspection in challenging and natural environments.
20

Raiss, El Fenni Mohammed. "Opportunistic spectrum usage and optimal control in heterogeneous wireless networks." Phd thesis, Université d'Avignon, 2012. http://tel.archives-ouvertes.fr/tel-00907120.

Abstract:
The present dissertation deals with how to use the precious wireless resources that are usually wasted by the under-utilization of networks. We have been particularly interested in all resources that can be used in an opportunistic fashion using different technologies. We have designed new schemes for a better and more efficient use of wireless systems by providing mathematical frameworks. In the first part, we have been interested in cognitive radio networks, where a cellular service provider can lease a part of its resources to secondary users or virtual providers. In the second part, we have chosen delay-tolerant networks as a solution to reduce the pressure on the cell traffic, where mobile users come to use the available resources effectively and at a lower cost; we have focused on the optimal strategy for smartphones in hybrid wireless networks. In the last part, an alternative to delay-tolerant networks, especially in regions that are not covered by the cellular network, is to use ad hoc networks, which can serve as an extension of the coverage area. We have developed a new analytical model of the IEEE 802.11e DCF/EDCF, and we have investigated the intricate interactions among layers by building a general cross-layered framework to represent multi-hop ad hoc networks with asymmetric topology and traffic.
21

Zhang, Zhao. "Learning Path Recommendation : A Sequential Decision Process." Electronic Thesis or Diss., Université de Lorraine, 2022. http://www.theses.fr/2022LORR0108.

Abstract:
Over the past couple of decades, there has been an increasing adoption of Internet technology in the e-learning domain, associated with the availability of an increasing number of educational resources. Effective systems are thus needed to help learners find useful and adequate resources, among which recommender systems play an important role. In particular, learning path recommender systems, which recommend sequences of educational resources, are highly valuable for improving learners' learning experiences. In this context, this PhD thesis focuses on the field of learning path recommender systems and the associated offline evaluation of these systems. This PhD thesis views the learning path recommendation task as a sequential decision problem and considers the partially observable Markov decision process (POMDP) an adequate approach. In the field of education, the learners' memory strength is a very important factor; several models of learners' memory strength have been proposed in the literature and used to promote review in recommendations. However, little work has been conducted on POMDP-based recommendation, and the models proposed are complex and data-intensive. This PhD thesis proposes POMDP-based recommendation models that manage learners' memory strength while limiting the increase in complexity and data required. While recommending useful and effective learning paths is becoming more and more popular, evaluating the effectiveness of these recommended learning paths is still a challenging task that is seldom addressed in the literature. Online evaluation is highly popular, but it relies on making path recommendations to actual learners, which may have dramatic implications if the recommendations are not accurate. Offline evaluation relies on static datasets of learners' learning activities and simulates learning path recommendations. Although easier to run, it is difficult to accurately evaluate the effectiveness of a learning path recommendation offline, which tends to explain the lack of literature on this topic. To tackle this issue, this PhD thesis also proposes offline evaluation measures designed to be simple to use in most application cases. The recommendation models and evaluation measures that we propose are evaluated on two real learning datasets. The experiments confirm that the proposed recommendation models outperform the models from the literature, with a limited increase in complexity, including for a medium-size dataset.
22

Ponzoni, Carvalho Chanel Caroline. "Planification de perception et de mission en environnement incertain : Application à la détection et à la reconnaissance de cibles par un hélicoptère autonome." Thesis, Toulouse, ISAE, 2013. http://www.theses.fr/2013ESAE0011/document.

Abstract:
Mobile and aerial robots are faced with the need to plan actions with incomplete information about the state of the world. In this context, this thesis proposes a modeling and resolution framework for perception and mission planning problems where an autonomous helicopter must detect and recognize targets in an uncertain and partially observable environment. We based our work on Partially Observable Markov Decision Processes (POMDPs), because they provide a general optimization framework for perception and decision tasks over a long-term horizon. Special attention is given to the outputs of the image processing algorithm in order to model its uncertain behavior as a probabilistic observation function. A critical study of the POMDP model and its optimization criterion is also conducted. In order to respect the safety constraints of aerial robots, we then propose an approach to properly handle action feasibility constraints in partially observable domains: the AC-POMDP model, which distinguishes between the verification of environmental properties and the information about targets' nature. Furthermore, we propose a framework to optimize and execute POMDP policies in parallel under time constraints. This framework is based on anticipated and probabilistic optimization of future execution states of the system. Finally, we embedded this algorithmic framework on board Onera's autonomous helicopters and performed real flight experiments for multi-target detection and recognition missions.
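To make concrete the idea of modelling image-processing outputs as a probabilistic observation function, the following sketch (our own illustration, with made-up numbers) treats the classifier's confusion statistics as P(observation | true state) and applies a Bayes update to the belief over target types.

```python
import numpy as np

# Hypothetical states (true target type) and classifier outputs.
states = ["car", "truck", "no_target"]
# Assumed confusion matrix: row = true state, column = P(classifier output | state).
O = np.array([
    [0.70, 0.20, 0.10],   # true car
    [0.25, 0.60, 0.15],   # true truck
    [0.10, 0.10, 0.80],   # no target
])

def belief_update(belief: np.ndarray, obs_index: int) -> np.ndarray:
    """Bayes update of the belief given one classifier output."""
    posterior = belief * O[:, obs_index]
    return posterior / posterior.sum()

b = np.array([1/3, 1/3, 1/3])        # uniform prior over target types
b = belief_update(b, obs_index=0)    # classifier reported "car"
print(dict(zip(states, b.round(3))))
```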
APA, Harvard, Vancouver, ISO, and other styles
23

Olafsson, Björgvin. "Partially Observable Markov Decision Processes for Faster Object Recognition." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-198632.

Full text
Abstract:
Object recognition in the real world is a big challenge in the field of computer vision. Given the potentially enormous size of the search space, it is essential to make intelligent decisions about where in the visual field to gather information, in order to reduce the computational resources needed. In this report a POMDP (Partially Observable Markov Decision Process) learning framework, using a policy gradient method and information rewards as a training signal, has been implemented and used to train fixation policies that aim to maximize the information gathered in each fixation. The purpose of such policies is to make object recognition faster by reducing the number of fixations needed. The trained policies are evaluated by simulation and compared with several fixed policies. Finally, it is shown that the framework can be used to train policies that outperform the fixed policies for certain observation models.
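A minimal sketch of this approach, under our own toy assumptions (two fixation locations, two object hypotheses, invented observation likelihoods): a softmax policy is trained with a REINFORCE-style gradient whose reward is the information gained, i.e. the entropy reduction, of each fixation.

```python
import numpy as np

rng = np.random.default_rng(0)

def entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

theta = np.zeros(2)             # softmax policy over 2 fixation locations
belief = np.array([0.5, 0.5])   # prior over 2 object hypotheses (reset each step)

# Assumed observation likelihoods P(o | object, fixation); fixation 1 is
# the more informative one about the object's identity.
LIK = [np.array([[0.55, 0.45], [0.45, 0.55]]),
       np.array([[0.90, 0.10], [0.10, 0.90]])]

for step in range(2000):
    probs = np.exp(theta) / np.exp(theta).sum()
    a = rng.choice(2, p=probs)                # pick a fixation
    s = rng.choice(2, p=belief)               # latent object identity
    o = rng.choice(2, p=LIK[a][s])            # noisy observation
    post = belief * LIK[a][:, o]
    post /= post.sum()
    reward = entropy(belief) - entropy(post)  # information gained
    grad = (np.eye(2)[a] - probs) * reward    # REINFORCE log-policy gradient
    theta += 0.1 * grad

print(probs.round(2))  # the policy learns to favour the informative fixation
```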
APA, Harvard, Vancouver, ISO, and other styles
24

Hudson, Joshua. "A Partially Observable Markov Decision Process for Breast Cancer Screening." Thesis, Linköpings universitet, Statistik och maskininlärning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-154437.

Full text
Abstract:
In the US, breast cancer is one of the most common forms of cancer and the most lethal. There are many decisions that must be made by the doctor and/or the patient when dealing with a potential breast cancer. Many of these decisions are made under uncertainty, whether it is the uncertainty related to the progression of the patient's health, or that related to the accuracy of the doctor's tests. Each possible action under consideration can have positive effects, such as a surgery successfully removing a tumour, and negative effects: a post-surgery infection for example. The human mind simply cannot take into account all the variables involved and possible outcomes when making these decisions. In this report, a detailed Partially Observable Markov Decision Process (POMDP) for breast cancer screening decisions is presented. It includes 151 states, covering 144 different cancer states, and 2 competing screening methods. The necessary parameters were first set up using relevant medical literature and a patient history simulator. Then the POMDP was solved optimally for an infinite horizon, using the Perseus algorithm. The resulting policy provided several recommendations for breast cancer screening. The results indicated that clinical breast examinations are important for screening younger women. Regarding the decision to operate on a woman with breast cancer, the policy showed that invasive cancers with either a tumour size above 1.5 cm or which are in metastasis, should be surgically removed as soon as possible. However, the policy also recommended that patients who are certain to be healthy should have a breast biopsy. The cause of this error was explored further and the conclusion was reached that a finite horizon may be more appropriate for this application.
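For readers unfamiliar with the mechanics, a POMDP screening policy acts on a belief over health states that is revised after every test. The sketch below shows one such belief revision with invented progression and test-accuracy numbers; it is a toy, not the 151-state model of the thesis.

```python
import numpy as np

# Toy disease-progression POMDP (our own numbers, not the thesis's model).
# States: healthy, in-situ cancer, invasive cancer.
T = np.array([[0.98, 0.02, 0.00],   # yearly progression dynamics
              [0.00, 0.90, 0.10],
              [0.00, 0.00, 1.00]])
# Observation model of a screening test: P(positive result | state).
p_pos = np.array([0.05, 0.60, 0.90])

def screen(belief: np.ndarray, positive: bool) -> np.ndarray:
    """One year of progression, then a Bayes update on the test result."""
    predicted = belief @ T
    likelihood = p_pos if positive else 1 - p_pos
    post = predicted * likelihood
    return post / post.sum()

b = np.array([0.95, 0.04, 0.01])
print(screen(b, positive=True).round(3))  # a positive test shifts the belief
```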
APA, Harvard, Vancouver, ISO, and other styles
25

Dutech, Alain. "Apprentissage par Renforcement : Au delà des Processus Décisionnels de Markov (Vers la cognition incarnée)." Habilitation à diriger des recherches, Université Nancy II, 2010. http://tel.archives-ouvertes.fr/tel-00549108.

Full text
Abstract:
This document presents my "research project" on the theme of embodiment ("embodied cognition") at the crossroads of cognitive science, artificial intelligence and robotics. More precisely, I show how I intend to explore the way in which an agent, artificial or biological, builds useful and relevant representations of its environment. I first position my work by spelling out the concepts of embodiment and reinforcement learning. I dwell in particular on the problem of reinforcement learning for non-Markovian tasks, a problem common to the various research projects I have carried out over the past thirteen years in single-agent, multi-agent and robotic settings. The analysis of this work and of the state of the art of the field strengthens my conviction that the main difficulty for the agent is indeed that of finding adapted, useful and relevant representations. I argue that we face a fundamental problem of cognition, intimately linked to the problems of "symbol grounding", the "frame problem" and "being in situation", and that answers can only be provided within the framework of embodiment. Building on this observation, in a final part I present the directions and approaches I will follow to pursue this work, developing robotic learning techniques that are incremental, holistic and motivational.
APA, Harvard, Vancouver, ISO, and other styles
26

Pradhan, Neil. "Deep Reinforcement Learning for Autonomous Highway Driving Scenario." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-289444.

Full text
Abstract:
We present an autonomous driving agent in a simulated highway driving scenario with vehicles such as cars and trucks moving with stochastically variable velocity profiles. The focus of the simulated environment is to test tactical decision making in highway driving scenarios. When an agent (vehicle) maintains an optimal range of velocity, it is beneficial both in terms of energy efficiency and a greener environment. In order to maintain an optimal range of velocity, this thesis proposes two novel reward structures: (a) a Gaussian reward structure and (b) an exponential rise and fall reward structure. Two deep reinforcement learning agents were trained, one per reward structure, to study their differences and evaluate their performance based on a set of parameters that are most relevant in highway driving scenarios. The algorithm implemented in this thesis is a double dueling deep Q-network with a prioritized experience replay buffer. Experiments were performed by adding noise to the inputs, simulating a Partially Observable Markov Decision Process, in order to compare the reliability of the different reward structures. A velocity occupancy grid was found to be better than a binary occupancy grid as input for the algorithm. Furthermore, a methodology for generating fuel-efficient policies is discussed and demonstrated with an example.
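The two proposed reward structures can be sketched as follows; the optimal velocity and shape parameters here are illustrative assumptions, not the values used in the thesis.

```python
import math

V_OPT = 25.0  # assumed optimal velocity (m/s); the value is illustrative only

def gaussian_reward(v: float, sigma: float = 5.0) -> float:
    """Peaks at the optimal velocity and falls off symmetrically."""
    return math.exp(-((v - V_OPT) ** 2) / (2 * sigma ** 2))

def rise_fall_reward(v: float, k_rise: float = 0.3, k_fall: float = 0.6) -> float:
    """Exponential rise towards the optimum, steeper exponential fall above it."""
    if v <= V_OPT:
        return math.exp(-k_rise * (V_OPT - v))
    return math.exp(-k_fall * (v - V_OPT))

for v in (15.0, 25.0, 30.0):
    print(v, round(gaussian_reward(v), 3), round(rise_fall_reward(v), 3))
```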
APA, Harvard, Vancouver, ISO, and other styles
27

Drougard, Nicolas. "Exploiting imprecise information sources in sequential decision making problems under uncertainty." Thesis, Toulouse, ISAE, 2015. http://www.theses.fr/2015ESAE0037/document.

Full text
Abstract:
Partially Observable Markov Decision Processes (POMDPs) define a useful formalism to express probabilistic sequential decision problems under uncertainty. When this model is used for a robotic mission, the system is defined as the features of the robot and its environment needed to express the mission. The system state is not directly seen by the agent (the robot). Solving a POMDP thus consists in computing a strategy which, on average, best achieves the mission, i.e. a function mapping the information known by the agent to an action. Some practical issues of the POMDP model are first highlighted in the robotic context: they concern the modeling of the agent's ignorance, the imprecision of the observation model and the complexity of solving real-world problems. A counterpart of the POMDP model, called pi-POMDP, simplifies uncertainty representation with a qualitative evaluation of event plausibilities. It comes from Qualitative Possibility Theory, which provides the means to model imprecision and ignorance. After a formal presentation of the POMDP and pi-POMDP models, an update of the possibilistic model is proposed. Next, the study of factored pi-POMDPs allows us to set up an algorithm named PPUDD which uses Algebraic Decision Diagrams to solve large structured planning problems. Strategies computed by PPUDD, which were tested in the IPPC 2014 competition, can be more efficient than those produced by probabilistic solvers when the model is imprecise or for high-dimensional problems. This thesis proposes ways of using Qualitative Possibility Theory to improve computation time and uncertainty modeling in practice.
APA, Harvard, Vancouver, ISO, and other styles
28

Araya-López, Mauricio. "Des algorithmes presque optimaux pour les problèmes de décision séquentielle à des fins de collecte d'information." Phd thesis, Université de Lorraine, 2013. http://tel.archives-ouvertes.fr/tel-00943513.

Full text
Abstract:
The MDP formalism, like its variants, is typically used to control the state of a system through an agent and its policy. When the agent faces incomplete information, its policy may perform actions to acquire information, typically (1) in the case of partial observability, or (2) in the case of reinforcement learning. However, this information is only a means for better controlling the state of the system, so that information gathering is merely a consequence of maximizing the expected performance. This thesis is instead concerned with sequential decision-making problems in which acquiring information is an end in itself. More precisely, it first investigates how to modify the POMDP formalism to express information-gathering problems, and proposes algorithms to solve these problems. This approach is then extended to reinforcement learning tasks consisting in actively learning the model of a system. In addition, this thesis proposes a new Bayesian reinforcement learning algorithm which uses optimistic local transitions to gather information efficiently while optimizing the expected performance. Through an analysis of the existing literature, theoretical results and empirical studies, this thesis demonstrates that these problems can in theory be solved optimally, that the proposed methods are near-optimal, and that they give results comparable to or better than reference approaches. Beyond these concrete results, this thesis opens the way (1) to a better understanding of the relationship between information gathering and optimal policies in sequential decision-making processes, and (2) to an extension of the very large body of work on controlling the state of a system to information-gathering problems.
APA, Harvard, Vancouver, ISO, and other styles
29

Pokharel, Gaurab. "Increasing the Value of Information During Planning in Uncertain Environments." Oberlin College Honors Theses / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=oberlin1624976272271825.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Ibrahim, Rita. "Utilisation des communications Device-to-Device pour améliorer l'efficacité des réseaux cellulaires." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLC002/document.

Full text
Abstract:
This thesis considers Device-to-Device (D2D) communications as a promising technique for enhancing future cellular networks. Modeling, evaluating and optimizing D2D features are the fundamental goals of this thesis, and they are mainly achieved using the following mathematical tools: queuing theory, Lyapunov optimization and Partially Observable Markov Decision Processes (POMDPs). The findings of this study are presented in three parts. In the first part, we investigate a D2D mode selection scheme. We derive the queuing stability regions of two scenarios: pure cellular networks and D2D-enabled cellular networks. Comparing both scenarios leads us to elaborate a D2D vs. cellular mode selection design that improves the capacity of the network. In the second part, we develop a D2D resource allocation algorithm. We observe that D2D users are able to estimate their local Channel State Information (CSI), while the base station needs some signaling exchange to acquire this information. Based on the D2D users' knowledge of their local CSI, we provide an energy-efficient resource allocation framework that shows how distributed scheduling outperforms centralized scheduling. In the distributed approach, collisions may occur between the different CSI reports; thus, we propose a collision reduction algorithm. Moreover, we give a detailed description of how both the centralized and distributed algorithms can be implemented in practice. In the third part, we propose a mobile relay selection policy in a D2D relay-aided network. The relays' mobility appears as a crucial challenge for defining the strategy of selecting the optimal D2D relays. The problem is formulated as a constrained POMDP which captures the dynamism of the relays and aims to find the optimal relay selection policy that maximizes the performance of the network under cost constraints.
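As background on the Lyapunov optimization tool mentioned above, the fragment below implements the generic textbook drift-plus-penalty rule on a single queue: each step picks the service rate minimizing V·cost(rate) − Q·rate, trading energy against backlog. This is a generic sketch under our own toy assumptions, not the thesis's formulation.

```python
import random

random.seed(1)
Q = 0.0    # queue backlog
V = 50.0   # drift-plus-penalty trade-off weight (assumed)

def power_cost(rate: float) -> float:
    """Toy convex transmission-power cost."""
    return rate ** 2

for t in range(1000):
    arrival = random.uniform(0, 2)   # random traffic arrival, mean 1
    # Drift-plus-penalty: choose the service rate minimizing V*cost - Q*rate.
    best_rate = min((r / 10 for r in range(0, 31)),
                    key=lambda r: V * power_cost(r) - Q * r)
    Q = max(Q - best_rate, 0.0) + arrival

print(round(Q, 1))  # the backlog stays bounded: the queue is stabilized
```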
APA, Harvard, Vancouver, ISO, and other styles
31

Allen, Martin William. "Agent interactions in decentralized environments." Amherst, Mass. : University of Massachusetts Amherst, 2009. http://scholarworks.umass.edu/open_access_dissertations/1.

Full text
Abstract:
The decentralized partially observable Markov decision process (Dec-POMDP) is a powerful formal model for studying multiagent problems where cooperative, coordinated action is optimal, but each agent acts based on local data alone. Unfortunately, it is known that Dec-POMDPs are fundamentally intractable: they are NEXP-complete in the worst case, and have been empirically observed to be beyond feasible optimal solution. To get around these obstacles, researchers have focused on special classes of the general Dec-POMDP problem, restricting the degree to which agent actions can interact with one another. In some cases, it has been proven that these sorts of structured forms of interaction can in fact reduce worst-case complexity. Where formal proofs have been lacking, empirical observations suggest that this may also be true for other cases, although less is known precisely. This thesis unifies a range of this existing work, extending analysis to establish novel complexity results for some popular restricted-interaction models. We also establish some new results concerning cases for which reduced complexity has been proven, showing correspondences between basic structural features and the potential for dimensionality reduction when employing mathematical programming techniques. As our new complexity results establish that worst-case intractability is more widespread than previously known, we look to new ways of analyzing the potential average-case difficulty of Dec-POMDP instances. As this would be extremely difficult using the tools of traditional complexity theory, we take a more empirical approach. In so doing, we identify new analytical measures that apply to all Dec-POMDPs, whatever their structure. These measures allow us to identify problems that are potentially easier to solve on average, and validate this claim empirically. As we show, the performance of well-known optimal dynamic programming methods correlates with our new measure of difficulty. Finally, we explore the approximate case, showing that our measure works well as a predictor of difficulty there, too, and provides a means of setting algorithm parameters to achieve far more efficient performance.
APA, Harvard, Vancouver, ISO, and other styles
32

Hänsel, Rosemarie. "Abschied von Ingeborg Pomp." Saechsische Landesbibliothek- Staats- und Universitaetsbibliothek Dresden, 2007. http://nbn-resolving.de/urn:nbn:de:swb:14-1184240704601-52673.

Full text
Abstract:
On 4 June 2007 we received the sad news that Ingeborg Barbara Pomp, retired head of the Stenografische Sammlung from 1996 to 2006, had passed away after a short, serious illness. The staff of the Sächsische Landesbibliothek – Staats- und Universitätsbibliothek mourn a lovable colleague who always worked with the utmost personal commitment for the further development of the Stenografische Sammlung.
APA, Harvard, Vancouver, ISO, and other styles
33

Aras, Raghav. "Mathematical programming methods for decentralized POMDPs." Thesis, Nancy 1, 2008. http://www.theses.fr/2008NAN10092/document.

Full text
Abstract:
In this thesis, we study the problem of the optimal decentralized control of a partially observed Markov process over a finite horizon. The mathematical model corresponding to the problem is a decentralized POMDP (DEC-POMDP). Many practical problems from the domains of artificial intelligence and operations research can be modeled as DEC-POMDPs. However, solving a DEC-POMDP exactly is intractable (NEXP-hard). The development of exact algorithms is necessary in order to guide the development of approximate algorithms that can scale to practical-sized problems. Existing algorithms are mainly inspired by POMDP research (dynamic programming and forward search) and require an inordinate amount of time for even very small DEC-POMDPs. In this thesis, we develop a new mathematical programming based approach for exactly solving a finite horizon DEC-POMDP. We use the sequence form of a control policy in this approach. Using the sequence form, we show how the problem can be formulated as a mathematical program with a nonlinear objective and linear constraints. We then show how this nonlinear program can be linearized to a 0-1 mixed integer linear program (MIP). We present two different 0-1 MIPs based on two different properties of a DEC-POMDP. The computational experience of the mathematical programs presented in the thesis on four benchmark problems (MABC, MA-Tiger, Grid Meeting, Fire Fighting) shows that the time taken to find an optimal joint policy is one to two orders of magnitude less than with existing exact algorithms. In the problems tested, the time taken drops from several hours to a few seconds or minutes.
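The linearization step described here rests on a standard trick for products of variables. The toy check below (the generic technique, not the thesis's full sequence-form program) shows the linear constraints that force a new variable z to equal the product x·y of a binary x and a bounded continuous y.

```python
# Linearize z = x * y where x is binary and 0 <= y <= U (McCormick constraints).
U = 1.0

def feasible(x: int, y: float, z: float, eps: float = 1e-9) -> bool:
    """Linear constraints that pin z to x * y whenever x is 0 or 1."""
    return (z <= U * x + eps and
            z <= y + eps and
            z >= y - U * (1 - x) - eps and
            z >= -eps)

# For each (x, y), the only feasible z on a fine grid is exactly x * y.
for x in (0, 1):
    for y in (0.0, 0.3, 1.0):
        valid = [k / 100 for k in range(101) if feasible(x, y, k / 100)]
        assert all(abs(z - x * y) < 1e-6 for z in valid)
        print(x, y, "-> z =", valid)
```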
APA, Harvard, Vancouver, ISO, and other styles
34

Aras, Raghav Charpillet François Dutech Alain. "Mathematical programming methods for decentralized POMDPs." S. l. : Nancy 1, 2008. http://www.scd.uhp-nancy.fr/docnum/SCD_T_2008_0092_ARAS.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Endo, Yoichiro. "Countering Murphys law the use of anticipation and improvisation via an episodic memory in support of intelligent robot behavior /." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26466.

Full text
Abstract:
Thesis (Ph.D)--Computing, Georgia Institute of Technology, 2009.
Committee Chair: Arkin, Ronald; Committee Member: Balch, Tucker; Committee Member: Dellaert, Frank; Committee Member: Potter, Steve; Committee Member: Ram, Ashwin. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
36

Ortiz, Olga L. "Stochastic inventory control with partial demand observability." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/22551.

Full text
Abstract:
Thesis (Ph. D.)--Industrial and Systems Engineering, Georgia Institute of Technology, 2008.
Committee Co-Chair: Alan L Erera; Committee Co-Chair: Chelsea C, White III; Committee Member: Julie Swann; Committee Member: Paul Griffin; Committee Member: Soumen Ghosh.
APA, Harvard, Vancouver, ISO, and other styles
37

Brooks, Alex. "Parametric POMDPs for planning in continuous state spaces." University of Sydney, 2007. http://hdl.handle.net/2123/1861.

Full text
Abstract:
PhD
This thesis is concerned with planning and acting under uncertainty in partially-observable continuous domains. In particular, it focusses on the problem of mobile robot navigation given a known map. The dominant paradigm for robot localisation is to use Bayesian estimation to maintain a probability distribution over possible robot poses. In contrast, control algorithms often base their decisions on the assumption that a single state, such as the mode of this distribution, is correct. In scenarios involving significant uncertainty, this can lead to serious control errors. It is generally agreed that the reliability of navigation in uncertain environments would be greatly improved by the ability to consider the entire distribution when acting, rather than the single most likely state. The framework adopted in this thesis for modelling navigation problems mathematically is the Partially Observable Markov Decision Process (POMDP). An exact solution to a POMDP problem provides the optimal balance between reward-seeking behaviour and information-seeking behaviour, in the presence of sensor and actuation noise. Unfortunately, previous exact and approximate solution methods have had difficulty scaling to real applications. The contribution of this thesis is the formulation of an approach to planning in the space of continuous parameterised approximations to probability distributions. Theoretical and practical results are presented which show that, when compared with similar methods from the literature, this approach is capable of scaling to larger and more realistic problems. In order to apply the solution algorithm to real-world problems, a number of novel improvements are proposed. Specifically, Monte Carlo methods are employed to estimate distributions over future parameterised beliefs, improving planning accuracy without a loss of efficiency. Conditional independence assumptions are exploited to simplify the problem, reducing computational requirements. Scalability is further increased by focussing computation on likely beliefs, using metric indexing structures for efficient function approximation. Local online planning is incorporated to assist global offline planning, allowing the precision of the latter to be decreased without adversely affecting solution quality. Finally, the algorithm is implemented and demonstrated during real-time control of a mobile robot in a challenging navigation task. We argue that this task is substantially more challenging and realistic than previous problems to which POMDP solution methods have been applied. Results show that POMDP planning, which considers the evolution of the entire probability distribution over robot poses, produces significantly more robust behaviour when compared with a heuristic planner which considers only the most likely states and outcomes.
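To illustrate what planning over parameterised distributions looks like, the sketch below maintains a one-dimensional Gaussian belief over the robot's position; a planner in this style treats the parameters (mean, variance) as its state. The dynamics and noise values are assumptions for illustration, not taken from the thesis.

```python
# 1-D Gaussian belief over robot position: the planner's "state" is (mu, var).
def predict(mu: float, var: float, u: float, motion_noise: float = 0.5):
    """Motion update: move by u with additive Gaussian noise."""
    return mu + u, var + motion_noise

def correct(mu: float, var: float, z: float, sensor_noise: float = 1.0):
    """Measurement update: fuse a noisy position reading z."""
    k = var / (var + sensor_noise)          # Kalman gain
    return mu + k * (z - mu), (1 - k) * var

mu, var = 0.0, 4.0                          # uncertain initial pose (assumed)
mu, var = predict(mu, var, u=1.0)
mu, var = correct(mu, var, z=1.4)
print(round(mu, 2), round(var, 2))          # belief sharpened by the reading
```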
APA, Harvard, Vancouver, ISO, and other styles
38

Brooks, Alex M. "Parametric POMDPs for planning in continuous state spaces." Connect to full text, 2007. http://hdl.handle.net/2123/1861.

Full text
Abstract:
Thesis (Ph. D.)--University of Sydney, 2007.
Title from title screen (viewed 15 January 2009). Submitted in fulfilment of the requirements for the degree of Doctor of Philosophy to the Australian Centre for Field Robotics, School of Aerospace, Mechanical and Mechatronic Engineering. Includes bibliographical references. Also available in print form.
APA, Harvard, Vancouver, ISO, and other styles
39

Parikh, Rachel. "Persian pomp, Indian circumstance : the Khalili Falnama." Thesis, University of Cambridge, 2014. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.648619.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Jarzyna, Tomasz. "Modelowanie i analiza dynamiczna pionowych pomp diagonalnych." Rozprawa doktorska, [Nakł.aut.], 2011. http://dlibra.utp.edu.pl/Content/272.

Full text
Abstract:
The aim of this work is to investigate the influence of mechanical parameters on the vibrational state of the pump. The work contains a review of the state of knowledge on pump dynamics, rotating machinery, the analysis of hydraulic forces acting on impeller discs and shaft modeling; experimental studies on a real object; the determination of the hydraulic forces accompanying the operation of the impeller discs; the development and verification of a mathematical model capturing the physical parameters of the supports; and a numerical analysis of the shaft together with the impeller discs mounted on it.
APA, Harvard, Vancouver, ISO, and other styles
41

Atrash, Amin. "A Bayesian Framework for Online Parameter Learning in POMDPs." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=104587.

Full text
Abstract:
Decision-making under uncertainty has become critical as autonomous and semi-autonomous agents become more ubiquitous in our society. These agents must deal with uncertainty and ambiguity from the environment and still perform desired tasks robustly. Partially observable Markov decision processes (POMDPs) provide a principled mathematical framework for modelling agents operating in such an environment. These models are able to capture the uncertainty from noisy sensors and inaccurate actuators, and to perform decision-making in light of the agent's incomplete knowledge of the world. POMDPs have been applied successfully in domains ranging from robotics to dialogue management to medical systems. Extensive research has been conducted on methods for optimizing policies for POMDPs. However, these methods typically assume a model of the environment is known. This thesis presents a Bayesian reinforcement learning framework for learning POMDP parameters during execution. This framework takes advantage of agents which work alongside an operator who can provide optimal policy information to help direct the learning. By using Bayesian reinforcement learning, the agent can perform learning concurrently with execution, incorporate incoming data immediately, and take advantage of prior knowledge of the world. By using such a framework, an agent is able to adapt its policy to that of the operator. This framework is validated on data collected from the interaction manager of an autonomous wheelchair. The interaction manager acts as an intelligent interface between the user and the robot, allowing the user to issue high-level commands through natural interfaces such as speech. This interaction manager is controlled using a POMDP and acts as a rich scenario for learning in which the agent must adjust to the needs of the user over time.
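The conjugate-update core of such Bayesian parameter learning can be sketched in a few lines: with a Dirichlet prior over an observation distribution, each observation received during execution simply increments a count. The prior and the three-observation setting are our own illustrative assumptions; the thesis's framework additionally exploits policy information from the operator.

```python
import numpy as np

# Bayesian learning of one observation distribution P(o | s, a) as a Dirichlet.
alpha = np.ones(3)                 # uniform Dirichlet prior over 3 observations

def update(alpha: np.ndarray, obs_index: int) -> np.ndarray:
    """Conjugate update: seeing observation o just increments its count."""
    alpha = alpha.copy()
    alpha[obs_index] += 1
    return alpha

for o in [0, 0, 2, 0, 1]:          # observations gathered during execution
    alpha = update(alpha, o)

print(alpha / alpha.sum())         # posterior mean estimate of P(o | s, a)
```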
APA, Harvard, Vancouver, ISO, and other styles
42

Skoglund, Caroline. "Risk-aware Autonomous Driving Using POMDPs and Responsibility-Sensitive Safety." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-300909.

Full text
Abstract:
Autonomous vehicles promise to play an important role in increasing efficiency and safety in road transportation. Although we have seen several examples of autonomous vehicles out on the road over the past years, how to ensure the safety of an autonomous vehicle in an uncertain and dynamic environment is still a challenging problem. This thesis studies the problem by developing a risk-aware decision making framework. The system that integrates the dynamics of an autonomous vehicle and the uncertain environment is modelled as a Partially Observable Markov Decision Process (POMDP). A risk measure is proposed based on the Responsibility-Sensitive Safety (RSS) distance, which quantifies the minimum distance to other vehicles required for ensuring safety. This risk measure is incorporated into the reward function of the POMDP to achieve risk-aware decision making. The proposed risk-aware POMDP framework is evaluated in two case studies. In a single-lane car following scenario, it is shown that the ego vehicle is able to successfully avoid a collision in an emergency event where the vehicle in front of it makes a full stop. In the merge scenario, the ego vehicle successfully enters the main road from a ramp with a satisfactory distance to other vehicles. In conclusion, the risk-aware POMDP framework is able to realize a trade-off between safety and usability by keeping a reasonable distance and adapting to other vehicles' behaviours.
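The RSS minimum safe longitudinal distance underlying the risk measure has a closed form (from the Responsibility-Sensitive Safety literature); the sketch below computes it, with illustrative parameter values that are our assumptions rather than the thesis's.

```python
def rss_min_distance(v_rear: float, v_front: float, rho: float = 1.0,
                     a_max: float = 3.0, b_min: float = 4.0,
                     b_max: float = 8.0) -> float:
    """RSS minimum safe longitudinal gap (Shalev-Shwartz et al. formulation).

    v_rear, v_front: speeds in m/s; rho: response time in s;
    a_max: rear car's max acceleration during the response time;
    b_min: rear car's guaranteed braking; b_max: front car's max braking.
    The parameter values used here are illustrative assumptions.
    """
    v_resp = v_rear + rho * a_max                 # rear speed after responding
    d = (v_rear * rho + 0.5 * a_max * rho ** 2
         + v_resp ** 2 / (2 * b_min)
         - v_front ** 2 / (2 * b_max))
    return max(d, 0.0)

print(round(rss_min_distance(v_rear=25.0, v_front=25.0), 1))  # gap in metres
```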
APA, Harvard, Vancouver, ISO, and other styles
43

Wright, Allan. "Frank Zappa's orchestral works art music or "bogus pomp"? /." Connect to e-thesis, 2007. http://theses.gla.ac.uk/492/.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Cohen, Jonathan. "Formation dynamique d'équipes dans les DEC-POMDPS ouverts à base de méthodes Monte-Carlo." Thesis, Normandie, 2019. http://www.theses.fr/2019NORMC225/document.

Full text
Abstract:
This thesis addresses the problem where a team of cooperative and autonomous agents, working in a stochastic and partially observable environment towards solving a complex task, needs to dynamically modify its composition during execution, so as to adapt to the evolution of the task. It is a problem that has seldom been studied in the field of multi-agent planning. However, there are many situations where the team of agents is likely to evolve over time. We are particularly interested in the case where the agents can decide for themselves to leave or join the operational team. Sometimes, using fewer agents can be beneficial, if the costs induced by using the agents are too prohibitive. Conversely, it can sometimes be useful to call on more agents if the situation gets worse and the skills of some agents turn out to be valuable assets. In order to propose a decision model that can represent those situations, we build upon decentralized partially observable Markov decision processes, the standard model for planning under uncertainty in decentralized multi-agent settings. We extend this model to allow agents to enter and exit the system; this is what is called agent openness. We then present two planning algorithms based on the popular Monte-Carlo Tree Search methods. The first algorithm builds separable joint policies by computing series of best-response individual policies, while the second algorithm builds non-separable joint policies by ranking the teams in each situation via an Elo rating system. We evaluate our methods on new benchmarks that highlight some interesting features of open systems.
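The Elo mechanism used to rank team compositions follows the standard update rule, sketched below; how match outcomes between teams are scored inside the planner is specific to the thesis, so the pairing shown here is only an illustrative assumption.

```python
def elo_update(rating: float, opponent: float, score: float, k: float = 32.0) -> float:
    """Standard Elo rule: score is 1 for a win, 0.5 for a draw, 0 for a loss."""
    expected = 1.0 / (1.0 + 10 ** ((opponent - rating) / 400.0))
    return rating + k * (score - expected)

# Two candidate team compositions compared on the same simulated situation.
r_small, r_large = 1500.0, 1500.0
old_small, old_large = r_small, r_large
r_small = elo_update(old_small, old_large, score=1.0)  # smaller team did better
r_large = elo_update(old_large, old_small, score=0.0)
print(round(r_small), round(r_large))                  # 1516 and 1484
```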
APA, Harvard, Vancouver, ISO, and other styles
45

Paulmann, Johannes. "Pomp und Politik : Monarchenbegegnungen in Europa zwischen Ancien Régime und Erstem Weltkrieg /." Paderborn [u.a.] : Schöningh, 2000. http://www.gbv.de/dms/bs/toc/31944564x.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Błaszczyk, Andrzej. "Metoda projektowania pomp o specjalnych wymaganiach eksploatacyjno-ruchowych z wykorzystaniem numerycznej analizy przepływów trójwymiarowych /." Łódź : Wydawn. Politechn, 2003. http://www.gbv.de/dms/goettingen/372713971.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Pomp, Sarah [Verfasser]. "The role of depressive symptoms in the process of health behavior change / Sarah Pomp." Berlin : Freie Universität Berlin, 2012. http://d-nb.info/1027815340/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Fricke, Benjamin. "Lokalisation, Isolierung und in vitro Generierung von Assemblierungsintermediaten des humanen 20S Proteasoms." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät I, 2006. http://dx.doi.org/10.18452/15533.

Full text
Abstract:
The 20S proteasome represents the protein-degrading part of the ubiquitin-proteasome system and therefore participates in important cellular processes such as gene expression control, cell cycle control, apoptosis, peptide generation for MHC class I presentation, and the degradation of misfolded proteins. Only the beginnings of the individual steps of 20S proteasome biogenesis in eukaryotes are understood so far. This work examines the subunit composition of assembly intermediates and their subcellular localisation and organisation in eukaryotic cells. Distinct assembly intermediates of human proteasomes were generated by establishing an in vitro system, in which an α-ring could be identified as an early intermediate. In vivo experiments using radioactively labelled total lysates of HeLa cells shed light on the subsequent sequence of assembly, allowing newly synthesised and finally incorporated subunits to be identified. Two of these subunits are β1 and β7, which could force the dimerisation of two half-proteasome precursors by their trans-acting C-terminal extensions. Furthermore, the β1 subunit was identified as the probable β-ring-completing subunit in the precursor complex. In addition, it was possible to detect proteasomal assembly intermediates by immunocytochemical and biochemical methods on the ER of human cell lines. Thereby the assembly factor POMP plays a key role, as it enables the precursor association with the ER in the first place. This work clarifies further steps of the complex biogenesis of constitutive 20S proteasomes in human cell lines and characterises the subcellular localisation of assembly intermediates in human cells for the first time.
APA, Harvard, Vancouver, ISO, and other styles
49

Keryell, Ronan. "Pomp : d'un petit ordinateur massivement parallele smid a base de processeurs risc concepts, etude et realisation." Paris 11, 1992. http://www.theses.fr/1992PA112499.

Full text
Abstract:
Parallelism is an effective way of increasing computer performance faster than the technological evolution of elementary components allows a priori. One class of parallelism, data parallelism, is particularly interesting because it closely reflects the parallelism typical of numerical computing algorithms, which generally manipulate large data sets. In this perspective we present a small massively parallel SIMD computer (POMP) that would offer high performance in a small volume. Whereas SIMD classically implies developing a dedicated processor, we propose diverting a commercial processor to coarse-grained SIMD use, together with a VLIW coupling with the scalar processor, thereby increasing the power density. A parallel language (PompC) based on C is presented to express the explicit data parallelism of programs. A compilation methodology allows us to reuse the processor's programming environment and also to target other machines, parallel (CM-2, MP-1, iPSC/860) or not. New SIMD flow-control methods are presented to increase the machine's efficiency. We then develop a new type of hybrid static/dynamic network for communications. For comparison, an SPMD machine is presented, suggesting a possible evolution of POMP. Finally, two applications from physics are described in PompC.
APA, Harvard, Vancouver, ISO, and other styles
50

Lusena, Christopher. "Finite Memory Policies for Partially Observable Markov Decision Proesses." UKnowledge, 2001. http://uknowledge.uky.edu/gradschool_diss/323.

Full text
Abstract:
This dissertation makes contributions to areas of research on planning with POMDPs: complexity theoretic results and heuristic techniques. The most important contributions are probably the complexity of approximating the optimal history-dependent finite-horizon policy for a POMDP, and the idea of heuristic search over the space of FFTs.
APA, Harvard, Vancouver, ISO, and other styles