Dissertations / Theses on the topic 'Markov Decision Process Planning'

Consult the top 50 dissertations / theses for your research on the topic 'Markov Decision Process Planning.'

1

Geng, Na. "Combinatorial optimization and Markov decision process for planning MRI examinations." PhD thesis, Saint-Etienne, EMSE, 2010. http://tel.archives-ouvertes.fr/tel-00566257.

Abstract:
This research is motivated by our collaboration with a large French university teaching hospital in order to reduce the Length of Stay (LoS) of stroke patients treated in the neurovascular department. Quick diagnosis is critical for stroke patients but relies on expensive and heavily used imaging facilities such as MRI (Magnetic Resonance Imaging) scanners. It is therefore very important for the neurovascular department to reduce patient LoS by reducing waiting times for imaging examinations. From the neurovascular department's perspective, this thesis proposes a new MRI examination reservation process in order to reduce patient waiting times without degrading the utilization of the MRI. The service provider, i.e., the imaging department, reserves each week a certain number of appropriately distributed contracted time slots (CTS) for the neurovascular department to ensure quick MRI examination of stroke patients. In addition to CTS, it is still possible for stroke patients to get MRI time slots through regular reservation (RTS). The thesis first proposes a stochastic programming model to simultaneously determine the contract decision, i.e., the number of CTS and their distribution, and the patient assignment policy that assigns patients to either CTS or RTS. To solve this problem, structural properties of the optimal patient assignment policy for a given contract are proved with an average-cost Markov decision process (MDP) approach. The contract is determined by a Monte Carlo approximation approach and then improved by local search. Computational experiments show that the proposed algorithms can solve the model efficiently. The new reservation process greatly reduces the average waiting time of stroke patients; at the same time, some CTS remain unused for lack of patients. To reduce the number of unused CTS, we further explore the possibility of advance cancellation of CTS. Structural properties of optimal control policies for one-day and two-day advance cancellation are established separately via an average-cost MDP approach, with appropriate modeling and advanced convexity concepts used in the control of queueing systems. Computational experiments show that appropriate advance cancellation of CTS greatly reduces the number of unused CTS while keeping waiting times nearly unchanged.
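The contract-evaluation step described above lends itself to a simple Monte Carlo sketch. The toy model below, with an invented arrival rate, contract size, and carry-over rule, estimates the expected number of unused contracted time slots (CTS) and the average extra waiting time per patient for a weekly contract; it only illustrates the kind of simulation the thesis uses, not its actual model.

```python
import numpy as np

def simulate_contract(cts_per_day, arrival_rate, days=5, weeks=10000, seed=0):
    """Toy Monte Carlo evaluation of a weekly CTS contract: stroke patients arrive
    Poisson(arrival_rate) per day, each day offers cts_per_day contracted slots,
    and patients not served the same day wait for the next day's slots."""
    rng = np.random.default_rng(seed)
    unused_total, delay_total, patients_total = 0.0, 0.0, 0
    for _ in range(weeks):
        queue = 0
        for _ in range(days):
            arrivals = rng.poisson(arrival_rate)
            queue += arrivals
            served = min(queue, cts_per_day)
            unused_total += cts_per_day - served
            queue -= served
            delay_total += queue       # every patient still waiting incurs one more day
            patients_total += arrivals
    return unused_total / weeks, delay_total / max(patients_total, 1)

# Example: compare two hypothetical contract sizes.
for slots in (2, 3):
    unused, delay = simulate_contract(cts_per_day=slots, arrival_rate=2.0)
    print(f"{slots} CTS/day -> unused slots/week {unused:.2f}, mean extra wait {delay:.2f} days")
```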
2

Dai, Peng. "FASTER DYNAMIC PROGRAMMING FOR MARKOV DECISION PROCESSES." UKnowledge, 2007. http://uknowledge.uky.edu/gradschool_theses/428.

Abstract:
Markov decision processes (MDPs) are a general framework used by Artificial Intelligence (AI) researchers to model decision theoretic planning problems. Solving real world MDPs has been a major and challenging research topic in the AI literature. This paper discusses two main groups of approaches in solving MDPs. The first group of approaches combines the strategies of heuristic search and dynamic programming to expedite the convergence process. The second makes use of graphical structures in MDPs to decrease the effort of classic dynamic programming algorithms. Two new algorithms proposed by the author, MBLAO* and TVI, are described here.
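As background for the dynamic-programming side discussed above, here is a minimal value-iteration loop on a toy three-state MDP; the transition table, rewards, and discount factor are invented for illustration, and the heuristic-search accelerations (e.g., MBLAO*) and topological orderings (TVI) studied in the thesis are not shown.

```python
# Minimal value iteration on a toy MDP: P[s][a] is a list of (prob, next_state, reward).
P = {
    0: {"stay": [(1.0, 0, 0.0)], "go": [(0.8, 1, 1.0), (0.2, 0, 0.0)]},
    1: {"stay": [(1.0, 1, 0.0)], "go": [(0.9, 2, 5.0), (0.1, 0, -1.0)]},
    2: {"stay": [(1.0, 2, 0.0)]},                      # absorbing goal state
}
gamma, eps = 0.95, 1e-6
V = {s: 0.0 for s in P}

while True:
    delta = 0.0
    for s, actions in P.items():
        q = [sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
             for outcomes in actions.values()]
        best = max(q)
        delta = max(delta, abs(best - V[s]))
        V[s] = best
    if delta < eps:                                    # Bellman residual small enough
        break

print({s: round(v, 3) for s, v in V.items()})
```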
3

Alizadeh, Pegah. "Elicitation and planning in Markov decision processes with unknown rewards." Thesis, Sorbonne Paris Cité, 2016. http://www.theses.fr/2016USPCD011/document.

Abstract:
Markov decision processes (MDPs) are models for solving sequential decision problems in which a user interacts with the environment and adapts her policy by taking numerical reward signals into account. Solving an MDP amounts to formulating the user's behavior in the environment as a policy function that specifies which action to choose in each situation. In many real-world decision problems, users have various preferences, and therefore the gains of actions on states differ and should be re-decoded for each user. In this dissertation, we are interested in solving MDPs for users with different preferences. We use a model named Vector-valued MDP (VMDP) with vector rewards. We propose a propagation-search algorithm that assigns a vector-value function to each policy and identifies each user with a preference vector over the set of value functions, where the preference vector satisfies the user's priorities. Since the user preference vector is not known, we present several methods for solving VMDPs while approximating the user's preference vector. We introduce two algorithms that reduce the number of queries needed to find the optimal policy of a user: 1) a propagation-search algorithm, where we propagate a set of possible optimal policies for the given MDP without knowing the user's preferences; and 2) an interactive value iteration (IVI) algorithm on VMDPs, named Advantage-Based Value Iteration (ABVI), that uses clustering and regrouping of advantages. We also demonstrate how the ABVI algorithm works for two different types of users: confident and uncertain. We finally work on a minimax regret approximation method for finding the optimal policy with respect to limited information about the user's preferences. All possible objectives in the system are only bounded between upper and lower bounds, while the system is not aware of the user's preferences among them. We propose a heuristic minimax regret approximation method for solving MDPs with unknown rewards that is faster and less complex than existing methods in the literature.
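To make the minimax-regret idea concrete, the sketch below scores a small set of hypothetical policies, each summarized by a vector value function, against preference weights sampled from assumed per-objective bounds, and picks the policy with the smallest (approximate) worst-case regret. The numbers and the sampling-based approximation are illustrative assumptions, not the algorithms developed in the thesis.

```python
import numpy as np

# Hypothetical vector value functions: one row per candidate policy, one column per objective.
policy_values = np.array([
    [10.0, 2.0],
    [ 6.0, 6.0],
    [ 1.0, 9.0],
])
# Assumed bounds on the (unknown) preference weights for each objective.
w_low, w_high = np.array([0.2, 0.2]), np.array([0.8, 0.8])

rng = np.random.default_rng(0)
W = rng.uniform(w_low, w_high, size=(5000, 2))      # sampled plausible preference vectors
scores = W @ policy_values.T                        # scalarized value of each policy under each w
regret = scores.max(axis=1, keepdims=True) - scores # how much each policy loses to the best one
worst_case = regret.max(axis=0)                     # approximate max regret per policy

best = int(worst_case.argmin())
print("approx. minimax-regret policy:", best, "worst-case regret:", round(float(worst_case[best]), 3))
```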
4

Ernsberger, Timothy S. "Integrating Deterministic Planning and Reinforcement Learning for Complex Sequential Decision Making." Case Western Reserve University School of Graduate Studies / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=case1354813154.

5

Al, Sabban Wesam H. "Autonomous vehicle path planning for persistence monitoring under uncertainty using Gaussian based Markov decision process." Thesis, Queensland University of Technology, 2015. https://eprints.qut.edu.au/82297/1/Wesam%20H_Al%20Sabban_Thesis.pdf.

Abstract:
One of the main challenges facing online and offline path planners is uncertainty in the magnitude and direction of the environmental energy, which is dynamic, changes over time, and is hard to forecast. This thesis develops an artificial-intelligence approach that allows a mobile robot to learn from historical or forecasted data on the environmental energy available in the area of interest, supporting persistent monitoring under uncertainty with the developed algorithm.
6

Poulin, Nolan. "Proactive Planning through Active Policy Inference in Stochastic Environments." Digital WPI, 2018. https://digitalcommons.wpi.edu/etd-theses/1267.

Abstract:
In multi-agent Markov Decision Processes, a controllable agent must perform optimal planning in a dynamic and uncertain environment that includes another unknown and uncontrollable agent. Given a task specification for the controllable agent, its ability to complete the task can be impeded by an inaccurate model of the intent and behaviors of other agents. In this work, we introduce an active policy inference algorithm that allows a controllable agent to infer a policy of the environmental agent through interaction. Active policy inference is data-efficient and is particularly useful when data are time-consuming or costly to obtain. The controllable agent synthesizes an exploration-exploitation policy that incorporates the knowledge learned about the environment's behavior. Whenever possible, the agent also tries to elicit behavior from the other agent to improve the accuracy of the environmental model. This is done by mapping the uncertainty in the environmental model to a bonus reward, which helps elicit the most informative exploration, and allows the controllable agent to return to its main task as fast as possible. Experiments demonstrate the improved sample efficiency of active learning and the convergence of the policy for the controllable agents.
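The mapping from model uncertainty to a bonus reward described above can be illustrated with a simple count-based bonus added to the estimated reward before planning; the bonus form, constants, and state/action names below are invented, so this is only a sketch of the general idea rather than the algorithm used in the thesis.

```python
import math
from collections import defaultdict

visit_counts = defaultdict(int)   # how often each (state, other-agent action) pair was observed

def bonus_reward(base_reward, state, observed_action, beta=1.0):
    """Add an exploration bonus that shrinks as the environment agent's behaviour
    in this state becomes well observed (count-based uncertainty proxy)."""
    n = visit_counts[(state, observed_action)]
    return base_reward + beta / math.sqrt(n + 1)

def record(state, observed_action):
    visit_counts[(state, observed_action)] += 1

# Usage sketch: rarely-seen behaviour earns a larger bonus, steering the planner
# toward states that are informative about the other agent's policy.
record("intersection", "yield")
print(bonus_reward(0.0, "intersection", "yield"))   # smaller bonus, already observed once
print(bonus_reward(0.0, "intersection", "cut_in"))  # larger bonus, never observed
```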
7

Pokharel, Gaurab. "Increasing the Value of Information During Planning in Uncertain Environments." Oberlin College Honors Theses / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=oberlin1624976272271825.

8

Stárek, Ivo. "Plánování cesty robota pomocí dynamického programování." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2009. http://www.nusl.cz/ntk/nusl-228655.

Abstract:
This work addresses robot path planning using the principles of dynamic programming in a discrete state space. The theoretical part reviews the current state of the field and the principle of applying Markov decision processes to path planning. The practical part covers the implementation of two algorithms based on MDP principles.
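A minimal grid-world version of this idea, dynamic programming producing a cost-to-go map that a robot can then descend greedily, is sketched below; the grid, obstacles, and unit step cost are invented for illustration.

```python
# Dynamic-programming path planning on a toy 4x4 grid: 1 = obstacle, goal at (3, 3).
import itertools

grid = [[0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 0]]
goal, step_cost, gamma = (3, 3), 1.0, 1.0
states = [(r, c) for r, c in itertools.product(range(4), range(4)) if grid[r][c] == 0]
V = {s: 0.0 if s == goal else float("inf") for s in states}
moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]

for _ in range(50):                       # enough sweeps for this tiny deterministic grid
    for s in states:
        if s == goal:
            continue
        neighbours = [(s[0] + dr, s[1] + dc) for dr, dc in moves]
        costs = [step_cost + gamma * V[n] for n in neighbours if n in V]
        if costs:
            V[s] = min(costs)

print(V[(0, 0)])                          # minimal number of steps from the start corner
```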
9

Pinheiro, Paulo Gurgel 1983. "Localização multirrobo cooperativa com planejamento." [s.n.], 2009. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276155.

Abstract:
Advisor: Jacques Wainer
Master's dissertation - Universidade Estadual de Campinas, Instituto de Computação
In a cooperative multi-robot localization problem, a group of robots is placed in an environment in which the exact location of each robot is unknown. In this scenario, only a probability distribution indicates the chance of a robot being in a particular state. The robots therefore need to move through the environment and generate new observations, which are shared in order to compute new estimates. Many studies have focused on probabilistic techniques, communication models, and detection models to solve the localization problem. However, the movement of the robots is generally defined by random actions, and random actions generate observations that can be useless for improving the estimate. This work describes a proposal for multi-robot localization with support for action planning. The objective is to describe a model in which the actions performed by the robots are defined by policies. By choosing the best action to perform, a robot obtains more useful information from its internal and external sensors and estimates the postures more quickly. The proposed model, called Model of Planned Localization (MPL), uses POMDPs to model the localization problems and specific algorithms to generate policies. Markov localization was used as the probabilistic localization technique, and versions of detection and information-propagation models were implemented. A simulator for multi-robot localization problems was developed, in which the proposed model was compared to a model that does not use action planning. The results show that the proposed model is able to estimate the robots' postures in a smaller number of steps, being significantly more efficient than the compared model without planning.
Master's degree in Computer Science (Artificial Intelligence)
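As a small illustration of the Markov localization technique mentioned in the abstract, the sketch below performs a Bayes-filter belief update for a single robot on a one-dimensional corridor; the motion and sensor models and the door layout are invented for illustration, and the multi-robot sharing and POMDP-based action planning are not shown.

```python
# Markov localization on a 1-D corridor with 5 cells; doors are at cells 0 and 3.
doors = [1, 0, 0, 1, 0]
belief = [0.2] * 5                      # uniform prior over the robot's position

def move_right(belief, p_success=0.8):
    """Motion update: the robot tries to move one cell right, but may stay put."""
    new = [0.0] * len(belief)
    for i, b in enumerate(belief):
        new[(i + 1) % len(belief)] += p_success * b
        new[i] += (1 - p_success) * b
    return new

def sense(belief, saw_door, p_hit=0.9, p_miss=0.2):
    """Measurement update: weight each cell by how well it explains the observation."""
    likelihood = [p_hit if doors[i] == saw_door else p_miss for i in range(len(belief))]
    posterior = [l * b for l, b in zip(likelihood, belief)]
    norm = sum(posterior)
    return [p / norm for p in posterior]

belief = sense(belief, saw_door=1)      # the robot sees a door
belief = move_right(belief)             # the robot moves one cell to the right
belief = sense(belief, saw_door=0)      # now it sees no door
print([round(b, 3) for b in belief])
```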
10

Junyent, Barbany Miquel. "Width-Based Planning and Learning." Doctoral thesis, Universitat Pompeu Fabra, 2021. http://hdl.handle.net/10803/672779.

Abstract:
Optimal sequential decision making is a fundamental problem to many diverse fields. In recent years, Reinforcement Learning (RL) methods have experienced unprecedented success, largely enabled by the use of deep learning models, reaching human-level performance in several domains, such as the Atari video games or the ancient game of Go. In contrast to the RL approach in which the agent learns a policy from environment interaction samples, ignoring the structure of the problem, the planning approach for decision making assumes known models for the agent's goals and domain dynamics, and focuses on determining how the agent should behave to achieve its objectives. Current planners are able to solve problem instances involving huge state spaces by precisely exploiting the problem structure that is defined in the state-action model. In this work we combine the two approaches, leveraging fast and compact policies from learning methods and the capacity to perform lookaheads in combinatorial problems from planning methods. In particular, we focus on a family of planners called width-based planners, that has demonstrated great success in recent years due to its ability to scale independently of the size of the state space. The basic algorithm, Iterated Width (IW), was originally proposed for classical planning problems, where a model for state transitions and goals, represented by sets of atoms, is fully determined. Nevertheless, width-based planners do not require a fully defined model of the environment, and can be used with simulators. For instance, they have been recently applied in pixel domains such as the Atari games. Despite its success, IW is purely exploratory, and does not leverage past reward information. Furthermore, it requires the state to be factored into features that need to be pre-defined for the particular task. Moreover, running the algorithm with a width larger than 1 in practice is usually computationally intractable, prohibiting IW from solving higher width problems. We begin this dissertation by studying the complexity of width-based methods when the state space is defined by multivalued features, as in the RL setting, instead of Boolean atoms. We provide a tight upper bound on the amount of nodes expanded by IW, as well as overall algorithmic complexity results. In order to deal with more challenging problems (i.e., those with a width higher than 1), we present a hierarchical algorithm that plans at two levels of abstraction. A high-level planner uses abstract features that are incrementally discovered from low-level pruning decisions. We illustrate this algorithm in classical planning PDDL domains as well as in pixel-based simulator domains. In classical planning, we show how IW(1) at two levels of abstraction can solve problems of width 2. To leverage past reward information, we extend width-based planning by incorporating an explicit policy in the action selection mechanism. Our method, called π-IW, interleaves width-based planning and policy learning using the state-actions visited by the planner. The policy estimate takes the form of a neural network and is in turn used to guide the planning step, thus reinforcing promising paths. Notably, the representation learned by the neural network can be used as a feature space for the width-based planner without degrading its performance, thus removing the requirement of pre-defined features for the planner. 
We compare π-IW with previous width-based methods and with AlphaZero, a method that also interleaves planning and learning, in simple environments, and show that π-IW has superior performance. We also show that the π-IW algorithm outperforms previous width-based methods in the pixel setting of the Atari games suite. Finally, we show that the proposed hierarchical IW can be seamlessly integrated with our policy learning scheme, resulting in an algorithm that outperforms flat IW-based planners in Atari games with sparse rewards.
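To give a flavor of the width-based pruning that IW(1) performs, the sketch below runs a breadth-first search that discards any generated state that does not make at least one atom (feature value) true for the first time; the toy counter domain is invented, and the goal is deliberately a single atom, since a conjunctive goal would in general require IW(2) or the hierarchical scheme described above.

```python
from collections import deque

def iw1(initial_state, successors, atoms, is_goal, max_nodes=100000):
    """IW(1): breadth-first search that prunes states with no novel atom."""
    seen_atoms = set(atoms(initial_state))
    queue = deque([(initial_state, [])])
    expanded = 0
    while queue and expanded < max_nodes:
        state, plan = queue.popleft()
        expanded += 1
        if is_goal(state):
            return plan
        for action, nxt in successors(state):
            novel = [a for a in atoms(nxt) if a not in seen_atoms]
            if not novel:
                continue                   # width-1 pruning: nothing new about this state
            seen_atoms.update(novel)
            queue.append((nxt, plan + [action]))
    return None

# Toy domain: two counters that can be incremented; the goal x == 3 is a single atom (width 1).
succ = lambda s: [("inc_x", (s[0] + 1, s[1])), ("inc_y", (s[0], s[1] + 1))]
atoms = lambda s: {("x", s[0]), ("y", s[1])}
print(iw1((0, 0), succ, atoms, lambda s: s[0] == 3))
```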
11

Lacerda, Dênis Antonio. "Aprendizado por reforço em lote: um estudo de caso para o problema de tomada de decisão em processos de venda." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-03072014-101251/.

Abstract:
Probabilistic planning studies sequential decision-making problems for an agent whose actions have probabilistic effects; such problems can be modeled as a Markov decision process (MDP). Given the probabilistic state-transition function and the reward values of the actions, it is possible to determine an action policy (i.e., a mapping between environment states and the agent's actions) that maximizes the expected accumulated reward (or minimizes the expected accumulated cost) of executing a sequence of actions. In cases where the MDP model is not completely known, the best policy must be learned through the interaction of the agent with the real environment; this process is called reinforcement learning. However, in applications where experimenting in the real environment is not allowed, for example sales operations, reinforcement learning can be performed on a sample of past experiences, a process called batch reinforcement learning. In this work, we study batch reinforcement learning techniques that learn from a history of past interactions stored in a process database, and we propose some ways to improve the existing algorithms. As a case study, we apply this technique to learning policies for the sales process of large-format printers, with the goal of building an action recommendation system for novice sellers.
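A bare-bones version of batch reinforcement learning is tabular Q-iteration over a fixed log of transitions, sketched below; the transition log, state/action names, and discount factor are invented, and the thesis's actual improvements to batch RL algorithms are not reproduced here.

```python
from collections import defaultdict

# A fixed batch of logged interactions: (state, action, reward, next_state).
batch = [
    ("lead", "call",  0.0, "meeting"),
    ("lead", "email", 0.0, "lead"),
    ("meeting", "quote", 1.0, "closed"),
    ("meeting", "wait",  0.0, "lead"),
]
actions = {"call", "email", "quote", "wait"}
gamma = 0.9
Q = defaultdict(float)

# Batch Q-iteration: repeatedly sweep the logged transitions (no new interaction needed).
for _ in range(100):
    newQ = defaultdict(float, Q)
    for s, a, r, s2 in batch:
        best_next = max((Q[(s2, a2)] for a2 in actions), default=0.0)
        newQ[(s, a)] = r + gamma * best_next
    Q = newQ

policy = {}
for s in {t[0] for t in batch}:
    policy[s] = max(actions, key=lambda a: Q[(s, a)])
print(policy)      # greedy recommendation per state, learned purely from the batch
```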
12

Borges, Igor Oliveira. "Estratégias para otimização do algoritmo de Iteração de Valor Sensível a Risco." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-09012019-103826/.

Abstract:
Risk-Sensitive Markov Decision Processes (RS-MDPs) allow modeling risk-averse and risk-prone attitudes in the decision-making process by using a risk factor to represent the risk attitude. For this model, there are operators based on piecewise-linear transformation functions that include a risk factor and a discount factor. In this dissertation we formulate two Risk-Sensitive Value Iteration algorithms based on one of these operators, called Synchronous Risk-Sensitive Value Iteration (RSVI) and Asynchronous Risk-Sensitive Value Iteration (A-RSVI). We also propose two heuristics that can be used to initialize the values of the RSVI and A-RSVI algorithms in order to make them more efficient. The results of experiments on the River Crossing domain under two distinct reward scenarios show that: (i) the processing cost of extreme-risk policies, both risk-averse and risk-prone, is high; (ii) a high discount factor increases the convergence time and reinforces the chosen risk attitude; (iii) policies with intermediate risk-factor values have a low computational cost and already show some sensitivity to risk depending on the discount factor used; and (iv) the A-RSVI algorithm with the heuristic based on the risk factor can decrease the convergence time, especially for extreme values of the risk factor.
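One common way to realize a piecewise-linear risk operator is to transform the temporal-difference term before averaging, in the spirit of Mihatsch and Neuneier's risk-sensitive dynamic programming; the sketch below only shows how the risk factor can flip which action a backup prefers, with invented numbers, and may differ from the exact operator used in the dissertation.

```python
def u(delta, k):
    """Piecewise-linear risk transform of a temporal-difference term, with k in (-1, 1):
    k > 0 weights negative surprises more heavily (risk-averse), k < 0 the opposite."""
    return (1 - k) * delta if delta > 0 else (1 + k) * delta

# Two candidate actions at some state, given the current value estimates (numbers invented):
# 'risky': 50% chance of a +20 temporal-difference term, 50% chance of -6;
# 'safe' : a TD term of 0 for sure (the current estimate is already consistent).
outcomes = {"risky": [(0.5, 20.0), (0.5, -6.0)], "safe": [(1.0, 0.0)]}

for k in (0.0, 0.5, 0.9):          # risk-neutral, mildly risk-averse, strongly risk-averse
    scores = {a: round(sum(p * u(td, k) for p, td in outs), 2)
              for a, outs in outcomes.items()}
    best = max(scores, key=scores.get)
    print(f"k={k}: preferred action = {best}, transformed backups = {scores}")
```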
13

Holguin, Mijail Gamarra. "Planejamento probabilístico usando programação dinâmica assíncrona e fatorada." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-14042013-131306/.

Abstract:
Markov Decision Processes (MDPs) model sequential decision-making problems in which the possible actions of an agent have probabilistic effects on the successor states (defined by state-transition matrices). Real-Time Dynamic Programming (RTDP) is a technique for solving MDPs when information about the initial state is available. Traditional approaches show better performance on problems with sparse state-transition matrices, because they can converge to the optimal policy efficiently without visiting all states. However, this advantage can be lost on problems with dense transition matrices, in which many states can be reached in one step (for example, control problems with exogenous events). One approach to overcome this limitation is to exploit regularities in the domain dynamics through a factored representation, i.e., a representation based on state variables. In this master's thesis, we propose a new algorithm called FactRTDP (Factored RTDP), and its approximate version aFactRTDP (Approximate and Factored RTDP), which are the first efficient factored versions of the classical RTDP algorithm. We also propose two other extensions, FactLRTDP and aFactLRTDP, which label states whose value function has converged to the optimum. The experimental results show that these new algorithms converge faster when executed on domains with dense transition matrices, and that they exhibit good online performance on domains with dense transition matrices and little dependency among state variables.
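For reference, a compact flat (non-factored) RTDP loop is sketched below: trials are sampled forward from the initial state and only the visited states receive Bellman backups; the toy domain and trial budget are invented, and the factored representations contributed by the thesis are not shown.

```python
import random

# Toy MDP: P[s][a] -> list of (prob, next_state, cost); state 3 is the absorbing goal.
P = {0: {"a": [(0.7, 1, 1.0), (0.3, 0, 1.0)], "b": [(1.0, 2, 2.0)]},
     1: {"a": [(0.8, 3, 1.0), (0.2, 0, 1.0)]},
     2: {"a": [(1.0, 3, 1.0)]},
     3: {}}
V = {s: 0.0 for s in P}                      # admissible (zero) heuristic
rng = random.Random(0)

def backup(s):
    qs = {a: sum(p * (c + V[s2]) for p, s2, c in outs) for a, outs in P[s].items()}
    a = min(qs, key=qs.get)
    V[s] = qs[a]
    return a

for _ in range(200):                         # RTDP trials, always starting from state 0
    s = 0
    while s != 3:
        a = backup(s)                        # greedy action after a Bellman backup
        probs, nexts, _ = zip(*P[s][a])
        s = rng.choices(nexts, weights=probs)[0]

print({s: round(v, 2) for s, v in V.items()})   # approximate optimal expected costs-to-go
```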
14

Sandino, Mora Juan David. "Autonomous decision-making for UAVs operating under environmental and object detection uncertainty." Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/232513/1/Juan%20David_Sandino%20Mora_Thesis.pdf.

Abstract:
This study established a framework that increases the cognitive levels of small UAVs (or drones), enabling autonomous navigation in partially observable environments. The UAV system was validated in search-and-rescue settings by locating victims last seen inside cluttered buildings and in bushland. The framework improved the decision-making skills of the drone, allowing it to collect more accurate statistics on detected victims. The study assists real-time validation of detected objects when the data are difficult for UAV pilots to interpret, and it reduces human bias in scouting strategies.
15

Freitas, Elthon Manhas de. "Planejamento probabilístico sensível a risco com ILAO* e função utilidade exponencial." Universidade de São Paulo, 2018. http://www.teses.usp.br/teses/disponiveis/100/100131/tde-17012019-092638/.

Abstract:
Markov Decision Processes (MDPs) have been used to solve sequential decision-making problems. There are problems in which dealing with the risks of the environment to obtain a reliable result is more important than maximizing the expected average return. MDPs that deal with this kind of problem are called Risk-Sensitive Markov Decision Processes (RS-MDPs). Among the several variations of RS-MDPs are works based on exponential utility that use a risk factor, which models the agent's risk attitude, whether risk-prone or risk-averse. The algorithms in the literature for solving this type of RS-MDP are inefficient compared to other MDP algorithms. In this project, a solution is presented that can be used on larger problems, both by performing computations only on states relevant to reaching a set of goal states from an initial state, and by allowing the processing of numbers with exponents that are too large for current computational environments. The experiments show that (i) the proposed algorithm is more efficient than state-of-the-art algorithms for RS-MDPs, and (ii) the LogSumExp technique solves the problem of working with very large exponents in RS-MDPs.
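The LogSumExp idea mentioned above can be illustrated by performing exponential-utility backups entirely in log space, so that terms of the form exp(risk factor x cost) are never materialized; the tiny chain below and the particular backup form are illustrative assumptions rather than the thesis algorithm.

```python
import numpy as np

# Toy chain of states 0..3 (state 3 is the goal); P[s][a] -> list of (prob, next_state, cost).
P = {0: {"fast": [(0.8, 1, 1.0), (0.2, 0, 5.0)], "slow": [(1.0, 1, 2.0)]},
     1: {"fast": [(0.8, 2, 1.0), (0.2, 0, 5.0)], "slow": [(1.0, 2, 2.0)]},
     2: {"go":   [(1.0, 3, 1.0)]},
     3: {}}
lam = 2.0                                # risk factor (> 0 means risk-averse about cost)
L = {s: 0.0 for s in P}                  # L[s] stores lam * (certainty-equivalent cost-to-go)

for _ in range(200):                     # value iteration done entirely in log space
    for s, acts in P.items():
        if not acts:
            continue                     # goal state
        backups = []
        for outs in acts.values():
            terms = [np.log(p) + lam * c + L[s2] for p, s2, c in outs]
            backups.append(np.logaddexp.reduce(terms))   # stable log-sum-exp
        L[s] = min(backups)

risk_adjusted_cost = {s: L[s] / lam for s in P}
print({s: round(v, 3) for s, v in risk_adjusted_cost.items()})
```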
16

Hireche, Chabha. "Etude et implémentation sur SoC-FPGA d'une méthode probabiliste pour le contrôle de mission de véhicule autonome Embedded context aware diagnosis for a UAV SoC platform, in Microprocessors and Microsystems 51, June 2017 Context/Resource-Aware Mission Planning Based on BNs and Concurrent MDPs for Autonomous UAVs, in MDPI-Sensors Journal, December 2018." Thesis, Brest, 2019. http://www.theses.fr/2019BRES0067.

Abstract:
Autonomous systems embed different types of sensors, applications, and powerful processors, and are therefore used in many application fields to carry out simple or complex missions. These missions often take place in non-deterministic environments, where random events can disturb the progress of the mission. It is therefore necessary to regularly assess the health of the system and of its hardware and software components in order to detect failures, using Bayesian networks. A decision is then made by the mission planner, which generates a new mission plan that ensures the continuity of the mission in response to the detected event. This decision is made with a Markov Decision Process model, taking into account constraints such as the mission objective, the health status of the embedded sensors and applications, and the mission-execution strategy ('safety first' or 'mission first'). Since autonomous systems execute different tasks with different performance requirements, it is necessary to consider the use of hardware accelerators on SoC-FPGA in order to meet high-performance computing constraints and to offload the CPU when needed.
17

Hoock, Jean-Baptiste. "Contributions to Simulation-based High-dimensional Sequential Decision Making." PhD thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00912338.

Abstract:
My thesis is entitled "Contributions to Simulation-based High-dimensional Sequential Decision Making". The context of the thesis is games, planning, and Markov Decision Processes. An agent interacts with its environment by successively making decisions, starting from an initial state and proceeding until a final state in which it can no longer make decisions. At each timestep, the agent receives an observation of the state of the environment. From this observation and its knowledge, the agent makes a decision which modifies the state of the environment; it then receives a reward and a new observation. The goal is to maximize the sum of rewards obtained during a simulation from an initial state to a final state. The policy of the agent is the function which, from the history of observations, returns a decision. We work in a context where (i) the number of states is huge, (ii) the reward carries little information, (iii) the probability of quickly reaching a good final state is low, and (iv) prior knowledge is either nonexistent or hard to exploit. Both applications described in this thesis present these constraints: the game of Go and a 3D simulator from the European project MASH (Massive Sets of Heuristics). In order to make satisfactory decisions in this context, several solutions are proposed: 1. simulating with an exploration/exploitation compromise (MCTS); 2. reducing complexity by local solving (GoldenEye); 3. building a policy which improves itself (RBGP); 4. learning prior knowledge (CluVo+GMCTS). Monte-Carlo Tree Search (MCTS) is the state of the art for the game of Go. From a model of the environment, MCTS incrementally and asymmetrically builds a tree of possible futures by performing Monte-Carlo simulations. The tree starts from the current observation of the agent, which switches between exploration of the model and exploitation of decisions that statistically give a good cumulative reward. We discuss two ways of improving MCTS: parallelization and the addition of prior knowledge. Parallelization does not solve some weaknesses of MCTS; in particular, some local problems remain challenging. We propose an algorithm (GoldenEye) composed of two parts: detection of a local problem and then its resolution. The resolution algorithm reuses some concepts of MCTS and solves difficult problems from a classical database. Adding prior knowledge by hand is laborious and tedious. We propose a method called Racing-based Genetic Programming (RBGP) to add prior knowledge automatically; its strong point is that RBGP rigorously validates each added piece of prior knowledge, and RBGP can be used to build a policy (instead of only optimizing an algorithm). In some applications such as MASH, simulations are too time-consuming and there is no prior knowledge and no model of the environment, so Monte-Carlo Tree Search cannot be used. To make MCTS usable in this context, we propose a method for learning prior knowledge (CluVo). We then use this prior knowledge both to speed up the agent's learning and to build a model, from which we use an adapted version of Monte-Carlo Tree Search (GMCTS). This method solves difficult MASH problems and gives good results in an application to a word game.
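The exploration/exploitation compromise at the heart of MCTS is typically implemented with an upper-confidence rule such as UCB1 (the basis of UCT); the sketch below shows only that selection step, with invented node statistics, not the full tree search or the extensions (GoldenEye, RBGP, CluVo/GMCTS) developed in the thesis.

```python
import math

def ucb1_select(children, exploration=1.4):
    """Pick the child maximizing mean reward plus an exploration bonus (UCB1/UCT rule).
    children: dict move -> (visit_count, total_reward); unvisited moves are tried first."""
    total_visits = sum(n for n, _ in children.values())
    def score(stats):
        n, w = stats
        if n == 0:
            return float("inf")                      # force at least one visit per move
        return w / n + exploration * math.sqrt(math.log(total_visits) / n)
    return max(children, key=lambda m: score(children[m]))

# Invented node statistics after a few hundred simulations.
stats = {"move_a": (120, 66.0), "move_b": (30, 18.0), "move_c": (5, 3.5)}
print(ucb1_select(stats))        # a rarely tried but promising move can still be selected
```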
18

Regatti, Jayanth Reddy. "Dynamic Routing for Fuel Optimization in Autonomous Vehicles." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524145002064074.

19

Santos, Felipe Martins dos. "Soluções eficientes para processos de decisão markovianos baseadas em alcançabilidade e bissimulações estocásticas." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-12022014-140538/.

Abstract:
Planning in artificial intelligence is the task of finding actions that reach a given goal. In planning under uncertainty, the actions can have probabilistic effects. These problems are modeled as Markov Decision Processes (MDPs), models that enable the computation of optimal solutions by considering the expected value of each action in each state. However, solving large probabilistic planning problems, i.e., those with a large number of states and actions, is still a challenge. Large MDPs can be reduced by computing stochastic bisimulations, i.e., equivalence relations over the states of the original MDP. From the stochastic bisimulations, which can be exact or approximate, it is possible to obtain a reduced abstract model that can be easier to solve than the original MDP. However, for some problems, computing the stochastic bisimulation over the whole state space is infeasible. The algorithms proposed in this work extend the algorithms used to compute stochastic bisimulations for MDPs so that the bisimulations are computed over the set of states reachable from a given initial state, which can be much smaller than the complete state set. The empirical results show that it is possible to solve large probabilistic planning problems with better performance than the known stochastic bisimulation techniques.
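The two ingredients described above, restricting attention to states reachable from the initial state and then grouping equivalent states, can be sketched as a reachability pass followed by naive partition refinement that splits blocks until states in the same block have the same reward and the same transition probabilities into the current blocks; the toy MDP is invented and only the exact (not approximate) refinement is shown.

```python
from collections import deque

# Toy MDP: T[s][a] -> list of (prob, next_state); R[s] is the state reward.
T = {0: {"a": [(1.0, 1)]},
     1: {"a": [(0.5, 2), (0.5, 3)]},
     2: {"a": [(1.0, 4)]},
     3: {"a": [(1.0, 4)]},
     4: {"a": [(1.0, 4)]},
     5: {"a": [(1.0, 0)]}}          # state 5 is never reachable from the initial state 0
R = {0: 0.0, 1: 0.0, 2: 1.0, 3: 1.0, 4: 0.0, 5: 7.0}

def reachable(s0):
    seen, frontier = {s0}, deque([s0])
    while frontier:
        s = frontier.popleft()
        for outs in T[s].values():
            for _, s2 in outs:
                if s2 not in seen:
                    seen.add(s2)
                    frontier.append(s2)
    return seen

def bisimulation_blocks(states):
    blocks = {s: R[s] for s in states}           # start by grouping states with equal reward
    while True:
        def signature(s):
            sig = []
            for a, outs in sorted(T[s].items()):
                dist = {}
                for p, s2 in outs:
                    dist[blocks[s2]] = dist.get(blocks[s2], 0.0) + p
                sig.append((a, tuple(sorted(dist.items()))))
            return (R[s], tuple(sig))
        refined = {s: signature(s) for s in states}
        if len(set(refined.values())) == len(set(blocks.values())):
            return refined                       # no block was split: partition is stable
        blocks = refined

states = reachable(0)                            # only states 0..4; state 5 is ignored entirely
groups = {}
for s, b in bisimulation_blocks(states).items():
    groups.setdefault(b, []).append(s)
print(sorted(sorted(g) for g in groups.values()))   # states 2 and 3 end up in the same block
```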
20

Skoglund, Caroline. "Risk-aware Autonomous Driving Using POMDPs and Responsibility-Sensitive Safety." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-300909.

Abstract:
Autonomous vehicles promise to play an important role in increasing the efficiency and safety of road transportation. Although we have seen several examples of autonomous vehicles out on the road over the past years, how to ensure the safety of an autonomous vehicle in an uncertain and dynamic environment is still a challenging problem. This thesis studies this problem by developing a risk-aware decision-making framework. The system that integrates the dynamics of an autonomous vehicle and the uncertain environment is modelled as a Partially Observable Markov Decision Process (POMDP). A risk measure is proposed based on the Responsibility-Sensitive Safety (RSS) distance, which quantifies the minimum distance to other vehicles required for safety. This risk measure is incorporated into the reward function of the POMDP to achieve risk-aware decision making. The proposed risk-aware POMDP framework is evaluated in two case studies. In a single-lane car-following scenario, the ego vehicle is able to successfully avoid a collision in an emergency event where the vehicle in front of it makes a full stop. In the merge scenario, the ego vehicle successfully enters the main road from a ramp with a satisfactory distance to other vehicles. In conclusion, the risk-aware POMDP framework realizes a trade-off between safety and usability by keeping a reasonable distance and adapting to other vehicles' behaviours.
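The Responsibility-Sensitive Safety distance referred to above has a closed form for the longitudinal car-following case (Shalev-Shwartz et al.); the sketch below implements that formula with illustrative parameter values, and the exact parameterization and the way the thesis turns the gap into a reward term may differ.

```python
def rss_min_gap(v_rear, v_front, rho=1.0, a_max=3.0, b_min=4.0, b_max=8.0):
    """Minimum safe longitudinal gap (metres) per the RSS formula:
    the rear car may accelerate at a_max during the response time rho, then brakes
    at least at b_min, while the front car may brake as hard as b_max."""
    d = (v_rear * rho
         + 0.5 * a_max * rho ** 2
         + (v_rear + rho * a_max) ** 2 / (2 * b_min)
         - v_front ** 2 / (2 * b_max))
    return max(d, 0.0)

# Example: both cars at 20 m/s (72 km/h); a risk measure can then be derived from how far
# the actual gap falls below this threshold.
print(round(rss_min_gap(20.0, 20.0), 1), "m")
```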
21

Desquesnes, Guillaume Louis Florent. "Distribution de Processus Décisionnels Markoviens pour une gestion prédictive d’une ressource partagée : application aux voies navigables des Hauts-de-France dans le contexte incertain du changement climatique." Thesis, Ecole nationale supérieure Mines-Télécom Lille Douai, 2018. http://www.theses.fr/2018MTLD0001/document.

Full text
Abstract:
The work of this thesis aims to introduce and implement predictive management under uncertainty of the water resource for inland waterway networks. The objective is to provide a water management plan that optimizes the navigation conditions of the entire supervised network over a specified horizon. The expected solution must render the network resilient to the probable effects of climate change and to changes in waterway traffic. Firstly, a generic model of a resource distributed over a network is proposed. This model, based on Markov Decision Processes, takes into account the numerous uncertainties affecting the considered networks. The objective of this model is to cover all possible cases, foreseen or not, in order to achieve resilient management of those networks. The second contribution is a distribution of the model over several agents to facilitate scaling, dividing the network's control capacities among the agents. Thus, each agent has only local knowledge of the supervised network. As a result, agents must coordinate to manage the network efficiently. An iterative resolution, with exchanges of temporary plans between agents, is used to obtain local management policies for each agent. Finally, experiments were carried out on realistic and real networks of the French waterways to observe the quality of the solutions produced. Several different climatic scenarios were simulated to test the resilience of the produced policies.
APA, Harvard, Vancouver, ISO, and other styles
22

Sun-Hosoya, Lisheng. "Meta-Learning as a Markov Decision Process." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS588/document.

Full text
Abstract:
Machine Learning (ML) has enjoyed huge successes in recent years and an ever-growing number of real-world applications rely on it. However, designing promising algorithms for a specific problem still requires huge human effort. Automated Machine Learning (AutoML) aims at taking the human out of the loop and developing machines that generate or recommend good algorithms for a given ML task. AutoML is usually treated as an algorithm / hyper-parameter selection problem; existing approaches include Bayesian optimization, evolutionary algorithms as well as reinforcement learning. Among them, auto-sklearn, which incorporates meta-learning techniques in its search initialization, ranks consistently well in AutoML challenges. This observation oriented my research to the meta-learning domain. This direction led me to develop a novel framework based on Markov Decision Processes (MDP) and reinforcement learning (RL). After a general introduction (Chapter 1), my thesis work starts with an in-depth analysis of the results of the AutoML challenge (Chapter 2). This analysis oriented my work towards meta-learning, leading me first to propose a formulation of AutoML as a recommendation problem, and ultimately to formulate a novel conceptualisation of the problem as an MDP (Chapter 3). In the MDP setting, the problem is brought back to filling up, as quickly and efficiently as possible, a meta-learning matrix S, in which rows correspond to ML tasks and columns to ML algorithms. A matrix element S(i, j) is the performance of algorithm j applied to task i. Searching efficiently for the best values in S allows us to quickly identify the algorithms best suited to given tasks. In Chapter 4 the classical hyper-parameter optimization framework (HyperOpt) is first reviewed. In Chapter 5 a first meta-learning approach is introduced along the lines of our paper ActivMetaL, which combines active learning and collaborative filtering techniques to predict the missing values in S. Our latest research applies RL to the MDP problem we defined to learn an efficient policy to explore S. We call this approach REVEAL and propose an analogy with a series of toy games to help visualize agents' strategies to reveal information progressively, e.g. masked areas of images to be classified, or ship positions in a battleship game. This line of research is developed in Chapter 6. The main results of my PhD project are: 1) HP / model selection: I have explored the Freeze-Thaw method and optimized the algorithm to enter the first AutoML challenge, achieving 3rd place in the final round (Chapter 3). 2) ActivMetaL: I have designed a new algorithm for active meta-learning (ActivMetaL) and compared it with other baseline methods on real-world and artificial data. This study demonstrated that ActivMetaL is generally able to discover the best algorithm faster than baseline methods. 3) REVEAL: I developed a new conceptualization of meta-learning as a Markov Decision Process and put it into the more general framework of REVEAL games. With a master's student intern, I developed agents that learn (with reinforcement learning) to predict the next best algorithm to be tried. To develop these agents, we used surrogate toy tasks of REVEAL games. We then applied our methods to AutoML problems. The work presented in my thesis is empirical in nature. Several real-world meta-datasets were used in this research, as well as artificial and semi-artificial meta-datasets. The results indicate that RL is a viable approach to this problem, although much work remains to be done to optimize algorithms and make them scale to larger meta-learning problems.
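As a rough illustration of the matrix-filling view of meta-learning described above, the toy Python loop below evaluates a small budget of algorithms on each new task, trying them in the order of their mean score on previously seen tasks. It is a simple baseline sketch on invented data, not the REVEAL agent from the thesis.

import numpy as np
rng = np.random.default_rng(1)

# S[i, j] stands in for the (hidden) score of algorithm j on task i.
n_tasks, n_algos, budget = 20, 8, 3
S = rng.random((n_tasks, n_algos))          # made-up meta-data for illustration

history, regrets = S[:10], []               # scores already known from past tasks
ranking = np.argsort(-history.mean(axis=0)) # algorithms ordered by past mean score
for i in range(10, n_tasks):                # new tasks arrive one by one
    revealed = {j: S[i, j] for j in ranking[:budget]}  # evaluate only `budget` algorithms
    best_found = max(revealed.values())
    regrets.append(S[i].max() - best_found)            # gap to the true best algorithm
print("mean regret with budget", budget, ":", float(np.mean(regrets)))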
APA, Harvard, Vancouver, ISO, and other styles
23

Berggren, Andreas, Martin Gunnarsson, and Johannes Wallin. "Artificial intelligence as a decision support system in property development and facility management." Thesis, Högskolan i Borås, Akademin för textil, teknik och ekonomi, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:hb:diva-25535.

Full text
Abstract:
The construction industry has been hesitant for a long time to apply new technologies. In property development, the industry relies heavily on employees bringing experience from one project to another. These employees learn to manage risks in connection with the acquisition of land, but when these people retire, the knowledge disappears. An AI-based decision-support system that takes the risks and the market into account when acquiring land can learn from each project and bring this knowledge into future projects. In facility management, artificial intelligence could increase the efficiency of the allocation of staff in the ongoing operations. The purpose of the study is to analyse how companies in the real estate industry can improve their decision-making with the help of AI in property development and property management. In this study, two case studies of two different players in the real estate industry have been performed. One player, Bygg-Fast, represents property development and the other player, VGR, represents facility management. The study is based on interviews, discussions, and collected data. By mapping and then quantifying the risks and market indicators that are input data in the process, a basis can be created. The data can be used for a model that lays the foundation for an AI-based decision support system that will help the property developer to make calculated decisions in the land acquisition process. By mapping what a flow through a property looks like, measuring points can be set out to analyse how long the activities take in the specific business. These measured values provide a collection of data that makes it easier to plan the activities conducted in the property. A more efficient flow can be achieved by visualizing the entire process so staff can be allocated to the right part of the flow. By being flexible and being able to re-plan the business quickly if planning is disrupted, a high level of efficiency can be achieved. This could be done by an AI-based decision support system that simulates alternative day plans.
APA, Harvard, Vancouver, ISO, and other styles
24

So, Mee Chi. "Optimizing credit limit policy by Markov decision process models." Thesis, University of Southampton, 2009. https://eprints.soton.ac.uk/68761/.

Full text
Abstract:
Credit cards have become an essential product for most consumers. Lenders have recognized the profit that can be achieved from the credit card market and thus they have introduced different credit cards to attract consumers. As a result, the credit card market has undergone keen competition in recent years. Lenders realize their operating decisions are crucial in determining how much profit is achieved from a card. This thesis focuses on the most well-known operating policy: the management of the credit limit. Lenders traditionally applied static decision models to manage the credit limit of credit card accounts. A growing number of lenders, though, want improved models so as to monitor the long-term risk and return of credit card borrowers. This study aims to use the Markov Decision Process, a well-developed sequential decision model, to adjust the credit limit of current credit card accounts. The behavioural score, which is the way of assessing a credit card holder's default risk in the next year, is used as the key parameter to monitor the risk of every individual account. The model formulation and the corresponding application techniques, such as coarse-classification of states and the choice of Markov order, are discussed in this thesis. One major concern of using a Markov Decision Process model is the small sample size in certain states. In general, credit card lenders have lots of data. However, there may be no examples in the data of transitions from certain states to default, particularly for high-quality credit card accounts. If one simply uses zero to estimate these states' transition probabilities, this leads to apparent 'structural zero' states which change the connectedness of the dynamics in the state space. A method is developed in this thesis to overcome such problems in real applications. The economy and retail credit risk are highly correlated, so one key focus of this study is the interaction between credit card behavioural score migrations and the economy. This study uses different credit card datasets, one from Hong Kong and one from the United Kingdom, to examine the impact of the economy on credit card borrowers' behaviour. The economies in these two areas were different during the sampling period. Based on these empirical findings, this study has generalized the use of macroeconomic measurements in the credit limit models. This thesis also proposes segmenting the credit card accounts by the accounts' repayment patterns. The credit card population in general can be segmented into Transactors or Revolvers. Empirical findings show the impacts of the economy are significantly different for Transactors and Revolvers. This study provides a detailed picture of the application of Markov Decision Process models in adjusting the credit limit of credit card accounts.
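As an illustration of the kind of sequential model the abstract describes, the following Python sketch runs value iteration over behavioural-score bands with "keep" and "raise" credit-limit actions. All transition probabilities, profits and loss-given-default figures are invented placeholders, not estimates from the thesis data.

import numpy as np

states = ["high", "medium", "low", "very_low", "default"]
n, gamma = len(states), 0.95

P = {"keep":  np.array([[0.85, 0.10, 0.03, 0.01, 0.01],
                        [0.15, 0.70, 0.10, 0.03, 0.02],
                        [0.05, 0.20, 0.60, 0.10, 0.05],
                        [0.02, 0.08, 0.30, 0.50, 0.10],
                        [0.00, 0.00, 0.00, 0.00, 1.00]]),
     "raise": np.array([[0.83, 0.11, 0.03, 0.01, 0.02],
                        [0.13, 0.70, 0.11, 0.03, 0.03],
                        [0.04, 0.18, 0.60, 0.11, 0.07],
                        [0.01, 0.07, 0.28, 0.50, 0.14],
                        [0.00, 0.00, 0.00, 0.00, 1.00]])}
profit = {"keep":  np.array([10.0, 8.0, 5.0, 2.0, 0.0]),
          "raise": np.array([14.0, 11.0, 7.0, 3.0, 0.0])}
loss_given_default = {"keep": 300.0, "raise": 500.0}

R = {}
for a in P:
    R[a] = profit[a] - P[a][:, -1] * loss_given_default[a]  # expected write-off on moving into default
    R[a][-1] = 0.0                                          # the absorbing default state earns nothing further

V = np.zeros(n)
for _ in range(1000):
    V_new = np.maximum(R["keep"] + gamma * P["keep"] @ V,
                       R["raise"] + gamma * P["raise"] @ V)
    if np.max(np.abs(V_new - V)) < 1e-9:
        V = V_new
        break
    V = V_new

policy = {s: max(P, key=lambda a: (R[a] + gamma * P[a] @ V)[i]) for i, s in enumerate(states)}
print(policy)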
APA, Harvard, Vancouver, ISO, and other styles
25

Liu, Yaxin. "Decision-Theoretic Planning under Risk-Sensitive Planning Objectives." Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/6959.

Full text
Abstract:
Risk attitudes are important for human decision making, especially in scenarios where huge wins or losses are possible, as exemplified by planetary rover navigation, oil-spill response, and business applications. Decision-theoretic planners therefore need to take risk aspects into account to serve their users better. However, most existing decision-theoretic planners use simplistic planning objectives that are risk-neutral. The thesis research is the first comprehensive study of how to incorporate risk attitudes into decision-theoretic planners and solve large-scale planning problems represented as Markov decision process models. The thesis consists of three parts. The first part of the thesis work studies risk-sensitive planning in the case where exponential utility functions are used to model risk attitudes. I show that existing decision-theoretic planners can be transformed to take risk attitudes into account. Moreover, different versions of the transformation are needed if the transition probabilities are given implicitly, namely for temporally extended probabilities and probabilities given in a factored form. The second part of the thesis work studies risk-sensitive planning in the case where general nonlinear utility functions are used to model risk attitudes. I show that a state-augmentation approach can be used to reduce a risk-sensitive planning problem to a risk-neutral planning problem with an augmented state space. I further use a functional interpretation of value functions and approximation methods to solve the planning problems efficiently with value iteration. I also show an exact method for solving risk-sensitive planning problems where one-switch utility functions are used to model risk attitudes. The third part of the thesis work studies risk-sensitive planning in the case where arbitrary rewards are used. I propose a spectrum of conditions that can be used to constrain the utility function and the planning problem so that the optimal expected utilities exist and are finite. I prove that the existence and finiteness properties hold for stationary plans, where the action to perform in each state does not change over time, under different sets of conditions.
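For readers unfamiliar with exponential-utility planning, the Python sketch below shows a finite-horizon value-iteration backup under an exponential (entropic) utility. It is a generic illustration of risk-sensitive dynamic programming, not the specific transformations developed in the thesis, and the toy MDP at the end is invented.

import numpy as np

def risk_sensitive_values(P, R, gamma_risk=-0.2, horizon=50):
    """Finite-horizon value iteration with an exponential (entropic) utility, an
    illustrative stand-in for the risk-sensitive planners discussed in the abstract.
    P: dict action -> (n, n) transition matrix; R: dict action -> (n,) reward r(s, a).
    gamma_risk < 0 encodes risk aversion; as gamma_risk -> 0 the backup approaches
    the usual risk-neutral one."""
    n = next(iter(P.values())).shape[0]
    V = np.zeros(n)
    for _ in range(horizon):
        # certainty-equivalent backup:
        # V(s) = max_a [ r(s, a) + (1/gamma) * log sum_s' P(s'|s, a) * exp(gamma * V(s')) ]
        Q = np.stack([R[a] + np.log(P[a] @ np.exp(gamma_risk * V)) / gamma_risk for a in P])
        V = Q.max(axis=0)
    return V

# Toy example: state 0 offers a "safe" action (certain reward) and a "risky" action
# (higher mean, small chance of a large loss); states 1-3 are outcome/terminal states.
P = {"safe":  np.array([[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]]),
     "risky": np.array([[0.0, 0.9, 0.1, 0.0], [0.0, 0.0, 0.0, 1.0],
                        [0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]])}
R = {"safe":  np.array([0.0, 1.0, -10.0, 0.0]),
     "risky": np.array([1.5, 1.0, -10.0, 0.0])}
print(risk_sensitive_values(P, R, gamma_risk=-0.5))   # risk-averse: values state 0 as the safe action (1.0)
print(risk_sensitive_values(P, R, gamma_risk=-1e-6))  # nearly risk-neutral: prefers the risky action (1.4)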
APA, Harvard, Vancouver, ISO, and other styles
26

Zang, Peng. "Scaling solutions to Markov Decision Problems." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42906.

Full text
Abstract:
The Markov Decision Problem (MDP) is a widely applied mathematical model useful for describing a wide array of real world decision problems ranging from navigation to scheduling to robotics. Existing methods for solving MDPs scale poorly when applied to large domains where there are many components and factors to consider. In this dissertation, I study the use of non-tabular representations and human input as scaling techniques. I will show that the joint approach has desirable optimality and convergence guarantees, and demonstrates several orders of magnitude speedup over conventional tabular methods. Empirical studies of speedup were performed using several domains including a clone of the classic video game, Super Mario Bros. In the course of this work, I will address several issues including: how approximate representations can be used without losing convergence and optimality properties, how human input can be solicited to maximize speedup and user engagement, and how that input should be used so as to insulate against possible errors.
APA, Harvard, Vancouver, ISO, and other styles
27

Hudson, Joshua. "A Partially Observable Markov Decision Process for Breast Cancer Screening." Thesis, Linköpings universitet, Statistik och maskininlärning, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-154437.

Full text
Abstract:
In the US, breast cancer is one of the most common and most lethal forms of cancer. There are many decisions that must be made by the doctor and/or the patient when dealing with a potential breast cancer. Many of these decisions are made under uncertainty, whether it is the uncertainty related to the progression of the patient's health, or that related to the accuracy of the doctor's tests. Each possible action under consideration can have positive effects, such as a surgery successfully removing a tumour, and negative effects: a post-surgery infection, for example. The human mind simply cannot take into account all the variables involved and possible outcomes when making these decisions. In this report, a detailed Partially Observable Markov Decision Process (POMDP) for breast cancer screening decisions is presented. It includes 151 states, covering 144 different cancer states, and 2 competing screening methods. The necessary parameters were first set up using relevant medical literature and a patient history simulator. The POMDP was then solved for an infinite horizon using the point-based Perseus algorithm. The resulting policy provided several recommendations for breast cancer screening. The results indicated that clinical breast examinations are important for screening younger women. Regarding the decision to operate on a woman with breast cancer, the policy showed that invasive cancers with either a tumour size above 1.5 cm or which are in metastasis should be surgically removed as soon as possible. However, the policy also recommended that patients who are certain to be healthy should have a breast biopsy. The cause of this error was explored further and the conclusion was reached that a finite horizon may be more appropriate for this application.
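The belief update that point-based solvers such as Perseus rely on can be written in a few lines. The Python fragment below is a generic two-state illustration with assumed screening sensitivity and specificity; it is not the 151-state model from the thesis.

import numpy as np

def belief_update(b, T, O_col):
    """Standard POMDP belief update (Bayes filter).
    b: belief over states; T[s, s'] = P(s' | s, action); O_col[s'] = P(observation | s')."""
    predicted = b @ T                    # predict the next-state distribution
    unnormalised = predicted * O_col     # weight by the likelihood of the observation
    return unnormalised / unnormalised.sum()

states = ["healthy", "cancer"]
T_wait = np.array([[0.99, 0.01],         # assumed small chance of onset between screenings
                   [0.00, 1.00]])
# P(mammogram result | true state): assumed sensitivity 0.85, specificity 0.90
O_positive = np.array([0.10, 0.85])
O_negative = np.array([0.90, 0.15])

b = np.array([0.95, 0.05])
print(belief_update(b, T_wait, O_positive))   # belief after a positive mammogram
print(belief_update(b, T_wait, O_negative))   # belief after a negative mammogram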
APA, Harvard, Vancouver, ISO, and other styles
28

Tabaeh, Izadi Masoumeh. "On knowledge representation and decision making under uncertainty." Thesis, McGill University, 2007. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=103012.

Full text
Abstract:
Designing systems with the ability to make optimal decisions under uncertainty is one of the goals of artificial intelligence. However, in many applications the design of optimal planners is complicated due to imprecise inputs and uncertain outputs resulting from stochastic dynamics. Partially Observable Markov Decision Processes (POMDPs) provide a rich mathematical framework to model these kinds of problems. However, the high computational demand of solution methods for POMDPs is a drawback for applying them in practice.
In this thesis, we present a two-fold approach for improving the tractability of POMDP planning. First, we focus on designing good heuristics for POMDP approximation algorithms. We aim to scale up the efficiency of a class of POMDP approximations called point-based planning methods by designing a good planning space. We study the effect of three properties of reachable belief state points that may influence the performance of point-based approximation methods. Second, we investigate approaches to designing good controllers using an alternative representation of systems with partial observability called Predictive State Representation (PSR). This part of the thesis advocates the usefulness and practicality of PSRs in planning under uncertainty. We also attempt to move some useful characteristics of the PSR model, which has a predictive view of the world, to the POMDP model, which has a probabilistic view of the hidden states of the world. We propose a planning algorithm motivated by the connections between the two models.
APA, Harvard, Vancouver, ISO, and other styles
29

Lommel, Peter Hans. "An extended Kalman filter extension of the augmented Markov decision process." Thesis, Massachusetts Institute of Technology, 2005. http://hdl.handle.net/1721.1/32453.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2005.
Includes bibliographical references (p. 99-102).
As the field of robotics continues to mature, individual robots are increasingly capable of performing multiple complex tasks. As a result, the ability for robots to move autonomously through their environments is a fundamental necessity. If perfect knowledge of the robot's position is available, the robot motion planning problem can be solved efficiently using any of a number of existing algorithms. Frequently though, the robot's position can only be estimated using incomplete and imperfect information from its sensors and an approximate model of its dynamics. Algorithms which assume perfect knowledge of the robot's position can still be applied by treating the mean or maximum likelihood estimate of the robot's position as certain. However, unless the uncertainty in the agent's position is very small, this approach is not reliable. In order to perform optimally in this situation, planners, such as the partially observable Markov decision process, plan over the entire set of beliefs (distributions over the robot's position). Unfortunately, this approach is only tractable for problems with very few states. Between these two extreme approaches, however, lies a continuum of possible planners which plan over a subset of the belief space. The difficulty that these planners face is choosing and representing a minimal subset of the belief space which spans the set of beliefs that the robot will actually experience. In this paper, we show that there exists a very natural such set: the set of Gaussian beliefs. By combining an extended Kalman filter with an augmented Markov decision process, we create a path planner which efficiently plans over a discrete approximation of the set of Gaussian beliefs. The resulting planner is demonstrated via simulation to be both computationally tractable and robust to uncertainty in the robot's position.
by Peter Hans Lommel.
S.M.
APA, Harvard, Vancouver, ISO, and other styles
30

Hamroun, Youcef F. "The decision-making process in metropolitan planning organizations." Access to citation, abstract and download form provided by ProQuest Information and Learning Company; downloadable PDF file 0.19 Mb., p, 2006. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&res_dat=xri:pqdiss&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&rft_dat=xri:pqdiss:1435819.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Balali, Samaneh. "Incorporating expert judgement into condition based maintenance decision support using a coupled hidden markov model and a partially observable markov decision process." Thesis, University of Strathclyde, 2012. http://oleg.lib.strath.ac.uk:80/R/?func=dbin-jump-full&object_id=19510.

Full text
Abstract:
Preventive maintenance consists of activities performed to maintain a system in a satisfactory functional condition. Condition Based Maintenance (CBM) aims to reduce the cost of preventive maintenance by supporting decisions on performing maintenance actions, based on information reflecting a system's health condition. In practice, the condition-related information can be obtained in various ways, including continuous condition monitoring performed by sensors, or subjective assessment performed by humans. An experienced engineer might provide such subjective assessment by visually inspecting a system, or by interpreting the data collected by condition monitoring devices, and hence give an 'expert judgement' on the state of the system. There is limited academic literature on the development of CBM models incorporating expert judgement. This research aims to reduce this gap by developing models that formally incorporate expert judgement into the CBM decision process. A Coupled Hidden Markov Model is proposed to model the evolutionary relationship between expert judgement and the true deterioration state of a system. This model is used to estimate the underlying condition of the system and predict the remaining time to failure. A training algorithm is developed to support model parameter estimation. The algorithm's performance is evaluated with respect to the number of expert judgements and initial settings of model parameters. A decision-making problem is formulated to account for the use of expert judgement in selecting maintenance actions in light of the physical investigation of the system's condition. A Partially Observable Markov Decision Process is proposed to recommend the most cost-effective decisions on inspection choice and maintenance action in two consecutive steps. An approximate method is developed to solve the proposed decision optimisation model and obtain the optimal policy. The sensitivity of the optimal policy is evaluated with respect to model parameter settings, such as the accuracy of the expert judgement.
APA, Harvard, Vancouver, ISO, and other styles
32

Chang, Yanling. "A leader-follower partially observed Markov game." Diss., Georgia Institute of Technology, 2015. http://hdl.handle.net/1853/54407.

Full text
Abstract:
The intent of this dissertation is to generate a set of non-dominated finite-memory policies from which one of two agents (the leader) can select a most preferred policy to control a dynamic system that is also affected by the control decisions of the other agent (the follower). The problem is described by an infinite horizon total discounted reward, partially observed Markov game (POMG). Each agent’s policy assumes that the agent knows its current and recent state values, its recent actions, and the current and recent possibly inaccurate observations of the other agent’s state. For each candidate finite-memory leader policy, we assume the follower, fully aware of the leader policy, determines a policy that optimizes the follower’s criterion. The leader-follower assumption allows the POMG to be transformed into a specially structured, partially observed Markov decision process that we use to determine the follower’s best response policy for a given leader policy. We then present a value determination procedure to evaluate the performance of the leader for a given leader policy, based on which a non-dominated set of leader policies can be selected by existing heuristic approaches. We then analyze how the value of the leader’s criterion changes due to changes in the leader’s quality of observation of the follower. We give conditions that ensure improved observation quality will improve the leader’s value function, assuming that changes in the observation quality do not cause the follower to change its policy. We show that discontinuities in the value of the leader’s criterion, as a function of observation quality, can occur when the change of observation quality is significant enough for the follower to change its policy. We present conditions that determine when a discontinuity may occur and conditions that guarantee a discontinuity will not degrade the leader’s performance. This framework has been used to develop a dynamic risk analysis approach for U.S. food supply chains and to compare and create supply chain designs and sequential control strategies for risk mitigation.
APA, Harvard, Vancouver, ISO, and other styles
33

Woodward, Mark P. "Framing Human-Robot Task Communication as a Partially Observable Markov Decision Process." Thesis, Harvard University, 2012. http://dissertations.umi.com/gsas.harvard:10188.

Full text
Abstract:
As general purpose robots become more capable, pre-programming of all tasks at the factory will become less practical. We would like for non-technical human owners to be able to communicate, through interaction with their robot, the details of a new task; I call this interaction "task communication". During task communication the robot must infer the details of the task from unstructured human signals, and it must choose actions that facilitate this inference. In this dissertation I propose the use of a partially observable Markov decision process (POMDP) for representing the task communication problem; with the unobservable task details and unobservable intentions of the human teacher captured in the state, with all signals from the human represented as observations, and with the cost function chosen to penalize uncertainty. This dissertation presents the framework, works through an example of framing task communication as a POMDP, and presents results from a user experiment where subjects communicated a task to a POMDP-controlled virtual robot and to a human-controlled virtual robot. The task communicated in the experiment consisted of a single object movement and the communication in the experiment was limited to binary approval signals from the teacher. This dissertation makes three contributions: 1) It frames human-robot task communication as a POMDP, a widely used framework. This enables the leveraging of techniques developed for other problems framed as POMDPs. 2) It provides an example of framing a task communication problem as a POMDP. 3) It validates the framework through results from a user experiment. The results suggest that the proposed POMDP framework produces robots that are robust to teacher error, that can accurately infer task details, and that are perceived to be intelligent.
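One simple way to "penalize uncertainty", as the abstract puts it, is to charge a cost proportional to the entropy of the robot's belief over candidate tasks. The short Python sketch below illustrates this idea; it is not necessarily the exact cost function used in the dissertation.

import numpy as np

def uncertainty_cost(belief, scale=1.0):
    """Illustrative cost: the Shannon entropy of the belief over candidate task hypotheses."""
    p = np.asarray(belief, dtype=float)
    p = p / p.sum()
    nz = p[p > 0]
    return scale * -(nz * np.log(nz)).sum()

# The robot prefers actions expected to shrink this entropy:
print(uncertainty_cost([0.25, 0.25, 0.25, 0.25]))  # maximal uncertainty over 4 tasks
print(uncertainty_cost([0.97, 0.01, 0.01, 0.01]))  # nearly certain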
Engineering and Applied Sciences
APA, Harvard, Vancouver, ISO, and other styles
34

Morad, Olivia. "Prefetching control for on-demand contents distribution : a Markov decision process study." Thesis, Montpellier 2, 2014. http://www.theses.fr/2014MON20111/document.

Full text
Abstract:
The thesis context is concerned with the control of On-demand content distribution networks. The performance of such interactive distributed systems basically depends on the prediction of the user behavior and on the bandwidth as a critical network resource. Prefetching is a well-known predictive approach in the World Wide Web which avoids response delays by exploiting some downtime that permits anticipating the user's future requests, and takes advantage of the available network resources. Prefetching control is a vital operation for On-demand interactive systems where instantaneous response is the crucial factor for the system's success. The controller in such a type of interactive system operates in an uncertain environment and makes sequences of decisions with long and short term stochastic effects. The difficulty, then, is to determine at every system state which contents to prefetch into the cache. The prefetching plan during an interactive streaming session can be modeled as a sequential decision making problem by a Markov Decision Process (MDP). We focus on the prefetching control problem in which the controller seeks to reach a Zero-Cost system state as quickly as possible. We model this control problem as a Negative Stochastic Dynamic Programming problem in which we minimize the undiscounted total expected cost. Within this context, we addressed the following research questions: 1) How to provide an optimal/approximately-optimal prefetching policy that maximizes the bandwidth utilization while minimizing the user's blocking and latency costs incurred along the way? 2) How to exploit structure in the prefetching control model to help efficiently compute such a prefetching control policy with both computational effort and storage memory reduction? 3) How to conduct a performance evaluation study to evaluate different prefetching heuristic algorithms based on the context of control optimization rather than an empirical/simulation context? For studying our research problem, we developed our MDP prefetching control model, PREF-CT, established its theoretical properties and solved it by the Value Iteration algorithm as the MDP algorithm for computing the optimal prefetching policy. For computing the optimal prefetching policy efficiently, we detected a special structure that achieves a more compact control model. This special structure permits the development of two strategically different algorithms which improve the complexity of computing the optimal prefetching policy: the first one is ONE-PASS, which is based mainly on solving a system of linear equations simultaneously in only one iteration, whereas the second is TREE-DEC, which is based on a Markov decision tree decomposition in which sequential sets of systems of equations are solved. For overcoming the curse of dimensionality resulting from the computation of the optimal prefetching policy, we proposed a prefetching heuristic algorithm: the Relevant Blocks Prefetching algorithm (RBP). For evaluating and comparing prefetching policies computed by different prefetching heuristic algorithms, we presented a framework based on different performance measures. We applied the suggested framework under different cost configurations and different user behaviors to evaluate the prefetching policies computed by our proposed prefetching heuristic algorithm, the RBP. Compared to the optimal prefetching policies, the experimental analysis showed significant performance of the prefetching policies of the RBP heuristic algorithm. In addition, the RBP prefetching heuristic algorithm is distinguished by a clustering property which is important for significantly reducing the memory necessary to store the prefetching policy at the controller.
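A minimal illustration of the underlying idea (minimise the undiscounted expected total cost until an absorbing zero-cost state is reached) is the stochastic-shortest-path value iteration below. The tiny cache states, actions and numbers are invented for the sketch and are not the PREF-CT model.

import numpy as np

states = ["empty", "partial", "full"]        # "full" is the zero-cost absorbing state
actions = ["idle", "prefetch"]
cost = {"idle":     np.array([2.0, 1.0, 0.0]),     # latency/blocking cost per step (made up)
        "prefetch": np.array([3.0, 2.0, 0.0])}     # prefetching also spends bandwidth
P = {"idle":     np.array([[1.0, 0.0, 0.0],
                           [0.2, 0.8, 0.0],
                           [0.0, 0.0, 1.0]]),
     "prefetch": np.array([[0.3, 0.7, 0.0],
                           [0.0, 0.4, 0.6],
                           [0.0, 0.0, 1.0]])}

V = np.zeros(len(states))
for _ in range(10_000):
    Q = np.stack([cost[a] + P[a] @ V for a in actions])
    V_new = Q.min(axis=0)
    if np.max(np.abs(V_new - V)) < 1e-10:
        V = V_new
        break
    V = V_new

policy = {s: actions[int(np.argmin([cost[a][i] + P[a][i] @ V for a in actions]))]
          for i, s in enumerate(states)}
print(V, policy)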
APA, Harvard, Vancouver, ISO, and other styles
35

HUANG, HEXIANG. "APPLICATION OF VISUALIZATION IN URBAN PLANNING DECISION-MAKING PROCESS." University of Cincinnati / OhioLINK, 2004. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1100979099.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Selvi, Ersin Suleyman. "Cognitive Radar Applied To Target Tracking Using Markov Decision Processes." Thesis, Virginia Tech, 2018. http://hdl.handle.net/10919/81968.

Full text
Abstract:
The radio-frequency spectrum is a precious resource, with many applications and users, especially with the recent spectrum auction in the United States. Future platforms and devices, such as radars and radios, need to be adaptive to their spectral environment in order to continue serving the needs of their users. This thesis considers an environment with one tracking radar, a single target, and a communications system. The radar-communications coexistence problem is modeled as a Markov decision process (MDP), and reinforcement learning is applied to drive the radar to optimal behavior.
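A minimal sketch of the kind of learning loop this setup suggests is tabular Q-learning over sub-band choices. The toy state, reward and dynamics below are assumptions for illustration, not the model used in the thesis.

import numpy as np
rng = np.random.default_rng(0)

# State: the sub-band currently occupied by the communications system.
# Action: the sub-band the radar transmits in. A collision is penalised.
n_bands = 4
Q = np.zeros((n_bands, n_bands))        # Q[state, action]
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(comm_band, radar_band):
    reward = -1.0 if radar_band == comm_band else 1.0       # collision vs clean measurement
    next_comm = (comm_band + rng.integers(0, 2)) % n_bands  # the comm user drifts between bands
    return reward, next_comm

s = 0
for _ in range(20_000):
    a = rng.integers(n_bands) if rng.random() < eps else int(np.argmax(Q[s]))
    r, s_next = step(s, a)
    Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
    s = s_next

print(np.argmax(Q, axis=1))   # learned band choice for each occupied band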
Master of Science
APA, Harvard, Vancouver, ISO, and other styles
37

Castro, Rivadeneira Pablo Samuel. "On planning, prediction and knowledge transfer in fully and partially observable Markov decision processes." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=104525.

Full text
Abstract:
This dissertation addresses the problem of sequential decision making under uncertainty in large systems. The formalisms used to study this problem are fully and partially observable Markov Decision Processes (MDPs and POMDPs, respectively). The first contribution of this dissertation is a theoretical analysis of the behavior of POMDPs when only subsets of the observation set are used. One of these subsets is used to update the agent's state estimate, while the other subset contains observations the agent is interested in predicting and/or optimizing. The behaviors are formalized as three types of equivalence relations. The first groups states based on their values under optimal or general policies; the second groups states according to their ability to predict observation sequences; the third type is based on bisimulation, a well-known equivalence relation borrowed from concurrency theory. Bisimulation relations can be generalized to bisimulation metrics. This dissertation introduces bisimulation metrics for an MDP with temporally extended actions (formalized as options) and proposes a new bisimulation metric that provides a tighter bound on the difference in optimal values. A new proof is provided for the convergence of an approximation method for computing bisimulation metrics that is based on statistical sampling, using only a finite number of samples. The new proof allows one to determine the minimum number of samples needed in order to achieve the desired quality of approximation with high probability. Although bisimulation metrics have previously been used for state space compression, this dissertation proposes using them to transfer policies from one MDP to another. In contrast to existing transfer work, the mapping between the two systems is determined automatically by means of the bisimulation metrics. Theoretical results are provided that bound the loss in optimality incurred by the transferred policy. A number of algorithms are introduced which are evaluated empirically in the context of planning and learning.
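For readers new to bisimulation metrics, the Python sketch below iterates a commonly used fixed-point formulation (reward difference plus a discounted Kantorovich term, in the style of Ferns et al.). It is an illustrative implementation only, not the thesis's new metric or its sampling-based approximation, and the tiny example data are invented.

import numpy as np
from scipy.optimize import linprog

def kantorovich(p, q, d):
    """1-Wasserstein distance between distributions p and q over states, with ground
    metric d, solved as a transportation linear program."""
    n = len(p)
    c = d.reshape(-1)                       # cost of moving mass from state i to state j
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0    # row marginals equal p
        A_eq[n + i, i::n] = 1.0             # column marginals equal q
    b_eq = np.concatenate([p, q])
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0, None), method="highs")
    return res.fun

def bisimulation_metric(P, R, gamma=0.9, iters=50):
    """Iterate d(s,t) = max_a [(1-gamma)|R(s,a)-R(t,a)| + gamma * W_d(P(.|s,a), P(.|t,a))].
    P[a]: (n, n) transition matrix; R[a]: (n,) rewards."""
    n = next(iter(P.values())).shape[0]
    d = np.zeros((n, n))
    for _ in range(iters):
        d_new = np.zeros((n, n))
        for s in range(n):
            for t in range(s + 1, n):
                gap = max((1 - gamma) * abs(R[a][s] - R[a][t])
                          + gamma * kantorovich(P[a][s], P[a][t], d)
                          for a in P)
                d_new[s, t] = d_new[t, s] = gap
        if np.allclose(d_new, d, atol=1e-6):
            return d_new
        d = d_new
    return d

# Tiny example (numbers invented): states 0 and 1 behave identically, state 2 differs.
P = {"a": np.array([[0.5, 0.5, 0.0], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])}
R = {"a": np.array([1.0, 1.0, 0.0])}
print(bisimulation_metric(P, R))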
APA, Harvard, Vancouver, ISO, and other styles
38

Chen, Yu Fan Ph D. Massachusetts Institute of Technology. "Hierarchical decomposition of multi-agent Markov decision processes with application to health aware planning." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/93795.

Full text
Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2014.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 99-104).
Multi-agent robotic systems have attracted the interest of both researchers and practitioners because they provide more capabilities and afford greater flexibility than single-agent systems. Coordination of individual agents within large teams is often challenging because of the combinatorial nature of such problems. In particular, the number of possible joint configurations is the product of that of every agent. Further, real-world applications often contain various sources of uncertainty. This thesis investigates techniques to address the scalability issue of multi-agent planning under uncertainty. This thesis develops a novel hierarchical decomposition approach (HD-MMDP) for solving Multi-agent Markov Decision Processes (MMDPs), which provide a natural framework for formulating stochastic sequential decision-making problems. In particular, the HD-MMDP algorithm builds a decomposition structure by exploiting coupling relationships in the reward function. A number of smaller subproblems are formed and are solved individually. The planning space of each subproblem is much smaller than that of the original problem, which improves computational efficiency, and the solutions to the subproblems can be combined to form a solution (policy) to the original problem. The HD-MMDP algorithm is applied to a ten-agent persistent search and track (PST) mission and shows more than 35% improvement over an existing algorithm developed specifically for this domain. This thesis also contributes to the development of the software infrastructure that enables hardware experiments involving multiple robots. In particular, the thesis presents a novel optimization-based multi-agent path planning algorithm, which was tested in simulation and in a hardware (quadrotor) experiment. The HD-MMDP algorithm is also used to solve a multi-agent intruder monitoring mission implemented using real robots.
by Yu Fan Chen.
S.M.
APA, Harvard, Vancouver, ISO, and other styles
39

West, Aaron P. "A decision support system for fabrication process planning in stereolithography." Thesis, Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/16896.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Rezende, Marisa Barcia Guaraldo Marcondes. "Planning and citizenship : decision-making in the planning process in Campo Grande, MS, Brazil." Thesis, University of Sheffield, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.240366.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Erdogdu, Utku. "Efficient Partially Observable Markov Decision Process Based Formulation Of Gene Regulatory Network Control Problem." Phd thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12614317/index.pdf.

Full text
Abstract:
The need to analyze and closely study gene-related mechanisms motivated the research on the modeling and control of gene regulatory networks (GRN). Different approaches exist to model GRNs; they are mostly simulated as mathematical models that represent relationships between genes. Though it turns into a more challenging problem, we argue that partial observability would be a more natural and realistic method for handling the control of GRNs. Partial observability is a fundamental aspect of the problem; it is mostly ignored and substituted by the assumption that the states of the GRN are known precisely, prescribed as full observability. On the other hand, current works addressing partial observability focus on formulating algorithms for the finite horizon GRN control problem. So, in this work we explore the feasibility of realizing the problem in a partially observable setting, mainly with Partially Observable Markov Decision Processes (POMDP). We proposed a POMDP formulation for the infinite horizon version of the problem. Knowing that POMDP problems suffer from the curse of dimensionality, we also proposed a POMDP solution method that automatically decomposes the problem by isolating different unrelated parts of the problem, and then solves the reduced subproblems. We also proposed a method to enrich gene expression data sets given as input to the POMDP control task, because available data sets contain thousands of genes but only tens or rarely hundreds of samples. The method is based on the idea of generating more than one model using the available data sets, then sampling data from each of the models, and finally filtering the generated samples with the help of metrics that measure the compatibility, diversity and coverage of the newly generated samples.
APA, Harvard, Vancouver, ISO, and other styles
42

Ni, Wenlong. "Optimal call admission control policies in wireless cellular networks using Semi Markov Decision Process /." Connect to full text in OhioLINK ETD Center, 2008. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=toledo1227028023.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Wongsuphasawat, Luxmon. "Extended metropolitanisation and the process of industrial location decision-making in Thailand." Thesis, University of Hull, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.264954.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Chippa, Mukesh K. "Goal-seeking Decision Support System to Empower Personal Wellness Management." University of Akron / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=akron1480413936639467.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Allen, Martha Paralee. "A constructivist study of the decision-making process in permanency planning." CSUSB ScholarWorks, 1993. https://scholarworks.lib.csusb.edu/etd-project/688.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Gokhale, Mihir. "Use of analytical hierarchy process in university strategy planning." Diss., Rolla, Mo. : University of Missouri-Rolla, 2007. http://scholarsmine.mst.edu/thesis/pdf/thesis_mihir_09007dcc804ef452.pdf.

Full text
Abstract:
Thesis (M.S.)--University of Missouri--Rolla, 2007.
Vita. The entire thesis text is included in file. Title from title screen of thesis/dissertation PDF file (viewed April 29, 2008) Includes bibliographical references (p. 96-100).
APA, Harvard, Vancouver, ISO, and other styles
47

Alasmari, Khalid R. "Novel methods for Reduced Energy and Time Consumption for Mobile Devices using Markov Decision Process." University of Toledo / OhioLINK, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1587906971444328.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Huh, Keun S. M. Massachusetts Institute of Technology. "Asian real estate investment : data utilization for the decision making process." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/42020.

Full text
Abstract:
Thesis (S.M. in Real Estate Development)--Massachusetts Institute of Technology, Dept. of Urban Studies and Planning, 2007.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Includes bibliographical references (leaf 38).
Many investors in developed countries believe the Asian emerging market to be highly risky due to numerous uncertainties, including limited market information for making sound investment decisions. Nevertheless, successful investment deals have been completed by many investors already in such markets, and how those investors make sound investment decisions is in need of explanation. Therefore, this thesis explores institutional real estate investors' behavior on entering Asian emerging markets, their investment decision-making process, and data utilization within each process. Through 14 qualitative in-depth interviews with key institutional investors in the U.S. and Korea, investment decision-making processes and data utilization behaviors are examined and summarized in a generalized format. Interview results indicate that an initial entrance to Asian emerging markets occurs through a passive introduction of a potential deal by a third party rather than through an active search for a target deal by the investor. Also, the general decision-making process for real estate investment starts from sourcing and sorting, followed by review of a potential deal, a primary investment committee meeting, due diligence, a secondary investment committee meeting, and close of the deal. Each stage of the decision-making process is highly interconnected with simultaneous communication among responsible teams and the investment committee. The Korean real estate market, at least for the office sector, has such abundant market information available that it is rather perceived as "emerged." Investors also have extensive experience in the market and use their market knowledge and third party information as a reference point to verify the deal information. However, they also physically collect market data and use their own internal transaction records when necessary. Furthermore, data utilization in Korea is found to be easy and convenient, especially when combined with extensive market experience and a wide professional network in the industry.
by Keun Huh.
S.M.in Real Estate Development
APA, Harvard, Vancouver, ISO, and other styles
49

Ngai, Ka-kui. "Web-based intelligent decision support system for optimization of polishing process planning." Click to view the E-thesis via HKUTO, 2007. http://sunzi.lib.hku.hk/HKUTO/record/B39558472.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Ngai, Ka-kui, and 魏家駒. "Web-based intelligent decision support system for optimization of polishing process planning." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2007. http://hub.hku.hk/bib/B39558472.

Full text
APA, Harvard, Vancouver, ISO, and other styles