Dissertations / Theses on the topic 'Sequential decision processes'
Consult the top 20 dissertations / theses for your research on the topic 'Sequential decision processes.'
Saebi, Nasrollah. "Sequential decision procedures for point processes." Thesis, Birkbeck (University of London), 1987. http://eprints.kingston.ac.uk/8409/.
Ramsey, David Mark. "Models of evolution, interaction and learning in sequential decision processes." Thesis, University of Bristol, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.239085.
Wang, You-Gan. "Contributions to the theory of Gittins indices : with applications in pharmaceutical research and clinical trials." Thesis, University of Oxford, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.293423.
El Khalfi, Zeineb. "Lexicographic refinements in possibilistic sequential decision-making models." Thesis, Toulouse 3, 2017. http://www.theses.fr/2017TOU30269/document.
This work contributes to possibilistic decision theory and more specifically to sequential decision-making under possibilistic uncertainty, at both the theoretical and practical levels. Even though appealing for its ability to handle qualitative decision problems, possibilistic decision theory suffers from an important drawback: qualitative possibilistic utility criteria compare acts through min and max operators, which leads to a drowning effect. To overcome this lack of decision power, several refinements have been proposed in the literature. Lexicographic refinements are particularly appealing since they allow one to benefit from the expected utility background while remaining "qualitative". However, these refinements are defined only for non-sequential decision problems. In this thesis, we present results on the extension of lexicographic preference relations to sequential decision problems, in particular to possibilistic decision trees and Markov decision processes. This leads to new planning algorithms that are more "decisive" than their original possibilistic counterparts. We first present optimistic and pessimistic lexicographic preference relations between policies, with and without intermediate utilities, that refine the optimistic and pessimistic qualitative utilities respectively. We prove that these new criteria satisfy the principle of Pareto efficiency as well as the property of strict monotonicity; the latter guarantees that a dynamic programming algorithm can be used to compute lexicographic optimal policies. Considering the problem of policy optimization in possibilistic decision trees and finite-horizon Markov decision processes, we provide adaptations of the dynamic programming algorithm that compute a lexicographic optimal policy in polynomial time. These algorithms are based on the lexicographic comparison of the matrices of trajectories associated with the sub-policies. This algorithmic work is completed with an experimental study that shows the feasibility and the interest of the proposed approach. We then prove that the lexicographic criteria still benefit from an expected utility grounding and can be represented by infinitesimal expected utilities. The last part of our work is devoted to policy optimization in (possibly infinite) stationary Markov decision processes. We propose a value iteration algorithm for the computation of lexicographic optimal policies and extend these results to the infinite-horizon case. Since the size of the matrices increases exponentially (which is especially problematic in the infinite-horizon case), we propose an approximation algorithm that keeps only the most informative part of each matrix of trajectories, namely the first rows and columns. Finally, we report experimental results that show the effectiveness of the algorithms based on this truncation of the matrices.
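To make the drowning effect concrete, here is a minimal illustrative sketch (ours, not code from the thesis; the leximax_better helper and the example vectors are hypothetical): under the plain optimistic qualitative criterion only the maximum degree matters, so two acts can tie even when one dominates the other, whereas a leximax comparison of the sorted vectors breaks the tie.

    # Illustrative sketch only (not the thesis's algorithm): leximax comparison of
    # two vectors of qualitative utility degrees. Under the plain optimistic
    # criterion only max() matters, so [0.9, 0.1] and [0.9, 0.9] are judged
    # equivalent (drowning effect); leximax compares the sorted vectors position
    # by position and separates them.
    def leximax_better(u, v):
        """True if u is strictly leximax-preferred to v (vectors of equal length)."""
        for a, b in zip(sorted(u, reverse=True), sorted(v, reverse=True)):
            if a != b:
                return a > b
        return False  # identical once sorted: indifference

    print(max([0.9, 0.9]) == max([0.9, 0.1]))      # True: the optimistic criterion ties
    print(leximax_better([0.9, 0.9], [0.9, 0.1]))  # True: leximax breaks the tie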
Raffensperger, Peter Abraham. "Measuring and Influencing Sequential Joint Agent Behaviours." Thesis, University of Canterbury. Electrical and Computer Engineering, 2013. http://hdl.handle.net/10092/7472.
Dulac-Arnold, Gabriel. "A General Sequential Model for Constrained Classification." Thesis, Paris 6, 2014. http://www.theses.fr/2014PA066572.
This thesis introduces a body of work on sequential models for classification. These models allow for a more flexible and general approach to classification tasks. Many tasks ultimately require the classification of some object, but cannot be handled with a single atomic classification step. This is the case for tasks where information is either not immediately available upfront, or where the act of accessing different aspects of the object being classified may incur various costs (in time, computational power, money, etc.). The goal of this thesis is to introduce a new method, which we call datum-wise classification, that is able to handle these more complex classification tasks by modelling them as sequential processes.
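As a rough illustration of the kind of sequential, cost-aware classification described above (a toy sketch of ours, not the datum-wise model itself; the threshold-based stopping rule and the acquire_feature/score callables are hypothetical), an agent can acquire features one at a time and stop as soon as it is confident enough or its budget is exhausted.

    # Toy sketch (not the thesis's method): sequential classification where each
    # feature access costs one unit of budget and the process stops early once a
    # simple confidence threshold is reached.
    def classify_sequentially(acquire_feature, score, n_features, budget, threshold=0.9):
        """acquire_feature(i) -> value of feature i; score(feats) -> (label, confidence)."""
        features = {}
        for i in range(n_features):
            if budget <= 0:
                break
            features[i] = acquire_feature(i)  # pay the access cost
            budget -= 1
            label, confidence = score(features)
            if confidence >= threshold:
                return label, features        # confident enough: classify now
        return score(features)[0], features   # budget or features exhausted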
Warren, Adam L. "Sequential decision-making under uncertainty." Thesis, McMaster University, 2004. (Access restricted to McMaster.)
Zawaideh, Zaid. "Eliciting preferences sequentially using partially observable Markov decision processes." Thesis, McGill University, 2008. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=18794.
Decision support systems have recently gained in importance. Yet one of the important problems in the design of such systems remains: understanding how the user values the different outcomes, or, more simply, determining the user's preferences. Preference elicitation aims to remove some of the arbitrariness from the design of decision agents by offering more formal methods for measuring the quality of outcomes. This thesis addresses several problems related to preference elicitation, such as the high dimensionality of the underlying problem. The problem is formulated as a partially observable Markov decision process (POMDP) and uses a factored representation in order to exploit the structure inherent in preference elicitation problems. Moreover, simple knowledge about the characteristics of these problems is exploited to obtain more precise preferences without increasing the burden on the user. Sparse terminal actions are defined so as to allow a flexible trade-off between speed and precision. The result is a system flexible enough to be applied to a wide range of domains that face the problems associated with preference elicitation methods.
Hoock, Jean-Baptiste. "Contributions to Simulation-based High-dimensional Sequential Decision Making." PhD thesis, Université Paris Sud - Paris XI, 2013. http://tel.archives-ouvertes.fr/tel-00912338.
Filho, Ricardo Shirota. "Processos de decisão Markovianos com probabilidades imprecisas e representações relacionais: algoritmos e fundamentos." Universidade de São Paulo, 2012. http://www.teses.usp.br/teses/disponiveis/3/3152/tde-13062013-160912/.
This work is devoted to the theoretical and algorithmic development of Markov Decision Processes with Imprecise Probabilities and relational representations. In the literature, this combination is important within artificial intelligence planning, where relational representations allow compact encodings and imprecise probabilities capture a more general form of uncertainty. There are three main contributions. First, we present a brief discussion of the foundations of decision making with imprecise probabilities, pointing towards key questions that remain unanswered. These results have a direct influence on the model discussed within this text, that is, Markov Decision Processes with Imprecise Probabilities. Second, we propose three algorithms for Markov Decision Processes with Imprecise Probabilities based on mathematical programming. Third, we develop ideas proposed by Trevizan, Cozman, and de Barros (2008) on the use of variants of Real-Time Dynamic Programming to solve probabilistic planning problems described by an extension of the Probabilistic Planning Domain Definition Language (PPDDL).
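For orientation, a standard way to write the pessimistic (maximin) Bellman equation for a Markov decision process with imprecise probabilities, where K(s,a) denotes the credal set of admissible transition distributions, is the following (a textbook-style formulation, not necessarily the exact criterion adopted in this thesis):

    V(s) = \max_{a \in A} \; \min_{P \in K(s,a)} \Big[ r(s,a) + \gamma \sum_{s'} P(s' \mid s,a)\, V(s') \Big]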
Ernsberger, Timothy S. "Integrating Deterministic Planning and Reinforcement Learning for Complex Sequential Decision Making." Case Western Reserve University School of Graduate Studies / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=case1354813154.
Couetoux, Adrien. "Monte Carlo Tree Search for Continuous and Stochastic Sequential Decision Making Problems." Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112192.
In this thesis, we study sequential decision-making problems, with a focus on the unit commitment problem. Traditionally solved by dynamic programming methods, this problem is still a challenge, due to its high dimension and to the sacrifices made on the accuracy of the model in order to apply state-of-the-art methods. We investigate the applicability of Monte Carlo Tree Search methods to this problem and to other single-player, stochastic, continuous sequential decision-making problems. We started by extending traditional finite-state MCTS to continuous domains, with a method called Double Progressive Widening (DPW). This method relies on two hyperparameters that determine the ratio between width and depth in the nodes of the tree. We developed a heuristic called Blind Value (BV) to improve the exploration of new actions, using the information from past simulations. We also extended the RAVE heuristic to continuous domains. Finally, we proposed two new ways of backing up information through the tree that improved the convergence speed considerably on two test cases. An important part of our work was to propose a way to mix MCTS with existing powerful heuristics, with the application to energy management in mind. We did so by proposing a framework that allows a good default policy to be learned by Direct Policy Search (DPS) and included in MCTS. The experimental results are very positive. To extend the reach of MCTS, we showed how it could be used to solve Partially Observable Markov Decision Processes, with an application to the game of Minesweeper, for which no consistent method had been proposed before. Finally, we used MCTS in a meta-bandit framework to solve energy investment problems: the investment decision was handled by classical bandit algorithms, while the evaluation of each investment was done by MCTS. The most important takeaway is that continuous MCTS makes almost no assumptions (besides the need for a generative model), is consistent, and can easily improve existing suboptimal solvers by using a method similar to the one we proposed with DPS.
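The progressive-widening rule mentioned above can be sketched in a few lines (our simplified illustration, not the thesis's implementation; C and alpha stand for the two hyperparameters, and Double Progressive Widening applies the same budget on the random-outcome side of each node as well):

    # Simplified sketch of progressive widening in continuous MCTS (illustrative
    # only): a node visited `visits` times may hold at most ceil(C * visits^alpha)
    # children, which controls the width/depth trade-off of the tree.
    import math, random

    def allowed_children(visits, C=1.0, alpha=0.5):
        return math.ceil(C * max(visits, 1) ** alpha)

    def select_or_expand(children, visits, sample_new_action):
        """Add a new child action only while the widening budget allows it;
        otherwise reuse an existing child (chosen here uniformly, for brevity)."""
        if len(children) < allowed_children(visits):
            children.append(sample_new_action())
            return children[-1]
        return random.choice(children)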
Poolla, Radhika. "A Reinforcement Learning Approach To Obtain Treatment Strategies In Sequential Medical Decision Problems." [Tampa, Fla.] : University of South Florida, 2003. http://purl.fcla.edu/fcla/etd/SFE0000215.
Hadoux, Emmanuel. "Markovian sequential decision-making in non-stationary environments : application to argumentative debates." Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066489/document.
In sequential decision-making problems under uncertainty, an agent makes decisions one after another, considering the current state of the environment in which she evolves. In most work, the environment the agent evolves in is assumed to be stationary, i.e., its dynamics do not change over time. However, the stationarity hypothesis can be invalid if, for instance, exogenous events can occur. In this document, we are interested in sequential decision-making in non-stationary environments. We propose a new model named HS3MDP, allowing us to represent non-stationary problems whose dynamics evolve among a finite set of contexts. In order to solve those problems efficiently, we adapt the POMCP algorithm to HS3MDPs. We also present RLCD with SCD, a new method to learn the dynamics of the environment without knowing the number of contexts a priori. We then explore the field of argumentation problems, where few works consider sequential decision-making. We address two types of problems: stochastic debates (APS) and mediation problems with non-stationary agents (DMP). In this work, we present a model formalizing APS and allowing us to transform them into an MOMDP in order to optimize the sequence of arguments of one agent in the debate. We then extend this model to DMPs to allow a mediator to strategically organize speaking turns in a debate.
Li, Yongchang. "An Intelligent, Knowledge-based Multiple Criteria Decision Making Advisor for Systems Design." Diss., Georgia Institute of Technology, 2007. http://hdl.handle.net/1853/14559.
Di Caro, Gianni. "Ant colony optimization and its application to adaptive routing in telecommunication networks." Doctoral thesis, Université Libre de Bruxelles, 2004. http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/211149.
The simultaneous presence of these and other fascinating and unique characteristics has made ant societies an attractive and inspiring model for building new algorithms and new multi-agent systems. In the last decade, ant societies have been taken as a reference for an ever-growing body of scientific work, mostly in the fields of robotics, operations research, and telecommunications.
Among the different works inspired by ant colonies, the Ant Colony Optimization metaheuristic (ACO) is probably the most successful and popular one. The ACO metaheuristic is a multi-agent framework for combinatorial optimization whose main components are: a set of ant-like agents, the use of memory and of stochastic decisions, and strategies of collective and distributed learning.
It finds its roots in the experimental observation of a specific foraging behavior of some ant colonies that, under appropriate conditions, are able to select the shortest among the few possible paths connecting their nest to a food site. The mediator of this behavior is the pheromone, a volatile chemical substance laid on the ground by the ants while walking, which in turn affects their moving decisions according to its local intensity.
All the elements playing an essential role in the ant colony foraging behavior were understood, thoroughly reverse-engineered, and put to work to solve problems of combinatorial optimization by Marco Dorigo and his co-workers at the beginning of the 1990s.
From that moment on there has been a flourishing of new combinatorial optimization algorithms designed after the first algorithms of Dorigo et al., and of related scientific events.
In 1999 the ACO metaheuristic was defined by Dorigo, Di Caro and Gambardella with the purpose of providing a common framework for describing and analyzing all these algorithms inspired by the same ant colony behavior and by the same common process of reverse-engineering of this behavior. The ACO metaheuristic was therefore defined a posteriori, as the result of a synthesis effort based on the study of the characteristics of all these ant-inspired algorithms and on the abstraction of their common traits.
This synthesis was also motivated by the usually good performance shown by the algorithms (e.g., for several important combinatorial problems like quadratic assignment, vehicle routing and job-shop scheduling, ACO implementations have outperformed state-of-the-art algorithms).
The definition and study of the ACO metaheuristic is one of the two fundamental goals of the thesis. The other, strictly related to the former, consists in the design, implementation, and testing of ACO instances for problems of adaptive routing in telecommunication networks.
This thesis is an in-depth journey through the ACO metaheuristic, during which we have (re)defined ACO and tried to get a clear understanding of its potentialities, limits, and relationships with other frameworks and with its biological background. The thesis takes into account all the developments that have followed the original 1999 definition, and provides a formal and comprehensive systematization of the subject, as well as an up-to-date and quite comprehensive review of current applications. We have also identified dynamic problems in telecommunication networks as the most appropriate domain of application for the ACO ideas. According to this understanding, in the most applied part of the thesis we have focused on problems of adaptive routing in networks and we have developed and tested four new algorithms.
Adopting an original point of view with respect to the way ACO was first defined (but maintaining full conceptual and terminological consistency), ACO is here defined and mainly discussed in terms of sequential decision processes and Monte Carlo sampling and learning.
More precisely, ACO is characterized as a policy search strategy aimed at learning the distributed parameters (called pheromone variables in accordance with the biological metaphor) of the stochastic decision policy which is used by so-called ant agents to generate solutions. Each ant represents in practice an independent sequential decision process aimed at constructing a possibly feasible solution for the optimization problem at hand by using only information local to the decision step.
Ants are repeatedly and concurrently generated in order to sample the solution set according to the current policy. The outcomes of the generated solutions are used to partially evaluate the current policy, spot the most promising search areas, and update the policy parameters in order to possibly focus the search in those promising areas while keeping a satisfactory level of overall exploration.
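The two ingredients just described, sampling solutions from a pheromone-parameterized stochastic policy and reinforcing the parameters along the best sampled solutions, can be sketched generically as follows (our schematic illustration, not the thesis's AntNet code; the component dictionaries and constants are made up for the example):

    # Schematic ACO step (illustration only): decisions are drawn with probability
    # proportional to pheromone^alpha * heuristic^beta, and pheromone variables are
    # evaporated and then reinforced along the best solution found so far.
    import random

    def choose_next(candidates, pheromone, heuristic, alpha=1.0, beta=2.0):
        weights = [pheromone[c] ** alpha * heuristic[c] ** beta for c in candidates]
        return random.choices(candidates, weights=weights, k=1)[0]

    def update_pheromone(pheromone, best_solution, best_cost, evaporation=0.1):
        for c in pheromone:                      # evaporation keeps exploration alive
            pheromone[c] *= (1.0 - evaporation)
        for c in best_solution:                  # reinforcement focuses the search
            pheromone[c] += 1.0 / best_cost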
This way of looking at ACO has made it possible to disclose the strict relationships between ACO and other well-known frameworks, like dynamic programming, Markov and non-Markov decision processes, and reinforcement learning. In turn, this has favored reasoning on the general properties of ACO in terms of the amount of complete state information which is used by the ACO's ants to take optimized decisions and to encode in pheromone variables a memory of both the decisions that belonged to the sampled solutions and their quality.
The ACO's biological context of inspiration is fully acknowledged in the thesis. We report, with extensive discussions, on the shortest-path behaviors of ant colonies and on the identification and analysis of the few nonlinear dynamics that are at the very core of self-organized behaviors in both ants and other societal organizations. We discuss these dynamics in the general framework of stigmergic modeling, based on asynchronous environment-mediated communication protocols and on (pheromone) variables priming coordinated responses of a number of "cheap" and concurrent agents.
The second half of the thesis is devoted to the study of the application of ACO to problems of online routing in telecommunication networks. This class of problems has been identified in the thesis as the most appropriate for the application of the multi-agent, distributed, and adaptive nature of the ACO architecture.
Four novel ACO algorithms for problems of adaptive routing in telecommunication networks are thoroughly described. The four algorithms cover a wide spectrum of possible types of network: two of them deliver best-effort traffic in wired IP networks, one is intended for quality-of-service (QoS) traffic in ATM networks, and the fourth is for best-effort traffic in mobile ad hoc networks.
The two algorithms for wired IP networks have been extensively tested by simulation studies and compared to state-of-the-art algorithms for a wide set of reference scenarios. The algorithm for mobile ad hoc networks is still under development, but quite extensive results and comparisons with a popular state-of-the-art algorithm are reported. No results are reported for the algorithm for QoS, which has not been fully tested. The observed experimental performance is excellent, especially for the case of wired IP networks: our algorithms always perform comparably or much better than the state-of-the-art competitors.
In the thesis we try to understand the rationale behind the excellent performance obtained and the good level of popularity reached by our algorithms. More generally, we discuss the reasons for the general efficacy of the ACO approach for network routing problems compared to the characteristics of more classical approaches. Going further, we also informally define Ant Colony Routing (ACR), a multi-agent framework that explicitly integrates learning components into the ACO design in order to define a general and, in a sense, futuristic architecture for autonomic network control.
Most of the material of the thesis comes from a re-elaboration of material co-authored and published in a number of books, journal papers, conference proceedings, and technical reports. The detailed list of references is provided in the Introduction.
Wei, Wei. "Stochastic Dynamic Optimization and Games in Operations Management." Case Western Reserve University School of Graduate Studies / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=case1354751981.
Santos, Hugo Henrique Kegler dos. "Procedimentos sequenciais Bayesianos aplicados ao processo de captura-recaptura." Universidade Federal de São Carlos, 2014. https://repositorio.ufscar.br/handle/ufscar/4494.
In this work, we study the Bayes sequential decision procedure applied to the capture-recapture process with fixed sample sizes, in order to estimate the size of a finite, closed population. We present the statistical model, review Bayesian decision theory, presenting the pure decision problem, the statistical decision problem and the sequential decision procedure, and illustrate the theoretical methods discussed using simulated data.
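As a minimal illustration of the kind of inference involved (our sketch, not the thesis's sequential procedure; the uniform prior, the grid bound N_max and the example counts are arbitrary), a two-sample capture-recapture experiment with n1 marked animals, n2 recaptured and m marked recaptures yields a discrete posterior over the population size N via a hypergeometric likelihood:

    # Illustrative sketch only: discrete Bayesian update for the population size N
    # in a two-sample capture-recapture experiment (hypergeometric likelihood,
    # uniform prior on a finite grid of candidate values).
    from math import comb

    def posterior_N(n1, n2, m, N_max=1000):
        support = range(max(n1 + n2 - m, n1, n2), N_max + 1)
        like = {N: comb(n1, m) * comb(N - n1, n2 - m) / comb(N, n2) for N in support}
        total = sum(like.values())
        return {N: L / total for N, L in like.items()}

    post = posterior_N(n1=50, n2=40, m=10)
    print(max(post, key=post.get))  # posterior mode, close to the classical n1*n2/m = 200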
Grand-Clement, Julien. "Robust and Interpretable Sequential Decision-Making for Healthcare." Thesis, 2021. https://doi.org/10.7916/d8-maqq-mp30.
Khan, Omar Zia. "Policy Explanation and Model Refinement in Decision-Theoretic Planning." Thesis, 2013. http://hdl.handle.net/10012/7808.