Dissertations / Theses on the topic 'Approximate dynamic programming'

Consult the top 50 dissertations / theses for your research on the topic 'Approximate dynamic programming.'


1

Sadiq, Mohammad. "Approximate Dynamic Programming Methods in HEVs." Thesis, KTH, Maskinkonstruktion (Inst.), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-182762.

Abstract:
Hybrid electric vehicles (HEVs) have gained popularity worldwide for their efficient fuel consumption, which reduces overall oil demand. This benefits the environment, since lower fuel consumption means fewer emissions and a smaller contribution to the greenhouse effect. Research in this field is therefore very active, with demand for new and better fuel-consumption strategies. Among the many methods used for HEV energy management, one that promises global optimality is dynamic programming. Dynamic programming yields a globally optimal result but suffers from high computational cost. Among the methods for countering this curse of dimensionality, one of the most popular is approximate dynamic programming (ADP). This thesis surveys the literature on the different ADP methods applied to HEVs and presents an implementation showing a reduction in computation time for a specific HEV energy-management problem.
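The trade-off the abstract describes (global optimality versus computational cost) comes from the backward recursion of finite-horizon dynamic programming. A minimal sketch follows; the state grid, actions, and cost model are illustrative assumptions standing in for a discretized battery state of charge, not the thesis's actual HEV model:

```python
def backward_dp(horizon, soc_grid, actions, step_cost, transition):
    """Tabular backward DP: returns value table V[t][s] and a greedy policy."""
    V = [dict() for _ in range(horizon + 1)]
    policy = [dict() for _ in range(horizon)]
    for s in soc_grid:
        V[horizon][s] = 0.0                      # terminal cost
    for t in range(horizon - 1, -1, -1):
        for s in soc_grid:
            best_cost, best_a = float("inf"), None
            for a in actions:
                s_next = transition(s, a)
                if s_next is None:               # infeasible state excursion
                    continue
                c = step_cost(t, s, a) + V[t + 1][s_next]
                if c < best_cost:
                    best_cost, best_a = c, a
            V[t][s] = best_cost
            policy[t][s] = best_a
    return V, policy

# Toy instance: state-of-charge levels 0..4, actions charge/hold/discharge.
grid = range(5)

def trans(s, a):
    s2 = s + a
    return s2 if 0 <= s2 <= 4 else None

def cost(t, s, a):
    return 1.0 if a >= 0 else 0.2                # fuel burned unless discharging

V, pol = backward_dp(horizon=3, soc_grid=grid, actions=(-1, 0, 1),
                     step_cost=cost, transition=trans)
```

The full sweep touches every (time, state, action) triple, which is exactly what becomes intractable as the state dimension grows and what ADP methods approximate.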
2

Vyzas, Elias. "Approximate dynamic programming for some queueing problems." Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/10282.

Abstract:
Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 1997, and Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering, 1997.
Includes bibliographical references (p. 81-82).
by Elias Vyzas.
M.S.
3

Sauré, Antoine. "Approximate dynamic programming methods for advance patient scheduling." Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/43448.

Abstract:
This dissertation studies an advance multi-priority patient scheduling problem. Patrick et al. (2008) formulated a version of this problem as a discounted infinite-horizon Markov decision process (MDP) and studied it using a linear programming method based on an affine value function approximation. This thesis starts by presenting an alternative solution approach for this problem based on the use of simulation, a policy iteration framework and a non-linear value function approximation. It then extends the dynamic multi-priority patient scheduling model and solution approach developed by Patrick et al. by considering patients who receive service across multiple days and for irregular lengths of time, and by allowing the possibility of using overtime on different days of the booking horizon. The research described in this dissertation is based on the hypothesis that some patients can be booked further into the future allowing the appointments for urgent patients to be scheduled earlier, and it seeks to identify effective policies for allocating available service capacity to incoming demand while reducing patient wait times in a cost-effective manner. Through the use of approximate dynamic programming techniques, it illustrates the importance of adequately assessing the future impact of today's decisions in order to more intelligently allocate capacity. Chapter 1 provides an overview of the multi-priority patient scheduling problem and a review of the literature relevant to it. Chapter 2 describes a simulation-based algorithm for solving a version of this problem and compares the performance of the resulting appointment scheduling policies against the performance of four other policies, including the one derived from the linear programming method. Chapter 3 extends the dynamic multi-priority patient scheduling model and solution approach developed by Patrick et al. 
It presents a discounted infinite-horizon MDP model for scheduling cancer treatments in radiation therapy units and a linear programming method for solving it. The benefits from the proposed method are evaluated by simulating its performance for a practical example based on data provided by the British Columbia Cancer Agency. Chapter 4 describes a teaching tool developed to illustrate advance patient scheduling practices to health care professionals and students. Finally, this dissertation concludes with additional discussion, extensions and further applications.
4

Child, Christopher H. T. "Approximate dynamic programming with parallel stochastic planning operators." Thesis, City University London, 2011. http://openaccess.city.ac.uk/1109/.

Abstract:
This thesis presents an approximate dynamic programming (ADP) technique for environment modelling agents. The agent learns a set of parallel stochastic planning operators (P-SPOs) by evaluating changes in its environment in response to actions, using an association rule mining approach. An approximate policy is then derived by iteratively improving state value aggregation estimates attached to the operators, using the P-SPOs as a model in a Dyna-Q-like architecture. Reinforcement learning and dynamic programming are powerful techniques for automated agent decision making in stochastic environments. Dynamic programming is effective when there is a known environment model, while reinforcement learning is effective when a model is not available. The techniques derive a policy: a mapping from each environment state to an action which optimizes the long-term reward the agent receives. The standard methods become less effective as the state space of the environment grows, because they require values to be associated with each state, the storage and processing of which is exponential in the number of state variables. Resolving this “curse of dimensionality” is an important topic of research amongst all communities working on this problem. Two key methods are to: (i) derive an estimate of the value (approximate dynamic programming) using function approximation or state aggregation; or (ii) build a model of the environment from experience. This thesis presents a method of combining these approaches by exploiting structure in the state transition and value functions, captured in a set of planning operators which are learnt through experience in the environment. Standard planning operators define the deterministic changes that occur in an environment in response to an action.
This work presents Parallel Stochastic Planning Operators (P-SPOs), a novel form of planning operator providing a structured model of the state transition function in environments which are both non-deterministic and in which changes can occur outside the influence of actions. Next, an automated method for extracting P-SPOs from observations in an environment is explored using an adaptation of association rule mining. Finally, methods of relating the state transition structure encapsulated in the P-SPOs to state values, using the operators to store state value aggregation estimates, are evaluated. The framework described provides a method by which approximate dynamic programming can be applied by designers of AI agents and AI planning systems for which they have minimal prior knowledge. The framework and P-SPO-based implementations are tested against standard techniques in two benchmark stochastic environments: a “slippery gripper” block-painting robot and a “predator-prey” agent environment. Experimental results show that an agent using a P-SPO-based approach is able to learn an accurate model of its environment if successor state variables exhibit conditional independence, and an approximate model in the non-independent case. Results also demonstrate that the agent’s ability to generalise to previously unseen states using the model allows it to form an improved policy over an agent employing a standard Dyna-Q-based technique. Finally, an approximate policy stored in state aggregation estimates attached to operators is shown to be optimal in experiments for which the P-SPO set contains sufficient information for effective aggregations to be formed.
5

Liu, Ning. "Approximate dynamic programming algorithms for production-planning problems." Thesis, Wichita State University, 2013. http://hdl.handle.net/10057/10636.

Abstract:
The capacitated lot-sizing problem (CLSP) is a core problem for successfully reducing overall costs in any production process. The exact approaches proposed for solving the CLSP are based on two major methods: mixed-integer programming and dynamic programming. This thesis provides a new idea for approximating the inventory cost function to be used in a truncated dynamic program for solving the CLSP. In the proposed method, using only a partial dynamic process, the inventory cost function is approximated, and the resulting approximate cost function is then used as a value function in each stage of the approximate dynamic program. In this thesis, six different algorithms are developed for the CLSP, based on three different types of approximate dynamic programming approaches. The general methodology combines dynamic programming with data fitting and approximation techniques to estimate the inventory cost function at each stage of the dynamic program. Furthermore, three main algorithmic frameworks to compute a piecewise linear approximate inventory cost function for the CLSP are provided. The first approach integrates regression models into an approximate dynamic program. The second approach uses the information obtained by a partial dynamic process to approximate the piecewise linear inventory cost function. The third approach uses slope-check and bisection techniques to locate the breakpoints of the piecewise linear function in order to approximate the inventory cost function for the CLSP. The effectiveness of the proposed methods is analyzed on various types of CLSP instances with different cost and capacity characteristics. Computational results show that the approximation approaches can considerably decrease the computational time required by the dynamic program and the integer program for different CLSP instances. Furthermore, in most cases, some of the proposed approaches can accurately capture the optimal solution of the problem.
Thesis (M.S.)--Wichita State University, College of Engineering, Dept. of Industrial and Manufacturing Engineering
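As context for how an inventory cost-to-go arises in the CLSP, here is a minimal exact DP for a tiny capacitated lot-sizing instance; the thesis's contribution is to approximate this recursion's inventory cost function rather than evaluate it in full. All instance data (demands, capacity, costs) are illustrative assumptions:

```python
from functools import lru_cache

demand   = [2, 3, 2]     # demand per period
capacity = 4             # maximum production per period
setup, hold = 5.0, 1.0   # setup cost per production run, holding cost per unit

@lru_cache(maxsize=None)
def cost_to_go(t, inv):
    """Minimum cost to satisfy demand in periods t, t+1, ... given inventory inv."""
    if t == len(demand):
        return 0.0
    best = float("inf")
    for q in range(capacity + 1):            # candidate production quantity
        end_inv = inv + q - demand[t]
        if end_inv < 0:                      # shortages not allowed
            continue
        c = (setup if q > 0 else 0.0) + hold * end_inv + cost_to_go(t + 1, end_inv)
        best = min(best, c)
    return best

print(cost_to_go(0, 0))
```

Each stage's `cost_to_go(t, ·)` is exactly the inventory cost function that the thesis fits with piecewise linear approximations instead of enumerating every inventory level.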
6

Demir, Ramazan. "An approximate dynamic programming approach to discrete optimization." Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/9137.

Abstract:
Thesis (Ph.D.)--Massachusetts Institute of Technology, Sloan School of Management, Operations Research Center, 2000.
Includes bibliographical references (leaves 181-189).
We develop Approximate Dynamic Programming (ADP) methods for integer programming problems. We describe and investigate parametric, nonparametric and base-heuristic learning approaches to approximate the value function in order to break the curse of dimensionality. Through an extensive computational study we illustrate that our ADP approach to integer programming competes successfully with existing methodologies, including state-of-the-art commercial packages such as CPLEX. Our benchmarks for comparison are solution quality, running time and robustness (i.e., small deviations in the computational resources, such as running time, for varying instances of the same size). In this thesis, we focus in particular on knapsack problems and the binary integer programming problem. We explore an integrated approach to solving discrete optimization problems by unifying optimization techniques with statistical learning. Overall, this research illustrates that ADP is a promising technique, providing near-optimal solutions within a reasonable amount of computation time, especially for large-scale problems with thousands of variables and constraints. Thus, Approximate Dynamic Programming can be considered a new alternative to existing approximate methods for discrete optimization problems.
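For the knapsack case mentioned above, the exact dynamic-programming value function that such an ADP method would approximate can be written in a few lines. This is a standard textbook formulation, not code from the thesis, and the instance data are illustrative:

```python
def knapsack_dp(values, weights, capacity):
    """V[c] = best total value achievable with remaining capacity c."""
    V = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        for c in range(capacity, w - 1, -1):   # reverse scan: each item used once
            V[c] = max(V[c], v + V[c - w])
    return V[capacity]

print(knapsack_dp(values=[6, 10, 12], weights=[1, 2, 3], capacity=5))  # → 22
```

The table `V` grows linearly in the capacity here, but for multi-constraint integer programs the analogous state space explodes, which is where learned value-function approximations take over.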
by Ramazan Demir.
Ph.D.
7

Cai, C. "Adaptive traffic signal control using approximate dynamic programming." Thesis, University College London (University of London), 2010. http://discovery.ucl.ac.uk/20164/.

Abstract:
This thesis presents a study of an adaptive traffic signal controller for real-time operation. An approximate dynamic programming (ADP) algorithm is developed for controlling traffic signals at isolated intersections and in distributed traffic networks. This approach is derived from the premise that classic dynamic programming is computationally difficult to solve, and that approximation is the second-best option for establishing sequential decision-making for complex processes. The proposed ADP algorithm substantially reduces the computational burden by using a linear approximation function to replace the exact value function of the dynamic programming solution. Machine-learning techniques are used to improve the approximation progressively. Not knowing the ideal response for the approximation to learn from, we use the paradigm of unsupervised learning, and reinforcement learning in particular. Temporal-difference learning and perturbation learning are investigated as appropriate candidates in this family. We find in computer simulation that the proposed method achieves substantial reductions in vehicle delays in comparison with optimised fixed-time plans, and is competitive with other adaptive methods in computational efficiency and effectiveness in managing varying traffic. Our results show that substantial benefits can be gained by increasing the frequency at which the signal plans are revised. The proposed ADP algorithm accommodates a range of discrete temporal resolutions, from 0.5 to 5 seconds per time step. This study demonstrates the readiness of the proposed approach for real-time operation at isolated intersections and its potential for distributed network control.
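A minimal sketch of temporal-difference learning with a linear value-function approximation, the general mechanism the abstract describes. The random-walk environment and one-hot features are illustrative stand-ins for a traffic state, not the thesis's controller:

```python
import random

def td0_linear(episodes=2000, alpha=0.05, gamma=1.0, seed=0):
    """TD(0) on a 5-state random walk; phi(s) is one-hot, so V(s) ≈ w[s]."""
    rng = random.Random(seed)
    n = 5
    w = [0.0] * n                              # weights of the linear approximation
    for _ in range(episodes):
        s = 2                                  # start in the middle state
        while True:
            s2 = s + rng.choice((-1, 1))
            if s2 < 0:                         # left terminal, reward 0
                r, v2, done = 0.0, 0.0, True
            elif s2 >= n:                      # right terminal, reward 1
                r, v2, done = 1.0, 0.0, True
            else:
                r, v2, done = 0.0, w[s2], False
            w[s] += alpha * (r + gamma * v2 - w[s])   # TD(0) update
            if done:
                break
            s = s2
    return w

w = td0_linear()
# True values for this walk are [1/6, 2/6, 3/6, 4/6, 5/6].
```

With richer (non-one-hot) features the same update adjusts a low-dimensional weight vector instead of one value per state, which is what makes the approach viable for traffic-scale state spaces.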
8

Nadarajah, Selvaprabu. "Approximate Dynamic Programming for Commodity and Energy Merchant Operations." Research Showcase @ CMU, 2014. http://repository.cmu.edu/dissertations/350.

Abstract:
We study the merchant operations of commodity and energy conversion assets. Examples of such assets include natural gas pipeline systems, commodity swing options, and power plants. Merchant operations involve managing these assets as real options on commodity and energy prices, with the objective of maximizing the market value of these assets. The economic relevance of natural gas conversion assets has increased considerably since the oil and gas shale boom; for example, the Energy Information Administration expects natural gas to be the source of 30% of the world's electricity production by 2040, and the McKinsey Global Institute projects United States spending on energy infrastructure to be about $100 billion by 2020. Managing commodity and energy conversion assets can be formulated as intractable Markov decision problems (MDPs), especially when using the high-dimensional price models commonly employed in practice. We develop approximate dynamic programming (ADP) methods for computing near-optimal policies and lower and upper bounds on the market value of these assets. We focus on overcoming issues with the standard math programming and financial engineering ADP methods, that is, approximate linear programming (ALP) and least squares Monte Carlo (LSM), respectively. In particular, we develop: (i) a novel ALP relaxation framework to improve the ALP approach, which we use to derive two new classes of ALP relaxations; (ii) an LSM variant, in the context of popular practice-based price models, that alleviates the substantial computational overhead of estimating upper bounds on the market value with existing LSM variants; and (iii) a mixed integer programming based ADP method that is exact with respect to a policy performance measure, whereas methods in the literature are heuristic in nature.
Computational experiments on realistic instances of natural gas storage and crude oil swing options show that both our ALP relaxations and LSM methods are efficient and deliver near optimal policies and tight lower and upper bounds. Our LSM variant is also between one and three orders of magnitude faster than existing LSM variants for estimating upper bounds. Our mixed integer programming ADP model is computationally expensive to solve but its exact nature motivates further research into its solution. We provide theoretical support for our methods: By deriving bounds on approximation error we establish the optimality of our best ALP relaxation class in limiting regimes of practical relevance and provide a theoretical perspective on the relative performance of our LSM variant and existing LSM variants. We also unify different ADP methods in the literature using our ALP relaxation framework, including the financial engineering based LSM method. In addition, we employ ADP to study the novel application of jointly managing storage and transport assets in a natural gas pipeline system; the literature studies these assets in isolation. We leverage our structural analysis of the optimal storage policy to extend an LSM variant for this problem. This extension computes near optimal policies and tight bounds on instances formulated in collaboration with a major natural gas trading company. We use our extension and these instances to answer questions relevant to merchants managing such assets. Overall, our findings highlight the role of math programming for developing ADP methods. Although we focus on managing commodity and energy conversion assets, the techniques in this thesis have potential broader relevance for solving MDPs in other application contexts, such as inventory control with demand forecast updating, multiple sourcing, and optimal medical treatment design.
9

Bethke, Brett (Brett M. ). "Kernel-based approximate dynamic programming using Bellman residual elimination." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/57544.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Aeronautics and Astronautics, 2010.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student submitted PDF version of thesis.
Includes bibliographical references (p. 207-221).
Many sequential decision-making problems related to multi-agent robotic systems can be naturally posed as Markov Decision Processes (MDPs). An important advantage of the MDP framework is the ability to utilize stochastic system models, thereby allowing the system to make sound decisions even if there is randomness in the system evolution over time. Unfortunately, the curse of dimensionality prevents most MDPs of practical size from being solved exactly. One main focus of the thesis is on the development of a new family of algorithms for computing approximate solutions to large-scale MDPs. Our algorithms are similar in spirit to Bellman residual methods, which attempt to minimize the error incurred in solving Bellman's equation at a set of sample states. However, by exploiting kernel-based regression techniques (such as support vector regression and Gaussian process regression) with nondegenerate kernel functions as the underlying cost-to-go function approximation architecture, our algorithms are able to construct cost-to-go solutions for which the Bellman residuals are explicitly forced to zero at the sample states. For this reason, we have named our approach Bellman residual elimination (BRE). In addition to developing the basic ideas behind BRE, we present multi-stage and model-free extensions to the approach. The multistage extension allows for automatic selection of an appropriate kernel for the MDP at hand, while the model-free extension can use simulated or real state trajectory data to learn an approximate policy when a system model is unavailable.
(cont.) We present theoretical analysis of all BRE algorithms, proving convergence to the optimal policy in the limit of sampling the entire state space, and show computational results on several benchmark problems. Another challenge in implementing control policies based on MDPs is that there may be parameters of the system model that are poorly known and/or vary with time as the system operates. System performance can suffer if the model used to compute the policy differs from the true model. To address this challenge, we develop an adaptive architecture that allows for online MDP model learning and simultaneous re-computation of the policy. As a result, the adaptive architecture allows the system to continuously re-tune its control policy to account for better model information obtained through observations of the actual system in operation, and react to changes in the model as they occur. Planning in complex, large-scale multi-agent robotic systems is another focus of the thesis. In particular, we investigate the persistent surveillance problem, in which one or more unmanned aerial vehicles (UAVs) and/or unmanned ground vehicles (UGVs) must provide sensor coverage over a designated location on a continuous basis. This continuous coverage must be maintained even in the event that agents suffer failures over the course of the mission. The persistent surveillance problem is pertinent to a number of applications, including search and rescue, natural disaster relief operations, urban traffic monitoring, etc.
(cont.) Using both simulations and actual flight experiments conducted in the MIT RAVEN indoor flight facility, we demonstrate the successful application of the BRE algorithms and the adaptive MDP architecture in achieving high mission performance despite the random occurrence of failures. Furthermore, we demonstrate performance benefits of our approach over a deterministic planning approach that does not account for these failures.
by Brett M. Bethke.
Ph.D.
10

Valenti, Mario J. (Mario James) 1976. "Approximate dynamic programming with applications in multi-agent systems." Thesis, Massachusetts Institute of Technology, 2007. http://hdl.handle.net/1721.1/40330.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
MIT Institute Archives copy: contains CDROM of thesis in .pdf format.
Includes bibliographical references (p. 151-161).
This thesis presents the development and implementation of approximate dynamic programming methods used to manage multi-agent systems. The purpose of this thesis is to develop an architectural framework and theoretical methods that enable an autonomous mission system to manage real-time multi-agent operations. To meet this goal, we begin by discussing aspects of the real-time multi-agent mission problem. Next, we formulate this problem as a Markov Decision Process (MDP) and present a system architecture designed to improve mission-level functional reliability through system self-awareness and adaptive mission planning. Since most multi-agent mission problems are computationally difficult to solve in real time, approximation techniques are needed to find policies for these large-scale problems. Thus, we have developed theoretical methods for finding feasible solutions to large-scale optimization problems. More specifically, we investigate methods designed to automatically generate an approximation to the cost-to-go function, using basis functions, for a given MDP. Next, these techniques are used by an autonomous mission system to manage multi-agent mission scenarios. Simulation results using these methods are provided for a large-scale mission problem. In addition, this thesis presents the implementation of techniques used to manage autonomous unmanned aerial vehicles (UAVs) performing persistent surveillance operations. We present an indoor multi-vehicle testbed called RAVEN (Real-time indoor Autonomous Vehicle test ENvironment) that was developed to study long-duration missions in a controlled environment.
(cont.) The RAVEN's design allows researchers to focus on high-level tasks by autonomously managing the platform's realistic air and ground vehicles during multi-vehicle operations, thus promoting the rapid prototyping of UAV technologies by flight testing new vehicle configurations and algorithms without redesigning vehicle hardware. Finally, using the RAVEN, we present flight test results from autonomous, extended mission tests using the technologies developed in this thesis. Flight results from a 24 hr, fully-autonomous air vehicle flight-recharge test and an autonomous, multi-vehicle extended mission test using small, electric-powered air vehicles are provided.
by Mario J. Valenti.
Ph.D.
11

Keng, Leng Hui. "Approximate String Matching With Dynamic Programming and Suffix Trees." UNF Digital Commons, 2006. http://digitalcommons.unf.edu/etd/196.

Abstract:
The importance and contribution of string matching algorithms to modern society cannot be overstated. From basic applications such as spell checking and data querying, to advanced algorithms such as DNA sequencing, trend analysis and signal processing, string matching algorithms form the foundation of many aspects of computing that have been pivotal in technological advancement. In general, string matching algorithms can be divided into the categories of exact string matching and approximate string matching. We study each area and examine some of the well-known algorithms. We probe into one of the most intriguing data structures in string algorithms, the suffix tree. The lowest common ancestor extension of the suffix tree is the key to many advanced string matching algorithms. With these tools, we are able to solve string problems that were, until recently, thought intractable by many. Another interesting and relatively new data structure in string algorithms is the suffix array, which has seen significant breakthroughs in its linear-time construction in recent years. Primarily, this thesis focuses on approximate string matching using dynamic programming, and on hybrid dynamic programming with a suffix tree. We study both approaches in detail and see how the merger of exact and approximate string matching algorithms can yield synergistic results in our experiments.
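The dynamic-programming approach to approximate matching rests on the classic edit (Levenshtein) distance recurrence, sketched here in its standard textbook form; the suffix-tree hybrid the abstract mentions accelerates this same recurrence:

```python
def edit_distance(a: str, b: str) -> int:
    """D[i][j] = minimum edits turning a[:i] into b[:j]."""
    m, n = len(a), len(b)
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i                            # delete all of a[:i]
    for j in range(n + 1):
        D[0][j] = j                            # insert all of b[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # deletion
                          D[i][j - 1] + 1,        # insertion
                          D[i - 1][j - 1] + sub)  # match / substitution
    return D[m][n]

print(edit_distance("kitten", "sitting"))  # → 3
```

The O(mn) table is what the hybrid methods avoid recomputing in full: suffix-tree lowest-common-ancestor queries let identical substrings be skipped in constant time per diagonal.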
12

Boussios, Constantinos I. "An approach for nonlinear control design via approximate dynamic programming." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/9792.

Abstract:
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Mechanical Engineering, 1998.
Includes bibliographical references (p. 173-181).
This thesis proposes and studies a methodology for designing controllers for nonlinear dynamic systems. We are interested in state feedback controllers (policies) that stabilize the state in a given region around an equilibrium point while minimizing a cost functional that captures the performance of the closed-loop system. The optimal control problem can be solved in principle using dynamic programming algorithms such as policy iteration. Exact policy iteration is computationally infeasible for systems of even moderate dimension, which leads us to consider methods based on Approximate Policy Iteration. In such methods, we first select an approximation architecture (i.e., a parametrized class of functions) that is used to approximate the cost-to-go function under a given policy, on the basis of cost samples that are obtained through simulation. The resulting approximate cost function is used to derive another, hopefully better, policy, and the procedure is repeated iteratively. There are several case studies exploring the use of this methodology, but they are of limited generality and without much of a theoretical foundation. This thesis explores the soundness of Approximate Policy Iteration. We address the problem of improving the performance of a given stabilizing controller, as well as the problem of designing stabilizing controllers for unstable systems. For the first problem, we develop bounds on the approximation error that can be tolerated if we wish to guarantee that the resulting controllers are stabilizing and/or offer improved performance. We give bounds on the suboptimality of the resulting controllers in terms of the assumed approximation errors. We also extend the methodology to the unstable case by introducing an appropriate modification of the optimal control problem. The computational burden of cost function approximation can often be reduced, thereby enhancing the practicality of the method, by exploiting special structure.
We illustrate this for a special class of nonlinear systems with fast linear and slow nonlinear dynamics. We also present an approximation based on state space gridding, whose performance can be evaluated via a systematic test. Finally, analysis is supported by applying Approximate Policy Iteration to two specific problems, one involving a missile model and the other involving a beam-and-ball model.
by Constantinos I. Boussios.
Ph.D.
13

Keller, Philipp Wilhelm. "Automatic basis function construction for reinforcement learning and approximate dynamic programming." Thesis, McGill University, 2008. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=19284.

Abstract:
We address the problem of automatically constructing basis functions for linear approximation of the value function of a Markov decision process (MDP). Our work builds on results by Bertsekas and Castañon (1989), who proposed a method for automatically aggregating states to speed up value iteration. We propose to use neighbourhood component analysis, a dimensionality reduction technique created for supervised learning, in order to map a high-dimensional state space to a low-dimensional space, based on the Bellman error or on the temporal difference (TD) error. We then place basis functions in the lower-dimensional space. These are added as new features for the linear function approximator. This approach is applied to a high-dimensional inventory control problem, and to a number of benchmark reinforcement learning problems.
We address the automatic construction of basis functions for the linear approximation of the value function of a Markov decision process. This thesis builds on the results of Bertsekas and Castañon (1989), who proposed a method for automatically grouping states in order to speed up dynamic programming. We propose using a recent dimensionality reduction technique to project high-dimensional states into a low-dimensional space. We then place radial basis functions in this new space. This technique is applied to several standard benchmark problems for reinforcement learning algorithms, as well as to a high-dimensional inventory control problem.
APA, Harvard, Vancouver, ISO, and other styles
14

Hwang, Daw-sen. "Projected equation and aggregation-based approximate dynamic programming methods for Tetris." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/66033.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 65-67).
In this thesis, we survey approximate dynamic programming (ADP) methods and test the methods with the game of Tetris. We focus on ADP methods where the cost-to-go function J is approximated with Φr, where Φ is some matrix and r is a vector with relatively low dimension. There are two major categories of methods: projected equation methods and aggregation methods. In projected equation methods, the cost-to-go function approximation Φr is updated by simulation using one of several policy evaluation algorithms such as LSTD(λ) [BB96] and LSPE(λ) [BI96]. Projected equation methods generally may not converge. We define a pseudometric of policies and view the oscillations of policies in Tetris. Aggregation methods are based on a model approximation approach. The original problem is reduced to an aggregate problem with significantly fewer states. The weight vector r is the cost-to-go function of the aggregate problem and Φ is the matrix of aggregation probabilities. In aggregation methods, the vector r converges to the optimal cost-to-go function of the aggregate problem. In this thesis, we implement aggregation methods for Tetris, and compare the performance of projected equation methods and aggregation methods.
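The aggregation side of this comparison is easy to illustrate: with hard aggregation, Φ is a 0/1 membership matrix, the aggregate problem is built by averaging transitions and costs within each group, and r is obtained from a low-dimensional Bellman equation. The six-state chain below is a made-up stand-in for Tetris, whose state space is far too large to show here:

```python
import numpy as np

# Hard aggregation for policy evaluation on a 6-state chain (illustrative only).
GAMMA = 0.95
P = np.zeros((6, 6))                    # transition matrix under a fixed policy
for i in range(6):
    P[i, min(i + 1, 5)] += 0.6          # drift up with probability 0.6
    P[i, max(i - 1, 0)] += 0.4
g = np.arange(6, dtype=float)           # stage cost grows with the state index

groups = [0, 0, 1, 1, 2, 2]             # map each state to an aggregate state
Phi = np.zeros((6, 3))
Phi[np.arange(6), groups] = 1.0         # membership (aggregation-probability) matrix
D = Phi.T / Phi.sum(axis=0)[:, None]    # disaggregation: uniform within each group

P_agg = D @ P @ Phi                     # 3x3 aggregate transition matrix
g_agg = D @ g                           # aggregate stage costs

r = np.zeros(3)                         # cost-to-go of the aggregate problem
for _ in range(500):                    # low-dimensional value iteration (converges)
    r = g_agg + GAMMA * P_agg @ r

J_approx = Phi @ r                      # lift back: piecewise-constant approximation
print(np.round(r, 2))
```

Unlike the projected equation methods, this iteration on r is guaranteed to converge, at the price of a piecewise-constant approximation of the cost-to-go.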
by Daw-sen Hwang.
S.M.
APA, Harvard, Vancouver, ISO, and other styles
15

Löhndorf, Nils, David Wozabal, and Stefan Minner. "Optimizing Trading Decisions for Hydro Storage Systems using Approximate Dual Dynamic Programming." INFORMS, 2013. http://dx.doi.org/10.1287/opre.2013.1182.

Full text
Abstract:
We propose a new approach to optimize operations of hydro storage systems with multiple connected reservoirs whose operators participate in wholesale electricity markets. Our formulation integrates short-term intraday with long-term interday decisions. The intraday problem considers bidding decisions as well as storage operation during the day and is formulated as a stochastic program. The interday problem is modeled as a Markov decision process of managing storage operation over time, for which we propose integrating stochastic dual dynamic programming with approximate dynamic programming. We show that the approximate solution converges towards an upper bound of the optimal solution. To demonstrate the efficiency of the solution approach, we fit an econometric model to actual price and inflow data and apply the approach to a case study of an existing hydro storage system. Our results indicate that the approach is tractable for a real-world application and that the gap between the theoretical upper bound and a simulated lower bound decreases sufficiently fast. (authors' abstract)
APA, Harvard, Vancouver, ISO, and other styles
16

Jeria, David (David O. Jeria López). "An approximate dynamic programming approach to risk sensitive control of execution costs." Thesis, Massachusetts Institute of Technology, 2008. http://hdl.handle.net/1721.1/55112.

Full text
Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, February 2009.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 43-44).
We study the problem of optimal execution within a dynamic programming framework. Given an exponential objective function, system variables which are normally distributed, and linear market dynamics, we derive a closed form solution for optimal trading trajectories. We show that a trader lacking private information has trajectories which are static in nature, whilst a trader with private information requires real time observations to execute optimally. We further show that Bellman's equations become increasingly complex to solve if either the market dynamics are nonlinear, or if additional constraints are added to the problem. As such, we propose an approximate dynamic program using linear programming which achieves near-optimality. The algorithm approximates the exponential objective function within a class of linear architectures, and takes advantage of a probabilistic constraint sampling scheme in order to terminate. The performance of the algorithm relies on the quality of the approximation, and as such we propose a set of heuristics for its efficient implementation.
by David Jeria.
M.Eng.
APA, Harvard, Vancouver, ISO, and other styles
17

Pratikakis, Nikolaos. "Multistage decisions and risk in Markov decision processes towards effective approximate dynamic programming architectures /." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/31654.

Full text
Abstract:
Thesis (Ph.D)--Chemical Engineering, Georgia Institute of Technology, 2009.
Committee Chair: Jay H. Lee; Committee Member: Martha Grover; Committee Member: Matthew J. Realff; Committee Member: Shabbir Ahmed; Committee Member: Stylianos Kavadias. Part of the SMARTech Electronic Thesis and Dissertation Collection.
APA, Harvard, Vancouver, ISO, and other styles
18

Astaraky, Davood. "A Simulation Based Approximate Dynamic Programming Approach to Multi-class, Multi-resource Surgical Scheduling." Thèse, Université d'Ottawa / University of Ottawa, 2013. http://hdl.handle.net/10393/23622.

Full text
Abstract:
The thesis focuses on a model that addresses the patient scheduling step of the surgical scheduling process: determining the number of surgeries to perform on a given day. Specifically, given a master schedule that provides a cyclic breakdown of total OR availability into specific daily allocations to each surgical specialty, we look to provide a scheduling policy for all surgeries that minimizes a combination of the lead time between patient request and surgery date, overtime in the ORs and congestion in the wards. We cast the problem of generating optimal control strategies into the framework of a Markov Decision Process (MDP). An Approximate Dynamic Programming (ADP) approach has been employed to solve the model, which would otherwise be intractable due to the size of the state space. We assess the performance and quality of the resulting policy through simulation, and provide policy insights and conclusions.
APA, Harvard, Vancouver, ISO, and other styles
19

Nachmani, Gil. "Minimum-energy flight paths for UAVs using mesoscale wind forecasts and approximate dynamic programming." Thesis, Monterey, Calif. : Naval Postgraduate School, 2007. http://bosun.nps.edu/uhtbin/hyperion-image.exe/07Dec%5FNachmani.pdf.

Full text
Abstract:
Thesis (M.S. in Operations Research)--Naval Postgraduate School, December 2007.
Thesis Advisor(s): Royset, Johannes O. Description based on title screen as viewed on January 22, 2007. Includes bibliographical references (p. 57-60). Also available in print.
APA, Harvard, Vancouver, ISO, and other styles
20

Xu, Jinbiao. "An approximate dynamic programming approach for coordinated charging control at vehicle-to-grid aggregator." Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/36970.

Full text
Abstract:
A vehicle-to-grid (V2G) aggregator is an agent between the power grid and the plug-in hybrid electric vehicles (PHEVs). In this thesis, we study the coordinated charging control at a V2G aggregator. The coordinated charging control brings the advantages of minimizing the charging cost and reducing the power losses, by coordinating the control sequences of a group of PHEVs. On one hand, the lower cost of charging gives the users of PHEVs an incentive to cooperate. On the other hand, with an increasing popularity of PHEVs, the impact on the power distribution grid such as power losses should be of concern to the aggregator. To this end, we investigate the tradeoffs between reducing the charging cost and the power losses. We formulate the coordinated charging control as a dynamic programming problem, given the planned schedules of all the vehicles at an aggregator. As an inherent property of a V2G aggregator, we enable bidirectional electric power flows between PHEVs and the power grid. Due to the curse of dimensionality, we apply an approximate dynamic programming approach to decrease the dimensionality of both the state space and control space. Simulation results show that coordinated charging control can reduce both the total charging cost and the aggregated power losses significantly, compared with the uncoordinated control where every vehicle starts charging as soon as it is plugged in. We also show that the charging control with bidirectional power flows remarkably outperforms the one with unidirectional flows.
APA, Harvard, Vancouver, ISO, and other styles
21

Hearnes, Warren E. II. "Near-optimal intelligent control for continuous set-point regulator problems via approximate dynamic programming." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/24882.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Chen, Xiaoting. "Optimal Control of Non-Conventional Queueing Networks: A Simulation-Based Approximate Dynamic Programming Approach." University of Cincinnati / OhioLINK, 2015. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1427799942.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Yang, Xinan. "Top-percentile traffic routing problem." Thesis, University of Edinburgh, 2012. http://hdl.handle.net/1842/5883.

Full text
Abstract:
Multi-homing is a technology used by an Internet Service Provider (ISP) to connect to the Internet via multiple networks. This connectivity enhances the network reliability and service quality of the ISP. However, using multiple networks may impose multiple costs on the ISP. To make full use of the underlying networks with minimum cost, a routing strategy is required by ISPs. Of course, this optimal routing strategy depends on the pricing regime used by network providers. In this study we investigate a relatively new pricing regime – top-percentile pricing. Under top-percentile pricing, network providers divide the charging period into several fixed length time intervals and calculate their cost according to the traffic volume that has been shipped during the θ-th highest time interval. Unlike traditional pricing regimes, the network design under top-percentile pricing has not been fully studied. This thesis investigates the optimal routing strategy in the case where network providers charge ISPs according to top-percentile pricing. We call this problem the Top-percentile Traffic Routing Problem (TpTRP). Since the ISP cannot predict the next time interval's traffic volume in real-world applications, in our setting the TpTRP is a multi-stage stochastic optimisation problem. Routing decisions should be made at the beginning of every time period before knowing the amount of traffic that is to be sent. The stochastic nature of the TpTRP forms the critical difficulty of this study. In this thesis several approaches are investigated in either the modelling or solving steps of the problem. We begin by exploring several simplifications of the original TpTRP to get an insight into the features of the problem. Some of these allow analytical solutions which lead to bounds on the achievable optimal solution. We also establish bounds by investigating several "naive" routing policies.
In the second part of this work, we build the multi-stage stochastic programming model of the TpTRP, which is hard to solve due to the integer variables introduced in the calculation of the top-percentile traffic. A lift-and-project based cutting plane method is investigated in solving the SMIP for very small examples of TpTRP. Nevertheless it is too inefficient to be applicable on large sized instances. As an alternative, we explore the solution of the TpTRP as a Stochastic Dynamic Programming (SDP) problem by a discretization of the state space. This SDP model gives us achievable routing policies on small size instances of the TpTRP, which of course improve the naive routing policies. However, the solution approach based on SDP suffers from the curse of dimensionality which restricts its applicability. To overcome this we suggest using Approximate Dynamic Programming (ADP) which largely avoids the curse of dimensionality by exploiting the structure of the problem to construct parameterized approximations of the value function in SDP and train the model iteratively to get a converged set of parameters. The resulting ADP model with discrete parameter for every time interval works well for medium size instances of TpTRP, though it still requires too long to be trained for large size instances. To make the realistically sized TpTRP problem solvable, we improve on the ADP model by using Bezier Curves/Surfaces to do the aggregation over time. This modification accelerates the efficiency of parameter training in the solution of the ADP model, which makes the realistically sized TpTRP tractable.
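The charging rule at the heart of the TpTRP is simple to state in code: rank the period's interval volumes and bill according to the θ-th highest. A minimal illustration (the volumes and price below are invented):

```python
def top_percentile_charge(traffic, theta, unit_price):
    """Charge based on the theta-th highest interval's traffic volume.

    traffic:    per-interval volumes over the charging period
    theta:      1 = highest interval, 2 = second highest, ...
    unit_price: price per unit of volume
    """
    ranked = sorted(traffic, reverse=True)
    return unit_price * ranked[theta - 1]

# With 10 intervals and theta = 3, the charge is based on the 3rd-highest volume (8):
volumes = [3, 9, 1, 7, 8, 2, 4, 10, 6, 5]
print(top_percentile_charge(volumes, theta=3, unit_price=2.0))  # 16.0
```

What makes the optimisation hard is not this formula but the coupling it creates across the whole charging period: a routing decision now only matters if the current interval ends up among the θ highest, which is unknown when the decision is made.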
APA, Harvard, Vancouver, ISO, and other styles
24

Li, Dong. "An approximate dynamic programming approach to the scheduling of impatient jobs in a clearing system." Thesis, Lancaster University, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.557156.

Full text
Abstract:
A single server is faced with a collection of jobs of varying duration and urgency. Before service starts, all jobs are subject to an initial triage, i.e., an assessment of both their urgency and of their service requirement, and are allocated to distinct classes. Jobs in one class have independent and identically distributed lifetimes, during which they are available for service. Should a job's lifetime expire before its service begins then it is lost from the system unserved. The goal is to schedule the jobs for service to maximise the expected number served to completion. Two heuristic policies have been proposed in the literature. One works well in a "no loss" limit while the other does so when lifetimes are short. Both can exhibit poor performance for problems at some distance from the regimes for which they were designed. We develop a robustly good heuristic by an approximative approach to the application of a single policy improvement step to the first policy above, in which we use a fluid model to obtain an approximation for its value function. The performance of the proposed heuristic is investigated in an extensive numerical study. This problem is substantially complicated if the initial triage is subject to error. We take a Bayesian approach to this additional uncertainty and discuss the design of heuristic policies to maximise the Bayes' return. We identify problem features for which a high price is paid for poor initial triage and for which improvements in initial job assessment yield significant improvements in service outcomes. An analytical upper bound for the cost of imperfect classification is developed for exponentially distributed lifetime cases. An extensive numerical study is conducted to explore the behaviour of the cost in more general situations.
APA, Harvard, Vancouver, ISO, and other styles
25

Lee, Jong Min. "A Study on Architecture, Algorithms, and Applications of Approximate Dynamic Programming Based Approach to Optimal Control." Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/5048.

Full text
Abstract:
This thesis develops approximate dynamic programming (ADP) strategies suitable for process control problems aimed at overcoming the limitations of MPC, which are the potentially exorbitant on-line computational requirement and the inability to consider the future interplay between uncertainty and estimation in the optimal control calculation. The suggested approach solves the DP only for the state points visited by closed-loop simulations with judiciously chosen control policies. The approach helps us combat a well-known problem of the traditional DP called 'curse-of-dimensionality,' while it allows the user to derive an improved control policy from the initial ones. The critical issue of the suggested method is a proper choice and design of function approximator. A local averager with a penalty term is proposed to guarantee a stably learned control policy as well as acceptable on-line performance. The thesis also demonstrates versatility of the proposed ADP strategy with difficult process control problems. First, a stochastic adaptive control problem is presented. In this application an ADP-based control policy shows an "active" probing property to reduce uncertainties, leading to a better control performance. The second example is a dual-mode controller, which is a supervisory scheme that actively prevents the progression of abnormal situations under a local controller at their onset. Finally, two ADP strategies for controlling nonlinear processes based on input-output data are suggested. They are model-based and model-free approaches, and have the advantage of conveniently incorporating the knowledge of identification data distribution into the control calculation with performance improvement.
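The abstract's "local averager with a penalty term" is not specified in detail, but the idea — average nearby cost samples, and fall back to a pessimistic value far from the data so the learned policy stays cautious — can be sketched as follows. The Gaussian weighting, bandwidth, penalty, and sample data are all hypothetical:

```python
import numpy as np

def local_average(x, xs, js, bandwidth=0.5, penalty=10.0):
    """Distance-weighted average of stored cost samples (xs, js).

    Far from all samples the estimate is inflated by a penalty, so states the
    closed-loop simulations never visited look expensive rather than attractive.
    """
    w = np.exp(-((xs - x) / bandwidth) ** 2)
    if w.sum() < 1e-3:                  # effectively no nearby data
        return float(js.max() + penalty)
    return float(w @ js / w.sum())

xs = np.array([0.0, 1.0, 2.0, 3.0])     # visited states
js = np.array([4.0, 1.0, 0.0, 1.5])     # simulated cost-to-go at those states
print(local_average(1.0, xs, js))       # near a sample: close to 1.0
print(local_average(10.0, xs, js))      # far from data: 4.0 + 10.0 = 14.0
```

Penalizing extrapolation in this way keeps a greedy policy derived from the approximator inside the region where the cost samples were collected, which is the stability motivation given in the abstract.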
APA, Harvard, Vancouver, ISO, and other styles
26

Seelhof, Michael. "Long term infrastructure investments under uncertainty in the electric power sector using approximate dynamic programming techniques." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/90724.

Full text
Abstract:
Thesis: S.M. in Engineering and Management, Massachusetts Institute of Technology, Engineering Systems Division, System Design and Management Program, 2014.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 179-183).
A computer model was developed to find optimal long-term investment strategies for the electric power sector under uncertainty with respect to future regulatory regimes and market conditions. The model is based on a multi-stage problem formulation and uses approximate dynamic programming techniques to find an optimal solution. The model was tested under various scenarios. The model results were analyzed with regards to the optimal first-stage investment decision, the final technology mix, total costs, the cost of ignoring uncertainty and the cost of regulatory uncertainty.
by Michael Seelhof.
S.M. in Engineering and Management
APA, Harvard, Vancouver, ISO, and other styles
27

Hanselmann, Thomas. "Approximate dynamic programming with adaptive critics and the algebraic perceptron as a fast neural network related to support vector machines." University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2003. http://theses.library.uwa.edu.au/adt-WU2004.0005.

Full text
Abstract:
[Truncated abstract. Please see the pdf version for the complete text. Also, formulae and special characters can only be approximated here. Please see the pdf version of this abstract for an accurate reproduction.] This thesis treats two aspects of intelligent control: The first part is about long-term optimization by approximating dynamic programming and in the second part a specific class of a fast neural network, related to support vector machines (SVMs), is considered. The first part relates to approximate dynamic programming, especially in the framework of adaptive critic designs (ACDs). Dynamic programming can be used to find an optimal decision or control policy over a long-term period. However, in practice it is difficult, and often impossible, to calculate a dynamic programming solution, due to the 'curse of dimensionality'. The adaptive critic design framework addresses this issue and tries to find a good solution by approximating the dynamic programming process for a stationary environment. In an adaptive critic design there are three modules, the plant or environment to be controlled, a critic to estimate the long-term cost and an action or controller module to produce the decision or control strategy. Even though there have been many publications on the subject over the past two decades, there are some points that have had less attention. While most of the publications address the training of the critic, one of the points that has not received systematic attention is training of the action module.¹ Normally, training starts with an arbitrary, hopefully stable, decision policy and its long-term cost is then estimated by the critic. Often the critic is a neural network that has to be trained, using a temporal difference and Bellman's principle of optimality. Once the critic network has converged, a policy improvement step is carried out by gradient descent to adjust the parameters of the controller network. 
Then the critic is retrained again to give the new long-term cost estimate. However, it would be preferable to focus more on extremal policies earlier in the training. Therefore, the Calculus of Variations is investigated to discard the idea of using the Euler equations to train the actor. However, an adaptive critic formulation for a continuous plant with a short-term cost as an integral cost density is made and the chain rule is applied to calculate the total derivative of the short-term cost with respect to the actor weights. This is different from the discrete systems, usually used in adaptive critics, which are used in conjunction with total ordered derivatives. This idea is then extended to second order derivatives such that Newton's method can be applied to speed up convergence. Based on this, an almost concurrent actor and critic training was proposed. The equations are developed for any non-linear system and short-term cost density function and these were tested on a linear quadratic regulator (LQR) setup. With this approach the solution to the actor and critic weights can be achieved in only a few actor-critic training cycles. Some other, more minor issues, in the adaptive critic framework are investigated, such as the influence of the discounting factor in the Bellman equation on total ordered derivatives, the target interpretation in backpropagation through time as moving and fixed targets, the relation between simultaneous recurrent networks and dynamic programming is stated and a reinterpretation of the recurrent generalized multilayer perceptron (GMLP) as a recurrent generalized finite impulse MLP (GFIR-MLP) is made. Another subject in this area that is investigated, is that of a hybrid dynamical system, characterized as a continuous plant and a set of basic feedback controllers, which are used to control the plant by finding a switching sequence to select one basic controller at a time. 
The special but important case is considered when the plant is linear but with some uncertainty in the state space and in the observation vector, and a quadratic cost function. This is a form of robust control, where a dynamic programming solution has to be calculated. ¹Werbos comments that most treatment of action nets or policies either assume enumerative maximization, which is good only for small problems, except for the games of Backgammon or Go [1], or, gradient-based training. The latter is prone to difficulties with local minima due to the non-convex nature of the cost-to-go function. With incremental methods, such as backpropagation through time, calculus of variations and model-predictive control, the dangers of non-convexity of the cost-to-go function with respect to the control is much less than the with respect to the critic parameters, when the sampling times are small. Therefore, getting the critic right has priority. But with larger sampling times, when the control represents a more complex plan, non-convexity becomes more serious.
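For the LQR setup mentioned above, the basic adaptive-critic cycle — hold the actor fixed, iterate the critic to a fixed point, then improve the actor — can be written in a few lines for a scalar plant. The system, cost weights, and discount below are invented for illustration, and the thesis's neural-network critic is replaced by the exact quadratic form J(x) = p·x², which the LQR structure justifies:

```python
# Scalar discounted LQR: x+ = a*x + b*u, stage cost x^2 + u^2,
# critic J(x) = p*x^2, actor u = k*x. All constants are illustrative.
a, b, gamma = 1.2, 1.0, 0.95

def critic_fit(k, sweeps=200):
    """Policy evaluation: iterate p <- 1 + k^2 + gamma*p*(a + b*k)^2 to a fixed point."""
    p = 0.0
    for _ in range(sweeps):
        p = 1 + k**2 + gamma * p * (a + b * k) ** 2
    return p

k = -0.5                                # initial stabilizing gain (|a + b*k| < 1)
for _ in range(30):                     # actor-critic training cycles
    p = critic_fit(k)
    # Actor improvement: minimize u^2 + gamma*p*(a*x + b*u)^2 over u = k*x.
    k = -gamma * p * a * b / (1 + gamma * p * b**2)

print(round(k, 4), round(p, 4))         # converges to the discounted Riccati solution
```

The expensive inner loop (retraining the critic to convergence before every actor step) is precisely what the thesis's almost-concurrent actor and critic training, using second-order derivative information, aims to avoid.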
APA, Harvard, Vancouver, ISO, and other styles
28

Stellato, Bartolomeo. "Mixed-integer optimal control of fast dynamical systems." Thesis, University of Oxford, 2017. https://ora.ox.ac.uk/objects/uuid:b8a7323c-e36e-45ec-ae8d-6c9eb4350629.

Full text
Abstract:
Many applications in engineering, computer science and economics involve mixed-integer optimal control problems. Solving these problems in real-time is a challenging task because of the explosion of integer combinations to evaluate. This thesis focuses on the development of new algorithms for mixed-integer programming with an emphasis on optimal control problems of fast dynamical systems with discrete controls. The first part proposes two reformulations to reduce the computational complexity. The first reformulation avoids integer variables altogether. By considering a sequence of switched dynamics, we analyze the switching time optimization problem. Even though it is a continuous smooth problem, it is non-convex and the cost function and derivatives are hard to compute. We develop a new efficient method to compute the cost function and its derivatives. Our technique brings up to two orders of magnitude speedups with respect to state-of-the-art tools. The second approach reduces the number of integer decisions. In hybrid model predictive control (MPC) the computational complexity grows exponentially with the horizon length. Using approximate dynamic programming (ADP) we reduce the horizon length while maintaining good control performance by approximating the tail cost offline. This approach allows, for the first time, the application of such control techniques to fast dynamical systems with sampling times of only a few microseconds. The second part investigates embedded branch-and-bound algorithms for mixed-integer quadratic programs (MIQPs). A core component of these methods is the solution of continuous quadratic programs (QPs). We develop OSQP, a new robust and efficient general-purpose QP solver based on the alternating direction method of multipliers (ADMM) and able, for the first time, to detect infeasible problems. We include OSQP into a custom branch-and-bound algorithm suitable for embedded systems. 
Our extension requires only a single matrix factorization and exploits warm-starting, thereby greatly reducing the number of ADMM iterations required. Numerical examples show that our algorithm solves small to medium scale MIQPs more quickly than commercial solvers.
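The core ADMM iteration behind a solver like OSQP is compact. The sketch below handles only a box-constrained QP and omits everything that makes OSQP practical (general linear constraints, scaling, step-size adaptation, and the infeasibility detection discussed above); note how the matrix is factorized once and reused, which is also what makes warm-started branch-and-bound cheap:

```python
import numpy as np

def admm_box_qp(P, q, l, u, rho=1.0, iters=500):
    """Minimize 0.5*x'Px + q'x subject to l <= x <= u via ADMM (teaching sketch)."""
    n = len(q)
    x, z, y = np.zeros(n), np.zeros(n), np.zeros(n)
    K = np.linalg.inv(P + rho * np.eye(n))   # factor once, reuse every iteration
    for _ in range(iters):
        x = K @ (rho * z - y - q)            # x-update: regularized linear solve
        z = np.clip(x + y / rho, l, u)       # z-update: projection onto the box
        y = y + rho * (x - z)                # dual variable update
    return z

P = np.array([[4.0, 1.0], [1.0, 2.0]])
q = np.array([1.0, 1.0])
sol = admm_box_qp(P, q, l=np.zeros(2), u=np.ones(2))
print(sol)  # the unconstrained minimizer is negative, so the box is active at 0
```

Each iteration costs only a matrix-vector product and a projection, which is why this family of methods suits embedded systems with tight time and memory budgets.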
APA, Harvard, Vancouver, ISO, and other styles
29

Ramirez, Jose A. "Optimal and Simulation-Based Approximate Dynamic Programming Approaches for the Control of Re-Entrant Line Manufacturing Models." University of Cincinnati / OhioLINK, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1282329260.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Tosukhowong, Thidarat. "Dynamic Real-time Optimization and Control of an Integrated Plant." Diss., Georgia Institute of Technology, 2006. http://hdl.handle.net/1853/14087.

Full text
Abstract:
Applications of the existing steady-state plant-wide optimization and the single-scale fast-rate dynamic optimization strategies to an integrated plant with material recycle have been impeded by several factors. While the steady-state optimization formulation is very simple, the very long transient dynamics of an integrated plant have limited the optimizer's execution rate to be extremely low, yielding a suboptimal performance. In contrast, performing dynamic plant-wide optimization at the same rate as local controllers requires exorbitant on-line computational load and may increase the sensitivity to high-frequency dynamics that are irrelevant to the plant-level interactions, which are slow-scale in nature. This thesis proposes a novel multi-scale dynamic optimization and control strategy suitable for an integrated plant. The dynamic plant-wide optimizer in this framework executes at a slow rate to track the slow-scale plant-wide interactions and economics, while leaving the local controllers to handle fast changes related to the local units. Moreover, this slow execution rate demands less computational and modeling requirement than the fast-rate optimizer. An important issue of this method is obtaining a suitable dynamic model when first-principles models are unavailable. The difficulties in the system identification process are designing a proper input signal to excite this ill-conditioned system and handling the lack of slow-scale dynamic data when the plant experiment cannot be conducted for a long time compared to the settling time. This work presents a grey-box modeling method to incorporate steady-state information to improve the model prediction accuracy. A case study of an integrated plant example is presented to address limitations of the nonlinear model predictive control (NMPC) in terms of the on-line computation and its inability to handle stochastic uncertainties. Then, the approximate dynamic programming (ADP) framework is investigated.
This method computes an optimal operating policy under uncertainties off-line. Then, the on-line multi-stage optimization can be transformed into a single-stage problem, thus reducing the real-time computational effort drastically. However, the existing ADP framework is not suitable for an integrated plant with high dimensional state and action space. In this study, we combine several techniques with ADP to apply nonlinear optimal control to the integrated plant example and show its efficacy over NMPC.
APA, Harvard, Vancouver, ISO, and other styles
31

Keerthisinghe, Chanaka. "Fast Solution Techniques for Energy Management in Smart Homes." Thesis, The University of Sydney, 2016. http://hdl.handle.net/2123/16033.

Full text
Abstract:
In the future, residential energy users will seize the full potential of demand response schemes by using an automated smart home energy management system (SHEMS) to schedule their distributed energy resources. The underlying optimisation problem facing a SHEMS is a sequential decision making problem under uncertainty because the states of the devices depend on the past state. There are two major challenges to optimisation in this domain; namely, handling uncertainty, and planning over suitably long decision horizons. In more detail, in order to generate high quality schedules, a SHEMS should consider the stochastic nature of the photovoltaic (PV) generation and energy consumption. In addition, the SHEMS should accommodate predictable inter-daily variations over several days. Ideally, the SHEMS should also be able to integrate into an existing smart meter or a similar device with low computational power. However, extending the decision horizon of existing solution techniques for sequential stochastic decision making problems is computationally difficult and moreover, these approaches are only computationally feasible with a limited number of storage devices and a daily decision horizon. Given this, the research investigates, proposes and develops fast solution techniques for implementing efficient SHEMSs. Specifically, three novel methods for overcoming these challenges: a two-stage lookahead stochastic optimisation framework; an approximate dynamic programming (ADP) approach with temporal difference learning; and a policy function approximation (PFA) algorithm using extreme learning machines (ELM) are presented. Throughout the thesis, the performance of these solution techniques are benchmarked against dynamic programming (DP) and stochastic mixed-integer linear programming (MILP) using a range of residential PV-storage (thermal and battery) systems. 
We use empirical data collected during the Smart Grid Smart City project in New South Wales, Australia, to estimate the parameters of a Markov chain model of PV output and electrical demand using a hierarchical approach, which first clusters the empirical data and then learns probability density functions using kernel regression (Chapter 2). The two-stage lookahead method uses deterministic MILP to solve a longer decision horizon, while its end-of-day battery state of charge is used as a constraint for a daily DP approach (Chapter 4). Here DP is used for the daily horizon, as it is shown to provide close-to-optimal solutions when the state, decision and outcome spaces are finely discretised (Chapter 3). However, DP is computationally difficult because of the dimensionalities of the state, decision and outcome spaces, so we resort to MILP to solve the longer decision horizon. The two-stage lookahead results in significant financial benefits compared to daily DP and stochastic MILP approaches (8.54% electricity cost savings for a very suitable house); however, the benefit decreases as the actual PV output and demand deviate from their forecast values. Building on this, ADP is proposed in Chapter 5 to implement a computationally efficient SHEMS. Here we obtain policies from value function approximations (VFAs) by stepping forward in time, in contrast to the value functions obtained by backward induction in DP. As with DP, we can use VFAs generated during the offline planning phase to produce fast real-time solutions using the Bellman optimality condition, which is computationally efficient compared to having to solve the entire stochastic MILP problem. The decisions obtained from VFAs at a given time-step are optimal regardless of what happened in previous time-steps. Our results show that ADP computes a solution much faster than both DP and stochastic MILP, and gives only a slight reduction in quality compared to the optimal DP solution.
In addition, incorporating a thermal energy storage unit using the proposed ADP-based SHEMS reduces the daily electricity cost by up to 57.27% for a most suitable home, with low computational burden. Moreover, ADP with a two-day decision horizon reduces the average yearly electricity cost by 4.6% over a daily DP method, yet requires less than half of the computational effort. However, ADP still takes a considerable amount of time to generate VFAs in the off-line planning phase and requires us to estimate PV and demand models. Given this, a PFA algorithm that uses ELM is proposed in Chapter 6 to overcome these difficulties. Here ELM is used to learn models that map input states to output decisions within seconds, without solving an optimisation problem. This off-line planning process requires a training data set, which has to be generated by solving the deterministic SHEMS problem over a couple of years; a powerful cloud or home computer can be used here, as this step is needed only once. PFA models can be used to make fast real-time decisions and can easily be embedded in an existing smart meter or a similar low-power device. Moreover, we can use PFA models over a long period of time without updating them and still obtain solutions of similar quality. Collectively, ADP and PFA using ELM overcome the challenges of considering stochastic variables, extending the decision horizon and integrating multiple controllable devices using existing smart meters or other devices with low computational power, and represent a significant advancement to the state of the art in this domain.
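The VFA-based decision loop described above can be illustrated with a toy sketch (the 24-hour price profile, the integer battery model and all numbers are assumptions for illustration, not the thesis's SHEMS model): approximate value functions are generated by stepping forward in time with Bellman backups only at visited states, real-time decisions are one-step lookaheads against the stored VFA, and backward-induction DP serves as the benchmark.

```python
import random

# Toy battery-scheduling problem (assumed prices/capacity, not the thesis's
# SHEMS model): 24 hourly prices, an integer battery state of charge, and
# actions charge/idle/discharge one unit per hour.
PRICES = [0.10] * 7 + [0.30] * 4 + [0.15] * 6 + [0.40] * 5 + [0.10] * 2
CAP = 10
ACTIONS = (-1, 0, 1)  # discharge / idle / charge

def step_cost(soc, a, t):
    """Charging buys energy at the hourly price; discharging avoids buying."""
    if not 0 <= soc + a <= CAP:
        return None  # infeasible action
    return PRICES[t] * a

def dp_backward():
    """Exact value functions by backward induction (the DP benchmark)."""
    V = [[0.0] * (CAP + 1) for _ in range(25)]
    for t in range(23, -1, -1):
        for s in range(CAP + 1):
            V[t][s] = min(step_cost(s, a, t) + V[t + 1][s + a]
                          for a in ACTIONS if step_cost(s, a, t) is not None)
    return V

def adp_forward(n_trials=2000, eps=0.2, seed=0):
    """Approximate value functions: step forward along sampled trajectories
    and apply a Bellman backup only at the states actually visited."""
    rng = random.Random(seed)
    V = [[0.0] * (CAP + 1) for _ in range(25)]
    for _ in range(n_trials):
        soc = rng.randint(0, CAP)
        for t in range(24):
            feas = [a for a in ACTIONS if step_cost(soc, a, t) is not None]
            V[t][soc] = min(step_cost(soc, a, t) + V[t + 1][soc + a]
                            for a in feas)
            if rng.random() < eps:  # epsilon-greedy exploration
                a = rng.choice(feas)
            else:
                a = min(feas, key=lambda a: step_cost(soc, a, t) + V[t + 1][soc + a])
            soc += a
    return V

def run_greedy(V, soc0=0):
    """Fast real-time decisions: one-step lookahead against a stored VFA."""
    soc, total = soc0, 0.0
    for t in range(24):
        feas = [a for a in ACTIONS if step_cost(soc, a, t) is not None]
        a = min(feas, key=lambda a: step_cost(soc, a, t) + V[t + 1][soc + a])
        total += step_cost(soc, a, t)
        soc += a
    return round(total, 10)
```

On this toy instance the DP benchmark attains the optimal (negative, i.e. profitable) cost, and the forward-pass approximation approaches the same value while only backing up visited states.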
APA, Harvard, Vancouver, ISO, and other styles
32

Wong, Wee Chin. "Estimation and control of jump stochastic systems." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2009. http://hdl.handle.net/1853/31775.

Full text
Abstract:
Thesis (Ph.D)--Chemical Engineering, Georgia Institute of Technology, 2010.
Committee Chair: Jay H. Lee; Committee Member: Alexander Gray; Committee Member: Erik Verriest; Committee Member: Magnus Egerstedt; Committee Member: Martha Grover; Committee Member: Matthew Realff. Part of the SMARTech Electronic Thesis and Dissertation Collection.
33

Papageorgiou, Dimitri Jason. "Optimization in maritime inventory routing." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/50117.

Full text
Abstract:
The primary aim of this thesis is to develop effective solution techniques for large-scale maritime inventory routing problems that possess a core substructure common in many real-world applications. We use the term “large-scale” to refer to problems whose standard mixed-integer linear programming (MIP) formulations involve tens of thousands of binary decision variables and tens of thousands of constraints and require days to solve on a personal computer. Although a large body of literature already exists for problems combining vehicle routing and inventory control for road-based applications, relatively little work has been published in the realm of maritime logistics. A major contribution of this research is in the advancement of novel methods for tackling problems orders of magnitude larger than most of those considered in the literature. Coordinating the movement of massive vessels all around the globe to deliver large quantities of high value products is a challenging and important problem within the maritime transportation industry. After introducing a core maritime inventory routing model to aid decision-makers with their coordination efforts, we make three main contributions. First, we present a two-stage algorithm that exploits aggregation and decomposition to produce provably good solutions to complex instances with a 60-period (two-month) planning horizon. Not only is our solution approach different from previous methods discussed in the maritime transportation literature, but computational experience shows that our approach is promising. Second, building on the recent successes of approximate dynamic programming (ADP) for road-based applications, we present an ADP procedure to quickly generate good solutions to maritime inventory routing problems with a long planning horizon of up to 365 periods. 
For instances with many ports (customers) and many vessels, leading MIP solvers often require hours to produce good solutions even when the planning horizon is limited to 90 periods. Our approach requires minutes. Our algorithm operates by solving many small subproblems and, in so doing, collecting and learning information about how to produce better solutions. Our final research contribution is a polyhedral study of an optimization problem that was motivated by maritime inventory routing, but is applicable to a more general class of problems. Numerous planning models within the chemical, petroleum, and process industries involve coordinating the movement of raw materials in a distribution network so that they can be blended into final products. The uncapacitated fixed-charge transportation problem with blending (FCTPwB) that we study captures a core structure encountered in many of these environments. We model the FCTPwB as a mixed-integer linear program and derive two classes of facets, both exponential in size, for the convex hull of solutions for the problem with a single consumer and show that they can be separated in polynomial time. Finally, a computational study demonstrates that these classes of facets are effective in reducing the integrality gap and solution time for more general instances of the FCTPwB.
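The "many small subproblems" idea can be sketched on a deliberately tiny single-port stand-in (demand, capacities and costs below are invented for illustration; the thesis model has many ports, vessels and a MIP core): a long horizon is split into short windows, each window is solved exactly, and a small bonus on end-of-window inventory plays the role of the information collected about good terminal states.

```python
from itertools import product

# Toy single-port stand-in for inventory routing (all numbers invented):
# demand d per period, a delivery adds Q units at fixed cost c, and the
# inventory must stay in [0, CAP]. The long horizon is solved as a chain
# of small exactly-solved subproblems.
d, Q, c, CAP = 3, 10, 5.0, 20
HORIZON, W = 36, 6  # 36 periods, solved in windows of 6

def solve_window(inv, width):
    """Brute-force one window; a mild bonus on leftover stock stands in
    for learned information about good end-of-window states."""
    best = None
    for plan in product((0, 1), repeat=width):
        x, cost, feasible = inv, 0.0, True
        for deliver in plan:
            if deliver:
                x, cost = x + Q, cost + c
            x -= d
            if not 0 <= x <= CAP:
                feasible = False
                break
        if feasible:
            score = cost - 0.1 * x  # terminal-value proxy
            if best is None or score < best[0]:
                best = (score, plan, x)
    return best

def rolling_horizon(inv0=12):
    """Chain the subproblems, carrying the inventory forward."""
    inv, schedule = inv0, []
    for _ in range(HORIZON // W):
        _, plan, inv = solve_window(inv, W)
        schedule.extend(plan)
    return schedule
```

Each subproblem here has only 2^6 candidate plans, so the whole 36-period horizon is solved in a fraction of the effort a monolithic model would need, which is the spirit of the decomposition described above.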
34

Löhndorf, Nils, and Stefan Minner. "Simulation Optimization for the Stochastic Economic Lot Scheduling Problem." Taylor and Francis, 2013. http://dx.doi.org/10.1080/0740817X.2012.662310.

Full text
Abstract:
We study simulation optimization methods for the stochastic economic lot scheduling problem. In contrast to prior research, we focus on methods that treat this problem as a black box. Based on a large-scale numerical study, we compare approximate dynamic programming with a global search for parameters of simple control policies. We propose two value function approximation schemes based on linear combinations of piecewise-constant functions as well as control policies that can be described by a small set of parameters. While approximate value iteration worked well for small problems with three products, it was clearly outperformed by the global policy search as soon as problem size increased. The most reliable choice in our study was a globally optimized fixed-cycle policy. An additional analysis of the response surface of model parameters on optimal average cost revealed that the cost effect of product diversity was negligible. (authors' abstract)
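The black-box treatment in this abstract amounts to wrapping a stochastic simulator in a parameter search. A minimal sketch, with assumed demand rates and costs, and a plain grid search standing in for the paper's global policy search:

```python
import random

# Black-box simulation of a two-product stochastic lot scheduling system
# under a fixed-cycle policy (demand rates and costs are assumptions):
# the machine produces product 0 for run_lengths[0] periods, then product 1
# for run_lengths[1] periods, and the cycle repeats.
def avg_cost(run_lengths, periods=4000, seed=7):
    rng = random.Random(seed)
    rates, prod = (0.45, 0.45), 1.0     # mean demand vs production per period
    h, b = 1.0, 10.0                    # holding / backorder cost per unit
    cycle = [0] * run_lengths[0] + [1] * run_lengths[1]
    inv, cost = [0.0, 0.0], 0.0
    for t in range(periods):
        inv[cycle[t % len(cycle)]] += prod
        for j in (0, 1):
            inv[j] -= rng.expovariate(1.0 / rates[j])  # random demand draw
            cost += h * max(inv[j], 0.0) + b * max(-inv[j], 0.0)
    return cost / periods

def global_policy_search(max_run=6):
    """Treat the simulator as a black box and search the small parameter
    grid exhaustively for the cheapest fixed cycle."""
    grid = [(i, j) for i in range(1, max_run + 1)
            for j in range(1, max_run + 1)]
    return min(grid, key=avg_cost)
```

A fixed simulation seed gives a deterministic (common-random-numbers) comparison across policy parameters, which is what makes a simple search over the response surface reliable.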
35

Regatti, Jayanth Reddy. "Dynamic Routing for Fuel Optimization in Autonomous Vehicles." The Ohio State University, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=osu1524145002064074.

Full text
36

Cakir, Fahrettin. "Data-centric solution methodologies for vehicle routing problems." Diss., University of Iowa, 2016. https://ir.uiowa.edu/etd/2052.

Full text
Abstract:
Data-driven decision making has become more popular in today's businesses, including logistics and vehicle routing. Leveraging historical data, companies can achieve goals such as customer satisfaction management, scalable and efficient operation, and higher overall revenue. To manage customer satisfaction, logistics companies assign their drivers to customers consistently over time. Creating this consistency takes time and depends on the history experienced between the company and the customer. While pursuing this goal, companies trade off the cost of capacity against consistency, because demand is unknown on a daily basis. We propose concepts and methods that enable a parcel delivery company to balance the trade-off between cost and customer satisfaction. We use clustering methods that draw on cumulative historical service data to generate better consistency using the information entropy measure. Parcel delivery companies route many vehicles to serve customer requests on a daily basis. While clustering was important to the development of early routing algorithms, modern solution methods rely on metaheuristics, which are not easily deployable and often do not have open source code bases. We propose a two-stage, shape-based clustering approach that efficiently obtains a clustering of delivery request locations. Our solution technique is based on creating clusters that form certain shapes with respect to the depot. We obtain a routing solution by ordering all locations in every cluster separately. Our results are competitive with a state-of-the-art vehicle routing solver in terms of quality. Moreover, the results show that the algorithm is more scalable and is robust to problem parameters in terms of runtime. Fish trawling can be viewed as a vehicle routing problem in which the main objective is to maximize the amount of fish caught (revenue) in the face of uncertainty about the catch. This uncertainty creates an embedded prediction problem that must be solved before deciding where to harvest.
Using previous catch data to train prediction models, we solve the routing problem a fish trawler faces through dynamically updated routing decisions that allow for spatiotemporal correlation in the random catch. We investigate the relationship between the quality of the predictions and the revenue generated as a result.
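A minimal sweep-style sketch of the clustering idea (the instance and the angle-only ordering rule are simplifications; the thesis's shape-based clusters and within-cluster ordering are richer): sort delivery locations by polar angle around the depot, cut clusters at the vehicle capacity, and order each cluster separately to obtain routes.

```python
import math

# Sweep-style sketch of shape-based clustering (made-up instance): sort
# the delivery locations by polar angle around the depot, cut clusters at
# the vehicle capacity, then route each cluster separately.
DEPOT = (0.0, 0.0)

def sweep_clusters(locations, capacity):
    order = sorted(locations,
                   key=lambda p: math.atan2(p[1] - DEPOT[1], p[0] - DEPOT[0]))
    # consecutive angular slices of at most `capacity` stops
    return [order[i:i + capacity] for i in range(0, len(order), capacity)]

def route_length(cluster):
    """Depot -> stops in sweep order -> depot."""
    stops = [DEPOT] + cluster + [DEPOT]
    return sum(math.dist(a, b) for a, b in zip(stops, stops[1:]))

locations = [(2, 1), (-1, 2), (3, -1), (1, 3), (-2, -2), (4, 2), (-3, 1)]
clusters = sweep_clusters(locations, capacity=3)
total = sum(route_length(c) for c in clusters)
```

Because the only work is a sort plus a linear pass, this kind of construction scales to very large instances, which is the scalability argument made in the abstract.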
37

Bountourelis, Theologos. "Efficient pac-learning for episodic tasks with acyclic state spaces and the optimal node visitation problem in acyclic stochastic digaphs." Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/28144.

Full text
Abstract:
Thesis (M. S.)--Industrial and Systems Engineering, Georgia Institute of Technology, 2009.
Committee Chair: Reveliotis, Spyros; Committee Member: Ayhan, Hayriye; Committee Member: Goldsman, Dave; Committee Member: Shamma, Jeff; Committee Member: Zwart, Bert.
38

Jeddi, Babak. "A coordinated energy management scheme in a residential neighborhood under given market framework." Thesis, Queensland University of Technology, 2020. https://eprints.qut.edu.au/200710/1/Babak_Jeddi_Thesis.pdf.

Full text
Abstract:
This thesis proposes a computationally efficient home energy management system that optimizes electricity payments and improves occupant comfort by appropriately scheduling all devices of the home. It incorporates solar panels, battery systems, thermostatically controlled appliances, and deferrable appliances. The thesis also develops a coordinated framework for operating multiple home energy management systems in a residential neighborhood, based on the optimal and secure operation of the grid. The coordinated load-scheduling framework enables customers to cooperate to optimize energy consumption at the neighborhood level and prevents violations of the grid's operational constraints.
39

Fiocchi, Leonardo. "A Reinforcement Learning strategy for Satellite Attitude Control." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Find full text
Abstract:
In recent years, space missions for both scientific and commercial purposes have increased substantially. More and more spacecraft have flexible multibody structures and are subject to liquid volume changes, fuel consumption, and other behaviours that alter the parameters of the spacecraft's model. Moreover, varying disturbances, such as the gravity-gradient torque due to Earth's gravitational field, aerodynamic torque, and others, may lead to unwanted effects on the satellite's dynamics. These uncertainties in the model and environment descriptions make it difficult to set up an exact mathematical model, which makes attitude control tasks even more difficult. For these reasons, data-driven approaches are introduced to alleviate the shortcomings of model-based control methodologies, or even to eliminate them completely. This thesis focuses on these issues, introducing and implementing a data-driven approach that bridges optimal control and reinforcement learning building blocks. The development is carried out on discrete-time, time-varying linear systems. The particular feature of this methodology is that no prior knowledge of the system parameters is necessary. Although no a priori information is used, the results show that the algorithm converges to the optimal control computed by a controller with full and precise knowledge of the system. While the algorithm and learning process use a quaternion representation, a custom animation of the algorithm and system results in Euler angles is provided to better evaluate the performance of the solution.
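The core idea of converging to the optimal controller without knowing the system parameters can be sketched on a scalar, time-invariant toy problem (the thesis works with time-varying multivariable dynamics and quaternions; the dynamics below are an assumption used only to generate data and are never shown to the learner):

```python
import random

# Scalar toy of model-free optimal control (an illustrative reduction):
# the true dynamics x' = A*x + B*u generate data but stay hidden from
# the learner, which only sees (x, u, cost, next x) transitions.
A, B, Q, R = 0.9, 0.5, 1.0, 1.0   # hidden plant and cost weights

def transition(x, u):
    return A * x + B * u           # plays the role of a measured next state

def solve3(M, v):
    """Gauss-Jordan elimination for the 3x3 normal equations."""
    M = [row[:] + [v[i]] for i, row in enumerate(M)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r2: abs(M[r2][col]))
        M[col], M[piv] = M[piv], M[col]
        for r2 in range(3):
            if r2 != col:
                f = M[r2][col] / M[col][col]
                M[r2] = [M[r2][c] - f * M[col][c] for c in range(4)]
    return [M[i][3] / M[i][i] for i in range(3)]

def evaluate_policy(k, n_samples=50, seed=3):
    """Fit Q_k(x,u) = h11*x^2 + 2*h12*x*u + h22*u^2 by least squares from
    the Bellman identity Q_k(x,u) = Q*x^2 + R*u^2 + Q_k(x', -k*x')."""
    rng = random.Random(seed)
    G = [[0.0] * 3 for _ in range(3)]
    g = [0.0] * 3
    for _ in range(n_samples):
        x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
        xn = transition(x, u)
        un = -k * xn
        phi = [x * x - xn * xn, 2 * (x * u - xn * un), u * u - un * un]
        cost = Q * x * x + R * u * u
        for i in range(3):
            g[i] += phi[i] * cost
            for j in range(3):
                G[i][j] += phi[i] * phi[j]
    return solve3(G, g)

def learn_gain(iterations=12):
    """Policy iteration on data: evaluate the critic, improve the actor."""
    k = 0.0                          # initial stabilizing policy (|A| < 1)
    for _ in range(iterations):
        h11, h12, h22 = evaluate_policy(k)
        k = h12 / h22                # u = -(h12/h22)*x minimises Q_k(x, u)
    return k
```

Each iteration fits the quadratic Q-function of the current policy from observed transitions and then improves the policy, so the learned gain converges to the LQR-optimal one that a controller with full model knowledge would compute, mirroring the convergence claim in the abstract.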
40

Goodson, Justin Christopher. "Solution methodologies for vehicle routing problems with stochastic demand." Diss., University of Iowa, 2010. https://ir.uiowa.edu/etd/675.

Full text
Abstract:
We present solution methodologies for vehicle routing problems (VRPs) with stochastic demand, with a specific focus on the vehicle routing problem with stochastic demand (VRPSD) and the vehicle routing problem with stochastic demand and duration limits (VRPSDL). The VRPSD and the VRPSDL are fundamental problems underlying many operational challenges in the fields of logistics and supply chain management. We model the VRPSD and the VRPSDL as large-scale Markov decision processes. We develop cyclic-order neighborhoods, a general methodology for solving a broad class of VRPs, and use this technique to obtain static, fixed route policies for the VRPSD. We develop pre-decision, post-decision, and hybrid rollout policies for approximate dynamic programming (ADP). These policies lay a methodological foundation for solving large-scale sequential decision problems and provide a framework for developing dynamic routing policies. Our dynamic rollout policies for the VRPSDL significantly improve upon a method frequently implemented in practice. We also identify circumstances in which our rollout policies appear to offer little or no benefit compared to this benchmark. These observations can guide managerial decision making regarding when the use of our procedures is justifiable. We also demonstrate that our methodology lends itself to real-time implementation, thereby providing a mechanism to make high-quality, dynamic routing decisions for large-scale operations. Finally, we consider a more traditional ADP approach to the VRPSDL by developing a parameterized linear function to approximate the value functions corresponding to our problem formulation. We estimate parameters via a simulation-based algorithm and show that initializing parameter values via our rollout policies leads to significant improvements. However, we conclude that additional research is required to develop a parametric ADP methodology comparable or superior to our rollout policies.
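A rollout policy in the sense used here can be sketched on a toy routing instance with stochastic demand (coordinates, the {1, 2} demand distribution and the restocking recourse rule are illustrative assumptions): a simple base policy completes the route, and the rollout policy scores each candidate next customer by simulating the base policy over sampled demand futures.

```python
import math
import random

# Rollout-policy sketch for a toy VRP with stochastic demand (instance and
# recourse rule invented). Base policy: nearest neighbour. Rollout policy:
# score each candidate next customer by simulating the base policy.
DEPOT = (0.0, 0.0)
CUSTS = {0: (2, 1), 1: (4, 3), 2: (1, 4), 3: (5, 1)}
CAP = 3  # vehicle capacity; each demand turns out to be 1 or 2 on arrival

def dist(p, q):
    return math.dist(p, q)

def serve(pos, load, cust, demand):
    """Travel to the customer; if the load cannot cover the realised
    demand, detour to the depot to refill (simple recourse rule)."""
    cost = dist(pos, CUSTS[cust])
    if demand > load:
        cost += 2 * dist(CUSTS[cust], DEPOT)
        load = CAP
    return cost, load - demand, CUSTS[cust]

def base_cost(pos, load, remaining, demands):
    """Finish the route with the nearest-neighbour base policy."""
    total, remaining = 0.0, list(remaining)
    while remaining:
        nxt = min(remaining, key=lambda cu: dist(pos, CUSTS[cu]))
        remaining.remove(nxt)
        cost, load, pos = serve(pos, load, nxt, demands[nxt])
        total += cost
    return total + dist(pos, DEPOT)

def rollout_cost(demands, n_samples=30, seed=5):
    rng = random.Random(seed)
    pos, load, remaining, total = DEPOT, CAP, list(CUSTS), 0.0
    while remaining:
        def score(cand):
            s = 0.0
            for _ in range(n_samples):       # sample future demands
                sample = {k: rng.choice((1, 2)) for k in remaining}
                cost, ld, p = serve(pos, load, cand, sample[cand])
                rest = [k for k in remaining if k != cand]
                s += cost + base_cost(p, ld, rest, sample)
            return s / n_samples
        nxt = min(remaining, key=score)
        remaining.remove(nxt)
        cost, load, pos = serve(pos, load, nxt, demands[nxt])
        total += cost
    return total + dist(pos, DEPOT)

def expected(policy_cost, n_scen=200, seed=11):
    """Paired Monte Carlo comparison over common demand scenarios."""
    rng = random.Random(seed)
    tot = 0.0
    for _ in range(n_scen):
        demands = {k: rng.choice((1, 2)) for k in CUSTS}
        tot += policy_cost(demands)
    return tot / n_scen
```

The rollout-improvement property says the rollout policy should, in expectation, do no worse than the base policy it simulates, which is the mechanism the dynamic rollout policies above exploit.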
41

Yin, Biao. "Contrôle adaptatif des feux de signalisation dans les carrefours : modélisation du système de trafic dynamique et approches de résolution." Thesis, Belfort-Montbéliard, 2015. http://www.theses.fr/2015BELF0279/document.

Full text
Abstract:
Adaptive traffic signal control is a decision-making optimization problem that is studied intensively in order to relieve traffic congestion at urban intersections, and intelligent algorithms are widely used to improve control performance measures such as traffic delay. In this thesis, we study this problem comprehensively with a microscopic, dynamic, discrete-time model, and investigate the related algorithms both for an isolated intersection and for distributed network control. We first focus on dynamic modeling of adaptive traffic signal control and of the network loading problem. The proposed adaptive phase sequence (APS) mode is highlighted as one of the signal phase control mechanisms. Signal control at intersections is formulated as a Markov decision process (MDP), and the concept of a tunable system state is proposed for traffic network coordination. Moreover, a new vehicle-following model supports the network loading environment. Based on this model, signal control methods are studied using optimal and then near-optimal algorithms. Two exact dynamic programming (DP) algorithms are investigated, and the results show some limitations of the DP solution in complex cases where the state space becomes large. Because of the computational burden of DP and the lack of model information (notably exact information about vehicle arrivals at the intersection), an approximate dynamic programming (ADP) approach is adopted. Finally, an online near-optimal algorithm using ADP with the RLS-TD(λ) update is selected. In simulation experiments, especially with the integration of the APS mode, the proposed algorithm shows clear advantages in performance measures and computational efficiency.
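The learning rule at the heart of RLS-TD(λ) is temporal-difference learning with eligibility traces; a tabular sketch on the classic five-state random walk (an illustrative stand-in for the traffic state space; the thesis couples this TD(λ) update with recursive least squares):

```python
import random

# Tabular TD(lambda) with eligibility traces on the classic five-state
# random walk: non-terminal states 0..4, exits left (reward 0) or right
# (reward 1), so the true value of state i is (i+1)/6.
N = 5
TRUE_V = [(i + 1) / 6 for i in range(N)]

def td_lambda(episodes=10000, alpha=0.03, lam=0.8, seed=2):
    rng = random.Random(seed)
    V = [0.5] * N
    for _ in range(episodes):
        s, trace = N // 2, [0.0] * N        # start in the middle
        while True:
            s2 = s + rng.choice((-1, 1))
            reward = 1.0 if s2 == N else 0.0          # right exit pays 1
            v_next = 0.0 if s2 in (-1, N) else V[s2]  # terminals worth 0
            delta = reward + v_next - V[s]
            for i in range(N):                        # decay all traces,
                trace[i] *= lam
            trace[s] += 1.0                           # bump current state
            for i in range(N):
                V[i] += alpha * delta * trace[i]      # TD(lambda) update
            if s2 in (-1, N):
                break
            s = s2
    return V
```

The eligibility trace spreads each temporal-difference error backwards over recently visited states, which is what makes λ > 0 faster than plain TD(0) on problems with delayed consequences, such as queued traffic.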
42

Holguin, Mijail Gamarra. "Planejamento probabilístico usando programação dinâmica assíncrona e fatorada." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-14042013-131306/.

Full text
Abstract:
Markov decision processes (MDPs) model sequential decision-making problems in which the possible actions of an agent have probabilistic effects on the successor states (which can be defined by state transition matrices). Real-time dynamic programming (RTDP) is a technique for solving MDPs when information about the initial state is available. Traditional approaches perform best on problems with sparse state transition matrices, because they can converge to the optimal policy efficiently without having to visit every state. However, this advantage can be lost in problems with dense transition matrices, in which many states can be reached in one step (for example, control problems with exogenous events). One approach to overcoming this limitation is to exploit regularities in the domain dynamics through a factored representation, that is, a representation based on state variables. In this master's thesis, we propose a new algorithm called FactRTDP (Factored RTDP), and its approximate version aFactRTDP (Approximate Factored RTDP), which are the first efficient factored versions of the classical RTDP algorithm. We also propose two further extensions, FactLRTDP and aFactLRTDP, which label states whose value function has converged to the optimum. Experimental results show that these new algorithms converge faster in domains with dense transition matrices, and that they exhibit good online performance in domains with dense transition matrices and few dependencies among the state variables.
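The flat RTDP algorithm that FactRTDP factorises can be sketched on a small stochastic shortest-path grid (the grid size, slip probability and unit costs are assumptions): trials start at the initial state, act greedily, and apply Bellman backups only at visited states, so convergence concentrates on the states relevant to the initial state.

```python
import random

# Flat RTDP sketch: trials from the initial state, greedy actions, and
# backups only at visited states; synchronous value iteration gives the
# exact benchmark. One move costs 1 and fails (state unchanged) with
# probability SLIP.
SIZE, GOAL, SLIP = 4, (3, 3), 0.2
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))
STATES = [(x, y) for x in range(SIZE) for y in range(SIZE)]

def succ(s, a):
    x, y = s[0] + a[0], s[1] + a[1]
    return (x, y) if 0 <= x < SIZE and 0 <= y < SIZE else s

def qvalue(V, s, a):
    return 1.0 + (1 - SLIP) * V[succ(s, a)] + SLIP * V[s]

def value_iteration(sweeps=1000):
    V = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            if s != GOAL:
                V[s] = min(qvalue(V, s, a) for a in MOVES)
    return V

def rtdp(trials=3000, seed=4):
    rng = random.Random(seed)
    V = {s: 0.0 for s in STATES}            # admissible lower bound
    for _ in range(trials):
        s, steps = (0, 0), 0
        while s != GOAL and steps < 100:
            a = min(MOVES, key=lambda m: qvalue(V, s, m))
            V[s] = qvalue(V, s, a)          # backup at the visited state only
            if rng.random() > SLIP:
                s = succ(s, a)
            steps += 1
    return V
```

With the zero (admissible) initialisation, the RTDP estimate at the start state stays a lower bound on the exact value and closes the gap as trials accumulate, without ever sweeping the full state space.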
43

Filho, Antonio Martins Lima. "Alocação dinâmica de recursos: aplicação ao transporte rodoviário de cargas em longa distância." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/3/3138/tde-25082011-135158/.

Full text
Abstract:
Operational planning of a long-haul transportation system entails solving a dynamic network optimization problem, aiming to meet freight demands effectively and efficiently while utilizing the available transportation capacity. The proposed solution methodology uses the Logistic Queueing Network approach, which replaces the global network optimization process (usually carried out with Integer Linear Programming) by a model of stochastic, approximate and adaptive dynamic programming that solves a sequence of subproblems delimited in time, strongly reducing the number of variables involved. This method allows more realistic mathematical models to be used over broader planning horizons. The research extends models found in the literature to solve more complex problems, including heterogeneous vehicle fleets, time windows for the start of service, third-party carriers, and penalties for unmet demands. Experimental problems solved successfully with the developed technique are presented. The work also outlines a Decision Support System incorporating the proposed methodology.
44

Heng, Jeremy. "On the use of transport and optimal control methods for Monte Carlo simulation." Thesis, University of Oxford, 2016. https://ora.ox.ac.uk/objects/uuid:6cbc7690-ac54-4a6a-b235-57fa62e5b2fc.

Full text
Abstract:
This thesis explores ideas from transport theory and optimal control to develop novel Monte Carlo methods for efficient statistical computation. The first project considers the problem of constructing a transport map between two given probability measures. In the Bayesian formalism, this approach is natural when one introduces a curve of probability measures connecting the prior to the posterior by tempering the likelihood function. The main idea is to move samples from the prior using an ordinary differential equation (ODE), constructed by solving the Liouville partial differential equation (PDE) which governs the time evolution of measures along the curve. In this work, we first study the regularity that solutions of the Liouville equation should satisfy to guarantee the validity of this construction. We place an emphasis on understanding these issues, as they explain the difficulties associated with solutions that have been previously reported. After ensuring that the flow transport problem is well-defined, we give a constructive solution. However, this result is only formal, as the representation is given in terms of intractable integrals. For computational tractability, we propose a novel approximation of the PDE which yields an ODE whose drift depends on the full conditional distributions of the intermediate distributions. Even when the ODE is time-discretized and the full conditional distributions are approximated numerically, the resulting distribution of mapped samples can be evaluated and used as a proposal within Markov chain Monte Carlo and sequential Monte Carlo (SMC) schemes. We then illustrate experimentally that the resulting algorithm can outperform state-of-the-art SMC methods at a fixed computational complexity. The second project aims to exploit ideas from optimal control to design more efficient SMC methods.
The key idea is to control the proposal distribution induced by a time-discretized Langevin dynamics so as to minimize the Kullback-Leibler divergence of the extended target distribution from the proposal. The optimal value functions of the resulting optimal control problem can then be approximated using algorithms developed in the approximate dynamic programming (ADP) literature. We introduce a novel iterative scheme to perform ADP, provide a theoretical analysis of the proposed algorithm and demonstrate that the latter can provide significant gains over state-of-the-art methods at a fixed computational complexity.
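A minimal tempered SMC sampler conveys the prior-to-posterior curve underlying both projects (the Gaussian model, temperature ladder and random-walk move are assumptions; the thesis's flow transport and controlled Langevin proposals are far more elaborate):

```python
import math
import random

# Minimal tempered SMC sampler. Model: x ~ N(0,1) prior, one observation
# y = 1 with y|x ~ N(x,1), so the posterior is N(0.5, 0.5). The sampler
# moves particles along pi_beta proportional to prior * likelihood^beta.
Y = 1.0

def log_lik(x):
    return -0.5 * (Y - x) ** 2

def smc_posterior_mean(n=4000, n_temps=20, seed=9):
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]      # draws from the prior
    betas = [i / n_temps for i in range(n_temps + 1)]
    for b0, b1 in zip(betas, betas[1:]):
        # reweight by the incremental likelihood, then resample
        w = [math.exp((b1 - b0) * log_lik(x)) for x in xs]
        xs = rng.choices(xs, weights=w, k=n)
        # one random-walk Metropolis move per particle, targeting
        # prior(x) * likelihood(x)^b1
        for i in range(n):
            x = xs[i]
            xp = x + rng.gauss(0.0, 0.5)
            logr = (-0.5 * xp * xp + b1 * log_lik(xp)) \
                 - (-0.5 * x * x + b1 * log_lik(x))
            if rng.random() < math.exp(min(0.0, logr)):
                xs[i] = xp
    return sum(xs) / n
```

For this conjugate model the posterior mean is exactly 0.5, so the particle estimate can be checked directly; the thesis replaces the generic reweight/move steps with transport maps and optimally controlled proposals to make each step far more efficient.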
45

Andrade, Gustavo Araújo de. "PROGRAMAÇÃO DINÂMICA HEURÍSTICA DUAL E REDES DE FUNÇÕES DE BASE RADIAL PARA SOLUÇÃO DA EQUAÇÃO DE HAMILTON-JACOBI-BELLMAN EM PROBLEMAS DE CONTROLE ÓTIMO." Universidade Federal do Maranhão, 2014. http://tedebc.ufma.br:8080/jspui/handle/tede/517.

Full text
Abstract:
Made available in DSpace on 2016-08-17; previous issue date: 2014-04-28.
The main objective of this work is to develop learning algorithms, executed online, for the solution of the algebraic Hamilton-Jacobi-Bellman equation. The concepts covered focus on a methodology for control systems built on techniques for the online design of adaptive controllers that reject sensor noise, parametric variations and modeling errors. Concepts from neurodynamic programming and reinforcement learning are used to design algorithms in which the context of a given operating point makes the control system adapt and thus deliver the performance required by the design specifications. Methods are developed for the online estimation of the adaptive critic, concentrating on techniques for estimating the gradient of the environment's value function.
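The critic/actor loop has a clean closed form on a scalar continuous-time LQR problem, which makes its structure easy to see (this sketch is model-based and scalar; the work described above estimates the corresponding quantities online, with radial basis function networks as approximators): the critic evaluates the current policy and the actor performs the Hamiltonian-minimising update.

```python
# Policy-iteration structure that adaptive critics emulate online, shown
# in closed form on a scalar continuous-time LQR problem (an illustrative
# reduction). Plant: xdot = a*x + b*u; cost: integral of q*x^2 + r*u^2;
# the HJB equation reduces to the algebraic Riccati equation.
a, b, q, r = 1.0, 1.0, 1.0, 1.0

def evaluate(k):
    """Critic step: the value of policy u = -k*x is p*x^2, with p solving
    the Lyapunov equation 2*(a - b*k)*p + q + r*k^2 = 0 (needs a - b*k < 0)."""
    return (q + r * k * k) / (2 * (b * k - a))

def improve(p):
    """Actor step: u = -(b*p/r)*x minimises the Hamiltonian given the critic."""
    return b * p / r

def policy_iteration(k=2.0, iters=20):
    for _ in range(iters):
        k = improve(evaluate(k))
    return k
```

For these parameters the iteration converges to the gain 1 + sqrt(2) that solves the algebraic Riccati equation, the quantity an online adaptive critic approaches from data instead of from the model.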
APA, Harvard, Vancouver, ISO, and other styles
46

Romão, Oberlan Christo. "O problema de corte não-guilhotinado multiperíodo com sobras aproveitáveis." Universidade de São Paulo, 2017. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-01022018-180800/.

Full text
Abstract:
In this research, we study the multi-period two-dimensional cutting problem with usable leftover, which consists of cutting objects to produce a set of items. We assume a finite planning horizon with a finite amount of periods between the initial and final times. First we consider a deterministic version in which we know, a priori, the set of ordered items and the cost of the objects at each period. Some of the leftovers generated during the cutting process of the ordered items in a period may be used as objects in the future. The leftovers that can be used in the future are called usable leftovers. In general, a leftover is considered usable if it has dimensions equal to or greater than that of some item from a predefined list for the period. The goal is to minimize the total cost of the objects used to cut the set of ordered items of the entire considered horizon. If there are solutions with the same cost, we wish to find one that, at the end of the considered time horizon, maximizes the value of the remaining usable leftovers. We present a mathematical model of the problem using a bilevel formulation, which is transformed into a mixed integer linear programming model, due to the characteristics of the problem. Considering the difficulty in solving the developed model, we propose a heuristic approach based on approximate dynamic programming (ADP) to deal with the proposed problem. Other options based on the rolling horizon and relax-and-fix strategies are also considered. We also consider the scenario where we do not know in advance the set of ordered items and the cost of the objects, but we have information about the probability distributions of both. In this case, we present an approach based on approximate dynamic programming to estimate the best strategy to be followed at each period. We compared the results obtained by the ADP with the results found by a greedy method. In suitable scenarios, the results show that the ADP achieves superior solutions to the greedy method.
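The leftover bookkeeping described in this abstract can be illustrated with a one-dimensional toy (the thesis treats the much harder two-dimensional, multi-period case): choose which items to cut from a single stock length so as to maximize the used length, and flag the remainder as usable when it meets a minimum size. The stock length, item list, and `usable_min` threshold below are invented for the example.

```python
def best_cut(stock, items, usable_min):
    """1-D sketch of cutting with usable leftover (illustrative only)."""
    # dp[l] = maximum total item length achievable within capacity l
    dp = [0] * (stock + 1)
    for it in items:  # 0/1 knapsack over the ordered items
        for l in range(stock, it - 1, -1):
            dp[l] = max(dp[l], dp[l - it] + it)
    used = dp[stock]
    leftover = stock - used
    # the leftover counts as "usable" only above the predefined size
    return used, leftover, leftover >= usable_min
```

For a stock of length 100 and items [60, 30, 20], the best packing uses 90, leaving a leftover of 10 that is usable only if the threshold allows it.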
APA, Harvard, Vancouver, ISO, and other styles
47

Zhang, Jian. "Advance Surgery Scheduling with Consideration of Downstream Capacity Constraints and Multiple Sources of Uncertainty." Thesis, Bourgogne Franche-Comté, 2019. http://www.theses.fr/2019UBFCA023.

Full text
Abstract:
This thesis deals with the advance scheduling of elective surgeries in an operating theatre that is composed of operating rooms and downstream recovery units. The arrivals of new patients in each week, the duration of each surgery, and the length-of-stay of each patient in the downstream recovery unit are subject to uncertainty. In each week, the surgery planner should determine the surgical blocks to open and assign some of the surgeries in the waiting list to the open surgical blocks. The objective is to minimize the patient-related costs incurred by performing and postponing surgeries as well as the hospital-related costs caused by the utilization of surgical resources. Considering that the pure mathematical programming models commonly used in the literature do not optimize the long-term performance of the surgery schedules, we propose a novel two-phase optimization model that combines a Markov decision process (MDP) and stochastic programming to overcome this drawback. The MDP model in the first phase determines the surgeries to be performed in each week and minimizes the expected total costs over an infinite horizon; the stochastic programming model in the second phase then optimizes the assignments of the selected surgeries to surgical blocks. In order to cope with the huge complexity of realistically sized problems, we develop a reinforcement-learning-based approximate dynamic programming algorithm and several column-generation-based heuristic algorithms as solution approaches. We conduct numerical experiments to evaluate the model and algorithms proposed in this thesis. The experimental results indicate that the proposed algorithms are considerably more efficient than the traditional ones, and that the resulting schedules of the two-phase optimization model significantly outperform those of a conventional stochastic programming model in terms of the patients' waiting times and the total costs in the long run.
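The first-phase MDP idea can be sketched as a toy value-iteration problem: the state is the waiting-list size, the action is the number of surgeries performed that week, and costs trade patient waiting against opening surgical blocks. All numbers, the arrival distribution, and the cost structure below are invented for illustration; the thesis's actual model is far richer.

```python
import numpy as np

# Toy infinite-horizon MDP (illustrative only): state s = patients waiting
# (capped), action a = surgeries performed this week (up to capacity).
MAX_WAIT, CAPACITY, GAMMA = 10, 3, 0.95
WAIT_COST, BLOCK_COST = 1.0, 2.0           # per waiting patient / per opened block
ARRIVALS = {0: 0.3, 1: 0.4, 2: 0.3}        # assumed weekly arrival distribution

def value_iteration(tol=1e-8):
    """Solve the toy MDP by standard value iteration."""
    V = np.zeros(MAX_WAIT + 1)
    while True:
        V_new = np.empty_like(V)
        policy = np.zeros(MAX_WAIT + 1, dtype=int)
        for s in range(MAX_WAIT + 1):
            best = np.inf
            for a in range(min(s, CAPACITY) + 1):
                # immediate cost: remaining waiters plus a fixed block-opening cost
                cost = WAIT_COST * (s - a) + BLOCK_COST * (a > 0)
                exp_future = sum(p * V[min(s - a + k, MAX_WAIT)]
                                 for k, p in ARRIVALS.items())
                q = cost + GAMMA * exp_future
                if q < best:
                    best, policy[s] = q, a
            V_new[s] = best
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, policy
        V = V_new
```

As expected, the converged policy performs no surgery on an empty list and works at full capacity when the queue is long, and the value function is nondecreasing in the queue length.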
APA, Harvard, Vancouver, ISO, and other styles
48

Qu, Zheng. "Nonlinear Perron-Frobenius theory and max-plus numerical methods for Hamilton-Jacobi equations." Palaiseau, Ecole polytechnique, 2013. http://pastel.archives-ouvertes.fr/docs/00/92/71/22/PDF/thesis.pdf.

Full text
Abstract:
Dynamic programming is one of the main approaches to solve optimal control problems. It reduces the latter problems to Hamilton-Jacobi partial differential equations (PDE). Several techniques have been proposed in the literature to solve these PDE. We mention, for example, finite difference schemes, the so-called discrete dynamic programming method or semi-Lagrangian method, or the antidiffusive schemes. All these methods are grid-based, i.e., they require a discretization of the state space, and thus suffer from the so-called curse of dimensionality. The present thesis focuses on max-plus numerical solutions and convergence analysis for medium- to high-dimensional deterministic optimal control problems. We develop here max-plus based numerical algorithms for which we establish theoretical complexity estimates. The proof of these estimates is based on results of nonlinear Perron-Frobenius theory. In particular, we study the contraction properties of monotone or non-expansive nonlinear operators, with respect to several classical metrics on cones (Thompson's metric, Hilbert's projective metric), and obtain nonlinear or non-commutative generalizations of the "ergodicity coefficients" arising in the theory of Markov chains. These results have applications in consensus theory and also to the generalized Riccati equations arising in stochastic optimal control.
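The basic algebraic operation behind max-plus (tropical) methods is matrix multiplication over the semiring where "addition" is max and "multiplication" is +. The sketch below shows this product; it is a minimal illustration of the algebra, not code from the thesis.

```python
import numpy as np

# Max-plus semiring: a ⊕ b = max(a, b),  a ⊗ b = a + b.
# The additive identity is -inf (it absorbs under max).
NEG_INF = -np.inf

def maxplus_matmul(A, B):
    """Max-plus matrix product: (A ⊗ B)[i, j] = max_k (A[i, k] + B[k, j])."""
    n, m = A.shape[0], B.shape[1]
    C = np.full((n, m), NEG_INF)
    for i in range(n):
        for j in range(m):
            C[i, j] = np.max(A[i, :] + B[:, j])
    return C
```

The max-plus identity matrix has 0 on the diagonal and -inf elsewhere, mirroring the usual identity under the ordinary product.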
APA, Harvard, Vancouver, ISO, and other styles
49

Molina, Diogenes. "Intelligent control and system aggregation techniques for improving rotor-angle stability of large-scale power systems." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50291.

Full text
Abstract:
A variety of factors such as increasing electrical energy demand, slow expansion of transmission infrastructures, and electric energy market deregulation, are forcing utilities and system operators to operate power systems closer to their design limits. Operating under stressed regimes can have a detrimental effect on the rotor-angle stability of the system. This stability reduction is often reflected by the emergence or worsening of poorly damped low-frequency electromechanical oscillations. Without appropriate measures these can lead to costly blackouts. To guarantee system security, operators are sometimes forced to limit power transfers that are economically beneficial but that can result in poorly damped oscillations. Controllers that damp these oscillations can improve system reliability by preventing blackouts and provide long-term economic gains by enabling more extensive utilization of the transmission infrastructure. Previous research in the use of artificial neural network-based intelligent controllers for power system damping control has shown promise when tested in small power system models. However, these controllers do not scale up well enough to be deployed in realistically sized power systems. The work in this dissertation focuses on improving the scalability of intelligent power system stabilizing controls so that they can significantly improve the rotor-angle stability of large-scale power systems. A framework for designing effective and robust intelligent controllers capable of scaling up to large-scale power systems is proposed. Extensive simulation results on a large-scale power system simulation model demonstrate the rotor-angle stability improvements attained by controllers designed using this framework.
APA, Harvard, Vancouver, ISO, and other styles
50

Divila, Jaroslav. "Vyhledávání přibližných palindromů v DNA sekvencích." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2012. http://www.nusl.cz/ntk/nusl-236524.

Full text
Abstract:
This work deals with the design and implementation of a tool for finding approximate palindromes in DNA sequences. It describes the structure of DNA, the function of palindromes in DNA sequences, and methods for finding approximate palindromes. The main part of the thesis focuses on the design and description of the implementation of the approximate-palindrome finding tool.
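The problem this thesis addresses can be stated compactly: a DNA palindrome is a sequence equal to its own reverse complement, and an approximate palindrome tolerates a few substitutions. The naive sliding-window sketch below illustrates the task only; it is not the thesis's algorithm, and the window length and mismatch budget are illustrative parameters.

```python
def revcomp(s):
    """Reverse complement of a DNA string."""
    comp = {'A': 'T', 'T': 'A', 'C': 'G', 'G': 'C'}
    return ''.join(comp[c] for c in reversed(s))

def is_approx_palindrome(w, max_mismatch):
    # a DNA palindrome equals its reverse complement; allow up to max_mismatch
    # substitutions between the first half and the reverse complement of the
    # second half
    half = len(w) // 2
    left, right_rc = w[:half], revcomp(w[len(w) - half:])
    return sum(a != b for a, b in zip(left, right_rc)) <= max_mismatch

def approx_palindromes(seq, length, max_mismatch):
    # slide a fixed-length window over seq and report matching start positions
    return [i for i in range(len(seq) - length + 1)
            if is_approx_palindrome(seq[i:i + length], max_mismatch)]
```

For example, the EcoRI site GAATTC is a perfect palindrome, while GAATAC is a palindrome only if one mismatch is allowed.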
APA, Harvard, Vancouver, ISO, and other styles