Dissertations on the topic "Multi-Objective Markov Decision Processes"

Author: Grafiati

Published: 10 December 2022

Updated: 29 January 2023

Explore the top 27 dissertations for research on the topic "Multi-Objective Markov Decision Processes".

1

Pratikakis, Nikolaos. "Multistage decisions and risk in Markov decision processes towards effective approximate dynamic programming architectures." Diss., Atlanta, Ga.: Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/31654.

Abstract:

Thesis (Ph.D)--Chemical Engineering, Georgia Institute of Technology, 2009.
Committee Chair: Jay H. Lee; Committee Member: Martha Grover; Committee Member: Matthew J. Realff; Committee Member: Shabbir Ahmed; Committee Member: Stylianos Kavadias. Part of the SMARTech Electronic Thesis and Dissertation Collection.

2

Dorff, Rebecca. "Modelling Infertility with Markov Chains." BYU ScholarsArchive, 2013. https://scholarsarchive.byu.edu/etd/4070.

Abstract:

Infertility affects approximately 15% of couples. Testing and interventions are costly, in time, money, and emotional energy. This paper will discuss using Markov decision and multi-armed bandit processes to identify a systematic approach of interventions that will lead to the desired baby while minimizing costs.

3

Chen, Yu Fan. "Hierarchical decomposition of multi-agent Markov decision processes with application to health aware planning." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/93795.

Abstract:

Thesis: S.M., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2014.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 99-104).
Multi-agent robotic systems have attracted the interest of both researchers and practitioners because they provide more capabilities and afford greater flexibility than single-agent systems. Coordination of individual agents within large teams is often challenging because of the combinatorial nature of such problems. In particular, the number of possible joint configurations is the product of the numbers of configurations of the individual agents. Further, real-world applications often contain various sources of uncertainty. This thesis investigates techniques to address the scalability issue of multi-agent planning under uncertainty. This thesis develops a novel hierarchical decomposition approach (HD-MMDP) for solving Multi-agent Markov Decision Processes (MMDPs), a natural framework for formulating stochastic sequential decision-making problems. In particular, the HD-MMDP algorithm builds a decomposition structure by exploiting coupling relationships in the reward function. A number of smaller subproblems are formed and solved individually. The planning space of each subproblem is much smaller than that of the original problem, which improves computational efficiency, and the solutions to the subproblems can be combined to form a solution (policy) to the original problem. The HD-MMDP algorithm is applied to a ten-agent persistent search and track (PST) mission and shows more than 35% improvement over an existing algorithm developed specifically for this domain. This thesis also contributes to the development of the software infrastructure that enables hardware experiments involving multiple robots. In particular, the thesis presents a novel optimization-based multi-agent path planning algorithm, which was tested in simulation and in hardware (quadrotor) experiments. The HD-MMDP algorithm is also used to solve a multi-agent intruder monitoring mission implemented using real robots.
by Yu Fan Chen.
S.M.
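
To make the scaling argument in this abstract concrete, the following Python sketch compares the size of a joint planning space with the total size of a set of decomposed subproblems. It is an illustration only, not the HD-MMDP algorithm: the per-agent state counts and the grouping of agents into pairs are hypothetical values.

```python
from math import prod

def joint_space_size(per_agent_states):
    """Number of joint configurations: the product of each agent's state count."""
    return prod(per_agent_states)

def decomposed_size(per_agent_states, groups):
    """Total planning-space size when each group of agents is planned separately.

    `groups` is a hypothetical partition of agent indices; how a real
    decomposition chooses its groups is not modelled here.
    """
    return sum(prod(per_agent_states[i] for i in group) for group in groups)

# Hypothetical example: ten agents with 5 states each, split into five pairs.
states = [5] * 10
pairs = [(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]
joint = joint_space_size(states)        # 5 ** 10 joint configurations
split = decomposed_size(states, pairs)  # five subproblems of 25 configurations each
print(joint, split)
```

Under these assumed numbers the joint space has several million configurations, while the five pairwise subproblems together have only 125.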

4

Omidshafiei, Shayegan. "Decentralized control of multi-robot systems using partially observable Markov Decision Processes and belief space macro-actions." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/101447.

Abstract:

Thesis: S.M., Massachusetts Institute of Technology, Department of Aeronautics and Astronautics, 2015.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 129-139).
Planning, control, perception, and learning for multi-robot systems present significant challenges. Transition dynamics of the robots may be stochastic, making it difficult to select the best action each robot should take at a given time. The observation model, a function of the robots' sensors, may be noisy or partial, meaning that deterministic knowledge of the team's state is often impossible to attain. Robots designed for real-world applications require careful consideration of such sources of uncertainty. This thesis contributes a framework for multi-robot planning in continuous spaces with partial observability. Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs) are general models for multi-robot coordination problems. However, representing and solving Dec-POMDPs is often intractable for large problems. This thesis extends the Dec-POMDP framework to the Decentralized Partially Observable Semi-Markov Decision Process (Dec-POSMDP), taking advantage of high-level representations that are natural for multi-robot problems. Dec-POSMDPs allow asynchronous decision-making, which is crucial in multi-robot domains. This thesis also presents algorithms for solving Dec-POSMDPs, which are more scalable than previous methods due to use of closed-loop macro-actions in planning. The proposed framework's performance is evaluated in a constrained multi-robot package delivery domain, showing its ability to provide high-quality solutions for large problems. Due to the probabilistic nature of state transitions and observations, robots operate in belief space, the space of probability distributions over all of their possible states. This thesis also contributes a hardware platform called Measurable Augmented Reality for Prototyping Cyber-Physical Systems (MAR-CPS). MAR-CPS allows real-time visualization of the belief space in laboratory settings.
by Shayegan Omidshafiei.
S.M.

5

Dorini, Gianluca. "The neighbour search approach for solving multi-objective Markov Decision Processes, and the application in reservoirs operation planning." Thesis, University of Exeter, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445450.

6

Fowler, Michael C. "Intelligent Knowledge Distribution for Multi-Agent Communication, Planning, and Learning." Diss., Virginia Tech, 2020. http://hdl.handle.net/10919/97996.

Abstract:

This dissertation addresses a fundamental question of multi-agent coordination: what information should be sent to whom and when, with the limited resources available to each agent? Communication requirements for multi-agent systems can be rather high when an accurate picture of the environment and the state of other agents must be maintained. To reduce the impact of multi-agent coordination on networked systems, e.g., power and bandwidth, this dissertation introduces new concepts to enable Intelligent Knowledge Distribution (IKD), including Constrained-action POMDPs (CA-POMDP) and concurrent decentralized (CoDec) POMDPs for an agnostic plug-and-play capability for fully autonomous systems. Each agent runs a CoDec POMDP where all the decision making (motion planning, task allocation, asset monitoring, and communication) is separated into concurrent individual MDPs to reduce the combinatorial explosion of the action and state space while maintaining dependencies between the models. We also introduce the CA-POMDP with action-based constraints on partially observable Markov decision processes, rewards driven by the value of information, and probabilistic constraint satisfaction through discrete optimization and Markov chain Monte Carlo analysis. IKD is adapted in real time through machine learning of the actual environmental impacts on the behavior of the system, including collaboration strategies between autonomous agents, the true value of information between heterogeneous systems, observation probabilities and resource utilization.
Doctor of Philosophy
This dissertation addresses a fundamental question that arises when multiple autonomous systems in the field, like drone swarms, need to coordinate and share data: what information should be sent to whom and when, with the limited resources available to each agent? Intelligent Knowledge Distribution is a framework that answers this question. Communication requirements for multi-agent systems can be rather high when an accurate picture of the environment and the state of other agents must be maintained. To reduce the impact of multi-agent coordination on networked systems, e.g., power and bandwidth, this dissertation introduces new concepts to enable Intelligent Knowledge Distribution (IKD), including Constrained-action POMDPs and concurrent decentralized (CoDec) POMDPs for an agnostic plug-and-play capability for fully autonomous systems. The IKD model demonstrated its validity as a "plug-and-play" library that manages communications between agents, ensuring that the right information is transmitted at the right time to the right agent for mission success.

7

Leung, Hiu-lan, and 梁曉蘭. "Wandering ideal point models for single or multi-attribute ranking data: a Bayesian approach." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2003. http://hub.hku.hk/bib/B29552357.

8

Murugesan, Sugumar. "Opportunistic Scheduling Using Channel Memory in Markov-modeled Wireless Networks." The Ohio State University, 2010. http://rave.ohiolink.edu/etdc/view?acc_num=osu1282065836.

9

Raffensperger, Peter Abraham. "Measuring and Influencing Sequential Joint Agent Behaviours." Thesis, University of Canterbury. Electrical and Computer Engineering, 2013. http://hdl.handle.net/10092/7472.

Abstract:

Algorithmically designed reward functions can influence groups of learning agents toward measurable desired sequential joint behaviours. Influencing learning agents toward desirable behaviours is non-trivial due to the difficulties of assigning credit for global success to the deserving agents and of inducing coordination. Quantifying joint behaviours lets us identify global success by ranking some behaviours as more desirable than others. We propose a real-valued metric for turn-taking, demonstrating how to measure one sequential joint behaviour. We describe how to identify the presence of turn-taking in simulation results and we calculate the quantity of turn-taking that could be observed between independent random agents. We demonstrate our turn-taking metric by reinterpreting previous work on turn-taking in emergent communication and by analysing a recorded human conversation. Given a metric, we can explore the space of reward functions and identify those reward functions that result in global success in groups of learning agents. We describe 'medium access games' as a model for human and machine communication and we present simulation results for an extensive range of reward functions for pairs of Q-learning agents. We use the Nash equilibria of medium access games to develop predictors for determining which reward functions result in turn-taking. Having demonstrated the predictive power of Nash equilibria for turn-taking in medium access games, we focus on synthesis of reward functions for stochastic games that result in arbitrary desirable Nash equilibria. Our method constructs a reward function such that a particular joint behaviour is the unique Nash equilibrium of a stochastic game, provided that such a reward function exists. This method builds on techniques for designing rewards for Markov decision processes and for normal form games. We explain our reward design methods in detail and formally prove that they are correct.

10

Lafleur, Jarret Marshall. "A Markovian state-space framework for integrating flexibility into space system design decisions." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/43749.

Abstract:

The past decades have seen the state of the art in aerospace system design progress from a scope of simple optimization to one including robustness, with the objective of permitting a single system to perform well even in off-nominal future environments. Integrating flexibility, or the capability to easily modify a system after it has been fielded in response to changing environments, into system design represents a further step forward. One challenge in accomplishing this rests in that the decision-maker must consider not only the present system design decision, but also sequential future design and operation decisions. Despite extensive interest in the topic, the state of the art in designing flexibility into aerospace systems, and particularly space systems, tends to be limited to analyses that are qualitative, deterministic, single-objective, and/or limited to considering a single future time period. To address these gaps, this thesis develops a stochastic, multi-objective, and multi-period framework for integrating flexibility into space system design decisions. Central to the framework are five steps. First, system configuration options are identified and costs of switching from one configuration to another are compiled into a cost transition matrix. Second, probabilities that demand on the system will transition from one mission to another are compiled into a mission demand Markov chain. Third, one performance matrix for each design objective is populated to describe how well the identified system configurations perform in each of the identified mission demand environments. The fourth step employs multi-period decision analysis techniques, including Markov decision processes (MDPs) from the field of operations research, to find efficient paths and policies a decision-maker may follow. The final step examines the implications of these paths and policies for the primary goal of informing initial system selection.
Overall, this thesis unifies state-centric concepts of flexibility from economics and engineering literature with sequential decision-making techniques from operations research. The end objective of this thesis' framework and its supporting analytic and computational tools is to enable selection of the next-generation space systems today, tailored to decision-maker budget and performance preferences, that will be best able to adapt and perform in a future of changing environments and requirements. Following extensive theoretical development, the framework and its steps are applied to space system planning problems of (1) DARPA-motivated multiple- or distributed-payload satellite selection and (2) NASA human space exploration architecture selection.
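
The ingredients named in this abstract (a cost transition matrix, a mission demand Markov chain, and one performance matrix per objective) can be combined into a small MDP. The Python sketch below is not the thesis's method. It assumes a weighted-sum scalarization of the objectives, an infinite discounted horizon, and a particular reward timing, and every number in it is hypothetical.

```python
import numpy as np

def plan_flexible_design(switch_cost, mission_chain, perf, weights, gamma=0.9, tol=1e-9):
    """Value iteration over states (configuration c, current mission m).

    Action: the configuration c2 to operate next. Assumed reward model:
    pay switch_cost[c, c2] now, then earn the weighted performance of c2
    in the next mission m2, drawn from row m of mission_chain.
    """
    # Weighted-sum scalarization of the per-objective matrices (an assumption).
    reward = sum(w * p for w, p in zip(weights, perf))          # shape (C, M)
    value = np.zeros(reward.shape)
    while True:
        # future[m, c2] = E over m2 of reward[c2, m2] + gamma * value[c2, m2]
        future = mission_chain @ (reward + gamma * value).T     # shape (M, C)
        # q[c, m, c2] = -switch_cost[c, c2] + future[m, c2]
        q = -switch_cost[:, None, :] + future[None, :, :]
        new_value = q.max(axis=2)
        if np.max(np.abs(new_value - value)) < tol:
            return new_value, q.argmax(axis=2)
        value = new_value

# Hypothetical data: two configurations, two missions, two objectives.
switch_cost = np.array([[0.0, 3.0], [3.0, 0.0]])     # cost transition matrix
mission_chain = np.array([[0.9, 0.1], [0.2, 0.8]])   # mission demand Markov chain
perf = [np.array([[5.0, 1.0], [1.0, 5.0]]),          # objective 1: capability
        np.array([[-1.0, -1.0], [-2.0, -2.0]])]      # objective 2: negative operating cost
weights = (1.0, 0.5)
values, policy = plan_flexible_design(switch_cost, mission_chain, perf, weights)
print(policy)  # policy[c, m] is the configuration to operate next
```

A weighted sum yields one policy per weight vector. Sweeping the weights is one simple way to trace several trade-off policies, although it cannot reach every Pareto-efficient policy in general.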

11

Riauke, Jelena. "SPEA2-based safety system multi-objective optimization." Thesis, Loughborough University, 2009. https://dspace.lboro.ac.uk/2134/5514.

Abstract:

Safety systems are designed to prevent the occurrence of certain conditions and their future development into a hazardous situation. The consequences of the failure of a safety system of a potentially hazardous industrial system or process vary from minor inconvenience and cost to personal injury, significant economic loss and death. To minimise the likelihood of a hazardous situation, safety systems must be designed to maximise their availability. Therefore, the purpose of this thesis is to propose an effective safety system design optimization scheme. A multi-objective genetic algorithm has been adopted, where the criteria catered for include unavailability, cost, spurious trip and maintenance down time. Analyses of individual system designs are carried out using the latest advances in the fault tree analysis technique and the binary decision diagram (BDD) approach. The improved strength Pareto evolutionary approach (SPEA2) is chosen to perform the system optimization, resulting in the final design specifications. The practicality of the developed approach is demonstrated initially through application to a High Integrity Protection System (HIPS) and subsequently, to test scalability, through application to the more complex Firewater Deluge System (FDS). Computer code has been developed to carry out the analysis. The results for both systems are compared to those obtained using a single-objective optimization approach (GASSOP) and exhaustive search. The overall conclusions show a number of benefits of applying the SPEA2-based technique to safety system design optimization. It is common for safety systems to feature dependency relationships between their components. To enable the use of the fault tree analysis technique and the BDD approach for such systems, the Markov method is incorporated into the optimization process. The main types of dependency which can exist between the safety system component failures are identified. Markov model generation algorithms are suggested for each type of dependency. The modified optimization tool is tested on the HIPS and FDS. A comparison of the results shows the benefit of using the modified technique for safety system optimization. Finally, the effectiveness of the technique and its application to general safety systems are discussed.
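
SPEA2 compares candidate designs by Pareto dominance. As a minimal illustration, the Python sketch below implements only the dominance test and a non-dominated filter for objectives that are all minimized. It is not SPEA2 itself (no strength-based fitness, archive truncation, or genetic operators), and the four candidate designs are hypothetical.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(designs):
    """Keep the designs that no other design dominates (all objectives minimized)."""
    return [d for d in designs if not any(dominates(other, d) for other in designs)]

# Hypothetical designs: (unavailability, cost, spurious trips per year, maintenance downtime in hours)
designs = [
    (0.010, 800, 0.5, 40),
    (0.020, 600, 0.4, 30),
    (0.020, 900, 0.6, 45),   # worse than the first design in every objective
    (0.005, 1000, 0.7, 50),
]
front = pareto_front(designs)
print(front)
```

Here the third design is dropped because the first one is at least as good in all four criteria. The remaining three each win on at least one criterion, so none dominates another.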

12

Reynolds, Toby J. "Bayesian modelling of integrated data and its application to seabird populations." Thesis, University of St Andrews, 2010. http://hdl.handle.net/10023/1635.

Abstract:

Integrated data analyses are becoming increasingly popular in studies of wild animal populations where two or more separate sources of data contain information about common parameters. Here we develop an integrated population model using abundance and demographic data from a study of common guillemots (Uria aalge) on the Isle of May, southeast Scotland. A state-space model for the count data is supplemented by three demographic time series (productivity and two mark-recapture-recovery (MRR)), enabling the estimation of prebreeder emigration rate - a parameter for which there is no direct observational data, and which is unidentifiable in the separate analysis of MRR data. A Bayesian approach using MCMC provides a flexible and powerful analysis framework. This model is extended to provide predictions of future population trajectories. Adopting random effects models for the survival and productivity parameters, we implement the MCMC algorithm to obtain a posterior sample of the underlying process means and variances (and population sizes) within the study period. Given this sample, we predict future demographic parameters, which in turn allows us to predict future population sizes and obtain the corresponding posterior distribution. Under the assumption that recent, unfavourable conditions persist in the future, we obtain a posterior probability of 70% that there is a population decline of >25% over a 10-year period. Lastly, using MRR data we test for spatial, temporal and age-related correlations in guillemot survival among three widely separated Scottish colonies that have varying overlap in nonbreeding distribution. We show that survival is highly correlated over time for colonies/age classes sharing wintering areas, and essentially uncorrelated for those with separate wintering areas. 
These results strongly suggest that one or more aspects of winter environment are responsible for spatiotemporal variation in survival of British guillemots, and provide insight into the factors driving multi-population dynamics of the species.

13

Astaraky, Davood. "A Simulation Based Approximate Dynamic Programming Approach to Multi-class, Multi-resource Surgical Scheduling." Thèse, Université d'Ottawa / University of Ottawa, 2013. http://hdl.handle.net/10393/23622.

Abstract:

The thesis focuses on a model that addresses the patient scheduling step of the surgical scheduling process to determine the number of surgeries to perform on a given day. Specifically, given a master schedule that provides a cyclic breakdown of total OR availability into specific daily allocations to each surgical specialty, we look to provide a scheduling policy for all surgeries that minimizes a combination of the lead time between patient request and surgery date, overtime in the ORs and congestion in the wards. We cast the problem of generating optimal control strategies into the framework of a Markov Decision Process (MDP). The Approximate Dynamic Programming (ADP) approach is employed to solve the model, which would otherwise be intractable due to the size of the state space. We assess the performance and quality of the resulting policy through simulation, and we provide policy insights and conclusions.

14

Pinheiro, Paulo Gurgel 1983. "Localização multirrobo cooperativa com planejamento." [s.n.], 2009. http://repositorio.unicamp.br/jspui/handle/REPOSIP/276155.

Abstract:

Advisor: Jacques Wainer
Dissertation (master's) - Universidade Estadual de Campinas, Instituto de Computação
Resumo: Em um problema de localização multirrobô cooperativa, um grupo de robôs encontra-se em um determinado ambiente, cuja localização exata de cada um dos robôs é desconhecida. Neste cenário, uma distribuição de probabilidades aponta as chances de um robô estar em um determinado estado. É necessário então, que os robôs se movimentem pelo ambiente e gerem novas observações que serão compartilhadas, para calcular novas estimativas. Nos últimos anos, muitos trabalhos têm focado no estudo de técnicas probabilísticas, modelos de comunicação e modelos de detecções, para resolver o problema de localização. No entanto, a movimentação dos robôs é, em geral, definida por ações aleatórias. Ações aleatórias geram observações que podem ser inúteis para a melhoria da estimativa. Este trabalho apresenta uma proposta de localização com suporte a planejamento de ações. O objetivo é apresentar um modelo cujas ações realizadas pelos robôs são definidas por políticas. Escolhendo a melhor ação a ser realizada, é possível receber informações mais úteis dos sensores internos e externos e estimar as posturas mais rapidamente. O modelo proposto, denominado Modelo de Localização Planejada - MLP, utiliza POMDPs para modelar os problemas de localização e algoritmos específicos de geração de políticas. Foi utilizada a localização de Markov como técnica probabilística de localização e implementadas versões de modelos de detecção e propagação de informação. Neste trabalho, um simulador de problemas de localização multirrobô foi desenvolvido, no qual foram realizados experimentos em que o modelo proposto foi comparado a um modelo que não faz uso de planejamento de ações. Os resultados obtidos apontam que o modelo proposto é capaz de estimar as posturas dos robôs com uma menor quantidade de passos, sendo significativamente mais eficiente do que o modelo comparado sem planejamento.
Abstract: In a cooperative multi-robot localization problem, a group of robots is in a certain environment in which the exact location of each robot is unknown. In this scenario, there is only a probability distribution indicating the chance that a robot is in a particular state. The robots must therefore move through the environment and generate new observations, which are shared to calculate new estimates. In recent years, many studies have focused on probabilistic techniques, communication models, and detection models for solving the localization problem. However, the movement of the robots is generally defined by random actions, and random actions generate observations that can be useless for improving the estimate. This work describes a proposal for multi-robot localization with support for action planning. The objective is to describe a model in which the actions performed by the robots are defined by policies. By choosing the best action to perform, the robot gets more useful information from its internal and external sensors and estimates its pose more quickly. The proposed model, called the Model of Planned Localization (MPL), uses POMDPs to model localization problems, together with specific policy-generation algorithms. Markov localization was used as the probabilistic localization technique, and versions of the detection and information propagation models were implemented. In this work, a simulator for multi-robot localization problems was developed, in which experiments were performed comparing the proposed model with a model that does not use action planning. The results showed that the proposed model is able to estimate the poses of the robots in fewer steps, being more efficient than the compared model.
Master's
Artificial Intelligence
Master of Computer Science
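
Markov localization, the probabilistic technique named in this abstract, maintains a probability distribution (belief) over the robot's possible positions and updates it after each motion and each observation. The Python sketch below shows that update for a single robot on a small cyclic grid. It leaves out the multi-robot information sharing and the POMDP-based action planning that the dissertation adds, and the door map, the sensor probabilities and the motion noise are hypothetical.

```python
import numpy as np

def predict(belief, step, p_move=0.8):
    """Motion update on a cyclic grid: the robot moves `step` cells with
    probability p_move and otherwise stays where it is (assumed noise model)."""
    return p_move * np.roll(belief, step) + (1.0 - p_move) * belief

def correct(belief, likelihood):
    """Measurement update: multiply by the observation likelihood and renormalize."""
    posterior = belief * likelihood
    return posterior / posterior.sum()

# Hypothetical corridor of 5 cells with doors at cells 0 and 3.
doors = np.array([1, 0, 0, 1, 0])
p_hit, p_miss = 0.9, 0.1          # assumed sensor model
belief = np.full(5, 0.2)          # uniform prior: the position is unknown

belief = correct(belief, np.where(doors == 1, p_hit, p_miss))  # the robot senses a door
belief = predict(belief, 1)                                    # it moves one cell to the right
belief = correct(belief, np.where(doors == 0, p_hit, p_miss))  # it senses no door
print(belief.round(3))
```

After these three updates the belief concentrates on the two cells immediately to the right of a door. Which action to take next is the question that planning answers: an action whose likely observation would split the remaining hypotheses is more useful than a random one.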

15

Ozcan-Deniz, Gulbin. "An Integrated Multi-Agent Framework for Optimizing Time, Cost and Environmental Impact of Construction Processes." FIU Digital Commons, 2011. http://digitalcommons.fiu.edu/etd/455.

Abstract:

Environmentally conscious construction has received a significant amount of research attention during the last decades. Even though construction literature is rich in studies that emphasize the importance of environmental impact during the construction phase, most of the previous studies failed to combine environmental analysis with other project performance criteria in construction. This is mainly because most of the studies have overlooked the multi-objective nature of construction projects. In order to achieve environmentally conscious construction, the multiple objectives and their relationships need to be successfully analyzed in the complex construction environment. The complex construction system is composed of changing project conditions that have an impact on the relationship between time, cost and environmental impact (TCEI) of construction operations. Yet, this impact is still unknown by construction professionals. Studying this impact is vital to fulfill multiple project objectives and achieve environmentally conscious construction. This research proposes an analytical framework to analyze the impact of changing project conditions on the relationship of TCEI. This study includes greenhouse gas (GHG) emissions as an environmental impact category. The methodology utilizes multi-agent systems, multi-objective optimization, analytical network process, and system dynamics tools to study the relationships of TCEI and support decision-making under the influence of project conditions. Life cycle assessment (LCA) is applied to the evaluation of environmental impact in terms of GHG. The mixed method approach allowed for the collection and analysis of qualitative and quantitative data. Structured interviews of professionals in the highway construction field were conducted to gain their perspectives in decision-making under the influence of certain project conditions, while the quantitative data were collected from the Florida Department of Transportation (FDOT) for highway resurfacing projects. The data collected were used to test the framework. The framework yielded statistically significant results in simulating project conditions and optimizing TCEI. The results showed that the change in project conditions had a significant impact on the TCEI optimal solutions. The correlation between TCEI suggested that they affected each other positively, but with different strengths. The findings of the study will help contractors visualize the impact of their decisions on the relationship of TCEI.

16

Paniah, Crédo. "Approche multi-agents pour la gestion des fermes éoliennes offshore." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112067/document.

Abstract:

La raréfaction des sources de production conventionnelles et leurs émissions nocives ont favorisé l’essor notable de la production renouvelable, plus durable et mieux répartie géographiquement. Toutefois, son intégration au système électrique est problématique. En effet, la production renouvelable est peu prédictible et issue de sources majoritairement incontrôlables, ce qui compromet la stabilité du réseau, la viabilité économique des producteurs et rend nécessaire la définition de solutions adaptées pour leur participation au marché de l’électricité. Dans ce contexte, le projet scientifique Winpower propose de relier par un réseau à courant continu les ressources de plusieurs acteurs possédant respectivement des fermes éoliennes offshore (acteurs EnR) et des centrales de stockage de masse (acteurs CSM). Cette configuration impose aux acteurs d’assurer conjointement la gestion du réseau électrique. Nous supposons que les acteurs participent au marché comme une entité unique : cette hypothèse permet aux acteurs EnR de tirer profit de la flexibilité des ressources contrôlables pour minimiser le risque de pénalités sur le marché de l’électricité, aux acteurs CSM de valoriser leurs ressources auprès des acteurs EnR et/ou auprès du marché et à la coalition de faciliter la gestion des déséquilibres sur le réseau électrique, en agrégeant les ressources disponibles. Dans ce cadre, notre travail s’attaque à la problématique de la participation au marché EPEX SPOT Day-Ahead de la coalition comme une centrale électrique virtuelle ou CVPP (Cooperative Virtual Power Plant).
Nous proposons une architecture de pilotage multi-acteurs basée sur les systèmes multi-agents (SMA) : elle permet d’allier les objectifs et contraintes locaux des acteurs et les objectifs globaux de la coalition. Nous formalisons alors l’agrégation et la planification de l’utilisation des ressources comme un processus décisionnel de Markov (MDP), un modèle formel adapté à la décision séquentielle en environnement incertain, pour déterminer la séquence d’actions sur les ressources contrôlables qui maximise l’espérance des revenus effectifs de la coalition. Toutefois, au moment de la planification des ressources de la coalition, l’état de la production renouvelable n’est pas connu et le MDP n’est pas résoluble en l’état : on parle de MDP partiellement observable (POMDP). Nous décomposons le POMDP en un MDP classique et un état d’information (la distribution de probabilités des erreurs de prévision de la production renouvelable) ; en extrayant cet état d’information de l’expression du POMDP, nous obtenons un MDP à état d’information (IS-MDP), pour la résolution duquel nous proposons une adaptation d’un algorithme de résolution classique des MDP, le Backwards Induction. Nous décrivons alors un cadre de simulation commun pour comparer dans les mêmes conditions nos propositions et quelques autres stratégies de participation au marché dont l’état de l’art dans la gestion des ressources renouvelables et contrôlables. Les résultats obtenus confortent l’hypothèse de la minimisation du risque associé à la production renouvelable, grâce à l’agrégation des ressources et confirment l’intérêt de la coopération des acteurs EnR et CSM dans leur participation au marché de l’électricité. Enfin, l’architecture proposée offre la possibilité de distribuer le processus de décision optimale entre les différents acteurs de la coalition : nous proposons quelques pistes de solution dans cette direction.
Renewable Energy Sources (RES) has grown remarkably in last few decades. Compared to conventional energy sources, renewable generation is more available, sustainable and environment-friendly - for example, there is no greenhouse gases emission during the energy generation. However, while electrical network stability requires production and consumption equality and the electricity market constrains producers to contract future production a priori and respect their furniture commitments or pay substantial penalties, RES are mainly uncontrollable and their behavior is difficult to forecast accurately. De facto, they jeopardize the stability of the physical network and renewable producers competitiveness in the market. The Winpower project aims to design realistic, robust and stable control strategies for offshore networks connecting to the main electricity system renewable sources and controllable storage devices owned by different autonomous actors. Each actor must embed its own local physical device control strategy but a global network management mechanism, jointly decided between connected actors, should be designed as well.We assume a market participation of the actors as an unique entity (the coalition of actors connected by the Winpower network) allowing the coalition to facilitate the network management through resources aggregation, renewable producers to take advantage of controllable sources flexibility to handle market penalties risks, as well as storage devices owners to leverage their resources on the market and/or with the management of renewable imbalances. This work tackles the market participation of the coalition as a Cooperative Virtual Power Plant. 
For this purpose, we describe a multi-agent architecture trough the definition of intelligent agents managing and operating actors resources and the description of these agents interactions; it allows the alliance of local constraints and objectives and the global network management objective.We formalize the aggregation and planning of resources utilization as a Markov Decision Process (MDP), a formal model suited for sequential decision making in uncertain environments. Its aim is to define the sequence of actions which maximize expected actual incomes of the market participation, while decisions over controllable resources have uncertain outcomes. However, market participation decision is prior to the actual operation when renewable generation still is uncertain. Thus, the Markov Decision Process is intractable as its state in each decision time-slot is not fully observable. To solve such a Partially Observable MDP (POMDP), we decompose it into a classical MDP and an information state (a probability distribution over renewable generation errors). The Information State MDP (IS-MDP) obtained is solved with an adaptation of the Backwards Induction, a classical MDP resolution algorithm.Then, we describe a common simulation framework to compare our proposed methodology to some other strategies, including the state of the art in renewable generation market participation. Simulations results validate the resources aggregation strategy and confirm that cooperation is beneficial to renewable producers and storage devices owners when they participate in electricity market. The proposed architecture is designed to allow the distribution of the decision making between the coalition’s actors, through the implementation of a suitable coordination mechanism. We propose some distribution methodologies, to this end
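
The abstract names backward induction as the classical MDP algorithm that the thesis adapts. As a rough illustration only, the sketch below runs plain finite-horizon backward induction on a made-up two-state storage toy. The state names, actions, rewards and probabilities are all assumptions chosen for the example; this is not the thesis's IS-MDP or its adapted algorithm.

```python
def backward_induction(states, actions, transition, reward, horizon):
    """Plain finite-horizon backward induction.

    transition[s][a] is a list of (next_state, probability) pairs and
    reward[s][a] is the immediate reward. Returns (values, policy) where
    values[t][s] is the optimal expected return from step t onward and
    policy[t][s] is a maximizing action at step t.
    """
    # Terminal values are zero; fill the table from the last step backward.
    values = [{s: 0.0 for s in states} for _ in range(horizon + 1)]
    policy = [{} for _ in range(horizon)]
    for t in range(horizon - 1, -1, -1):
        for s in states:
            best_action, best_q = None, float("-inf")
            for a in actions:
                expected_next = sum(p * values[t + 1][s2]
                                    for s2, p in transition[s][a])
                q = reward[s][a] + expected_next
                if q > best_q:
                    best_action, best_q = a, q
            values[t][s] = best_q
            policy[t][s] = best_action
    return values, policy


# Hypothetical toy: a storage unit that is either empty or full.
states = ["empty", "full"]
actions = ["charge", "sell"]
transition = {
    "empty": {"charge": [("full", 0.5), ("empty", 0.5)],  # charging may fail
              "sell": [("empty", 1.0)]},
    "full": {"charge": [("full", 1.0)],
             "sell": [("empty", 1.0)]},
}
reward = {
    "empty": {"charge": -1.0, "sell": 0.0},
    "full": {"charge": 0.0, "sell": 3.0},
}

values, policy = backward_induction(states, actions, transition, reward, horizon=2)
print(policy[0]["empty"], values[0]["empty"])
```

In this toy, charging an empty unit at the first step costs 1 but gives a 50% chance of selling for 3 at the next step, so the expected return of charging (0.5) beats that of doing nothing (0).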

17

El, Helou Melhem. "Radio Access Technology Selection in Heterogeneous Wireless Networks." Thesis, Rennes 1, 2014. http://www.theses.fr/2014REN1S086/document.

Abstract:

To cope with the rapid growth of mobile broadband traffic, various radio access technologies (e.g., HSPA, LTE, WiFi, and WiMAX) are being integrated and jointly managed. Radio Access Technology (RAT) selection, which decides the RAT to which each mobile should connect, is a key function for improving network performance and user experience. When intelligence is pushed to the network edge, mobiles decide autonomously which RAT suits them best and aim to selfishly maximize their utility. However, because mobiles have no information on network load conditions, their decisions may lead to inefficient performance. Conversely, delegating decisions to the network optimizes overall performance, but at the cost of increased network complexity, signaling, and processing load. In this thesis, instead of favoring either of these decision-making approaches, we propose a hybrid decision framework: the network provides information that helps the mobiles make robust RAT selections. More precisely, mobile users select their RAT depending on their individual needs and preferences, as well as on the monetary cost and QoS parameters signaled by the network. By appropriately tuning the network information, user decisions are expected to meet operator objectives globally and to avoid undesirable network states.

We first introduce our hybrid decision framework and investigate decision making on both the network and user sides. To maximize user experience, we present a satisfaction-based Multi-Criteria Decision-Making (MCDM) method. In addition to their radio conditions, mobile users consider the cost and QoS parameters signaled by the network when evaluating the serving RATs. In comparison with existing MCDM solutions, our algorithm meets user needs (e.g., traffic class, throughput demand, cost tolerance) and avoids inadequate decisions.

Particular attention is then paid to the network, to make sure it broadcasts suitable decisional information so as to better exploit its radio resources while mobiles maximize their own utility. We present two heuristic methods to dynamically derive what to signal to mobiles. Because QoS parameters are modulated as a function of the load conditions, radio resources are shown to be efficiently exploited. We also focus on optimizing the network information: deriving QoS parameters is formulated as a semi-Markov decision process, and optimal policies are computed using the Policy Iteration algorithm. In addition, since network parameters may not be easily obtained, a reinforcement learning approach is introduced to derive what to signal to mobiles. The performance of the optimal, learning-based, and heuristic policies is analyzed. When thresholds are set appropriately, our heuristic method performs very close to the optimal solution. Moreover, although it performs less well, our learning-based algorithm has the crucial advantage of requiring no prior parameterization.

18

Teng, Sin Yong. "Intelligent Energy-Savings and Process Improvement Strategies in Energy-Intensive Industries." Doctoral thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2020. http://www.nusl.cz/ntk/nusl-433427.

Abstract:

As new technologies for energy-intensive industries continue to be developed, existing plants gradually fall behind in efficiency and productivity. Fierce market competition and environmental legislation force these traditional plants to cease operation and shut down. Process improvement and retrofit projects are essential for maintaining the operational performance of these plants. Current approaches to process improvement are mainly process integration, process optimisation and process intensification. In general, these areas rely on mathematical optimisation, the practitioner's experience and operational heuristics. These approaches serve as the basis for process improvement. However, their performance can be further improved with modern computational intelligence. The purpose of this work is therefore to apply advanced artificial intelligence and machine learning techniques to process improvement in energy-intensive industrial processes. This work adopts an approach that tackles the problem through simulation of industrial systems and makes the following contributions: (i) Application of a machine learning technique that includes one-shot learning and neuro-evolution for data-driven modelling and optimisation of individual units. (ii) Application of dimensionality reduction (e.g. principal component analysis, autoencoder) for multi-objective optimisation of a multi-unit process. (iii) Design of a new tool for analysing the problematic parts of a system in order to remove them (bottleneck tree analysis, BOTA). An extension of the tool was also proposed that allows multi-dimensional problems to be solved with a data-driven approach. (iv) Demonstration of the effectiveness of Monte Carlo simulation, neural networks and decision trees for decision-making when integrating a new process technology into existing processes. (v) Comparison of Hierarchical Temporal Memory (HTM) and dual optimisation with several predictive tools for supporting real-time operations management. (vi) Implementation of an artificial neural network within the interface for the conventional process graph (P-graph). (vii) Highlighting the future of artificial intelligence and process engineering in biosystems through a commercially based multi-omics paradigm.

19

Marre, Jean-Baptiste. "L'évaluation économique des services écosystémiques marins et côtiers et son utilisation dans la prise de décision : cas d'étude en Nouvelle-Calédonie et en Australie." Thesis, Brest, 2014. http://www.theses.fr/2014BRES0087/document.

Abstract:

Coastal and marine ecosystems are some of the most heavily exploited and are increasingly degraded. This alarming situation calls for urgent and effective action. The optimal balance between use and conservation of ecosystems theoretically requires all costs and benefits to be considered in decision-making, including intangible costs and benefits such as non-market use and non-use values. The broad aim of this PhD is to examine how these economic values associated with coastal and marine ecosystem services can be measured, and how the economic valuation exercise may be considered in, and influence, management decision-making.

The first analytical part of the thesis focuses on assessing non-market use and non-use values through econometric methods. The characterization and estimation of non-use values are complex and controversial, especially when the valuation exercise focuses on individuals who are users of the ecosystem services being considered. An original approach based on a stated preference method, namely choice experiments, is developed and then empirically applied to quantify non-market values for marine and coastal ecosystems in two areas in New Caledonia. It allows non-use values to be estimated implicitly for populations of users. An in-depth analysis of the individuals' choice heuristics during the valuation exercise is also conducted, with a focus on payment non-attendance. This issue is dealt with by comparing multiple modelling approaches in terms of: (1) inferred attendance, in relation to stated attendance; (2) attendance distribution according to several socio-economic variables; and (3) welfare estimates.

After noting that the potential influence of economic valuation in decision-making is unclear and largely unexplored in the literature, the second major component of this PhD examines if, how and to what extent the economic valuation of ecosystem services, including measures of non-market values, influences decision-making regarding coastal and marine ecosystem management in Australia. Based on two nation-wide surveys, the perceived usefulness of the economic valuation of ecosystem services by the general public and decision-makers is studied, and the reasons why decision-makers may or may not fully consider economic values are elicited. Using a multi-criteria analysis, part of the surveys also examines the relative importance of different evaluation criteria (ecological, social and economic) when assessing the consequences of a hypothetical coastal development project on commercial activities, recreational activities and marine biodiversity.

20

Saha, Subhamay. "Single and Multi-player Stochastic Dynamic Optimization." Thesis, 2013. http://etd.iisc.ernet.in/2005/3357.

Abstract:

In this thesis we investigate single and multi-player stochastic dynamic optimization problems. We consider both discrete and continuous time processes. In the multi-player setup we investigate zero-sum games with both complete and partial information. We study partially observable stochastic games with the average cost criterion, where the state process is a discrete time controlled Markov chain. The idea involved in studying this problem is to replace the original unobservable state variable with a suitable completely observable state variable. We establish the existence of the value of the game and also obtain optimal strategies for both players. We also study a continuous time zero-sum stochastic game with complete observation. In this case the state is a pure jump Markov process. We investigate the finite horizon total cost criterion. We characterise the value function via appropriate Isaacs equations. This also yields optimal Markov strategies for both players.

In the single player setup we investigate risk-sensitive control of continuous time Markov chains. We consider both finite and infinite horizon problems. For the finite horizon total cost problem and the infinite horizon discounted cost problem we characterise the value function as the unique solution of appropriate Hamilton-Jacobi-Bellman equations. We also derive optimal Markov controls in both cases. For the infinite horizon average cost case we show the existence of an optimal stationary control. We also give a value iteration scheme for computing the optimal control in the case of finite state and action spaces.

Further, we introduce a new class of stochastic processes which we call stochastic processes with "age-dependent transition rates". We give a rigorous construction of the process. We prove that under certain assumptions the process is Feller. We also compute the limiting probabilities for our process. We then study the controlled version of the above process. In this case we take the risk-neutral cost criterion. We solve the infinite horizon discounted cost problem and the average cost problem for this process. The crucial step in analysing these problems is to prove that the original control problem is equivalent to an appropriate semi-Markov decision problem. The value functions and optimal controls are then characterised using this equivalence and the theory of semi-Markov decision processes (SMDP). The analysis of finite horizon problems differs from that of infinite horizon problems because in this case the idea of converting into an equivalent SMDP does not seem to work. So we deal with the finite horizon total cost problem by showing that our problem is equivalent to another appropriately defined discrete time Markov decision problem. This allows us to characterise the value function and to find an optimal Markov control.

21

Wang, Jue. "Multi-state Bayesian Process Control." Thesis, 2013. http://hdl.handle.net/1807/43750.

Abstract:

Bayesian process control is a statistical process control (SPC) scheme that uses the posterior state probabilities as the control statistic. The key issue is to decide when to restore the process based on real-time observations. Such problems have been extensively studied in the framework of partially observable Markov decision processes (POMDP), with particular emphasis on the structure of optimal control policy. Almost all existing structural results on the optimal policies are limited to the two-state processes, where the class of control-limit policy is optimal. However, the two-state model is a gross simplification, as real production processes almost always involve multiple states. For example, a machine in the production system often has multiple failure modes differing in their effects; the deterioration process can often be divided into multiple stages with different degradation levels; the condition of a complex multi-unit system also requires a multi-state representation. We investigate the optimal control policies for multi-state processes with fixed sampling scheme, in which information about the process is represented by a belief vector within a high dimensional probability simplex. It is well known that obtaining structural results for such high-dimensional POMDP is challenging. Firstly, we prove that for an infinite-horizon process subject to multiple competing assignable causes, a so-called conditional control limit policy is optimal. The optimal policy divides the belief space into two individually connected regions, which have analytical bounds. Next, we address a finite-horizon process with at least one absorbing state and show that a structured optimal policy can be established by transforming the belief space into a polar coordinate system, where a so-called polar control limit policy is optimal. Our model is general enough to include many existing models in the literature as special cases. 
The structural results also lead to significantly more efficient algorithms for computing the optimal policies. In addition, we characterize the condition under which some out-of-control state is more desirable than the in-control state. The existence of such a counterintuitive situation indicates that multi-state process control is drastically different from the two-state case.
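
The abstract describes a control statistic made of posterior state probabilities, held as a belief vector on a probability simplex. As a rough illustration, the sketch below performs one Bayes update of such a belief for a hypothetical three-state process. The transition matrix, the observation likelihoods and the simple threshold rule at the end are all assumptions made for the example; in particular, the threshold rule is not the conditional or polar control limit policy studied in the thesis.

```python
def update_belief(belief, transition, likelihood):
    """One step of a discrete Bayes filter.

    belief[i]        : prior probability of being in state i
    transition[i][j] : probability of moving from state i to state j
    likelihood[j]    : probability of the new observation given state j
    Returns the posterior belief, which again sums to 1.
    """
    n = len(belief)
    # Predict: push the prior through the state transition model.
    predicted = [sum(belief[i] * transition[i][j] for i in range(n))
                 for j in range(n)]
    # Correct: weight by the observation likelihood and renormalize.
    unnormalized = [predicted[j] * likelihood[j] for j in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalized]


# Hypothetical model: state 0 is in control, states 1 and 2 are two
# out-of-control modes that persist until the process is restored.
transition = [
    [0.90, 0.05, 0.05],
    [0.00, 1.00, 0.00],
    [0.00, 0.00, 1.00],
]
# Assumed likelihood of the latest sample under each state.
likelihood = [0.2, 0.5, 0.8]

belief = update_belief([1.0, 0.0, 0.0], transition, likelihood)

# Illustrative rule only: restore when P(in control) drops below a limit.
CONTROL_LIMIT = 0.5
restore = belief[0] < CONTROL_LIMIT
print([round(b, 4) for b in belief], restore)
```

With more than two states the belief is a point in a simplex rather than a single number, which is why a single scalar threshold like the one above is generally not enough to describe an optimal policy.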

22

"TaxiWorld: Developing and Evaluating Solution Methods for Multi-Agent Planning Domains." Master's thesis, 2011. http://hdl.handle.net/2286/R.I.9358.

Abstract:

TaxiWorld is a Matlab simulation of a city with a fleet of taxis which operate within it, with the goal of transporting passengers to their destinations. The size of the city, as well as the number of available taxis and the frequency and general locations of fare appearances can all be set on a scenario-by-scenario basis. The taxis must attempt to service the fares as quickly as possible, by picking each one up and carrying it to its drop-off location. The TaxiWorld scenario is formally modeled using both Decentralized Partially-Observable Markov Decision Processes (Dec-POMDPs) and Multi-agent Markov Decision Processes (MMDPs). The purpose of developing formal models is to learn how to build and use formal Markov models, such as can be given to planners to solve for optimal policies in problem domains. However, finding optimal solutions for Dec-POMDPs is NEXP-complete, so an empirical algorithm was also developed as an improvement to the method already in use on the simulator, and the methods were compared in identical scenarios to determine which is more effective. The empirical method is of course not optimal; rather, it attempts to simply account for some of the most important factors to achieve an acceptable level of effectiveness while still retaining a reasonable level of computational complexity for online solving.
Dissertation/Thesis
M.S. Computer Science 2011

23

Royden-Turner, Stuart Jack. "Asset allocation in wealth management using stochastic models." Diss., 2016. http://hdl.handle.net/10500/22129.

Abstract:

Modern financial asset pricing theory is a broad and, at times, complex field. The literature review in this study covers many of the asset pricing techniques, including factor models, random walk models, correlation models, Bayesian methods, autoregressive models, moment-matching models, stochastic jumps and mean reversion models. An important topic in finance is portfolio optimisation with respect to risk and reward, such as the mean variance optimisation introduced by Markowitz (1952). This study covers optimisation techniques such as single period mean variance optimisation, optimisation with risk aversion, multi-period stochastic programs, two-fund separation theory, downside optimisation techniques and multi-period optimisation such as the Bellman dynamic programming model. The question asked in this study is, in the context of investing for South African individuals in a multi-asset portfolio, whether an active investment strategy is significantly different from a passive investment strategy. The passive strategy is built using stochastic programming with moment matching methods for non-Gaussian asset class distributions. The strategy is optimised in a framework using a downside risk metric, the conditional variance at risk. The active strategy is built with forward forecasts for asset classes using the time-varying transitional-probability Markov regime switching model. The active portfolio is finalised by a dynamic optimisation using a two-stage stochastic programme with recourse, which is solved as a large linear program. A hypothesis test is used to establish whether the results of the two strategies are statistically different. The performance of the strategies is also reviewed relative to multi-asset peer rankings. Lastly, we consider whether the findings reveal information on the degree of efficiency in the marketplace for multi-asset investments for the South African investor.
Operations Management
M. Sc. (Operations Research)

24

Abed-Alguni, Bilal Hashem Kalil. "Cooperative reinforcement learning for independent learners." Thesis, 2014. http://hdl.handle.net/1959.13/1052917.

Abstract:

Research Doctorate - Doctor of Philosophy (PhD)
Machine learning in multi-agent domains poses several research challenges. One challenge is how to model cooperation between reinforcement learners. Cooperation between independent reinforcement learners is known to accelerate convergence to optimal solutions. In large state space problems, independent reinforcement learners normally cooperate to accelerate the learning process using decomposition techniques or knowledge sharing strategies. This thesis presents two techniques to multi-agent reinforcement learning and a comparison study. The first technique is a formal decomposition model and an algorithm for distributed systems. The second technique is a cooperative Q-learning algorithm for multi-goal decomposable systems. The comparison study compares the performance of some of the best known cooperative Q-learning algorithms for independent learners. Distributed systems are normally organised into two levels: system and subsystem levels. This thesis presents a formal solution for decomposition of Markov Decision Processes (MDPs) in distributed systems that takes advantage of the organisation of distributed systems and provides support for migration of learners. This is accomplished by two proposals: a Distributed, Hierarchical Learning Model (DHLM) and an Intelligent Distributed Q-Learning algorithm (IDQL) that are based on three specialisations of agents: workers, tutors and consultants. Worker agents are the actual learners and performers of tasks, while tutor agents and consultant agents are coordinators at the subsystem level and the system level, respectively. A main duty of consultant and tutor agents is the assignment of problem space to worker agents. The experimental results in a distributed hunter prey problem suggest that IDQL converges to a solution faster than the single agent Q-learning approach. An important feature of DHLM is that it provides a solution for migration of agents. 
This feature provides support for the IDQL algorithm where the problem space of each worker agent can change dynamically. Other hierarchical RL models do not cover this issue. Problems that have multiple goal-states can be decomposed into sub-problems by taking advantage of the loosely-coupled bonds among the goal states. In such problems, each goal state and its problem space form a sub-problem. This thesis introduces Q-learning with Aggregation algorithm (QA-learning), an algorithm for problems with multiple goal-states that is based on two roles: learner and tutor. A learner is an agent that learns and uses the knowledge of its neighbours (tutors) to construct its Q-table. A tutor is a learner that is ready to share its Q-table with its neighbours (learners). These roles are based on the concept of learners reusing tutors' sub-solutions. This algorithm provides solutions to problems with multiple goal-states. In this algorithm, each learner incorporates its tutors' knowledge into its own Q-table calculations. A comprehensive solution can then be obtained by combining these partial solutions. The experimental results in an instance of the shortest path problem suggest that the output of QA-learning is comparable to the output of a single Q-learner whose problem space is the whole system. But the QA-learning algorithm converges to a solution faster than a single learner approach. Cooperative Q-learning algorithms for independent learners accelerate the learning process of individual learners. In this type of Q-learning, independent learners share and update their Q-values by following a sharing strategy after some episodes learning independently. This thesis presents a comparison study of the performance of some famous cooperative Q-learning algorithms (BEST-Q, AVE-Q, PSO-Q, and WSS) as well as an algorithm that aggregates their results. These algorithms are compared in two cases: equal experience and different experiences cases. 
In the first case, the learners have equal learning time, while in the second case, the learners have different learning times. The comparison study also examines the effects of the frequency of Q-value sharing on the learning speed of independent learners. The experimental results in the equal experience case indicate that sharing Q-values is not beneficial and produces results similar to single-agent Q-learning, whereas the results in the different experiences case suggest that the cooperative Q-learning algorithms perform similarly to one another but better than single-agent Q-learning. In both cases, high-frequency sharing of Q-values accelerates convergence to optimal solutions compared with low-frequency sharing, and low-frequency Q-value sharing degrades the performance of the cooperative Q-learning algorithms.
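
The abstract describes independent learners that run Q-learning on their own and periodically share Q-values under some sharing strategy. As a rough illustration, the sketch below pairs the standard tabular Q-learning update with a simple averaging step. The averaging rule, the function names and the toy states are assumptions made for the example; the sketch does not claim to reproduce BEST-Q, AVE-Q, PSO-Q, WSS or any algorithm proposed in the thesis.

```python
def q_update(q, state, action, reward, next_state, actions,
             alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update on a dict keyed by (state, action)."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)


def share_by_averaging(q_tables):
    """Illustrative sharing step: every learner adopts the element-wise
    average of all learners' Q-tables (missing entries count as 0)."""
    keys = set().union(*(q.keys() for q in q_tables))
    averaged = {k: sum(q.get(k, 0.0) for q in q_tables) / len(q_tables)
                for k in keys}
    for q in q_tables:
        q.clear()
        q.update(averaged)


# Two hypothetical independent learners; only the first has any experience.
actions = ["left", "right"]
learner_a, learner_b = {}, {}
q_update(learner_a, "s0", "right", reward=1.0, next_state="s1",
         actions=actions, alpha=0.5, gamma=0.9)

share_by_averaging([learner_a, learner_b])
print(learner_a[("s0", "right")], learner_b[("s0", "right")])
```

How often `share_by_averaging` is called relative to the number of independent episodes corresponds to the sharing frequency whose effect the comparison study examines.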

25

"A MULTI-FUNCTIONAL PROVENANCE ARCHITECTURE: CHALLENGES AND SOLUTIONS." Thesis, 2013. http://hdl.handle.net/10388/ETD-2013-12-1419.

Abstract:

In service-oriented environments, services are put together in the form of a workflow with the aim of distributed problem solving. Capturing the execution details of the services' transformations is a significant advantage of using workflows. These execution details, referred to as provenance information, are usually traced automatically and stored in provenance stores. Provenance data contains the data recorded by a workflow engine during a workflow execution. It identifies what data is passed between services, which services are involved, and how results are eventually generated for particular sets of input values. Provenance information is of great importance and has found its way through areas in computer science such as: Bioinformatics, database, social, sensor networks, etc. Current exploitation and application of provenance data is very limited as provenance systems started being developed for specific applications. Thus, applying learning and knowledge discovery methods to provenance data can provide rich and useful information on workflows and services. Therefore, in this work, the challenges with workflows and services are studied to discover the possibilities and benefits of providing solutions by using provenance data. A multifunctional architecture is presented which addresses the workflow and service issues by exploiting provenance data. These challenges include workflow composition, abstract workflow selection, refinement, evaluation, and graph model extraction. The specific contribution of the proposed architecture is its novelty in providing a basis for taking advantage of the previous execution details of services and workflows along with artificial intelligence and knowledge management techniques to resolve the major challenges regarding workflows. The presented architecture is application-independent and could be deployed in any area. The requirements for such an architecture along with its building components are discussed. 
Furthermore, the responsibility of the components, related works and the implementation details of the architecture along with each component are presented.

26

Karim, Mohammad Shahedul. "Instantly Decodable Network Coding: From Point to Multi-Point to Device-to-Device Communications." PhD thesis, 2016. http://hdl.handle.net/1885/118239.


Abstract:

The network coding paradigm enhances transmission efficiency by combining information flows and has drawn significant attention in information theory, networking, communications and data storage. Instantly decodable network coding (IDNC), a subclass of network coding, has demonstrated its ability to improve the quality of service of time-critical applications thanks to its attractive properties, namely throughput enhancement, delay reduction, simple XOR-based encoding and decoding, and small coefficient overhead. Nonetheless, for point to multi-point (PMP) networks, IDNC cannot guarantee the decoding of a specific new packet at individual devices in each transmission. Furthermore, for device-to-device (D2D) networks, the transmitting devices may possess only a subset of packets, which can be used to form coded packets. These challenges require the optimization of IDNC algorithms to be suitable for different application requirements and network configurations. In this thesis, we first study a scalable live video broadcast over a wireless PMP network, where the devices receive video packets from a base station. Such layered live video has a hard deadline and imposes a decoding order on the video layers. We design two prioritized IDNC algorithms that provide a high level of priority to the most important video layer before considering additional video layers in coding decisions. These prioritized algorithms are shown to increase the number of decoded video layers at the devices compared to the existing network coding schemes. We then study video distribution over a partially connected D2D network, where a group of devices cooperate with each other to recover their missing video content. We introduce a cooperation-aware IDNC graph that defines all feasible coding and transmission conflict-free decisions. Using this graph, we propose an IDNC solution that avoids coding and transmission conflicts, and meets the hard deadline for high importance video packets. 
It is demonstrated that the proposed solution delivers an improved video quality to the devices compared to the video- and cooperation-oblivious coding schemes. We also consider a heterogeneous network wherein devices use two wireless interfaces to receive packets from the base station and another device concurrently. For such a network, we are interested in applications with reliable in-order packet delivery requirements. We represent all feasible coding opportunities and conflict-free transmissions using a dual interface IDNC graph. We select a maximal independent set over the graph by considering dual interfaces of individual devices, in-order delivery requirements of packets and lossy channel conditions. This graph-based solution is shown to reduce the in-order delivery delay compared to the existing network coding schemes. Finally, we consider a D2D network with a group of devices experiencing heterogeneous channel capacities. For such cooperative scenarios, we address the problem of minimizing the completion time required for recovering all missing packets at the devices using IDNC and physical layer rate adaptation. Our proposed IDNC algorithm balances between the adopted transmission rate and the number of targeted devices that can successfully receive the transmitted packet. We show that the proposed rate-aware IDNC algorithm reduces the completion time compared to the rate-oblivious coding schemes.
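
The "simple XOR-based encoding and decoding" that the abstract attributes to IDNC can be illustrated with a short Python sketch. It assumes equal-length packets and the usual IDNC condition that a coded packet is instantly decodable at a device only when exactly one of the combined source packets is missing there. The helper names `encode`, `decodable_packet` and `decode` are hypothetical, and the sketch does not reproduce the prioritized or graph-based selection algorithms of the thesis.

```python
from functools import reduce
from typing import Dict, Optional, Set, Tuple


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(packets: Dict[int, bytes], combo: Set[int]) -> bytes:
    """XOR together the source packets whose indices are in `combo`."""
    return reduce(xor_bytes, (packets[i] for i in sorted(combo)))


def decodable_packet(combo: Set[int], has: Set[int]) -> Optional[int]:
    """Index the device can recover at once, or None if not instantly decodable."""
    missing = combo - has
    return next(iter(missing)) if len(missing) == 1 else None


def decode(coded: bytes, combo: Set[int],
           has_packets: Dict[int, bytes]) -> Optional[Tuple[int, bytes]]:
    """Recover the single missing packet by XOR-ing out the packets already held."""
    target = decodable_packet(combo, set(has_packets))
    if target is None:
        return None
    known = (has_packets[i] for i in combo if i != target)
    return target, reduce(xor_bytes, known, coded)
```

With packets 0 and 1 combined, a device that already holds packet 1 recovers packet 0, a device holding packet 0 recovers packet 1, and a device holding neither gets nothing from this transmission. That last case is why the choice of combination matters for each set of devices.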


27

Kumar, Sandip. "Generalized Sampling-Based Feedback Motion Planners." Thesis, 2011. http://hdl.handle.net/1969.1/ETD-TAMU-2011-12-10663.


Abstract:

The motion planning problem can be formulated as a Markov decision process (MDP) if the uncertainties in the robot motion and environments can be modeled probabilistically. The complexity of solving these MDPs grows exponentially as the dimension of the problem increases, and hence it is nearly impossible to solve the problem even without constraints. Using hierarchical methods, these MDPs can be transformed into a semi-Markov decision process (SMDP), which only needs to be solved at certain landmark states. In the deterministic robotics motion planning community, sampling-based algorithms like probabilistic roadmaps (PRM) and rapidly exploring random trees (RRTs) have been successful in solving very high-dimensional deterministic problems. However, they are not robust to systems with uncertainties in the system dynamics, and hence one of the primary objectives of this work is to generalize PRM/RRT to solve motion planning with uncertainty. We first present generalizations of the randomized sampling-based algorithms PRM and RRT that incorporate process uncertainty and obstacle location uncertainty, termed "generalized PRM" (GPRM) and "generalized RRT" (GRRT). The controllers used at the lower level of these planners are feedback controllers which ensure convergence of trajectories while mitigating the effects of process uncertainty. The results indicate that the algorithms solve the motion planning problem for a single agent in continuous state/control spaces in the presence of process uncertainty and of constraints such as obstacles and other state/input constraints. Secondly, a novel adaptive sampling technique, termed "adaptive GPRM" (AGPRM), is proposed for these generalized planners to increase their efficiency and overall success probability. It was implemented on high-dimensional robot n-link manipulators with up to 8 links, i.e. in a 16-dimensional state space. 
The results demonstrate the ability of the proposed algorithm to handle the motion planning problem for highly non-linear systems in very high-dimensional state spaces. Finally, a solution methodology, termed the "multi-agent AGPRM" (MAGPRM), is proposed to solve the multi-agent motion planning problem under uncertainty. The technique uses an existing solution technique for the multiple traveling salesman problem (MTSP) in conjunction with GPRM. For real-time implementation, an "inter-agent collision detection and avoidance" module was designed, which ensures that no two agents collide at any time step. The algorithm was tested on teams of homogeneous and heterogeneous agents in cluttered obstacle spaces, and it demonstrates the ability to handle such problems in continuous state/control spaces in the presence of process uncertainty.
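
For readers unfamiliar with the deterministic baseline that the abstract sets out to generalize, the sketch below shows a classical probabilistic roadmap (PRM) in the unit square: sample collision-free points, connect pairs closer than a radius when the straight segment is collision-free, then search the roadmap with Dijkstra's algorithm. This is the plain deterministic PRM, not the GPRM, AGPRM or MAGPRM planners of the thesis, and it models no process uncertainty or feedback controllers. The circular obstacle model, the function names and all parameter values are assumptions made for illustration.

```python
import heapq
import math
import random
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Circle = Tuple[float, float, float]  # (cx, cy, radius): assumed obstacle model


def segment_free(p: Point, q: Point, obstacles: List[Circle]) -> bool:
    """True if the straight segment p-q stays strictly outside every circle."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    length_sq = dx * dx + dy * dy
    for cx, cy, r in obstacles:
        # Parameter of the point on the segment closest to the circle centre.
        t = 0.0
        if length_sq > 0.0:
            t = max(0.0, min(1.0, ((cx - p[0]) * dx + (cy - p[1]) * dy) / length_sq))
        if math.hypot(p[0] + t * dx - cx, p[1] + t * dy - cy) <= r:
            return False
    return True


def prm_plan(start: Point, goal: Point, obstacles: List[Circle],
             n_samples: int = 200, radius: float = 0.25,
             seed: int = 0) -> Optional[List[Point]]:
    """Basic PRM in [0, 1]^2; returns a list of waypoints or None."""
    rng = random.Random(seed)
    nodes: List[Point] = [start, goal]  # index 0 is start, index 1 is goal
    # Assumes the obstacles leave free space, otherwise this loop never ends.
    while len(nodes) < n_samples + 2:
        p = (rng.random(), rng.random())
        if segment_free(p, p, obstacles):  # keep only collision-free samples
            nodes.append(p)

    # Build the roadmap: connect nearby nodes with collision-free segments.
    edges: Dict[int, List[Tuple[int, float]]] = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = math.dist(nodes[i], nodes[j])
            if d <= radius and segment_free(nodes[i], nodes[j], obstacles):
                edges[i].append((j, d))
                edges[j].append((i, d))

    # Dijkstra's shortest path from node 0 (start) to node 1 (goal).
    best: Dict[int, float] = {0: 0.0}
    parent: Dict[int, int] = {}
    heap: List[Tuple[float, int]] = [(0.0, 0)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == 1:
            break
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in edges[u]:
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    if 1 not in best:
        return None  # roadmap does not connect start and goal

    path = [1]
    while path[-1] != 0:
        path.append(parent[path[-1]])
    return [nodes[i] for i in reversed(path)]
```

A call such as `prm_plan((0.05, 0.05), (0.95, 0.95), [(0.5, 0.5, 0.2)])` returns either a list of waypoints around the obstacle or `None` if the sampled roadmap happens not to connect start and goal. Because every edge is a fixed straight segment, the plan gives no guarantee once the motion is noisy, which is the gap the abstract's feedback-based generalizations are meant to address.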

