A selection of scholarly literature on the topic "Processus de décision markovien partiellement observable"
This list collects current articles, books, dissertations, conference abstracts, and other scholarly sources on the topic "Processus de décision markovien partiellement observable".
Dissertations on the topic "Processus de décision markovien partiellement observable"
Poiron-Guidoni, Nicolas. "Apports des méthodes d’optimisation et du calcul haute performance à la théorie de la modélisation et de la simulation : application à la gestion des ressources halieutiques." Thesis, Corte, 2021. http://www.theses.fr/2021CORT0013.
The computer science project (SiSU) of the CNRS Science for the Environment Joint Research Unit designs decision support methods to help better manage complex environmental systems. This thesis is part of that effort. It studies what several types of computational methods can contribute to improving our knowledge of complex systems and thereby assisting their management in situations of high uncertainty. Indeed, complex environmental systems cannot always be known and modeled with precision. This is the case, for example, in fisheries biology, where management methods must be proposed despite a lack of knowledge about the observed system, in our case study the Corsican coastal fishery. Our first work focused on model calibration, i.e. the search for parameter values that allow our models to best represent the dynamics of the system. It showed the limits of the usual approaches and the need for probabilistic approaches based on large numbers of simulations. These approaches are a valuable aid to knowledge acquisition, in particular by delimiting sets of solutions. These sets can then be used in robust optimization methods, or even in adjustable robust optimization. Such methods not only take uncertainties into account but also quantify the reduction in uncertainty that additional years of data can bring, so that increasingly precise strategies can be proposed in the long term. Optimization can therefore be used effectively at the level of decision makers. However, the small-scale coastal fishery in Corsica is a system in which a large number of actors behave in different ways that are difficult to predict and control. Optimization does not seem suited to studying this scale because of the number of parameters and the infinite number of stochastic transitions generated. For this scale, methods based on deep reinforcement learning were proposed. These approaches allowed us to propose a model that handles both decision makers and fishermen, the former seeking to reduce the ecological impact, the latter to maximize their gains. From this, we were able to show that little knowledge is sufficient to maximize the fishermen's gains. Moreover, this approach, coupled with optimization, allowed us to obtain efficient quota decisions. Finally, this system allowed us to study the impact of certain individual behaviors that maximize gains at the expense of complying with the decision makers' recommendations. It appeared that effective, well-adapted management policies can help mitigate the ecological impact of a significant share of these behaviors. Thus, we contributed theoretically, by broadening the application domains of the theory of modeling and simulation and by proposing a set of optimization and machine learning tools for managing partially observable dynamic systems, and practically, by addressing the problem of fisheries management in Corsica.
Habachi, Oussama. "Optimisation des Systèmes Partiellement Observables dans les Réseaux Sans-fil : Théorie des jeux, Auto-adaptation et Apprentissage." Phd thesis, Université d'Avignon, 2012. http://tel.archives-ouvertes.fr/tel-00799903.
Ibrahim, Rita. "Utilisation des communications Device-to-Device pour améliorer l'efficacité des réseaux cellulaires." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLC002/document.
This thesis considers Device-to-Device (D2D) communications as a promising technique for enhancing future cellular networks. Modeling, evaluating and optimizing D2D features are the fundamental goals of this thesis and are mainly achieved using the following mathematical tools: queuing theory, Lyapunov optimization and Partially Observed Markov Decision Processes (POMDPs). The findings of this study are presented in three parts. In the first part, we investigate a D2D mode selection scheme. We derive the queuing stability regions of two scenarios: pure cellular networks and D2D-enabled cellular networks. Comparing the two scenarios leads us to design a D2D versus cellular mode selection scheme that improves the capacity of the network. In the second part, we develop a D2D resource allocation algorithm. We observe that D2D users are able to estimate their local Channel State Information (CSI), whereas the base station needs some signaling exchange to acquire this information. Based on the D2D users' knowledge of their local CSI, we provide an energy-efficient resource allocation framework that shows how distributed scheduling outperforms centralized scheduling. In the distributed approach, collisions may occur between the different CSI reports; we therefore propose a collision reduction algorithm. Moreover, we give a detailed description of how both the centralized and the distributed algorithms can be implemented in practice. In the third part, we propose a mobile relay selection policy for a D2D relay-aided network. The mobility of the relays is a crucial challenge in defining the strategy for selecting the optimal D2D relays. The problem is formulated as a constrained POMDP that captures the dynamics of the relays and aims to find the optimal relay selection policy, one that maximizes the performance of the network under cost constraints.
Duran, Santiago. "Resource allocation with observable and unobservable environments." Thesis, Toulouse 3, 2020. http://www.theses.fr/2020TOU30018.
This thesis studies resource allocation problems in large-scale stochastic networks. We work on problems where the availability of resources is subject to time fluctuations, a situation that one may encounter, for example, in load balancing systems or in wireless downlink scheduling systems. The time fluctuations are modelled with two types of processes: controllable processes, whose evolution depends on the action of the decision maker, and environment processes, whose evolution is exogenous. The stochastic evolution of the controllable process depends on the current state of the environment. Depending on whether the decision maker observes the state of the environment, we say that the environment is observable or unobservable. The mathematical framework used is that of Markov Decision Processes (MDPs). The thesis follows three main research axes. In the first problem, we study the optimal control of a multi-armed restless bandit problem (MARBP) with an unobservable environment. The objective is to characterise the optimal policy for the controllable process even though the environment cannot be observed. We consider the large-scale asymptotic regime in which the number of bandits and the speed of the environment both tend to infinity. In our main result, we establish that a set of priority policies is asymptotically optimal. We show that, in particular, this set includes the Whittle index policy of a system whose parameters are averaged over the stationary behaviour of the environment. In the second problem, we consider an MARBP with an observable environment. The objective is to leverage information on the environment to derive an optimal policy for the controllable process. Assuming that the technical condition of indexability holds, we develop an algorithm to compute Whittle's index. We then apply this result to the particular case of a queue with abandonments. We prove indexability, and we provide closed-form expressions of Whittle's index. In the third problem, we consider a model of a large-scale storage system in which files are distributed across a set of nodes. Each node breaks down following a law that depends on the load it handles. Whenever a node breaks down, all the files it held are reallocated to other nodes. We study the evolution of the load of a single node in the mean-field regime, when the number of nodes and files grows large. We prove the existence of the process in the mean-field regime. We further show the convergence in distribution of the load in steady state as the average number of files per node tends to infinity.
Filippi, Sarah. "Stratégies optimistes en apprentissage par renforcement." Phd thesis, Ecole nationale supérieure des telecommunications - ENST, 2010. http://tel.archives-ouvertes.fr/tel-00551401.