Academic literature on the topic "Stochastic Multi-armed Bandit"
Consult the thematic lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Stochastic Multi-armed Bandit".
Journal articles on the topic "Stochastic Multi-armed Bandit"
Xiong, Guojun, and Jian Li. "Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 10528–36. http://dx.doi.org/10.1609/aaai.v37i9.26251.
Ciucanu, Radu, Pascal Lafourcade, Gael Marcadet, and Marta Soare. "SAMBA: A Generic Framework for Secure Federated Multi-Armed Bandits". Journal of Artificial Intelligence Research 73 (February 23, 2022): 737–65. http://dx.doi.org/10.1613/jair.1.13163.
Wan, Zongqi, Zhijie Zhang, Tongyang Li, Jialin Zhang, and Xiaoming Sun. "Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 10087–94. http://dx.doi.org/10.1609/aaai.v37i8.26202.
Lesage-Landry, Antoine, and Joshua A. Taylor. "The Multi-Armed Bandit With Stochastic Plays". IEEE Transactions on Automatic Control 63, no. 7 (July 2018): 2280–86. http://dx.doi.org/10.1109/tac.2017.2765501.
Esfandiari, Hossein, Amin Karbasi, Abbas Mehrabian, and Vahab Mirrokni. "Regret Bounds for Batched Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 7340–48. http://dx.doi.org/10.1609/aaai.v35i8.16901.
Dzhoha, A. S. "Sequential resource allocation in a stochastic environment: an overview and numerical experiments". Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics, no. 3 (2021): 13–25. http://dx.doi.org/10.17721/1812-5409.2021/3.1.
Juditsky, A., A. V. Nazin, A. B. Tsybakov, and N. Vayatis. "Gap-free Bounds for Stochastic Multi-Armed Bandit". IFAC Proceedings Volumes 41, no. 2 (2008): 11560–63. http://dx.doi.org/10.3182/20080706-5-kr-1001.01959.
Allesiardo, Robin, Raphaël Féraud, and Odalric-Ambrym Maillard. "The non-stationary stochastic multi-armed bandit problem". International Journal of Data Science and Analytics 3, no. 4 (March 30, 2017): 267–83. http://dx.doi.org/10.1007/s41060-017-0050-5.
Huo, Xiaoguang, and Feng Fu. "Risk-aware multi-armed bandit problem with application to portfolio selection". Royal Society Open Science 4, no. 11 (November 2017): 171377. http://dx.doi.org/10.1098/rsos.171377.
Xu, Lily, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, and Milind Tambe. "Dual-Mandate Patrols: Multi-Armed Bandits for Green Security". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 17 (May 18, 2021): 14974–82. http://dx.doi.org/10.1609/aaai.v35i17.17757.
Theses on the topic "Stochastic Multi-armed Bandit"
Wang, Kehao. "Multi-channel opportunistic access: a restless multi-armed bandit perspective". PhD thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00832569.
Cella, Leonardo. "Efficiency and Realism in Stochastic Bandits". Doctoral thesis, Università degli Studi di Milano, 2021. http://hdl.handle.net/2434/807862.
Texto completoMénard, Pierre. "Sur la notion d'optimalité dans les problèmes de bandit stochastique". Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30087/document.
Texto completoThe topics addressed in this thesis lie in statistical machine learning and sequential statistic. Our main framework is the stochastic multi-armed bandit problems. In this work we revisit lower bounds on the regret. We obtain non-asymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback-Leibler divergence. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. Then, we propose algorithms for regret minimization in stochastic bandit models with exponential families of distributions or with distribution only assumed to be supported by the unit interval, that are simultaneously asymptotically optimal (in the sense of Lai and Robbins lower bound) and minimax optimal. We also analyze the sample complexity of sequentially identifying the distribution whose expectation is the closest to some given threshold, with and without the assumption that the mean values of the distributions are increasing. This work is motivated by phase I clinical trials, a practically important setting where the arm means are increasing by nature. Finally we extend Fano's inequality, which controls the average probability of (disjoint) events in terms of the average of some Kullback-Leibler divergences, to work with arbitrary unit-valued random variables. Several novel applications are provided, in which the consideration of random variables is particularly handy. The most important applications deal with the problem of Bayesian posterior concentration (minimax or distribution-dependent) rates and with a lower bound on the regret in non-stochastic sequential learning
Ruíz Hernández, Diego. "Essays on indexability of stochastic scheduling and dynamic allocation problems". Doctoral thesis, Universitat Pompeu Fabra, 2007. http://hdl.handle.net/10803/7347.
The second class of problems concerns two families of Markov decision problems. The spinning plates problem concerns the optimal management of a portfolio of assets whose yields grow with investment but otherwise decline. In the model of asset exploitation called the squad system, the yield from an asset declines when it is utilised but recovers when the asset is at rest. Simply stated conditions are given which guarantee general indexability of the problem, together with necessary and sufficient conditions for strict indexability. The index heuristics which emerge from the analysis are assessed numerically and found to perform strongly.
Degenne, Rémy. "Impact of structure on the design and analysis of bandit algorithms". Thesis, Université de Paris (2019-....), 2019. http://www.theses.fr/2019UNIP7179.
In this thesis, we study sequential learning problems called stochastic multi-armed bandits. First, a new bandit algorithm is presented. The analysis of that algorithm uses confidence intervals on the means of the arms' reward distributions, as most bandit proofs do. In a parametric setting, we derive concentration inequalities which quantify the deviation between the mean parameter of a distribution and its empirical estimate in order to obtain confidence intervals. These inequalities are presented as bounds on the Kullback-Leibler divergence. Three extensions of the stochastic multi-armed bandit problem are then studied. First, we study the so-called combinatorial semi-bandit problem, in which an algorithm chooses a set of arms and the reward of each of these arms is observed. The minimal attainable regret then depends on the correlation between the arm distributions. We then consider a setting in which the observation mechanism changes. One source of difficulty of the bandit problem is the scarcity of information: only the arm pulled is observed. We show how to efficiently use eventual supplementary free information (which does not influence the regret). Finally, a new family of algorithms is introduced to obtain both regret-minimization and best-arm-identification guarantees. Each algorithm of the family realizes a trade-off between regret and the time needed to identify the best arm. In a second part, we study the so-called pure exploration problem, in which an algorithm is evaluated not on its regret but on the probability that it returns a wrong answer to a question about the arm distributions. We determine the complexity of such problems and design algorithms with performance close to that complexity.
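Confidence intervals expressed as bounds on the Kullback-Leibler divergence, as described in this abstract, can be computed by bisection. A minimal KL-UCB-style sketch for Bernoulli rewards (the exploration level log t and the function name are illustrative assumptions, not the thesis's exact construction):

```python
import math

def bernoulli_kl(p: float, q: float) -> float:
    """KL(Ber(p) || Ber(q)), with inputs clamped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(mean_hat: float, pulls: int, t: int) -> float:
    """Largest q >= mean_hat such that pulls * KL(mean_hat, q) <= log(t)."""
    level = math.log(max(t, 2))
    lo, hi = mean_hat, 1.0
    for _ in range(60):  # bisection on the increasing map q -> KL(mean_hat, q)
        mid = (lo + hi) / 2
        if pulls * bernoulli_kl(mean_hat, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo
```

The interval tightens as an arm is pulled more often: with the same empirical mean 0.5, the index after 1000 pulls is much closer to 0.5 than after 10 pulls.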
Magureanu, Stefan. "Structured Stochastic Bandits". Licentiate thesis, KTH, Reglerteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-182816.
Hadiji, Hédi. "On some adaptivity questions in stochastic multi-armed bandits". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM021.
The main topics addressed in this thesis lie in the general domain of sequential learning, and in particular stochastic multi-armed bandits. The thesis is divided into four chapters and an introduction. In the first part of the main body of the thesis, we design a new algorithm achieving, simultaneously, distribution-dependent and distribution-free optimal guarantees. The next two chapters are devoted to adaptivity questions. First, in the context of continuum-armed bandits, we present a new algorithm which, for the first time, does not require knowledge of the regularity of the bandit problem it is facing. Then, we study the issue of adapting to the unknown support of the payoffs in bounded K-armed bandits. We provide a procedure that (almost) obtains the same guarantees as if it were given the support in advance. In the final chapter, we study a slightly different bandit setting, designed to enforce diversity-preserving conditions on the strategies. We show that the optimal regret in this setting grows at a speed that is quite different from that in the traditional bandit setting. In particular, we observe that bounded regret is possible under some specific hypotheses.
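For context on the distribution-dependent guarantees discussed in this abstract: the classical UCB1 baseline already achieves logarithmic distribution-dependent regret by adding an exploration bonus to each empirical mean. A minimal sketch (standard UCB1, not the thesis's new algorithm; the two Bernoulli arms with means 0.3 and 0.7 are illustrative):

```python
import math
import random

def ucb1(arm_probs, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return how often each arm was pulled."""
    rng = random.Random(seed)
    k = len(arm_probs)
    counts, sums = [0] * k, [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize
        else:
            # pick the arm maximizing empirical mean + exploration bonus
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        counts[arm] += 1
        sums[arm] += 1.0 if rng.random() < arm_probs[arm] else 0.0
    return counts

counts = ucb1([0.3, 0.7], horizon=2000)
print(counts)  # the 0.7 arm receives the large majority of the pulls
```

The number of pulls wasted on the suboptimal arm grows only logarithmically with the horizon, which is exactly the distribution-dependent regime the abstract refers to.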
McInerney, Robert E. "Decision making under uncertainty". Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:a34e87ad-8330-42df-8ba6-d55f10529331.
Cayci, Semih. "Online Learning for Optimal Control of Communication and Computing Systems". The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1595516470389826.
Couetoux, Adrien. "Monte Carlo Tree Search for Continuous and Stochastic Sequential Decision Making Problems". Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112192.
In this thesis, we study sequential decision making problems, with a focus on the unit commitment problem. Traditionally solved by dynamic programming methods, this problem is still a challenge, due to its high dimension and to the sacrifices made in the accuracy of the model in order to apply state-of-the-art methods. We investigate the applicability of Monte Carlo Tree Search methods to this problem, and to other single-player, stochastic and continuous sequential decision making problems. We started by extending traditional finite-state MCTS to continuous domains, with a method called Double Progressive Widening (DPW). This method relies on two hyperparameters, which determine the ratio between width and depth in the nodes of the tree. We developed a heuristic called Blind Value (BV) to improve the exploration of new actions, using the information from past simulations. We also extended the RAVE heuristic to continuous domains. Finally, we proposed two new ways of backing up information through the tree, which improved the convergence speed considerably on two test cases. An important part of our work was to propose a way to combine MCTS with existing powerful heuristics, with the application to energy management in mind. We did so by proposing a framework in which a good default policy is learned by Direct Policy Search (DPS) and then included in MCTS.
The experimental results are very positive. To extend the reach of MCTS, we showed how it could be used to solve Partially Observable Markov Decision Processes, with an application to the game of Minesweeper, for which no consistent method had been proposed before. Finally, we used MCTS in a meta-bandit framework to solve energy investment problems: the investment decision was handled by classical bandit algorithms, while the evaluation of each investment was done by MCTS. The most important takeaway is that continuous MCTS has almost no assumptions (besides the need for a generative model), is consistent, and can easily improve existing suboptimal solvers by using a method similar to what we proposed with DPS.
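The Double Progressive Widening rule mentioned in this abstract admits a compact statement: a node visited n times may only receive a new child while its child count stays below ⌈C·n^α⌉, applied both when choosing actions and when sampling next states. A minimal sketch of the widening test (C = 1 and α = 0.5 are illustrative values for the two hyperparameters):

```python
import math

def allows_new_child(visits: int, num_children: int,
                     C: float = 1.0, alpha: float = 0.5) -> bool:
    """Progressive widening test: expand only while children < ceil(C * visits**alpha)."""
    return num_children < math.ceil(C * visits ** alpha)

# The allowed number of children grows sublinearly with the visit count:
children = 0
for n in range(1, 101):
    if allows_new_child(n, children):
        children += 1
print(children)  # 10 — about sqrt(100) children after 100 visits
```

Larger α widens the tree faster (more breadth, less depth per simulation), which is the width-versus-depth trade-off the abstract describes.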
Books on the topic "Stochastic Multi-armed Bandit"
Bubeck, Sébastien, and Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems. Now Publishers, 2012.
Book chapters on the topic "Stochastic Multi-armed Bandit"
Zheng, Rong, and Cunqing Hua. "Stochastic Multi-armed Bandit". In Wireless Networks, 9–25. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-50502-2_2.
Agrawal, Shipra. "The Stochastic Multi-Armed Bandit Problem". In Springer Series in Supply Chain Management, 3–13. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-01926-5_1.
Panaganti, Kishan, Dileep Kalathil, and Pravin Varaiya. "Bounded Regret for Finitely Parameterized Multi-Armed Bandits". In Stochastic Analysis, Filtering, and Stochastic Optimization, 411–29. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98519-6_17.
Maillard, Odalric-Ambrym. "Robust Risk-Averse Stochastic Multi-armed Bandits". In Lecture Notes in Computer Science, 218–33. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40935-6_16.
Conference proceedings on the topic "Stochastic Multi-armed Bandit"
Vakili, Sattar, Qing Zhao, and Yuan Zhou. "Time-varying stochastic multi-armed bandit problems". In 2014 48th Asilomar Conference on Signals, Systems and Computers. IEEE, 2014. http://dx.doi.org/10.1109/acssc.2014.7094845.
Chang, Hyeong Soo, Michael C. Fu, and Steven I. Marcus. "Adversarial Multi-Armed Bandit Approach to Stochastic Optimization". In Proceedings of the 45th IEEE Conference on Decision and Control. IEEE, 2006. http://dx.doi.org/10.1109/cdc.2006.377724.
Zhang, Xiaofang, Qian Zhou, Peng Zhang, and Quan Liu. "Adaptive Exploration in Stochastic Multi-armed Bandit Problem". In MOL2NET 2016, International Conference on Multidisciplinary Sciences, 2nd edition. Basel, Switzerland: MDPI, 2016. http://dx.doi.org/10.3390/mol2net-02-03848.
Kveton, Branislav, Csaba Szepesvári, Mohammad Ghavamzadeh, and Craig Boutilier. "Perturbed-History Exploration in Stochastic Multi-Armed Bandits". In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/386.
Carlsson, Emil, Devdatt Dubhashi, and Fredrik D. Johansson. "Thompson Sampling for Bandits with Clustered Arms". In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/305.
Muller, Matias I., Patricio E. Valenzuela, Alexandre Proutiere, and Cristian R. Rojas. "A stochastic multi-armed bandit approach to nonparametric H∞-norm estimation". In 2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017. http://dx.doi.org/10.1109/cdc.2017.8264343.
Zhao, Tianchi, Bo Jiang, Ming Li, and Ravi Tandon. "Regret Analysis of Stochastic Multi-armed Bandit Problem with Clustered Information Feedback". In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020. http://dx.doi.org/10.1109/ijcnn48605.2020.9207422.
Madhushani, Udari, and Naomi Ehrich Leonard. "Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem". In 2019 18th European Control Conference (ECC). IEEE, 2019. http://dx.doi.org/10.23919/ecc.2019.8796036.
Wang, Xiong, and Riheng Jia. "Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward". In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/429.
Romano, Giulia, Andrea Agostini, Francesco Trovò, Nicola Gatti, and Marcello Restelli. "Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts". In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/472.
Reports on the topic "Stochastic Multi-armed Bandit"
Glazebrook, Kevin D., Donald P. Gaver, and Patricia A. Jacobs. Military Stochastic Scheduling Treated As a 'Multi-Armed Bandit' Problem. Fort Belvoir, VA: Defense Technical Information Center, September 2001. http://dx.doi.org/10.21236/ada385864.