Academic literature on the topic 'Stochastic Multi-armed Bandit'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Stochastic Multi-armed Bandit.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Journal articles on the topic "Stochastic Multi-armed Bandit"
Xiong, Guojun, and Jian Li. "Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 10528–36. http://dx.doi.org/10.1609/aaai.v37i9.26251.
Ciucanu, Radu, Pascal Lafourcade, Gael Marcadet, and Marta Soare. "SAMBA: A Generic Framework for Secure Federated Multi-Armed Bandits." Journal of Artificial Intelligence Research 73 (February 23, 2022): 737–65. http://dx.doi.org/10.1613/jair.1.13163.
Wan, Zongqi, Zhijie Zhang, Tongyang Li, Jialin Zhang, and Xiaoming Sun. "Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 10087–94. http://dx.doi.org/10.1609/aaai.v37i8.26202.
Lesage-Landry, Antoine, and Joshua A. Taylor. "The Multi-Armed Bandit With Stochastic Plays." IEEE Transactions on Automatic Control 63, no. 7 (July 2018): 2280–86. http://dx.doi.org/10.1109/tac.2017.2765501.
Esfandiari, Hossein, Amin Karbasi, Abbas Mehrabian, and Vahab Mirrokni. "Regret Bounds for Batched Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 7340–48. http://dx.doi.org/10.1609/aaai.v35i8.16901.
Dzhoha, A. S. "Sequential resource allocation in a stochastic environment: an overview and numerical experiments." Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics, no. 3 (2021): 13–25. http://dx.doi.org/10.17721/1812-5409.2021/3.1.
Juditsky, A., A. V. Nazin, A. B. Tsybakov, and N. Vayatis. "Gap-free Bounds for Stochastic Multi-Armed Bandit." IFAC Proceedings Volumes 41, no. 2 (2008): 11560–63. http://dx.doi.org/10.3182/20080706-5-kr-1001.01959.
Allesiardo, Robin, Raphaël Féraud, and Odalric-Ambrym Maillard. "The non-stationary stochastic multi-armed bandit problem." International Journal of Data Science and Analytics 3, no. 4 (March 30, 2017): 267–83. http://dx.doi.org/10.1007/s41060-017-0050-5.
Huo, Xiaoguang, and Feng Fu. "Risk-aware multi-armed bandit problem with application to portfolio selection." Royal Society Open Science 4, no. 11 (November 2017): 171377. http://dx.doi.org/10.1098/rsos.171377.
Xu, Lily, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, and Milind Tambe. "Dual-Mandate Patrols: Multi-Armed Bandits for Green Security." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 17 (May 18, 2021): 14974–82. http://dx.doi.org/10.1609/aaai.v35i17.17757.
Dissertations / Theses on the topic "Stochastic Multi-armed Bandit"
Wang, Kehao. "Multi-channel opportunistic access: a restless multi-armed bandit perspective." PhD thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00832569.
Cella, Leonardo. "Efficiency and Realism in Stochastic Bandits." Doctoral thesis, Università degli Studi di Milano, 2021. http://hdl.handle.net/2434/807862.
Full textMénard, Pierre. "Sur la notion d'optimalité dans les problèmes de bandit stochastique." Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30087/document.
Full textThe topics addressed in this thesis lie in statistical machine learning and sequential statistic. Our main framework is the stochastic multi-armed bandit problems. In this work we revisit lower bounds on the regret. We obtain non-asymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback-Leibler divergence. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. Then, we propose algorithms for regret minimization in stochastic bandit models with exponential families of distributions or with distribution only assumed to be supported by the unit interval, that are simultaneously asymptotically optimal (in the sense of Lai and Robbins lower bound) and minimax optimal. We also analyze the sample complexity of sequentially identifying the distribution whose expectation is the closest to some given threshold, with and without the assumption that the mean values of the distributions are increasing. This work is motivated by phase I clinical trials, a practically important setting where the arm means are increasing by nature. Finally we extend Fano's inequality, which controls the average probability of (disjoint) events in terms of the average of some Kullback-Leibler divergences, to work with arbitrary unit-valued random variables. Several novel applications are provided, in which the consideration of random variables is particularly handy. The most important applications deal with the problem of Bayesian posterior concentration (minimax or distribution-dependent) rates and with a lower bound on the regret in non-stochastic sequential learning
Ruíz Hernández, Diego. "Essays on indexability of stochastic scheduling and dynamic allocation problems." Doctoral thesis, Universitat Pompeu Fabra, 2007. http://hdl.handle.net/10803/7347.
The second class of problems concerns two families of Markov decision problems. The spinning plates problem concerns the optimal management of a portfolio of assets whose yields grow with investment but otherwise decline. In the model of asset exploitation called the squad system, the yield from an asset declines when it is utilised but will recover when the asset is at rest. Simply stated conditions are given which guarantee general indexability of the problem together with necessary and sufficient conditions for strict indexability. The index heuristics, which emerge from the analysis, are assessed numerically and found to perform strongly.
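For readers unfamiliar with the indexability terminology in this abstract, a standard definition of a Whittle-type index in restless-bandit models of this kind is, in our notation (the thesis states its own model-specific conditions),

W(x) = \inf\{\lambda : \text{the passive action is optimal in state } x \text{ when passivity earns subsidy } \lambda\},

and a problem is indexable when the set of states in which passivity is optimal grows monotonically as the subsidy \lambda increases. The resulting index heuristic then activates, at each decision epoch, the projects whose current states carry the largest indices.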
Degenne, Rémy. "Impact of structure on the design and analysis of bandit algorithms." Thesis, Université de Paris, 2019. http://www.theses.fr/2019UNIP7179.
In this thesis, we study sequential learning problems called stochastic multi-armed bandits. First, a new bandit algorithm is presented. The analysis of that algorithm uses confidence intervals on the means of the arm reward distributions, as most bandit proofs do. In a parametric setting, we derive concentration inequalities which quantify the deviation between the mean parameter of a distribution and its empirical estimate in order to obtain confidence intervals. These inequalities are presented as bounds on the Kullback-Leibler divergence. Three extensions of the stochastic multi-armed bandit problem are then studied. First, we study the so-called combinatorial semi-bandit problem, in which an algorithm chooses a set of arms and the reward of each of these arms is observed. The minimal attainable regret then depends on the correlation between the arm distributions. We then consider a setting in which the observation mechanism changes. One source of difficulty of the bandit problem is the scarcity of information: only the arm pulled is observed. We show how to make efficient use of any supplementary free information (which does not influence the regret). Finally, a new family of algorithms is introduced to obtain both regret minimization and best arm identification guarantees. Each algorithm of the family realizes a trade-off between regret and the time needed to identify the best arm. In a second part, we study the so-called pure exploration problem, in which an algorithm is evaluated not on its regret but on the probability that it returns a wrong answer to a question about the arm distributions. We determine the complexity of such problems and design algorithms with performance close to that complexity.
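A typical instance of the KL-divergence-based confidence intervals this abstract refers to is the kl-UCB upper confidence bound for rewards in [0, 1], a standard construction stated here in our notation rather than the thesis's. With \hat{\mu}_a(t) the empirical mean of arm a after N_a(t) pulls, the index is

U_a(t) = \max\{q \in [0, 1] : N_a(t)\,\mathrm{kl}(\hat{\mu}_a(t), q) \le \log t\},

where \mathrm{kl}(p, q) = p \log(p/q) + (1 - p) \log((1 - p)/(1 - q)) is the Kullback-Leibler divergence between Bernoulli distributions with means p and q.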
Magureanu, Stefan. "Structured Stochastic Bandits." Licentiate thesis, KTH, Reglerteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-182816.
Hadiji, Hédi. "On some adaptivity questions in stochastic multi-armed bandits." Thesis, Université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM021.
The main topics addressed in this thesis lie in the general domain of sequential learning, and in particular stochastic multi-armed bandits. The thesis is divided into four chapters and an introduction. In the first part of the main body of the thesis, we design a new algorithm achieving, simultaneously, distribution-dependent and distribution-free optimal guarantees. The next two chapters are devoted to adaptivity questions. First, in the context of continuum-armed bandits, we present a new algorithm which, for the first time, does not require the knowledge of the regularity of the bandit problem it is facing. Then, we study the issue of adapting to the unknown support of the payoffs in bounded K-armed bandits. We provide a procedure that (almost) obtains the same guarantees as if it were given the support in advance. In the final chapter, we study a slightly different bandit setting, designed to enforce diversity-preserving conditions on the strategies. We show that the optimal regret in this setting grows at a rate that is quite different from the one in the traditional bandit setting. In particular, we observe that bounded regret is possible under some specific hypotheses.
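As background on the first chapter's result, the two guarantee types being combined are, in their standard forms (our notation, not the thesis's): a distribution-dependent bound \mathbb{E}[R_T] = O\big(\sum_{a : \Delta_a > 0} \log T / \Delta_a\big), where \Delta_a is the gap between the mean of arm a and the optimal mean, and a distribution-free (minimax) bound \mathbb{E}[R_T] = O(\sqrt{KT}) for K arms and horizon T. Achieving both simultaneously with a single algorithm is what is meant here by distribution-dependent and distribution-free optimality.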
McInerney, Robert E. "Decision making under uncertainty." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:a34e87ad-8330-42df-8ba6-d55f10529331.
Cayci, Semih. "Online Learning for Optimal Control of Communication and Computing Systems." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1595516470389826.
Couetoux, Adrien. "Monte Carlo Tree Search for Continuous and Stochastic Sequential Decision Making Problems." Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112192.
In this thesis, we study sequential decision making problems, with a focus on the unit commitment problem. Traditionally solved by dynamic programming methods, this problem is still a challenge, due to its high dimension and to the sacrifices made on the accuracy of the model to apply state-of-the-art methods. We investigate the applicability of Monte Carlo Tree Search methods for this problem and for other single-player, stochastic, continuous sequential decision making problems. We started by extending the traditional finite-state MCTS to continuous domains, with a method called Double Progressive Widening (DPW). This method relies on two hyper-parameters, and determines the ratio between width and depth in the nodes of the tree. We developed a heuristic called Blind Value (BV) to improve the exploration of new actions, using the information from past simulations. We also extended the RAVE heuristic to continuous domains. Finally, we proposed two new ways of backing up information through the tree, which improved the convergence speed considerably on two test cases. An important part of our work was to propose a way to mix MCTS with existing powerful heuristics, with the application to energy management in mind. We did so by proposing a framework that allows a good default policy to be learned by Direct Policy Search (DPS) and then included in MCTS. The experimental results are very positive. To extend the reach of MCTS, we showed how it could be used to solve Partially Observable Markov Decision Processes, with an application to the game of Minesweeper, for which no consistent method had been proposed before. Finally, we used MCTS in a meta-bandit framework to solve energy investment problems: the investment decision was handled by classical bandit algorithms, while the evaluation of each investment was done by MCTS. The most important takeaway is that continuous MCTS makes almost no assumptions (besides the need for a generative model), is consistent, and can easily improve existing suboptimal solvers by using a method similar to what we proposed with DPS.
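As a rough illustration of the Double Progressive Widening rule described in this abstract, here is a minimal sketch in Python. The hyper-parameter names C and alpha and the helper names in the usage comment are our own illustrative choices, not the thesis's code.

import math

def dpw_allows_new_child(num_visits, num_children, C=1.0, alpha=0.5):
    # Progressive widening: a node visited n times may have at most
    # ceil(C * n**alpha) children. "Double" progressive widening applies
    # this cap both to actions at decision nodes and to sampled outcomes
    # at chance nodes, which keeps the tree finite when actions and
    # transitions are continuous.
    return num_children < math.ceil(C * max(num_visits, 1) ** alpha)

# Sketch of use during tree descent at a decision node:
# if dpw_allows_new_child(node.visits, len(node.children)):
#     node.children.append(sample_new_action(node.state))  # widen
# else:
#     child = max(node.children, key=ucb_score)  # deepen the existing tree

The two hyper-parameters govern the width-versus-depth trade-off mentioned in the abstract: larger C or alpha adds children faster (wider, shallower trees), while smaller values revisit existing children more often (narrower, deeper trees).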
Books on the topic "Stochastic Multi-armed Bandit"
Bubeck, Sébastien, and Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems. Now Publishers, 2012.
Book chapters on the topic "Stochastic Multi-armed Bandit"
Zheng, Rong, and Cunqing Hua. "Stochastic Multi-armed Bandit." In Wireless Networks, 9–25. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-50502-2_2.
Agrawal, Shipra. "The Stochastic Multi-Armed Bandit Problem." In Springer Series in Supply Chain Management, 3–13. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-01926-5_1.
Panaganti, Kishan, Dileep Kalathil, and Pravin Varaiya. "Bounded Regret for Finitely Parameterized Multi-Armed Bandits." In Stochastic Analysis, Filtering, and Stochastic Optimization, 411–29. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98519-6_17.
Maillard, Odalric-Ambrym. "Robust Risk-Averse Stochastic Multi-armed Bandits." In Lecture Notes in Computer Science, 218–33. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40935-6_16.
Conference papers on the topic "Stochastic Multi-armed Bandit"
Vakili, Sattar, Qing Zhao, and Yuan Zhou. "Time-varying stochastic multi-armed bandit problems." In 2014 48th Asilomar Conference on Signals, Systems and Computers. IEEE, 2014. http://dx.doi.org/10.1109/acssc.2014.7094845.
Chang, Hyeong Soo, Michael C. Fu, and Steven I. Marcus. "Adversarial Multi-Armed Bandit Approach to Stochastic Optimization." In Proceedings of the 45th IEEE Conference on Decision and Control. IEEE, 2006. http://dx.doi.org/10.1109/cdc.2006.377724.
Zhang, Xiaofang, Qian Zhou, Peng Zhang, and Quan Liu. "Adaptive Exploration in Stochastic Multi-armed Bandit Problem." In MOL2NET 2016, International Conference on Multidisciplinary Sciences, 2nd edition. Basel, Switzerland: MDPI, 2016. http://dx.doi.org/10.3390/mol2net-02-03848.
Kveton, Branislav, Csaba Szepesvári, Mohammad Ghavamzadeh, and Craig Boutilier. "Perturbed-History Exploration in Stochastic Multi-Armed Bandits." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/386.
Carlsson, Emil, Devdatt Dubhashi, and Fredrik D. Johansson. "Thompson Sampling for Bandits with Clustered Arms." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/305.
Muller, Matias I., Patricio E. Valenzuela, Alexandre Proutiere, and Cristian R. Rojas. "A stochastic multi-armed bandit approach to nonparametric H∞-norm estimation." In 2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017. http://dx.doi.org/10.1109/cdc.2017.8264343.
Zhao, Tianchi, Bo Jiang, Ming Li, and Ravi Tandon. "Regret Analysis of Stochastic Multi-armed Bandit Problem with Clustered Information Feedback." In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020. http://dx.doi.org/10.1109/ijcnn48605.2020.9207422.
Madhushani, Udari, and Naomi Ehrich Leonard. "Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem." In 2019 18th European Control Conference (ECC). IEEE, 2019. http://dx.doi.org/10.23919/ecc.2019.8796036.
Wang, Xiong, and Riheng Jia. "Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward." In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/429.
Romano, Giulia, Andrea Agostini, Francesco Trovò, Nicola Gatti, and Marcello Restelli. "Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/472.
Reports on the topic "Stochastic Multi-armed Bandit"
Glazebrook, Kevin D., Donald P. Gaver, and Patricia A. Jacobs. Military Stochastic Scheduling Treated As a 'Multi-Armed Bandit' Problem. Fort Belvoir, VA: Defense Technical Information Center, September 2001. http://dx.doi.org/10.21236/ada385864.