Ready-made bibliography on the topic "Stochastic Multi-armed Bandit"
Create accurate references in APA, MLA, Chicago, Harvard, and many other citation styles
Browse lists of recent articles, books, dissertations, conference abstracts, and other scholarly sources on the topic "Stochastic Multi-armed Bandit".
Next to every work in the bibliography there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication as a ".pdf" file and read its abstract online, whenever the relevant details are available in the work's metadata.
Journal articles on the topic "Stochastic Multi-armed Bandit"
Xiong, Guojun, and Jian Li. "Decentralized Stochastic Multi-Player Multi-Armed Walking Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 9 (June 26, 2023): 10528–36. http://dx.doi.org/10.1609/aaai.v37i9.26251.
Ciucanu, Radu, Pascal Lafourcade, Gael Marcadet, and Marta Soare. "SAMBA: A Generic Framework for Secure Federated Multi-Armed Bandits". Journal of Artificial Intelligence Research 73 (February 23, 2022): 737–65. http://dx.doi.org/10.1613/jair.1.13163.
Wan, Zongqi, Zhijie Zhang, Tongyang Li, Jialin Zhang, and Xiaoming Sun. "Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 10087–94. http://dx.doi.org/10.1609/aaai.v37i8.26202.
Lesage-Landry, Antoine, and Joshua A. Taylor. "The Multi-Armed Bandit With Stochastic Plays". IEEE Transactions on Automatic Control 63, no. 7 (July 2018): 2280–86. http://dx.doi.org/10.1109/tac.2017.2765501.
Esfandiari, Hossein, Amin Karbasi, Abbas Mehrabian, and Vahab Mirrokni. "Regret Bounds for Batched Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 7340–48. http://dx.doi.org/10.1609/aaai.v35i8.16901.
Dzhoha, A. S. "Sequential resource allocation in a stochastic environment: an overview and numerical experiments". Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics, no. 3 (2021): 13–25. http://dx.doi.org/10.17721/1812-5409.2021/3.1.
Juditsky, A., A. V. Nazin, A. B. Tsybakov, and N. Vayatis. "Gap-free Bounds for Stochastic Multi-Armed Bandit". IFAC Proceedings Volumes 41, no. 2 (2008): 11560–63. http://dx.doi.org/10.3182/20080706-5-kr-1001.01959.
Allesiardo, Robin, Raphaël Féraud, and Odalric-Ambrym Maillard. "The non-stationary stochastic multi-armed bandit problem". International Journal of Data Science and Analytics 3, no. 4 (March 30, 2017): 267–83. http://dx.doi.org/10.1007/s41060-017-0050-5.
Huo, Xiaoguang, and Feng Fu. "Risk-aware multi-armed bandit problem with application to portfolio selection". Royal Society Open Science 4, no. 11 (November 2017): 171377. http://dx.doi.org/10.1098/rsos.171377.
Xu, Lily, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, and Milind Tambe. "Dual-Mandate Patrols: Multi-Armed Bandits for Green Security". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 17 (May 18, 2021): 14974–82. http://dx.doi.org/10.1609/aaai.v35i17.17757.
Doctoral dissertations on the topic "Stochastic Multi-armed Bandit"
Wang, Kehao. "Multi-channel opportunistic access: a restless multi-armed bandit perspective". PhD thesis, Université Paris Sud - Paris XI, 2012. http://tel.archives-ouvertes.fr/tel-00832569.
Cella, Leonardo. "Efficiency and Realism in Stochastic Bandits". Doctoral thesis, Università degli Studi di Milano, 2021. http://hdl.handle.net/2434/807862.
Ménard, Pierre. "Sur la notion d'optimalité dans les problèmes de bandit stochastique". Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30087/document.
Pełny tekst źródłaThe topics addressed in this thesis lie in statistical machine learning and sequential statistic. Our main framework is the stochastic multi-armed bandit problems. In this work we revisit lower bounds on the regret. We obtain non-asymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of Kullback-Leibler divergence. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. Then, we propose algorithms for regret minimization in stochastic bandit models with exponential families of distributions or with distribution only assumed to be supported by the unit interval, that are simultaneously asymptotically optimal (in the sense of Lai and Robbins lower bound) and minimax optimal. We also analyze the sample complexity of sequentially identifying the distribution whose expectation is the closest to some given threshold, with and without the assumption that the mean values of the distributions are increasing. This work is motivated by phase I clinical trials, a practically important setting where the arm means are increasing by nature. Finally we extend Fano's inequality, which controls the average probability of (disjoint) events in terms of the average of some Kullback-Leibler divergences, to work with arbitrary unit-valued random variables. Several novel applications are provided, in which the consideration of random variables is particularly handy. The most important applications deal with the problem of Bayesian posterior concentration (minimax or distribution-dependent) rates and with a lower bound on the regret in non-stochastic sequential learning
Ruíz Hernández, Diego. "Essays on indexability of stochastic scheduling and dynamic allocation problems". Doctoral thesis, Universitat Pompeu Fabra, 2007. http://hdl.handle.net/10803/7347.
Pełny tekst źródłaThe second class of problems concerns two families of Markov decision problems. The spinning plates problem concerns the optimal management of a portfolio of assets whose yields grow with investment but otherwise decline. In the model of asset exploitation called the squad system, the yield from an asset declines when it is utilised but will recover when the asset is at rest. Simply stated conditions are given which guarantee general indexability of the problem together with necessary and sufficient conditions for strict indexability. The index heuristics, which emerge from the analysis, are assessed numerically and found to perform strongly.
Degenne, Rémy. "Impact of structure on the design and analysis of bandit algorithms". Thesis, Université de Paris (2019-....), 2019. http://www.theses.fr/2019UNIP7179.
In this thesis, we study sequential learning problems called stochastic multi-armed bandits. First, a new bandit algorithm is presented. The analysis of that algorithm uses confidence intervals on the means of the arm reward distributions, as most bandit proofs do. In a parametric setting, we derive concentration inequalities which quantify the deviation between the mean parameter of a distribution and its empirical estimate, in order to obtain confidence intervals. These inequalities are presented as bounds on the Kullback-Leibler divergence. Three extensions of the stochastic multi-armed bandit problem are then studied. First, we study the so-called combinatorial semi-bandit problem, in which an algorithm chooses a set of arms and the reward of each of these arms is observed. The minimal attainable regret then depends on the correlation between the arm distributions. We then consider a setting in which the observation mechanism changes. One source of difficulty of the bandit problem is the scarcity of information: only the arm pulled is observed. We show how to use efficiently any supplementary free information (which does not influence the regret). Finally, a new family of algorithms is introduced to obtain both regret minimization and best arm identification guarantees. Each algorithm of the family realizes a trade-off between regret and the time needed to identify the best arm. In a second part, we study the so-called pure exploration problem, in which an algorithm is evaluated not on its regret but on the probability that it returns a wrong answer to a question about the arm distributions. We determine the complexity of such problems and design algorithms with performance close to that complexity.
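To illustrate the kind of Kullback-Leibler confidence bound this abstract describes, here is a minimal sketch (our own illustration, not code from the thesis) that computes a KL-UCB-style upper confidence bound for a Bernoulli arm by bisection; the exploration budget log(t) is a typical choice and an assumption on our part:

```python
import math

def kl_bernoulli(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12  # clamp away from 0 and 1 to avoid log(0)
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_upper(mean: float, pulls: int, budget: float) -> float:
    """Largest q >= mean such that pulls * KL(mean, q) <= budget.

    KL(mean, q) is increasing in q on [mean, 1], so bisection applies.
    The exact budget (here: log of the round index) varies across
    analyses and is an assumption of this sketch.
    """
    lo, hi = mean, 1.0
    for _ in range(50):  # 50 halvings give more than enough precision
        mid = (lo + hi) / 2
        if pulls * kl_bernoulli(mean, mid) > budget:
            hi = mid
        else:
            lo = mid
    return lo

# Example: an arm pulled 20 times with empirical mean 0.4, at round t = 100.
print(kl_ucb_upper(0.4, 20, math.log(100)))  # approximately 0.72
```

In a KL-UCB-style algorithm, the arm maximizing this upper bound is pulled at each round; the KL form of the bound is what yields the tight, distribution-dependent constants the abstract alludes to.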
Magureanu, Stefan. "Structured Stochastic Bandits". Licentiate thesis, KTH, Reglerteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-182816.
Hadiji, Hédi. "On some adaptivity questions in stochastic multi-armed bandits". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM021.
The main topics addressed in this thesis lie in the general domain of sequential learning, and in particular stochastic multi-armed bandits. The thesis is divided into four chapters and an introduction. In the first part of the main body of the thesis, we design a new algorithm achieving, simultaneously, distribution-dependent and distribution-free optimal guarantees. The next two chapters are devoted to adaptivity questions. First, in the context of continuum-armed bandits, we present a new algorithm which, for the first time, does not require knowledge of the regularity of the bandit problem it is facing. Then, we study the issue of adapting to the unknown support of the payoffs in bounded K-armed bandits. We provide a procedure that (almost) obtains the same guarantees as if it were given the support in advance. In the final chapter, we study a slightly different bandit setting, designed to enforce diversity-preserving conditions on the strategies. We show that the optimal regret in this setting grows at a speed that is quite different from that in the traditional bandit setting. In particular, we observe that bounded regret is possible under some specific hypotheses.
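To make the two notions of optimality mentioned above concrete, the classical benchmarks for K-armed bandits with rewards in [0, 1] are as follows (standard results, written here in our own notation rather than the thesis's):

```latex
% Distribution-dependent benchmark: logarithmic regret with
% gap-dependent constants,
\mathbb{E}[R_T] \;=\; O\Big( \sum_{a \,:\, \Delta_a > 0} \frac{\log T}{\Delta_a} \Big),
% and distribution-free (minimax) benchmark: worst case over all
% instances \nu with rewards in [0, 1],
\sup_{\nu} \mathbb{E}[R_T] \;=\; \Theta\big( \sqrt{KT} \big).
```

Achieving both guarantees simultaneously with a single algorithm is the contribution described in the first part of the abstract.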
McInerney, Robert E. "Decision making under uncertainty". Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:a34e87ad-8330-42df-8ba6-d55f10529331.
Cayci, Semih. "Online Learning for Optimal Control of Communication and Computing Systems". The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1595516470389826.
Couetoux, Adrien. "Monte Carlo Tree Search for Continuous and Stochastic Sequential Decision Making Problems". Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112192.
In this thesis, we study sequential decision making problems, with a focus on the unit commitment problem. Traditionally solved by dynamic programming methods, this problem is still a challenge, due to its high dimension and to the sacrifices made on the accuracy of the model to apply state-of-the-art methods. We investigate the applicability of Monte Carlo Tree Search methods to this problem, and to other single-player, stochastic, continuous sequential decision making problems. We started by extending traditional finite-state MCTS to continuous domains, with a method called Double Progressive Widening (DPW). This method relies on two hyperparameters, and determines the ratio between width and depth in the nodes of the tree. We developed a heuristic called Blind Value (BV) to improve the exploration of new actions, using the information from past simulations. We also extended the RAVE heuristic to continuous domains. Finally, we proposed two new ways of backing up information through the tree, which improved the convergence speed considerably on two test cases. An important part of our work was to propose a way to combine MCTS with existing powerful heuristics, with the application to energy management in mind. We did so by proposing a framework that allows a good default policy to be learned by Direct Policy Search (DPS) and included in MCTS. The experimental results are very positive. To extend the reach of MCTS, we showed how it could be used to solve Partially Observable Markov Decision Processes, with an application to the game of Minesweeper, for which no consistent method had been proposed before. Finally, we used MCTS in a meta-bandit framework to solve energy investment problems: the investment decision was handled by classical bandit algorithms, while the evaluation of each investment was done by MCTS. The most important takeaway is that continuous MCTS makes almost no assumptions (besides the need for a generative model), is consistent, and can easily improve existing suboptimal solvers by using a method similar to what we proposed with DPS.
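The Double Progressive Widening step described in this abstract is commonly implemented as a visit-count threshold on the number of children of a node. The sketch below is our own minimal illustration, assuming the usual formulation with hyperparameters C and alpha (the abstract mentions the two hyperparameters but not their exact form):

```python
import math
import random

def allow_new_child(num_children: int, num_visits: int,
                    C: float = 1.0, alpha: float = 0.5) -> bool:
    """Progressive-widening test: a node may gain a new child only while
    its child count stays below C * visits**alpha. Applying the test at
    both decision nodes (actions) and chance nodes (stochastic outcomes)
    yields Double Progressive Widening."""
    return num_children < math.ceil(C * num_visits ** alpha)

# Toy usage at a decision node with a continuous action space [0, 1]:
children: list[float] = []
for visits in range(1, 101):
    if allow_new_child(len(children), visits):
        children.append(random.uniform(0.0, 1.0))  # sample a fresh action
    # otherwise, descend into an existing child (selection step omitted)
print(f"{visits} visits -> {len(children)} children")  # sublinear growth
```

The exponent alpha controls the width-versus-depth trade-off the abstract mentions: a smaller alpha keeps the tree narrow and deep, a larger alpha keeps it wide and shallow.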
Books on the topic "Stochastic Multi-armed Bandit"
Bubeck, Sébastien, and Nicolò Cesa-Bianchi. Regret Analysis of Stochastic and Nonstochastic Multi-Armed Bandit Problems. Now Publishers, 2012.
Book chapters on the topic "Stochastic Multi-armed Bandit"
Zheng, Rong, and Cunqing Hua. "Stochastic Multi-armed Bandit". In Wireless Networks, 9–25. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-50502-2_2.
Agrawal, Shipra. "The Stochastic Multi-Armed Bandit Problem". In Springer Series in Supply Chain Management, 3–13. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-01926-5_1.
Panaganti, Kishan, Dileep Kalathil, and Pravin Varaiya. "Bounded Regret for Finitely Parameterized Multi-Armed Bandits". In Stochastic Analysis, Filtering, and Stochastic Optimization, 411–29. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-98519-6_17.
Maillard, Odalric-Ambrym. "Robust Risk-Averse Stochastic Multi-armed Bandits". In Lecture Notes in Computer Science, 218–33. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40935-6_16.
Conference abstracts on the topic "Stochastic Multi-armed Bandit"
Vakili, Sattar, Qing Zhao, and Yuan Zhou. "Time-varying stochastic multi-armed bandit problems". In 2014 48th Asilomar Conference on Signals, Systems and Computers. IEEE, 2014. http://dx.doi.org/10.1109/acssc.2014.7094845.
Chang, Hyeong Soo, Michael C. Fu, and Steven I. Marcus. "Adversarial Multi-Armed Bandit Approach to Stochastic Optimization". In Proceedings of the 45th IEEE Conference on Decision and Control. IEEE, 2006. http://dx.doi.org/10.1109/cdc.2006.377724.
Zhang, Xiaofang, Qian Zhou, Peng Zhang, and Quan Liu. "Adaptive Exploration in Stochastic Multi-armed Bandit Problem". In MOL2NET 2016, International Conference on Multidisciplinary Sciences, 2nd edition. Basel, Switzerland: MDPI, 2016. http://dx.doi.org/10.3390/mol2net-02-03848.
Kveton, Branislav, Csaba Szepesvári, Mohammad Ghavamzadeh, and Craig Boutilier. "Perturbed-History Exploration in Stochastic Multi-Armed Bandits". In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/386.
Carlsson, Emil, Devdatt Dubhashi, and Fredrik D. Johansson. "Thompson Sampling for Bandits with Clustered Arms". In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/305.
Muller, Matias I., Patricio E. Valenzuela, Alexandre Proutiere, and Cristian R. Rojas. "A stochastic multi-armed bandit approach to nonparametric H∞-norm estimation". In 2017 IEEE 56th Annual Conference on Decision and Control (CDC). IEEE, 2017. http://dx.doi.org/10.1109/cdc.2017.8264343.
Zhao, Tianchi, Bo Jiang, Ming Li, and Ravi Tandon. "Regret Analysis of Stochastic Multi-armed Bandit Problem with Clustered Information Feedback". In 2020 International Joint Conference on Neural Networks (IJCNN). IEEE, 2020. http://dx.doi.org/10.1109/ijcnn48605.2020.9207422.
Madhushani, Udari, and Naomi Ehrich Leonard. "Heterogeneous Stochastic Interactions for Multiple Agents in a Multi-armed Bandit Problem". In 2019 18th European Control Conference (ECC). IEEE, 2019. http://dx.doi.org/10.23919/ecc.2019.8796036.
Wang, Xiong, and Riheng Jia. "Mean Field Equilibrium in Multi-Armed Bandit Game with Continuous Reward". In Thirtieth International Joint Conference on Artificial Intelligence {IJCAI-21}. California: International Joint Conferences on Artificial Intelligence Organization, 2021. http://dx.doi.org/10.24963/ijcai.2021/429.
Romano, Giulia, Andrea Agostini, Francesco Trovò, Nicola Gatti, and Marcello Restelli. "Multi-Armed Bandit Problem with Temporally-Partitioned Rewards: When Partial Feedback Counts". In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/472.
Organizational reports on the topic "Stochastic Multi-armed Bandit"
Glazebrook, Kevin D., Donald P. Gaver, and Patricia A. Jacobs. Military Stochastic Scheduling Treated As a 'Multi-Armed Bandit' Problem. Fort Belvoir, VA: Defense Technical Information Center, September 2001. http://dx.doi.org/10.21236/ada385864.