Academic literature on the topic 'Reinforcement Learning, Multi-armed Bandits'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Reinforcement Learning, Multi-armed Bandits.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.
Journal articles on the topic "Reinforcement Learning, Multi-armed Bandits"
Wan, Zongqi, Zhijie Zhang, Tongyang Li, Jialin Zhang, and Xiaoming Sun. "Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 10087–94. http://dx.doi.org/10.1609/aaai.v37i8.26202.
Ciucanu, Radu, Pascal Lafourcade, Gael Marcadet, and Marta Soare. "SAMBA: A Generic Framework for Secure Federated Multi-Armed Bandits." Journal of Artificial Intelligence Research 73 (February 23, 2022): 737–65. http://dx.doi.org/10.1613/jair.1.13163.
Huanca-Anquise, Candy A., Ana Lúcia Cetertich Bazzan, and Anderson R. Tavares. "Multi-Objective, Multi-Armed Bandits: Algorithms for Repeated Games and Application to Route Choice." Revista de Informática Teórica e Aplicada 30, no. 1 (January 30, 2023): 11–23. http://dx.doi.org/10.22456/2175-2745.122929.
Giachino, Chiara, Luigi Bollani, Alessandro Bonadonna, and Marco Bertetti. "Reinforcement learning for content's customization: a first step of experimentation in Skyscanner." Industrial Management & Data Systems 121, no. 6 (January 15, 2021): 1417–34. http://dx.doi.org/10.1108/imds-12-2019-0722.
Noothigattu, Ritesh, Tom Yan, and Ariel D. Procaccia. "Inverse Reinforcement Learning From Like-Minded Teachers." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 9197–204. http://dx.doi.org/10.1609/aaai.v35i10.17110.
Xiong, Guojun, Jian Li, and Rahul Singh. "Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8726–34. http://dx.doi.org/10.1609/aaai.v36i8.20852.
Huo, Xiaoguang, and Feng Fu. "Risk-aware multi-armed bandit problem with application to portfolio selection." Royal Society Open Science 4, no. 11 (November 2017): 171377. http://dx.doi.org/10.1098/rsos.171377.
Nobari, Sadegh. "DBA: Dynamic Multi-Armed Bandit Algorithm." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9869–70. http://dx.doi.org/10.1609/aaai.v33i01.33019869.
Esfandiari, Hossein, MohammadTaghi HajiAghayi, Brendan Lucier, and Michael Mitzenmacher. "Online Pandora’s Boxes and Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 1885–92. http://dx.doi.org/10.1609/aaai.v33i01.33011885.
Lefebvre, Germain, Christopher Summerfield, and Rafal Bogacz. "A Normative Account of Confirmation Bias During Reinforcement Learning." Neural Computation 34, no. 2 (January 14, 2022): 307–37. http://dx.doi.org/10.1162/neco_a_01455.
Full textDissertations / Theses on the topic "Reinforcement Learning, Multi-armed Bandits"
Magureanu, Stefan. "Structured Stochastic Bandits." Licentiate thesis, KTH, Reglerteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-182816.
Talebi Mazraeh Shahi, Mohammad Sadegh. "Minimizing Regret in Combinatorial Bandits and Reinforcement Learning." Doctoral thesis, KTH, Reglerteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219970.
Hauser, Kristen. "Hyperparameter Tuning for Reinforcement Learning with Bandits and Off-Policy Sampling." Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1613034993418088.
Full textOlkhovskaya, Julia. "Large-scale online learning under partial feedback." Doctoral thesis, Universitat Pompeu Fabra, 2022. http://hdl.handle.net/10803/673926.
Sequential decision making under uncertainty covers a broad class of problems, and real-world applications require algorithms that are computationally efficient and scalable. We study a range of sequential learning problems in which the learner observes only partial information about the rewards, and we develop algorithms that are robust and computationally efficient in large-scale settings. The first problem we consider is online influence maximization, in which a decision maker sequentially selects a node in a graph in order to spread information through the graph by placing the information at the chosen node. The available feedback is only some information about a small neighbourhood of the selected vertex. Our results show that such partial local observations can be sufficient for maximizing global influence. We propose sequential learning algorithms that aim at maximizing influence, and provide their theoretical analysis in both the subcritical and supercritical regimes of broadly studied graph models. These are the first algorithms in the sequential influence maximization setting that perform efficiently on graphs with a huge number of nodes. In another line of work, we study the contextual bandit problem, where the reward function is allowed to change in an adversarial manner and the learner only gets to observe the rewards associated with its actions. We assume that the number of arms is finite and that the context space can be infinite. We develop a computationally efficient algorithm under the assumption that the d-dimensional contexts are generated i.i.d. at random from a known distribution. We also propose an algorithm that is shown to be robust to misspecification in the setting where the true reward function is linear up to an additive nonlinear error. To our knowledge, our performance guarantees constitute the very first results in this problem setting. We also provide an extension to the case where the context is an element of a reproducing kernel Hilbert space. Finally, we consider an extension of the contextual bandit problem described above: the learner interacts with a Markov decision process over a sequence of episodes, where an adversary chooses the reward function and reward observations are available only for the selected action. We allow the state space to be arbitrarily large, but we assume that all action-value functions can be represented as linear functions of a known low-dimensional feature map, and that the learner at least has access to a simulator of the trajectories in the MDP. Our main contributions are the first algorithms that are shown to be robust and efficient in this problem setting.
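Several entries in this list build on adversarial bandit machinery of the kind this abstract describes. As a point of reference only (not the thesis's own algorithm), here is a minimal sketch of the classical Exp3 strategy for adversarial bandit feedback; the function and parameter names are illustrative:

```python
import numpy as np

def exp3(n_arms, horizon, get_reward, gamma=0.1):
    """Minimal Exp3 sketch for adversarial bandit feedback (rewards in [0, 1]):
    exponential weights over arms with importance-weighted reward estimates."""
    weights = np.ones(n_arms)
    for t in range(horizon):
        # Mix the exponential-weights distribution with uniform exploration.
        probs = (1 - gamma) * weights / weights.sum() + gamma / n_arms
        arm = np.random.choice(n_arms, p=probs)
        reward = get_reward(arm, t)  # only the chosen arm's reward is observed
        estimate = reward / probs[arm]  # importance weighting keeps this unbiased
        weights[arm] *= np.exp(gamma * estimate / n_arms)
    return weights
```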
Racey, Deborah Elaine. "Effects of Response Frequency Constraints on Learning in a Non-Stationary Multi-Armed Bandit Task." OpenSIUC, 2009. https://opensiuc.lib.siu.edu/dissertations/86.
Besson, Lilian. "Multi-Players Bandit Algorithms for Internet of Things Networks." Thesis, CentraleSupélec, 2019. http://www.theses.fr/2019CSUP0005.
In this PhD thesis, we study wireless networks and reconfigurable end-devices that can access Cognitive Radio networks, in unlicensed bands and without central control. We focus on Internet of Things (IoT) networks, with the objective of extending the devices' battery life by equipping them with low-cost but efficient machine learning algorithms, in order to let them automatically improve the efficiency of their wireless communications. We propose different models of IoT networks, and we show empirically, on both numerical simulations and real-world validation, the possible gain of our methods, which use Reinforcement Learning. The different network access problems are modeled as Multi-Armed Bandits (MAB), but we found that analyzing the realistic models was intractable, because proving the convergence of many IoT devices playing a collaborative game, without communication or coordination, is hard when they all follow random activation patterns. The rest of this manuscript thus studies two restricted models: first multi-player bandits in stationary problems, then non-stationary single-player bandits. We also detail another contribution, SMPyBandits, our open-source Python library for numerical MAB simulations, which covers all the studied models and more.
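For readers unfamiliar with the index policies that such theses equip IoT devices with, the sketch below shows UCB1, a standard stochastic-bandit baseline (an illustrative sketch under our own naming, not code from the thesis; its actual implementations live in the SMPyBandits library cited above):

```python
import math

def ucb1(n_arms, horizon, get_reward):
    """Minimal UCB1 sketch: play each arm once, then always pick the arm
    maximizing empirical mean + exploration bonus sqrt(2 ln t / n_a)."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization round: try every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        reward = get_reward(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return means
```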
Achab, Mastane. "Ranking and risk-aware reinforcement learning." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT020.
This thesis is divided into two parts: the first part is on ranking and the second on risk-aware reinforcement learning. While binary classification is the flagship application of empirical risk minimization (ERM), the main paradigm of machine learning, more challenging problems such as bipartite ranking can also be expressed in that setup. In bipartite ranking, the goal is to order, by means of scoring methods, all the elements of some feature space based on a training dataset composed of feature vectors with their binary labels. This thesis extends that setting to the continuous ranking problem, a variant where the labels take continuous values instead of being simply binary. The analysis of ranking data, initiated in the 18th century in the context of elections, has led to another ranking problem using ERM, namely ranking aggregation, and more precisely Kemeny's consensus approach. From a training dataset made of ranking data, such as permutations or pairwise comparisons, the goal is to find the single "median permutation" that best corresponds to a consensus order. We present a less drastic dimensionality-reduction approach in which a distribution on rankings is approximated by a simpler distribution, which is not necessarily reduced to a Dirac mass as in ranking aggregation. For that purpose, we rely on mathematical tools from the theory of optimal transport, such as Wasserstein metrics. The second part of this thesis focuses on risk-aware versions of the stochastic multi-armed bandit problem and of reinforcement learning (RL), where an agent interacts with a dynamic environment by taking actions and receiving rewards, the objective being to maximize the total payoff. In particular, a novel atomic distributional RL approach is provided: the distribution of the total payoff is approximated by particles that correspond to trimmed means.
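To make the trimmed-mean ingredient concrete: a trimmed mean discards a fraction of the lowest and highest observations before averaging, which makes it robust to heavy-tailed rewards. A minimal sketch follows (a hypothetical helper for illustration; the thesis's atomic distributional RL method is considerably more involved):

```python
import numpy as np

def trimmed_mean(rewards, trim=0.1):
    """Mean of what remains after dropping the lowest and highest `trim`
    fractions of the observations; robust to outliers, unlike the plain mean."""
    x = np.sort(np.asarray(rewards, dtype=float))
    k = int(trim * len(x))  # number of observations dropped at each end
    return x[k:len(x) - k].mean() if len(x) > 2 * k else x.mean()

# Example: on heavy-tailed samples, the trimmed mean stays near the center
# while the plain mean can be dragged far away by a few extreme draws.
samples = np.random.standard_cauchy(1000)
print(samples.mean(), trimmed_mean(samples, trim=0.1))
```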
Barkino, Iliam. "Summary Statistic Selection with Reinforcement Learning." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-390838.
Jedor, Matthieu. "Bandit algorithms for recommender system optimization." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM027.
In this PhD thesis, we study the optimization of recommender systems, with the objective of providing more refined suggestions of items for users to benefit from. The task is modeled using the multi-armed bandit framework. In the first part, we look at two problems that commonly occur in recommender systems: the large number of items to handle and the management of sponsored content. In the second part, we investigate the empirical performance of bandit algorithms, and especially how to tune conventional algorithms to improve results in the stationary and non-stationary environments that arise in practice. This leads us to analyze, both theoretically and empirically, the greedy algorithm, which in some cases outperforms the state of the art.
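To make the greedy algorithm discussed above concrete, here is a minimal epsilon-greedy sketch (illustrative names and parameters, not the thesis's code; setting epsilon=0 recovers the purely greedy strategy the abstract refers to):

```python
import random

def epsilon_greedy(n_arms, horizon, get_reward, epsilon=0.0):
    """Minimal sketch: explore a uniformly random arm with probability
    epsilon, otherwise exploit the arm with the best empirical mean."""
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(horizon):
        if t < n_arms:
            arm = t  # play each arm once so every empirical mean is defined
        elif random.random() < epsilon:
            arm = random.randrange(n_arms)  # explore
        else:
            arm = max(range(n_arms), key=lambda a: means[a])  # exploit greedily
        reward = get_reward(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return means
```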
Couetoux, Adrien. "Monte Carlo Tree Search for Continuous and Stochastic Sequential Decision Making Problems." Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112192.
In this thesis, we study sequential decision making problems, with a focus on the unit commitment problem. Traditionally solved by dynamic programming methods, this problem remains a challenge, due to its high dimension and to the sacrifices made on the accuracy of the model in order to apply state-of-the-art methods. We investigate the applicability of Monte Carlo Tree Search (MCTS) methods to this problem, and to other single-player, stochastic, continuous sequential decision making problems. We started by extending traditional finite-state MCTS to continuous domains, with a method called Double Progressive Widening (DPW). This method relies on two hyperparameters, and determines the ratio between width and depth in the nodes of the tree. We developed a heuristic called Blind Value (BV) to improve the exploration of new actions, using the information from past simulations. We also extended the RAVE heuristic to continuous domains. Finally, we proposed two new ways of backing up information through the tree that improved the convergence speed considerably on two test cases. An important part of our work was to propose a way to combine MCTS with existing powerful heuristics, with the application to energy management in mind. We did so by proposing a framework in which a good default policy is learned by Direct Policy Search (DPS) and then included in MCTS. The experimental results are very positive. To extend the reach of MCTS, we showed how it could be used to solve Partially Observable Markov Decision Processes, with an application to the game of Minesweeper, for which no consistent method had been proposed before. Finally, we used MCTS in a meta-bandit framework to solve energy investment problems: the investment decision was handled by classical bandit algorithms, while the evaluation of each investment was done by MCTS.
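The progressive-widening idea mentioned above caps the number of children of a tree node as a function of its visit count, so the tree widens only slowly in continuous action spaces. Below is a minimal sketch of that expansion test (C and alpha play the role of the two hyperparameters the abstract mentions; the names are ours, not the thesis's):

```python
import math

def should_widen(n_visits, n_children, C=1.0, alpha=0.5):
    """Progressive-widening test: a node visited n times may hold at most
    floor(C * n**alpha) children, so new actions are sampled ever more rarely.
    Double Progressive Widening applies the same test on the action side and
    on the stochastic-outcome side of the tree."""
    return n_children < math.floor(C * max(n_visits, 1) ** alpha)

# With C=1 and alpha=0.5, a node visited 100 times may hold up to 10 children.
print(should_widen(n_visits=100, n_children=9))   # True: sample a new action
print(should_widen(n_visits=100, n_children=10))  # False: revisit an existing child
```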
Books on the topic "Reinforcement Learning, Multi-armed Bandits"
Zhao, Qing, and R. Srikant. Multi-Armed Bandits: Theory and Applications to Online Learning in Networks. Morgan & Claypool Publishers, 2019.
Zhao, Qing. Multi-Armed Bandits: Theory and Applications to Online Learning in Networks. Springer International Publishing AG, 2019.
Find full textBook chapters on the topic "Reinforcement Learning, Multi-armed Bandits"
Rao, Ashwin, and Tikhon Jelvis. "Multi-Armed Bandits: Exploration versus Exploitation." In Foundations of Reinforcement Learning with Applications in Finance, 411–38. Boca Raton: Chapman and Hall/CRC, 2022. http://dx.doi.org/10.1201/9781003229193-15.
Kubat, Miroslav. "Reinforcement Learning: N-Armed Bandits and Episodes." In An Introduction to Machine Learning, 353–76. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-81935-4_17.
Roijers, Diederik M., Luisa M. Zintgraf, Pieter Libin, Mathieu Reymond, Eugenio Bargiacchi, and Ann Nowé. "Interactive Multi-objective Reinforcement Learning in Multi-armed Bandits with Gaussian Process Utility Models." In Machine Learning and Knowledge Discovery in Databases, 463–78. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-67664-3_28.
Combrink, Herkulaas MvE, Vukosi Marivate, and Benjamin Rosman. "Reinforcement Learning in Education: A Multi-armed Bandit Approach." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 3–16. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-35883-8_1.
Antos, András, Varun Grover, and Csaba Szepesvári. "Active Learning in Multi-armed Bandits." In Lecture Notes in Computer Science, 287–302. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-87987-9_25.
Qureshi, Ubaid, Mehreen Mushtaq, Juveeryah Qureshi, Mir Aiman, Mansha Ali, and Shahnawaz Ali. "Dynamic Pricing for Electric Vehicle Charging at a Commercial Charging Station in Presence of Uncertainty: A Multi-armed Bandit Reinforcement Learning Approach." In Proceedings of International Conference on Data Science and Applications, 625–35. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-19-6634-7_44.
Baransi, Akram, Odalric-Ambrym Maillard, and Shie Mannor. "Sub-sampling for Multi-armed Bandits." In Machine Learning and Knowledge Discovery in Databases, 115–31. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-662-44848-9_8.
Carpentier, Alexandra, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, and Peter Auer. "Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits." In Lecture Notes in Computer Science, 189–203. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-24412-4_17.
Mendonça, Vânia, Luísa Coheur, and Alberto Sardinha. "One Arm to Rule Them All: Online Learning with Multi-armed Bandits for Low-Resource Conversational Agents." In Progress in Artificial Intelligence, 625–34. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-86230-5_49.
Zhang, Lili, Ruben Mukherjee, Piyush Wadhai, Willie Muehlhausen, and Tomas Ward. "Computational Phenotyping of Decision-Making over Voice Interfaces." In Communications in Computer and Information Science, 475–87. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_37.
Full textConference papers on the topic "Reinforcement Learning, Multi-armed Bandits"
Liu, Yi-Pei, Kuo Li, Xi Cao, Qing-Shan Jia, and Xu Wang. "Quantum Reinforcement Learning for Multi-Armed Bandits." In 2022 41st Chinese Control Conference (CCC). IEEE, 2022. http://dx.doi.org/10.23919/ccc55666.2022.9902595.
Zhang, Junzhe, and Elias Bareinboim. "Transfer Learning in Multi-Armed Bandits: A Causal Approach." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/186.
Yahyaa, Saba Q., Madalina M. Drugan, and Bernard Manderick. "Annealing-pareto multi-objective multi-armed bandit algorithm." In 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). IEEE, 2014. http://dx.doi.org/10.1109/adprl.2014.7010619.
Jiang, Daniel, Haipeng Luo, Chu Wang, and Yingfei Wang. "Multi-Armed Bandits and Reinforcement Learning: Advancing Decision Making in E-Commerce and Beyond." In KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3447548.3469457.
Full text"EXPLOITING SIMILARITY INFORMATION IN REINFORCEMENT LEARNING - Similarity Models for Multi-Armed Bandits and MDPs." In 2nd International Conference on Agents and Artificial Intelligence. SciTePress - Science and and Technology Publications, 2010. http://dx.doi.org/10.5220/0002703002030210.
Tariq, Zain Ul Abideen, Emna Baccour, Aiman Erbad, Mohsen Guizani, and Mounir Hamdi. "Network Intrusion Detection for Smart Infrastructure using Multi-armed Bandit based Reinforcement Learning in Adversarial Environment." In 2022 International Conference on Cyber Warfare and Security (ICCWS). IEEE, 2022. http://dx.doi.org/10.1109/iccws56285.2022.9998440.
ElSayed, Karim A., Ilias Bilionis, and Jitesh H. Panchal. "Evaluating Heuristics in Engineering Design: A Reinforcement Learning Approach." In ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2021. http://dx.doi.org/10.1115/detc2021-70425.
Kalathil, Dileep, Naumaan Nayyar, and Rahul Jain. "Decentralized learning for multi-player multi-armed bandits." In 2012 IEEE 51st Annual Conference on Decision and Control (CDC). IEEE, 2012. http://dx.doi.org/10.1109/cdc.2012.6426587.
Sankararaman, Abishek, Ayalvadi Ganesh, and Sanjay Shakkottai. "Social Learning in Multi Agent Multi Armed Bandits." In SIGMETRICS '20: ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3393691.3394217.
Radlinski, Filip, Robert Kleinberg, and Thorsten Joachims. "Learning diverse rankings with multi-armed bandits." In Proceedings of the 25th International Conference on Machine Learning. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1390156.1390255.