Academic literature on the topic "Reinforcement Learning, Multi-armed Bandits"
Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles
Consult the topical lists of articles, books, theses, conference proceedings, and other scholarly sources on "Reinforcement Learning, Multi-armed Bandits".
Next to every source in the reference list is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.
You can also download the full text of the scholarly publication as a PDF and read its abstract online whenever it is available in the metadata.
Journal articles on the topic "Reinforcement Learning, Multi-armed Bandits"
Wan, Zongqi, Zhijie Zhang, Tongyang Li, Jialin Zhang, and Xiaoming Sun. "Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 10087–94. http://dx.doi.org/10.1609/aaai.v37i8.26202.
Ciucanu, Radu, Pascal Lafourcade, Gael Marcadet, and Marta Soare. "SAMBA: A Generic Framework for Secure Federated Multi-Armed Bandits". Journal of Artificial Intelligence Research 73 (February 23, 2022): 737–65. http://dx.doi.org/10.1613/jair.1.13163.
Huanca-Anquise, Candy A., Ana Lúcia Cetertich Bazzan, and Anderson R. Tavares. "Multi-Objective, Multi-Armed Bandits: Algorithms for Repeated Games and Application to Route Choice". Revista de Informática Teórica e Aplicada 30, no. 1 (January 30, 2023): 11–23. http://dx.doi.org/10.22456/2175-2745.122929.
Giachino, Chiara, Luigi Bollani, Alessandro Bonadonna, and Marco Bertetti. "Reinforcement learning for content's customization: a first step of experimentation in Skyscanner". Industrial Management & Data Systems 121, no. 6 (January 15, 2021): 1417–34. http://dx.doi.org/10.1108/imds-12-2019-0722.
Noothigattu, Ritesh, Tom Yan, and Ariel D. Procaccia. "Inverse Reinforcement Learning From Like-Minded Teachers". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 9197–204. http://dx.doi.org/10.1609/aaai.v35i10.17110.
Xiong, Guojun, Jian Li, and Rahul Singh. "Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8726–34. http://dx.doi.org/10.1609/aaai.v36i8.20852.
Huo, Xiaoguang, and Feng Fu. "Risk-aware multi-armed bandit problem with application to portfolio selection". Royal Society Open Science 4, no. 11 (November 2017): 171377. http://dx.doi.org/10.1098/rsos.171377.
Nobari, Sadegh. "DBA: Dynamic Multi-Armed Bandit Algorithm". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9869–70. http://dx.doi.org/10.1609/aaai.v33i01.33019869.
Esfandiari, Hossein, MohammadTaghi HajiAghayi, Brendan Lucier, and Michael Mitzenmacher. "Online Pandora’s Boxes and Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 1885–92. http://dx.doi.org/10.1609/aaai.v33i01.33011885.
Lefebvre, Germain, Christopher Summerfield, and Rafal Bogacz. "A Normative Account of Confirmation Bias During Reinforcement Learning". Neural Computation 34, no. 2 (January 14, 2022): 307–37. http://dx.doi.org/10.1162/neco_a_01455.
Texto completoTesis sobre el tema "Reinforcement Learning, Multi-armed Bandits"
Magureanu, Stefan. "Structured Stochastic Bandits". Licentiate thesis, KTH, Reglerteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-182816.
Talebi, Mazraeh Shahi Mohammad Sadegh. "Minimizing Regret in Combinatorial Bandits and Reinforcement Learning". Doctoral thesis, KTH, Reglerteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219970.
Hauser, Kristen. "Hyperparameter Tuning for Reinforcement Learning with Bandits and Off-Policy Sampling". Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1613034993418088.
Texto completoOlkhovskaya, Julia. "Large-scale online learning under partial feedback". Doctoral thesis, Universitat Pompeu Fabra, 2022. http://hdl.handle.net/10803/673926.
Texto completoSequential decision making under uncertainty covers a broad class of problems. Real-world applications require the algorithms to be computationally efficient and scalable. We study a range of sequential learning problems, where the learner observe only partial information about the rewards we develop the algorithms that are robust and computationally efficient in large-scale settings. First problem that we consider is an online influence maximization problem in which a decision maker sequentiaonally selects a node in the graph in order to spread the information throughout the graph by placing the information in the chosen node. The available feedback is only some information about a small neighbourhood of the selected vertex. Our results show that such partial local observations can be sufficient for maximizing global influence. We propose sequential learning algorithms that aim at maximizing influence, and provide their theoretical analysis in both the subcritical and supercritical regimes of broadly studied graph models. Thus this is the first algorithms in the sequential influence maximization setting, that perform efficiently in the graph with a huge number of nodes. In another line of work, we study the contextual bandit problem, where the reward function is allowed to change in an adversarial manner and the learner only gets to observe the rewards associated with its actions. We assume that the number of arms is finite and the context space can be infinite. We develop a computationally efficient algorithm under the assumption that the d-dimensional contexts are generated i.i.d. at random from a known distribution. We also propose an algorithm that is shown to be robust to misspecification in the setting where the true reward function is linear up to an additive nonlinear error. To our knowledge, our performance guarantees constitute the very first results on this problem setting. 
We also provide an extension when the context is an element of a reproducing kernel Hilbert space. Finally, we consider an extension of the contextual bandit problem described above. We study a setting where the learner interacts with a Markov decision process in a sequence of episodes, where an adversary chooses the reward function and the reward observations are available only for the selected action. We allow the state space to be arbitrarily large, but we assume that all action-value functions can be represented as linear functions in terms of a known low-dimensional feature map, and that the learner at least has access to the simulator of the trajectories in the MDP. Our main contributions are the first algorithms that are shown to be robust and efficient in this problem setting.
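The linear contextual bandit setting this abstract describes (i.i.d. contexts, rewards linear in a d-dimensional feature vector) can be illustrated with a minimal LinUCB-style loop on synthetic data. This is a generic sketch under assumed parameters (dimension, number of arms, the exploration weight alpha, and the noise scale are all illustration choices), not the algorithm developed in the thesis:

```python
import numpy as np

def contextual_run(horizon=500, d=5, n_arms=4, alpha=1.0,
                   random_policy=False, seed=0):
    """Illustrative LinUCB sketch on synthetic linear rewards: each arm keeps
    a ridge-regression estimate of a hidden parameter vector, and the policy
    picks the arm with the best estimate-plus-confidence-bonus score."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(size=d) / np.sqrt(d)   # hidden reward parameter
    A = [np.eye(d) for _ in range(n_arms)]    # per-arm regularized Gram matrices
    b = [np.zeros(d) for _ in range(n_arms)]
    regret = 0.0
    for _ in range(horizon):
        X = rng.normal(size=(n_arms, d))      # one context vector per arm
        if random_policy:                     # uniform baseline for comparison
            arm = int(rng.integers(n_arms))
        else:
            scores = []
            for a in range(n_arms):
                A_inv = np.linalg.inv(A[a])
                theta_hat = A_inv @ b[a]
                # empirical reward estimate plus a confidence bonus
                scores.append(X[a] @ theta_hat
                              + alpha * np.sqrt(X[a] @ A_inv @ X[a]))
            arm = int(np.argmax(scores))
        reward = X[arm] @ theta + rng.normal(scale=0.1)
        regret += np.max(X @ theta) - X[arm] @ theta
        A[arm] += np.outer(X[arm], X[arm])    # rank-one update of the arm's model
        b[arm] += reward * X[arm]
    return regret
```

On this synthetic problem the LinUCB-style policy accumulates noticeably less regret than the uniform-random baseline, which is the behaviour the regret guarantees in this line of work formalize.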
Racey, Deborah Elaine. "EFFECTS OF RESPONSE FREQUENCY CONSTRAINTS ON LEARNING IN A NON-STATIONARY MULTI-ARMED BANDIT TASK". OpenSIUC, 2009. https://opensiuc.lib.siu.edu/dissertations/86.
Texto completoBesson, Lilian. "Multi-Players Bandit Algorithms for Internet of Things Networks". Thesis, CentraleSupélec, 2019. http://www.theses.fr/2019CSUP0005.
In this PhD thesis, we study wireless networks and reconfigurable end-devices that can access Cognitive Radio networks, in unlicensed bands and without central control. We focus on Internet of Things (IoT) networks, with the objective of extending devices' battery life by equipping them with low-cost but efficient machine learning algorithms, in order to let them automatically improve the efficiency of their wireless communications. We propose different models of IoT networks, and we show empirically, on both numerical simulations and real-world validation, the possible gain of our methods, which use Reinforcement Learning. The different network access problems are modeled as Multi-Armed Bandits (MAB), but we found that analyzing the realistic models was intractable, because proving the convergence of many IoT devices playing a collaborative game, without communication or coordination, is hard when they all follow random activation patterns. The rest of this manuscript thus studies two restricted models: first multi-player bandits in stationary problems, then non-stationary single-player bandits. We also detail another contribution, SMPyBandits, our open-source Python library for numerical MAB simulations, which covers all the studied models and more.
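The stationary multi-armed bandit model underlying this work is classically attacked with index policies such as UCB1. As a generic illustration on simulated Bernoulli arms (the arm means and horizon below are arbitrary illustration choices, not the thesis's multi-player algorithms), a minimal UCB1 loop might look like:

```python
import math
import random

def ucb1(arm_means, horizon, rng=random.Random(0)):
    """Illustrative UCB1 run on simulated Bernoulli arms."""
    n_arms = len(arm_means)
    counts = [0] * n_arms     # pulls per arm
    sums = [0.0] * n_arms     # cumulative reward per arm
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1       # play each arm once to initialize
        else:
            # pick the arm maximizing empirical mean + exploration bonus
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total_reward += reward
    return total_reward, counts

reward, counts = ucb1([0.2, 0.5, 0.8], horizon=2000)
```

After 2000 rounds the pull counts concentrate on the best arm, the logarithmic-regret behaviour that index policies guarantee in the stationary setting.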
Achab, Mastane. "Ranking and risk-aware reinforcement learning". Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT020.
This thesis is divided into two parts: the first on ranking, the second on risk-aware reinforcement learning. While binary classification is the flagship application of empirical risk minimization (ERM), the main paradigm of machine learning, more challenging problems such as bipartite ranking can also be expressed through that setup. In bipartite ranking, the goal is to order, by means of scoring methods, all the elements of some feature space based on a training dataset composed of feature vectors with their binary labels. This thesis extends this setting to the continuous ranking problem, a variant where the labels take continuous values instead of being simply binary. The analysis of ranking data, initiated in the 18th century in the context of elections, has led to another ranking problem using ERM, namely ranking aggregation, and more precisely the Kemeny consensus approach. From a training dataset made of ranking data, such as permutations or pairwise comparisons, the goal is to find the single "median permutation" that best corresponds to a consensus order. We present a less drastic dimensionality-reduction approach in which a distribution on rankings is approximated by a simpler distribution, not necessarily reduced to a Dirac mass as in ranking aggregation. For that purpose, we rely on mathematical tools from the theory of optimal transport, such as Wasserstein metrics. The second part of this thesis focuses on risk-aware versions of the stochastic multi-armed bandit problem and of reinforcement learning (RL), where an agent interacts with a dynamic environment by taking actions and receiving rewards, the objective being to maximize the total payoff. In particular, a novel atomic distributional RL approach is provided: the distribution of the total payoff is approximated by particles that correspond to trimmed means.
Barkino, Iliam. "Summary Statistic Selection with Reinforcement Learning". Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-390838.
Texto completoJedor, Matthieu. "Bandit algorithms for recommender system optimization". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM027.
In this PhD thesis, we study the optimization of recommender systems with the objective of providing more refined suggestions of items that benefit the user. The task is modeled using the multi-armed bandit framework. In a first part, we address two problems that commonly occur in recommendation systems: the large number of items to handle and the management of sponsored content. In a second part, we investigate the empirical performance of bandit algorithms, and especially how to tune conventional algorithms to improve results in the stationary and non-stationary environments that arise in practice. This leads us to analyze, both theoretically and empirically, the greedy algorithm, which in some cases outperforms the state of the art.
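The exploration-exploitation trade-off at the heart of this analysis can be illustrated with a minimal epsilon-greedy bandit on simulated Bernoulli arms; the arm means, epsilon, and seed below are arbitrary illustration choices, not the tuned variants studied in the thesis:

```python
import random

def epsilon_greedy(arm_means, horizon, epsilon=0.1, rng=random.Random(42)):
    """Illustrative epsilon-greedy bandit on simulated Bernoulli arms."""
    n = len(arm_means)
    counts = [0] * n
    values = [0.0] * n   # running average reward per arm
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n)            # explore: random arm
        else:
            arm = values.index(max(values))   # exploit: best estimate so far
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        # incremental mean update avoids storing all rewards
        values[arm] += (reward - values[arm]) / counts[arm]
    return values, counts

values, counts = epsilon_greedy([0.3, 0.7], horizon=5000)
```

With enough rounds the estimate for the better arm converges near its true mean and receives the bulk of the pulls; the purely greedy case (epsilon = 0) is the variant whose surprising effectiveness the thesis analyzes.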
Couetoux, Adrien. "Monte Carlo Tree Search for Continuous and Stochastic Sequential Decision Making Problems". Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112192.
In this thesis, we study sequential decision-making problems, with a focus on the unit commitment problem. Traditionally solved by dynamic programming methods, this problem is still a challenge, due to its high dimension and to the sacrifices made on the accuracy of the model to apply state-of-the-art methods. We investigate the applicability of Monte Carlo Tree Search (MCTS) methods for this problem, and for other single-player, stochastic, continuous sequential decision-making problems. We started by extending traditional finite-state MCTS to continuous domains with a method called Double Progressive Widening (DPW). This method relies on two hyperparameters and determines the ratio between width and depth in the nodes of the tree. We developed a heuristic called Blind Value (BV) to improve the exploration of new actions, using information from past simulations. We also extended the RAVE heuristic to continuous domains. Finally, we proposed two new ways of backing up information through the tree, which improved the convergence speed considerably on two test cases. An important part of our work was to propose a way to mix MCTS with existing powerful heuristics, with the application to energy management in mind. We did so by proposing a framework that allows a good default policy to be learned by Direct Policy Search (DPS) and included in MCTS.
The experimental results are very positive. To extend the reach of MCTS, we showed how it can be used to solve Partially Observable Markov Decision Processes, with an application to the game of Minesweeper, for which no consistent method had been proposed before. Finally, we used MCTS in a meta-bandit framework to solve energy investment problems: the investment decision was handled by classical bandit algorithms, while the evaluation of each investment was done by MCTS. The most important takeaway is that continuous MCTS has almost no assumptions (besides the need for a generative model), is consistent, and can easily improve existing suboptimal solvers by using a method similar to what we proposed with DPS.
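Progressive widening, on which the Double Progressive Widening method above builds, gates the growth of a search node's child set by its visit count. A minimal sketch of the widening test, with placeholder hyperparameters c and alpha (not the thesis's tuned values):

```python
def allow_new_child(num_children, num_visits, c=1.0, alpha=0.5):
    """Progressive-widening test: a node visited num_visits times may add a
    new child action only while its child count is below c * visits**alpha.
    (c and alpha here are placeholder values for illustration.)"""
    return num_children < c * num_visits ** alpha

# With c=1, alpha=0.5: a node visited 9 times may hold up to 3 children,
# so the tree grows wider only as nodes accumulate visits.
```

Applying the same test to sampled next states as well as to actions is what makes the widening "double" and lets MCTS handle continuous action and state spaces.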
Books on the topic "Reinforcement Learning, Multi-armed Bandits"
Zhao, Qing, and R. Srikant. Multi-Armed Bandits: Theory and Applications to Online Learning in Networks. Morgan & Claypool Publishers, 2019.
Zhao, Qing. Multi-Armed Bandits: Theory and Applications to Online Learning in Networks. Springer International Publishing AG, 2019.
Book chapters on the topic "Reinforcement Learning, Multi-armed Bandits"
Rao, Ashwin, and Tikhon Jelvis. "Multi-Armed Bandits: Exploration versus Exploitation". In Foundations of Reinforcement Learning with Applications in Finance, 411–38. Boca Raton: Chapman and Hall/CRC, 2022. http://dx.doi.org/10.1201/9781003229193-15.
Kubat, Miroslav. "Reinforcement Learning: N-Armed Bandits and Episodes". In An Introduction to Machine Learning, 353–76. Cham: Springer International Publishing, 2012. http://dx.doi.org/10.1007/978-3-030-81935-4_17.
Roijers, Diederik M., Luisa M. Zintgraf, Pieter Libin, Mathieu Reymond, Eugenio Bargiacchi, and Ann Nowé. "Interactive Multi-objective Reinforcement Learning in Multi-armed Bandits with Gaussian Process Utility Models". In Machine Learning and Knowledge Discovery in Databases, 463–78. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-67664-3_28.
Combrink, Herkulaas MvE, Vukosi Marivate, and Benjamin Rosman. "Reinforcement Learning in Education: A Multi-armed Bandit Approach". In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 3–16. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-35883-8_1.
Antos, András, Varun Grover, and Csaba Szepesvári. "Active Learning in Multi-armed Bandits". In Lecture Notes in Computer Science, 287–302. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-87987-9_25.
Qureshi, Ubaid, Mehreen Mushtaq, Juveeryah Qureshi, Mir Aiman, Mansha Ali, and Shahnawaz Ali. "Dynamic Pricing for Electric Vehicle Charging at a Commercial Charging Station in Presence of Uncertainty: A Multi-armed Bandit Reinforcement Learning Approach". In Proceedings of International Conference on Data Science and Applications, 625–35. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-19-6634-7_44.
Baransi, Akram, Odalric-Ambrym Maillard, and Shie Mannor. "Sub-sampling for Multi-armed Bandits". In Machine Learning and Knowledge Discovery in Databases, 115–31. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-662-44848-9_8.
Carpentier, Alexandra, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, and Peter Auer. "Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits". In Lecture Notes in Computer Science, 189–203. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-24412-4_17.
Mendonça, Vânia, Luísa Coheur, and Alberto Sardinha. "One Arm to Rule Them All: Online Learning with Multi-armed Bandits for Low-Resource Conversational Agents". In Progress in Artificial Intelligence, 625–34. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-86230-5_49.
Zhang, Lili, Ruben Mukherjee, Piyush Wadhai, Willie Muehlhausen, and Tomas Ward. "Computational Phenotyping of Decision-Making over Voice Interfaces". In Communications in Computer and Information Science, 475–87. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_37.
Texto completoActas de conferencias sobre el tema "Reinforcement Learning, Multi-armed Bandits"
Liu, Yi-Pei, Kuo Li, Xi Cao, Qing-Shan Jia, and Xu Wang. "Quantum Reinforcement Learning for Multi-Armed Bandits". In 2022 41st Chinese Control Conference (CCC). IEEE, 2022. http://dx.doi.org/10.23919/ccc55666.2022.9902595.
Zhang, Junzhe, and Elias Bareinboim. "Transfer Learning in Multi-Armed Bandits: A Causal Approach". In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/186.
Yahyaa, Saba Q., Madalina M. Drugan, and Bernard Manderick. "Annealing-Pareto multi-objective multi-armed bandit algorithm". In 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). IEEE, 2014. http://dx.doi.org/10.1109/adprl.2014.7010619.
Jiang, Daniel, Haipeng Luo, Chu Wang, and Yingfei Wang. "Multi-Armed Bandits and Reinforcement Learning: Advancing Decision Making in E-Commerce and Beyond". In KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3447548.3469457.
"EXPLOITING SIMILARITY INFORMATION IN REINFORCEMENT LEARNING - Similarity Models for Multi-Armed Bandits and MDPs". In 2nd International Conference on Agents and Artificial Intelligence. SciTePress - Science and Technology Publications, 2010. http://dx.doi.org/10.5220/0002703002030210.
Tariq, Zain Ul Abideen, Emna Baccour, Aiman Erbad, Mohsen Guizani, and Mounir Hamdi. "Network Intrusion Detection for Smart Infrastructure using Multi-armed Bandit based Reinforcement Learning in Adversarial Environment". In 2022 International Conference on Cyber Warfare and Security (ICCWS). IEEE, 2022. http://dx.doi.org/10.1109/iccws56285.2022.9998440.
ElSayed, Karim A., Ilias Bilionis, and Jitesh H. Panchal. "Evaluating Heuristics in Engineering Design: A Reinforcement Learning Approach". In ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2021. http://dx.doi.org/10.1115/detc2021-70425.
Kalathil, Dileep, Naumaan Nayyar, and Rahul Jain. "Decentralized learning for multi-player multi-armed bandits". In 2012 IEEE 51st Annual Conference on Decision and Control (CDC). IEEE, 2012. http://dx.doi.org/10.1109/cdc.2012.6426587.
Sankararaman, Abishek, Ayalvadi Ganesh, and Sanjay Shakkottai. "Social Learning in Multi Agent Multi Armed Bandits". In SIGMETRICS '20: ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3393691.3394217.
Radlinski, Filip, Robert Kleinberg, and Thorsten Joachims. "Learning diverse rankings with multi-armed bandits". In the 25th international conference. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1390156.1390255.