Ready-made bibliography on the topic "Reinforcement Learning, Multi-armed Bandits"
Create correct references in APA, MLA, Chicago, Harvard and many other styles
See lists of current articles, books, dissertations, abstracts and other scholarly sources on the topic "Reinforcement Learning, Multi-armed Bandits".
The "Add to bibliography" button is available next to each work in the bibliography. Use it and we will automatically create a bibliographic reference to the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication in ".pdf" format and read its abstract online, if the relevant parameters are available in the metadata.
Journal articles on the topic "Reinforcement Learning, Multi-armed Bandits"
Wan, Zongqi, Zhijie Zhang, Tongyang Li, Jialin Zhang, and Xiaoming Sun. "Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (June 26, 2023): 10087–94. http://dx.doi.org/10.1609/aaai.v37i8.26202.
Ciucanu, Radu, Pascal Lafourcade, Gael Marcadet, and Marta Soare. "SAMBA: A Generic Framework for Secure Federated Multi-Armed Bandits". Journal of Artificial Intelligence Research 73 (February 23, 2022): 737–65. http://dx.doi.org/10.1613/jair.1.13163.
Huanca-Anquise, Candy A., Ana Lúcia Cetertich Bazzan, and Anderson R. Tavares. "Multi-Objective, Multi-Armed Bandits: Algorithms for Repeated Games and Application to Route Choice". Revista de Informática Teórica e Aplicada 30, no. 1 (January 30, 2023): 11–23. http://dx.doi.org/10.22456/2175-2745.122929.
Giachino, Chiara, Luigi Bollani, Alessandro Bonadonna, and Marco Bertetti. "Reinforcement learning for content's customization: a first step of experimentation in Skyscanner". Industrial Management & Data Systems 121, no. 6 (January 15, 2021): 1417–34. http://dx.doi.org/10.1108/imds-12-2019-0722.
Noothigattu, Ritesh, Tom Yan, and Ariel D. Procaccia. "Inverse Reinforcement Learning From Like-Minded Teachers". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 9197–204. http://dx.doi.org/10.1609/aaai.v35i10.17110.
Xiong, Guojun, Jian Li, and Rahul Singh. "Reinforcement Learning Augmented Asymptotically Optimal Index Policy for Finite-Horizon Restless Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 8 (June 28, 2022): 8726–34. http://dx.doi.org/10.1609/aaai.v36i8.20852.
Huo, Xiaoguang, and Feng Fu. "Risk-aware multi-armed bandit problem with application to portfolio selection". Royal Society Open Science 4, no. 11 (November 2017): 171377. http://dx.doi.org/10.1098/rsos.171377.
Nobari, Sadegh. "DBA: Dynamic Multi-Armed Bandit Algorithm". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 9869–70. http://dx.doi.org/10.1609/aaai.v33i01.33019869.
Esfandiari, Hossein, MohammadTaghi HajiAghayi, Brendan Lucier, and Michael Mitzenmacher. "Online Pandora’s Boxes and Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 1885–92. http://dx.doi.org/10.1609/aaai.v33i01.33011885.
Lefebvre, Germain, Christopher Summerfield, and Rafal Bogacz. "A Normative Account of Confirmation Bias During Reinforcement Learning". Neural Computation 34, no. 2 (January 14, 2022): 307–37. http://dx.doi.org/10.1162/neco_a_01455.
Doctoral dissertations on the topic "Reinforcement Learning, Multi-armed Bandits"
Magureanu, Stefan. "Structured Stochastic Bandits". Licentiate thesis, KTH, Reglerteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-182816.
Talebi, Mazraeh Shahi Mohammad Sadegh. "Minimizing Regret in Combinatorial Bandits and Reinforcement Learning". Doctoral thesis, KTH, Reglerteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219970.
Hauser, Kristen. "Hyperparameter Tuning for Reinforcement Learning with Bandits and Off-Policy Sampling". Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1613034993418088.
Olkhovskaya, Julia. "Large-scale online learning under partial feedback". Doctoral thesis, Universitat Pompeu Fabra, 2022. http://hdl.handle.net/10803/673926.
Sequential decision making under uncertainty covers a broad class of problems, and real-world applications require algorithms that are computationally efficient and scalable. We study a range of sequential learning problems in which the learner observes only partial information about the rewards, and we develop algorithms that are robust and computationally efficient in large-scale settings. The first problem we consider is online influence maximization, in which a decision maker sequentially selects a node of a graph and places information there in order to spread it through the graph. The available feedback is only some information about a small neighbourhood of the selected vertex. Our results show that such partial local observations can be sufficient for maximizing global influence. We propose sequential learning algorithms that aim at maximizing influence and provide their theoretical analysis in both the subcritical and supercritical regimes of broadly studied graph models; these are the first algorithms for sequential influence maximization that perform efficiently on graphs with a huge number of nodes. In another line of work, we study the contextual bandit problem, where the reward function is allowed to change in an adversarial manner and the learner only observes the rewards associated with its actions. We assume that the number of arms is finite and that the context space can be infinite. We develop a computationally efficient algorithm under the assumption that the d-dimensional contexts are generated i.i.d. at random from a known distribution, and we propose an algorithm that is shown to be robust to misspecification in the setting where the true reward function is linear up to an additive nonlinear error. To our knowledge, our performance guarantees constitute the very first results in this problem setting. We also provide an extension to the case where the context is an element of a reproducing kernel Hilbert space. Finally, we consider an extension of the contextual bandit problem described above: the learner interacts with a Markov decision process in a sequence of episodes, where an adversary chooses the reward function and reward observations are available only for the selected action. We allow the state space to be arbitrarily large, but we assume that all action-value functions can be represented as linear functions of a known low-dimensional feature map, and that the learner at least has access to a simulator of trajectories in the MDP. Our main contributions are the first algorithms that are shown to be robust and efficient in this problem setting.
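As a point of reference for the linear-context setting this abstract describes, below is a minimal LinUCB-style sketch of a stochastic linear contextual bandit; it is a generic textbook baseline, not the thesis's adversarial-context algorithm, and the arm count, feature dimension, exploration parameter and synthetic reward model are illustrative assumptions.

```python
import numpy as np

def linucb(contexts, rewards, n_arms, d, alpha=1.0):
    """Generic LinUCB baseline: one ridge-regression model per arm.

    contexts: array of shape (T, d) of context vectors.
    rewards:  callable (t, arm) -> observed reward (only the chosen arm is revealed).
    """
    A = [np.eye(d) for _ in range(n_arms)]          # per-arm design matrices (ridge prior)
    b = [np.zeros(d) for _ in range(n_arms)]        # per-arm reward-weighted context sums
    total = 0.0
    for t, x in enumerate(contexts):
        scores = []
        for a in range(n_arms):
            A_inv = np.linalg.inv(A[a])
            theta = A_inv @ b[a]                    # ridge estimate of the arm's parameter
            bonus = alpha * np.sqrt(x @ A_inv @ x)  # optimistic exploration bonus
            scores.append(theta @ x + bonus)
        arm = int(np.argmax(scores))
        r = rewards(t, arm)                         # bandit feedback: only this arm's reward
        A[arm] += np.outer(x, x)
        b[arm] += r * x
        total += r
    return total

# Example with a synthetic linear reward model (made-up parameters).
rng = np.random.default_rng(0)
true_theta = rng.normal(size=(3, 5))                # 3 arms, 5-dimensional contexts
ctxs = rng.normal(size=(1000, 5))
total = linucb(ctxs, lambda t, a: float(true_theta[a] @ ctxs[t] + rng.normal(0, 0.1)),
               n_arms=3, d=5)
```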
Racey, Deborah Elaine. "Effects of Response Frequency Constraints on Learning in a Non-Stationary Multi-Armed Bandit Task". OpenSIUC, 2009. https://opensiuc.lib.siu.edu/dissertations/86.
Besson, Lilian. "Multi-Players Bandit Algorithms for Internet of Things Networks". Thesis, CentraleSupélec, 2019. http://www.theses.fr/2019CSUP0005.
In this PhD thesis, we study wireless networks and reconfigurable end-devices that can access Cognitive Radio networks, in unlicensed bands and without central control. We focus on Internet of Things (IoT) networks, with the objective of extending the devices' battery life by equipping them with low-cost but efficient machine learning algorithms that let them automatically improve the efficiency of their wireless communications. We propose different models of IoT networks, and we show empirically, in both numerical simulations and real-world validation, the possible gain of our methods, which use Reinforcement Learning. The different network access problems are modeled as Multi-Armed Bandits (MAB), but we found the realistic models intractable to analyze: proving convergence for many IoT devices playing a collaborative game without communication or coordination is hard when they all follow random activation patterns. The rest of the manuscript therefore studies two restricted models: first multi-player bandits in stationary problems, then non-stationary single-player bandits. We also detail another contribution, SMPyBandits, our open-source Python library for numerical MAB simulations, which covers all the studied models and more.
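For readers unfamiliar with the MAB baseline that these network-access models build on, here is a minimal single-player UCB1 sketch on Bernoulli arms; the arm means and horizon are made up, and this is neither the multi-player algorithms from the thesis nor the SMPyBandits API.

```python
import math
import random

def ucb1(arm_means, horizon=10_000, seed=0):
    """Single-player UCB1 on Bernoulli arms (illustrative baseline only)."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k     # number of pulls per arm
    sums = [0.0] * k     # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                                  # pull each arm once first
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        r = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total, counts

# Example: three channels with different success probabilities.
total_reward, pulls = ucb1([0.2, 0.5, 0.7])
```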
Achab, Mastane. "Ranking and risk-aware reinforcement learning". Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT020.
This thesis is divided into two parts: the first on ranking and the second on risk-aware reinforcement learning. While binary classification is the flagship application of empirical risk minimization (ERM), the main paradigm of machine learning, more challenging problems such as bipartite ranking can also be expressed in that framework. In bipartite ranking, the goal is to order, by means of scoring methods, all the elements of some feature space based on a training dataset composed of feature vectors with their binary labels. This thesis extends that setting to the continuous ranking problem, a variant where the labels take continuous values instead of being simply binary. The analysis of ranking data, initiated in the 18th century in the context of elections, has led to another ranking problem using ERM, namely ranking aggregation and more precisely Kemeny's consensus approach. From a training dataset made of ranking data, such as permutations or pairwise comparisons, the goal is to find the single "median permutation" that best corresponds to a consensus order. We present a less drastic dimensionality reduction approach in which a distribution on rankings is approximated by a simpler distribution that is not necessarily reduced to a Dirac mass, as it is in ranking aggregation. For that purpose, we rely on mathematical tools from the theory of optimal transport, such as Wasserstein metrics. The second part of the thesis focuses on risk-aware versions of the stochastic multi-armed bandit problem and of reinforcement learning (RL), where an agent interacts with a dynamic environment by taking actions and receiving rewards, the objective being to maximize the total payoff. In particular, a novel atomic distributional RL approach is provided: the distribution of the total payoff is approximated by particles that correspond to trimmed means.
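To make the last sentence concrete, here is a small sketch of one plausible reading of "particles that correspond to trimmed means": the empirical return distribution is summarized by N atoms, each the mean of one contiguous slice of the sorted returns. The slicing scheme and atom count are illustrative assumptions, not the thesis's exact construction.

```python
import numpy as np

def atomic_summary(returns, n_atoms=10):
    """Summarize a sample of returns by n_atoms particles.

    Each particle is the mean of one quantile slice of the sorted sample,
    i.e. a trimmed-mean-style statistic restricted to that slice.
    """
    x = np.sort(np.asarray(returns, dtype=float))
    slices = np.array_split(x, n_atoms)          # contiguous quantile slices
    return np.array([s.mean() for s in slices])

# Example: summarize 1000 noisy episode returns by 10 atoms.
rng = np.random.default_rng(0)
atoms = atomic_summary(rng.normal(loc=1.0, scale=0.5, size=1000))
```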
Barkino, Iliam. "Summary Statistic Selection with Reinforcement Learning". Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-390838.
Jedor, Matthieu. "Bandit algorithms for recommender system optimization". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM027.
In this PhD thesis, we study the optimization of recommender systems, with the objective of providing more refined suggestions of items for a user to benefit from. The task is modeled using the multi-armed bandit framework. In a first part, we examine two problems that commonly occur in recommender systems: the large number of items to handle and the management of sponsored content. In a second part, we investigate the empirical performance of bandit algorithms, and especially how to tune conventional algorithms to improve results in the stationary and non-stationary environments that arise in practice. This leads us to analyze, both theoretically and empirically, the greedy algorithm, which in some cases outperforms the state of the art.
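The greedy-versus-exploration trade-off mentioned here can be illustrated by comparing a purely greedy policy with epsilon-greedy on a toy Bernoulli bandit; the arm means, horizon and epsilon below are made-up values, and this is not the tuned variant analyzed in the thesis.

```python
import random

def run_policy(arm_means, epsilon, horizon=5_000, seed=1):
    """Epsilon-greedy on Bernoulli arms; epsilon=0.0 is the purely greedy policy."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts, sums = [0] * k, [0.0] * k
    total = 0.0
    for _ in range(horizon):
        untried = [a for a in range(k) if counts[a] == 0]
        if untried:
            arm = untried[0]                      # try each arm once before being greedy
        elif rng.random() < epsilon:
            arm = rng.randrange(k)                # explore uniformly at random
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a])  # exploit best estimate
        r = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

means = [0.3, 0.5, 0.55]
greedy_total = run_policy(means, epsilon=0.0)
eps_total = run_policy(means, epsilon=0.1)
```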
Couetoux, Adrien. "Monte Carlo Tree Search for Continuous and Stochastic Sequential Decision Making Problems". Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112192.
In this thesis, we study sequential decision making problems, with a focus on the unit commitment problem. Traditionally solved by dynamic programming methods, this problem is still a challenge, due to its high dimension and to the sacrifices made on the accuracy of the model when applying state-of-the-art methods. We investigate the applicability of Monte Carlo Tree Search (MCTS) methods to this problem and to other single-player, stochastic, continuous sequential decision making problems. We start by extending traditional finite-state MCTS to continuous domains with a method called Double Progressive Widening (DPW). This method relies on two hyperparameters and determines the ratio between width and depth in the nodes of the tree. We develop a heuristic called Blind Value (BV) to improve the exploration of new actions, using the information from past simulations, and we also extend the RAVE heuristic to continuous domains. We then propose two new ways of backing up information through the tree that considerably improve the convergence speed on two test cases. An important part of our work is a way to mix MCTS with existing powerful heuristics, with the application to energy management in mind: we propose a framework that learns a good default policy by Direct Policy Search (DPS) and includes it in MCTS, with very positive experimental results. To extend the reach of MCTS, we show how it can be used to solve Partially Observable Markov Decision Processes, with an application to the game of Minesweeper, for which no consistent method had been proposed before. Finally, we use MCTS in a meta-bandit framework to solve energy investment problems: the investment decision is handled by classical bandit algorithms, while the evaluation of each investment is done by MCTS. The most important takeaway is that continuous MCTS makes almost no assumptions (besides the need for a generative model), is consistent, and can easily improve existing suboptimal solvers by using a method similar to what we proposed with DPS.
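The progressive widening idea named in this abstract is usually written as a simple visit-count test; the sketch below shows that test in isolation, with the constant and exponent chosen arbitrarily for illustration, and does not reproduce the full DPW algorithm from the thesis.

```python
import math

def allow_new_child(num_children, num_visits, c=1.0, alpha=0.5):
    """Progressive widening test at a tree node.

    A new child (action, or sampled next state in the "double" variant) may be
    added only while the budget ceil(c * n^alpha) exceeds the current number of
    children, so the branching factor grows sublinearly with the visit count n.
    """
    return num_children < math.ceil(c * num_visits ** alpha)

# Example: a node visited 100 times with c=1, alpha=0.5 may hold up to 10 children.
print(allow_new_child(num_children=9, num_visits=100))   # True
print(allow_new_child(num_children=10, num_visits=100))  # False
```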
Books on the topic "Reinforcement Learning, Multi-armed Bandits"
Zhao, Qing, and R. Srikant. Multi-Armed Bandits: Theory and Applications to Online Learning in Networks. Morgan & Claypool Publishers, 2019.
Zhao, Qing. Multi-Armed Bandits: Theory and Applications to Online Learning in Networks. Springer International Publishing AG, 2019.
Book chapters on the topic "Reinforcement Learning, Multi-armed Bandits"
Rao, Ashwin, and Tikhon Jelvis. "Multi-Armed Bandits: Exploration versus Exploitation". In Foundations of Reinforcement Learning with Applications in Finance, 411–38. Boca Raton: Chapman and Hall/CRC, 2022. http://dx.doi.org/10.1201/9781003229193-15.
Kubat, Miroslav. "Reinforcement Learning: N-Armed Bandits and Episodes". In An Introduction to Machine Learning, 353–76. Cham: Springer International Publishing, 2012. http://dx.doi.org/10.1007/978-3-030-81935-4_17.
Roijers, Diederik M., Luisa M. Zintgraf, Pieter Libin, Mathieu Reymond, Eugenio Bargiacchi, and Ann Nowé. "Interactive Multi-objective Reinforcement Learning in Multi-armed Bandits with Gaussian Process Utility Models". In Machine Learning and Knowledge Discovery in Databases, 463–78. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-67664-3_28.
Combrink, Herkulaas MvE, Vukosi Marivate, and Benjamin Rosman. "Reinforcement Learning in Education: A Multi-armed Bandit Approach". In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 3–16. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-35883-8_1.
Antos, András, Varun Grover, and Csaba Szepesvári. "Active Learning in Multi-armed Bandits". In Lecture Notes in Computer Science, 287–302. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-87987-9_25.
Qureshi, Ubaid, Mehreen Mushtaq, Juveeryah Qureshi, Mir Aiman, Mansha Ali, and Shahnawaz Ali. "Dynamic Pricing for Electric Vehicle Charging at a Commercial Charging Station in Presence of Uncertainty: A Multi-armed Bandit Reinforcement Learning Approach". In Proceedings of International Conference on Data Science and Applications, 625–35. Singapore: Springer Nature Singapore, 2023. http://dx.doi.org/10.1007/978-981-19-6634-7_44.
Baransi, Akram, Odalric-Ambrym Maillard, and Shie Mannor. "Sub-sampling for Multi-armed Bandits". In Machine Learning and Knowledge Discovery in Databases, 115–31. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-662-44848-9_8.
Carpentier, Alexandra, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, and Peter Auer. "Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits". In Lecture Notes in Computer Science, 189–203. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-24412-4_17.
Mendonça, Vânia, Luísa Coheur, and Alberto Sardinha. "One Arm to Rule Them All: Online Learning with Multi-armed Bandits for Low-Resource Conversational Agents". In Progress in Artificial Intelligence, 625–34. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-86230-5_49.
Zhang, Lili, Ruben Mukherjee, Piyush Wadhai, Willie Muehlhausen, and Tomas Ward. "Computational Phenotyping of Decision-Making over Voice Interfaces". In Communications in Computer and Information Science, 475–87. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-26438-2_37.
Conference abstracts on the topic "Reinforcement Learning, Multi-armed Bandits"
Liu, Yi-Pei, Kuo Li, Xi Cao, Qing-Shan Jia, and Xu Wang. "Quantum Reinforcement Learning for Multi-Armed Bandits". In 2022 41st Chinese Control Conference (CCC). IEEE, 2022. http://dx.doi.org/10.23919/ccc55666.2022.9902595.
Zhang, Junzhe, and Elias Bareinboim. "Transfer Learning in Multi-Armed Bandits: A Causal Approach". In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/186.
Yahyaa, Saba Q., Madalina M. Drugan, and Bernard Manderick. "Annealing-pareto multi-objective multi-armed bandit algorithm". In 2014 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL). IEEE, 2014. http://dx.doi.org/10.1109/adprl.2014.7010619.
Jiang, Daniel, Haipeng Luo, Chu Wang, and Yingfei Wang. "Multi-Armed Bandits and Reinforcement Learning: Advancing Decision Making in E-Commerce and Beyond". In KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3447548.3469457.
"Exploiting Similarity Information in Reinforcement Learning - Similarity Models for Multi-Armed Bandits and MDPs". In 2nd International Conference on Agents and Artificial Intelligence. SciTePress - Science and Technology Publications, 2010. http://dx.doi.org/10.5220/0002703002030210.
Tariq, Zain Ul Abideen, Emna Baccour, Aiman Erbad, Mohsen Guizani, and Mounir Hamdi. "Network Intrusion Detection for Smart Infrastructure using Multi-armed Bandit based Reinforcement Learning in Adversarial Environment". In 2022 International Conference on Cyber Warfare and Security (ICCWS). IEEE, 2022. http://dx.doi.org/10.1109/iccws56285.2022.9998440.
ElSayed, Karim A., Ilias Bilionis, and Jitesh H. Panchal. "Evaluating Heuristics in Engineering Design: A Reinforcement Learning Approach". In ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2021. http://dx.doi.org/10.1115/detc2021-70425.
Kalathil, Dileep, Naumaan Nayyar, and Rahul Jain. "Decentralized learning for multi-player multi-armed bandits". In 2012 IEEE 51st Annual Conference on Decision and Control (CDC). IEEE, 2012. http://dx.doi.org/10.1109/cdc.2012.6426587.
Sankararaman, Abishek, Ayalvadi Ganesh, and Sanjay Shakkottai. "Social Learning in Multi Agent Multi Armed Bandits". In SIGMETRICS '20: ACM SIGMETRICS / International Conference on Measurement and Modeling of Computer Systems. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3393691.3394217.
Radlinski, Filip, Robert Kleinberg, and Thorsten Joachims. "Learning diverse rankings with multi-armed bandits". In the 25th international conference. New York, New York, USA: ACM Press, 2008. http://dx.doi.org/10.1145/1390156.1390255.