Academic literature on the topic 'Multiarmed Bandits'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Multiarmed Bandits.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Multiarmed Bandits"

1

Righter, Rhonda, and J. George Shanthikumar. "Independently Expiring Multiarmed Bandits." Probability in the Engineering and Informational Sciences 12, no. 4 (October 1998): 453–68. http://dx.doi.org/10.1017/s0269964800005325.

Abstract:
We give conditions on the optimality of an index policy for multiarmed bandits when arms expire independently. We also give a new simple proof of the optimality of the Gittins index policy for the classic multiarmed bandit problem.
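For orientation, the Gittins index mentioned in this abstract has a standard textbook definition (the notation below is generic, not taken from the paper): for an arm in state x, with per-step reward r(x_t) and discount factor 0 < beta < 1,

    \[
      G(x) \;=\; \sup_{\tau > 0}
      \frac{\mathbb{E}\left[\sum_{t=0}^{\tau-1} \beta^{t}\, r(x_t) \,\middle|\, x_0 = x\right]}
           {\mathbb{E}\left[\sum_{t=0}^{\tau-1} \beta^{t} \,\middle|\, x_0 = x\right]},
    \]

i.e., the largest discounted reward per unit of discounted time achievable by running the arm from x up to some stopping time tau. The Gittins index theorem states that, in the classic discounted multiarmed bandit, always playing an arm whose current state has the largest index is optimal; the Righter-Shanthikumar paper above and the Weber paper below (item 7) concern proofs and extensions of this result.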
2

Gao, Xiujuan, Hao Liang, and Tong Wang. "A Common Value Experimentation with Multiarmed Bandits." Mathematical Problems in Engineering 2018 (July 30, 2018): 1–8. http://dx.doi.org/10.1155/2018/4791590.

Abstract:
We study common value experimentation with multiarmed bandits and give an application of this experimentation. The second derivative of the value functions at the cutoffs is investigated when an agent switches actions with multiarmed bandits. If consumers have identical but unknown preferences and purchase products from only two of multiple sellers, we obtain necessary and sufficient conditions for the common experimentation. The Markov perfect equilibrium and the socially efficient allocation in K-armed markets are discussed.
3

Kalathil, Dileep, Naumaan Nayyar, and Rahul Jain. "Decentralized Learning for Multiplayer Multiarmed Bandits." IEEE Transactions on Information Theory 60, no. 4 (April 2014): 2331–45. http://dx.doi.org/10.1109/tit.2014.2302471.

4

Cesa-Bianchi, Nicolò. "Multiarmed Bandits in the Worst Case." IFAC Proceedings Volumes 35, no. 1 (2002): 91–96. http://dx.doi.org/10.3182/20020721-6-es-1901.01001.

5

Bray, Robert L., Decio Coviello, Andrea Ichino, and Nicola Persico. "Multitasking, Multiarmed Bandits, and the Italian Judiciary." Manufacturing & Service Operations Management 18, no. 4 (October 2016): 545–58. http://dx.doi.org/10.1287/msom.2016.0586.

6

Denardo, Eric V., Haechurl Park, and Uriel G. Rothblum. "Risk-Sensitive and Risk-Neutral Multiarmed Bandits." Mathematics of Operations Research 32, no. 2 (May 2007): 374–94. http://dx.doi.org/10.1287/moor.1060.0240.

7

Weber, Richard. "On the Gittins Index for Multiarmed Bandits." Annals of Applied Probability 2, no. 4 (November 1992): 1024–33. http://dx.doi.org/10.1214/aoap/1177005588.

8

Drugan, Madalina M. "Covariance Matrix Adaptation for Multiobjective Multiarmed Bandits." IEEE Transactions on Neural Networks and Learning Systems 30, no. 8 (August 2019): 2493–502. http://dx.doi.org/10.1109/tnnls.2018.2885123.

9

Burnetas, Apostolos N., and Michael N. Katehakis. "Asymptotic Bayes Analysis for the Finite-Horizon One-Armed-Bandit Problem." Probability in the Engineering and Informational Sciences 17, no. 1 (January 2003): 53–82. http://dx.doi.org/10.1017/s0269964803171045.

Abstract:
The multiarmed-bandit problem is often taken as a basic model for the trade-off between the exploration and utilization required for efficient optimization under uncertainty. In this article, we study the situation in which the unknown performance of a new bandit is to be evaluated and compared with that of a known one over a finite horizon. We assume that the bandits represent random variables with distributions from the one-parameter exponential family. When the objective is to maximize the Bayes expected sum of outcomes over a finite horizon, it is shown that optimal policies tend to simple limits when the length of the horizon is large.
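To give a concrete feel for the finite-horizon one-armed-bandit setting described in this abstract, here is a minimal simulation sketch in Python, with illustrative names and parameters of our own choosing: a known arm with mean reward p0 competes against an unknown Bernoulli arm carrying a Beta prior, and the learner follows a myopic Bayes-greedy rule. It is a toy for intuition only, not the asymptotically optimal Bayes policy analyzed in the paper.

    import random

    def bayes_greedy_one_armed(p0=0.5, p1_true=0.6, horizon=200,
                               alpha=1.0, beta=1.0, seed=0):
        """Toy finite-horizon one-armed bandit: a known arm (mean p0) versus an
        unknown Bernoulli arm with a Beta(alpha, beta) prior. Each period we pull
        the unknown arm iff its posterior mean exceeds p0 (myopic Bayes-greedy)."""
        rng = random.Random(seed)
        total = 0.0
        for _ in range(horizon):
            if alpha / (alpha + beta) >= p0:
                # Pull the unknown arm and update the Beta posterior.
                reward = 1.0 if rng.random() < p1_true else 0.0
                alpha += reward
                beta += 1.0 - reward
            else:
                # Retire to the known arm, whose expected reward is p0.
                reward = p0
            total += reward
        return total

    print(bayes_greedy_one_armed())

Note that this myopic rule can abandon the unknown arm permanently after a few bad draws; resolving that tension over a finite horizon is exactly what the optimal policies studied in the paper address.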
10

Nayyar, Naumaan, Dileep Kalathil, and Rahul Jain. "On Regret-Optimal Learning in Decentralized Multiplayer Multiarmed Bandits." IEEE Transactions on Control of Network Systems 5, no. 1 (March 2018): 597–606. http://dx.doi.org/10.1109/tcns.2016.2635380.


Dissertations / Theses on the topic "Multiarmed Bandits"

1

Lin, Haixia. "Multiple machine maintenance: applying a separable value function approximation to a variation of the multiarmed bandit." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/87269.

2

Saha, Aadirupa. "Battle of Bandits: Online Learning from Subsetwise Preferences and Other Structured Feedback." Thesis, 2020. https://etd.iisc.ac.in/handle/2005/5184.

Abstract:
The elicitation and aggregation of preferences are often the key to making better decisions. Be it a perfume company wanting to relaunch its 5 most popular fragrances, a movie recommender system trying to rank the most favoured movies, or a pharmaceutical company testing the relative efficacies of a set of drugs, learning from preference feedback is a widely applicable problem to solve. One can model the sequential version of this problem using the classical multiarmed bandit (MAB) (e.g., Auer, 2002) by representing each decision choice as one bandit arm, or more appropriately as a Dueling Bandit (DB) problem (Yue & Joachims, 2009). Although DB is similar to MAB in that it is an online decision-making framework, DB differs in that it specifically models learning from pairwise preferences. In practice, it is often much easier to elicit information, especially when humans are in the loop, through relative preferences: 'Item A is better than item B' is easier to elicit than its absolute counterpart: 'Item A is worth 7 and B is worth 4'. However, instead of pairwise preferences, a more general $k$-subset-wise preference model ($k \ge 2$) is more relevant in various practical scenarios, e.g. recommender systems, search engines, crowd-sourcing, e-learning platforms, design of surveys, and ranking in multiplayer games. Subset-wise preference elicitation is not only more budget friendly but also more flexible in conveying several types of feedback. For example, with subset-wise preferences, the learner could elicit the best item, a partial preference over the top 5 items, or even an entire rank ordering of a subset of items, whereas all of these boil down to the same feedback over pairs (subsets of size 2). The problem of how to learn adaptively with subset-wise preferences, however, remains largely unexplored; this is primarily due to the computational burden of maintaining a combinatorially large, $O(n^k)$, amount of preference information in general (for a decision problem with $n$ items and subset size $k$). We take a step in this direction by proposing "Battling Bandits (BB)", a new online learning framework for learning a set of optimal ('good') items by sequentially, and adaptively, querying subsets of items of size up to $k$ ($k \ge 2$). The preference feedback from a subset is assumed to arise from an underlying parametric discrete choice model, such as the well-known Plackett-Luce model, or more generally any random utility model (RUM). It is this structure that we leverage to design efficient algorithms for various problems of interest, e.g. identifying the best item, the set of top-k items, a full ranking, etc., in both the PAC and regret minimization settings. We propose computationally efficient and (near-)optimal algorithms for the above objectives, along with matching lower bound guarantees. Interestingly, this leads us to answers to some basic questions about the value of subset-wise preferences: Does playing a general $k$-set really help in faster information aggregation, i.e. is there a trade-off between the subset size $k$ and the learning rate? Under what types of feedback models? How do the performance limits (performance lower bounds) vary over different combinations of feedback and choice models? And above all, what more can we achieve through BB where DB fails? We proceed to analyse the BB problem in the contextual scenario, which is relevant in settings where items have known attributes and allows for potentially infinite decision spaces.
This is more general and of greater practical interest than the finite-arm case but is, naturally, more challenging. Moreover, none of the existing online learning algorithms extends straightforwardly to the continuous case, even for the simplest Dueling Bandit setup (i.e. when $k = 2$). Towards this, we formulate the problem of "Contextual Battling Bandits (C-BB)" under utility-based subset-wise preference feedback and design provably optimal algorithms for the regret minimization problem. Our regret bounds are accompanied by matching lower bound guarantees showing the optimality of our proposed methods. All our theoretical guarantees are corroborated with empirical evaluations. Lastly, there are still many open threads to explore based on BB. These include studying different choice-feedback model combinations and performance objectives, or even extending BB to other useful frameworks like assortment selection, revenue maximization, budget-constrained bandits, etc. Towards the end, we also discuss some interesting combinations of the BB framework with other well-known problems, e.g. Sleeping / Rotting Bandits, Preference-based Reinforcement Learning, Learning on Graphs, and Preferential Bandit-Convex-Optimization.
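The subset-wise feedback model that the thesis builds on can be made concrete with a small sketch (Python; all names and parameters are illustrative choices of ours): under a Plackett-Luce choice model with latent utility scores theta_i, the probability that item i is reported as the winner of a queried subset S is theta_i divided by the sum of the scores in S, and a full ranking can be sampled by drawing winners repeatedly. This only illustrates the feedback model, not the Battling Bandits algorithms themselves.

    import random

    def plackett_luce_winner(scores, subset, rng=random):
        """Sample the winner of `subset` under a Plackett-Luce model:
        P(i wins | S) = scores[i] / sum(scores[j] for j in S)."""
        weights = [scores[i] for i in subset]
        return rng.choices(subset, weights=weights, k=1)[0]

    def plackett_luce_ranking(scores, subset, rng=random):
        """Sample a full ranking of `subset` by repeatedly drawing winners
        from the remaining items (top-down Plackett-Luce sampling)."""
        remaining = list(subset)
        ranking = []
        while remaining:
            winner = plackett_luce_winner(scores, remaining, rng)
            ranking.append(winner)
            remaining.remove(winner)
        return ranking

    # Example: six items with latent utility scores; query a subset of size 3.
    scores = {0: 1.0, 1: 0.4, 2: 2.5, 3: 0.8, 4: 1.7, 5: 0.2}
    print(plackett_luce_winner(scores, [0, 2, 4]))
    print(plackett_luce_ranking(scores, [0, 2, 4]))

A learner that only observes such winners (or partial rankings) must aggregate this noisy relative feedback to identify the best item or a good ranking, which is the problem the thesis formalizes and solves.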
3

Mann, Timothy. "Scaling Up Reinforcement Learning without Sacrificing Optimality by Constraining Exploration." Thesis, 2012. http://hdl.handle.net/1969.1/148402.

Abstract:
The purpose of this dissertation is to understand how algorithms can efficiently learn to solve new tasks based on previous experience, instead of being explicitly programmed with a solution for each task that we want them to solve. Here a task is a series of decisions, such as a robot vacuum deciding which room to clean next or an intelligent car deciding to stop at a traffic light. In such a case, state-of-the-art learning algorithms are difficult to employ in practice because they often make thousands of mistakes before reliably solving a task. However, humans learn solutions to novel tasks, often making fewer mistakes, which suggests that efficient learning algorithms may exist. One advantage that humans have over state-of-the-art learning algorithms is that, while learning a new task, humans can apply knowledge gained from previously solved tasks. The central hypothesis investigated by this dissertation is that learning algorithms can solve new tasks more efficiently when they take into consideration knowledge learned from solving previous tasks. Although this hypothesis may appear to be obviously true, what knowledge to use and how to apply that knowledge to new tasks is a challenging, open research problem. I investigate this hypothesis in three ways. First, I developed a new learning algorithm that is able to use prior knowledge to constrain the exploration space. Second, I extended a powerful theoretical framework in machine learning, called Probably Approximately Correct, so that I can formally compare the efficiency of algorithms that solve only a single task to algorithms that consider knowledge from previously solved tasks. With this framework, I found sufficient conditions for using knowledge from previous tasks to improve the efficiency of learning to solve new tasks and also identified conditions where transferring knowledge may impede learning. I present situations where transfer learning can be used to intelligently constrain the exploration space so that optimality loss can be minimized. Finally, I tested the efficiency of my algorithms in various experimental domains. These theoretical and empirical results provide support for my central hypothesis. The theory and experiments of this dissertation provide a deeper understanding of what makes a learning algorithm efficient so that it can be widely used in practice. Finally, these results also contribute to the general goal of creating autonomous machines that can be reliably employed to solve complex tasks.
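One way to picture "constraining the exploration space with prior knowledge", in the spirit of this abstract, is a bandit-style toy in which the learner only explores arms that earlier tasks did not rule out. The sketch below (Python; the epsilon-greedy rule and all names are our illustrative choices, not the dissertation's algorithm) shows how restricting the candidate set cuts exploratory mistakes when the prior knowledge is right.

    import random

    def constrained_epsilon_greedy(true_means, allowed_arms, horizon=500,
                                   epsilon=0.1, seed=0):
        """Epsilon-greedy bandit whose exploration is restricted to `allowed_arms`,
        a candidate set assumed to come from previously solved tasks.
        Arms outside the set are never tried."""
        rng = random.Random(seed)
        counts = {a: 0 for a in allowed_arms}
        estimates = {a: 0.0 for a in allowed_arms}
        total = 0.0
        for _ in range(horizon):
            if rng.random() < epsilon:
                arm = rng.choice(list(allowed_arms))      # explore, but only within the constrained set
            else:
                arm = max(estimates, key=estimates.get)   # exploit the current estimates
            reward = 1.0 if rng.random() < true_means[arm] else 0.0
            counts[arm] += 1
            estimates[arm] += (reward - estimates[arm]) / counts[arm]
            total += reward
        return total

    # Prior tasks suggest arms 1 and 3 are the only plausible candidates.
    print(constrained_epsilon_greedy({0: 0.2, 1: 0.7, 2: 0.4, 3: 0.75}, [1, 3]))

If the prior knowledge is accurate, the learner wastes no pulls on arms 0 and 2; if the best arm had been excluded, performance would suffer instead, matching the abstract's point that transferred knowledge can also impede learning.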

Book chapters on the topic "Multiarmed Bandits"

1

Lee, Chia-Jung, Yalei Yang, Sheng-Hui Meng, and Tien-Wen Sung. "Adversarial Multiarmed Bandit Problems in Gradually Evolving Worlds." In Advances in Smart Vehicular Technology, Transportation, Communication and Applications, 305–11. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-70730-3_36.

2

"Multiarmed Bandits." In Mathematical Analysis of Machine Learning Algorithms, 326–44. Cambridge University Press, 2023. http://dx.doi.org/10.1017/9781009093057.017.

3

Agrawal, Shipra. "Recent Advances in Multiarmed Bandits for Sequential Decision Making." In Operations Research & Management Science in the Age of Analytics, 167–88. INFORMS, 2019. http://dx.doi.org/10.1287/educ.2019.0204.


Conference papers on the topic "Multiarmed Bandits"

1

Niño-Mora, José. "An Index Policy for Multiarmed Multimode Restless Bandits." In 3rd International ICST Conference on Performance Evaluation Methodologies and Tools. ICST, 2008. http://dx.doi.org/10.4108/icst.valuetools2008.4410.

2

Landgren, Peter, Vaibhav Srivastava, and Naomi Ehrich Leonard. "On distributed cooperative decision-making in multiarmed bandits." In 2016 European Control Conference (ECC). IEEE, 2016. http://dx.doi.org/10.1109/ecc.2016.7810293.

3

Niño-Mora, José. "Computing an Index Policy for Multiarmed Bandits with Deadlines." In 3rd International ICST Conference on Performance Evaluation Methodologies and Tools. ICST, 2008. http://dx.doi.org/10.4108/icst.valuetools2008.4406.

4

Srivastava, Vaibhav, Paul Reverdy, and Naomi E. Leonard. "Surveillance in an abruptly changing world via multiarmed bandits." In 2014 IEEE 53rd Annual Conference on Decision and Control (CDC). IEEE, 2014. http://dx.doi.org/10.1109/cdc.2014.7039462.

5

Landgren, Peter, Vaibhav Srivastava, and Naomi Ehrich Leonard. "Distributed cooperative decision-making in multiarmed bandits: Frequentist and Bayesian algorithms." In 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE, 2016. http://dx.doi.org/10.1109/cdc.2016.7798264.

6

Landgren, Peter, Vaibhav Srivastava, and Naomi Ehrich Leonard. "Social Imitation in Cooperative Multiarmed Bandits: Partition-Based Algorithms with Strictly Local Information." In 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018. http://dx.doi.org/10.1109/cdc.2018.8619744.

7

Anantharam, V., and P. Varaiya. "Asymptotically efficient rules in multiarmed Bandit problems." In 1986 25th IEEE Conference on Decision and Control. IEEE, 1986. http://dx.doi.org/10.1109/cdc.1986.267217.

8

Gummadi, Ramakrishna, Ramesh Johari, and Jia Yuan Yu. "Mean field equilibria of multiarmed bandit games." In Proceedings of the 13th ACM Conference on Electronic Commerce. New York, New York, USA: ACM Press, 2012. http://dx.doi.org/10.1145/2229012.2229060.

9

Mersereau, Adam J., Paat Rusmevichientong, and John N. Tsitsiklis. "A structured multiarmed bandit problem and the greedy policy." In 2008 47th IEEE Conference on Decision and Control. IEEE, 2008. http://dx.doi.org/10.1109/cdc.2008.4738680.

10

Wei, Lai, and Vaibhav Srivastava. "On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems." In 2018 Annual American Control Conference (ACC). IEEE, 2018. http://dx.doi.org/10.23919/acc.2018.8431265.
