Academic literature on the topic "Multiarmed Bandits"

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles.

Browse the thematic lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Multiarmed Bandits".

Next to each source in the reference list there is an "Add to bibliography" button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Multiarmed Bandits"

1. Righter, Rhonda, and J. George Shanthikumar. "Independently Expiring Multiarmed Bandits". Probability in the Engineering and Informational Sciences 12, no. 4 (October 1998): 453–68. http://dx.doi.org/10.1017/s0269964800005325.

Abstract:
We give conditions on the optimality of an index policy for multiarmed bandits when arms expire independently. We also give a new simple proof of the optimality of the Gittins index policy for the classic multiarmed bandit problem.
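
The abstract above refers to the Gittins index policy for the classic multiarmed bandit. As a purely illustrative aside, not taken from the cited paper, the following Python sketch approximates the Gittins index of a single Bernoulli arm with a Beta(a, b) posterior by calibrating it against a constant retirement reward; the discount factor, horizon, and tolerance are assumed values chosen only for the example.

```python
from functools import lru_cache

def gittins_index(a, b, gamma=0.9, horizon=50, tol=1e-4):
    """Approximate the Gittins index of a Bernoulli arm with a Beta(a, b) posterior.

    Retirement-option formulation: the index is the constant per-step reward
    `lam` at which pulling the arm and retiring are equally attractive.
    The finite horizon and the bisection are approximations for illustration.
    """
    def pull_minus_retire(lam):
        @lru_cache(maxsize=None)
        def V(a, b, n):
            # Optimal value with n steps left when retiring pays `lam` per step.
            if n == 0:
                return 0.0
            retire = lam * (1 - gamma ** n) / (1 - gamma)
            p = a / (a + b)
            pull = p * (1 + gamma * V(a + 1, b, n - 1)) + (1 - p) * gamma * V(a, b + 1, n - 1)
            return max(retire, pull)

        p = a / (a + b)
        pull_now = p * (1 + gamma * V(a + 1, b, horizon - 1)) + (1 - p) * gamma * V(a, b + 1, horizon - 1)
        retire_now = lam * (1 - gamma ** horizon) / (1 - gamma)
        return pull_now - retire_now

    # Pulling wins for small lam, retiring for large lam; bisect for the crossover.
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pull_minus_retire(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    # The index of a fresh Beta(1, 1) arm exceeds its posterior mean of 0.5
    # because it also prices in the value of what a pull reveals.
    print(gittins_index(1, 1))
    print(gittins_index(3, 7))
```
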
2. Gao, Xiujuan, Hao Liang, and Tong Wang. "A Common Value Experimentation with Multiarmed Bandits". Mathematical Problems in Engineering 2018 (July 30, 2018): 1–8. http://dx.doi.org/10.1155/2018/4791590.

Abstract:
We study common value experimentation with multiarmed bandits and give an application of the experimentation. The second derivative of the value functions at the cutoffs is investigated when an agent switches actions with multiarmed bandits. If consumers have an identical but unknown preference and purchase products from only two of multiple sellers, we obtain necessary and sufficient conditions for the common experimentation. The Markov perfect equilibrium and the socially effective allocation in K-armed markets are discussed.

3. Kalathil, Dileep, Naumaan Nayyar, and Rahul Jain. "Decentralized Learning for Multiplayer Multiarmed Bandits". IEEE Transactions on Information Theory 60, no. 4 (April 2014): 2331–45. http://dx.doi.org/10.1109/tit.2014.2302471.

4. Cesa-Bianchi, Nicolò. "Multiarmed Bandits in the Worst Case". IFAC Proceedings Volumes 35, no. 1 (2002): 91–96. http://dx.doi.org/10.3182/20020721-6-es-1901.01001.

5. Bray, Robert L., Decio Coviello, Andrea Ichino, and Nicola Persico. "Multitasking, Multiarmed Bandits, and the Italian Judiciary". Manufacturing & Service Operations Management 18, no. 4 (October 2016): 545–58. http://dx.doi.org/10.1287/msom.2016.0586.

6. Denardo, Eric V., Haechurl Park, and Uriel G. Rothblum. "Risk-Sensitive and Risk-Neutral Multiarmed Bandits". Mathematics of Operations Research 32, no. 2 (May 2007): 374–94. http://dx.doi.org/10.1287/moor.1060.0240.

7. Weber, Richard. "On the Gittins Index for Multiarmed Bandits". Annals of Applied Probability 2, no. 4 (November 1992): 1024–33. http://dx.doi.org/10.1214/aoap/1177005588.

8. Drugan, Madalina M. "Covariance Matrix Adaptation for Multiobjective Multiarmed Bandits". IEEE Transactions on Neural Networks and Learning Systems 30, no. 8 (August 2019): 2493–502. http://dx.doi.org/10.1109/tnnls.2018.2885123.

9. Burnetas, Apostolos N., and Michael N. Katehakis. "Asymptotic Bayes Analysis for the Finite-Horizon One-Armed-Bandit Problem". Probability in the Engineering and Informational Sciences 17, no. 1 (January 2003): 53–82. http://dx.doi.org/10.1017/s0269964803171045.

Abstract:
The multiarmed-bandit problem is often taken as a basic model for the trade-off between the exploration and utilization required for efficient optimization under uncertainty. In this article, we study the situation in which the unknown performance of a new bandit is to be evaluated and compared with that of a known one over a finite horizon. We assume that the bandits represent random variables with distributions from the one-parameter exponential family. When the objective is to maximize the Bayes expected sum of outcomes over a finite horizon, it is shown that optimal policies tend to simple limits when the length of the horizon is large.
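
As an illustrative aside on the finite-horizon one-armed-bandit setting described in the abstract above: the cited paper works with general one-parameter exponential families, whereas the hedged sketch below assumes the simplest Bernoulli/Beta case and made-up numbers, and computes the Bayes-optimal value by plain backward induction.

```python
from functools import lru_cache

def one_armed_bandit_value(a, b, mu_known, horizon):
    """Bayes-optimal expected total reward over `horizon` pulls.

    The unknown arm is Bernoulli with a Beta(a, b) posterior; the known arm
    pays `mu_known` in expectation on every pull.  At each step we compare
    pulling the unknown arm (and updating the posterior) with pulling the
    known arm (which leaves the posterior unchanged).
    """
    @lru_cache(maxsize=None)
    def V(a, b, n):
        if n == 0:
            return 0.0
        p = a / (a + b)  # posterior mean of the unknown arm
        pull_unknown = p * (1 + V(a + 1, b, n - 1)) + (1 - p) * V(a, b + 1, n - 1)
        pull_known = mu_known + V(a, b, n - 1)
        return max(pull_unknown, pull_known)

    return V(a, b, horizon)

if __name__ == "__main__":
    # With a longer horizon, experimenting with the unknown arm is worth more,
    # so the value per pull grows even though the known arm already pays 0.6.
    for h in (1, 10, 50):
        v = one_armed_bandit_value(1, 1, mu_known=0.6, horizon=h)
        print(h, round(v / h, 3))
```

For this special case the optimal policy also has the familiar stopping structure: experiment with the unknown arm until its posterior becomes sufficiently unfavourable, then use the known arm for the remainder of the horizon.
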
10. Nayyar, Naumaan, Dileep Kalathil, and Rahul Jain. "On Regret-Optimal Learning in Decentralized Multiplayer Multiarmed Bandits". IEEE Transactions on Control of Network Systems 5, no. 1 (March 2018): 597–606. http://dx.doi.org/10.1109/tcns.2016.2635380.


Theses on the topic "Multiarmed Bandits"

1. Lin, Haixia 1977. "Multiple machine maintenance: applying a separable value function approximation to a variation of the multiarmed bandit". Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/87269.

2. Saha, Aadirupa. "Battle of Bandits: Online Learning from Subsetwise Preferences and Other Structured Feedback". Thesis, 2020. https://etd.iisc.ac.in/handle/2005/5184.

Abstract:
The elicitation and aggregation of preferences is often the key to making better decisions. Be it a perfume company wanting to relaunch its 5 most popular fragrances, a movie recommender system trying to rank the most favoured movies, or a pharmaceutical company testing the relative efficacies of a set of drugs, learning from preference feedback is a widely applicable problem. One can model the sequential version of this problem as a classical multiarmed bandit (MAB) (e.g., Auer, 2002) by representing each decision choice as one bandit arm, or more appropriately as a Dueling Bandit (DB) problem (Yue & Joachims, 2009). Although DB is similar to MAB in that it is an online decision-making framework, it differs in that it specifically models learning from pairwise preferences. In practice it is often much easier to elicit information, especially when humans are in the loop, through relative preferences: 'Item A is better than item B' is easier to elicit than its absolute counterpart, 'Item A is worth 7 and B is worth 4'. However, instead of pairwise preferences, a more general $k$-subset-wise preference model $(k \ge 2)$ is more relevant in various practical scenarios, e.g., recommender systems, search engines, crowd-sourcing, e-learning platforms, survey design, and ranking in multiplayer games. Subset-wise preference elicitation is not only more budget friendly but also flexible in conveying several types of feedback: the learner could elicit the best item, a partial preference over the top 5 items, or even an entire rank ordering of a subset of items, whereas all of these boil down to the same feedback over pairs (subsets of size 2). The problem of how to learn adaptively from subset-wise preferences, however, remains largely unexplored, primarily because of the computational burden of maintaining a combinatorially large, $O(n^k)$, amount of preference information in general (for a decision problem with $n$ items and subset size $k$).

We take a step in this direction by proposing "Battling Bandits (BB)", a new online learning framework for learning a set of optimal ('good') items by sequentially and adaptively querying subsets of items of size up to $k$ ($k \ge 2$). The preference feedback from a subset is assumed to arise from an underlying parametric discrete choice model, such as the well-known Plackett-Luce model or, more generally, any random utility (RUM) based model. It is this structure that we leverage to design efficient algorithms for various problems of interest, e.g., identifying the best item, the set of top-k items, or a full ranking, in both the PAC and the regret-minimization settings. We propose computationally efficient and (near-)optimal algorithms for these objectives along with matching lower-bound guarantees. Interestingly, this leads us to answers to some basic questions about the value of subset-wise preferences: Does playing a general $k$-set really help in faster information aggregation, i.e., is there a tradeoff between the subset size $k$ and the learning rate? Under what types of feedback models? How do the performance limits (performance lower bounds) vary over different combinations of feedback and choice models? And above all, what more can we achieve through BB where DB fails?

We then analyse the BB problem in the contextual scenario, which is relevant in settings where items have known attributes and allows for potentially infinite decision spaces. This is more general and of greater practical interest than the finite-arm case, but naturally more challenging. Moreover, none of the existing online learning algorithms extends straightforwardly to the continuous case, even for the simplest Dueling Bandit setup (i.e., when $k = 2$). Towards this, we formulate the problem of "Contextual Battling Bandits (C-BB)" under utility-based subset-wise preference feedback and design provably optimal algorithms for the regret-minimization problem. Our regret bounds are accompanied by matching lower-bound guarantees showing the optimality of our proposed methods. All our theoretical guarantees are corroborated with empirical evaluations. Lastly, there are still many open threads to explore based on BB, including different choice-feedback model combinations, performance objectives, and extensions of BB to other useful frameworks such as assortment selection, revenue maximization, budget-constrained bandits, etc. Towards the end we also discuss some interesting combinations of the BB framework with other well-known problems, e.g., Sleeping/Rotting Bandits, preference-based Reinforcement Learning, Learning on Graphs, and preferential Bandit Convex Optimization.
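
The abstract above assumes that subset-wise feedback is generated by a parametric discrete choice model such as Plackett-Luce. The following minimal Python sketch simulates only that feedback channel; the utility values, item names, and queried subset are invented for illustration and are not taken from the thesis.

```python
import random

def plackett_luce_ranking(utilities, subset, rng=random):
    """Sample a full ranking of `subset` from the Plackett-Luce choice model.

    Items are placed one position at a time, each chosen with probability
    proportional to its (positive) utility among the items not yet placed.
    ranking[0] alone is winner feedback; a prefix is top-m feedback.
    """
    remaining = list(subset)
    ranking = []
    while remaining:
        weights = [utilities[item] for item in remaining]
        choice = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(choice)
        remaining.remove(choice)
    return ranking

if __name__ == "__main__":
    # Hypothetical utilities over 6 items; a battling-bandit learner would
    # adaptively choose which size-k subset to query next.
    theta = {"a": 1.0, "b": 0.8, "c": 0.5, "d": 0.4, "e": 0.2, "f": 0.1}
    query = ["a", "c", "d", "f"]                # one subset-wise query, k = 4
    feedback = plackett_luce_ranking(theta, query)
    print(feedback)      # full-ranking feedback over the queried subset
    print(feedback[0])   # winner-only feedback from the same model
```
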
3. Mann, Timothy 1984. "Scaling Up Reinforcement Learning without Sacrificing Optimality by Constraining Exploration". Thesis, 2012. http://hdl.handle.net/1969.1/148402.

Abstract:
The purpose of this dissertation is to understand how algorithms can efficiently learn to solve new tasks based on previous experience, instead of being explicitly programmed with a solution for each task that we want them to solve. Here a task is a series of decisions, such as a robot vacuum deciding which room to clean next or an intelligent car deciding to stop at a traffic light. In such a case, state-of-the-art learning algorithms are difficult to employ in practice because they often make thousands of mistakes before reliably solving a task. However, humans learn solutions to novel tasks, often making fewer mistakes, which suggests that efficient learning algorithms may exist. One advantage that humans have over state-of-the-art learning algorithms is that, while learning a new task, humans can apply knowledge gained from previously solved tasks. The central hypothesis investigated by this dissertation is that learning algorithms can solve new tasks more efficiently when they take into consideration knowledge learned from solving previous tasks. Although this hypothesis may appear to be obviously true, what knowledge to use and how to apply that knowledge to new tasks is a challenging, open research problem. I investigate this hypothesis in three ways. First, I developed a new learning algorithm that is able to use prior knowledge to constrain the exploration space. Second, I extended a powerful theoretical framework in machine learning, called Probably Approximately Correct, so that I can formally compare the efficiency of algorithms that solve only a single task to algorithms that consider knowledge from previously solved tasks. With this framework, I found sufficient conditions for using knowledge from previous tasks to improve the efficiency of learning to solve new tasks and also identified conditions where transferring knowledge may impede learning. I present situations where transfer learning can be used to intelligently constrain the exploration space so that optimality loss can be minimized. Finally, I tested the efficiency of my algorithms in various experimental domains. These theoretical and empirical results provide support for my central hypothesis. The theory and experiments of this dissertation provide a deeper understanding of what makes a learning algorithm efficient so that it can be widely used in practice. Finally, these results also contribute to the general goal of creating autonomous machines that can be reliably employed to solve complex tasks.

Book chapters on the topic "Multiarmed Bandits"

1. Lee, Chia-Jung, Yalei Yang, Sheng-Hui Meng, and Tien-Wen Sung. "Adversarial Multiarmed Bandit Problems in Gradually Evolving Worlds". In Advances in Smart Vehicular Technology, Transportation, Communication and Applications, 305–11. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-70730-3_36.

2. "Multiarmed Bandits". In Mathematical Analysis of Machine Learning Algorithms, 326–44. Cambridge University Press, 2023. http://dx.doi.org/10.1017/9781009093057.017.

3. Agrawal, Shipra. "Recent Advances in Multiarmed Bandits for Sequential Decision Making". In Operations Research & Management Science in the Age of Analytics, 167–88. INFORMS, 2019. http://dx.doi.org/10.1287/educ.2019.0204.


Conference proceedings on the topic "Multiarmed Bandits"

1. Niño-Mora, José. "An Index Policy for Multiarmed Multimode Restless Bandits". In 3rd International ICST Conference on Performance Evaluation Methodologies and Tools. ICST, 2008. http://dx.doi.org/10.4108/icst.valuetools2008.4410.

2. Landgren, Peter, Vaibhav Srivastava, and Naomi Ehrich Leonard. "On distributed cooperative decision-making in multiarmed bandits". In 2016 European Control Conference (ECC). IEEE, 2016. http://dx.doi.org/10.1109/ecc.2016.7810293.

3. Niño-Mora, José. "Computing an Index Policy for Multiarmed Bandits with Deadlines". In 3rd International ICST Conference on Performance Evaluation Methodologies and Tools. ICST, 2008. http://dx.doi.org/10.4108/icst.valuetools2008.4406.

4. Srivastava, Vaibhav, Paul Reverdy, and Naomi E. Leonard. "Surveillance in an abruptly changing world via multiarmed bandits". In 2014 IEEE 53rd Annual Conference on Decision and Control (CDC). IEEE, 2014. http://dx.doi.org/10.1109/cdc.2014.7039462.

5. Landgren, Peter, Vaibhav Srivastava, and Naomi Ehrich Leonard. "Distributed cooperative decision-making in multiarmed bandits: Frequentist and Bayesian algorithms". In 2016 IEEE 55th Conference on Decision and Control (CDC). IEEE, 2016. http://dx.doi.org/10.1109/cdc.2016.7798264.

6. Landgren, Peter, Vaibhav Srivastava, and Naomi Ehrich Leonard. "Social Imitation in Cooperative Multiarmed Bandits: Partition-Based Algorithms with Strictly Local Information". In 2018 IEEE Conference on Decision and Control (CDC). IEEE, 2018. http://dx.doi.org/10.1109/cdc.2018.8619744.

7. Anantharam, V., and P. Varaiya. "Asymptotically efficient rules in multiarmed bandit problems". In 1986 25th IEEE Conference on Decision and Control. IEEE, 1986. http://dx.doi.org/10.1109/cdc.1986.267217.

8. Gummadi, Ramakrishna, Ramesh Johari, and Jia Yuan Yu. "Mean field equilibria of multiarmed bandit games". In Proceedings of the 13th ACM Conference on Electronic Commerce. New York, NY, USA: ACM Press, 2012. http://dx.doi.org/10.1145/2229012.2229060.

9. Mersereau, Adam J., Paat Rusmevichientong, and John N. Tsitsiklis. "A structured multiarmed bandit problem and the greedy policy". In 2008 47th IEEE Conference on Decision and Control. IEEE, 2008. http://dx.doi.org/10.1109/cdc.2008.4738680.

10. Wei, Lai, and Vaibhav Srivastava. "On Abruptly-Changing and Slowly-Varying Multiarmed Bandit Problems". In 2018 Annual American Control Conference (ACC). IEEE, 2018. http://dx.doi.org/10.23919/acc.2018.8431265.
