Academic literature on the topic "Dueling Bandits Problem"

Create an accurate citation in APA, MLA, Chicago, Harvard, and other styles


Consult the topical lists of articles, books, theses, conference proceedings, and other academic sources on the topic "Dueling Bandits Problem".

Next to each source in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Journal articles on the topic "Dueling Bandits Problem"

1

Xu, Liyuan, Junya Honda, and Masashi Sugiyama. "Dueling Bandits with Qualitative Feedback". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 5549–56. http://dx.doi.org/10.1609/aaai.v33i01.33015549.

Abstract
We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes not numeric but qualitative feedback by pulling each arm. We employ the same regret as the dueling bandit (DB) problem, where the duel is carried out by comparing the qualitative feedback. Although we can naively use classic DB algorithms for solving the QDB problem, this reduction significantly worsens the performance; in fact, in the QDB problem, the probability that one arm wins the duel over another arm can be estimated directly without carrying out actual duels. In this paper, we propose such direct algorithms for the QDB problem. Our theoretical analysis shows that the proposed algorithms significantly outperform DB algorithms by incorporating the qualitative feedback, and experimental results also demonstrate vast improvement over the existing DB algorithms.
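Illustration (not from the cited paper): a minimal Python sketch of the direct-estimation idea mentioned in the abstract, assuming each arm yields feedback on a three-level ordinal scale. The arm names and distributions are hypothetical, and P(one arm beats another) is estimated from per-arm samples alone, without simulating duels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical qualitative feedback: each arm returns a level in {0, 1, 2}
# (say "bad" < "ok" < "good"), drawn from its own categorical distribution.
ARM_DISTRIBUTIONS = {
    "arm_a": [0.2, 0.3, 0.5],
    "arm_b": [0.4, 0.4, 0.2],
}

def pull(arm: str) -> int:
    """Pull a single arm and observe ordinal (qualitative) feedback."""
    return int(rng.choice(3, p=ARM_DISTRIBUTIONS[arm]))

def estimated_win_prob(samples_i, samples_j) -> float:
    """Estimate P(arm i beats arm j) from per-arm samples only:
    the higher level wins a hypothetical duel, ties count as 1/2."""
    si = np.asarray(samples_i)[:, None]
    sj = np.asarray(samples_j)[None, :]
    return float(np.mean((si > sj) + 0.5 * (si == sj)))

samples_a = [pull("arm_a") for _ in range(1000)]
samples_b = [pull("arm_b") for _ in range(1000)]
print("estimated P(arm_a beats arm_b):", estimated_win_prob(samples_a, samples_b))
```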
2

Yue, Yisong, Josef Broder, Robert Kleinberg, and Thorsten Joachims. "The K-armed dueling bandits problem". Journal of Computer and System Sciences 78, no. 5 (September 2012): 1538–56. http://dx.doi.org/10.1016/j.jcss.2011.12.028.

3

Haddenhorst, Björn, Viktor Bengs, and Eyke Hüllermeier. "On testing transitivity in online preference learning". Machine Learning 110, no. 8 (July 12, 2021): 2063–84. http://dx.doi.org/10.1007/s10994-021-06026-2.

Abstract
The efficiency of state-of-the-art algorithms for the dueling bandits problem is essentially due to a clever exploitation of (stochastic) transitivity properties of pairwise comparisons: if one arm is likely to beat a second one, which in turn is likely to beat a third one, then the first is also likely to beat the third one. To date, however, there is no way to test the validity of the corresponding assumptions, although this would be a key prerequisite to guarantee the meaningfulness of the results produced by an algorithm. In this paper, we investigate the problem of testing different forms of stochastic transitivity in an online manner. We derive lower bounds on the expected sample complexity of any sequential hypothesis testing algorithm for various forms of stochastic transitivity, thereby providing additional motivation to focus on weak stochastic transitivity. To this end, we introduce an algorithmic framework for the dueling bandits problem in which the statistical validity of weak stochastic transitivity can be tested, either actively or passively, based on a multiple binomial hypothesis test. Moreover, by exploiting a connection between weak stochastic transitivity and graph theory, we suggest an enhancement to further improve the efficiency of the testing algorithm. In the active setting, both variants achieve an expected sample complexity that is optimal up to a logarithmic factor.
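Illustration (not the testing procedure from the cited paper): a minimal Python sketch of the weak stochastic transitivity property itself, checked on a known matrix of pairwise win probabilities; the matrix values are hypothetical.

```python
import itertools
import numpy as np

def violates_weak_stochastic_transitivity(P: np.ndarray) -> bool:
    """Weak stochastic transitivity: P[i, j] >= 0.5 and P[j, k] >= 0.5
    must imply P[i, k] >= 0.5 for every triple of distinct arms."""
    n = P.shape[0]
    for i, j, k in itertools.permutations(range(n), 3):
        if P[i, j] >= 0.5 and P[j, k] >= 0.5 and P[i, k] < 0.5:
            return True
    return False

# A cyclic ("rock-paper-scissors") preference matrix violates the property.
P_cyclic = np.array([[0.5, 0.6, 0.3],
                     [0.4, 0.5, 0.6],
                     [0.7, 0.4, 0.5]])
print(violates_weak_stochastic_transitivity(P_cyclic))  # True
```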
4

Peköz, Erol, Sheldon M. Ross, and Zhengyu Zhang. "Dueling Bandit Problems". Probability in the Engineering and Informational Sciences, November 20, 2020, 1–12. http://dx.doi.org/10.1017/s0269964820000601.

Abstract
There is a set of n bandits and at every stage, two of the bandits are chosen to play a game, with the result of a game being learned. In the "weak regret problem", we suppose there is a "best" bandit that wins each game it plays with probability at least p > 1/2, with the value of p being unknown. The objective is to choose bandits to maximize the number of times that one of the competitors is the best bandit. In the "strong regret problem", we suppose that bandit i has unknown value v_i, i = 1, …, n, and that i beats j with probability v_i/(v_i + v_j). One version of strong regret is interested in maximizing the number of times that the contest is between the players with the two largest values. Another version supposes that at any stage, rather than choosing two arms to play a game, the decision maker can declare that a particular arm is the best, with the objective of maximizing the number of stages in which the arm with the largest value is declared to be the best. In the weak regret problem, we propose a policy and obtain an analytic bound on the expected number of stages over an infinite time frame that the best arm is not one of the competitors when this policy is employed. In the strong regret problem, we propose a Thompson sampling type algorithm and empirically compare its performance with others in the literature.
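Illustration (not the policy or algorithm proposed in the cited paper): a minimal Python sketch of the strong-regret duel model, where arm i beats arm j with probability v_i/(v_i + v_j), combined with a generic Beta-posterior, Thompson-sampling-style rule for choosing the two contestants. The values v_i and the pair-selection heuristic are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
values = np.array([1.0, 2.0, 4.0])   # assumed latent values v_i (unknown to the learner)
n = len(values)

# Beta(1, 1) posterior for each ordered pair (i, j) on the event "i beats j".
wins = np.ones((n, n))
losses = np.ones((n, n))

def duel(i: int, j: int) -> bool:
    """Arm i beats arm j with probability v_i / (v_i + v_j)."""
    return rng.random() < values[i] / (values[i] + values[j])

for _ in range(2000):
    theta = rng.beta(wins, losses)          # sample a plausible win-probability matrix
    np.fill_diagonal(theta, 0.0)            # an arm never "beats" itself
    scores = (theta > 0.5).sum(axis=1)      # how many arms each arm beats in the sample
    i, j = np.argsort(scores)[-2:]          # duel the two highest-scoring arms
    if duel(i, j):
        wins[i, j] += 1
        losses[j, i] += 1
    else:
        wins[j, i] += 1
        losses[i, j] += 1

print("posterior win counts (row beats column):")
print(wins - 1)
```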

Conference proceedings on the topic "Dueling Bandits Problem"

1

Sui, Yanan, Masrour Zoghi, Katja Hofmann, and Yisong Yue. "Advancements in Dueling Bandits". In Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/776.

Abstract
The dueling bandits problem is an online learning framework where learning happens "on the fly" through preference feedback, i.e., from comparisons between a pair of actions. Unlike conventional online learning settings that require absolute feedback for each action, the dueling bandits framework assumes only the presence of (noisy) binary feedback about the relative quality of each pair of actions. The dueling bandits problem is well suited for modeling settings that elicit subjective or implicit human feedback, which is typically more reliable in preference form. In this survey, we review recent results in the theories, algorithms, and applications of the dueling bandits problem. As an emerging domain, the theories and algorithms of dueling bandits have been intensively studied during the past few years. We provide an overview of recent advancements, including algorithmic advances and applications. We discuss extensions to the standard problem formulation and novel application areas, highlighting key open research questions in our discussion.
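Illustration (not from the cited survey): a minimal Python sketch of the dueling bandits interaction protocol described above, in which the learner only ever observes noisy binary preferences between chosen pairs of actions. The preference matrix and the random pair selection are placeholders.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed ground truth for illustration: P[i, j] = probability that
# action i is preferred to action j in a single noisy comparison.
P = np.array([[0.5, 0.7, 0.8],
              [0.3, 0.5, 0.6],
              [0.2, 0.4, 0.5]])

def compare(i: int, j: int) -> int:
    """The only feedback a dueling bandit learner receives: a noisy
    binary preference for the chosen pair (1 if i wins, 0 if j wins)."""
    return int(rng.random() < P[i, j])

for t in range(10):
    # A real algorithm would pick the pair based on past feedback;
    # a random pair is used here only to show the interaction protocol.
    i, j = rng.choice(len(P), size=2, replace=False)
    winner = i if compare(i, j) else j
    print(f"round {t}: dueled arms {i} and {j}, winner = {winner}")
```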
2

Sui, Yanan, and Joel W. Burdick. "Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Spaces". In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/389.

Abstract
We consider sequential decision making under uncertainty: optimization over a large decision space with noisy comparative feedback. This problem can be formulated as a K-armed dueling bandits problem, where K is the total number of decisions. When K is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This paper studies the dueling bandits problem with a large number of dependent arms. Our problem is motivated by a clinical decision making process in a large decision space. We propose an efficient algorithm, CorrDuel, which makes decisions to simultaneously deliver effective therapy and explore the decision space. Many sequential decision making problems with large and structured decision spaces could be facilitated by our algorithm. After evaluating the fast convergence of CorrDuel in analysis and simulation experiments, we applied it in a live clinical trial of therapeutic spinal cord stimulation. It is the first algorithm applied to spinal cord injury treatment, and experimental results show the effectiveness and efficiency of our algorithm.
3

Yue, Yisong, and Thorsten Joachims. "Interactively optimizing information retrieval systems as a dueling bandits problem". In Proceedings of the 26th Annual International Conference on Machine Learning (ICML '09). New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1553374.1553527.
