Academic literature on the topic 'Dueling Bandits Problem'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Dueling Bandits Problem.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press on it, and we will generate automatically the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as pdf and read online its abstract whenever available in the metadata.

Journal articles on the topic "Dueling Bandits Problem"

1

Xu, Liyuan, Junya Honda, and Masashi Sugiyama. "Dueling Bandits with Qualitative Feedback." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 5549–56. http://dx.doi.org/10.1609/aaai.v33i01.33015549.

Full text
Abstract:
We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes not numeric but qualitative feedback by pulling each arm. We employ the same regret as the dueling bandit (DB) problem where the duel is carried out by comparing the qualitative feedback. Although we can naively use classic DB algorithms for solving the QDB problem, this reduction significantly worsens the performance—actually, in the QDB problem, the probability that one arm wins the duel over another arm can be directly estimated without carrying out actual duels. In this paper1, we propose such direct algorithms for the QDB problem. Our theoretical analysis shows that the proposed algorithms significantly outperform DB algorithms by incorporating the qualitative feedback, and experimental results also demonstrate vast improvement over the existing DB algorithms.
APA, Harvard, Vancouver, ISO, and other styles
2

Yue, Yisong, Josef Broder, Robert Kleinberg, and Thorsten Joachims. "The K-armed dueling bandits problem." Journal of Computer and System Sciences 78, no. 5 (September 2012): 1538–56. http://dx.doi.org/10.1016/j.jcss.2011.12.028.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Haddenhorst, Björn, Viktor Bengs, and Eyke Hüllermeier. "On testing transitivity in online preference learning." Machine Learning 110, no. 8 (July 12, 2021): 2063–84. http://dx.doi.org/10.1007/s10994-021-06026-2.

Full text
Abstract:
AbstractThe efficiency of state-of-the-art algorithms for the dueling bandits problem is essentially due to a clever exploitation of (stochastic) transitivity properties of pairwise comparisons: If one arm is likely to beat a second one, which in turn is likely to beat a third one, then the first is also likely to beat the third one. By now, however, there is no way to test the validity of corresponding assumptions, although this would be a key prerequisite to guarantee the meaningfulness of the results produced by an algorithm. In this paper, we investigate the problem of testing different forms of stochastic transitivity in an online manner. We derive lower bounds on the expected sample complexity of any sequential hypothesis testing algorithm for various forms of stochastic transitivity, thereby providing additional motivation to focus on weak stochastic transitivity. To this end, we introduce an algorithmic framework for the dueling bandits problem, in which the statistical validity of weak stochastic transitivity can be tested, either actively or passively, based on a multiple binomial hypothesis test. Moreover, by exploiting a connection between weak stochastic transitivity and graph theory, we suggest an enhancement to further improve the efficiency of the testing algorithm. In the active setting, both variants achieve an expected sample complexity that is optimal up to a logarithmic factor.
APA, Harvard, Vancouver, ISO, and other styles
4

Peköz, Erol, Sheldon M. Ross, and Zhengyu Zhang. "DUELING BANDIT PROBLEMS." Probability in the Engineering and Informational Sciences, November 20, 2020, 1–12. http://dx.doi.org/10.1017/s0269964820000601.

Full text
Abstract:
There is a set of n bandits and at every stage, two of the bandits are chosen to play a game, with the result of a game being learned. In the “weak regret problem,” we suppose there is a “best” bandit that wins each game it plays with probability at least p > 1/2, with the value of p being unknown. The objective is to choose bandits to maximize the number of times that one of the competitors is the best bandit. In the “strong regret problem”, we suppose that bandit i has unknown value v i , i = 1, …, n, and that i beats j with probability v i /(v i + v j ). One version of strong regret is interested in maximizing the number of times that the contest is between the players with the two largest values. Another version supposes that at any stage, rather than choosing two arms to play a game, the decision maker can declare that a particular arm is the best, with the objective of maximizing the number of stages in which the arm with the largest value is declared to be the best. In the weak regret problem, we propose a policy and obtain an analytic bound on the expected number of stages over an infinite time frame that the best arm is not one of the competitors when this policy is employed. In the strong regret problem, we propose a Thompson sampling type algorithm and empirically compare its performance with others in the literature.
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Dueling Bandits Problem"

1

Sui, Yanan, Masrour Zoghi, Katja Hofmann, and Yisong Yue. "Advancements in Dueling Bandits." In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/776.

Full text
Abstract:
The dueling bandits problem is an online learning framework where learning happens ``on-the-fly'' through preference feedback, i.e., from comparisons between a pair of actions. Unlike conventional online learning settings that require absolute feedback for each action, the dueling bandits framework assumes only the presence of (noisy) binary feedback about the relative quality of each pair of actions. The dueling bandits problem is well-suited for modeling settings that elicit subjective or implicit human feedback, which is typically more reliable in preference form. In this survey, we review recent results in the theories, algorithms, and applications of the dueling bandits problem. As an emerging domain, the theories and algorithms of dueling bandits have been intensively studied during the past few years. We provide an overview of recent advancements, including algorithmic advances and applications. We discuss extensions to standard problem formulation and novel application areas, highlighting key open research questions in our discussion.
APA, Harvard, Vancouver, ISO, and other styles
2

Sui, Yanan, and Joel W. Burdick. "Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Spaces." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/389.

Full text
Abstract:
We consider sequential decision making under uncertainty, the optimization over large decision space with noisy comparative feedback. This problem can be formulated as a K-armed Dueling Bandits problem where K is the total number of decisions. When K is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This paper studies the dueling bandits problem with a large number of dependent arms. Our problem is motivated by a clinical decision making process in large decision space. We propose an efficient algorithm CorrDuel for the problem which makes decisions to simultaneously deliver effective therapy and explore the decision space. Many sequential decision making problems with large and structured decision space could be facilitated by our algorithm. After evaluated the fast convergence of CorrDuel in analysis and simulation experiments, we applied it on a live clinical trial of therapeutic spinal cord stimulation. It is the first applied algorithm towards spinal cord injury treatments and experimental results show the effectiveness and efficiency of our algorithm.
APA, Harvard, Vancouver, ISO, and other styles
3

Yue, Yisong, and Thorsten Joachims. "Interactively optimizing information retrieval systems as a dueling bandits problem." In the 26th Annual International Conference. New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1553374.1553527.

Full text
APA, Harvard, Vancouver, ISO, and other styles
We offer discounts on all premium plans for authors whose works are included in thematic literature selections. Contact us to get a unique promo code!

To the bibliography