Ready-made bibliography on the topic "Dueling Bandits Problem"

Create an accurate reference in APA, MLA, Chicago, Harvard, and many other citation styles

See the lists of current articles, books, dissertations, abstracts, and other scholarly sources on the topic "Dueling Bandits Problem".

Next to each work in the bibliography there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, whenever the necessary details are available in the source's metadata.

Journal articles on the topic "Dueling Bandits Problem"

1

Xu, Liyuan, Junya Honda, and Masashi Sugiyama. "Dueling Bandits with Qualitative Feedback". Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 5549–56. http://dx.doi.org/10.1609/aaai.v33i01.33015549.

Abstract:
We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes not numeric but qualitative feedback by pulling each arm. We employ the same regret as the dueling bandit (DB) problem, where the duel is carried out by comparing the qualitative feedback. Although we can naively use classic DB algorithms for solving the QDB problem, this reduction significantly worsens the performance; in fact, in the QDB problem, the probability that one arm wins the duel over another arm can be directly estimated without carrying out actual duels. In this paper, we propose such direct algorithms for the QDB problem. Our theoretical analysis shows that the proposed algorithms significantly outperform DB algorithms by incorporating the qualitative feedback, and experimental results also demonstrate vast improvement over the existing DB algorithms.
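
The key observation in the abstract is that, under qualitative feedback, the probability that one arm would win a duel against another can be estimated directly from per-arm observations, without simulating any duels. The sketch below illustrates such a plug-in estimate, assuming a fixed set of ordered feedback levels and an even split of ties; the function names and the tie-handling rule are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def empirical_level_dist(observations, n_levels):
    """Empirical distribution over qualitative feedback levels (0 = worst)."""
    counts = np.bincount(observations, minlength=n_levels).astype(float)
    return counts / counts.sum()

def estimated_win_prob(obs_i, obs_j, n_levels):
    """Plug-in estimate of P(arm i's feedback beats arm j's feedback),
    splitting ties evenly; no duel between the arms is ever simulated."""
    p_i = empirical_level_dist(obs_i, n_levels)
    p_j = empirical_level_dist(obs_j, n_levels)
    win, tie = 0.0, 0.0
    for a in range(n_levels):
        for b in range(n_levels):
            if a > b:
                win += p_i[a] * p_j[b]
            elif a == b:
                tie += p_i[a] * p_j[b]
    return win + 0.5 * tie

# Toy usage: arm 0 tends to receive higher qualitative levels than arm 1.
rng = np.random.default_rng(0)
obs_0 = rng.choice(3, size=200, p=[0.2, 0.3, 0.5])
obs_1 = rng.choice(3, size=200, p=[0.5, 0.3, 0.2])
print(estimated_win_prob(obs_0, obs_1, 3))
```
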
2

Yue, Yisong, Josef Broder, Robert Kleinberg, and Thorsten Joachims. "The K-armed dueling bandits problem". Journal of Computer and System Sciences 78, no. 5 (September 2012): 1538–56. http://dx.doi.org/10.1016/j.jcss.2011.12.028.

3

Haddenhorst, Björn, Viktor Bengs, and Eyke Hüllermeier. "On testing transitivity in online preference learning". Machine Learning 110, no. 8 (July 12, 2021): 2063–84. http://dx.doi.org/10.1007/s10994-021-06026-2.

Abstract:
The efficiency of state-of-the-art algorithms for the dueling bandits problem is essentially due to a clever exploitation of (stochastic) transitivity properties of pairwise comparisons: If one arm is likely to beat a second one, which in turn is likely to beat a third one, then the first is also likely to beat the third one. By now, however, there is no way to test the validity of corresponding assumptions, although this would be a key prerequisite to guarantee the meaningfulness of the results produced by an algorithm. In this paper, we investigate the problem of testing different forms of stochastic transitivity in an online manner. We derive lower bounds on the expected sample complexity of any sequential hypothesis testing algorithm for various forms of stochastic transitivity, thereby providing additional motivation to focus on weak stochastic transitivity. To this end, we introduce an algorithmic framework for the dueling bandits problem, in which the statistical validity of weak stochastic transitivity can be tested, either actively or passively, based on a multiple binomial hypothesis test. Moreover, by exploiting a connection between weak stochastic transitivity and graph theory, we suggest an enhancement to further improve the efficiency of the testing algorithm. In the active setting, both variants achieve an expected sample complexity that is optimal up to a logarithmic factor.
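
Weak stochastic transitivity asserts that the arms admit a total order under which the higher-ranked arm beats the lower-ranked one with probability at least 1/2. The sketch below is a passive check in the spirit of the multiple binomial hypothesis test mentioned in the abstract: rank the arms from observed duel outcomes, then test every pair that contradicts the ranking with a Bonferroni-corrected one-sided binomial test. The ranking heuristic and the correction are assumptions made for illustration and do not reproduce the paper's procedure.

```python
import numpy as np
from scipy.stats import binom

def passive_wst_check(wins, alpha=0.05):
    """wins[i, j] = number of duels between arms i and j that i has won.
    Rank arms by a Borda-style score, then test every pair that contradicts
    the ranking with a Bonferroni-corrected one-sided binomial test."""
    n_arms = wins.shape[0]
    totals = wins + wins.T
    phat = np.where(totals > 0, wins / np.maximum(totals, 1), 0.5)
    order = np.argsort(-phat.sum(axis=1))          # candidate total order
    n_tests = n_arms * (n_arms - 1) // 2
    for a in range(n_arms):
        for b in range(a + 1, n_arms):
            i, j = order[a], order[b]              # the order claims i beats j
            if totals[i, j] == 0:
                continue
            # p-value of seeing this few wins for i under H0: P(i beats j) >= 1/2
            p_value = binom.cdf(wins[i, j], totals[i, j], 0.5)
            if p_value < alpha / n_tests:          # significant contradiction
                return False                       # evidence against weak transitivity
    return True                                    # no contradiction detected
```
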
4

Peköz, Erol, Sheldon M. Ross, and Zhengyu Zhang. "Dueling Bandit Problems". Probability in the Engineering and Informational Sciences, November 20, 2020, 1–12. http://dx.doi.org/10.1017/s0269964820000601.

Abstract:
There is a set of n bandits and at every stage, two of the bandits are chosen to play a game, with the result of a game being learned. In the "weak regret problem", we suppose there is a "best" bandit that wins each game it plays with probability at least p > 1/2, with the value of p being unknown. The objective is to choose bandits to maximize the number of times that one of the competitors is the best bandit. In the "strong regret problem", we suppose that bandit i has unknown value v_i, i = 1, …, n, and that i beats j with probability v_i/(v_i + v_j). One version of strong regret is interested in maximizing the number of times that the contest is between the players with the two largest values. Another version supposes that at any stage, rather than choosing two arms to play a game, the decision maker can declare that a particular arm is the best, with the objective of maximizing the number of stages in which the arm with the largest value is declared to be the best. In the weak regret problem, we propose a policy and obtain an analytic bound on the expected number of stages over an infinite time frame that the best arm is not one of the competitors when this policy is employed. In the strong regret problem, we propose a Thompson sampling type algorithm and empirically compare its performance with others in the literature.
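
In the strong regret model above, arm i beats arm j with probability v_i/(v_i + v_j), a Bradley-Terry-style comparison model. The following is a rough sketch of a Thompson-sampling-type selection rule for that model, using independent per-pair Beta posteriors; the prior and the scoring rule are illustrative assumptions and not the authors' algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def duel(i, j, v):
    """Simulate one duel: arm i beats arm j with probability v[i] / (v[i] + v[j])."""
    return rng.random() < v[i] / (v[i] + v[j])

def thompson_pair(wins, losses):
    """Draw a plausible pairwise win-probability matrix from per-pair Beta
    posteriors and return the two arms with the highest sampled scores."""
    theta = rng.beta(wins + 1, losses + 1)
    np.fill_diagonal(theta, 0.5)
    scores = theta.mean(axis=1)
    best, runner_up = np.argsort(-scores)[:2]
    return best, runner_up

v = np.array([1.0, 2.0, 4.0, 8.0])   # hidden values; arms 3 and 2 have the two largest
n = len(v)
wins = np.zeros((n, n))
losses = np.zeros((n, n))
for t in range(2000):
    i, j = thompson_pair(wins, losses)
    if duel(i, j, v):
        wins[i, j] += 1
        losses[j, i] += 1
    else:
        wins[j, i] += 1
        losses[i, j] += 1
# With more rounds, the chosen pair is increasingly often the two largest-value arms.
```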

Conference papers on the topic "Dueling Bandits Problem"

1

Sui, Yanan, Masrour Zoghi, Katja Hofmann, and Yisong Yue. "Advancements in Dueling Bandits". In Twenty-Seventh International Joint Conference on Artificial Intelligence {IJCAI-18}. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/776.

Abstract:
The dueling bandits problem is an online learning framework where learning happens "on the fly" through preference feedback, i.e., from comparisons between a pair of actions. Unlike conventional online learning settings that require absolute feedback for each action, the dueling bandits framework assumes only the presence of (noisy) binary feedback about the relative quality of each pair of actions. The dueling bandits problem is well suited for modeling settings that elicit subjective or implicit human feedback, which is typically more reliable in preference form. In this survey, we review recent results in the theories, algorithms, and applications of the dueling bandits problem. As an emerging domain, the theories and algorithms of dueling bandits have been intensively studied during the past few years. We provide an overview of recent advancements, including algorithmic advances and applications. We discuss extensions to the standard problem formulation and novel application areas, highlighting key open research questions in our discussion.
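
The interaction protocol described here can be made concrete with a short loop: at every round the learner selects a pair of arms and observes only a noisy binary preference between them. The sketch below assumes a fixed preference matrix P, with P[i, j] the probability that arm i is preferred to arm j; the matrix values and the placeholder policy are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Preference matrix: P[i, j] = probability that arm i is preferred to arm j.
P = np.array([[0.5, 0.6, 0.7],
              [0.4, 0.5, 0.6],
              [0.3, 0.4, 0.5]])

def preference_feedback(i, j):
    """The only information the learner ever sees: a noisy binary outcome
    of comparing arm i against arm j."""
    return int(rng.random() < P[i, j])

def run(policy, horizon=1000):
    """Generic dueling-bandit interaction loop."""
    history = []
    for t in range(horizon):
        i, j = policy(history)               # learner chooses a pair of arms
        outcome = preference_feedback(i, j)  # observes only relative feedback
        history.append((i, j, outcome))
    return history

# Placeholder policy: duel two distinct arms chosen uniformly at random.
def uniform_policy(history):
    i, j = rng.choice(len(P), size=2, replace=False)
    return i, j

history = run(uniform_policy)
```
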
2

Sui, Yanan, and Joel W. Burdick. "Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Spaces". In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/389.

Abstract:
We consider sequential decision making under uncertainty: optimization over a large decision space with noisy comparative feedback. This problem can be formulated as a K-armed Dueling Bandits problem where K is the total number of decisions. When K is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This paper studies the dueling bandits problem with a large number of dependent arms. Our problem is motivated by a clinical decision making process in a large decision space. We propose an efficient algorithm, CorrDuel, which makes decisions to simultaneously deliver effective therapy and explore the decision space. Many sequential decision making problems with large and structured decision spaces could be facilitated by our algorithm. After evaluating the fast convergence of CorrDuel in analysis and simulation experiments, we applied it to a live clinical trial of therapeutic spinal cord stimulation. It is the first algorithm applied to spinal cord injury treatments, and the experimental results show the effectiveness and efficiency of our algorithm.
3

Yue, Yisong, and Thorsten Joachims. "Interactively optimizing information retrieval systems as a dueling bandits problem". In Proceedings of the 26th Annual International Conference on Machine Learning. New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1553374.1553527.
