A selection of scholarly literature on the topic "Dueling Bandits Problem"

Author: Grafiati

Published: September 6, 2023

Contents

Journal articles
Conference papers

Browse lists of current articles, books, dissertations, conference papers, and other scholarly sources on the topic "Dueling Bandits Problem".

Journal articles on the topic "Dueling Bandits Problem"

1

Xu, Liyuan, Junya Honda, and Masashi Sugiyama. "Dueling Bandits with Qualitative Feedback." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 5549–56. http://dx.doi.org/10.1609/aaai.v33i01.33015549.

Abstract:

We formulate and study a novel multi-armed bandit problem called the qualitative dueling bandit (QDB) problem, where an agent observes not numeric but qualitative feedback by pulling each arm. We employ the same regret as the dueling bandit (DB) problem where the duel is carried out by comparing the qualitative feedback. Although we can naively use classic DB algorithms for solving the QDB problem, this reduction significantly worsens the performance; actually, in the QDB problem, the probability that one arm wins the duel over another arm can be directly estimated without carrying out actual duels. In this paper, we propose such direct algorithms for the QDB problem. Our theoretical analysis shows that the proposed algorithms significantly outperform DB algorithms by incorporating the qualitative feedback, and experimental results also demonstrate vast improvement over the existing DB algorithms.
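
The idea of estimating a win probability directly from qualitative feedback, without running duels, can be illustrated with a minimal Python sketch. It is not the algorithm from the paper: the function name `estimate_win_probability`, the use of integers for ordinal feedback levels, and the rule that a tie counts as half a win are all assumptions made for illustration.

```python
from collections import Counter
from itertools import product


def estimate_win_probability(samples_a, samples_b):
    """Estimate P(arm a beats arm b) from independent ordinal samples.

    Each sample is a comparable qualitative level (here an int).
    Assumption: a tie is counted as half a win for each arm.
    """
    counts_a = Counter(samples_a)
    counts_b = Counter(samples_b)
    total_pairs = len(samples_a) * len(samples_b)

    wins = 0.0
    # Compare every observed level of arm a with every observed level of arm b,
    # weighting each comparison by how often the two levels were seen.
    for (level_a, n_a), (level_b, n_b) in product(counts_a.items(), counts_b.items()):
        if level_a > level_b:
            wins += n_a * n_b
        elif level_a == level_b:
            wins += 0.5 * n_a * n_b
    return wins / total_pairs


if __name__ == "__main__":
    # Hypothetical feedback levels observed from pulling each arm separately.
    print(estimate_win_probability([3, 3, 2], [1, 2, 2]))
```

Because each arm is pulled on its own, every new sample for one arm updates its estimated win probability against all other arms at once, which is the intuition behind avoiding explicit duels.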

2

Yue, Yisong, Josef Broder, Robert Kleinberg, and Thorsten Joachims. "The K-armed dueling bandits problem." Journal of Computer and System Sciences 78, no. 5 (September 2012): 1538–56. http://dx.doi.org/10.1016/j.jcss.2011.12.028.

3

Haddenhorst, Björn, Viktor Bengs, and Eyke Hüllermeier. "On testing transitivity in online preference learning." Machine Learning 110, no. 8 (July 12, 2021): 2063–84. http://dx.doi.org/10.1007/s10994-021-06026-2.

Abstract:

The efficiency of state-of-the-art algorithms for the dueling bandits problem is essentially due to a clever exploitation of (stochastic) transitivity properties of pairwise comparisons: If one arm is likely to beat a second one, which in turn is likely to beat a third one, then the first is also likely to beat the third one. By now, however, there is no way to test the validity of corresponding assumptions, although this would be a key prerequisite to guarantee the meaningfulness of the results produced by an algorithm. In this paper, we investigate the problem of testing different forms of stochastic transitivity in an online manner. We derive lower bounds on the expected sample complexity of any sequential hypothesis testing algorithm for various forms of stochastic transitivity, thereby providing additional motivation to focus on weak stochastic transitivity. To this end, we introduce an algorithmic framework for the dueling bandits problem, in which the statistical validity of weak stochastic transitivity can be tested, either actively or passively, based on a multiple binomial hypothesis test. Moreover, by exploiting a connection between weak stochastic transitivity and graph theory, we suggest an enhancement to further improve the efficiency of the testing algorithm. In the active setting, both variants achieve an expected sample complexity that is optimal up to a logarithmic factor.
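
For orientation, the following Python sketch checks the definition of weak stochastic transitivity on a fully known preference matrix: if arm i beats j with probability at least 1/2 and j beats k with probability at least 1/2, then i must beat k with probability at least 1/2. This is only the definitional check, not the sequential hypothesis test studied in the paper, which works from noisy samples; the function name `satisfies_wst` and the example matrices are hypothetical.

```python
from itertools import permutations


def satisfies_wst(P):
    """Return True if the preference matrix P is weakly stochastically transitive.

    P[i][j] is the (assumed known) probability that arm i beats arm j.
    """
    n = len(P)
    for i, j, k in permutations(range(n), 3):
        # A violation: i is preferred to j, j to k, yet k is preferred to i.
        if P[i][j] >= 0.5 and P[j][k] >= 0.5 and P[i][k] < 0.5:
            return False
    return True


if __name__ == "__main__":
    # Hypothetical matrices: one with a total order, one with a preference cycle.
    transitive = [[0.5, 0.7, 0.8], [0.3, 0.5, 0.6], [0.2, 0.4, 0.5]]
    cyclic = [[0.5, 0.7, 0.3], [0.3, 0.5, 0.7], [0.7, 0.3, 0.5]]
    print(satisfies_wst(transitive), satisfies_wst(cyclic))
```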

4

Peköz, Erol, Sheldon M. Ross, and Zhengyu Zhang. "DUELING BANDIT PROBLEMS." Probability in the Engineering and Informational Sciences, November 20, 2020, 1–12. http://dx.doi.org/10.1017/s0269964820000601.

Abstract:

There is a set of n bandits and at every stage, two of the bandits are chosen to play a game, with the result of a game being learned. In the “weak regret problem,” we suppose there is a “best” bandit that wins each game it plays with probability at least p > 1/2, with the value of p being unknown. The objective is to choose bandits to maximize the number of times that one of the competitors is the best bandit. In the “strong regret problem”, we suppose that bandit i has unknown value v_i, i = 1, …, n, and that i beats j with probability v_i/(v_i + v_j). One version of strong regret is interested in maximizing the number of times that the contest is between the players with the two largest values. Another version supposes that at any stage, rather than choosing two arms to play a game, the decision maker can declare that a particular arm is the best, with the objective of maximizing the number of stages in which the arm with the largest value is declared to be the best. In the weak regret problem, we propose a policy and obtain an analytic bound on the expected number of stages over an infinite time frame that the best arm is not one of the competitors when this policy is employed. In the strong regret problem, we propose a Thompson sampling type algorithm and empirically compare its performance with others in the literature.
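
The win model from the strong regret problem, in which bandit i beats bandit j with probability v_i/(v_i + v_j), can be simulated with a short Python sketch. It only illustrates the comparison model and is not the policy or the Thompson sampling type algorithm from the paper; the function names, the example values, and the fixed random seed are assumptions.

```python
import random


def win_probability(values, i, j):
    """Probability that bandit i beats bandit j under the v_i / (v_i + v_j) model."""
    return values[i] / (values[i] + values[j])


def play_game(values, i, j, rng):
    """Simulate one game between bandits i and j and return the winner's index."""
    return i if rng.random() < win_probability(values, i, j) else j


if __name__ == "__main__":
    values = [1.0, 3.0, 2.0]  # hypothetical bandit values
    rng = random.Random(0)    # fixed seed so the run is repeatable
    games = 10_000
    wins = sum(play_game(values, 1, 0, rng) == 1 for _ in range(games))
    # The model predicts 3 / (3 + 1) = 0.75 for bandit 1 against bandit 0.
    print(win_probability(values, 1, 0), wins / games)
```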

Conference papers on the topic "Dueling Bandits Problem"

1

Sui, Yanan, Masrour Zoghi, Katja Hofmann, and Yisong Yue. "Advancements in Dueling Bandits." In Twenty-Seventh International Joint Conference on Artificial Intelligence IJCAI-18. California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/776.

Abstract:

The dueling bandits problem is an online learning framework where learning happens "on-the-fly" through preference feedback, i.e., from comparisons between a pair of actions. Unlike conventional online learning settings that require absolute feedback for each action, the dueling bandits framework assumes only the presence of (noisy) binary feedback about the relative quality of each pair of actions. The dueling bandits problem is well-suited for modeling settings that elicit subjective or implicit human feedback, which is typically more reliable in preference form. In this survey, we review recent results in the theories, algorithms, and applications of the dueling bandits problem. As an emerging domain, the theories and algorithms of dueling bandits have been intensively studied during the past few years. We provide an overview of recent advancements, including algorithmic advances and applications. We discuss extensions to standard problem formulation and novel application areas, highlighting key open research questions in our discussion.

2

Sui, Yanan, and Joel W. Burdick. "Correlational Dueling Bandits with Application to Clinical Treatment in Large Decision Spaces." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/389.

Abstract:

We consider sequential decision making under uncertainty, the optimization over large decision space with noisy comparative feedback. This problem can be formulated as a K-armed Dueling Bandits problem where K is the total number of decisions. When K is very large, existing dueling bandits algorithms suffer huge cumulative regret before converging on the optimal arm. This paper studies the dueling bandits problem with a large number of dependent arms. Our problem is motivated by a clinical decision making process in large decision space. We propose an efficient algorithm CorrDuel for the problem which makes decisions to simultaneously deliver effective therapy and explore the decision space. Many sequential decision making problems with large and structured decision space could be facilitated by our algorithm. After evaluating the fast convergence of CorrDuel in analysis and simulation experiments, we applied it on a live clinical trial of therapeutic spinal cord stimulation. It is the first applied algorithm towards spinal cord injury treatments and experimental results show the effectiveness and efficiency of our algorithm.

3

Yue, Yisong, and Thorsten Joachims. "Interactively optimizing information retrieval systems as a dueling bandits problem." In the 26th Annual International Conference. New York, New York, USA: ACM Press, 2009. http://dx.doi.org/10.1145/1553374.1553527.
