Готові списки джерел за темами / Adversarial bandits

Зміст

Статті в журналах
Дисертації
Книги
Частини книг
Тези доповідей конференцій

Добірка наукової літератури з теми "Adversarial bandits"

Автор: Grafiati

Опубліковано: 8 березня 2025

Оформте джерело за APA, MLA, Chicago, Harvard та іншими стилями

Оберіть тип джерела:

Ознайомтеся зі списками актуальних статей, книг, дисертацій, тез та інших наукових джерел на тему "Adversarial bandits".

Біля кожної праці в переліку літератури доступна кнопка «Додати до бібліографії». Скористайтеся нею – і ми автоматично оформимо бібліографічне посилання на обрану працю в потрібному вам стилі цитування: APA, MLA, «Гарвард», «Чикаго», «Ванкувер» тощо.

Також ви можете завантажити повний текст наукової публікації у форматі «.pdf» та прочитати онлайн анотацію до роботи, якщо відповідні параметри наявні в метаданих.

Статті в журналах з теми "Adversarial bandits"

Lu, Shiyin, Guanghui Wang, and Lijun Zhang. "Stochastic Graphical Bandits with Adversarial Corruptions." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 8749–57. http://dx.doi.org/10.1609/aaai.v35i10.17060.

Повний текст джерела

Анотація:

We study bandits with graph-structured feedback, where a learner repeatedly selects an arm and then observes rewards of the chosen arm as well as its neighbors in the feedback graph. Existing work on graphical bandits assumes either stochastic rewards or adversarial rewards, both of which are extremes and appear rarely in real-world scenarios. In this paper, we study graphical bandits with a reward model that interpolates between the two extremes, where the rewards are overall stochastically generated but a small fraction of them can be adversarially corrupted. For this problem, we propose an online algorithm that can utilize the stochastic pattern and also tolerate the adversarial corruptions. The main idea is to restrict exploration to carefully-designed independent sets of the feedback graph and perform exploitation by adopting a soft version of arm elimination. Theoretical analysis shows that our algorithm attains an $O(\alpha \ln{K} \ln{T} + \alpha C)$ regret, where $\alpha$ is the independence number of the feedback graph, $K$ is the number of arms, $T$ is the time horizon, and $C$ quantifies the total corruptions introduced by the adversary. The effectiveness of our algorithm is demonstrated by numerical experiments.

Стилі APA, Harvard, Vancouver, ISO та ін.

Pacchiano, Aldo, Heinrich Jiang, and Michael I. Jordan. "Robustness Guarantees for Mode Estimation with an Application to Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 9277–84. http://dx.doi.org/10.1609/aaai.v35i10.17119.

Повний текст джерела

Анотація:

Mode estimation is a classical problem in statistics with a wide range of applications in machine learning. Despite this, there is little understanding in its robustness properties under possibly adversarial data contamination. In this paper, we give precise robustness guarantees as well as privacy guarantees under simple randomization. We then introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the mean. We prove regret guarantees for the problems of top arm identification, top m-arms identification, contextual modal bandits, and infinite continuous arms top arm recovery. We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences, thus rendering modal bandits an attractive choice in situations where the rewards may have outliers or adversarial corruptions.

Стилі APA, Harvard, Vancouver, ISO та ін.

Wang, Zhiwei, Huazheng Wang, and Hongning Wang. "Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (March 24, 2024): 15770–77. http://dx.doi.org/10.1609/aaai.v38i14.29506.

Повний текст джерела

Анотація:

Adversarial attacks against stochastic multi-armed bandit (MAB) algorithms have been extensively studied in the literature. In this work, we focus on reward poisoning attacks and find most existing attacks can be easily detected by our proposed detection method based on the test of homogeneity, due to their aggressive nature in reward manipulations. This motivates us to study the notion of stealthy attack against stochastic MABs and investigate the resulting attackability. Our analysis shows that against two popularly employed MAB algorithms, UCB1 and $\epsilon$-greedy, the success of a stealthy attack depends on the environmental conditions and the realized reward of the arm pulled in the first round. We also analyze the situation for general MAB algorithms equipped with our attack detection method and find that it is possible to have a stealthy attack that almost always succeeds. This brings new insights into the security risks of MAB algorithms.

Стилі APA, Harvard, Vancouver, ISO та ін.

Esfandiari, Hossein, Amin Karbasi, Abbas Mehrabian, and Vahab Mirrokni. "Regret Bounds for Batched Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 7340–48. http://dx.doi.org/10.1609/aaai.v35i8.16901.

Повний текст джерела

Анотація:

We present simple algorithms for batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve and extend the best known regret bounds of Gao, Han, Ren, and Zhou (NeurIPS 2019), for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and provide the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.

Стилі APA, Harvard, Vancouver, ISO та ін.

Chen, Cheng, Canzhe Zhao, and Shuai Li. "Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (June 28, 2022): 6202–10. http://dx.doi.org/10.1609/aaai.v36i6.20569.

Повний текст джерела

Анотація:

Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection based on certain click models that describe users' click behaviors. Most recent works for this problem focus on the stochastic environment where the item attractiveness is assumed to be invariant during the learning process. In many real-world scenarios, however, the environment could be dynamic or even arbitrarily changing. This work studies the OLTR problem in both stochastic and adversarial environments under the position-based model (PBM). We propose a method based on the follow-the-regularized-leader (FTRL) framework with Tsallis entropy and develop a new self-bounding constraint especially designed for PBM. We prove the proposed algorithm simultaneously achieves O(log T) regret in the stochastic environment and O(m√nT) regret in the adversarial environment, where T is the number of rounds, n is the number of items and m is the number of positions. We also provide a lower bound of order Ω(m√nT) for adversarial PBM, which matches our upper bound and improves over the state-of-the-art lower bound. The experiments show that our algorithm could simultaneously learn in both stochastic and adversarial environments and is competitive compared to existing methods that are designed for a single environment.

Стилі APA, Harvard, Vancouver, ISO та ін.

Wang, Lingda, Bingcong Li, Huozhi Zhou, Georgios B. Giannakis, Lav R. Varshney, and Zhizhen Zhao. "Adversarial Linear Contextual Bandits with Graph-Structured Side Observations." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 11 (May 18, 2021): 10156–64. http://dx.doi.org/10.1609/aaai.v35i11.17218.

Повний текст джерела

Анотація:

This paper studies the adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverage two categories of the most common side information: contexts and side observations. In this setting, a learning agent repeatedly chooses from a set of K actions after being presented with a d-dimensional context vector. The agent not only incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures, which are encoded as a series of feedback graphs. This setting models a variety of applications in social networks, where both contexts and graph-structured side observations are available. Two efficient algorithms are developed based on EXP3. Under mild conditions, our analysis shows that for undirected feedback graphs the first algorithm, EXP3-LGC-U, achieves a sub-linear regret with respect to the time horizon and the average independence number of the feedback graphs. A slightly weaker result is presented for the directed graph setting as well. The second algorithm, EXP3-LGC-IX, is developed for a special class of problems, for which the regret is the same for both directed as well as undirected feedback graphs. Numerical tests corroborate the efficiency of proposed algorithms.

Стилі APA, Harvard, Vancouver, ISO та ін.

Wachel, Pawel, and Cristian Rojas. "An Adversarial Approach to Adaptive Model Predictive Control." Journal of Advances in Applied & Computational Mathematics 9 (September 19, 2022): 135–46. http://dx.doi.org/10.15377/2409-5761.2022.09.10.

Повний текст джерела

Анотація:

This paper presents a novel approach to introducing adaptation in Model Predictive Control (MPC). Assuming limited a priori knowledge about the process, we consider a finite set of possible models (a dictionary), and use the theory of adversarial multi-armed bandits to develop an adaptive version of MPC called adversarial adaptive MPC (AAMPC). Under weak assumptions on the dictionary components, we then establish theoretical bounds on the performance of AAMPC and show its empirical behaviour via simulation examples.

Стилі APA, Harvard, Vancouver, ISO та ін.

Xu, Xiao, and Qing Zhao. "Memory-Constrained No-Regret Learning in Adversarial Multi-Armed Bandits." IEEE Transactions on Signal Processing 69 (2021): 2371–82. http://dx.doi.org/10.1109/tsp.2021.3070201.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Shi, Chengshuai, and Cong Shen. "On No-Sensing Adversarial Multi-Player Multi-Armed Bandits With Collision Communications." IEEE Journal on Selected Areas in Information Theory 2, no. 2 (June 2021): 515–33. http://dx.doi.org/10.1109/jsait.2021.3076027.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Tae, Ki Hyun, Hantian Zhang, Jaeyoung Park, Kexin Rong, and Steven Euijong Whang. "Falcon: Fair Active Learning Using Multi-Armed Bandits." Proceedings of the VLDB Endowment 17, no. 5 (January 2024): 952–65. http://dx.doi.org/10.14778/3641204.3641207.

Повний текст джерела

Анотація:

Biased data can lead to unfair machine learning models, highlighting the importance of embedding fairness at the beginning of data analysis, particularly during dataset curation and labeling. In response, we propose Falcon, a scalable fair active learning framework. Falcon adopts a data-centric approach that improves machine learning model fairness via strategic sample selection. Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness. However, a challenge arises since these target groups are defined using ground truth labels that are not available during sample selection. To handle this, we propose a novel trial-and-error method, where we postpone using a sample if the predicted label is different from the expected one and falls outside the target group. We also observe the trade-off that selecting more informative samples results in higher likelihood of postponing due to undesired label prediction, and the optimal balance varies per dataset. We capture the trade-off between informativeness and postpone rate as policies and propose to automatically select the best policy using adversarial multi-armed bandit methods, given their computational efficiency and theoretical guarantees. Experiments show that Falcon significantly outperforms existing fair active learning approaches in terms of fairness and accuracy and is more efficient. In particular, only Falcon supports a proper trade-off between accuracy and fairness where its maximum fairness score is 1.8--4.5x higher than the second-best results.

Стилі APA, Harvard, Vancouver, ISO та ін.

Більше джерел

Дисертації з теми "Adversarial bandits"

Maillard, Odalric-Ambrym. "APPRENTISSAGE SÉQUENTIEL : Bandits, Statistique et Renforcement." Phd thesis, Université des Sciences et Technologie de Lille - Lille I, 2011. http://tel.archives-ouvertes.fr/tel-00845410.

Повний текст джерела

Анотація:

Cette thèse traite des domaines suivant en Apprentissage Automatique: la théorie des Bandits, l'Apprentissage statistique et l'Apprentissage par renforcement. Son fil rouge est l'étude de plusieurs notions d'adaptation, d'un point de vue non asymptotique : à un environnement ou à un adversaire dans la partie I, à la structure d'un signal dans la partie II, à la structure de récompenses ou à un modèle des états du monde dans la partie III. Tout d'abord nous dérivons une analyse non asymptotique d'un algorithme de bandit à plusieurs bras utilisant la divergence de Kullback-Leibler. Celle-ci permet d'atteindre, dans le cas de distributions à support fini, la borne inférieure de performance asymptotique dépendante des distributions de probabilité connue pour ce problème. Puis, pour un bandit avec un adversaire possiblement adaptatif, nous introduisons des modèles dépendants de l'histoire et traduisant une possible faiblesse de l'adversaire et montrons comment en tirer parti pour concevoir des algorithmes adaptatifs à cette faiblesse. Nous contribuons au problème de la régression en montrant l'utilité des projections aléatoires, à la fois sur le plan théorique et pratique, lorsque l'espace d'hypothèses considéré est de dimension grande, voire infinie. Nous utilisons également des opérateurs d'échantillonnage aléatoires dans le cadre de la reconstruction parcimonieuse lorsque la base est loin d'être orthogonale. Enfin, nous combinons la partie I et II : pour fournir une analyse non-asymptotique d'algorithmes d'apprentissage par renforcement; puis, en amont du cadre des Processus Décisionnel de Markov, pour discuter du problème pratique du choix d'un bon modèle d'états.

Стилі APA, Harvard, Vancouver, ISO та ін.

Aubert, Julien. "Théorie de l'estimation pour les processus d'apprentissage." Electronic Thesis or Diss., Université Côte d'Azur, 2025. http://www.theses.fr/2025COAZ5001.

Повний текст джерела

Анотація:

Cette thèse examine le problème de l'estimation du processus d'apprentissage d'un individu au cours d'une tâche à partir des actions qu'il effectue. Cette question se situe à l'intersection de la cognition, des statistiques et de l'apprentissage par renforcement, et implique le développement de modèles qui capturent avec précision la dynamique de l'apprentissage, l'estimation des paramètres des modèles et la sélection du modèle le mieux adapté. L'une des principales difficultés réside dans le fait que l'apprentissage, par sa nature même, conduit à des données non indépendantes et non stationnaires, puisque l'individu ajuste son processus de décision en fonction du résultat de ses choix précédents. Les théories et méthodes statistiques existantes sont bien établies pour les données indépendantes et stationnaires, mais leur application à un cadre d'apprentissage introduit des défis inédits. L'enjeu de cette thèse est de développer une théorie statistique capable d'expliquer et de justifier les méthodes empiriques existantes. Je commence par explorer les propriétés de l'estimateur du maximum de vraisemblance sur un modèle d'apprentissage fondé sur un problème de bandit. Je présente ensuite des résultats théoriques généraux sur la sélection de modèles à partir d'un critère de log-vraisemblance pénalisée pour des données non stationnaires et dépendantes. Ces résultats nécessitent le développement d'une nouvelle inégalité de concentration pour des suprema de processus renormalisés. Je présente également une procédure de hold-out et des garanties théoriques pour cette procédure dans un cadre d'apprentissage. Ces résultats théoriques sont étayés par des applications sur des données synthétiques et sur des expériences cognitives réelles en psychologie et en éthologie
This thesis considers the problem of estimating the learning process of an individual during a task based on observed choices or actions of that individual. This question lies at the intersection of cognition, statistics, and reinforcement learning, and involves developing models that accurately capture the dynamics of learning, estimating model parameters, and selecting the best-fitting model. A key difficulty is that learning, by nature, leads to non-independent and non-stationary data, as the individual selects its actions depending on the outcome of its previous choices.Existing statistical theories and methods are well-established for independent and stationary data, but their application to a learning framework introduces significant challenges. This thesis seeks to bridge the gap between empirical methods and theoretical guarantees in computational modeling. I first explore the properties of maximum likelihood estimation on a model of learning based on a bandit problem. I then present general theoretical results on penalized log-likelihood model selection for non-stationary and dependent data, for which I develop a new concentration inequality for the suprema of renormalized processes. I also introduce a hold-out procedure and theoretical guarantees for it in a learning framework. These theoretical results are supported with applications on synthetic data and on real cognitive experiments in psychology and ethology

Стилі APA, Harvard, Vancouver, ISO та ін.

Книги з теми "Adversarial bandits"

Parsons, Dave. Bandits!: Pictorial history of American adversarial aircraft. Osceola, WI: Motorbooks International, 1993.

Знайти повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Nelson, Derek, and Dave Parsons. Bandits!: Pictorial History of American Adversarial Aircraft. Motorbooks Intl, 1993.

Знайти повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Частини книг з теми "Adversarial bandits"

Li, Yandi, and Jianxiong Guo. "A Modified EXP3 in Adversarial Bandits with Multi-user Delayed Feedback." In Lecture Notes in Computer Science, 263–78. Cham: Springer Nature Switzerland, 2023. http://dx.doi.org/10.1007/978-3-031-49193-1_20.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Zheng, Rong, and Cunqing Hua. "Adversarial Multi-armed Bandit." In Wireless Networks, 41–57. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-50502-2_4.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

St-Pierre, David L., and Olivier Teytaud. "Sharing Information in Adversarial Bandit." In Applications of Evolutionary Computation, 386–98. Berlin, Heidelberg: Springer Berlin Heidelberg, 2014. http://dx.doi.org/10.1007/978-3-662-45523-4_32.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Uchiya, Taishi, Atsuyoshi Nakamura, and Mineichi Kudo. "Algorithms for Adversarial Bandit Problems with Multiple Plays." In Lecture Notes in Computer Science, 375–89. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010. http://dx.doi.org/10.1007/978-3-642-16108-7_30.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Lee, Chia-Jung, Yalei Yang, Sheng-Hui Meng, and Tien-Wen Sung. "Adversarial Multiarmed Bandit Problems in Gradually Evolving Worlds." In Advances in Smart Vehicular Technology, Transportation, Communication and Applications, 305–11. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-70730-3_36.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

"Exp3 for Adversarial Linear Bandits." In Bandit Algorithms, 278–85. Cambridge University Press, 2020. http://dx.doi.org/10.1017/9781108571401.034.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

"The Relation between Adversarial and Stochastic Linear Bandits." In Bandit Algorithms, 306–12. Cambridge University Press, 2020. http://dx.doi.org/10.1017/9781108571401.036.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Srisawad, Phurinut, Juergen Branke, and Long Tran-Thanh. "Identifying the Best Arm in the Presence of Global Environment Shifts." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2024. http://dx.doi.org/10.3233/faia240735.

Повний текст джерела

Анотація:

This paper formulates a new Best-Arm Identification problem in the non-stationary stochastic bandits setting, where the means of all arms are shifted in the same way due to a global influence of the environment. The aim is to identify the unique best arm across environmental change given a fixed total budget. While this setting can be regarded as a special case of Adversarial Bandits or Corrupted Bandits, we demonstrate that existing solutions tailored to those settings do not fully utilise the nature of this global influence, and thus, do not work well in practice (despite their theoretical guarantees). To overcome this issue, in this paper we develop a novel selection policy that is consistent and robust in dealing with global environmental shifts. We then propose an allocation policy, LinLUCB, which exploits information about global shifts across all arms in each environment. Empirical tests depict a significant improvement in our policies against other existing methods.

Стилі APA, Harvard, Vancouver, ISO та ін.

Wissow, Stephen, and Masataro Asai. "Scale-Adaptive Balancing of Exploration and Exploitation in Classical Planning." In Frontiers in Artificial Intelligence and Applications. IOS Press, 2024. http://dx.doi.org/10.3233/faia240994.

Повний текст джерела

Анотація:

Balancing exploration and exploitation has been an important problem in both adversarial games and automated planning. While it has been extensively analyzed in the Multi-Armed Bandit (MAB) literature, and the game community has achieved great success with MAB-based Monte Carlo Tree Search (MCTS) methods, the planning community has struggled to advance in this area. We describe how Upper Confidence Bound 1’s (UCB1’s) assumption of reward distributions with known bounded support shared among siblings (arms) is violated when MCTS/Trial-based Heuristic Tree Search (THTS) in previous work uses heuristic values of search nodes in classical planning problems as rewards. To address this issue, we propose a new Gaussian bandit, UCB1-Normal2, and analyze its regret bound. It is variance-aware like UCB1-Normal and UCB-V, but has a distinct advantage: it neither shares UCB-V’s assumption of known bounded support nor relies on UCB1-Normal’s conjectures on Student’s t and χ2 distributions. Our theoretical analysis predicts that UCB1-Normal2 will perform well when the estimated variance is accurate, which can be expected in deterministic, discrete, finite state-space search, as in classical planning. Our empirical evaluation confirms that MCTS combined with UCB1-Normal2 outperforms Greedy Best First Search (traditional baseline) as well as MCTS with other bandits.

Стилі APA, Harvard, Vancouver, ISO та ін.

Тези доповідей конференцій з теми "Adversarial bandits"

Huang, Yin, Qingsong Liu, and Jie Xu. "Adversarial Combinatorial Bandits with Switching Cost and Arm Selection Constraints." In IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, 371–80. IEEE, 2024. http://dx.doi.org/10.1109/infocom52122.2024.10621364.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Li, Jinpeng, Yunni Xia, Xiaoning Sun, Peng Chen, Xiaobo Li, and Jiafeng Feng. "Delay-Aware Service Caching in Edge Cloud: An Adversarial Semi-Bandits Learning-Based Approach." In 2024 IEEE 17th International Conference on Cloud Computing (CLOUD), 411–18. IEEE, 2024. http://dx.doi.org/10.1109/cloud62652.2024.00053.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

La-aiddee, Panithan, Paramin Sangwongngam, Lunchakorn Wuttisittikulkij, and Pisit Vanichchanunt. "A Generative Adversarial Network-Based Approach for Reflective-Metasurface Unit-Cell Synthesis in mmWave Bands." In 2024 International Technical Conference on Circuits/Systems, Computers, and Communications (ITC-CSCC), 1–5. IEEE, 2024. http://dx.doi.org/10.1109/itc-cscc62988.2024.10628337.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Immorlica, Nicole, Karthik Abinav Sankararaman, Robert Schapire, and Aleksandrs Slivkins. "Adversarial Bandits with Knapsacks." In 2019 IEEE 60th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 2019. http://dx.doi.org/10.1109/focs.2019.00022.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Lykouris, Thodoris, Vahab Mirrokni, and Renato Paes Leme. "Stochastic bandits robust to adversarial corruptions." In STOC '18: Symposium on Theory of Computing. New York, NY, USA: ACM, 2018. http://dx.doi.org/10.1145/3188745.3188918.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Wan, Zongqi, Xiaoming Sun, and Jialin Zhang. "Bounded Memory Adversarial Bandits with Composite Anonymous Delayed Feedback." In Thirty-First International Joint Conference on Artificial Intelligence {IJCAI-22}. California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/486.

Повний текст джерела

Анотація:

We study the adversarial bandit problem with composite anonymous delayed feedback. In this setting, losses of an action are split into d components, spreading over consecutive rounds after the action is chosen. And in each round, the algorithm observes the aggregation of losses that come from the latest d rounds. Previous works focus on oblivious adversarial setting, while we investigate the harder nonoblivious setting. We show nonoblivious setting incurs Omega(T) pseudo regret even when the loss sequence is bounded memory. However, we propose a wrapper algorithm which enjoys o(T) policy regret on many adversarial bandit problems with the assumption that the loss sequence is bounded memory. Especially, for K armed bandit and bandit convex optimization, our policy regret bound is in the order of T to the two third. We also prove a matching lower bound for K armed bandit. Our lower bound works even when the loss sequence is oblivious but the delay is nonoblivious. It answers the open problem proposed in [Wang, Wang, Huang 2021], showing that nonoblivious delay is enough to incur the regret in the order of T to the two third.

Стилі APA, Harvard, Vancouver, ISO та ін.

Bande, Meghana, and Venugopal V. Veeravalli. "Adversarial Multi-user Bandits for Uncoordinated Spectrum Access." In ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2019. http://dx.doi.org/10.1109/icassp.2019.8682263.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Han, Shuguang, Michael Bendersky, Przemek Gajda, Sergey Novikov, Marc Najork, Bernhard Brodowsky, and Alexandrin Popescul. "Adversarial Bandits Policy for Crawling Commercial Web Content." In WWW '20: The Web Conference 2020. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3366423.3380125.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Howard, William W., Anthony F. Martone, and R. Michael Buehrer. "Adversarial Multi-Player Bandits for Cognitive Radar Networks." In 2022 IEEE Radar Conference (RadarConf22). IEEE, 2022. http://dx.doi.org/10.1109/radarconf2248738.2022.9764226.

Повний текст джерела

Стилі APA, Harvard, Vancouver, ISO та ін.

Rangi, Anshuka, Massimo Franceschetti, and Long Tran-Thanh. "Unifying the Stochastic and the Adversarial Bandits with Knapsack." In Twenty-Eighth International Joint Conference on Artificial Intelligence {IJCAI-19}. California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/459.

Повний текст джерела

Анотація:

This work investigates the adversarial Bandits with Knapsack (BwK) learning problem, where a player repeatedly chooses to perform an action, pays the corresponding cost of the action, and receives a reward associated with the action. The player is constrained by the maximum budget that can be spent to perform the actions, and the rewards and the costs of these actions are assigned by an adversary. This setting is studied in terms of expected regret, defined as the difference between the total expected rewards per unit cost corresponding the best fixed action and the total expected rewards per unit cost of the learning algorithm. We propose a novel algorithm EXP3.BwK and show that the expected regret of the algorithm is order optimal in the budget. We then propose another algorithm EXP3++.BwK, which is order optimal in the adversarial BwK setting, and incurs an almost optimal expected regret in the stochastic BwK setting where the rewards and the costs are drawn from unknown underlying distributions. These results are then extended to a more general online learning setting, by designing another algorithm EXP3++.LwK and providing its performance guarantees. Finally, we investigate the scenario where the costs of the actions are large and comparable to the budget. We show that for the adversarial setting, the achievable regret bounds scale at least linearly with the maximum cost for any learning algorithm, and are significantly worse in comparison to the case of having costs bounded by a constant, which is a common assumption in the BwK literature.

Стилі APA, Harvard, Vancouver, ISO та ін.

Ми пропонуємо знижки на всі преміум-плани для авторів, чиї праці увійшли до тематичних добірок літератури. Зв'яжіться з нами, щоб отримати унікальний промокод!