
Journal articles on the topic 'Adversarial bandits'


Consult the top 50 journal articles for your research on the topic 'Adversarial bandits.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Press it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Lu, Shiyin, Guanghui Wang, and Lijun Zhang. "Stochastic Graphical Bandits with Adversarial Corruptions." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 8749–57. http://dx.doi.org/10.1609/aaai.v35i10.17060.

Abstract:
We study bandits with graph-structured feedback, where a learner repeatedly selects an arm and then observes rewards of the chosen arm as well as its neighbors in the feedback graph. Existing work on graphical bandits assumes either stochastic rewards or adversarial rewards, both of which are extremes and appear rarely in real-world scenarios. In this paper, we study graphical bandits with a reward model that interpolates between the two extremes, where the rewards are overall stochastically generated but a small fraction of them can be adversarially corrupted. For this problem, we propose an online algorithm that can utilize the stochastic pattern and also tolerate the adversarial corruptions. The main idea is to restrict exploration to carefully-designed independent sets of the feedback graph and perform exploitation by adopting a soft version of arm elimination. Theoretical analysis shows that our algorithm attains an $O(\alpha \ln{K} \ln{T} + \alpha C)$ regret, where $\alpha$ is the independence number of the feedback graph, $K$ is the number of arms, $T$ is the time horizon, and $C$ quantifies the total corruptions introduced by the adversary. The effectiveness of our algorithm is demonstrated by numerical experiments.
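For readers who want a concrete starting point, the following is a minimal, generic sketch of elimination-style exploitation with a confidence radius widened by an assumed corruption budget. It is not the authors' algorithm: it ignores the feedback graph and the independent-set exploration described in the abstract, and the function name, the corruption_budget parameter, and the reward_fn callback are illustrative assumptions.

import numpy as np

def elimination_with_corruption_slack(reward_fn, K, T, corruption_budget=0.0, delta=0.05):
    # Round-robin over the arms that are still "active" and eliminate an arm once its
    # empirical mean falls below the best mean by more than twice a confidence radius
    # that includes a slack term for the assumed corruption.
    active = list(range(K))
    counts, sums = np.zeros(K), np.zeros(K)
    for t in range(T):
        arm = active[t % len(active)]
        sums[arm] += reward_fn(arm, t)          # rewards assumed to lie in [0, 1]
        counts[arm] += 1
        if counts[active].min() == 0:
            continue                            # start eliminating only once every active arm was pulled
        n_min = counts[active].min()
        means = sums / np.maximum(counts, 1)
        radius = np.sqrt(np.log(2 * K * T / delta) / (2 * n_min)) + corruption_budget / n_min
        best_mean = max(means[a] for a in active)
        active = [a for a in active if means[a] + 2 * radius >= best_mean]
    return max(range(K), key=lambda a: sums[a] / max(counts[a], 1))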
2

Pacchiano, Aldo, Heinrich Jiang, and Michael I. Jordan. "Robustness Guarantees for Mode Estimation with an Application to Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (May 18, 2021): 9277–84. http://dx.doi.org/10.1609/aaai.v35i10.17119.

Abstract:
Mode estimation is a classical problem in statistics with a wide range of applications in machine learning. Despite this, there is little understanding of its robustness properties under possibly adversarial data contamination. In this paper, we give precise robustness guarantees as well as privacy guarantees under simple randomization. We then introduce a theory for multi-armed bandits where the values are the modes of the reward distributions instead of the means. We prove regret guarantees for the problems of top arm identification, top m-arms identification, contextual modal bandits, and infinite continuous arms top arm recovery. We show in simulations that our algorithms are robust to perturbation of the arms by adversarial noise sequences, thus rendering modal bandits an attractive choice in situations where the rewards may have outliers or adversarial corruptions.
3

Wang, Zhiwei, Huazheng Wang, and Hongning Wang. "Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 14 (March 24, 2024): 15770–77. http://dx.doi.org/10.1609/aaai.v38i14.29506.

Abstract:
Adversarial attacks against stochastic multi-armed bandit (MAB) algorithms have been extensively studied in the literature. In this work, we focus on reward poisoning attacks and find most existing attacks can be easily detected by our proposed detection method based on the test of homogeneity, due to their aggressive nature in reward manipulations. This motivates us to study the notion of stealthy attack against stochastic MABs and investigate the resulting attackability. Our analysis shows that against two popularly employed MAB algorithms, UCB1 and $\epsilon$-greedy, the success of a stealthy attack depends on the environmental conditions and the realized reward of the arm pulled in the first round. We also analyze the situation for general MAB algorithms equipped with our attack detection method and find that it is possible to have a stealthy attack that almost always succeeds. This brings new insights into the security risks of MAB algorithms.
4

Esfandiari, Hossein, Amin Karbasi, Abbas Mehrabian, and Vahab Mirrokni. "Regret Bounds for Batched Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 7340–48. http://dx.doi.org/10.1609/aaai.v35i8.16901.

Abstract:
We present simple algorithms for batched stochastic multi-armed bandit and batched stochastic linear bandit problems. We prove bounds for their expected regrets that improve and extend the best known regret bounds of Gao, Han, Ren, and Zhou (NeurIPS 2019), for any number of batches. In particular, our algorithms in both settings achieve the optimal expected regrets by using only a logarithmic number of batches. We also study the batched adversarial multi-armed bandit problem for the first time and provide the optimal regret, up to logarithmic factors, of any algorithm with predetermined batch sizes.
5

Chen, Cheng, Canzhe Zhao, and Shuai Li. "Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 6 (June 28, 2022): 6202–10. http://dx.doi.org/10.1609/aaai.v36i6.20569.

Abstract:
Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection based on certain click models that describe users' click behaviors. Most recent works for this problem focus on the stochastic environment where the item attractiveness is assumed to be invariant during the learning process. In many real-world scenarios, however, the environment could be dynamic or even arbitrarily changing. This work studies the OLTR problem in both stochastic and adversarial environments under the position-based model (PBM). We propose a method based on the follow-the-regularized-leader (FTRL) framework with Tsallis entropy and develop a new self-bounding constraint especially designed for PBM. We prove the proposed algorithm simultaneously achieves O(log T) regret in the stochastic environment and O(m√nT) regret in the adversarial environment, where T is the number of rounds, n is the number of items and m is the number of positions. We also provide a lower bound of order Ω(m√nT) for adversarial PBM, which matches our upper bound and improves over the state-of-the-art lower bound. The experiments show that our algorithm could simultaneously learn in both stochastic and adversarial environments and is competitive compared to existing methods that are designed for a single environment.
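As background for the FTRL-with-Tsallis-entropy approach mentioned above, here is a minimal sketch of a 1/2-Tsallis-entropy FTRL (Tsallis-INF-style) learner for a plain K-armed bandit, solving the FTRL step by bisection on the normalization constant. It is only the generic best-of-both-worlds baseline, not the position-based-model ranking algorithm of the paper; loss_fn, the learning-rate schedule, and all constants are illustrative assumptions.

import numpy as np

def tsallis_inf(loss_fn, K, T, seed=0):
    # FTRL with the 1/2-Tsallis entropy regularizer: each round the played distribution
    # satisfies p_i = (eta * (L_hat_i + lam))**(-2), with lam found by bisection so the
    # probabilities sum to one.
    rng = np.random.default_rng(seed)
    L_hat = np.zeros(K)                          # cumulative importance-weighted loss estimates
    for t in range(1, T + 1):
        eta = 2.0 / np.sqrt(t)                   # time-varying learning rate
        lo, hi = -L_hat.min() + 1e-12, -L_hat.min() + K / eta
        for _ in range(60):                      # bisection: sum of p_i decreases in lam
            lam = 0.5 * (lo + hi)
            p = (eta * (L_hat + lam)) ** -2.0
            if p.sum() > 1.0:
                lo = lam
            else:
                hi = lam
        p = p / p.sum()                          # remove residual normalization error
        arm = rng.choice(K, p=p)
        loss = loss_fn(arm, t)                   # observed loss, assumed in [0, 1]
        L_hat[arm] += loss / p[arm]              # importance weighting
    return L_hat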
6

Wang, Lingda, Bingcong Li, Huozhi Zhou, Georgios B. Giannakis, Lav R. Varshney, and Zhizhen Zhao. "Adversarial Linear Contextual Bandits with Graph-Structured Side Observations." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 11 (May 18, 2021): 10156–64. http://dx.doi.org/10.1609/aaai.v35i11.17218.

Abstract:
This paper studies the adversarial graphical contextual bandits, a variant of adversarial multi-armed bandits that leverage two categories of the most common side information: contexts and side observations. In this setting, a learning agent repeatedly chooses from a set of K actions after being presented with a d-dimensional context vector. The agent not only incurs and observes the loss of the chosen action, but also observes the losses of its neighboring actions in the observation structures, which are encoded as a series of feedback graphs. This setting models a variety of applications in social networks, where both contexts and graph-structured side observations are available. Two efficient algorithms are developed based on EXP3. Under mild conditions, our analysis shows that for undirected feedback graphs the first algorithm, EXP3-LGC-U, achieves a sub-linear regret with respect to the time horizon and the average independence number of the feedback graphs. A slightly weaker result is presented for the directed graph setting as well. The second algorithm, EXP3-LGC-IX, is developed for a special class of problems, for which the regret is the same for both directed as well as undirected feedback graphs. Numerical tests corroborate the efficiency of proposed algorithms.
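Both algorithms in this entry are described as building on EXP3; the sketch below shows only that shared EXP3 core (exponential weights with uniform exploration and importance-weighted reward estimates) for a plain adversarial K-armed bandit, without the contexts or graph side-observations that EXP3-LGC-U and EXP3-LGC-IX add. reward_fn and the fixed exploration rate gamma are illustrative assumptions.

import numpy as np

def exp3(reward_fn, K, T, gamma=0.1, seed=0):
    # Exponential weights over arms; rewards assumed in [0, 1].
    rng = np.random.default_rng(seed)
    weights = np.ones(K)
    for t in range(T):
        probs = (1.0 - gamma) * weights / weights.sum() + gamma / K
        arm = rng.choice(K, p=probs)
        reward = reward_fn(arm, t)
        weights[arm] *= np.exp(gamma * reward / (probs[arm] * K))
        weights /= weights.max()                 # rescale to avoid numerical overflow
    return weights / weights.sum()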
7

Wachel, Pawel, and Cristian Rojas. "An Adversarial Approach to Adaptive Model Predictive Control." Journal of Advances in Applied & Computational Mathematics 9 (September 19, 2022): 135–46. http://dx.doi.org/10.15377/2409-5761.2022.09.10.

Abstract:
This paper presents a novel approach to introducing adaptation in Model Predictive Control (MPC). Assuming limited a priori knowledge about the process, we consider a finite set of possible models (a dictionary), and use the theory of adversarial multi-armed bandits to develop an adaptive version of MPC called adversarial adaptive MPC (AAMPC). Under weak assumptions on the dictionary components, we then establish theoretical bounds on the performance of AAMPC and show its empirical behaviour via simulation examples.
8

Xu, Xiao, and Qing Zhao. "Memory-Constrained No-Regret Learning in Adversarial Multi-Armed Bandits." IEEE Transactions on Signal Processing 69 (2021): 2371–82. http://dx.doi.org/10.1109/tsp.2021.3070201.

9

Shi, Chengshuai, and Cong Shen. "On No-Sensing Adversarial Multi-Player Multi-Armed Bandits With Collision Communications." IEEE Journal on Selected Areas in Information Theory 2, no. 2 (June 2021): 515–33. http://dx.doi.org/10.1109/jsait.2021.3076027.

10

Tae, Ki Hyun, Hantian Zhang, Jaeyoung Park, Kexin Rong, and Steven Euijong Whang. "Falcon: Fair Active Learning Using Multi-Armed Bandits." Proceedings of the VLDB Endowment 17, no. 5 (January 2024): 952–65. http://dx.doi.org/10.14778/3641204.3641207.

Abstract:
Biased data can lead to unfair machine learning models, highlighting the importance of embedding fairness at the beginning of data analysis, particularly during dataset curation and labeling. In response, we propose Falcon, a scalable fair active learning framework. Falcon adopts a data-centric approach that improves machine learning model fairness via strategic sample selection. Given a user-specified group fairness measure, Falcon identifies samples from "target groups" (e.g., (attribute=female, label=positive)) that are the most informative for improving fairness. However, a challenge arises since these target groups are defined using ground truth labels that are not available during sample selection. To handle this, we propose a novel trial-and-error method, where we postpone using a sample if the predicted label is different from the expected one and falls outside the target group. We also observe the trade-off that selecting more informative samples results in higher likelihood of postponing due to undesired label prediction, and the optimal balance varies per dataset. We capture the trade-off between informativeness and postpone rate as policies and propose to automatically select the best policy using adversarial multi-armed bandit methods, given their computational efficiency and theoretical guarantees. Experiments show that Falcon significantly outperforms existing fair active learning approaches in terms of fairness and accuracy and is more efficient. In particular, only Falcon supports a proper trade-off between accuracy and fairness where its maximum fairness score is 1.8--4.5x higher than the second-best results.
11

Cheung, Wang Chi, David Simchi-Levi, and Ruihao Zhu. "Hedging the Drift: Learning to Optimize Under Nonstationarity." Management Science 68, no. 3 (March 2022): 1696–713. http://dx.doi.org/10.1287/mnsc.2021.4024.

Abstract:
We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic regret bounds for a collection of nonstationary stochastic bandit settings. These settings capture applications such as advertisement allocation, dynamic pricing, and traffic network routing in changing environments. We show how the difficulty posed by the (unknown a priori and possibly adversarial) nonstationarity can be overcome by an unconventional marriage between stochastic and adversarial bandit learning algorithms. Beginning with the linear bandit setting, we design and analyze a sliding-window upper confidence bound algorithm that achieves the optimal dynamic regret bound when the underlying variation budget is known. This budget quantifies the total amount of temporal variation of the latent environments. Boosted by the novel bandit-over-bandit framework that adapts to the latent changes, our algorithm can further enjoy nearly optimal dynamic regret bounds in a (surprisingly) parameter-free manner. We extend our results to other related bandit problems, namely the multiarmed bandit, generalized linear bandit, and combinatorial semibandit settings, which model a variety of operations research applications. In addition to the classical exploration-exploitation trade-off, our algorithms leverage the power of the “forgetting principle” in the learning processes, which is vital in changing environments. Extensive numerical experiments with synthetic datasets and a dataset of an online auto-loan company during the severe acute respiratory syndrome (SARS) epidemic period demonstrate that our proposed algorithms achieve superior performance compared with existing algorithms. This paper was accepted by George J. Shanthikumar, Management Science Special Section on Data-Driven Prescriptive Analytics.
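To make the sliding-window idea concrete, here is a toy sliding-window UCB rule for a non-stationary K-armed bandit, where arm statistics are computed only over the last `window` pulls. The paper's SW-UCB is stated for linear bandits and is paired with a bandit-over-bandit layer that tunes the window without knowing the variation budget; this sketch, including the window size, bonus constant, and reward_fn callback, is an illustrative simplification.

import numpy as np
from collections import deque

def sliding_window_ucb(reward_fn, K, T, window=200):
    # UCB indices computed only from observations inside a sliding window,
    # so the estimates can track a drifting (non-stationary) reward process.
    history = deque()                            # (arm, reward) pairs inside the window
    for t in range(1, T + 1):
        counts, sums = np.zeros(K), np.zeros(K)
        for a, r in history:
            counts[a] += 1
            sums[a] += r
        bonus = np.sqrt(2.0 * np.log(min(t, window)) / np.maximum(counts, 1))
        index = np.where(counts > 0, sums / np.maximum(counts, 1) + bonus, np.inf)
        arm = int(np.argmax(index))              # unseen arms get an infinite index
        history.append((arm, reward_fn(arm, t)))
        if len(history) > window:
            history.popleft()
    return index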
12

Guan, Ziwei, Kaiyi Ji, Donald J. Bucci Jr., Timothy Y. Hu, Joseph Palombo, Michael Liston, and Yingbin Liang. "Robust Stochastic Bandit Algorithms under Probabilistic Unbounded Adversarial Attack." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 4036–43. http://dx.doi.org/10.1609/aaai.v34i04.5821.

Abstract:
The multi-armed bandit formalism has been extensively studied under various attack models, in which an adversary can modify the reward revealed to the player. Previous studies focused on scenarios where the attack value either is bounded at each round or has a vanishing probability of occurrence. These models do not capture powerful adversaries that can catastrophically perturb the revealed reward. This paper investigates the attack model where an adversary attacks with a certain probability at each round, and its attack value can be arbitrary and unbounded if it attacks. Furthermore, the attack value does not necessarily follow a statistical distribution. We propose a novel sample median-based and exploration-aided UCB algorithm (called med-E-UCB) and a median-based ϵ-greedy algorithm (called med-ϵ-greedy). Both of these algorithms are provably robust to the aforementioned attack model. More specifically we show that both algorithms achieve O(log T) pseudo-regret (i.e., the optimal regret without attacks). We also provide a high probability guarantee of O(log T) regret with respect to random rewards and random occurrence of attacks. These bounds are achieved under arbitrary and unbounded reward perturbation as long as the attack probability does not exceed a certain constant threshold. We provide multiple synthetic simulations of the proposed algorithms to verify these claims and showcase the inability of existing techniques to achieve sublinear regret. We also provide experimental results of the algorithm operating in a cognitive radio setting using multiple software-defined radios.
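A rough sketch of the sample-median idea follows: score each arm by the median of its observed rewards plus a UCB-style exploration bonus, so that a few arbitrarily corrupted rewards cannot drag the index far. This is not the paper's med-E-UCB (its constants and extra forced exploration are omitted); reward_fn and the bonus form are illustrative assumptions.

import numpy as np

def median_ucb(reward_fn, K, T):
    # UCB-style index built on the sample median instead of the sample mean.
    rewards = [[] for _ in range(K)]
    for t in range(1, T + 1):
        if t <= K:
            arm = t - 1                          # pull every arm once first
        else:
            index = [np.median(rewards[a]) + np.sqrt(2.0 * np.log(t) / len(rewards[a]))
                     for a in range(K)]
            arm = int(np.argmax(index))
        rewards[arm].append(reward_fn(arm, t))
    return [float(np.median(r)) for r in rewards]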
13

Lattimore, Tor. "Improved regret for zeroth-order adversarial bandit convex optimisation." Mathematical Statistics and Learning 2, no. 3 (October 16, 2020): 311–34. http://dx.doi.org/10.4171/msl/17.

14

Zhao, Haihong, Xinbin Li, Song Han, Lei Yan, and Xinping Guan. "Adaptive OFDM underwater acoustic transmission: An adversarial bandit approach." Neurocomputing 385 (April 2020): 148–59. http://dx.doi.org/10.1016/j.neucom.2019.12.063.

15

Esfandiari, Hossein, MohammadTaghi HajiAghayi, Brendan Lucier, and Michael Mitzenmacher. "Online Pandora’s Boxes and Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 1885–92. http://dx.doi.org/10.1609/aaai.v33i01.33011885.

Abstract:
We consider online variations of the Pandora’s box problem (Weitzman 1979), a standard model for understanding issues related to the cost of acquiring information for decision-making. Our problem generalizes both the classic Pandora’s box problem and the prophet inequality framework. Boxes are presented online, each with a random value and cost drawn jointly from some known distribution. Pandora chooses online whether to open each box given its cost, and then chooses irrevocably whether to keep the revealed prize or pass on it. We aim for approximation algorithms against adversaries that can choose the largest prize over any opened box, and use optimal offline policies to decide which boxes to open (without knowledge of the value inside). We consider variations where Pandora can collect multiple prizes subject to feasibility constraints, such as cardinality, matroid, or knapsack constraints. We also consider variations related to classic multi-armed bandit problems from reinforcement learning. Our results use a reduction-based framework where we separate the issues of the cost of acquiring information from the online decision process of which prizes to keep. Our work shows that in many scenarios, Pandora can achieve a good approximation to the best possible performance.
16

Vural, Nuri Mert, Hakan Gokcesu, Kaan Gokcesu, and Suleyman S. Kozat. "Minimax Optimal Algorithms for Adversarial Bandit Problem With Multiple Plays." IEEE Transactions on Signal Processing 67, no. 16 (August 15, 2019): 4383–98. http://dx.doi.org/10.1109/tsp.2019.2928952.

17

Gokcesu, Kaan, and Suleyman Serdar Kozat. "An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem." IEEE Transactions on Neural Networks and Learning Systems 29, no. 11 (November 2018): 5565–80. http://dx.doi.org/10.1109/tnnls.2018.2806006.

18

Wang, Siwei, Haoyun Wang, and Longbo Huang. "Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 11 (May 18, 2021): 10210–17. http://dx.doi.org/10.1609/aaai.v35i11.17224.

Abstract:
We study the multi-armed bandit (MAB) problem with composite and anonymous feedback. In this model, the reward of pulling an arm spreads over a period of time (we call this period the reward interval) and the player receives partial rewards of the action, convoluted with rewards from pulling other arms, successively. Existing results on this model require prior knowledge about the reward interval size as an input to their algorithms. In this paper, we propose adaptive algorithms for both the stochastic and the adversarial cases, without requiring any prior information about the reward interval. For the stochastic case, we prove that our algorithm guarantees a regret that matches the lower bounds (in order). For the adversarial case, we propose the first algorithm to jointly handle non-oblivious adversary and unknown reward interval size. We also conduct simulations based on a real-world dataset. The results show that our algorithms outperform existing benchmarks.
19

Bychkov, G. K., D. M. Dvinskikh, A. V. Antsiferova, A. V. Gasnikov, and A. V. Lobanov. "Accelerated Zero-Order SGD under High-Order Smoothness and Overparameterized Regime." Nelineinaya Dinamika 20, no. 5 (2024): 759–88. https://doi.org/10.20537/nd241209.

Abstract:
We present a novel gradient-free algorithm to solve a convex stochastic optimization problem, such as those encountered in medicine, physics, and machine learning (e.g., the adversarial multi-armed bandit problem), where the objective function can only be computed through numerical simulation, either as the result of a real experiment or as feedback given by the function evaluations from an adversary. Thus, we suppose that only black-box access to the function values of the objective is available, possibly corrupted by adversarial noise: deterministic or stochastic. The noisy setup can arise naturally from modeling randomness within a simulation or by computer discretization, or when exact values of the function are forbidden due to privacy
20

Avadhanula, Vashist, Andrea Celli, Riccardo Colini-Baldeschi, Stefano Leonardi, and Matteo Russo. "Fully Dynamic Online Selection through Online Contention Resolution Schemes." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 6693–700. http://dx.doi.org/10.1609/aaai.v37i6.25821.

Abstract:
We study fully dynamic online selection problems in an adversarial/stochastic setting that includes Bayesian online selection, prophet inequalities, posted price mechanisms, and stochastic probing problems subject to combinatorial constraints. In the classical ``incremental'' version of the problem, selected elements remain active until the end of the input sequence. On the other hand, in the fully dynamic version of the problem, elements stay active for a limited time interval, and then leave. This models, for example, the online matching of tasks to workers with task/worker-dependent working times, and sequential posted pricing of perishable goods. A successful approach to online selection problems in the adversarial setting is given by the notion of Online Contention Resolution Scheme (OCRS), that uses a priori information to formulate a linear relaxation of the underlying optimization problem, whose optimal fractional solution is rounded online for any adversarial order of the input sequence. Our main contribution is providing a general method for constructing an OCRS for fully dynamic online selection problems. Then, we show how to employ such OCRS to construct no-regret algorithms in a partial information model with semi-bandit feedback and adversarial inputs.
21

Dai, Yan, and Longbo Huang. "Adversarial Network Optimization under Bandit Feedback: Maximizing Utility in Non-Stationary Multi-Hop Networks." Proceedings of the ACM on Measurement and Analysis of Computing Systems 8, no. 3 (December 10, 2024): 1–48. https://doi.org/10.1145/3700413.

Abstract:
Stochastic Network Optimization (SNO) concerns scheduling in stochastic queueing systems and has been widely studied in network theory. Classical SNO algorithms require network conditions to be stationary w.r.t. time, which fails to capture the non-stationary components in increasingly many real-world scenarios. Moreover, most existing algorithms in network optimization assume perfect knowledge of network conditions before decision, which again rules out applications where unpredictability in network conditions presents. Motivated by these issues, this paper considers Adversarial Network Optimization (ANO) under bandit feedback. Specifically, we consider the task of i) maximizing some unknown and time-varying utility function associated with scheduler's actions, where ii) the underlying network topology is a non-stationary multi-hop network whose conditions change arbitrarily with time, and iii) only bandit feedback (the effect of actually deployed actions) is revealed after decision-making. We propose the UMO2 algorithm, which does not require any pre-decision knowledge or counterfactual feedback, ensures network stability, and also matches the utility maximization performance of any "mildly varying" reference policy up to a polynomially decaying gap. To our knowledge, no previous algorithm can handle multi-hop networks or achieve utility maximization guarantees in ANO problems with bandit feedback, whereas ours is able to do both. Technically, our method builds upon a novel integration of online learning techniques into the Lyapunov drift-plus-penalty method. Specifically, we propose meticulous analytical techniques to jointly balance online learning and Lyapunov arguments, which is used to handle the complex inter-dependency among queues in multi-hop networks. To tackle the learning obstacles due to potentially unbounded queue sizes and negative queue differences, we design a new online linear optimization algorithm that automatically adapts to the unknown (potentially negative) loss magnitudes. Finally, we also propose a bandit convex optimization algorithm with novel queue-dependent learning rate scheduling that suites drastically varying queue lengths in utility maximization. Our new insights and techniques in online learning can also be of independent interest.
22

Bao, Hongyan, Yufei Han, Yujun Zhou, Xin Gao, and Xiangliang Zhang. "Towards Efficient and Domain-Agnostic Evasion Attack with High-Dimensional Categorical Inputs." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 6 (June 26, 2023): 6753–61. http://dx.doi.org/10.1609/aaai.v37i6.25828.

Abstract:
Our work targets at searching feasible adversarial perturbation to attack a classifier with high-dimensional categorical inputs in a domain-agnostic setting. This is intrinsically a NP-hard knapsack problem where the exploration space becomes explosively larger as the feature dimension increases. Without the help of domain knowledge, solving this problem via heuristic method, such as Branch-and-Bound, suffers from exponential complexity, yet can bring arbitrarily bad attack results. We address the challenge via the lens of multi-armed bandit based combinatorial search. Our proposed method, namely FEAT, treats modifying each categorical feature as pulling an arm in multi-armed bandit programming. Our objective is to achieve highly efficient and effective attack using an Orthogonal Matching Pursuit (OMP)-enhanced Upper Confidence Bound (UCB) exploration strategy. Our theoretical analysis bounding the regret gap of FEAT guarantees its practical attack performance. In empirical analysis, we compare FEAT with other state-of-the-art domain-agnostic attack methods over various real-world categorical data sets of different applications. Substantial experimental observations confirm the expected efficiency and attack effectiveness of FEAT applied in different application scenarios. Our work further hints the applicability of FEAT for assessing the adversarial vulnerability of classification systems with high-dimensional categorical inputs.
23

Yang, Jianyi, and Shaolei Ren. "Robust Bandit Learning with Imperfect Context." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (May 18, 2021): 10594–602. http://dx.doi.org/10.1609/aaai.v35i12.17267.

Abstract:
A standard assumption in contextual multi-arm bandit is that the true context is perfectly known before arm selection. Nonetheless, in many practical applications (e.g., cloud resource management), prior to arm selection, the context information can only be acquired by prediction subject to errors or adversarial modification. In this paper, we study a novel contextual bandit setting in which only imperfect context is available for arm selection while the true context is revealed at the end of each round. We propose two robust arm selection algorithms: MaxMinUCB (Maximize Minimum UCB) which maximizes the worst-case reward, and MinWD (Minimize Worst-case Degradation) which minimizes the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and MinWD by deriving both regret and reward bounds compared to an oracle that knows the true context. Our results show that as time goes on, MaxMinUCB and MinWD both perform as asymptotically well as their optimal counterparts that know the reward function. Finally, we apply MaxMinUCB and MinWD to online edge datacenter selection, and run synthetic simulations to validate our theoretical analysis.
24

Ramanujan, Raghuram, Ashish Sabharwal, and Bart Selman. "On Adversarial Search Spaces and Sampling-Based Planning." Proceedings of the International Conference on Automated Planning and Scheduling 20 (May 25, 2021): 242–45. http://dx.doi.org/10.1609/icaps.v20i1.13437.

Abstract:
Upper Confidence bounds applied to Trees (UCT), a bandit-based Monte-Carlo sampling algorithm for planning, has recently been the subject of great interest in adversarial reasoning. UCT has been shown to outperform traditional minimax based approaches in several challenging domains such as Go and Kriegspiel, although minimax search still prevails in other domains such as Chess. This work provides insights into the properties of adversarial search spaces that play a key role in the success or failure of UCT and similar sampling-based approaches. We show that certain "early loss" or "shallow trap" configurations, while unlikely in Go, occur surprisingly often in games like Chess (even in grandmaster games). We provide evidence that UCT, unlike minimax search, is unable to identify such traps in Chess and spends a great deal of time exploring much deeper game play than needed.
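For context, the bandit rule at the heart of UCT is the UCB1 selection step applied at every tree node; a minimal sketch follows. The exploration constant c and the (visit_count, total_value) child representation are illustrative assumptions, not the exact implementation evaluated in the paper.

import math

def uct_select(children, parent_visits, c=1.414):
    # UCB1 selection used at each node of UCT: average value plus an exploration
    # bonus; unvisited children are tried first.
    # `children` is a list of (visit_count, total_value) pairs.
    def score(n, w):
        if n == 0:
            return float("inf")
        return w / n + c * math.sqrt(math.log(max(parent_visits, 1)) / n)
    return max(range(len(children)), key=lambda i: score(*children[i]))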
25

Lancewicki, Tal, Aviv Rosenberg, and Yishay Mansour. "Learning Adversarial Markov Decision Processes with Delayed Feedback." Proceedings of the AAAI Conference on Artificial Intelligence 36, no. 7 (June 28, 2022): 7281–89. http://dx.doi.org/10.1609/aaai.v36i7.20690.

Abstract:
Reinforcement learning typically assumes that agents observe feedback for their actions immediately, but in many real-world applications (like recommendation systems) feedback is observed with a delay. This paper studies online learning in episodic Markov decision processes (MDPs) with unknown transitions, adversarially changing costs and unrestricted delayed feedback. That is, the costs and trajectory of episode k are revealed to the learner only at the end of episode k+dᵏ, where the delays dᵏ are neither identical nor bounded, and are chosen by an oblivious adversary. We present novel algorithms based on policy optimization that achieve near-optimal high-probability regret of (K+D)^{1/2} under full-information feedback, where K is the number of episodes and D=∑ₖ dᵏ is the total delay. Under bandit feedback, we prove similar (K+D)^{1/2} regret assuming the costs are stochastic, and (K+D)^{2/3} regret in the general case. We are the first to consider regret minimization in the important setting of MDPs with delayed feedback.
26

Wang, Peng, Jingju Liu, Dongdong Hou, and Shicheng Zhou. "A Cybersecurity Knowledge Graph Completion Method Based on Ensemble Learning and Adversarial Training." Applied Sciences 12, no. 24 (December 16, 2022): 12947. http://dx.doi.org/10.3390/app122412947.

Abstract:
The application of cybersecurity knowledge graphs is attracting increasing attention. However, many cybersecurity knowledge graphs are incomplete due to the sparsity of cybersecurity knowledge. Existing knowledge graph completion methods do not perform well in domain knowledge, and they are not robust enough relative to noise data. To address these challenges, in this paper we develop a new knowledge graph completion method called CSEA based on ensemble learning and adversarial training. Specifically, we integrate a variety of projection and rotation operations to model the relationships between entities, and use angular information to distinguish entities. A cooperative adversarial training method is designed to enhance the generalization and robustness of the model. We combine the method of generating perturbations for the embedding layers with the self-adversarial training method. The UCB (upper confidence bound) multi-armed bandit method is used to select the perturbations of the embedding layer. This achieves a balance between perturbation diversity and maximum loss. To this end, we build a cybersecurity knowledge graph based on the CVE, CWE, and CAPEC cybersecurity databases. Our experimental results demonstrate the superiority of our proposed model for completing cybersecurity knowledge graphs.
27

Xu, Jianyu, Bin Liu, Huadong Mo, and Daoyi Dong. "Bayesian adversarial multi-node bandit for optimal smart grid protection against cyber attacks." Automatica 128 (June 2021): 109551. http://dx.doi.org/10.1016/j.automatica.2021.109551.

28

Chang, Hyeong Soo, Jiaqiao Hu, M. C. Fu, and S. I. Marcus. "Adaptive Adversarial Multi-Armed Bandit Approach to Two-Person Zero-Sum Markov Games." IEEE Transactions on Automatic Control 55, no. 2 (February 2010): 463–68. http://dx.doi.org/10.1109/tac.2009.2036333.

29

Killian, Jackson A., Arpita Biswas, Lily Xu, Shresth Verma, Vineet Nair, Aparna Taneja, Aparna Hegde, et al. "Robust Planning over Restless Groups: Engagement Interventions for a Large-Scale Maternal Telehealth Program." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 12 (June 26, 2023): 14295–303. http://dx.doi.org/10.1609/aaai.v37i12.26672.

Abstract:
In 2020, maternal mortality in India was estimated to be as high as 130 deaths per 100K live births, nearly twice the UN's target. To improve health outcomes, the non-profit ARMMAN sends automated voice messages to expecting and new mothers across India. However, 38% of mothers stop listening to these calls, missing critical preventative care information. To improve engagement, ARMMAN employs health workers to intervene by making service calls, but workers can only call a fraction of the 100K enrolled mothers. Partnering with ARMMAN, we model the problem of allocating limited interventions across mothers as a restless multi-armed bandit (RMAB), where the realities of large scale and model uncertainty present key new technical challenges. We address these with GROUPS, a double oracle–based algorithm for robust planning in RMABs with scalable grouped arms. Robustness over grouped arms requires several methodological advances. First, to adversarially select stochastic group dynamics, we develop a new method to optimize Whittle indices over transition probability intervals. Second, to learn group-level RMAB policy best responses to these adversarial environments, we introduce a weighted index heuristic. Third, we prove a key theoretical result that planning over grouped arms achieves the same minimax regret--optimal strategy as planning over individual arms, under a technical condition. Finally, using real-world data from ARMMAN, we show that GROUPS produces robust policies that reduce minimax regret by up to 50%, halving the number of preventable missed voice messages to connect more mothers with life-saving maternal health information.
30

Bubeck, Sébastien, Ronen Eldan, and Yin Tat Lee. "Kernel-based Methods for Bandit Convex Optimization." Journal of the ACM 68, no. 4 (June 30, 2021): 1–35. http://dx.doi.org/10.1145/3453721.

Abstract:
We consider the adversarial convex bandit problem and we build the first poly(T)-time algorithm with poly(n)√T-regret for this problem. To do so, we introduce three new ideas in the derivative-free optimization literature: (i) kernel methods, (ii) a generalization of Bernoulli convolutions, and (iii) a new annealing schedule for exponential weights (with increasing learning rate). The basic version of our algorithm achieves Õ(n^{9.5}√T)-regret, and we show that a simple variant of this algorithm can be run in poly(n log(T))-time per step (for polytopes with polynomially many constraints) at the cost of an additional poly(n)T^{o(1)} factor in the regret. These results improve upon the Õ(n^{11}√T)-regret and exp(poly(T))-time result of the first two authors and the log(T)^{poly(n)}√T-regret and log(T)^{poly(n)}-time result of Hazan and Li. Furthermore, we conjecture that another variant of the algorithm could achieve Õ(n^{1.5}√T)-regret, and moreover that this regret is unimprovable (the current best lower bound being Ω(n√T), achieved with linear functions). For the simpler situation of zeroth-order stochastic convex optimization this corresponds to the conjecture that the optimal query complexity is of order n^3/ε^2.
31

Sudianto, Edi. "Digest: Banding together to battle adversaries has its consequences*." Evolution 73, no. 6 (April 24, 2019): 1320–21. http://dx.doi.org/10.1111/evo.13750.

32

Doan, Thang, João Monteiro, Isabela Albuquerque, Bogdan Mazoure, Audrey Durand, Joelle Pineau, and R. Devon Hjelm. "On-Line Adaptative Curriculum Learning for GANs." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3470–77. http://dx.doi.org/10.1609/aaai.v33i01.33013470.

Abstract:
Generative Adversarial Networks (GANs) can successfully approximate a probability distribution and produce realistic samples. However, open questions such as sufficient convergence conditions and mode collapse still persist. In this paper, we build on existing work in the area by proposing a novel framework for training the generator against an ensemble of discriminator networks, which can be seen as a one-student/multiple-teachers setting. We formalize this problem within the full-information adversarial bandit framework, where we evaluate the capability of an algorithm to select mixtures of discriminators for providing the generator with feedback during learning. To this end, we propose a reward function which reflects the progress made by the generator and dynamically update the mixture weights allocated to each discriminator. We also draw connections between our algorithm and stochastic optimization methods and then show that existing approaches using multiple discriminators in literature can be recovered from our framework. We argue that less expressive discriminators are smoother and have a general coarse grained view of the modes map, which enforces the generator to cover a wide portion of the data distribution support. On the other hand, highly expressive discriminators ensure samples quality. Finally, experimental results show that our approach improves samples quality and diversity over existing baselines by effectively learning a curriculum. These results also support the claim that weaker discriminators have higher entropy improving modes coverage.
33

Amballa, Chaitanya, Manu K. Gupta, and Sanjay P. Bhat. "Computing an Efficient Exploration Basis for Learning with Univariate Polynomial Features." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 6636–43. http://dx.doi.org/10.1609/aaai.v35i8.16821.

Abstract:
Barycentric spanners have been used as an efficient exploration basis in online linear optimization problems in a bandit framework. We characterise the barycentric spanner for decision problems in which the cost (or reward) is a polynomial in a single decision variable. Our characterisation of the barycentric spanner is two-fold: we show that the barycentric spanner under a polynomial cost function is the unique solution to a set of nonlinear algebraic equations, as well as the solution to a convex optimization problem. We provide numerical results to show that our method computes the barycentric spanner for the polynomial case significantly faster than the only other known algorithm for the purpose. As an application, we consider a dynamic pricing problem in which the revenue is an unknown polynomial function of the price. We then empirically show that the use of a barycentric spanner to initialise the prior distribution in a Thompson sampling setting leads to lower cumulative regret as compared to standard initialisations. We also illustrate the importance of barycentric spanners in adversarial settings by showing, both theoretically and empirically, that a barycentric spanner achieves the minimax value in a static adversarial linear regression problem where the learner selects the training points while an adversary selects the testing points and controls the variance of the noise corrupting the training samples
34

Han, Song, Xinbin Li, Lei Yan, Jiajie Xu, Zhixin Liu, and Xinping Guan. "Joint resource allocation in underwater acoustic communication networks: A game-based hierarchical adversarial multiplayer multiarmed bandit algorithm." Information Sciences 454-455 (July 2018): 382–400. http://dx.doi.org/10.1016/j.ins.2018.05.011.

35

Richie, Rodney C. "Basics of Artificial Intelligence (AI) Modeling." Journal of Insurance Medicine 51, no. 1 (May 28, 2024): 35–40. http://dx.doi.org/10.17849/insm-51-1-35-40.1.

Abstract:
AI with machine learning and its subset deep learning are revolutionizing research into the morbidity and mortality of diseases and conditions. The major models of AI are discussed, with an attempt to simplify what many acknowledge as agnostic processing of vast amounts of data to arrive at a conclusion or diagnosis. Such models include convolutional neural networks, artificial neural networks, recurrent neural networks, generative adversarial networks, local interpretable model-agnostic explanations, Shapley additive explanations, counterfactual explanations, multi-armed bandit models, deep-Q-learning models, fusion models, federated learning, predictive modeling, and disease outbreak prediction. Topics are well-referenced for further research. Methodology: A keyword search of artificial intelligence, artificial intelligence in medicine, and artificial intelligence models in PubMed and Google Scholar yielded more than 100 articles that were reviewed for summation in this article.
36

Farina, Gabriele, Robin Schmucker, and Tuomas Sandholm. "Bandit Linear Optimization for Sequential Decision Making and Extensive-Form Games." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 6 (May 18, 2021): 5372–80. http://dx.doi.org/10.1609/aaai.v35i6.16677.

Abstract:
Tree-form sequential decision making (TFSDM) extends classical one-shot decision making by modeling tree-form interactions between an agent and a potentially adversarial environment. It captures the online decision-making problems that each player faces in an extensive-form game, as well as Markov decision processes and partially-observable Markov decision processes where the agent conditions on observed history. Over the past decade, there has been considerable effort into designing online optimization methods for TFSDM. Virtually all of that work has been in the full-feedback setting, where the agent has access to counterfactuals, that is, information on what would have happened had the agent chosen a different action at any decision node. Little is known about the bandit setting, where that assumption is reversed (no counterfactual information is available), despite this latter setting being well understood for almost 20 years in one-shot decision making. In this paper, we give the first algorithm for the bandit linear optimization problem for TFSDM that offers both (i) linear-time iterations (in the size of the decision tree) and (ii) O(sqrt(T)) cumulative regret in expectation compared to any fixed strategy, at all times T. This is made possible by new results that we derive, which may have independent uses as well: 1) geometry of the dilated entropy regularizer, 2) autocorrelation matrix of the natural sampling scheme for sequence-form strategies, 3) construction of an unbiased estimator for linear losses for sequence-form strategies, and 4) a refined regret analysis for mirror descent when using the dilated entropy regularizer.
37

Kamikokuryo, Kenta, Takumi Haga, Gentiane Venture, and Vincent Hernandez. "Adversarial Autoencoder and Multi-Armed Bandit for Dynamic Difficulty Adjustment in Immersive Virtual Reality for Rehabilitation: Application to Hand Movement." Sensors 22, no. 12 (June 14, 2022): 4499. http://dx.doi.org/10.3390/s22124499.

Abstract:
Motor rehabilitation is used to improve motor control skills to improve the patient’s quality of life. Regular adjustments based on the effect of therapy are necessary, but this can be time-consuming for the clinician. This study proposes to use an efficient tool for high-dimensional data by considering a deep learning approach for dimensionality reduction of hand movement recorded using a wireless remote control embedded with the Oculus Rift S. This latent space is created as a visualization tool also for use in a reinforcement learning (RL) algorithm employed to provide a decision-making framework. The data collected consists of motions drawn with wireless remote control in an immersive VR environment for six different motions called “Cube”, “Cylinder”, “Heart”, “Infinity”, “Sphere”, and “Triangle”. From these collected data, different artificial databases were created to simulate variations of the data. A latent space representation is created using an adversarial autoencoder (AAE), taking into account unsupervised (UAAE) and semi-supervised (SSAAE) training. Then, each test point is represented by a distance metric and used as a reward for two classes of Multi-Armed Bandit (MAB) algorithms, namely Boltzmann and Sibling Kalman filters. The results showed that AAE models can represent high-dimensional data in a two-dimensional latent space and that MAB agents can efficiently and quickly learn the distance evolution in the latent space. The results show that Sibling Kalman filter exploration outperforms Boltzmann exploration with an average cumulative weighted probability error of 7.9 versus 19.9 using the UAAE latent space representation and 8.0 versus 20.0 using SSAAE. In conclusion, this approach provides an effective approach to visualize and track current motor control capabilities regarding a target in order to reflect the patient’s abilities in VR games in the context of DDA.
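Of the two bandit agents compared in this entry, Boltzmann exploration is the simpler; the sketch below shows the generic textbook rule (sample arms with probability proportional to exp(value/temperature) and update a running mean). The reward signal, temperature, and reward_fn callback are illustrative assumptions and do not reproduce the latent-space distance reward or the Sibling Kalman filter agent from the paper.

import numpy as np

def boltzmann_bandit(reward_fn, K, T, temperature=0.2, seed=0):
    # Softmax (Boltzmann) exploration over K arms with an incremental mean update.
    rng = np.random.default_rng(seed)
    counts, values = np.zeros(K), np.zeros(K)
    for t in range(T):
        logits = values / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        arm = rng.choice(K, p=probs)
        reward = reward_fn(arm, t)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
    return values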
38

Méndez Lara, Francisco Iván. "Francisco Villa en la prensa carrancista (1914-1915). La construcción del adversario." Bibliographica 3, no. 1 (March 6, 2020): 211. http://dx.doi.org/10.22201/iib.2594178xe.2020.1.56.

Abstract:
The image of Francisco Villa, one of the most widely known Mexican revolutionaries in the world, has undergone various transformations over time. This is due to the great variety of versions that exist about his life, particularly about his past as a "bandit", which gave rise to various legends that he himself celebrated and spread on many occasions. Photographs, cinema, oral history, and especially the press constructed the image of Pancho Villa. This article explains how and why the representation of the Durango-born division general was constructed in the Carrancista press published in Mexico, which opposed his revolutionary project, during the factional struggle from late 1914 to mid-1915; in other words, from the revolutionary rupture to the Carrancista triumph in the battles of the Bajío.
39

Riou, Matthieu, Bassam Jabaian, Stéphane Huet, and Fabrice Lefèvre. "Reinforcement adaptation of an attention-based neural natural language generator for spoken dialogue systems." Dialogue & Discourse 10, no. 1 (February 22, 2019): 1–19. http://dx.doi.org/10.5087/dad.2019.101.

Abstract:
Following some recent propositions to handle natural language generation in spoken dialogue systems with long short-term memory recurrent neural network models (Wen et al., 2016), we first investigate a variant thereof with the objective of a better integration of the attention subnetwork. Our next objective is then to propose and evaluate a framework to adapt the NLG module online through direct interactions with the users. When doing so, the basic way is to ask the user to utter an alternative sentence to express a particular dialogue act, but then the system has to decide between using an automatic transcription or asking for a manual transcription. To do so, a reinforcement learning approach based on an adversarial bandit scheme is retained. We show that by appropriately defining the rewards as a linear combination of expected payoffs and costs of acquiring the new data provided by the user, a system design can balance between improving the system's performance towards a better match with the user's preferences and the burden associated with it. The actual benefits of this system are then assessed with a human evaluation, showing that the addition of more diverse utterances allows the system to produce sentences that are more satisfying for the user.
40

Vu, Dong Quan, Patrick Loiseau, Alonso Silva, and Long Tran-Thanh. "Path Planning Problems with Side Observations—When Colonels Play Hide-and-Seek." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 02 (April 3, 2020): 2252–59. http://dx.doi.org/10.1609/aaai.v34i02.5602.

Abstract:
Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are often used to model a large variety of practical problems, but only in their one-shot versions. Indeed, due to their extremely large strategy space, it remains an open question how one can efficiently learn in these games. In this work, we show that the online CB and HS games can be cast as path planning problems with side-observations (SOPPP): at each stage, a learner chooses a path on a directed acyclic graph and suffers the sum of losses that are adversarially assigned to the corresponding edges; and she then receives semi-bandit feedback with side-observations (i.e., she observes the losses on the chosen edges plus some others). We propose a novel algorithm, Exp3-OE, the first-of-its-kind with guaranteed efficient running time for SOPPP without requiring any auxiliary oracle. We provide an expected-regret bound of Exp3-OE in SOPPP matching the order of the best benchmark in the literature. Moreover, we introduce additional assumptions on the observability model under which we can further improve the regret bounds of Exp3-OE. We illustrate the benefit of using Exp3-OE in SOPPP by applying it to the online CB and HS games.
41

He, Jianhao, Feidiao Yang, Jialin Zhang, and Lvzhou Li. "Quantum algorithm for online convex optimization." Quantum Science and Technology 7, no. 2 (March 17, 2022): 025022. http://dx.doi.org/10.1088/2058-9565/ac5919.

Abstract:
We explore whether quantum advantages can be found for the zeroth-order online convex optimization (OCO) problem, which is also known as bandit convex optimization with multi-point feedback. In this setting, given access to zeroth-order oracles (that is, the loss function is accessed as a black box that returns the function value for any queried input), a player attempts to minimize a sequence of adversarially generated convex loss functions. This procedure can be described as a T-round iterative game between the player and the adversary. In this paper, we present quantum algorithms for the problem and show for the first time that potential quantum advantages are possible for problems of OCO. Specifically, our contributions are as follows. (i) When the player is allowed to query zeroth-order oracles O(1) times in each round as feedback, we give a quantum algorithm that achieves O(√T) regret without additional dependence on the dimension n, which outperforms the already known optimal classical algorithm achieving only O(√(nT)) regret. Note that the regret of our quantum algorithm has reached the lower bound of classical first-order methods. (ii) We show that for strongly convex loss functions, the quantum algorithm can achieve O(log T) regret with O(1) queries as well, which means that the quantum algorithm can achieve the same regret bound as the classical algorithms in the full-information setting.
42

Eckardt, Jan-Niklas, Waldemar Hahn, Christoph Röllig, Sebastian Stasik, Uwe Platzbecker, Carsten Müller-Tidow, Hubert Serve, et al. "Mimicking Clinical Trials with Synthetic Acute Myeloid Leukemia Patients Using Generative Artificial Intelligence." Blood 142, Supplement 1 (November 28, 2023): 2268. http://dx.doi.org/10.1182/blood-2023-179817.

Abstract:
Data sharing is often hindered by concerns of patient privacy, regulatory aspects, and proprietary interests, thereby impeding scientific progress and establishing a gatekeeping mechanism in clinical medicine, since obtaining large data sets is costly and time-consuming. We employed two different generative artificial intelligence (AI) technologies, CTAB-GAN+ and Normalizing Flows (NFlow), to synthesize clinical trial data based on pooled patient data from four previous multicenter clinical trials of the German Study Alliance Leukemia (AML96, AML2003, AML60+, SORAML) that enrolled adult patients (n=1606) with acute myeloid leukemia (AML) who received intensive induction therapy. As a generative adversarial network (GAN), CTAB-GAN+ consists of two adversarial networks: a generator producing synthetic samples from random noise and a discriminator aiming to distinguish between real and synthetic samples. The model converges once the discriminator can no longer reliably differentiate between real and synthetic data. In contrast, NFlow consists of a sequence of invertible transformations (flows) starting from a simple base distribution and gradually adding complexity to better mirror the training data. Both models were trained on tabular data including demographic, laboratory, molecular genetic, and cytogenetic patient variables. Detection of molecular alterations in the original cohort was performed via next-generation sequencing (NGS) using the TruSight Myeloid Sequencing Panel (Illumina, San Diego, CA, USA) with a 5% variant-allele frequency (VAF) mutation calling cut-off. For cytogenetics, standard techniques for chromosome banding and fluorescence-in-situ-hybridization (FISH) were used. Hyperparameter tuning of the generative models was conducted using the Optuna framework. For each model, we used a total of 70 optimization trials to optimize a custom score inspired by TabSynDex, which assesses both the resemblance of the synthetic data to the real training data and its utility. Pairwise analyses were conducted between the original data set and each of the two synthetic data sets. All tests were carried out as two-sided tests using a significance level α of 0.05. Table 1 summarizes baseline patient characteristics and outcomes for both synthetic cohorts compared to the original cohort. Firstly, we found both models to adequately represent patient features, although some individual variables showed a statistically significant deviation from the original cohort. It is important to note that for such a large sample size (n=1606 for each cohort), even minuscule differences can be rendered statistically significant in the absence of any meaningful clinical difference. Interestingly, the variables that deviated from the original distribution differed between the two models, indicating that model architecture plays a vital role in sample representation: while CTAB-GAN+ showed significant deviations for both age and sex, NFlow showed significant deviations for AML status. The complete remission rate was similar between the original cohort (70.7%, odds ratio [OR]: 2.41), CTAB-GAN+ (73.7%, OR: 2.81, p=0.059), and NFlow (69.1%, OR: 2.24, p=0.356). For event-free survival (EFS), which was not included as a target in hyperparameter tuning, both networks deviated significantly from the original cohort (original: median 7.2 months, HR: 1.36; CTAB-GAN+: median 12.8 months, HR: 0.74, p<0.001; NFlow: median 9.0 months, HR: 0.87, p=0.001). Overall survival (OS) was well represented by NFlow compared to the original cohort, while CTAB-GAN+ showed a significant deviation (original: median 17.5 months, HR: 1.14; CTAB-GAN+: median 19.5 months, HR: 0.88, p<0.001; NFlow: median 16.2 months, HR: 1.00, p=0.055). Both models adequately reproduced the survival curves in Kaplan-Meier analysis (Figure 1). Here, we demonstrate with two different generative AI technologies that synthetic data generation provides an attractive solution to circumvent issues in current standards of data collection and sharing. It effectively allows for bypassing logistical, organizational, and financial burdens, as well as regulatory and ethical concerns. Ultimately, this enables explorative research inquiries into previously inaccessible data sets and offers the prospect of fully synthetic control arms in prospective clinical trials.
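The generator/discriminator game summarized in the abstract can be illustrated with a minimal training loop. The sketch below is a generic tabular GAN step in PyTorch, not the CTAB-GAN+ implementation; the feature dimension, network sizes, and optimizer settings are illustrative assumptions.

```python
# Minimal sketch of a generator/discriminator training step for tabular data.
# NOT the CTAB-GAN+ implementation; dimensions and hyperparameters are assumed.
import torch
import torch.nn as nn

N_FEATURES = 32   # assumed width of the encoded tabular patient record
NOISE_DIM = 16

generator = nn.Sequential(
    nn.Linear(NOISE_DIM, 64), nn.ReLU(), nn.Linear(64, N_FEATURES)
)
discriminator = nn.Sequential(
    nn.Linear(N_FEATURES, 64), nn.ReLU(), nn.Linear(64, 1)
)
opt_g = torch.optim.Adam(generator.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

def train_step(real_batch: torch.Tensor) -> None:
    b = real_batch.size(0)
    # 1) Discriminator step: separate real rows from generated rows.
    fake = generator(torch.randn(b, NOISE_DIM)).detach()
    d_loss = bce(discriminator(real_batch), torch.ones(b, 1)) + \
             bce(discriminator(fake), torch.zeros(b, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()
    # 2) Generator step: produce samples the discriminator labels as real.
    fake = generator(torch.randn(b, NOISE_DIM))
    g_loss = bce(discriminator(fake), torch.ones(b, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```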
APA, Harvard, Vancouver, ISO, and other styles
43

Immorlica, Nicole, Karthik Abinav Sankararaman, Robert Schapire, and Aleksandrs Slivkins. "Adversarial Bandits with Knapsacks." Journal of the ACM, August 18, 2022. http://dx.doi.org/10.1145/3557045.

Full text
Abstract:
We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a well-known knapsack problem: find an optimal packing of items into a limited-size knapsack. The BwK problem is a common generalization of numerous motivating examples, which range from dynamic pricing to repeated auctions to dynamic ad allocation to network routing and scheduling. While the prior work on BwK focused on the stochastic version, we pioneer the other extreme in which the outcomes can be chosen adversarially. This is a considerably harder problem, compared to both the stochastic version and the “classic” adversarial bandits, in that regret minimization is no longer feasible. Instead, the objective is to minimize the competitive ratio: the ratio of the benchmark reward to the algorithm’s reward. We design an algorithm with competitive ratio O(log T) relative to the best fixed distribution over actions, where T is the time horizon; we also prove a matching lower bound. The key conceptual contribution is a new perspective on the stochastic version of the problem. We suggest a new algorithm for the stochastic version, which builds on the framework of regret minimization in repeated games and admits a substantially simpler analysis compared to prior work. We then analyze this algorithm for the adversarial version, and use it as a subroutine to solve the latter. Our algorithm is the first “black-box reduction” from bandits to BwK: it takes an arbitrary bandit algorithm and uses it as a subroutine. We use this reduction to derive several extensions.
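The repeated-game reduction referred to in the abstract can be illustrated schematically: a primal bandit learner picks arms, a dual full-information learner picks resources, and both are fed a Lagrangian payoff that trades reward off against budget consumption. The sketch below is an illustrative reconstruction, not the paper's exact algorithm; the bandit select()/update() interface, the env callback, and the Lagrangian scaling are assumptions.

```python
# Schematic sketch of a Lagrangian repeated-game reduction for BwK.
# Illustrative only: interfaces, payoff scaling, and constants are assumed.
import numpy as np

def hedge_update(weights, losses, eta=0.1):
    """Multiplicative-weights update for the dual (resource) player."""
    w = weights * np.exp(-eta * losses)
    return w / w.sum()

def run_lagrange_bwk(bandit, n_resources, budget, horizon, env):
    """bandit: any adversarial bandit algorithm exposing select() and update(payoff)."""
    dual = np.ones(n_resources) / n_resources      # distribution over resources
    spent = np.zeros(n_resources)
    for t in range(horizon):
        arm = bandit.select()
        reward, cost = env(t, arm)                 # cost: one entry per resource
        if np.any(spent + cost > budget):
            break                                  # budget exhausted -> stop
        spent += cost
        # Lagrangian payoff: reward plus dual-weighted slack of the constraints
        # (budget / horizon is the per-round budget rate; scaling is assumed).
        lagrangian = reward + dual @ (budget / horizon - cost)
        bandit.update(lagrangian)                          # primal player maximizes
        dual = hedge_update(dual, budget / horizon - cost) # dual shifts weight to binding resources
    return spent
```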
APA, Harvard, Vancouver, ISO, and other styles
44

Dong, Yanyan, and Vincent Y. F. Tan. "Adversarial Combinatorial Bandits with Switching Costs." IEEE Transactions on Information Theory, 2024, 1. http://dx.doi.org/10.1109/tit.2024.3384033.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Lykouris, Thodoris, Karthik Sridharan, and Éva Tardos. "Small-Loss Bounds for Online Learning with Partial Information." Mathematics of Operations Research, January 25, 2022. http://dx.doi.org/10.1287/moor.2021.1204.

Full text
Abstract:
We consider the problem of adversarial (nonstochastic) online learning with partial-information feedback, in which, at each round, a decision maker selects an action from a finite set of alternatives. We develop a black-box approach for such problems in which the learner observes as feedback only losses of a subset of the actions that includes the selected action. When losses of actions are nonnegative, under the graph-based feedback model introduced by Mannor and Shamir, we offer algorithms that attain the so-called “small-loss” [Formula: see text] regret bounds with high probability, where α is the independence number of the graph and [Formula: see text] is the loss of the best action. Prior to our work, there was no data-dependent guarantee for general feedback graphs even for pseudo-regret (without dependence on the number of actions, i.e., utilizing the increased information feedback). Taking advantage of the black-box nature of our technique, we extend our results to many other applications, such as combinatorial semi-bandits (including routing in networks), contextual bandits (even with an infinite comparator class), and learning with slowly changing (shifting) comparators. In the special case of multi-armed bandit and combinatorial semi-bandit problems, we provide optimal small-loss, high-probability regret guarantees of [Formula: see text], where d is the number of actions, answering open questions of Neu. Previous bounds for multi-armed bandits and semi-bandits were known only for pseudo-regret and only in expectation. We also offer an optimal [Formula: see text] regret guarantee for fixed feedback graphs with clique-partition number at most κ.
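As a small illustration of the graph-feedback setting this line of work builds on, the snippet below shows the generic importance-weighted loss estimator for the Mannor-Shamir feedback model, not the paper's full small-loss algorithm; the in_neighbors representation and function names are assumptions.

```python
# Generic importance-weighted loss estimation under graph feedback (sketch).
import numpy as np

def observed_prob(p, in_neighbors, action):
    """Probability that `action`'s loss is revealed this round: the chance of
    playing any arm that observes it (its in-neighborhood, including itself)."""
    return p[in_neighbors[action]].sum()

def estimate_losses(p, in_neighbors, losses_seen):
    """Unbiased loss estimates; `losses_seen` maps each observed action to its loss."""
    est = np.zeros(len(p))
    for a, loss in losses_seen.items():
        est[a] = loss / observed_prob(p, in_neighbors, a)
    return est
```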
APA, Harvard, Vancouver, ISO, and other styles
46

Alipour-Fanid, Amir, Monireh Dabaghchian, and Kai Zeng. "Self-Unaware Adversarial Multi-Armed Bandits With Switching Costs." IEEE Transactions on Neural Networks and Learning Systems, 2021, 1–15. http://dx.doi.org/10.1109/tnnls.2021.3110194.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Zhou, Datong, and Claire Tomlin. "Budget-Constrained Multi-Armed Bandits With Multiple Plays." Proceedings of the AAAI Conference on Artificial Intelligence 32, no. 1 (April 29, 2018). http://dx.doi.org/10.1609/aaai.v32i1.11629.

Full text
Abstract:
We study the multi-armed bandit problem with multiple plays and a budget constraint for both the stochastic and the adversarial setting. At each round, exactly K out of N possible arms have to be played (with 1 ≤ K ≤ N). In addition to observing the individual rewards for each arm played, the player also learns a vector of costs which has to be covered with an a priori defined budget B. The game ends when the sum of current costs associated with the played arms exceeds the remaining budget. Firstly, we analyze this setting for the stochastic case, for which we assume each arm to have an underlying cost and reward distribution with support [c_min, 1] and [0, 1], respectively. We derive an Upper Confidence Bound (UCB) algorithm which achieves O(N K^4 log B) regret. Secondly, for the adversarial case, in which the entire sequence of rewards and costs is fixed in advance, we derive an upper bound on the regret of order O(√NB log(N/K)) utilizing an extension of the well-known Exp3 algorithm. We also provide upper bounds that hold with high probability and a lower bound of order Ω((1 − K/N) √NB/K).
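The interaction protocol described in the abstract can be written out as a short loop. The sketch below shows only the budgeted multiple-plays game; the policy and env interfaces are hypothetical placeholders, and it does not reproduce the UCB or Exp3-style algorithms analyzed in the paper.

```python
# Sketch of the budget-constrained multiple-plays protocol (interfaces assumed).
import numpy as np

def play_budgeted_game(policy, env, N, K, B):
    """Play K of N arms each round until the budget B is exhausted."""
    total_reward, spent, t = 0.0, 0.0, 0
    while True:
        arms = policy.select(K)                # indices of the K arms to play
        rewards, costs = env.step(t, arms)     # one reward and one cost per played arm
        if spent + costs.sum() > B:            # remaining budget exceeded -> game ends
            break
        spent += costs.sum()
        total_reward += rewards.sum()
        policy.update(arms, rewards, costs)    # e.g., a UCB- or Exp3-style update
        t += 1
    return total_reward
```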
APA, Harvard, Vancouver, ISO, and other styles
48

Li, Yandi, Jianxiong Guo, Yupeng Li, Tian Wang, and Weijia Jia. "Adversarial Bandits With Multi-User Delayed Feedback: Theory and Application." IEEE Transactions on Mobile Computing, 2024, 1–15. http://dx.doi.org/10.1109/tmc.2024.3362237.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Tossou, Aristide, and Christos Dimitrakakis. "Achieving Privacy in the Adversarial Multi-Armed Bandit." Proceedings of the AAAI Conference on Artificial Intelligence 31, no. 1 (February 13, 2017). http://dx.doi.org/10.1609/aaai.v31i1.10896.

Full text
Abstract:
In this paper, we improve the previously best known regret bound to achieve ε-differential privacy in oblivious adversarial bandits from O(T^(2/3)/ε) to O(√T ln T/ε). This is achieved by combining a Laplace Mechanism with EXP3. We show that though EXP3 is already differentially private, it leaks a linear amount of information in T. However, we can improve this privacy by relying on its intrinsic exponential mechanism for selecting actions. This allows us to reach O(√(ln T))-DP, with a regret of O(T^(2/3)) that holds against an adaptive adversary, an improvement from the best known bound of O(T^(3/4)). This is done by using an algorithm that runs EXP3 in a mini-batch loop. Finally, we run experiments that clearly demonstrate the validity of our theoretical analysis.
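The core idea, perturbing EXP3's observed rewards with Laplace noise before the importance-weighted update, can be sketched as follows. The noise scale, exploration rate, and update details are illustrative assumptions rather than the paper's exact calibration.

```python
# Minimal sketch of EXP3 with Laplace-perturbed rewards (parameters assumed).
import numpy as np

def private_exp3(K, T, epsilon, get_reward, gamma=0.05, rng=np.random.default_rng()):
    weights = np.ones(K)
    for t in range(T):
        probs = (1 - gamma) * weights / weights.sum() + gamma / K
        arm = rng.choice(K, p=probs)
        reward = get_reward(t, arm)                        # adversarial reward in [0, 1]
        noisy = reward + rng.laplace(scale=1.0 / epsilon)  # Laplace mechanism (assumed scale)
        estimate = noisy / probs[arm]                      # importance-weighted estimate
        weights[arm] *= np.exp(gamma * estimate / K)       # standard EXP3 exponential update
        weights /= weights.max()                           # rescale for numerical stability
    return weights
```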
APA, Harvard, Vancouver, ISO, and other styles
50

Huang, Yin, Lei Wang, and Jie Xu. "Quantum Entanglement Path Selection and Qubit Allocation via Adversarial Group Neural Bandits." IEEE/ACM Transactions on Networking, 2024, 1–12. https://doi.org/10.1109/tnet.2024.3510550.

Full text
APA, Harvard, Vancouver, ISO, and other styles