Academic literature on the topic 'Combinatorial multi-armed bandits'

Create an accurate reference in APA, MLA, Chicago, Harvard, and other citation styles

Consult the lists of relevant articles, books, theses, conference papers, and other scholarly sources on the topic 'Combinatorial multi-armed bandits.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Combinatorial multi-armed bandits"

1. Ontañón, Santiago. "Combinatorial Multi-armed Bandits for Real-Time Strategy Games." Journal of Artificial Intelligence Research 58 (March 29, 2017): 665–702. http://dx.doi.org/10.1613/jair.5398.
Abstract:
Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called "naive sampling", based on a variant of the Multi-armed Bandit problem called "Combinatorial Multi-armed Bandits" (CMAB). We analyze the theoretical properties of several variants of naive sampling, and empirically compare it against the other existing strategies in the literature for CMABs. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by their very large branching factors. Our results show that as the branching factor grows, naive sampling outperforms the other sampling strategies.
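For orientation, naive sampling as described above can be sketched in a few lines: each variable of the combinatorial action keeps its own local reward estimates, while a global policy replays the best combination observed so far. This is a minimal illustrative sketch, not the paper's implementation; the function names, the exploration parameters, and the reward_fn interface are assumptions.

```python
# Hedged sketch of naive sampling for a CMAB: a joint action assigns one value to
# each of n_vars variables; reward_fn returns the (possibly non-additive) reward
# of the joint action. Parameters and structure are illustrative only.
import random
from collections import defaultdict

def naive_sampling(n_vars, n_values, reward_fn, horizon, epsilon=0.25, local_epsilon=0.1):
    local_sum = [[0.0] * n_values for _ in range(n_vars)]   # per-variable reward sums
    local_cnt = [[0] * n_values for _ in range(n_vars)]     # per-variable pull counts
    global_sum, global_cnt = defaultdict(float), defaultdict(int)

    def local_choice(i):
        # per-variable epsilon-greedy over the "naive" (marginal) reward estimate
        if random.random() < local_epsilon:
            return random.randrange(n_values)
        return max(range(n_values),
                   key=lambda v: local_sum[i][v] / local_cnt[i][v] if local_cnt[i][v] else float("inf"))

    for _ in range(horizon):
        if random.random() < epsilon or not global_cnt:
            action = tuple(local_choice(i) for i in range(n_vars))      # explore via local MABs
        else:
            action = max(global_cnt, key=lambda a: global_sum[a] / global_cnt[a])  # exploit best combination
        r = reward_fn(action)
        for i, v in enumerate(action):
            local_sum[i][v] += r
            local_cnt[i][v] += 1
        global_sum[action] += r
        global_cnt[action] += 1
    return max(global_cnt, key=lambda a: global_sum[a] / global_cnt[a])
```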
2. Xu, Lily, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, and Milind Tambe. "Dual-Mandate Patrols: Multi-Armed Bandits for Green Security." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 17 (May 18, 2021): 14974–82. http://dx.doi.org/10.1609/aaai.v35i17.17757.
Abstract:
Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation of known hotspots. We formulate the problem as a stochastic multi-armed bandit, where each action represents a patrol strategy, enabling us to guarantee the rate of convergence of the patrolling policy. However, a naive bandit approach would compromise short-term performance for long-term optimality, resulting in animals poached and forests destroyed. To speed up performance, we leverage smoothness in the reward function and decomposability of actions. We show a synergy between Lipschitz-continuity and decomposition as each aids the convergence of the other. In doing so, we bridge the gap between combinatorial and Lipschitz bandits, presenting a no-regret approach that tightens existing guarantees while optimizing for short-term performance. We demonstrate that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.
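The two structural properties this abstract leans on, decomposability of the reward across regions and Lipschitz-continuity of each region's reward in the patrol effort, can be written compactly. The notation below (effort vector β, per-region reward μ_j, Lipschitz constant L) is illustrative rather than quoted from the paper.

```latex
\mu(\beta) \;=\; \sum_{j=1}^{J} \mu_j(\beta_j),
\qquad
\bigl|\mu_j(b) - \mu_j(b')\bigr| \;\le\; L\,\lvert b - b'\rvert
\quad \text{for every region } j \text{ and effort levels } b, b'.
```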
3. Zhou, Huozhi, Lingda Wang, Lav Varshney, and Ee-Peng Lim. "A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 6933–40. http://dx.doi.org/10.1609/aaai.v34i04.6176.
Abstract:
We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which incorporates an efficient combinatorial semi-bandit algorithm, CUCB, with an almost parameter-free change-point detector, the Generalized Likelihood Ratio Test (GLRT). Our analysis shows that the regret of GLR-CUCB is upper bounded by O(√NKT log T), where N is the number of piecewise-stationary segments, K is the number of base arms, and T is the number of time steps. As a complement, we also derive a nearly matching regret lower bound on the order of Ω(√NKT), for both piecewise-stationary multi-armed bandits and combinatorial semi-bandits, using information-theoretic techniques and judiciously constructed piecewise-stationary bandit instances. Our lower bound is tighter than the best available regret lower bound, which is Ω(√T). Numerical experiments on both synthetic and real-world datasets demonstrate the superiority of GLR-CUCB compared to other state-of-the-art algorithms.
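The CUCB routine that GLR-CUCB builds on is short enough to sketch. The version below assumes semi-bandit feedback, a linear reward, and a top-K selection oracle, and only indicates in a comment where the GLR change-point test would trigger a reset; function names and the confidence-radius constant are illustrative, not taken from the paper.

```python
# Hedged sketch of CUCB for top-K selection under semi-bandit feedback.
# pull(super_arm) must return one observed reward per played base arm.
import math

def cucb_top_k(n_arms, k, pull, horizon):
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, horizon + 1):
        untried = [i for i in range(n_arms) if counts[i] == 0]
        if untried:
            # ensure every base arm is observed at least once
            rest = [i for i in sorted(range(n_arms), key=lambda j: -means[j]) if i not in untried]
            super_arm = (untried + rest)[:k]
        else:
            ucb = [means[i] + math.sqrt(1.5 * math.log(t) / counts[i]) for i in range(n_arms)]
            super_arm = sorted(range(n_arms), key=lambda j: -ucb[j])[:k]   # exact top-K oracle
        for i, x in zip(super_arm, pull(super_arm)):
            counts[i] += 1
            means[i] += (x - means[i]) / counts[i]
        # GLR-CUCB would additionally run a Generalized Likelihood Ratio test on each
        # arm's observation history here and reset that arm's statistics when a
        # change point is detected.
    return means
```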
4. Moraes, Rubens, Julian Mariño, Levi Lelis, and Mario Nascimento. "Action Abstractions for Combinatorial Multi-Armed Bandit Tree Search." Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 14, no. 1 (September 25, 2018): 74–80. http://dx.doi.org/10.1609/aiide.v14i1.13018.
Abstract:
Search algorithms based on combinatorial multi-armed bandits (CMABs) are promising for dealing with state-space sequential decision problems. However, current CMAB-based algorithms do not scale to problem domains with very large action spaces, such as real-time strategy games played on large maps. In this paper we introduce CMAB-based search algorithms that use action abstraction schemes to reduce the action space considered during search. One of the approaches we introduce uses regular action abstractions (A1N), while the other two use asymmetric action abstractions (A2N and A3N). Empirical results on MicroRTS show that A1N, A2N, and A3N are able to outperform an existing CMAB-based algorithm in matches played on large maps, and A3N is able to outperform all state-of-the-art search algorithms tested.
5. Du, Yihan, Siwei Wang, and Longbo Huang. "A One-Size-Fits-All Solution to Conservative Bandit Problems." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 7254–61. http://dx.doi.org/10.1609/aaai.v35i8.16891.
Abstract:
In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as a given baseline at any time. We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems, i.e., conservative multi-armed bandits (CMAB), conservative linear bandits (CLB) and conservative contextual combinatorial bandits (CCCB). Different from previous works which consider high probability constraints on the expected reward, we focus on a sample-path constraint on the actually received reward, and achieve better theoretical guarantees (T-independent additive regrets instead of T-dependent) and empirical performance. Furthermore, we extend the results and consider a novel conservative mean-variance bandit problem (MV-CBP), which measures the learning performance with both the expected reward and variability. For this extended problem, we provide a novel algorithm with O(1/T) normalized additive regrets (T-independent in the cumulative form) and validate this result through empirical evaluation.
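For reference, the sample-path constraint described in this abstract is commonly formalized as below; the baseline notation and the tolerance parameter α are illustrative, not the paper's exact statement.

```latex
\sum_{s=1}^{t} r_s \;\ge\; (1-\alpha)\sum_{s=1}^{t} r^{\mathrm{base}}_s
\qquad \text{for every round } t = 1,\dots,T,
```

where r_s is the reward actually received at round s and r_s^base is what the given baseline would have collected.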
6. Agarwal, Mridul, Vaneet Aggarwal, Abhishek Kumar Umrawal, and Chris Quinn. "DART: Adaptive Accept Reject Algorithm for Non-Linear Combinatorial Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 6557–65. http://dx.doi.org/10.1609/aaai.v35i8.16812.
Abstract:
We consider the bandit problem of selecting K out of N arms at each time step. The joint reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing among all possible combinations, making the action space large. To simplify the problem, existing works on combinatorial bandits typically assume feedback as a linear function of individual rewards. In this paper, we prove the lower bound for top-K subset selection with bandit feedback with possibly correlated rewards. We present a novel algorithm for the combinatorial setting without using individual arm feedback or requiring linearity of the reward function. Additionally, our algorithm works on correlated rewards of individual arms. Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence bounds. DART is computationally efficient and uses storage linear in N. Further, DART achieves a regret bound of Õ(K√KNT) for a time horizon T, which matches the lower bound in bandit feedback up to a factor of √log 2NT. When applied to the problem of cross-selling optimization and maximizing the mean of individual rewards, the performance of the proposed algorithm surpasses that of state-of-the-art algorithms. We also show that DART significantly outperforms existing methods for both linear and non-linear joint reward environments.
7. Gai, Yi, Bhaskar Krishnamachari, and Rahul Jain. "Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations." IEEE/ACM Transactions on Networking 20, no. 5 (October 2012): 1466–78. http://dx.doi.org/10.1109/tnet.2011.2181864.
8. Ontañón, Santiago. "The Combinatorial Multi-Armed Bandit Problem and Its Application to Real-Time Strategy Games." Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 9, no. 1 (June 30, 2021): 58–64. http://dx.doi.org/10.1609/aiide.v9i1.12681.
Abstract:
Game tree search in games with large branching factors is a notoriously hard problem. In this paper, we address this problem with a new sampling strategy for Monte Carlo Tree Search (MCTS) algorithms, called "Naive Sampling", based on a variant of the Multi-armed Bandit problem called the "Combinatorial Multi-armed Bandit" (CMAB) problem. We present a new MCTS algorithm based on Naive Sampling called NaiveMCTS, and evaluate it in the context of real-time strategy (RTS) games. Our results show that as the branching factor grows, NaiveMCTS performs significantly better than other algorithms.
9. Zhang, Chen, and Steven C. H. Hoi. "Partially Observable Multi-Sensor Sequential Change Detection: A Combinatorial Multi-Armed Bandit Approach." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 5733–40. http://dx.doi.org/10.1609/aaai.v33i01.33015733.
Abstract:
This paper explores machine learning to address a problem of Partially Observable Multi-sensor Sequential Change Detection (POMSCD), where only a subset of sensors can be observed to monitor a target system for change-point detection at each online learning round. In contrast to traditional Multisensor Sequential Change Detection tasks where all the sensors are observable, POMSCD is much more challenging because the learner not only needs to detect on-the-fly whether a change occurs based on partially observed multi-sensor data streams, but also needs to cleverly choose a subset of informative sensors to be observed in the next learning round, in order to maximize the overall sequential change detection performance. In this paper, we present the first online learning study to tackle POMSCD in a systemic and rigorous way. Our approach has twofold novelties: (i) we attempt to detect changepoints from partial observations effectively by exploiting potential correlations between sensors, and (ii) we formulate the sensor subset selection task as a Multi-Armed Bandit (MAB) problem and develop an effective adaptive sampling strategy using MAB algorithms. We offer theoretical analysis for the proposed online learning solution, and further validate its empirical performance via an extensive set of numerical studies together with a case study on real-world data sets.
10. Nasim, Imtiaz, Ahmed S. Ibrahim, and Seungmo Kim. "Learning-Based Beamforming for Multi-User Vehicular Communications: A Combinatorial Multi-Armed Bandit Approach." IEEE Access 8 (2020): 219891–902. http://dx.doi.org/10.1109/access.2020.3043301.

Dissertations / Theses on the topic "Combinatorial multi-armed bandits"

1. Talebi Mazraeh Shahi, Mohammad Sadegh. "Minimizing Regret in Combinatorial Bandits and Reinforcement Learning." Doctoral thesis, KTH, Reglerteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219970.
Abstract:
This thesis investigates sequential decision making tasks that fall in the framework of reinforcement learning (RL). These tasks involve a decision maker repeatedly interacting with an environment modeled by an unknown finite Markov decision process (MDP), who wishes to maximize a notion of reward accumulated during her experience. Her performance can be measured through the notion of regret, which compares her accumulated expected reward against that achieved by an oracle algorithm always following an optimal behavior. In order to maximize her accumulated reward, or equivalently to minimize the regret, she needs to face a trade-off between exploration and exploitation. The first part of this thesis investigates combinatorial multi-armed bandit (MAB) problems, which are RL problems whose state-space is a singleton. It also addresses some applications that can be cast as combinatorial MAB problems. The number of arms in such problems generically grows exponentially with the number of basic actions, but the rewards of various arms are correlated. Hence, the challenge in such problems is to exploit the underlying combinatorial structure. For these problems, we derive asymptotic (i.e., when the time horizon grows large) lower bounds on the regret of any admissible algorithm and investigate how these bounds scale with the dimension of the underlying combinatorial structure. We then propose several algorithms and provide finite-time analyses of their regret. The proposed algorithms efficiently exploit the structure of the problem, provide better performance guarantees than existing algorithms, and significantly outperform these algorithms in practice. The second part of the thesis concerns RL in an unknown and discrete MDP under the average-reward criterion. We develop some variations of the transportation lemma that could serve as novel tools for the regret analysis of RL algorithms. Revisiting existing regret lower bounds allows us to derive alternative bounds, which motivate that the local variance of the bias function of the MDP, i.e., the variance with respect to next-state transition laws, could serve as a notion of problem complexity for regret minimization in RL. Leveraging these tools also allows us to report a novel regret analysis of the KL-UCRL algorithm for ergodic MDPs. The leading term in our regret bound depends on the local variance of the bias function, thus coinciding with observations obtained from our presented lower bounds. Numerical evaluations in some benchmark MDPs indicate that the leading term of the derived bound can provide an order of magnitude improvement over previously known results for this algorithm.
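In the bandit (singleton-state) case discussed in the first part, the regret notion referred to above reduces to the standard gap between an always-optimal oracle and the learner; the notation below is the usual one and is included only for orientation.

```latex
R(T) \;=\; T\,\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_t\right],
```

where μ* is the largest expected reward among the arms and r_t is the reward collected at round t.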

2. Talebi Mazraeh Shahi, Mohammad Sadegh. "Online Combinatorial Optimization under Bandit Feedback." Licentiate thesis, KTH, Reglerteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-181321.
Abstract:
Multi-Armed Bandits (MAB) constitute the most fundamental model for sequential decision making problems with an exploration vs. exploitation trade-off. In such problems, the decision maker selects an arm in each round and observes a realization of the corresponding unknown reward distribution. Each decision is based on past decisions and observed rewards. The objective is to maximize the expected cumulative reward over some time horizon by balancing exploitation (arms with higher observed rewards should be selected often) and exploration (all arms should be explored to learn their average rewards). Equivalently, the performance of a decision rule or algorithm can be measured through its expected regret, defined as the gap between the expected reward achieved by the algorithm and that achieved by an oracle algorithm always selecting the best arm. This thesis investigates stochastic and adversarial combinatorial MAB problems, where each arm is a collection of several basic actions taken from a set of $d$ elements, in a way that the set of arms has a certain combinatorial structure. Examples of such sets include the set of fixed-size subsets, matchings, spanning trees, paths, etc. These problems are specific forms of online linear optimization, where the decision space is a subset of the $d$-dimensional hypercube. Due to the combinatorial nature, the number of arms generically grows exponentially with $d$. Hence, treating arms as independent and applying classical sequential arm selection policies would yield a prohibitive regret. It may then be crucial to exploit the combinatorial structure of the problem to design efficient arm selection algorithms. As the first contribution of this thesis, in Chapter 3 we investigate combinatorial MABs in the stochastic setting and with Bernoulli rewards. We derive asymptotic (i.e., when the time horizon grows large) lower bounds on the regret of any algorithm under bandit and semi-bandit feedback. The proposed lower bounds are problem-specific and tight in the sense that there exists an algorithm that achieves these regret bounds. Our derivation leverages some theoretical results in adaptive control of Markov chains. Under semi-bandit feedback, we further discuss the scaling of the proposed lower bound with the dimension of the underlying combinatorial structure. For the case of semi-bandit feedback, we propose ESCB, an algorithm that efficiently exploits the structure of the problem, and provide a finite-time analysis of its regret. ESCB has better performance guarantees than existing algorithms, and significantly outperforms these algorithms in practice. In the fourth chapter, we consider stochastic combinatorial MAB problems where the underlying combinatorial structure is a matroid. Specializing the results of Chapter 3 to matroids, we provide explicit regret lower bounds for this class of problems. For the case of semi-bandit feedback, we propose KL-OSM, a computationally efficient greedy-based algorithm that exploits the matroid structure. Through a finite-time analysis, we prove that the regret upper bound of KL-OSM matches the proposed lower bound, thus making it the first asymptotically optimal algorithm for this class of problems. Numerical experiments validate that KL-OSM outperforms state-of-the-art algorithms in practice, as well. In the fifth chapter, we investigate the online shortest-path routing problem, which is an instance of combinatorial MABs with geometric rewards. We consider and compare three different types of online routing policies, depending (i) on where routing decisions are taken (at the source or at each node), and (ii) on the received feedback (semi-bandit or bandit). For each case, we derive the asymptotic regret lower bound. These bounds help us to understand the performance improvements we can expect when (i) taking routing decisions at each hop rather than at the source only, and (ii) observing per-link delays rather than end-to-end path delays. In particular, we show that (i) is of no use while (ii) can have a spectacular impact. For source routing under semi-bandit feedback, we then propose two algorithms with a trade-off between computational complexity and performance. The regret upper bounds of these algorithms improve over those of the existing algorithms, and they significantly outperform state-of-the-art algorithms in numerical experiments. Finally, we discuss combinatorial MABs in the adversarial setting and under bandit feedback. We concentrate on the case where arms have the same number of basic actions but are otherwise arbitrary. We propose CombEXP, an algorithm that has the same regret scaling as state-of-the-art algorithms. Furthermore, we show that CombEXP admits lower computational complexity for some combinatorial problems.
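The structure-exploiting index behind ESCB can be stated in one line. The form below is the one commonly given in the combinatorial semi-bandit literature; the exploration function f(t) and the notation are illustrative rather than quoted from the thesis.

```latex
b_M(t) \;=\; \sum_{i \in M} \hat{\theta}_i(t) \;+\; \sqrt{\frac{f(t)}{2} \sum_{i \in M} \frac{1}{n_i(t)}},
```

where M is a candidate arm (a set of basic actions), θ̂_i(t) and n_i(t) are the empirical mean reward and play count of basic action i, and the arm maximizing the index is played at round t.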

3. Ferreira, Alexandre Silvestre. "A cross-domain multi-armed bandit hyper-heuristic." Master's thesis, Universidade Federal do Paraná, 2016. http://hdl.handle.net/1884/41803.
Abstract:
Advisor: Profª. Drª. Aurora Pozo. Co-advisor: Prof. Dr. Richard Aderbal Gonçalves. Master's dissertation, Universidade Federal do Paraná, Setor de Ciências Exatas, Programa de Pós-Graduação em Informática. Defended in Curitiba, 26/02/2016. Includes references: f. 64-70.
Many real-world optimization problems are very complex, with many variables and constraints, and cannot be solved by exact methods in a reasonable computational time. As an alternative, meta-heuristics emerged as an efficient way to solve this type of problem, even though they cannot guarantee optimal values. The main issue with meta-heuristics is that they are built using domain-specific knowledge, and therefore require a great effort to be used in a new domain. To address this problem, the concept of hyper-heuristics was proposed. Hyper-heuristics are search methods that aim to solve optimization problems by selecting or generating heuristics. Selection hyper-heuristics choose from a pool of heuristics a good one to be applied at the current stage of the optimization process. The selection mechanism is the main part of a selection hyper-heuristic and has a great impact on its performance. Although there are several works focused on selection hyper-heuristics, there is no consensus about the best way to define a selection strategy. In this dissertation, a deterministic selection strategy based on the concepts of the Multi-Armed Bandit (MAB) problem is proposed for cross-domain optimization. Multi-armed bandit approaches define a selection function with two components: the first is based on the performance of an operator, and the second on the number of times the operator was used. These approaches have shown promising performance in the Adaptive Operator Selection context; however, few works in the literature target the hyper-heuristic context, as proposed here. The proposed approach is integrated into the HyFlex framework, which was developed to facilitate the implementation and comparison of hyper-heuristics. An empirical parameter study was performed, and the best configuration found was compared to the top ten algorithms of the CHeSC 2011 competition using the same methodology adopted during the competition. The results were good and comparable to those of the best approaches in the literature, with the proposed algorithm reaching fourth place. The experiments also showed that the behavior of the MAB selection is heavily affected by its parameters. As this is not a desirable property for hyper-heuristics, future research will investigate ways to mitigate this influence.
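The two-component selection function described in this abstract (operator quality plus a usage-based term) is essentially a UCB-style rule. The sketch below is illustrative only; the function name, the scale parameter, and the reward bookkeeping are assumptions, not the dissertation's implementation.

```python
# Hedged sketch of a UCB-style operator-selection rule for a selection hyper-heuristic:
# rewards[op] accumulates the credit assigned to operator op, counts[op] how often it ran.
import math

def select_operator(rewards, counts, scale=1.0):
    total = sum(counts)
    best_op, best_score = 0, float("-inf")
    for op in range(len(counts)):
        if counts[op] == 0:
            return op                        # try every operator at least once
        exploit = rewards[op] / counts[op]   # empirical quality of the operator
        explore = scale * math.sqrt(2.0 * math.log(total) / counts[op])
        if exploit + explore > best_score:
            best_op, best_score = op, exploit + explore
    return best_op
```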
4. Prakash, Gujar Sujit. "Novel Mechanisms For Allocation Of Heterogeneous Items In Strategic Settings." Thesis, 2010. http://etd.iisc.ernet.in/handle/2005/1654.
Abstract:
Allocation of objects or resources to competing agents is a ubiquitous problem in the real world. For example, a federal government may wish to allocate different types of spectrum licenses to telecom service providers; a search engine has to assign different sponsored slots to the ads of advertisers; etc. The agents involved in such situations have private preferences over the allocations. The agents, being strategic, may manipulate the allocation procedure to get a favourable allocation. If the objects to be allocated are heterogeneous (rather than homogeneous), the problem becomes quite complex. The allocation problem becomes even more formidable in the presence of a dynamic supply and/or demand. This doctoral work is motivated by such problems involving strategic agents, heterogeneous objects, and dynamic supply and/or demand. In this thesis, we model such problems in a standard game theoretic setting and use mechanism design to propose novel solutions to the problems. We extend the current state-of-the-art in a non-trivial way by solving the following problems: optimal combinatorial auctions with single-minded bidders, generalizing the existing methods to take into account multiple units of heterogeneous objects; multi-armed bandit mechanisms for sponsored search auctions with multiple slots, generalizing the current methods that only consider a single slot; strategyproof redistribution mechanisms for heterogeneous objects, expanding the scope of the current state of practice beyond homogeneous objects; and online allocation mechanisms without money for one-sided and two-sided matching markets, extending the existing methods for static settings.

Book chapters on the topic "Combinatorial multi-armed bandits"

1. Mandaglio, Domenico, and Andrea Tagarelli. "A Combinatorial Multi-Armed Bandit Based Method for Dynamic Consensus Community Detection in Temporal Networks." In Discovery Science, 412–27. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-33778-0_31.

Conference papers on the topic "Combinatorial multi-armed bandits"

1. Zuo, Jinhang, and Carlee Joe-Wong. "Combinatorial Multi-armed Bandits for Resource Allocation." In 2021 55th Annual Conference on Information Sciences and Systems (CISS). IEEE, 2021. http://dx.doi.org/10.1109/ciss50987.2021.9400228.
2. Tang, Shaojie, Yaqin Zhou, Kai Han, Zhao Zhang, Jing Yuan, and Weili Wu. "Networked Stochastic Multi-armed Bandits with Combinatorial Strategies." In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS). IEEE, 2017. http://dx.doi.org/10.1109/icdcs.2017.303.
3. Egger, Maximilian, Rawad Bitar, Antonia Wachter-Zeh, and Deniz Gunduz. "Efficient Distributed Machine Learning via Combinatorial Multi-Armed Bandits." In 2022 IEEE International Symposium on Information Theory (ISIT). IEEE, 2022. http://dx.doi.org/10.1109/isit50566.2022.9834499.
4. Xu, Huanle, Yang Liu, Wing Cheong Lau, and Rui Li. "Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence (IJCAI-PRICAI-20). California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/354.
Abstract:
The problem of multi-armed bandit (MAB) with fairness constraint has emerged as an important research topic recently. For such problems, one common objective is to maximize the total rewards within a fixed round of pulls, while satisfying the fairness requirement of a minimum selection fraction for each individual arm in the long run. Previous works have made substantial advancements in designing efficient online selection solutions, however, they fail to achieve a sublinear regret bound when incorporating such fairness constraints. In this paper, we study a combinatorial MAB problem with concave objective and fairness constraints. In particular, we adopt a new approach that combines online convex optimization with bandit methods to design selection algorithms. Our algorithm is computationally efficient, and more importantly, manages to achieve a sublinear regret bound with probability guarantees. Finally, we evaluate the performance of our algorithm via extensive simulations and demonstrate that it outperforms the baselines substantially.
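The long-run fairness requirement mentioned in this abstract is typically written as a minimum selection fraction per arm; the notation below is illustrative.

```latex
\liminf_{T \to \infty} \; \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}\{\, i \in S_t \,\} \;\ge\; c_i
\qquad \text{for every arm } i,
```

where S_t is the set of arms pulled at round t and c_i is the required selection fraction of arm i.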
5. Kang, Sunjung, and Changhee Joo. "Combinatorial multi-armed bandits in cognitive radio networks: A brief overview." In 2017 International Conference on Information and Communication Technology Convergence (ICTC). IEEE, 2017. http://dx.doi.org/10.1109/ictc.2017.8190862.
6. Huang, Hanxun, Xingjun Ma, Sarah M. Erfani, and James Bailey. "Neural Architecture Search via Combinatorial Multi-Armed Bandit." In 2021 International Joint Conference on Neural Networks (IJCNN). IEEE, 2021. http://dx.doi.org/10.1109/ijcnn52387.2021.9533655.
7. Song, Yiwen, and Haiming Jin. "Minimizing Entropy for Crowdsourcing with Combinatorial Multi-Armed Bandit." In IEEE INFOCOM 2021 - IEEE Conference on Computer Communications. IEEE, 2021. http://dx.doi.org/10.1109/infocom42981.2021.9488800.
8. Mandaglio, Domenico, and Andrea Tagarelli. "Dynamic consensus community detection and combinatorial multi-armed bandit." In ASONAM '19: International Conference on Advances in Social Networks Analysis and Mining. New York, NY, USA: ACM, 2019. http://dx.doi.org/10.1145/3341161.3342910.
9. Gai, Yi, Bhaskar Krishnamachari, and Mingyan Liu. "On the Combinatorial Multi-Armed Bandit Problem with Markovian Rewards." In 2011 IEEE Global Communications Conference (GLOBECOM 2011). IEEE, 2011. http://dx.doi.org/10.1109/glocom.2011.6134244.
10. Gao, Guoju, He Huang, Mingjun Xiao, Jie Wu, Yu-E. Sun, and Sheng Zhang. "Auction-Based Combinatorial Multi-Armed Bandit Mechanisms with Strategic Arms." In IEEE INFOCOM 2021 - IEEE Conference on Computer Communications. IEEE, 2021. http://dx.doi.org/10.1109/infocom42981.2021.9488765.