Journal articles on the topic 'Combinatorial multi-armed bandits'

To see the other types of publications on this topic, follow the link: Combinatorial multi-armed bandits.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 26 journal articles for your research on the topic 'Combinatorial multi-armed bandits.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Ontañón, Santiago. "Combinatorial Multi-armed Bandits for Real-Time Strategy Games." Journal of Artificial Intelligence Research 58 (March 29, 2017): 665–702. http://dx.doi.org/10.1613/jair.5398.

Abstract:
Games with large branching factors pose a significant challenge for game tree search algorithms. In this paper, we address this problem with a sampling strategy for Monte Carlo Tree Search (MCTS) algorithms called "naive sampling", based on a variant of the Multi-armed Bandit problem called "Combinatorial Multi-armed Bandits" (CMAB). We analyze the theoretical properties of several variants of naive sampling, and empirically compare it against the other existing strategies in the literature for CMABs. We then evaluate these strategies in the context of real-time strategy (RTS) games, a genre of computer games characterized by their very large branching factors. Our results show that as the branching factor grows, naive sampling outperforms the other sampling strategies.
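
A quick way to see the naive-sampling idea is in code: exploration treats each component of a macro-action as its own "local" bandit, while exploitation draws on a "global" bandit over the macro-arms tried so far. The sketch below is illustrative only; the ε-greedy split, the uniform local policy, and the toy additive-reward simulator are assumptions made for brevity, not details taken from the paper.

```python
import random
from collections import defaultdict

class NaiveSampling:
    """Sketch of naive sampling for a combinatorial multi-armed bandit (CMAB).

    A macro-arm is a tuple with one choice per component. With probability
    eps we explore by sampling each component independently (its "local"
    bandit; uniform here, the simplest local policy); otherwise we exploit
    the empirically best macro-arm observed so far (the "global" bandit).
    """

    def __init__(self, component_choices, eps=0.3):
        self.component_choices = component_choices  # one option list per component
        self.eps = eps
        self.total = defaultdict(float)  # macro-arm -> summed reward
        self.count = defaultdict(int)    # macro-arm -> number of pulls

    def select(self):
        if not self.count or random.random() < self.eps:
            return tuple(random.choice(c) for c in self.component_choices)
        return max(self.count, key=lambda a: self.total[a] / self.count[a])

    def update(self, arm, reward):
        self.total[arm] += reward
        self.count[arm] += 1

if __name__ == "__main__":
    # Toy CMAB: 3 components with 4 unit actions each, hidden additive reward.
    random.seed(0)
    weights = [[random.random() for _ in range(4)] for _ in range(3)]
    bandit = NaiveSampling([list(range(4))] * 3)
    for _ in range(2000):
        arm = bandit.select()
        bandit.update(arm, sum(w[a] for w, a in zip(weights, arm)) + random.gauss(0, 0.1))
    best = max(bandit.count, key=lambda a: bandit.total[a] / bandit.count[a])
    print("best macro-arm found:", best)
```
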
2

Xu, Lily, Elizabeth Bondi, Fei Fang, Andrew Perrault, Kai Wang, and Milind Tambe. "Dual-Mandate Patrols: Multi-Armed Bandits for Green Security." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 17 (May 18, 2021): 14974–82. http://dx.doi.org/10.1609/aaai.v35i17.17757.

Abstract:
Conservation efforts in green security domains to protect wildlife and forests are constrained by the limited availability of defenders (i.e., patrollers), who must patrol vast areas to protect from attackers (e.g., poachers or illegal loggers). Defenders must choose how much time to spend in each region of the protected area, balancing exploration of infrequently visited regions and exploitation of known hotspots. We formulate the problem as a stochastic multi-armed bandit, where each action represents a patrol strategy, enabling us to guarantee the rate of convergence of the patrolling policy. However, a naive bandit approach would compromise short-term performance for long-term optimality, resulting in animals poached and forests destroyed. To speed up performance, we leverage smoothness in the reward function and decomposability of actions. We show a synergy between Lipschitz-continuity and decomposition as each aids the convergence of the other. In doing so, we bridge the gap between combinatorial and Lipschitz bandits, presenting a no-regret approach that tightens existing guarantees while optimizing for short-term performance. We demonstrate that our algorithm, LIZARD, improves performance on real-world poaching data from Cambodia.
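
The abstract's two ingredients, decomposition across regions and Lipschitz smoothness over patrol effort, can be sketched compactly: per-region UCBs over discretized effort levels are tightened by the Lipschitz constraint, and a greedy allocator spends the patrol budget on the largest marginal UCB gains. The discretization, confidence radius, and greedy oracle below are illustrative assumptions; this is not the authors' LIZARD implementation.

```python
import math

# Discretized patrol effort levels shared by all regions (illustrative).
EFFORTS = [0.0, 0.25, 0.5, 0.75, 1.0]

def lipschitz_ucb(counts, sums, t, lip):
    """UCBs over effort levels for one region, tightened by Lipschitz continuity.

    counts[i]/sums[i] are pulls and summed rewards at EFFORTS[i]; t is the
    round index; lip is the Lipschitz constant. The min-over-neighbours step
    is how an observation at one effort level shrinks the bound at others.
    """
    raw = [sums[i] / counts[i] + math.sqrt(2 * math.log(t + 1) / counts[i])
           if counts[i] else float("inf") for i in range(len(EFFORTS))]
    return [min(raw[j] + lip * abs(EFFORTS[i] - EFFORTS[j])
                for j in range(len(EFFORTS)))
            for i in range(len(EFFORTS))]

def allocate(regions_ucb, budget):
    """Greedy stand-in for the combinatorial oracle: raise, one step at a
    time, the effort of whichever region gains the most UCB per step."""
    levels = [0] * len(regions_ucb)
    step, spent = EFFORTS[1] - EFFORTS[0], 0.0
    while spent + step <= budget:
        gains = [(regions_ucb[r][levels[r] + 1] - regions_ucb[r][levels[r]], r)
                 for r in range(len(regions_ucb)) if levels[r] + 1 < len(EFFORTS)]
        if not gains:
            break
        _, r = max(gains)
        levels[r] += 1
        spent += step
    return [EFFORTS[l] for l in levels]
```
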
3

Zhou, Huozhi, Lingda Wang, Lav Varshney, and Ee-Peng Lim. "A Near-Optimal Change-Detection Based Algorithm for Piecewise-Stationary Combinatorial Semi-Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (April 3, 2020): 6933–40. http://dx.doi.org/10.1609/aaai.v34i04.6176.

Abstract:
We investigate the piecewise-stationary combinatorial semi-bandit problem. Compared to the original combinatorial semi-bandit problem, our setting assumes the reward distributions of base arms may change in a piecewise-stationary manner at unknown time steps. We propose an algorithm, GLR-CUCB, which incorporates an efficient combinatorial semi-bandit algorithm, CUCB, with an almost parameter-free change-point detector, the Generalized Likelihood Ratio Test (GLRT). Our analysis shows that the regret of GLR-CUCB is upper bounded by O(√(NKT log T)), where N is the number of piecewise-stationary segments, K is the number of base arms, and T is the number of time steps. As a complement, we also derive a nearly matching regret lower bound on the order of Ω(√(NKT)), for both piecewise-stationary multi-armed bandits and combinatorial semi-bandits, using information-theoretic techniques and judiciously constructed piecewise-stationary bandit instances. Our lower bound is tighter than the best available regret lower bound, which is Ω(√T). Numerical experiments on both synthetic and real-world datasets demonstrate the superiority of GLR-CUCB compared to other state-of-the-art algorithms.
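
The control flow is a combinatorial UCB step wrapped in a per-arm change monitor that resets statistics on detection. In the sketch below, a crude two-window mean-shift test stands in for the paper's GLRT (the window size and threshold are invented), so this shows only the skeleton of the approach, not the analyzed detector.

```python
import math
from collections import deque

class ChangeDetectCombinatorialUCB:
    """Skeleton of a change-detection combinatorial semi-bandit (CUCB-style).

    Each base arm keeps UCB statistics plus a sliding window of recent
    rewards; when the window's two halves disagree by more than `thresh`
    (a crude stand-in for a GLR test), the arm's statistics are reset.
    """

    def __init__(self, n_arms, k, window=50, thresh=0.3):
        self.n, self.k = n_arms, k
        self.window, self.thresh = window, thresh
        self.cnt = [0] * n_arms
        self.tot = [0.0] * n_arms
        self.recent = [deque(maxlen=window) for _ in range(n_arms)]
        self.t = 0

    def select(self):
        self.t += 1
        ucb = [float("inf") if self.cnt[i] == 0 else
               self.tot[i] / self.cnt[i]
               + math.sqrt(1.5 * math.log(self.t) / self.cnt[i])
               for i in range(self.n)]
        # Top-k oracle; real CUCB plugs in a problem-specific oracle here.
        return sorted(range(self.n), key=lambda i: ucb[i], reverse=True)[:self.k]

    def update(self, super_arm, rewards):
        for i, r in zip(super_arm, rewards):  # semi-bandit: per-base-arm feedback
            self.cnt[i] += 1
            self.tot[i] += r
            self.recent[i].append(r)
            if len(self.recent[i]) == self.window and self._changed(i):
                self.cnt[i], self.tot[i] = 0, 0.0
                self.recent[i].clear()

    def _changed(self, i):
        half = self.window // 2
        w = list(self.recent[i])
        return abs(sum(w[:half]) / half - sum(w[half:]) / (len(w) - half)) > self.thresh
```
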
4

Moraes, Rubens, Julian Mariño, Levi Lelis, and Mario Nascimento. "Action Abstractions for Combinatorial Multi-Armed Bandit Tree Search." Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 14, no. 1 (September 25, 2018): 74–80. http://dx.doi.org/10.1609/aiide.v14i1.13018.

Abstract:
Search algorithms based on combinatorial multi-armed bandits (CMABs) are promising for dealing with state-space sequential decision problems. However, current CMAB-based algorithms do not scale to problem domains with very large action spaces, such as real-time strategy games played on large maps. In this paper we introduce CMAB-based search algorithms that use action abstraction schemes to reduce the action space considered during search. One of the approaches we introduce uses regular action abstractions (A1N), while the other two use asymmetric action abstractions (A2N and A3N). Empirical results on MicroRTS show that A1N, A2N, and A3N are able to outperform an existing CMAB-based algorithm in matches played on large maps, and A3N is able to outperform all state-of-the-art search algorithms tested.
5

Du, Yihan, Siwei Wang, and Longbo Huang. "A One-Size-Fits-All Solution to Conservative Bandit Problems." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 7254–61. http://dx.doi.org/10.1609/aaai.v35i8.16891.

Abstract:
In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as good as that of a given baseline at any time. We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems, i.e., conservative multi-armed bandits (CMAB), conservative linear bandits (CLB) and conservative contextual combinatorial bandits (CCCB). Different from previous works which consider high-probability constraints on the expected reward, we focus on a sample-path constraint on the actually received reward, and achieve better theoretical guarantees (T-independent additive regrets instead of T-dependent) and empirical performance. Furthermore, we extend the results and consider a novel conservative mean-variance bandit problem (MV-CBP), which measures the learning performance with both the expected reward and variability. For this extended problem, we provide a novel algorithm with O(1/T) normalized additive regrets (T-independent in the cumulative form) and validate this result through empirical evaluation.
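
The sample-path constraint reads naturally as a runtime guard: take an exploratory action only if, even when it pays nothing this round, the reward banked so far stays above (1 − α) times the baseline's cumulative reward. A minimal sketch of that guard, assuming rewards in [0, 1] and a known baseline mean (both illustrative simplifications; the paper's algorithms are considerably more refined):

```python
def conservative_play(explore_arm, baseline_arm, pull, horizon,
                      alpha=0.1, baseline_mean=0.5):
    """Wrap any exploration policy in a sample-path conservative guard.

    explore_arm(t) proposes an arm; pull(arm) returns a reward in [0, 1];
    baseline_mean is the (assumed known) per-round baseline reward. We only
    explore when a zero-reward pull would still keep the banked reward at
    least (1 - alpha) times the baseline's cumulative reward.
    """
    banked = 0.0
    for t in range(1, horizon + 1):
        if banked >= (1 - alpha) * baseline_mean * t:
            arm = explore_arm(t)      # safe to explore this round
        else:
            arm = baseline_arm        # fall back to the safe baseline
        banked += pull(arm)
    return banked
```
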
6

Agarwal, Mridul, Vaneet Aggarwal, Abhishek Kumar Umrawal, and Chris Quinn. "DART: Adaptive Accept Reject Algorithm for Non-Linear Combinatorial Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 6557–65. http://dx.doi.org/10.1609/aaai.v35i8.16812.

Abstract:
We consider the bandit problem of selecting K out of N arms at each time step. The joint reward can be a non-linear function of the rewards of the selected individual arms. The direct use of a multi-armed bandit algorithm requires choosing among all possible combinations, making the action space large. To simplify the problem, existing works on combinatorial bandits typically assume feedback as a linear function of individual rewards. In this paper, we prove the lower bound for top-K subset selection with bandit feedback with possibly correlated rewards. We present a novel algorithm for the combinatorial setting without using individual arm feedback or requiring linearity of the reward function. Additionally, our algorithm works on correlated rewards of individual arms. Our algorithm, aDaptive Accept RejecT (DART), sequentially finds good arms and eliminates bad arms based on confidence bounds. DART is computationally efficient and uses storage linear in N. Further, DART achieves a regret bound of Õ(K√(KNT)) for a time horizon T, which matches the lower bound in bandit feedback up to a factor of √(log 2NT). When applied to the problem of cross-selling optimization and maximizing the mean of individual rewards, the performance of the proposed algorithm surpasses that of state-of-the-art algorithms. We also show that DART significantly outperforms existing methods for both linear and non-linear joint reward environments.
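
The accept/reject loop itself is easy to sketch with confidence bounds: an arm is accepted once its lower bound clears the upper bounds of enough competitors, and rejected once its upper bound is dominated. One deliberate simplification below: the sketch scores arms from per-arm Bernoulli samples, whereas DART infers arm quality from the joint (full-bandit) reward alone, so this shows the elimination structure rather than DART's feedback handling.

```python
import math
import random

def accept_reject_topk(means, k, rounds=5000, seed=1):
    """Confidence-bound accept/reject sketch for top-K arm selection.

    `means` are hidden Bernoulli parameters used only by the simulator.
    Active arms are pulled round-robin; an arm is accepted once its lower
    confidence bound beats the (slots-still-open)-th best upper bound among
    the other active arms, and rejected once its upper bound falls below
    the corresponding lower bound. Returns the accepted arm indices.
    """
    rng = random.Random(seed)
    n = len(means)
    cnt, tot = [0] * n, [0.0] * n
    active, accepted = set(range(n)), set()

    for t in range(1, rounds + 1):
        for i in active:
            cnt[i] += 1
            tot[i] += 1.0 if rng.random() < means[i] else 0.0
        rad = {i: math.sqrt(math.log(1 + t) / cnt[i]) for i in active}
        ucb = {i: tot[i] / cnt[i] + rad[i] for i in active}
        lcb = {i: tot[i] / cnt[i] - rad[i] for i in active}
        for i in list(active):
            need = k - len(accepted)            # slots still to fill
            if need == 0:
                break
            others_ucb = sorted((ucb[j] for j in active if j != i), reverse=True)
            others_lcb = sorted((lcb[j] for j in active if j != i), reverse=True)
            if len(others_ucb) < need or lcb[i] > others_ucb[need - 1]:
                accepted.add(i); active.discard(i)   # surely among the top K
            elif ucb[i] < others_lcb[need - 1]:
                active.discard(i)                    # surely not among the top K
        if len(accepted) >= k or not active:
            break
    # Fill any remaining slots with the best empirical active arms.
    rest = sorted(active, key=lambda i: tot[i] / cnt[i], reverse=True)
    return sorted(accepted | set(rest[: k - len(accepted)]))
```
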
7

Gai, Yi, Bhaskar Krishnamachari, and Rahul Jain. "Combinatorial Network Optimization With Unknown Variables: Multi-Armed Bandits With Linear Rewards and Individual Observations." IEEE/ACM Transactions on Networking 20, no. 5 (October 2012): 1466–78. http://dx.doi.org/10.1109/tnet.2011.2181864.

8

Ontañón, Santiago. "The Combinatorial Multi-Armed Bandit Problem and Its Application to Real-Time Strategy Games." Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 9, no. 1 (June 30, 2021): 58–64. http://dx.doi.org/10.1609/aiide.v9i1.12681.

Abstract:
Game tree search in games with large branching factors is a notoriously hard problem. In this paper, we address this problem with a new sampling strategy for Monte Carlo Tree Search (MCTS) algorithms, called "Naive Sampling", based on a variant of the Multi-armed Bandit problem called the "Combinatorial Multi-armed Bandit" (CMAB) problem. We present a new MCTS algorithm based on Naive Sampling called NaiveMCTS, and evaluate it in the context of real-time strategy (RTS) games. Our results show that as the branching factor grows, NaiveMCTS performs significantly better than other algorithms.
9

Zhang, Chen, and Steven C. H. Hoi. "Partially Observable Multi-Sensor Sequential Change Detection: A Combinatorial Multi-Armed Bandit Approach." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 5733–40. http://dx.doi.org/10.1609/aaai.v33i01.33015733.

Abstract:
This paper explores machine learning to address a problem of Partially Observable Multi-sensor Sequential Change Detection (POMSCD), where only a subset of sensors can be observed to monitor a target system for change-point detection at each online learning round. In contrast to traditional multi-sensor sequential change detection tasks where all the sensors are observable, POMSCD is much more challenging because the learner not only needs to detect on-the-fly whether a change occurs based on partially observed multi-sensor data streams, but also needs to cleverly choose a subset of informative sensors to be observed in the next learning round, in order to maximize the overall sequential change detection performance. In this paper, we present the first online learning study to tackle POMSCD in a systematic and rigorous way. Our approach has twofold novelties: (i) we attempt to detect change-points from partial observations effectively by exploiting potential correlations between sensors, and (ii) we formulate the sensor subset selection task as a Multi-Armed Bandit (MAB) problem and develop an effective adaptive sampling strategy using MAB algorithms. We offer theoretical analysis for the proposed online learning solution, and further validate its empirical performance via an extensive set of numerical studies together with a case study on real-world data sets.
10

Nasim, Imtiaz, Ahmed S. Ibrahim, and Seungmo Kim. "Learning-Based Beamforming for Multi-User Vehicular Communications: A Combinatorial Multi-Armed Bandit Approach." IEEE Access 8 (2020): 219891–902. http://dx.doi.org/10.1109/access.2020.3043301.

11

Chen, Xiaowei, Jiawei Xue, Zengxiang Lei, Xinwu Qian, and Satish V. Ukkusuri. "Online eco-routing for electric vehicles using combinatorial multi-armed bandit with estimated covariance." Transportation Research Part D: Transport and Environment 111 (October 2022): 103447. http://dx.doi.org/10.1016/j.trd.2022.103447.

12

Taheri Javan, Nastooh, Masoud Sabaei, and Vesal Hakami. "IEEE 802.15.4.e TSCH-Based Scheduling for Throughput Optimization: A Combinatorial Multi-Armed Bandit Approach." IEEE Sensors Journal 20, no. 1 (January 1, 2020): 525–37. http://dx.doi.org/10.1109/jsen.2019.2941012.

13

Han, Kai, Yuntian He, Alex X. Liu, Shaojie Tang, and He Huang. "Differentially Private and Budget-Limited Bandit Learning over Matroids." INFORMS Journal on Computing 32, no. 3 (July 2020): 790–804. http://dx.doi.org/10.1287/ijoc.2019.0903.

Abstract:
We propose the first budget-limited multi-armed bandit (BMAB) algorithm subject to a union of matroid constraints in arm pulling, while at the same time achieving differential privacy. Our model generalizes the arm-pulling models studied in prior BMAB schemes, and it can be used to address many practical problems such as network backbone construction and dynamic pricing in crowdsourcing. We handle the exploitation versus exploration tradeoff in our BMAB problem by exploiting the combinatorial structures of matroids, and reduce the searching complexity of arm selection based on a divide-and-conquer approach. Our algorithm achieves a uniform logarithmic regret bound with respect to B while guaranteeing ɛ-differential privacy, where B is the budget for pulling the arms with random costs. Without differential privacy, our algorithm achieves a uniform logarithmic regret bound with respect to B, which advances the asymptotic regret bounds achieved by prior BMAB algorithms. We performed side-by-side comparisons with prior schemes in our experiments. Experimental results show that our purely combinatorial algorithm not only achieves significantly better regret performance, but also is more than 20 times faster than prior BMAB schemes, which use time-consuming LP-solving techniques.
14

Bandyopadhyay, Sambaran, Pratyush Kumar, and Vijay Arya. "Planning Curtailment of Renewable Generation in Power Grids." Proceedings of the International Conference on Automated Planning and Scheduling 26 (March 30, 2016): 353–57. http://dx.doi.org/10.1609/icaps.v26i1.13779.

Abstract:
The increasing penetration of renewable sources like solar energy adds new dimensions to planning power grid operations. We study the problem of curtailing a subset of prosumers generating solar power with the twin goals of being close to a target collection and maintaining fairness across prosumers. The problem is complicated by the uncertainty in the amount of energy fed in by each prosumer and the large problem size in terms of the number of prosumers. To meet these challenges, we propose an algorithm based on the Combinatorial Multi-Armed Bandit problem with an approximate knapsack-based oracle. With real data on solar panel output across multiple prosumers, we are able to demonstrate the effectiveness of the proposed algorithm.
15

Wan Ariffin, Wan Nur Suryani Firuz Wan, Xinruo Zhang, Mohammad Reza Nakhai, Hasliza A. Rahim, and R. Badlishah Ahmad. "Online Learning Approach for Predictive Real-Time Energy Trading in Cloud-RANs." Sensors 21, no. 7 (March 25, 2021): 2308. http://dx.doi.org/10.3390/s21072308.

Abstract:
Constantly changing electricity demand has made variability and uncertainty inherent characteristics of both electric generation and cellular communication systems. This paper develops an online learning algorithm as a prescheduling mechanism to manage this variability and uncertainty and to maintain cost-aware and reliable operation in cloud radio access networks (Cloud-RANs). The proposed algorithm employs a combinatorial multi-armed bandit model and minimizes the long-term energy cost at remote radio heads. The algorithm preschedules a set of cost-efficient energy packages to be purchased from an ancillary energy market for future time slots, both by learning from cooperative energy trading at previous time slots and by exploring new energy scheduling strategies at the current time slot. The simulation results confirm a significant performance gain of the proposed scheme in controlling the available power budgets and minimizing the overall energy cost compared with recently proposed approaches for real-time energy resources and energy trading in Cloud-RANs.
16

Perera, R. Malinga, Bastian Oetomo, Benjamin I. P. Rubinstein, and Renata Borovica-Gajic. "HMAB." Proceedings of the VLDB Endowment 16, no. 2 (October 2022): 216–29. http://dx.doi.org/10.14778/3565816.3565824.

Abstract:
Effective physical database design tuning requires selection of several physical design structures (PDS), such as indices and materialised views, whose combination influences overall system performance in a non-linear manner. While the simplicity of combining the results of iterative searches for individual PDSs may be appealing, such a greedy approach may yield vastly suboptimal results compared to an integrated search. We propose a new self-driving approach (HMAB) based on hierarchical multi-armed bandit learners, which can work in an integrated space of multiple PDS while avoiding the full cost of combinatorial search. HMAB sidesteps optimiser cost misestimates by relying on direct performance observations gathered through strategic exploration, while carefully leveraging its knowledge to prune the less useful exploration paths. As an added advantage, HMAB comes with a provable guarantee on its expected performance. To the best of our knowledge, this is the first learned system to tune both indices and materialised views in an integrated manner. We find that our solution enjoys superior empirical performance relative to state-of-the-art commercial physical database design tools that search over the integrated space of materialised views and indices. Specifically, HMAB achieves up to 96% performance gain over a state-of-the-art commercial physical database design tool when running industrial benchmarks.
17

Jain, Shweta, and Sujit Gujar. "A Multiarmed Bandit Based Incentive Mechanism for a Subset Selection of Customers for Demand Response in Smart Grids." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 02 (April 3, 2020): 2046–53. http://dx.doi.org/10.1609/aaai.v34i02.5577.

Abstract:
Demand response is a crucial tool to maintain the stability of smart grids. With the upcoming research trends in the area of electricity markets, it has become possible to design a dynamic pricing system in which consumers are made aware of what they are going to pay. Though dynamic pricing (pricing based on the total demand a distributor company is facing) seems to be one possible solution, current dynamic pricing approaches are either too complex for a consumer to understand or too naive, leading to inefficiencies in the system (on either the consumer side or the distributor side). Due to these limitations, the recent literature focuses on providing incentives to consumers to reduce their electricity consumption, especially during peak hours. In each round, the goal is to select a subset of consumers to whom the distributor should offer incentives so as to minimize the loss, which comprises the cost of buying electricity from the market, uncertainties at the consumer end, and the cost incurred by the consumers to reduce their consumption, which is private information of the consumers. Due to the uncertainties in the loss function (arising from renewable energy resources as well as consumption needs), traditional auction-theory-based incentives face manipulation challenges. Towards this, we propose a novel combinatorial multi-armed bandit (MAB) algorithm to learn the uncertainties, together with an auction to elicit the true costs incurred by the consumers. We prove that our mechanism is regret optimal and incentive compatible. We further demonstrate the efficacy of our algorithms via simulations.
18

Lykouris, Thodoris, Karthik Sridharan, and Éva Tardos. "Small-Loss Bounds for Online Learning with Partial Information." Mathematics of Operations Research, January 25, 2022. http://dx.doi.org/10.1287/moor.2021.1204.

Abstract:
We consider the problem of adversarial (nonstochastic) online learning with partial-information feedback, in which, at each round, a decision maker selects an action from a finite set of alternatives. We develop a black-box approach for such problems in which the learner observes as feedback only losses of a subset of the actions that includes the selected action. When losses of actions are nonnegative, under the graph-based feedback model introduced by Mannor and Shamir, we offer algorithms that attain the so-called "small-loss" Õ(√(αL*)) regret bounds with high probability, where α is the independence number of the graph and L* is the loss of the best action. Prior to our work, there was no data-dependent guarantee for general feedback graphs even for pseudo-regret (without dependence on the number of actions, i.e., utilizing the increased information feedback). Taking advantage of the black-box nature of our technique, we extend our results to many other applications, such as combinatorial semi-bandits (including routing in networks), contextual bandits (even with an infinite comparator class), and learning with slowly changing (shifting) comparators. In the special case of multi-armed bandit and combinatorial semi-bandit problems, we provide optimal small-loss, high-probability regret guarantees of Õ(√(dL*)), where d is the number of actions, answering open questions of Neu. Previous bounds for multi-armed bandits and semi-bandits were known only for pseudo-regret and only in expectation. We also offer an optimal Õ(√(κL*)) regret guarantee for fixed feedback graphs with clique-partition number at most κ.
19

Wang, Yingfei, Hua Ouyang, Chu Wang, Jianhui Chen, Tsvetan Asamov, and Yi Chang. "Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation." Proceedings of the AAAI Conference on Artificial Intelligence 31, no. 1 (February 13, 2017). http://dx.doi.org/10.1609/aaai.v31i1.10939.

Abstract:
The Multi-Armed Bandit (MAB) framework has been successfully applied in many web applications. However, many complex real-world applications that involve multiple content recommendations cannot fit into the traditional MAB setting. To address this issue, we consider an ordered combinatorial semi-bandit problem where the learner recommends S actions from a base set of K actions, and displays the results in S (out of M) different positions. The aim is to maximize the cumulative reward with respect to the best possible subset and positions in hindsight. By the adaptation of a minimum-cost maximum-flow network, a practical algorithm based on Thompson sampling is derived for the (contextual) combinatorial problem, thus resolving the problem of computational intractability. With its potential to work with whole-page recommendation and any probabilistic models, to illustrate the effectiveness of our method, we focus on Gaussian process optimization and a contextual setting where click-through rate is predicted using logistic regression. We demonstrate the algorithms’ performance on synthetic Gaussian process problems and on large-scale news article recommendation datasets from Yahoo! Front Page Today Module.
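
Stripped of position effects and the minimum-cost maximum-flow assignment, the core loop is Thompson sampling with a top-S selection over the base actions. A minimal Beta-Bernoulli sketch under those simplifications (the click simulator and all names are illustrative, not the paper's full algorithm):

```python
import random

def thompson_top_s(click_probs, s, horizon=3000, seed=7):
    """Beta-Bernoulli Thompson sampling recommending s of K actions per round.

    Draw one sample from each action's Beta posterior, show the s highest
    draws, then update each shown action's posterior with its click
    (semi-bandit feedback). Position effects and the flow-based position
    assignment from the paper are deliberately omitted.
    """
    rng = random.Random(seed)
    K = len(click_probs)
    alpha, beta = [1] * K, [1] * K   # uniform Beta(1, 1) priors
    clicks = 0
    for _ in range(horizon):
        theta = [rng.betavariate(alpha[i], beta[i]) for i in range(K)]
        slate = sorted(range(K), key=lambda i: theta[i], reverse=True)[:s]
        for i in slate:
            if rng.random() < click_probs[i]:   # simulated click
                alpha[i] += 1
                clicks += 1
            else:
                beta[i] += 1
    return clicks

# e.g. thompson_top_s([0.10, 0.05, 0.30, 0.20, 0.02], s=2)
```
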
20

Letard, Alexandre, Tassadit Amghar, Olivier Camp, and Nicolas Gutowski. "COM-MABs: From Users' Feedback to Recommendation." International FLAIRS Conference Proceedings 35 (May 4, 2022). http://dx.doi.org/10.32473/flairs.v35i.130560.

Abstract:
Recently, the COMbinatorial Multi-Armed Bandits (COM-MAB) problem has arisen as an active research field. In systems interacting with humans, these reinforcement learning approaches use a feedback strategy as their reward function. On the study of those strategies, this paper presents three contributions: 1) we model a feedback strategy as a three-step process, where each step influences the performance of an agent; 2) based on this model, we propose a novel Reward Computing process, BUSBC, which significantly increases the global accuracy reached by optimistic COM-MAB algorithms (by up to 16.2%); 3) we conduct an empirical analysis of our approach and of several feedback strategies from the literature on three real-world application datasets, confirming our propositions.
21

Zhou, Yitong, Hesheng Sun, Yibo Jin, Yanfang Zhu, Yuan Li, Zhuzhong Qian, Sheng Zhang, and Sanglu Lu. "Inference replication at edges via combinatorial multi-armed bandit." Journal of Systems Architecture, June 2022, 102636. http://dx.doi.org/10.1016/j.sysarc.2022.102636.

22

Yan, Cairong, Haixia Han, Yanting Zhang, Dandan Zhu, and Yongquan Wan. "Dynamic clustering based contextual combinatorial multi-armed bandit for online recommendation." Knowledge-Based Systems, September 2022, 109927. http://dx.doi.org/10.1016/j.knosys.2022.109927.

23

Wang, Hengzhi, Yongjian Yang, En Wang, Wenbin Liu, Yuanbo Xu, and Jie Wu. "Truthful User Recruitment for Cooperative Crowdsensing Task: A Combinatorial Multi-Armed Bandit Approach." IEEE Transactions on Mobile Computing, 2022, 1. http://dx.doi.org/10.1109/tmc.2022.3153451.

24

Song, Yunchao, Chen Liu, Wenyi Zhang, Yiliang Liu, Haibo Zhou, and Xuemin Shen. "Two Stage Beamforming in Massive MIMO: A Combinatorial Multi-Armed Bandit Based Approach." IEEE Transactions on Vehicular Technology, 2023, 1–6. http://dx.doi.org/10.1109/tvt.2022.3229312.

25

Modi, Navikkumar, Philippe Mary, and Christophe Moy. "Transfer restless multi-armed bandit policy for energy-efficient heterogeneous cellular network." EURASIP Journal on Advances in Signal Processing 2019, no. 1 (October 21, 2019). http://dx.doi.org/10.1186/s13634-019-0637-1.

Abstract:
This paper proposes a learning policy to improve the energy efficiency (EE) of heterogeneous cellular networks. The combination of active and inactive base stations (BS) that allows for maximizing EE is identified as a combinatorial learning problem that requires high computational complexity as well as a large signaling overhead. This paper aims at presenting a learning policy that dynamically switches a BS to ON or OFF status in order to follow the traffic load variation during the day. The network traffic load is represented as a Markov decision process, and we propose a modified upper confidence bound algorithm based on the restless Markov multi-armed bandit framework for the BS switching operation. Moreover, to cope with the initial reward loss and to speed up the convergence of the learning algorithm, the transfer learning concept is adapted to our algorithm in order to benefit from knowledge observed in historical periods in the same region. Based on our previous work, a convergence theorem is provided for the proposed policy. Extensive simulations demonstrate that the proposed algorithms follow the traffic load variation during the day and contribute to a performance jump-start in EE improvement under various practical traffic load profiles. They also demonstrate that the proposed schemes can significantly reduce the total energy consumption of a cellular network, e.g., up to 70% potential energy savings based on a real traffic profile.
26

Shu, Zhaogang, Haoxian Feng, Tarik Taleb, and ZhiFang Zhang. "A Novel Combinatorial Multi-Armed Bandit Game to Identify Online the Changing Top-K Flows in Software-Defined Networks." SSRN Electronic Journal, 2023. http://dx.doi.org/10.2139/ssrn.4326503.
