Journal articles on the topic "Multi-armed bandit formulation"


Consult the top 17 journal articles for research on the topic "Multi-armed bandit formulation".


1

Dzhoha, A. S. "Sequential resource allocation in a stochastic environment: an overview and numerical experiments." Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics, no. 3 (2021): 13–25. http://dx.doi.org/10.17721/1812-5409.2021/3.1.

Abstract:
In this paper, we consider policies for sequential resource allocation under the multi-armed bandit problem in a stochastic environment. In this model, an agent sequentially selects an action from a given set, and the environment reveals a reward in return. In the stochastic setting, each action is associated with a probability distribution whose parameters are not known in advance. The agent makes decisions based on the history of chosen actions and obtained rewards. The objective is to maximize the total cumulative reward, which is equivalent to minimizing the loss. We provide a brief overview of sequential analysis and of how the multi-armed bandit problem emerged as a formulation within the theory of sequential resource allocation. A classification of multi-armed bandit problems is given, together with an analysis of existing policies for the stochastic setting. Two different approaches to the problem are presented: in the frequentist view, a confidence interval is used to express the exploration-exploitation trade-off, while in the Bayesian approach the parameter to be estimated is treated as a random variable. We also show how the model can be cast as a Markov decision process. Finally, we provide numerical experiments to study the effectiveness of these policies.
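For readers unfamiliar with the two families of policies this abstract contrasts, the sketch below shows a minimal frequentist policy (UCB1, which adds a confidence-interval bonus to each empirical mean) and a minimal Bayesian policy (Thompson Sampling with Beta posteriors) on a toy Bernoulli bandit. It is an illustrative sketch with made-up reward probabilities, not code from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])   # unknown to the agent
K, T = len(true_means), 10_000

def pull(arm):
    return rng.binomial(1, true_means[arm])

# Frequentist view: UCB1 keeps an upper confidence bound per arm.
counts, sums = np.zeros(K), np.zeros(K)
for t in range(T):
    if t < K:                               # play each arm once first
        arm = t
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
        arm = int(np.argmax(ucb))
    r = pull(arm)
    counts[arm] += 1
    sums[arm] += r

# Bayesian view: Thompson Sampling treats each mean as a Beta random variable.
alpha, beta = np.ones(K), np.ones(K)        # uniform Beta(1, 1) priors
for t in range(T):
    arm = int(np.argmax(rng.beta(alpha, beta)))   # sample, then act greedily
    r = pull(arm)
    alpha[arm] += r
    beta[arm] += 1 - r

print("UCB1 pull counts:", counts)
print("Thompson posterior means:", alpha / (alpha + beta))
```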
2

Zayas-Cabán, Gabriel, Stefanus Jasin, and Guihua Wang. "An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits." Advances in Applied Probability 51, no. 03 (September 2019): 745–72. http://dx.doi.org/10.1017/apr.2019.29.

Abstract:
We propose an asymptotically optimal heuristic, which we term randomized assignment control (RAC), for a restless multi-armed bandit problem with discrete time and finite states. It is constructed using a linear programming relaxation of the original stochastic control formulation. In contrast to most of the existing literature, we consider a finite-horizon problem with multiple actions and a time-dependent (i.e. nonstationary) upper bound on the number of bandits that can be activated at each time period; indeed, our analysis can also be applied in the setting with a nonstationary transition matrix and a nonstationary cost function. The asymptotic setting is obtained by letting the number of bandits and other related parameters grow to infinity. Our main contribution is that the asymptotic optimality of RAC in this general setting does not require indexability properties or the usual stability conditions of the underlying Markov chain (e.g. unichain) or fluid approximation (e.g. global stable attractor). Moreover, our multi-action setting is not restricted to the usual dominant action concept. Finally, we show that RAC is also asymptotically optimal for a dynamic population, where bandits can randomly arrive and depart the system.
3

Roy Chaudhuri, Arghya, and Shivaram Kalyanakrishnan. "Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 06 (April 3, 2020): 10085–92. http://dx.doi.org/10.1609/aaai.v34i06.6566.

Abstract:
Regret minimisation in stochastic multi-armed bandits is a well-studied problem, for which several optimal algorithms have been proposed. Such algorithms depend on (sufficient statistics of) the empirical reward distributions of the arms to decide which arm to pull next. In this paper, we consider the design of algorithms that are constrained to store statistics from only a bounded number of arms. For bandits with a finite set of arms, we derive a sub-linear upper bound on the regret that decreases with the “arm memory” size M. For instances with a large, possibly infinite, set of arms, we show a sub-linear bound on the quantile regret. Our problem formulation generalises that of Liau et al. (2018), who fix M = O(1), and so do not obtain bounds that depend on M. More importantly, our algorithms keep exploration and exploitation tightly coupled, without a dedicated exploration phase as employed by Liau et al. (2018). Although this choice makes our analysis harder, it leads to much-improved practical performance. For bandits with a large number of arms and no known structure on the rewards, our algorithms serve as a viable option. Unlike many other approaches to restrict the memory of bandit algorithms, our algorithms do not need any additional technical assumptions.
4

Ai, Jing, and Alhussein A. Abouzeid. "Opportunistic spectrum access based on a constrained multi-armed bandit formulation." Journal of Communications and Networks 11, no. 2 (April 2009): 134–47. http://dx.doi.org/10.1109/jcn.2009.6391388.

5

Bagheri, Saeed, and Anna Scaglione. "The Restless Multi-Armed Bandit Formulation of the Cognitive Compressive Sensing Problem." IEEE Transactions on Signal Processing 63, no. 5 (March 2015): 1183–98. http://dx.doi.org/10.1109/tsp.2015.2389620.

6

Li, Xinbin, Jiajia Liu, Lei Yan, Song Han, and Xinping Guan. "Relay Selection for Underwater Acoustic Sensor Networks: A Multi-User Multi-Armed Bandit Formulation." IEEE Access 6 (2018): 7839–53. http://dx.doi.org/10.1109/access.2018.2801350.

7

Ho, Chien-Ju, Aleksandrs Slivkins, and Jennifer Wortman Vaughan. "Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems." Journal of Artificial Intelligence Research 55 (February 3, 2016): 317–59. http://dx.doi.org/10.1613/jair.4940.

Abstract:
Crowdsourcing markets have emerged as a popular platform for matching available workers with tasks to complete. The payment for a particular task is typically set by the task's requester, and may be adjusted based on the quality of the completed work, for example, through the use of "bonus" payments. In this paper, we study the requester's problem of dynamically adjusting quality-contingent payments for tasks. We consider a multi-round version of the well-known principal-agent model, whereby in each round a worker makes a strategic choice of the effort level which is not directly observable by the requester. In particular, our formulation significantly generalizes the budget-free online task pricing problems studied in prior work. We treat this problem as a multi-armed bandit problem, with each "arm" representing a potential contract. To cope with the large (and in fact, infinite) number of arms, we propose a new algorithm, AgnosticZooming, which discretizes the contract space into a finite number of regions, effectively treating each region as a single arm. This discretization is adaptively refined, so that more promising regions of the contract space are eventually discretized more finely. We analyze this algorithm, showing that it achieves regret sublinear in the time horizon and substantially improves over non-adaptive discretization (which is the only competing approach in the literature). Our results advance the state of art on several different topics: the theory of crowdsourcing markets, principal-agent problems, multi-armed bandits, and dynamic pricing.
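To make the discretization idea concrete, here is a minimal sketch of the non-adaptive baseline the abstract compares against: a fixed grid over a one-dimensional contract (bonus) space, with each grid point treated as an arm of a standard UCB learner. The worker-response model and all parameter values are invented for illustration; the paper's AgnosticZooming algorithm differs in that it refines the discretization adaptively toward promising regions.

```python
import numpy as np

rng = np.random.default_rng(1)

def requester_utility(bonus):
    """Hypothetical environment: the chance of good work rises with the bonus,
    and the requester nets (task value - bonus) only when the work is good."""
    p_good = min(1.0, 0.2 + 1.5 * bonus)        # assumed worker response curve
    good = rng.random() < p_good
    return (1.0 - bonus) if good else 0.0       # value of a good task is 1

# Non-adaptive discretization: each grid point of the contract space is an arm.
grid = np.linspace(0.0, 0.5, 11)
counts, sums = np.zeros_like(grid), np.zeros_like(grid)

for t in range(5_000):
    if t < len(grid):                           # try each contract once
        arm = t
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
        arm = int(np.argmax(ucb))
    u = requester_utility(grid[arm])
    counts[arm] += 1
    sums[arm] += u

print("best grid contract:", grid[int(np.argmax(sums / counts))])
```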
8

Cavenaghi, Emanuele, Gabriele Sottocornola, Fabio Stella, and Markus Zanker. "Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm." Entropy 23, no. 3 (March 23, 2021): 380. http://dx.doi.org/10.3390/e23030380.

Abstract:
The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time-step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff for each action is kept stationary by the environment through time. Nevertheless, in many real-world applications this assumption does not hold and the agent has to face a non-stationary environment, that is, one with a changing reward distribution. Thus, we present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, when the data stream is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to counteract concept drift in non-stationary environments. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function f(.). In particular, we propose a pessimistic (f=min), an optimistic (f=max), as well as an averaged (f=mean) version of the f-dsw TS algorithm. A rich set of numerical experiments is performed to evaluate the f-dsw TS algorithm against both stationary and non-stationary state-of-the-art TS baselines. We exploited synthetic environments (both randomly generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden/abrupt, incremental, gradual and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework: a prediction task on crimes in the city of Baltimore, a classification task on insect species, a recommendation task on local web news, and a time-series analysis on microbial organisms in the tropical air ecosystem. The f-dsw TS approach emerges as the best-performing MAB algorithm. At least one of the versions of f-dsw TS performs better than the baselines in synthetic environments, proving the robustness of f-dsw TS under different concept drift types. Moreover, the pessimistic version (f=min) proves to be the most effective in all real-world tasks.
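To illustrate the mechanism the abstract describes, the sketch below combines a discounted reward history with a per-arm sliding window in a Bernoulli Thompson Sampling loop and aggregates the two posterior samples with a function f (here f = min, the pessimistic variant). Parameter names, update rules, and values are assumptions made for illustration, not the authors' reference implementation.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(2)
true_means = [0.4, 0.6]               # would drift over time in a real test
K, T, gamma, window = len(true_means), 5_000, 0.99, 200

# Discounted Beta counts and a per-arm sliding window of recent rewards.
alpha_d = np.zeros(K)
beta_d = np.zeros(K)
recent = [deque(maxlen=window) for _ in range(K)]

f = min                               # pessimistic aggregation (f = min)

for t in range(T):
    scores = []
    for a in range(K):
        # Sample from the discounted-history posterior (Beta(1,1) prior).
        theta_disc = rng.beta(alpha_d[a] + 1, beta_d[a] + 1)
        # Sample from the sliding-window posterior.
        wins = sum(recent[a])
        theta_win = rng.beta(wins + 1, len(recent[a]) - wins + 1)
        scores.append(f(theta_disc, theta_win))
    arm = int(np.argmax(scores))

    r = rng.binomial(1, true_means[arm])

    # Discount the whole history, then add the new observation.
    alpha_d *= gamma
    beta_d *= gamma
    alpha_d[arm] += r
    beta_d[arm] += 1 - r
    recent[arm].append(r)
```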
9

Li, Xinbin, Xianglin Xu, Lei Yan, Haihong Zhao, and Tongwei Zhang. "Energy-Efficient Data Collection Using Autonomous Underwater Glider: A Reinforcement Learning Formulation." Sensors 20, no. 13 (July 4, 2020): 3758. http://dx.doi.org/10.3390/s20133758.

Abstract:
The autonomous underwater glider has attracted enormous interest for underwater activities, especially in long-term and large-scale underwater data collection. In this paper, we focus on the application of gliders gathering data from underwater sensor networks over underwater acoustic channels. However, this application suffers from a rapidly time-varying environment and limited energy. To optimize the performance of data collection and maximize the network lifetime, we propose a distributed, energy-efficient sensor scheduling algorithm based on the multi-armed bandit formulation. In addition, we design an indexable threshold policy to trade off data quality against collection delay. Moreover, to reduce the computational complexity, we divide the proposed algorithm into an off-line computation part and an on-line scheduling part. Simulation results indicate that the proposed policy significantly improves data-collection performance and reduces energy consumption, and they confirm the effectiveness of the threshold, which can reduce the collection delay by at least 10% while guaranteeing data quality.
10

Mohamed, Ehab Mahmoud, Mohammad Alnakhli, Sherief Hashima, and Mohamed Abdel-Nasser. "Distribution of Multi MmWave UAV Mounted RIS Using Budget Constraint Multi-Player MAB." Electronics 12, no. 1 (December 20, 2022): 12. http://dx.doi.org/10.3390/electronics12010012.

Abstract:
Millimeter wave (mmWave), reconfigurable intelligent surface (RIS), and unmanned aerial vehicles (UAVs) are considered vital technologies of future sixth-generation (6G) communication networks. In this paper, multiple UAV-mounted RISs are distributed to support mmWave coverage over several hotspots where numerous users exist in a harsh blockage environment. UAVs should be spread among the hotspots to maximize their average achievable data rates while minimizing their hovering and flying energy consumption. To address this non-polynomial time (NP) problem efficiently, it is formulated as a centralized budget constraint multi-player multi-armed bandit (BCMP-MAB) game. In this formulation, UAVs act as the players, the hotspots as the arms, and the achievable sum rates of the hotspots as the profit of the MAB game. This formulated MAB problem differs from the traditional one due to the added constraints of the limited budget of UAV batteries as well as collision avoidance among UAVs, i.e., a hotspot should be covered by only one UAV at a time. Numerical analysis of different scenarios confirms the superior performance of the proposed BCMP-MAB algorithm over other benchmark schemes in terms of average sum rate and energy efficiency, with comparable computational complexity and convergence rate.
11

Huanca-Anquise, Candy A., Ana Lúcia Cetertich Bazzan, and Anderson R. Tavares. "Multi-Objective, Multi-Armed Bandits: Algorithms for Repeated Games and Application to Route Choice." Revista de Informática Teórica e Aplicada 30, no. 1 (January 30, 2023): 11–23. http://dx.doi.org/10.22456/2175-2745.122929.

Abstract:
Multi-objective decision-making in multi-agent scenarios poses multiple challenges. Dealing with multiple objectives and non-stationarity caused by simultaneous learning are only two of them, which have been addressed separately. In this work, reinforcement learning algorithms that tackle both issues together are proposed and applied to a route choice problem, where drivers must select an action in a single-state formulation, while aiming to minimize both their travel time and toll. Hence, we deal with repeated games, now with a multi-objective approach. Advantages, limitations and differences of these algorithms are discussed. Our results show that the proposed algorithms for action selection using reinforcement learning deal with non-stationarity and multiple objectives, while providing alternative solutions to those of centralized methods.
12

Rodriguez Diaz, Paula, Jackson A. Killian, Lily Xu, Arun Sai Suggala, Aparna Taneja, and Milind Tambe. "Flexible Budgets in Restless Bandits: A Primal-Dual Algorithm for Efficient Budget Allocation." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 10 (June 26, 2023): 12103–11. http://dx.doi.org/10.1609/aaai.v37i10.26427.

Abstract:
Restless multi-armed bandits (RMABs) are an important model to optimize allocation of limited resources in sequential decision-making settings. Typical RMABs assume the budget --- the number of arms pulled --- to be fixed for each step in the planning horizon. However, for realistic real-world planning, resources are not necessarily limited at each planning step; we may be able to distribute surplus resources in one round to an earlier or later round. In real-world planning settings, this flexibility in budget is often constrained to within a subset of consecutive planning steps, e.g., weekly planning of a monthly budget. In this paper we define a general class of RMABs with flexible budget, which we term F-RMABs, and provide an algorithm to optimally solve for them. We derive a min-max formulation to find optimal policies for F-RMABs and leverage gradient primal-dual algorithms to solve for reward-maximizing policies with flexible budgets. We introduce a scheme to sample expected gradients to apply primal-dual algorithms to the F-RMAB setting and make an otherwise computationally expensive approach tractable. Additionally, we provide heuristics that trade off solution quality for efficiency and present experimental comparisons of different F-RMAB solution approaches.
13

Yang, Yibo, Antoine Blanchard, Themistoklis Sapsis, and Paris Perdikaris. "Output-weighted sampling for multi-armed bandits with extreme payoffs." Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 478, no. 2260 (April 2022). http://dx.doi.org/10.1098/rspa.2021.0781.

Abstract:
We present a new type of acquisition function for online decision-making in multi-armed and contextual bandit problems with extreme payoffs. Specifically, we model the payoff function as a Gaussian process and formulate a novel type of upper confidence bound acquisition function that guides exploration towards the bandits that are deemed most relevant according to the variability of the observed rewards. This is achieved by computing a tractable likelihood ratio that quantifies the importance of the output relative to the inputs and essentially acts as an attention mechanism that promotes exploration of extreme rewards. Our formulation is supported by asymptotic zero-regret guarantees, and its performance is demonstrated across several synthetic benchmarks, as well as two realistic examples involving noisy sensor network data. Finally, we provide a JAX library for efficient bandit optimization using Gaussian processes.
14

Taywade, Kshitija, Brent Harrison, Adib Bagh, and Judy Goldsmith. "Modelling Cournot Games as Multi-agent Multi-armed Bandits." International FLAIRS Conference Proceedings 35 (May 4, 2022). http://dx.doi.org/10.32473/flairs.v35i.130697.

Abstract:
We investigate the use of a multi-agent multi-armed bandit (MA-MAB) setting for modeling repeated Cournot oligopoly games, where the firms acting as agents choose from a set of arms representing production quantities (discrete values). Agents interact with separate and independent bandit problems. In this formulation, each agent makes sequential choices among arms to maximize its own reward. Agents do not have any information about the environment; they can only see their own rewards after taking an action. However, the market demand is a stationary function of the total industry output, and random entry into or exit from the market is not allowed. Given these assumptions, we find that an ε-greedy approach offers a more viable learning mechanism than other traditional MAB approaches, as it does not require any additional knowledge of the system to operate. We also propose two novel approaches that take advantage of the ordered action space: ε-greedy+HL and ε-greedy+EL. These new approaches help firms to focus on more profitable actions by eliminating less profitable choices and hence are designed to optimize exploration. We use computer simulations to study the emergence of various equilibria in the outcomes and perform an empirical analysis of joint cumulative regrets.
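The setting described above is easy to simulate: each firm runs an independent ε-greedy learner over a discrete grid of quantities, and the market price is a stationary function of total output. The linear inverse-demand and cost parameters below are assumptions for illustration, and the sketch omits the ε-greedy+HL/EL pruning of unprofitable actions proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(3)
n_firms, T, eps = 2, 20_000, 0.1
quantities = np.arange(0, 11)          # discrete action grid (arms)
a, b, cost = 20.0, 1.0, 2.0            # assumed linear inverse demand p = a - b*Q

counts = np.zeros((n_firms, len(quantities)))
values = np.zeros((n_firms, len(quantities)))

for t in range(T):
    # Each firm independently picks an arm (a production quantity).
    choices = []
    for i in range(n_firms):
        if rng.random() < eps or counts[i].sum() == 0:
            arm = int(rng.integers(len(quantities)))
        else:
            arm = int(np.argmax(values[i]))
        choices.append(arm)

    total_q = sum(quantities[arm] for arm in choices)
    price = max(0.0, a - b * total_q)   # stationary function of total output

    # Each firm observes only its own profit (reward) and updates its estimate.
    for i, arm in enumerate(choices):
        profit = (price - cost) * quantities[arm]
        counts[i, arm] += 1
        values[i, arm] += (profit - values[i, arm]) / counts[i, arm]

print("learned quantities:", [int(quantities[np.argmax(v)]) for v in values])
```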
15

Mandel, Travis, Yun-En Liu, Emma Brunskill, and Zoran Popović. "The Queue Method: Handling Delay, Heuristics, Prior Data, and Evaluation in Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 29, no. 1 (February 21, 2015). http://dx.doi.org/10.1609/aaai.v29i1.9604.

Abstract:
Current algorithms for the standard multi-armed bandit problem have good empirical performance and optimal regret bounds. However, real-world problems often differ from the standard formulation in several ways. First, feedback may be delayed instead of arriving immediately. Second, the real world often contains structure which suggests heuristics, which we wish to incorporate while retaining the best-known theoretical guarantees. Third, we may wish to make use of an arbitrary prior dataset without negatively impacting performance. Fourth, we may wish to efficiently evaluate algorithms using a previously collected dataset. Surprisingly, these seemingly-disparate problems can be addressed using algorithms inspired by a recently-developed queueing technique. We present the Stochastic Delayed Bandits (SDB) algorithm as a solution to these four problems, which takes black-box bandit algorithms (including heuristic approaches) as input while achieving good theoretical guarantees. We present empirical results from both synthetic simulations and real-world data drawn from an educational game. Our results show that SDB outperforms state-of-the-art approaches to handling delay, heuristics, prior data, and evaluation.
16

Jagadeesan, Meena, Alexander Wei, Yixin Wang, Michael I. Jordan, and Jacob Steinhardt. "Learning Equilibria in Matching Markets with Bandit Feedback." Journal of the ACM, February 16, 2023. http://dx.doi.org/10.1145/3583681.

Abstract:
Large-scale, two-sided matching platforms must find market outcomes that align with user preferences while simultaneously learning these preferences from data. Classical notions of stability (Gale and Shapley, 1962; Shapley and Shubik, 1971) are unfortunately of limited value in the learning setting, given that preferences are inherently uncertain and destabilizing while they are being learned. To bridge this gap, we develop a framework and algorithms for learning stable market outcomes under uncertainty. Our primary setting is matching with transferable utilities, where the platform both matches agents and sets monetary transfers between them. We design an incentive-aware learning objective that captures the distance of a market outcome from equilibrium. Using this objective, we analyze the complexity of learning as a function of preference structure, casting learning as a stochastic multi-armed bandit problem. Algorithmically, we show that “optimism in the face of uncertainty,” the principle underlying many bandit algorithms, applies to a primal-dual formulation of matching with transfers and leads to near-optimal regret bounds. Our work takes a first step toward elucidating when and how stable matchings arise in large, data-driven marketplaces.
17

Gullo, F., D. Mandaglio, and A. Tagarelli. "A combinatorial multi-armed bandit approach to correlation clustering." Data Mining and Knowledge Discovery, June 29, 2023. http://dx.doi.org/10.1007/s10618-023-00937-5.

Abstract:
Given a graph whose edges are assigned positive-type and negative-type weights, the problem of correlation clustering aims at grouping the graph vertices so as to minimize (resp. maximize) the sum of negative-type (resp. positive-type) intra-cluster weights plus the sum of positive-type (resp. negative-type) inter-cluster weights. In correlation clustering, it is typically assumed that the weights are readily available. This is a rather strong hypothesis, which is unrealistic in several scenarios. To overcome this limitation, in this work we focus on the setting where edge weights of a correlation-clustering instance are unknown, and they have to be estimated in multiple rounds, while performing the clustering. The clustering solutions produced in the various rounds provide a feedback to properly adjust the weight estimates, and the goal is to maximize the cumulative quality of the clusterings. We tackle this problem by resorting to the reinforcement-learning paradigm, and, specifically, we design for the first time a Combinatorial Multi-Armed Bandit (CMAB) framework for correlation clustering. We provide a variety of contributions, namely (1) formulations of the minimization and maximization variants of correlation clustering in a CMAB setting; (2) adaptation of well-established CMAB algorithms to the correlation-clustering context; (3) regret analyses to theoretically bound the accuracy of these algorithms; (4) design of further (heuristic) algorithms to have the probability constraint satisfied at every round (key condition to soundly adopt efficient yet effective algorithms for correlation clustering as CMAB oracles); (5) extensive experimental comparison among a variety of both CMAB and non-CMAB approaches for correlation clustering.