A selection of scholarly literature on the topic "Multi-armed bandit formulation"

Format your source in APA, MLA, Chicago, Harvard, and other citation styles


Browse lists of current articles, books, dissertations, conference abstracts, and other scholarly sources on the topic "Multi-armed bandit formulation".

Next to each source in the list of references, there is an "Add to bibliography" button. Press it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf file and read the online abstract of the work, if the relevant details are present in the source's metadata.

Journal articles on the topic "Multi-armed bandit formulation"

1

Dzhoha, A. S. "Sequential resource allocation in a stochastic environment: an overview and numerical experiments." Bulletin of Taras Shevchenko National University of Kyiv. Series: Physics and Mathematics, no. 3 (2021): 13–25. http://dx.doi.org/10.17721/1812-5409.2021/3.1.

Abstract:
In this paper, we consider policies for sequential resource allocation under the multi-armed bandit problem in a stochastic environment. In this model, an agent sequentially selects an action from a given set, and the environment reveals a reward in return. In the stochastic setting, each action is associated with a probability distribution whose parameters are not known in advance. The agent makes decisions based on the history of chosen actions and obtained rewards. The objective is to maximize the total cumulative reward, which is equivalent to loss minimization. We provide a brief overview of sequential analysis and of how the multi-armed bandit problem arises as a formulation within sequential resource allocation theory. A classification of multi-armed bandits is given, together with an analysis of the existing policies for the stochastic setting. Two different approaches to the multi-armed bandit problem are shown: in the frequentist view, a confidence interval is used to express the exploration-exploitation trade-off; in the Bayesian approach, the parameter to be estimated is treated as a random variable. We also show how this model can be cast as a Markov decision process. Finally, we provide numerical experiments to study the effectiveness of these policies.
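
To make the two approaches contrasted in this abstract concrete, below is a minimal, self-contained sketch of UCB1 (frequentist confidence intervals) and Thompson Sampling (Bayesian posterior sampling) on a toy Bernoulli bandit. It is an illustration under assumed arm means, not code from the paper.

```python
# Minimal sketch of the two policy families discussed above, on a toy
# Bernoulli bandit. The arm means are hypothetical; illustrative only.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])   # unknown to the agent
K, T = len(true_means), 10_000

def pull(arm):
    return float(rng.random() < true_means[arm])

# Frequentist view: UCB1 adds a confidence radius to each empirical mean.
counts, sums = np.zeros(K), np.zeros(K)
for t in range(1, T + 1):
    if t <= K:                            # play each arm once to initialise
        arm = t - 1
    else:
        arm = int(np.argmax(sums / counts + np.sqrt(2 * np.log(t) / counts)))
    r = pull(arm)
    counts[arm] += 1
    sums[arm] += r

# Bayesian view: Thompson Sampling keeps a Beta posterior per arm and plays
# the arm whose sampled mean is largest.
alpha, beta = np.ones(K), np.ones(K)
for _ in range(T):
    arm = int(np.argmax(rng.beta(alpha, beta)))
    r = pull(arm)
    alpha[arm] += r
    beta[arm] += 1.0 - r

print("UCB1 pulls per arm:        ", counts.astype(int))
print("TS posterior means per arm:", np.round(alpha / (alpha + beta), 3))
```
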
2

Zayas-Cabán, Gabriel, Stefanus Jasin, and Guihua Wang. "An asymptotically optimal heuristic for general nonstationary finite-horizon restless multi-armed, multi-action bandits." Advances in Applied Probability 51, no. 3 (September 2019): 745–72. http://dx.doi.org/10.1017/apr.2019.29.

Abstract:
We propose an asymptotically optimal heuristic, which we term randomized assignment control (RAC), for a restless multi-armed bandit problem in discrete time with finite states. It is constructed using a linear programming relaxation of the original stochastic control formulation. In contrast to most of the existing literature, we consider a finite-horizon problem with multiple actions and a time-dependent (i.e. nonstationary) upper bound on the number of bandits that can be activated at each time period; indeed, our analysis can also be applied in the setting with a nonstationary transition matrix and nonstationary cost function. The asymptotic setting is obtained by letting the number of bandits and other related parameters grow to infinity. Our main contribution is that the asymptotic optimality of RAC in this general setting requires neither indexability properties nor the usual stability conditions of the underlying Markov chain (e.g. unichain) or fluid approximation (e.g. global stable attractor). Moreover, our multi-action setting is not restricted to the usual dominant-action concept. Finally, we show that RAC is also asymptotically optimal for a dynamic population, where bandits can randomly arrive and depart from the system.
3

Roy Chaudhuri, Arghya, and Shivaram Kalyanakrishnan. "Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 6 (April 3, 2020): 10085–92. http://dx.doi.org/10.1609/aaai.v34i06.6566.

Abstract:
Regret minimisation in stochastic multi-armed bandits is a well-studied problem, for which several optimal algorithms have been proposed. Such algorithms depend on (sufficient statistics of) the empirical reward distributions of the arms to decide which arm to pull next. In this paper, we consider the design of algorithms that are constrained to store statistics from only a bounded number of arms. For bandits with a finite set of arms, we derive a sub-linear upper bound on the regret that decreases with the “arm memory” size M. For instances with a large, possibly infinite, set of arms, we show a sub-linear bound on the quantile regret. Our problem formulation generalises that of Liau et al. (2018), who fix M = O(1), and so do not obtain bounds that depend on M. More importantly, our algorithms keep exploration and exploitation tightly coupled, without a dedicated exploration phase as employed by Liau et al. (2018). Although this choice makes our analysis harder, it leads to much-improved practical performance. For bandits with a large number of arms and no known structure on the rewards, our algorithms serve as a viable option. Unlike many other approaches to restrict the memory of bandit algorithms, our algorithms do not need any additional technical assumptions.
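
For intuition about the bounded-memory constraint only, here is a toy sketch that stores statistics for at most M arms, runs UCB1 among them, and periodically replaces the empirically worst stored arm with a fresh one. This is our simplification of the setting, not the algorithm analysed in the paper; all parameters are hypothetical.

```python
# Toy illustration of a bounded "arm memory": statistics are kept for at
# most M of the K arms at any time. Not the paper's algorithm.
import numpy as np

rng = np.random.default_rng(0)
K, M, T, epoch = 50, 8, 20_000, 500
true_means = rng.uniform(0, 1, size=K)   # hypothetical arm means

stored = list(range(M))                  # arms we currently track
counts, sums = np.zeros(M), np.zeros(M)

for t in range(1, T + 1):
    if 0 in counts:
        slot = int(np.argmin(counts))    # play each stored arm once first
    else:
        slot = int(np.argmax(sums / counts + np.sqrt(2 * np.log(t) / counts)))
    counts[slot] += 1
    sums[slot] += float(rng.random() < true_means[stored[slot]])

    if t % epoch == 0:                   # evict the empirically worst arm
        worst = int(np.argmin(sums / counts))
        stored[worst] = int(rng.integers(K))  # admit a random arm (may repeat)
        counts[worst], sums[worst] = 0.0, 0.0
```
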
4

Ai, Jing, and Alhussein A. Abouzeid. "Opportunistic spectrum access based on a constrained multi-armed bandit formulation." Journal of Communications and Networks 11, no. 2 (April 2009): 134–47. http://dx.doi.org/10.1109/jcn.2009.6391388.

5

Bagheri, Saeed, and Anna Scaglione. "The Restless Multi-Armed Bandit Formulation of the Cognitive Compressive Sensing Problem." IEEE Transactions on Signal Processing 63, no. 5 (March 2015): 1183–98. http://dx.doi.org/10.1109/tsp.2015.2389620.

6

Li, Xinbin, Jiajia Liu, Lei Yan, Song Han, and Xinping Guan. "Relay Selection for Underwater Acoustic Sensor Networks: A Multi-User Multi-Armed Bandit Formulation." IEEE Access 6 (2018): 7839–53. http://dx.doi.org/10.1109/access.2018.2801350.

7

Ho, Chien-Ju, Aleksandrs Slivkins, and Jennifer Wortman Vaughan. "Adaptive Contract Design for Crowdsourcing Markets: Bandit Algorithms for Repeated Principal-Agent Problems." Journal of Artificial Intelligence Research 55 (February 3, 2016): 317–59. http://dx.doi.org/10.1613/jair.4940.

Abstract:
Crowdsourcing markets have emerged as a popular platform for matching available workers with tasks to complete. The payment for a particular task is typically set by the task's requester, and may be adjusted based on the quality of the completed work, for example, through the use of "bonus" payments. In this paper, we study the requester's problem of dynamically adjusting quality-contingent payments for tasks. We consider a multi-round version of the well-known principal-agent model, whereby in each round a worker makes a strategic choice of the effort level which is not directly observable by the requester. In particular, our formulation significantly generalizes the budget-free online task pricing problems studied in prior work. We treat this problem as a multi-armed bandit problem, with each "arm" representing a potential contract. To cope with the large (and in fact, infinite) number of arms, we propose a new algorithm, AgnosticZooming, which discretizes the contract space into a finite number of regions, effectively treating each region as a single arm. This discretization is adaptively refined, so that more promising regions of the contract space are eventually discretized more finely. We analyze this algorithm, showing that it achieves regret sublinear in the time horizon and substantially improves over non-adaptive discretization (which is the only competing approach in the literature). Our results advance the state of the art on several different topics: the theory of crowdsourcing markets, principal-agent problems, multi-armed bandits, and dynamic pricing.
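
As a rough illustration of treating contract-space regions as arms, the sketch below uses a fixed uniform discretization of a one-dimensional contract space with UCB1 over the regions; the paper's AgnosticZooming instead refines promising regions adaptively, which this toy version omits. The worker-response model and all numbers are hypothetical.

```python
# Fixed-discretization sketch: each region of the contract space [0, 1] is
# treated as one bandit arm. Illustrative only; not AgnosticZooming.
import numpy as np

rng = np.random.default_rng(1)
R, T = 10, 5000                          # number of regions, rounds
edges = np.linspace(0.0, 1.0, R + 1)

def requester_utility(price):
    # Hypothetical environment: workers accept more often at higher prices,
    # and the requester keeps (1 - price) on completion.
    accepted = rng.random() < min(1.0, 0.2 + price)
    return (1.0 - price) if accepted else 0.0

counts, sums = np.zeros(R), np.zeros(R)
for t in range(1, T + 1):
    if 0 in counts:
        region = int(np.argmin(counts))  # try each region once first
    else:
        region = int(np.argmax(sums / counts + np.sqrt(2 * np.log(t) / counts)))
    price = rng.uniform(edges[region], edges[region + 1])  # point in region
    counts[region] += 1
    sums[region] += requester_utility(price)
```
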
8

Cavenaghi, Emanuele, Gabriele Sottocornola, Fabio Stella, and Markus Zanker. "Non Stationary Multi-Armed Bandit: Empirical Evaluation of a New Concept Drift-Aware Algorithm." Entropy 23, no. 3 (March 23, 2021): 380. http://dx.doi.org/10.3390/e23030380.

Abstract:
The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address real-world challenges related to sequential decision making. In this setting, an agent selects the best action to be performed at time step t, based on the past rewards received from the environment. This formulation implicitly assumes that the expected payoff of each action is kept stationary by the environment over time. Nevertheless, in many real-world applications this assumption does not hold, and the agent has to face a non-stationary environment, that is, one with a changing reward distribution. Thus, we present a new MAB algorithm, named f-Discounted-Sliding-Window Thompson Sampling (f-dsw TS), for non-stationary environments, that is, when the data stream is affected by concept drift. The f-dsw TS algorithm is based on Thompson Sampling (TS) and exploits a discount factor on the reward history and an arm-related sliding window to counteract concept drift in non-stationary environments. We investigate how to combine these two sources of information, namely the discount factor and the sliding window, by means of an aggregation function f(.). In particular, we propose a pessimistic (f=min), an optimistic (f=max), and an averaged (f=mean) version of the f-dsw TS algorithm. A rich set of numerical experiments is performed to evaluate the f-dsw TS algorithm against both stationary and non-stationary state-of-the-art TS baselines. We exploit synthetic environments (both randomly generated and controlled) to test the MAB algorithms under different types of drift, that is, sudden/abrupt, incremental, gradual, and increasing/decreasing drift. Furthermore, we adapt four real-world active learning tasks to our framework: a prediction task on crimes in the city of Baltimore, a classification task on insect species, a recommendation task on local web news, and a time-series analysis on microbial organisms in the tropical air ecosystem. The f-dsw TS approach emerges as the best-performing MAB algorithm. At least one of the versions of f-dsw TS performs better than the baselines in synthetic environments, demonstrating the robustness of f-dsw TS under different concept drift types. Moreover, the pessimistic version (f=min) proves the most effective in all real-world tasks.
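
The following is a minimal sketch of the mechanism as described in this abstract, assuming Bernoulli rewards: each arm keeps one Beta estimate over the discounted history and one over a per-arm sliding window, and the Thompson draw combines the two through the aggregation function f. Parameter values and implementation details are our assumptions, not the authors' exact specification.

```python
# Sketch of the f-dsw TS idea: discounted Beta statistics plus a per-arm
# sliding window, aggregated by f. Assumptions: Bernoulli rewards, toy drift.
import numpy as np
from collections import deque

rng = np.random.default_rng(1)
K, T = 3, 5000
gamma, window = 0.99, 200        # discount factor and window size (assumed)
f = min                          # pessimistic; could be max or mean

S_d, F_d = np.ones(K), np.ones(K)               # discounted successes/failures
hist = [deque(maxlen=window) for _ in range(K)]  # sliding reward windows

def sample_index(arm):
    theta_d = rng.beta(S_d[arm] + 1.0, F_d[arm] + 1.0)   # discounted view
    s, n = sum(hist[arm]), len(hist[arm])
    theta_w = rng.beta(s + 1.0, n - s + 1.0)             # windowed view
    return f(theta_d, theta_w)   # combine the two views of the arm

means = np.array([0.4, 0.6, 0.5])
for t in range(T):
    if t == T // 2:
        means = means[::-1].copy()           # abrupt concept drift (toy)
    arm = int(np.argmax([sample_index(a) for a in range(K)]))
    r = float(rng.random() < means[arm])
    S_d *= gamma                             # discount the whole history
    F_d *= gamma
    S_d[arm] += r
    F_d[arm] += 1.0 - r
    hist[arm].append(r)
```
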
9

Li, Xinbin, Xianglin Xu, Lei Yan, Haihong Zhao, and Tongwei Zhang. "Energy-Efficient Data Collection Using Autonomous Underwater Glider: A Reinforcement Learning Formulation." Sensors 20, no. 13 (July 4, 2020): 3758. http://dx.doi.org/10.3390/s20133758.

Abstract:
The autonomous underwater glider has attracted enormous interest for underwater activities, especially long-term and large-scale underwater data collection. In this paper, we focus on the application of gliders gathering data from underwater sensor networks over underwater acoustic channels. This application suffers from a rapidly time-varying environment and limited energy. To optimize data-collection performance and maximize the network lifetime, we propose a distributed, energy-efficient sensor scheduling algorithm based on a multi-armed bandit formulation. In addition, we design an indexable threshold policy to trade off between data quality and collection delay. Moreover, to reduce the computational complexity, we divide the proposed algorithm into an off-line computation part and an on-line scheduling part. Simulation results indicate that the proposed policy significantly improves data-collection performance and reduces energy consumption. They also demonstrate the effectiveness of the threshold, which reduces the collection delay by at least 10% while guaranteeing data quality.
10

Mohamed, Ehab Mahmoud, Mohammad Alnakhli, Sherief Hashima, and Mohamed Abdel-Nasser. "Distribution of Multi MmWave UAV Mounted RIS Using Budget Constraint Multi-Player MAB." Electronics 12, no. 1 (December 20, 2022): 12. http://dx.doi.org/10.3390/electronics12010012.

Abstract:
Millimeter wave (mmWave), reconfigurable intelligent surfaces (RIS), and unmanned aerial vehicles (UAVs) are considered vital technologies for future sixth-generation (6G) communication networks. In this paper, multiple UAV-mounted RISs are distributed to provide mmWave coverage over several hotspots where numerous users exist in a harsh blockage environment. The UAVs should be spread among the hotspots to maximize their average achievable data rates while minimizing their hovering and flying energy consumption. To efficiently address this NP-hard problem, it is formulated as a centralized budget-constrained multi-player multi-armed bandit (BCMP-MAB) game. In this formulation, the UAVs act as the players, the hotspots as the arms, and the achievable sum rates of the hotspots as the profit of the MAB game. The formulated MAB problem differs from the traditional one due to the added constraints of the UAVs' limited battery budgets as well as collision avoidance among UAVs, i.e., a hotspot should be covered by only one UAV at a time. Numerical analysis of different scenarios confirms the superior performance of the proposed BCMP-MAB algorithm over other benchmark schemes in terms of average sum rate and energy efficiency, with comparable computational complexity and convergence rate.

Dissertations on the topic "Multi-armed bandit formulation"

1

Ghalme, Ganesh Sambhaji. "Algorithms for Social Good in Online Platforms with Guarantees on Honest Participation and Fairness." Thesis, 2020. https://etd.iisc.ac.in/handle/2005/4573.

Abstract:
Recent decades have seen a revolution in the way people communicate, buy products, learn new things, and share life experiences. This has spurred the growth of online platforms that enable users from all over the globe to buy/review/recommend products and services, ask questions and provide responses, participate in online learning, and more. Certain crucial requirements must be satisfied by online forums to ensure their trustworthiness and sustainability. In this thesis, we are concerned with three of these requirements: social welfare maximization, honest participation by the users, and fairness in decision making. In particular, we address three contemporary problems in online platforms and obtain principled solutions that achieve social welfare maximization while satisfying honest participation and fairness of allocation. The three problems are set in the context of three different platforms: online review or Q&A forums, online discussion forums, and online search platforms. In each case, we develop an abstraction of the problem and solve it in its generality.

1) Ballooning Multi-armed Bandits. In our first problem, we consider online platforms where users are shown user-generated content, such as reviews on an e-commerce platform or answers on a Q&A platform. The number of reviews/answers increases over time. We seek to design an algorithm that quickly learns the best review/best answer and displays it prominently. We model this problem as a novel multi-armed bandit formulation (which we call ballooning bandits) in which the set of arms expands over time. We first show that when the number of arms grows linearly with time, one cannot achieve sub-linear regret. In a realistic special case, where the best answer is likely to arrive early enough, we prove an optimal sub-linear regret guarantee. We prove our results for best-answer arrival-time distributions with sub-exponential or sub-Pareto tails.

2) Strategy-proof Allocation of Indivisible Goods with Fairness Guarantees. Second, we consider the problem of fairness in online search platforms. We view the sponsored ad slots on these platforms as indivisible goods to be allocated fairly among competing advertisers. We use envy-freeness up to one good (EF1) and maximin fair share (MMS) allocation as the fairness notions. The problem is to maximize overall social welfare subject to these fairness constraints. We first prove, in a single-parameter setting, that the problem of social welfare maximization under EF1 is NP-hard. We complement this result by showing that any EF1 allocation satisfies a 1/2-approximation guarantee, and we present an algorithm with a (1, 1/2) bi-criteria approximation guarantee. We finally show, in a strategic setting, that one can design a truthful mechanism with the proposed fair allocation.

3) Coalition Resistant Credit Score Functions. In the third problem, we study manipulation in online discussion forums. We consider a specific but common form of manipulation, namely manipulation by coalition formation. We design a manipulation-resistant credit scoring rule that assigns each user a score such that forming a coalition is discouraged. In particular, we study the graph generated by the interactions on the platform and use community detection algorithms. We show that the community scores given by modularity-maximizing community detection algorithms lead to a coalition-resistant credit scoring rule. This in turn leads to sustainable discussion forums with honest participation from users, devoid of coalitional manipulation.
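
For intuition on the ballooning setting of part (1), the toy loop below lets new arms arrive stochastically over time and runs plain UCB1 over the current arm set; it is not the thesis's algorithm, and all distributions are hypothetical. Note that if a new arm arrived every round, the policy would never leave the forced-exploration branch, which hints at why linearly growing arm sets preclude sub-linear regret.

```python
# Toy "ballooning" bandit: the arm set grows over time; plain UCB1 over the
# arms seen so far. Illustrative only; not the thesis's algorithm.
import numpy as np

rng = np.random.default_rng(2)
T, p_arrival = 5000, 0.05
means, counts, sums = [rng.uniform()], [0], [0.0]

for t in range(1, T + 1):
    if rng.random() < p_arrival:         # a new answer/review arrives
        means.append(rng.uniform())
        counts.append(0)
        sums.append(0.0)
    c = np.array(counts, dtype=float)
    s = np.array(sums)
    fresh = np.flatnonzero(c == 0)
    if fresh.size:                       # unplayed arms get priority
        arm = int(fresh[0])
    else:
        arm = int(np.argmax(s / c + np.sqrt(2 * np.log(t) / c)))
    counts[arm] += 1
    sums[arm] += float(rng.random() < means[arm])
```
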

Book chapters on the topic "Multi-armed bandit formulation"

1

Pini, Giovanni, Arne Brutschy, Gianpiero Francesca, Marco Dorigo, and Mauro Birattari. "Multi-armed Bandit Formulation of the Task Partitioning Problem in Swarm Robotics." In Lecture Notes in Computer Science, 109–20. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-32650-9_10.


Conference papers on the topic "Multi-armed bandit formulation"

1

Bouneffouf, Djallel, Irina Rish, Guillermo Cecchi, and Raphaël Féraud. "Context Attentive Bandits: Contextual Bandit with Restricted Context." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/203.

Abstract:
We consider a novel formulation of the multi-armed bandit model, which we call the contextual bandit with restricted context, where only a limited number of features can be accessed by the learner at every iteration. This novel formulation is motivated by different online problems arising in clinical trials, recommender systems, and attention modeling. Herein, we adapt the standard multi-armed bandit algorithm known as Thompson Sampling to take advantage of our restricted-context setting, and propose two novel algorithms, called Thompson Sampling with Restricted Context (TSRC) and Windows Thompson Sampling with Restricted Context (WTSRC), for handling stationary and nonstationary environments, respectively. Our empirical results demonstrate the advantages of the proposed approaches on several real-life datasets.
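
As a toy sketch of the restricted-context setting (not the TSRC algorithm itself), the code below runs linear Thompson Sampling on only m of the d context features per round; where TSRC learns which features to request, this sketch picks a uniformly random subset. All dimensions and parameters are assumptions.

```python
# Restricted-context toy: the learner observes only m of d features each
# round. Feature selection is random here; TSRC learns it instead.
import numpy as np

rng = np.random.default_rng(2)
d, m, K, T = 10, 3, 4, 2000
theta_true = rng.normal(size=(K, d))     # hypothetical arm parameters

A = [np.eye(d) for _ in range(K)]        # per-arm posterior precision
b = [np.zeros(d) for _ in range(K)]

for t in range(T):
    x = rng.normal(size=d)
    mask = np.zeros(d)
    mask[rng.choice(d, size=m, replace=False)] = 1.0
    x_obs = x * mask                     # only m features are revealed

    scores = []                          # Thompson draw per arm on x_obs
    for k in range(K):
        mu = np.linalg.solve(A[k], b[k])
        cov = np.linalg.inv(A[k])
        theta = rng.multivariate_normal(mu, cov)
        scores.append(float(theta @ x_obs))
    k = int(np.argmax(scores))

    r = float(theta_true[k] @ x + rng.normal(scale=0.1))  # full-context reward
    A[k] += np.outer(x_obs, x_obs)
    b[k] += r * x_obs
```
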
2

Avrachenkov, Konstantin, Laura Cottatellucci, and Lorenzo Maggi. "Slow fading channel selection: A restless multi-armed bandit formulation." In 2012 9th International Symposium on Wireless Communication Systems (ISWCS 2012). IEEE, 2012. http://dx.doi.org/10.1109/iswcs.2012.6328535.

3

Cheung, Mei Yi, Joshua Leighton, and Franz S. Hover. "Multi-armed bandit formulation for autonomous mobile acoustic relay adaptive positioning." In 2013 IEEE International Conference on Robotics and Automation (ICRA). IEEE, 2013. http://dx.doi.org/10.1109/icra.2013.6631165.

4

Feki, Afef, and Veronique Capdevielle. "Autonomous resource allocation for dense LTE networks: A Multi Armed Bandit formulation." In 2011 IEEE 22nd International Symposium on Personal, Indoor and Mobile Radio Communications - (PIMRC 2011). IEEE, 2011. http://dx.doi.org/10.1109/pimrc.2011.6140047.

5

Barbato, Antimo, Lin Chen, Fabio Martignon, and Stefano Paris. "A multi-armed bandit formulation for distributed appliances scheduling in smart grids." In 2014 IEEE Online Conference on Green Communications (OnlineGreencomm). IEEE, 2014. http://dx.doi.org/10.1109/onlinegreencom.2014.7114418.

6

Gai, Yi, Bhaskar Krishnamachari, and Rahul Jain. "Learning Multiuser Channel Allocations in Cognitive Radio Networks: A Combinatorial Multi-Armed Bandit Formulation." In 2010 IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (IEEE DySPAN). IEEE, 2010. http://dx.doi.org/10.1109/dyspan.2010.5457857.

7

Chen, Kun, Kechao Cai, Longbo Huang, and John C. S. Lui. "Beyond the Click-Through Rate: Web Link Selection with Multi-level Feedback." In Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI-18). California: International Joint Conferences on Artificial Intelligence Organization, 2018. http://dx.doi.org/10.24963/ijcai.2018/459.

Abstract:
The web link selection problem is to select a small subset of web links from a large web link pool, and to place the selected links on a web page that can accommodate only a limited number of links, e.g., advertisements, recommendations, or news feeds. Although the much-studied click-through rate reflects the attractiveness of the link itself, revenue can only be obtained from user actions after clicks, e.g., purchasing after being directed to the product pages by recommendation links. Thus, web links have an intrinsic multi-level feedback structure. With this observation, we consider the context-free web link selection problem, where the objective is to maximize revenue while ensuring that the attractiveness is no less than a preset threshold. The key challenge of the problem is that each link's multi-level feedbacks are stochastic, and unobservable unless the link is selected. We model this problem with a constrained stochastic multi-armed bandit formulation, and design an efficient link selection algorithm, called the Constrained Upper Confidence Bound algorithm (Con-UCB). We prove O(sqrt(T ln T)) bounds on both the regret and the violation of the attractiveness constraint. We also conduct extensive experiments on three real-world datasets, and show that Con-UCB outperforms state-of-the-art context-free bandit algorithms under the multi-level feedback structure.
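
A simplified, single-link-per-round sketch of the constrained-bandit idea is shown below (the paper's Con-UCB selects a whole subset of links and differs in detail): choose the revenue-optimistic arm among those whose optimistic click rate still meets the attractiveness threshold. All probabilities and revenues are hypothetical.

```python
# Constrained-UCB-style toy: maximize revenue subject to an attractiveness
# (click-rate) threshold, using optimistic estimates for both quantities.
import numpy as np

rng = np.random.default_rng(3)
click_p = np.array([0.9, 0.5, 0.2])      # hypothetical click probabilities
revenue = np.array([0.1, 0.4, 0.9])      # hypothetical mean revenues
threshold, K, T = 0.4, 3, 5000

n = np.zeros(K)
click_sum, rev_sum = np.zeros(K), np.zeros(K)

for t in range(1, T + 1):
    if t <= K:
        arm = t - 1                       # pull each arm once first
    else:
        radius = np.sqrt(2 * np.log(t) / n)
        click_ucb = click_sum / n + radius
        rev_ucb = rev_sum / n + radius
        feasible = click_ucb >= threshold # optimistic feasibility check
        # if no arm looks feasible, argmax over -inf falls back to arm 0
        arm = int(np.argmax(np.where(feasible, rev_ucb, -np.inf)))
    n[arm] += 1
    click_sum[arm] += float(rng.random() < click_p[arm])
    rev_sum[arm] += revenue[arm] + rng.normal(scale=0.05)
```
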
8

ElSayed, Karim A., Ilias Bilionis, and Jitesh H. Panchal. "Evaluating Heuristics in Engineering Design: A Reinforcement Learning Approach." In ASME 2021 International Design Engineering Technical Conferences and Computers and Information in Engineering Conference. American Society of Mechanical Engineers, 2021. http://dx.doi.org/10.1115/detc2021-70425.

Abstract:
Heuristics are essential for addressing the complexities of engineering design processes. The goodness of heuristics is context-dependent. Appropriately tailored heuristics can enable designers to find good solutions efficiently, and inappropriate heuristics can result in cognitive biases and inferior design outcomes. While there have been several efforts at understanding which heuristics are used by designers, there is a lack of normative understanding about when different heuristics are suitable. Towards addressing this gap, this paper presents a reinforcement learning-based approach to evaluate the goodness of heuristics for three sub-problems commonly faced by designers: (1) learning the map between the design space and the performance space, (2) acquiring sequential information, and (3) stopping the information acquisition process. Using a multi-armed bandit formulation and simulation studies, we learn the suitable heuristics for these individual sub-problems under different resource constraints and problem complexities. Additionally, we learn the optimal heuristics for the combined problem (i.e., the one that combines all three sub-problems), and we compare them to ones learned at the sub-problem level. The results of our simulation study indicate that the proposed reinforcement learning-based approach can be effective for determining the quality of heuristics for different problems, and how the effectiveness of the heuristics changes as a function of the designer's preference (e.g., performance versus cost), the complexity of the problem, and the resources available.