Journal articles on the topic "Bandit Contextuel"

Cite a source in APA, MLA, Chicago, Harvard, and many other citation styles.

Below are the top 50 journal articles for research on the topic "Bandit Contextuel".

Next to each source in the reference list there is an "Add to bibliography" button. Click it and we will automatically generate a bibliographic citation of the selected work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scientific publication in .pdf format and read the abstract online, if it is included in the work's metadata.

Browse journal articles from many scientific disciplines and compile an accurate bibliography.

1

Gisselbrecht, Thibault, Sylvain Lamprier, and Patrick Gallinari. "Collecte ciblée à partir de flux de données en ligne dans les médias sociaux. Une approche de bandit contextuel". Document numérique 19, no. 2-3 (30 December 2016): 11–30. http://dx.doi.org/10.3166/dn.19.2-3.11-30.

2

Dimakopoulou, Maria, Zhengyuan Zhou, Susan Athey, and Guido Imbens. "Balanced Linear Contextual Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 33 (17 July 2019): 3445–53. http://dx.doi.org/10.1609/aaai.v33i01.33013445.

Abstract:
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as well as the exploration method used, particularly in the presence of rich heterogeneity or complex outcome models, which can lead to difficult estimation problems along the path of learning. We develop algorithms for contextual bandits with linear payoffs that integrate balancing methods from the causal inference literature in their estimation to make it less prone to problems of estimation bias. We provide the first regret bound analyses for linear contextual bandits with balancing and show that our algorithms match the state of the art theoretical guarantees. We demonstrate the strong practical advantage of balanced contextual bandits on a large number of supervised learning datasets and on a synthetic example that simulates model misspecification and prejudice in the initial training data.
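To illustrate the general idea of balancing in a linear contextual bandit (a generic sketch, not the authors' algorithms), the snippet below reweights each logged observation by the inverse of the probability with which its arm was chosen inside a per-arm ridge regression. The policy is epsilon-greedy, so those propensities are known; all data and constants are synthetic assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_arms, horizon, eps = 5, 3, 3000, 0.1
    theta = rng.normal(size=(n_arms, d))      # hidden reward parameters (toy)
    A = [np.eye(d) for _ in range(n_arms)]    # weighted ridge statistics per arm
    b = [np.zeros(d) for _ in range(n_arms)]

    for t in range(horizon):
        x = rng.normal(size=d) / np.sqrt(d)
        estimates = [np.linalg.solve(A[a], b[a]) @ x for a in range(n_arms)]
        greedy = int(np.argmax(estimates))
        probs = np.full(n_arms, eps / n_arms)
        probs[greedy] += 1.0 - eps            # epsilon-greedy assignment probabilities
        a = int(rng.choice(n_arms, p=probs))
        r = float(theta[a] @ x + 0.1 * rng.normal())
        w = 1.0 / probs[a]                    # balancing: inverse-propensity weight
        A[a] += w * np.outer(x, x)
        b[a] += w * r * x
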
3

Tong, Ruoyi. "A survey of the application and technical improvement of the multi-armed bandit". Applied and Computational Engineering 77, no. 1 (16 July 2024): 25–31. http://dx.doi.org/10.54254/2755-2721/77/20240631.

Abstract:
In recent years, the multi-armed bandit (MAB) model has been widely used and has shown excellent performance. This article provides an overview of the applications and technical improvements of the multi-armed bandit problem. First, an overview of the multi-armed bandit problem is presented, including the explanation of a general modeling approach and several existing common algorithms, such as ε-greedy, ETC, UCB, and Thompson sampling. Then, the real-life applications of the multi-armed bandit model are explored, covering the fields of recommender systems, healthcare, and finance. Then, some improved algorithms and models are summarized by addressing the problems encountered in different application domains, including the multi-armed bandit considering multiple objectives, the mortal multi-armed bandits, the multi-armed bandit considering contextual side information, and combinatorial multi-armed bandits. Finally, the characteristics, trends of changes among different algorithms, and applicable scenarios are summarized and discussed.
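As a minimal illustration of two of the algorithms this survey lists, the sketch below runs ε-greedy and UCB1 on a toy Bernoulli bandit. The arm success probabilities, horizon, and ε are made-up values, not anything taken from the article.

    import math
    import random

    def run(policy, means, horizon=5000, eps=0.1, seed=0):
        rng = random.Random(seed)
        counts = [0] * len(means)    # pulls per arm
        values = [0.0] * len(means)  # empirical mean reward per arm
        total = 0.0
        for t in range(1, horizon + 1):
            untried = [a for a in range(len(means)) if counts[a] == 0]
            if policy == "epsilon-greedy":
                if rng.random() < eps:
                    arm = rng.randrange(len(means))
                else:
                    arm = max(range(len(means)), key=lambda a: values[a])
            elif untried:                        # UCB1: pull each arm once first
                arm = untried[0]
            else:                                # then add an optimism bonus to the mean
                arm = max(range(len(means)),
                          key=lambda a: values[a] + math.sqrt(2 * math.log(t) / counts[a]))
            reward = 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            values[arm] += (reward - values[arm]) / counts[arm]   # incremental mean
            total += reward
        return total

    means = [0.3, 0.5, 0.7]                      # hypothetical success probabilities
    for policy in ("epsilon-greedy", "UCB1"):
        print(policy, run(policy, means))
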
4

Yang, Luting, Jianyi Yang, and Shaolei Ren. "Contextual Bandits with Delayed Feedback and Semi-supervised Learning (Student Abstract)". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (18 May 2021): 15943–44. http://dx.doi.org/10.1609/aaai.v35i18.17968.

Abstract:
Contextual multi-armed bandit (MAB) is a classic online learning problem, where a learner/agent selects actions (i.e., arms) given contextual information and discovers optimal actions based on reward feedback. Applications of contextual bandit have been increasingly expanding, including advertisement, personalization, resource allocation in wireless networks, among others. Nonetheless, the reward feedback is delayed in many applications (e.g., a user may only provide service ratings after a period of time), creating challenges for contextual bandits. In this paper, we address delayed feedback in contextual bandits by using semi-supervised learning — incorporate estimates of delayed rewards to improve the estimation of future rewards. Concretely, the reward feedback for an arm selected at the beginning of a round is only observed by the agent/learner with some observation noise and provided to the agent after some a priori unknown but bounded delays. Motivated by semi-supervised learning that produces pseudo labels for unlabeled data to further improve the model performance, we generate fictitious estimates of rewards that are delayed and have yet to arrive based on already-learnt reward functions. Thus, by combining semi-supervised learning with online contextual bandit learning, we propose a novel extension and design two algorithms, which estimate the values for currently unavailable reward feedbacks to minimize the maximum estimation error and average estimation error, respectively.
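A rough sketch of the idea described here, not the authors' algorithms: rewards that have not yet arrived are filled in with pseudo-rewards predicted by a model fit on the already-confirmed samples, in the spirit of pseudo-labelling. The environment, delay, and constants below are synthetic assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_arms, horizon, delay = 5, 3, 400, 20
    theta = rng.normal(size=(n_arms, d))                  # hidden reward parameters (toy)
    A = [np.eye(d) for _ in range(n_arms)]                # ridge statistics for confirmed data
    b = [np.zeros(d) for _ in range(n_arms)]
    pending = []                                          # (arrival_time, arm, context, reward)

    for t in range(horizon):
        # feedback whose delay has elapsed becomes confirmed
        for _, a, xp, r in [p for p in pending if p[0] <= t]:
            A[a] += np.outer(xp, xp)
            b[a] += r * xp
        pending = [p for p in pending if p[0] > t]

        # build estimates that also use pseudo-rewards for still-pending samples
        theta_hat = []
        for a in range(n_arms):
            w_conf = np.linalg.solve(A[a], b[a])          # fit on confirmed data only
            A_tmp, b_tmp = A[a].copy(), b[a].copy()
            for _, arm, xp, _ in pending:
                if arm == a:                              # pseudo-label with the current prediction
                    A_tmp += np.outer(xp, xp)
                    b_tmp += float(w_conf @ xp) * xp
            theta_hat.append(np.linalg.solve(A_tmp, b_tmp))

        # act greedily on the pseudo-augmented estimates; the reward arrives `delay` rounds later
        x = rng.normal(size=d)
        a = int(np.argmax([th @ x for th in theta_hat]))
        r = float(theta[a] @ x + 0.1 * rng.normal())
        pending.append((t + delay, a, x, r))
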
5

Sharaf, Amr, and Hal Daumé III. "Meta-Learning Effective Exploration Strategies for Contextual Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 11 (18 May 2021): 9541–48. http://dx.doi.org/10.1609/aaai.v35i11.17149.

Abstract:
In contextual bandits, an algorithm must choose actions given observed contexts, learning from a reward signal that is observed only for the action chosen. This leads to an exploration/exploitation trade-off: the algorithm must balance taking actions it already believes are good with taking new actions to potentially discover better choices. We develop a meta-learning algorithm, Mêlée, that learns an exploration policy based on simulated, synthetic contextual bandit tasks. Mêlée uses imitation learning against these simulations to train an exploration policy that can be applied to true contextual bandit tasks at test time. We evaluate Mêlée on both a natural contextual bandit problem derived from a learning to rank dataset as well as hundreds of simulated contextual bandit problems derived from classification tasks. Mêlée outperforms seven strong baselines on most of these datasets by leveraging a rich feature representation for learning an exploration strategy.
6

Du, Yihan, Siwei Wang, and Longbo Huang. "A One-Size-Fits-All Solution to Conservative Bandit Problems". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (18 May 2021): 7254–61. http://dx.doi.org/10.1609/aaai.v35i8.16891.

Abstract:
In this paper, we study a family of conservative bandit problems (CBPs) with sample-path reward constraints, i.e., the learner's reward performance must be at least as well as a given baseline at any time. We propose a One-Size-Fits-All solution to CBPs and present its applications to three encompassed problems, i.e. conservative multi-armed bandits (CMAB), conservative linear bandits (CLB) and conservative contextual combinatorial bandits (CCCB). Different from previous works which consider high probability constraints on the expected reward, we focus on a sample-path constraint on the actually received reward, and achieve better theoretical guarantees (T-independent additive regrets instead of T-dependent) and empirical performance. Furthermore, we extend the results and consider a novel conservative mean-variance bandit problem (MV-CBP), which measures the learning performance with both the expected reward and variability. For this extended problem, we provide a novel algorithm with O(1/T) normalized additive regrets (T-independent in the cumulative form) and validate this result through empirical evaluation.
7

Varatharajah, Yogatheesan, and Brent Berry. "A Contextual-Bandit-Based Approach for Informed Decision-Making in Clinical Trials". Life 12, no. 8 (21 August 2022): 1277. http://dx.doi.org/10.3390/life12081277.

Abstract:
Clinical trials are conducted to evaluate the efficacy of new treatments. Clinical trials involving multiple treatments utilize the randomization of treatment assignments to enable the evaluation of treatment efficacies in an unbiased manner. Such evaluation is performed in post hoc studies that usually use supervised-learning methods that rely on large amounts of data collected in a randomized fashion. That approach often proves to be suboptimal in that some participants may suffer and even die as a result of having not received the most appropriate treatments during the trial. Reinforcement-learning methods improve the situation by making it possible to learn the treatment efficacies dynamically during the course of the trial, and to adapt treatment assignments accordingly. Recent efforts using multi-arm bandits, a type of reinforcement-learning method, have focused on maximizing clinical outcomes for a population that was assumed to be homogeneous. However, those approaches have failed to account for the variability among participants that is becoming increasingly evident as a result of recent clinical-trial-based studies. We present a contextual-bandit-based online treatment optimization algorithm that, in choosing treatments for new participants in the study, takes into account not only the maximization of the clinical outcomes as well as the patient characteristics. We evaluated our algorithm using a real clinical trial dataset from the International Stroke Trial. We simulated the online setting by sequentially going through the data of each participant admitted to the trial. Two bandits (one for each context) were created, with four choices of treatments. For a new participant in the trial, depending on the context, one of the bandits was selected. Then, we took three different approaches to choose a treatment: (a) a random choice (i.e., the strategy currently used in clinical trial settings), (b) a Thompson sampling-based approach, and (c) a UCB-based approach. Success probabilities of each context were calculated separately by considering the participants with the same context. Those estimated outcomes were used to update the prior distributions within the bandit corresponding to the context of each participant. We repeated that process through the end of the trial and recorded the outcomes and the chosen treatments for each approach. We also evaluated a context-free multi-arm-bandit-based approach, using the same dataset, to showcase the benefits of our approach. In the context-free case, we calculated the success probabilities for the Bernoulli sampler using the whole clinical trial dataset in a context-independent manner. The results of our retrospective analysis indicate that the proposed approach performs significantly better than either a random assignment of treatments (the current gold standard) or a multi-arm-bandit-based approach, providing substantial gains in the percentage of participants who are assigned the most suitable treatments. The contextual-bandit and multi-arm bandit approaches provide 72.63% and 64.34% gains, respectively, compared to a random assignment.
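The per-context, Beta-Bernoulli Thompson sampling scheme the abstract describes can be sketched as below. The two contexts and four treatments match the description, but the success probabilities and the stream of participants are simulated assumptions, not the International Stroke Trial data.

    import numpy as np

    rng = np.random.default_rng(1)
    n_contexts, n_treatments = 2, 4
    true_success = rng.uniform(0.2, 0.8, size=(n_contexts, n_treatments))  # made-up probabilities

    # one Beta(1, 1) prior per (context, treatment) pair, updated from observed outcomes
    alpha = np.ones((n_contexts, n_treatments))
    beta = np.ones((n_contexts, n_treatments))

    assigned_best, n_patients = 0, 2000
    for _ in range(n_patients):
        c = rng.integers(n_contexts)              # the incoming participant's context
        samples = rng.beta(alpha[c], beta[c])     # Thompson sampling: draw, then pick the max
        t = int(np.argmax(samples))
        outcome = rng.random() < true_success[c, t]
        alpha[c, t] += outcome
        beta[c, t] += 1 - outcome
        assigned_best += int(t == int(np.argmax(true_success[c])))

    print(f"fraction assigned the best treatment: {assigned_best / n_patients:.2f}")
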
8

Li, Jialian, Chao Du, and Jun Zhu. "A Bayesian Approach for Subset Selection in Contextual Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (18 May 2021): 8384–91. http://dx.doi.org/10.1609/aaai.v35i9.17019.

Abstract:
Subset selection in Contextual Bandits (CB) is an important task in various applications such as advertisement recommendation. In CB, arms are attached with contexts and thus correlated in the context space. Proper exploration for subset selection in CB should carefully consider the contexts. Previous works mainly concentrate on the best one arm identification in linear bandit problems, where the expected rewards are linearly dependent on the contexts. However, these methods highly rely on linearity, and cannot be easily extended to more general cases. We propose a novel Bayesian approach for subset selection in general CB where the reward functions can be nonlinear. Our method provides a principled way to employ contextual information and efficiently explore the arms. For cases with relatively smooth posteriors, we give theoretical results that are comparable to previous works. For general cases, we provide a calculable approximate variant. Empirical results show the effectiveness of our method on both linear bandits and general CB.
9

Qu, Jiaming. "Survey of dynamic pricing based on Multi-Armed Bandit algorithms". Applied and Computational Engineering 37, no. 1 (22 January 2024): 160–65. http://dx.doi.org/10.54254/2755-2721/37/20230497.

Abstract:
Dynamic pricing seeks to determine the most optimal selling price for a product or service, taking into account factors like limited supply and uncertain demand. This study aims to provide a comprehensive exploration of dynamic pricing using the multi-armed bandit problem framework in various contexts. The investigation highlights the prevalence of Thompson sampling in dynamic pricing scenarios with a Bayesian backdrop, where the seller possesses prior knowledge of demand functions. On the other hand, in non-Bayesian situations, the Upper Confidence Bound (UCB) algorithm family gains traction due to their favorable regret bounds. As markets often exhibit temporal fluctuations, the domain of non-stationary multi-armed bandits within dynamic pricing emerges as crucial. Future research directions include enhancing traditional multi-armed bandit algorithms to suit online learning settings, especially those involving dynamic reward distributions. Additionally, merging prior insights into demand functions with contextual multi-armed bandit approaches holds promise for advancing dynamic pricing strategies. In conclusion, this study sheds light on dynamic pricing through the lens of multi-armed bandit problems, offering insights and pathways for further exploration.
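As a small illustration of Thompson sampling applied to pricing (a generic sketch, not taken from this survey), each candidate price keeps a Beta posterior over its purchase probability and the price with the highest sampled expected revenue is posted. The price grid and demand curve are invented.

    import numpy as np

    rng = np.random.default_rng(2)
    prices = np.array([5.0, 7.5, 10.0, 12.5])        # candidate price points (arms)
    buy_prob = np.array([0.60, 0.45, 0.30, 0.15])    # hypothetical demand curve

    alpha = np.ones(len(prices))                     # Beta posterior over purchase probability
    beta = np.ones(len(prices))
    revenue, rounds = 0.0, 10000
    for _ in range(rounds):
        sampled = rng.beta(alpha, beta)              # sample a purchase probability per price
        arm = int(np.argmax(prices * sampled))       # post the price with highest sampled revenue
        sold = rng.random() < buy_prob[arm]
        alpha[arm] += sold
        beta[arm] += 1 - sold
        revenue += prices[arm] * sold

    print("average revenue per round:", revenue / rounds)
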
10

Atsidakou, Alexia, Constantine Caramanis, Evangelia Gergatsouli, Orestis Papadigenopoulos, and Christos Tzamos. "Contextual Pandora’s Box". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 10 (24 March 2024): 10944–52. http://dx.doi.org/10.1609/aaai.v38i10.28969.

Abstract:
Pandora’s Box is a fundamental stochastic optimization problem, where the decision-maker must find a good alternative, while minimizing the search cost of exploring the value of each alternative. In the original formulation, it is assumed that accurate distributions are given for the values of all the alternatives, while recent work studies the online variant of Pandora’s Box where the distributions are originally unknown. In this work, we study Pandora’s Box in the online setting, while incorporating context. At each round, we are presented with a number of alternatives each having a context, an exploration cost and an unknown value drawn from an unknown distribution that may change at every round. Our main result is a no-regret algorithm that performs comparably well against the optimal algorithm which knows all prior distributions exactly. Our algorithm works even in the bandit setting where the algorithm never learns the values of the alternatives that were not explored. The key technique that enables our result is a novel modification of the realizability condition in contextual bandits that connects a context to a sufficient statistic of each alternative’s distribution (its reservation value) rather than its mean.
11

Zhang, Qianqian. "Real-world Applications of Bandit Algorithms: Insights and Innovations". Transactions on Computer Science and Intelligent Systems Research 5 (12 August 2024): 753–58. http://dx.doi.org/10.62051/ge4sk783.

Abstract:
In the rapidly evolving landscape of decision-making systems, the significance of Multi-Armed Bandit (MAB) algorithms has surged, showcasing a remarkable ability to address the exploration-exploitation dilemma across diverse domains. Originating from the probabilistic and statistical decision-making framework, MAB algorithms have established a critical role by offering a systematic approach to making choices in uncertain environments with limited information. These algorithms ingeniously balance the trade-off between exploiting known resources for immediate gains and exploring new possibilities for future benefits. The spectrum of MAB algorithms ranges from Stochastic Stationary Bandits, dealing with static reward distributions, to more complex forms like Restless and Contextual Bandits, each tailored to the dynamism and specificity of real-world challenges. Further, Structured Bandits explore the underlying patterns in reward distributions, providing strategic insights into decision-making processes. The practical applications of these algorithms span several fields, including healthcare, content recommendation, and education, demonstrating their versatility and efficacy in addressing specific contextual challenges. This paper aims to provide a comprehensive overview of the development, nuances, and practical applications of MAB algorithms, highlighting their pivotal role in advancing decision-making processes amidst uncertainty.
12

Wang, Zhiyong, Xutong Liu, Shuai Li, and John C. S. Lui. "Efficient Explorative Key-Term Selection Strategies for Conversational Contextual Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (26 June 2023): 10288–95. http://dx.doi.org/10.1609/aaai.v37i8.26225.

Abstract:
Conversational contextual bandits elicit user preferences by occasionally querying for explicit feedback on key-terms to accelerate learning. However, there are aspects of existing approaches which limit their performance. First, information gained from key-term-level conversations and arm-level recommendations is not appropriately incorporated to speed up learning. Second, it is important to ask explorative key-terms to quickly elicit the user's potential interests in various domains to accelerate the convergence of user preference estimation, which has never been considered in existing works. To tackle these issues, we first propose ``ConLinUCB", a general framework for conversational bandits with better information incorporation, combining arm-level and key-term-level feedback to estimate user preference in one step at each time. Based on this framework, we further design two bandit algorithms with explorative key-term selection strategies, ConLinUCB-BS and ConLinUCB-MCR. We prove tighter regret upper bounds of our proposed algorithms. Particularly, ConLinUCB-BS achieves a better regret bound than the previous result. Extensive experiments on synthetic and real-world data show significant advantages of our algorithms in learning accuracy (up to 54% improvement) and computational efficiency (up to 72% improvement), compared to the classic ConUCB algorithm, showing the potential benefit to recommender systems.
13

Bansal, Nipun, Manju Bala, and Kapil Sharma. "FuzzyBandit An Autonomous Personalized Model Based on Contextual Multi Arm Bandits Using Explainable AI". Defence Science Journal 74, no. 4 (26 April 2024): 496–504. http://dx.doi.org/10.14429/dsj.74.19330.

Abstract:
In the era of artificial cognizance, context-aware decision-making problems have attracted significant attention. Contextual bandit addresses these problems by solving the exploration versus exploitation dilemma faced to provide customized solutions as per the user’s liking. However, a high level of accountability is required, and there is a need to understand the underlying mechanism of the black box nature of the contextual bandit algorithms proposed in the literature. To overcome these shortcomings, an explainable AI (XAI) based FuzzyBandit model is proposed, which maximizes the cumulative reward by optimizing the decision at each trial based on the rewards received in previous observations and, at the same time, generates explanations for the decision made. The proposed model uses an adaptive neuro-fuzzy inference system (ANFIS) to address the vague nature of arm selection in contextual bandits and uses a feedback mechanism to adjust its parameters based on the relevance and diversity of the features to maximize reward generation. The FuzzyBandit model has also been empirically compared with the existing seven most popular art of literature models on four benchmark datasets over nine criteria, namely recall, specificity, precision, prevalence, F1 score, Matthews Correlation Coefficient (MCC), Fowlkes–Mallows index (FM), Critical Success Index (CSI) and accuracy.
14

Tang, Qiao, Hong Xie, Yunni Xia, Jia Lee, and Qingsheng Zhu. "Robust Contextual Bandits via Bootstrapping". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 13 (18 May 2021): 12182–89. http://dx.doi.org/10.1609/aaai.v35i13.17446.

Abstract:
Upper confidence bound (UCB) based contextual bandit algorithms require one to know the tail property of the reward distribution. Unfortunately, such tail property is usually unknown or difficult to specify in real-world applications. Using a tail property heavier than the ground truth leads to a slow learning speed of the contextual bandit algorithm, while using a lighter one may cause the algorithm to diverge. To address this fundamental problem, we develop an estimator (evaluated from historical rewards) for the contextual bandit UCB based on the multiplier bootstrapping technique. We first establish sufficient conditions under which our estimator converges asymptotically to the ground truth of contextual bandit UCB. We further derive a second order correction for our estimator so as to obtain its confidence level with a finite number of rounds. To demonstrate the versatility of the estimator, we apply it to design a BootLinUCB algorithm for the contextual bandit. We prove that the BootLinUCB has a sub-linear regret upper bound and also conduct extensive experiments to validate its superior performance.
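A generic illustration of the bootstrap idea, not the paper's BootLinUCB: instead of a closed-form confidence width that assumes a known reward tail, the exploration bonus of each arm is estimated from its own reward history with a multiplier bootstrap. The toy arms, warm-start pulls, and constants are assumptions.

    import numpy as np

    rng = np.random.default_rng(3)

    def bootstrap_bonus(rewards, n_boot=200, level=0.95):
        # multiplier bootstrap: perturb each observation with a random weight and
        # take a high quantile of how far the reweighted mean moves from the mean
        r = np.asarray(rewards, dtype=float)
        weights = rng.exponential(1.0, size=(n_boot, len(r)))
        weights /= weights.sum(axis=1, keepdims=True)
        return float(np.quantile(weights @ r - r.mean(), level))

    true_means = [0.3, 0.5, 0.7]
    history = [[float(rng.random() < m) for _ in range(5)] for m in true_means]  # warm start
    for t in range(500):
        scores = [np.mean(h) + bootstrap_bonus(h) for h in history]
        arm = int(np.argmax(scores))
        history[arm].append(float(rng.random() < true_means[arm]))

    print("pulls per arm (incl. 5 warm-start pulls):", [len(h) for h in history])
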
15

Wu, Jiazhen. "In-depth Exploration and Implementation of Multi-Armed Bandit Models Across Diverse Fields". Highlights in Science, Engineering and Technology 94 (26 April 2024): 201–5. http://dx.doi.org/10.54097/d3ez0n61.

Abstract:
This paper presents an in-depth analysis of the Multi-Armed Bandit (MAB) problem, tracing its evolution from its origins in the gambling domain of the 1940s to its current prominence in machine learning and artificial intelligence. The analysis begins with a historical overview, noting key developments like Herbert Robbins' probabilistic framework and the expansion of the problem into strategic decision-making in the 1970s. The emergence of algorithms like the Upper Confidence Bound (UCB) and Thompson Sampling in the late 20th century is highlighted, demonstrating the MAB problem's transition to practical applications. The integration of MAB algorithms with machine learning, particularly in the era of reinforcement learning, is explored, emphasizing their application in various domains such as online advertising, financial market trading, and clinical trials. The paper discusses the critical role of decision theory and probabilistic models in MAB problems, focusing on the balance between exploration and exploitation strategies. Recent advancements in Contextual Bandits, non-stationary reward distributions, and Multi-agent Bandits are examined, showcasing the ongoing evolution and adaptability of MAB problems.
16

Wang, Kun. "Conservative Contextual Combinatorial Cascading Bandit". IEEE Access 9 (2021): 151434–43. http://dx.doi.org/10.1109/access.2021.3124416.

17

Elwood, Adam, Marco Leonardi, Ashraf Mohamed, and Alessandro Rozza. "Maximum Entropy Exploration in Contextual Bandits with Neural Networks and Energy Based Models". Entropy 25, no. 2 (18 January 2023): 188. http://dx.doi.org/10.3390/e25020188.

Abstract:
Contextual bandits can solve a huge range of real-world problems. However, current popular algorithms to solve them either rely on linear models or unreliable uncertainty estimation in non-linear models, which are required to deal with the exploration–exploitation trade-off. Inspired by theories of human cognition, we introduce novel techniques that use maximum entropy exploration, relying on neural networks to find optimal policies in settings with both continuous and discrete action spaces. We present two classes of models, one with neural networks as reward estimators, and the other with energy based models, which model the probability of obtaining an optimal reward given an action. We evaluate the performance of these models in static and dynamic contextual bandit simulation environments. We show that both techniques outperform standard baseline algorithms, such as NN HMC, NN Discrete, Upper Confidence Bound, and Thompson Sampling, where energy based models have the best overall performance. This provides practitioners with new techniques that perform well in static and dynamic settings, and are particularly well suited to non-linear scenarios with continuous action spaces.
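A much-simplified stand-in for what the abstract describes: the sketch below uses a linear reward estimator instead of neural networks or energy-based models, and explores by sampling actions from a softmax (maximum-entropy style) distribution over the predicted rewards. All quantities are synthetic.

    import numpy as np

    rng = np.random.default_rng(4)
    d, n_arms, horizon, temperature = 6, 4, 2000, 0.1
    theta = rng.normal(size=(n_arms, d))          # hidden reward parameters (toy)
    A = [np.eye(d) for _ in range(n_arms)]        # per-arm ridge statistics
    b = [np.zeros(d) for _ in range(n_arms)]

    for t in range(horizon):
        x = rng.normal(size=d) / np.sqrt(d)
        preds = np.array([np.linalg.solve(A[a], b[a]) @ x for a in range(n_arms)])
        logits = preds / temperature              # higher temperature means more exploration
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()                      # softmax ("maximum entropy") policy
        a = int(rng.choice(n_arms, p=probs))
        r = float(theta[a] @ x + 0.1 * rng.normal())
        A[a] += np.outer(x, x)
        b[a] += r * x
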
18

Baheri, Ali. "Multilevel Constrained Bandits: A Hierarchical Upper Confidence Bound Approach with Safety Guarantees". Mathematics 13, no. 1 (3 January 2025): 149. https://doi.org/10.3390/math13010149.

Abstract:
The multi-armed bandit (MAB) problem is a foundational model for sequential decision-making under uncertainty. While MAB has proven valuable in applications such as clinical trials and online advertising, traditional formulations have limitations; specifically, they struggle to handle three key real-world scenarios: (1) when decisions must follow a hierarchical structure (as in autonomous systems where high-level strategy guides low-level actions); (2) when there are constraints at multiple levels of decision-making (such as both system-wide and component-level resource limits); and (3) when available actions depend on previous choices or context. To address these challenges, we introduce the hierarchical constrained bandits (HCB) framework, which extends contextual bandits to incorporate both hierarchical decisions and multilevel constraints. We propose the HC-UCB (hierarchical constrained upper confidence bound) algorithm to solve the HCB problem. The algorithm uses confidence bounds within a hierarchical setting to balance exploration and exploitation while respecting constraints at all levels. Our theoretical analysis establishes that HC-UCB achieves sublinear regret, guarantees constraint satisfaction at all hierarchical levels, and is near-optimal in terms of achievable performance. Simple experimental results demonstrate the algorithm’s effectiveness in balancing reward maximization with constraint satisfaction.
19

Strong, Emily, Bernard Kleynhans, and Serdar Kadıoğlu. "MABWISER: Parallelizable Contextual Multi-armed Bandits". International Journal on Artificial Intelligence Tools 30, no. 04 (June 2021): 2150021. http://dx.doi.org/10.1142/s0218213021500214.

Abstract:
Contextual multi-armed bandit algorithms are an effective approach for online sequential decision-making problems. However, there are limited tools available to support their adoption in the community. To fill this gap, we present an open-source Python library with context-free, parametric and non-parametric contextual multi-armed bandit algorithms. The MABWiser library is designed to be user-friendly and supports custom bandit algorithms for specific applications. Our design provides built-in parallelization to speed up training and testing for scalability with special attention given to ensuring the reproducibility of results. The API makes hybrid strategies possible that combine non-parametric policies with parametric ones, an area that is not explored in the literature. As a practical application, we demonstrate using the library in both batch and online simulations for context-free, parametric and non-parametric contextual policies with the well-known MovieLens data set. Finally, we quantify the performance benefits of built-in parallelization.
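A brief usage sketch of the library itself, assuming the API described in the MABWiser documentation (MAB, LearningPolicy.LinUCB, fit/partial_fit/predict, and the n_jobs argument for the built-in parallelization); the arm names, contexts, and rewards are toy values.

    from mabwiser.mab import MAB, LearningPolicy

    arms = ["Article_A", "Article_B"]
    decisions = ["Article_A", "Article_A", "Article_B", "Article_B"]
    rewards = [1, 0, 1, 0]
    contexts = [[0.1, 0.2], [0.3, 0.1], [0.8, 0.9], [0.7, 0.6]]

    # a parametric contextual policy (LinUCB) with two worker processes for training/prediction
    mab = MAB(arms=arms, learning_policy=LearningPolicy.LinUCB(alpha=1.25), n_jobs=2)
    mab.fit(decisions=decisions, rewards=rewards, contexts=contexts)

    print(mab.predict([[0.2, 0.2]]))   # recommended arm for a new context
    mab.partial_fit(decisions=["Article_B"], rewards=[1], contexts=[[0.2, 0.2]])
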
20

Lee, Kyungbok, Myunghee Cho Paik, Min-hwan Oh, and Gi-Soo Kim. "Mixed-Effects Contextual Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 12 (24 March 2024): 13409–17. http://dx.doi.org/10.1609/aaai.v38i12.29243.

Abstract:
We study a novel variant of a contextual bandit problem with multi-dimensional reward feedback formulated as a mixed-effects model, where the correlations between multiple feedback are induced by sharing stochastic coefficients called random effects. We propose a novel algorithm, Mixed-Effects Contextual UCB (ME-CUCB), achieving an Õ(d√(mT)) regret bound after T rounds where d is the dimension of contexts and m is the dimension of outcomes, with either known or unknown covariance structure. This is a tighter regret bound than that of the naive canonical linear bandit algorithm ignoring the correlations among rewards. We prove a lower bound of Ω(d√(mT)) matching the upper bound up to logarithmic factors. To our knowledge, this is the first work providing a regret analysis for mixed-effects models and algorithms involving weighted least-squares estimators. Our theoretical analysis faces a significant technical challenge in that the error terms do not constitute martingales since the weights depend on the rewards. We overcome this challenge by using covering numbers, of theoretical interest in its own right. We provide numerical experiments demonstrating the advantage of our proposed algorithm, supporting the theoretical claims.
21

Oh, Min-hwan, and Garud Iyengar. "Multinomial Logit Contextual Bandits: Provable Optimality and Practicality". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 10 (18 May 2021): 9205–13. http://dx.doi.org/10.1609/aaai.v35i10.17111.

Abstract:
We consider a sequential assortment selection problem where the user choice is given by a multinomial logit (MNL) choice model whose parameters are unknown. In each period, the learning agent observes a d-dimensional contextual information about the user and the N available items, and offers an assortment of size K to the user, and observes the bandit feedback of the item chosen from the assortment. We propose upper confidence bound based algorithms for this MNL contextual bandit. The first algorithm is a simple and practical method that achieves an O(d√T) regret over T rounds. Next, we propose a second algorithm which achieves a O(√dT) regret. This matches the lower bound for the MNL bandit problem, up to logarithmic terms, and improves on the best-known result by a √d factor. To establish this sharper regret bound, we present a non-asymptotic confidence bound for the maximum likelihood estimator of the MNL model that may be of independent interest as its own theoretical contribution. We then revisit the simpler, significantly more practical, first algorithm and show that a simple variant of the algorithm achieves the optimal regret for a broad class of important applications.
22

Zhao, Yisen. "Enhancing conversational recommendation systems through the integration of KNN with ConLinUCB contextual bandits". Applied and Computational Engineering 68, no. 1 (6 June 2024): 8–16. http://dx.doi.org/10.54254/2755-2721/68/20241388.

Abstract:
In recommender system research, contextual multi-armed bandits have shown promise in delivering tailored recommendations by utilizing contextual data. However, their effectiveness is often curtailed by the cold start problem, arising from the lack of initial user data. This necessitates extensive exploration to ascertain user preferences, consequently impeding the speed of learning. The advent of conversational recommendation systems offers a solution. Through these systems, the conversational contextual bandit algorithm swiftly learns user preferences for specific key-terms via interactive dialogues, thereby enhancing the learning pace. Despite these advancements, there are limitations in current methodologies. A primary issue is the suboptimal integration of data from key-term-centric dialogues and arm-level recommendations, which could otherwise expedite the learning process. Another crucial aspect is the strategic suggestion of exploratory key phrases. These phrases are essential in quickly uncovering users potential interests in various domains, thus accelerating the convergence of accurate user preference models. Addressing these challenges, the ConLinUCB framework emerges as a groundbreaking solution. It ingeniously combines feedback from both arm-level and key-term-level interactions, significantly optimizing the learning trajectory. Building upon this, the framework integrates a K-nearest neighbour (KNN) approach to refine key-term selection and arm recommendations. This integration hinges on the similarity of user preferences, further hastening the convergence of the parameter vectors.
23

Chen, Qiufan. "A survey on contextual multi-armed bandits". Applied and Computational Engineering 53, no. 1 (28 March 2024): 287–95. http://dx.doi.org/10.54254/2755-2721/53/20241593.

Abstract:
As a powerful reinforcement learning framework, Contextual Multi-Armed Bandits have extensive applications in various domains. The models of Contextual Multi-Armed Bandits enable decision-makers to make intelligent choices in situations with uncertainty, and they find utility in fields such as online advertising, medical treatment optimization, resource allocation, and more. This paper reviews the evolution of algorithms for Contextual Multi-Armed Bandits, including traditional Bayesian approaches and the latest deep learning techniques. Successful case studies are summarized in different application domains, such as online ad click-through rate optimization and medical decision support. Furthermore, the author discusses future research directions, including more sophisticated context modeling, interpretability, fairness issues, and ethical considerations in the context of automated decision-making.
24

Mohaghegh Neyshabouri, Mohammadreza, Kaan Gokcesu, Hakan Gokcesu, Huseyin Ozkan, and Suleyman Serdar Kozat. "Asymptotically Optimal Contextual Bandit Algorithm Using Hierarchical Structures". IEEE Transactions on Neural Networks and Learning Systems 30, no. 3 (March 2019): 923–37. http://dx.doi.org/10.1109/tnnls.2018.2854796.

25

Gu, Haoran, Yunni Xia, Hong Xie, Xiaoyu Shi, and Mingsheng Shang. "Robust and efficient algorithms for conversational contextual bandit". Information Sciences 657 (February 2024): 119993. http://dx.doi.org/10.1016/j.ins.2023.119993.

26

Narita, Yusuke, Shota Yasui, and Kohei Yata. "Efficient Counterfactual Learning from Bandit Feedback". Proceedings of the AAAI Conference on Artificial Intelligence 33 (17 July 2019): 4634–41. http://dx.doi.org/10.1609/aaai.v33i01.33014634.

Abstract:
What is the most statistically efficient way to do off-policy optimization with batch data from bandit feedback? For log data generated by contextual bandit algorithms, we consider offline estimators for the expected reward from a counterfactual policy. Our estimators are shown to have lowest variance in a wide class of estimators, achieving variance reduction relative to standard estimators. We then apply our estimators to improve advertisement design by a major advertisement company. Consistent with the theoretical result, our estimators allow us to improve on the existing bandit algorithm with more statistical confidence compared to a state-of-the-art benchmark.
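For orientation, the snippet below computes two standard off-policy estimators (plain inverse-propensity weighting and a simple doubly robust variant) for the value of a counterfactual policy from logged bandit data. These are textbook estimators, not the lower-variance estimator proposed in the paper; the log is synthetic, with a uniform behaviour policy.

    import numpy as np

    rng = np.random.default_rng(5)
    n, n_arms = 10000, 3

    # synthetic log produced by a known behaviour policy (propensities stored with the log)
    propensities = np.full((n, n_arms), 1.0 / n_arms)          # uniform behaviour policy
    actions = rng.integers(n_arms, size=n)
    base_rate = np.array([0.2, 0.5, 0.8])
    rewards = (rng.random(n) < base_rate[actions]).astype(float)

    # counterfactual target policy: always play arm 2
    target_probs = np.zeros((n, n_arms))
    target_probs[:, 2] = 1.0

    # inverse-propensity-weighted (IPW) estimate of the target policy's value
    w = target_probs[np.arange(n), actions] / propensities[np.arange(n), actions]
    ipw_value = np.mean(w * rewards)

    # doubly robust variant: a (deliberately crude) per-arm reward model, corrected by IPW
    reward_model = np.array([rewards[actions == a].mean() for a in range(n_arms)])
    dm = (target_probs * reward_model).sum(axis=1)
    dr_value = np.mean(dm + w * (rewards - reward_model[actions]))

    print(f"IPW: {ipw_value:.3f}  DR: {dr_value:.3f}  (true value of the target policy: 0.8)")
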
27

Li, Zhaoyu, and Qian Ai. "Managing Considerable Distributed Resources for Demand Response: A Resource Selection Strategy Based on Contextual Bandit". Electronics 12, no. 13 (23 June 2023): 2783. http://dx.doi.org/10.3390/electronics12132783.

Abstract:
The widespread adoption of distributed energy resources (DERs) leads to resource redundancy in grid operation and increases computation complexity, which underscores the need for effective resource management strategies. In this paper, we present a novel resource management approach that decouples the resource selection and power dispatch tasks. The resource selection task determines the subset of resources designated to participate in the demand response service, while the power dispatch task determines the power output of the selected candidates. A solution strategy based on contextual bandit with DQN structure is then proposed. Concretely, an agent determines the resource selection action, while the power dispatch task is solved in the environment. The negative value of the operational cost is used as feedback to the agent, which links the two tasks in a closed-loop manner. Moreover, to cope with the uncertainty in the power dispatch problem, distributionally robust optimization (DRO) is applied for the reserve settlement to satisfy the reliability requirement against this uncertainty. Numerical studies demonstrate that the DQN-based contextual bandit approach can achieve a profit enhancement ranging from 0.35% to 46.46% compared to the contextual bandit with policy gradient approach under different resource selection quantities.
28

Huang, Wen, and Xintao Wu. "Robustly Improving Bandit Algorithms with Confounded and Selection Biased Offline Data: A Causal Approach". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 18 (24 March 2024): 20438–46. http://dx.doi.org/10.1609/aaai.v38i18.30027.

Abstract:
This paper studies bandit problems where an agent has access to offline data that might be utilized to potentially improve the estimation of each arm’s reward distribution. A major obstacle in this setting is the existence of compound biases from the observational data. Ignoring these biases and blindly fitting a model with the biased data could even negatively affect the online learning phase. In this work, we formulate this problem from a causal perspective. First, we categorize the biases into confounding bias and selection bias based on the causal structure they imply. Next, we extract the causal bound for each arm that is robust towards compound biases from biased observational data. The derived bounds contain the ground truth mean reward and can effectively guide the bandit agent to learn a nearly-optimal decision policy. We also conduct regret analysis in both contextual and non-contextual bandit settings and show that prior causal bounds could help consistently reduce the asymptotic regret.
29

Spieker, Helge, and Arnaud Gotlieb. "Adaptive metamorphic testing with contextual bandits". Journal of Systems and Software 165 (July 2020): 110574. http://dx.doi.org/10.1016/j.jss.2020.110574.

30

Jagerman, Rolf, Ilya Markov, and Maarten De Rijke. "Safe Exploration for Optimizing Contextual Bandits". ACM Transactions on Information Systems 38, no. 3 (26 June 2020): 1–23. http://dx.doi.org/10.1145/3385670.

31

Kakadiya, Ashutosh, Sriraam Natarajan, and Balaraman Ravindran. "Relational Boosted Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 13 (18 May 2021): 12123–30. http://dx.doi.org/10.1609/aaai.v35i13.17439.

Abstract:
Contextual bandits algorithms have become essential in real-world user interaction problems in recent years. However, these algorithms represent context as attribute value representation, which makes them infeasible for real world domains like social networks, which are inherently relational. We propose Relational Boosted Bandits (RB2), a contextual bandits algorithm for relational domains based on (relational) boosted trees. RB2 enables us to learn interpretable and explainable models due to the more descriptive nature of the relational representation. We empirically demonstrate the effectiveness and interpretability of RB2 on tasks such as link prediction, relational classification, and recommendation.
32

Seifi, Farshad, and Seyed Taghi Akhavan Niaki. "Optimizing contextual bandit hyperparameters: A dynamic transfer learning-based framework". International Journal of Industrial Engineering Computations 15, no. 4 (2024): 951–64. http://dx.doi.org/10.5267/j.ijiec.2024.6.003.

Abstract:
The stochastic contextual bandit problem, recognized for its effectiveness in navigating the classic exploration-exploitation dilemma through ongoing player-environment interactions, has found broad applications across various industries. This utility largely stems from the algorithms’ ability to accurately forecast reward functions and maintain an optimal balance between exploration and exploitation, contingent upon the precise selection and calibration of hyperparameters. However, the inherently dynamic and real-time nature of bandit environments significantly complicates hyperparameter tuning, rendering traditional offline methods inadequate. While specialized methods have been developed to overcome these challenges, they often face three primary issues: difficulty in adaptively learning hyperparameters in ever-changing environments, inability to simultaneously optimize multiple hyperparameters for complex models, and inefficiencies in data utilization and knowledge transfer from analogous tasks. To tackle these hurdles, this paper introduces an innovative transfer learning-based approach designed to harness past task knowledge for accelerated optimization and dynamically optimize multiple hyperparameters, making it well-suited for fluctuating environments. The method employs a dual Gaussian meta-model strategy—one for transfer learning and the other for assessing hyperparameters’ performance within the current task —enabling it to leverage insights from previous tasks while quickly adapting to new environmental changes. Furthermore, the framework’s meta-model-centric architecture enables simultaneous optimization of multiple hyperparameters. Experimental evaluations demonstrate that this approach markedly outperforms competing methods in scenarios with perturbations and exhibits superior performance in 70% of stationary cases while matching performance in the remaining 30%. This superiority in performance, coupled with its computational efficiency on par with existing alternatives, positions it as a superior and practical solution for optimizing hyperparameters in contextual bandit settings.
33

Zhao, Yafei, and Long Yang. "Constrained contextual bandit algorithm for limited-budget recommendation system". Engineering Applications of Artificial Intelligence 128 (February 2024): 107558. http://dx.doi.org/10.1016/j.engappai.2023.107558.

34

Yang, Jianyi, and Shaolei Ren. "Robust Bandit Learning with Imperfect Context". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (18 May 2021): 10594–602. http://dx.doi.org/10.1609/aaai.v35i12.17267.

Abstract:
A standard assumption in contextual multi-arm bandit is that the true context is perfectly known before arm selection. Nonetheless, in many practical applications (e.g., cloud resource management), prior to arm selection, the context information can only be acquired by prediction subject to errors or adversarial modification. In this paper, we study a novel contextual bandit setting in which only imperfect context is available for arm selection while the true context is revealed at the end of each round. We propose two robust arm selection algorithms: MaxMinUCB (Maximize Minimum UCB) which maximizes the worst-case reward, and MinWD (Minimize Worst-case Degradation) which minimizes the worst-case regret. Importantly, we analyze the robustness of MaxMinUCB and MinWD by deriving both regret and reward bounds compared to an oracle that knows the true context. Our results show that as time goes on, MaxMinUCB and MinWD both perform as asymptotically well as their optimal counterparts that know the reward function. Finally, we apply MaxMinUCB and MinWD to online edge datacenter selection, and run synthetic simulations to validate our theoretical analysis.
35

Liu, Zizhuo. "Investigation of progress and application related to Multi-Armed Bandit algorithms". Applied and Computational Engineering 37, no. 1 (22 January 2024): 155–59. http://dx.doi.org/10.54254/2755-2721/37/20230496.

Abstract:
This paper discusses four Multi-armed Bandit algorithms: Explore-then-Commit (ETC), Epsilon-Greedy, Upper Confidence Bound (UCB), and Thompson Sampling algorithm. ETC algorithm aims to spend the majority of rounds on the best arm, but it can lead to a suboptimal outcome if the environment changes rapidly. The Epsilon-Greedy algorithm is designed to explore and exploit simultaneously, while it often tries sub-optimal arm even after the algorithm finds the best arm. Thus, the Epsilon-Greedy algorithm performs well when the environment continuously changes. UCB algorithm is one of the most used Multi-armed Bandit algorithms because it can rapidly narrow the potential optimal decisions in a wide range of scenarios; however, the algorithm can be influenced by some specific pattern of reward distribution or noise presenting in the environment. Thompson Sampling algorithm is also one of the most common algorithms in the Multi-armed Bandit algorithm due to its simplicity, effectiveness, and adaptability to various reward distributions. The Thompson Sampling algorithm performs well in multiple scenarios because it explores and exploits simultaneously, but its variance is greater than the three algorithms mentioned above. Today, Multi-armed bandit algorithms are widely used in advertisement, health care, and website and app optimization. Finally, the Multi-armed Bandit algorithms are rapidly replacing the traditional algorithms; in the future, the advanced Multi-armed Bandit algorithm, contextual Multi-armed Bandit algorithm, will gradually replace the old one.
36

Semenov, Alexander, Maciej Rysz, Gaurav Pandey, and Guanglin Xu. "Diversity in news recommendations using contextual bandits". Expert Systems with Applications 195 (June 2022): 116478. http://dx.doi.org/10.1016/j.eswa.2021.116478.

37

Sui, Guoxin, and Yong Yu. "Bayesian Contextual Bandits for Hyper Parameter Optimization". IEEE Access 8 (2020): 42971–79. http://dx.doi.org/10.1109/access.2020.2977129.

38

Tekin, Cem, and Mihaela van der Schaar. "Distributed Online Learning via Cooperative Contextual Bandits". IEEE Transactions on Signal Processing 63, no. 14 (July 2015): 3700–3714. http://dx.doi.org/10.1109/tsp.2015.2430837.

39

Qin, Yuzhen, Yingcong Li, Fabio Pasqualetti, Maryam Fazel, and Samet Oymak. "Stochastic Contextual Bandits with Long Horizon Rewards". Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 8 (26 June 2023): 9525–33. http://dx.doi.org/10.1609/aaai.v37i8.26140.

Abstract:
The growing interest in complex decision-making and language modeling problems highlights the importance of sample-efficient learning over very long horizons. This work takes a step in this direction by investigating contextual linear bandits where the current reward depends on at most s prior actions and contexts (not necessarily consecutive), up to a time horizon of h. In order to avoid polynomial dependence on h, we propose new algorithms that leverage sparsity to discover the dependence pattern and arm parameters jointly. We consider both the data-poor (T ≤ h) and data-rich (T ≥ h) regimes and derive respective regret upper bounds O(d√(sT) + min(q, T)) and O(√(sdT)), with sparsity s, feature dimension d, total time horizon T, and q that is adaptive to the reward dependence pattern. Complementing upper bounds, we also show that learning over a single trajectory brings inherent challenges: While the dependence pattern and arm parameters form a rank-1 matrix, circulant matrices are not isometric over rank-1 manifolds and sample complexity indeed benefits from the sparse reward dependence structure. Our results necessitate a new analysis to address long-range temporal dependencies across data and avoid polynomial dependence on the reward horizon h. Specifically, we utilize connections to the restricted isometry property of circulant matrices formed by dependent sub-Gaussian vectors and establish new guarantees that are also of independent interest.
40

Xu, Xiao, Fang Dong, Yanghua Li, Shaojian He, and Xin Li. "Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests". Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 04 (3 April 2020): 6518–25. http://dx.doi.org/10.1609/aaai.v34i04.6125.

Abstract:
A contextual bandit problem is studied in a highly non-stationary environment, which is ubiquitous in various recommender systems due to the time-varying interests of users. Two models with disjoint and hybrid payoffs are considered to characterize the phenomenon that users' preferences towards different items vary differently over time. In the disjoint payoff model, the reward of playing an arm is determined by an arm-specific preference vector, which is piecewise-stationary with asynchronous and distinct changes across different arms. An efficient learning algorithm that is adaptive to abrupt reward changes is proposed and theoretical regret analysis is provided to show that a sublinear scaling of regret in the time length T is achieved. The algorithm is further extended to a more general setting with hybrid payoffs where the reward of playing an arm is determined by both an arm-specific preference vector and a joint coefficient vector shared by all arms. Empirical experiments are conducted on real-world datasets to verify the advantages of the proposed learning algorithms against baseline ones in both settings.
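The disjoint payoff model can be sketched with a per-arm LinUCB whose statistics are exponentially discounted, a simple stand-in for the change-adaptive algorithm described here (the paper's method and regret guarantee are not reproduced by this). Dimensions, the forgetting factor, and the injected preference change are arbitrary.

    import numpy as np

    rng = np.random.default_rng(7)
    d, n_arms, horizon, alpha, gamma = 5, 3, 3000, 1.0, 0.995    # gamma: forgetting factor
    theta = rng.normal(size=(n_arms, d))                         # arm-specific preference vectors
    A = [np.eye(d) for _ in range(n_arms)]
    b = [np.zeros(d) for _ in range(n_arms)]

    for t in range(horizon):
        if t == horizon // 2:
            theta[0] = rng.normal(size=d)                        # abrupt, arm-specific change
        x = rng.normal(size=d) / np.sqrt(d)
        scores = []
        for a in range(n_arms):
            A_inv = np.linalg.inv(A[a])
            mean = float(A_inv @ b[a] @ x)
            width = alpha * float(np.sqrt(x @ A_inv @ x))        # LinUCB confidence width
            scores.append(mean + width)
        a = int(np.argmax(scores))
        r = float(theta[a] @ x + 0.1 * rng.normal())
        # discount old evidence so the estimate can track piecewise-stationary preferences
        A[a] = gamma * A[a] + (1 - gamma) * np.eye(d) + np.outer(x, x)
        b[a] = gamma * b[a] + r * x
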
41

Tekin, Cem, and Eralp Turgay. "Multi-objective Contextual Multi-armed Bandit With a Dominant Objective". IEEE Transactions on Signal Processing 66, no. 14 (15 July 2018): 3799–813. http://dx.doi.org/10.1109/tsp.2018.2841822.

42

Yoon, Gyugeun, and Joseph Y. J. Chow. "Contextual Bandit-Based Sequential Transit Route Design under Demand Uncertainty". Transportation Research Record: Journal of the Transportation Research Board 2674, no. 5 (May 2020): 613–25. http://dx.doi.org/10.1177/0361198120917388.

Abstract:
While public transit network design has a wide literature, the study of line planning and route generation under uncertainty is not so well covered. Such uncertainty is present in planning for emerging transit technologies or operating models in which demand data is largely unavailable to make predictions on. In such circumstances, this paper proposes a sequential route generation process in which an operator periodically expands the route set and receives ridership feedback. Using this sensor loop, a reinforcement learning-based route generation methodology is proposed to support line planning for emerging technologies. The method makes use of contextual bandit problems to explore different routes to invest in while optimizing the operating cost or demand served. Two experiments are conducted. They (1) prove that the algorithm is better than random choice; and (2) show good performance with a gap of 3.7% relative to a heuristic solution to an oracle policy.
43

Li, Litao. "Exploring Multi-Armed Bandit algorithms: Performance analysis in dynamic environments". Applied and Computational Engineering 34, no. 1 (22 January 2024): 252–59. http://dx.doi.org/10.54254/2755-2721/34/20230338.

Abstract:
The Multi-armed Bandit algorithm, a proficient solver of the exploration-and-exploitation trade-off predicament, furnishes businesses with a robust tool for resource allocation that predominantly aligns with customer preferences. However, varying Multi-armed Bandit algorithm types exhibit dissimilar performance characteristics based on contextual variations. Hence, a series of experiments is imperative, involving alterations to input values across distinct algorithms. Within this study, three specific algorithms were applied, Explore-then-commit (ETC), Upper Confidence Bound (UCB) and its asymptotically optimal variant, and Thompson Sampling (TS), to the extensively utilized MovieLens dataset. This application aimed to gauge their effectiveness comprehensively. The algorithms were translated into executable code, and their performance was visually depicted through multiple figures. Through cumulative regret tracking within defined conditions, algorithmic performance was scrutinized, laying the groundwork for subsequent parameter-based comparisons. A dedicated experimentation framework was devised to evaluate the robustness of each algorithm, involving deliberate parameter adjustments and tailored experiments to elucidate distinct performance nuances. The ensuing graphical depictions distinctly illustrated Thompson Sampling's persistent minimal regrets across most scenarios. UCB algorithms displayed steadfast stability. ETC manifested excellent performance with a low number of runs, but its regret escalated significantly as the number of runs grew, warranting constraints on exploratory phases to mitigate regrets. This investigation underscores the efficacy of Multi-armed Bandit algorithms while elucidating their nuanced behaviors within diverse contextual contingencies.
44

Zhu, Tan, Guannan Liang, Chunjiang Zhu, Haining Li, and Jinbo Bi. "An Efficient Algorithm for Deep Stochastic Contextual Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 12 (18 May 2021): 11193–201. http://dx.doi.org/10.1609/aaai.v35i12.17335.

Abstract:
In stochastic contextual bandit (SCB) problems, an agent selects an action based on certain observed context to maximize the cumulative reward over iterations. Recently there have been a few studies using a deep neural network (DNN) to predict the expected reward for an action, and the DNN is trained by a stochastic gradient based method. However, convergence analysis has been greatly ignored to examine whether and where these methods converge. In this work, we formulate the SCB that uses a DNN reward function as a non-convex stochastic optimization problem, and design a stage-wised stochastic gradient descent algorithm to optimize the problem and determine the action policy. We prove that with high probability, the action sequence chosen by our algorithm converges to a greedy action policy respecting a local optimal reward function. Extensive experiments have been performed to demonstrate the effectiveness and efficiency of the proposed algorithm on multiple real-world datasets.
45

Martín H., José Antonio, and Ana M. Vargas. "Linear Bayes policy for learning in contextual-bandits". Expert Systems with Applications 40, no. 18 (December 2013): 7400–7406. http://dx.doi.org/10.1016/j.eswa.2013.07.041.

46

Raghavan, Manish, Aleksandrs Slivkins, Jennifer Wortman Vaughan, and Zhiwei Steven Wu. "Greedy Algorithm Almost Dominates in Smoothed Contextual Bandits". SIAM Journal on Computing 52, no. 2 (12 April 2023): 487–524. http://dx.doi.org/10.1137/19m1247115.

47

Ayala-Romero, Jose A., Andres Garcia-Saavedra, and Xavier Costa-Perez. "Risk-Aware Continuous Control with Neural Contextual Bandits". Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 19 (24 March 2024): 20930–38. http://dx.doi.org/10.1609/aaai.v38i19.30083.

Abstract:
Recent advances in learning techniques have garnered attention for their applicability to a diverse range of real-world sequential decision-making problems. Yet, many practical applications have critical constraints for operation in real environments. Most learning solutions often neglect the risk of failing to meet these constraints, hindering their implementation in real-world contexts. In this paper, we propose a risk-aware decision-making framework for contextual bandit problems, accommodating constraints and continuous action spaces. Our approach employs an actor multi-critic architecture, with each critic characterizing the distribution of performance and constraint metrics. Our framework is designed to cater to various risk levels, effectively balancing constraint satisfaction against performance. To demonstrate the effectiveness of our approach, we first compare it against state-of-the-art baseline methods in a synthetic environment, highlighting the impact of intrinsic environmental noise across different risk configurations. Finally, we evaluate our framework in a real-world use case involving a 5G mobile network where only our approach satisfies consistently the system constraint (a signal processing reliability target) with a small performance toll (8.5% increase in power consumption).
48

Pilani, Akshay, Kritagya Mathur, Himanshu Agrawal, Deeksha Chandola, Vinay Anand Tikkiwal, and Arun Kumar. "Contextual Bandit Approach-based Recommendation System for Personalized Web-based Services". Applied Artificial Intelligence 35, no. 7 (6 April 2021): 489–504. http://dx.doi.org/10.1080/08839514.2021.1883855.

49

Li, Xinbin, Jiajia Liu, Lei Yan, Song Han, and Xinping Guan. "Relay Selection in Underwater Acoustic Cooperative Networks: A Contextual Bandit Approach". IEEE Communications Letters 21, no. 2 (February 2017): 382–85. http://dx.doi.org/10.1109/lcomm.2016.2625300.

50

Gisselbrecht, Thibault, Sylvain Lamprier, and Patrick Gallinari. "Dynamic Data Capture from Social Media Streams: A Contextual Bandit Approach". Proceedings of the International AAAI Conference on Web and Social Media 10, no. 1 (4 August 2021): 131–40. http://dx.doi.org/10.1609/icwsm.v10i1.14734.

Abstract:
Social Media usually provide streaming data access that enable dynamic capture of the social activity of their users. Leveraging such APIs for collecting social data that satisfy a given pre-defined need may constitute a complex task, that implies careful stream selections. With user-centered streams, it indeed comes down to the problem of choosing which users to follow in order to maximize the utility of the collected data w.r.t. the need. On large social media, this represents a very challenging task due to the huge number of potential targets and restricted access to the data. Because of the intrinsic non-stationarity of user's behavior, a relevant target today might be irrelevant tomorrow, which represents a major difficulty to apprehend. In this paper, we propose a new approach that anticipates which profiles are likely to publish relevant contents - given a predefined need - in the future, and dynamically selects a subset of accounts to follow at each iteration. Our method has the advantage to take into account both API restrictions and the dynamics of users' behaviors. We formalize the task as a contextual bandit problem with multiple actions selection. We finally conduct experiments on Twitter, which demonstrate the empirical effectiveness of our approach in real-world settings.