Academic literature on the topic 'Bandit Contextuel'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Bandit Contextuel.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Journal articles on the topic "Bandit Contextuel"
Gisselbrecht, Thibault, Sylvain Lamprier, and Patrick Gallinari. "Collecte ciblée à partir de flux de données en ligne dans les médias sociaux. Une approche de bandit contextuel." Document numérique 19, no. 2-3 (December 30, 2016): 11–30. http://dx.doi.org/10.3166/dn.19.2-3.11-30.
Dimakopoulou, Maria, Zhengyuan Zhou, Susan Athey, and Guido Imbens. "Balanced Linear Contextual Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 33 (July 17, 2019): 3445–53. http://dx.doi.org/10.1609/aaai.v33i01.33013445.
Tong, Ruoyi. "A survey of the application and technical improvement of the multi-armed bandit." Applied and Computational Engineering 77, no. 1 (July 16, 2024): 25–31. http://dx.doi.org/10.54254/2755-2721/77/20240631.
Yang, Luting, Jianyi Yang, and Shaolei Ren. "Contextual Bandits with Delayed Feedback and Semi-supervised Learning (Student Abstract)." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (May 18, 2021): 15943–44. http://dx.doi.org/10.1609/aaai.v35i18.17968.
Sharaf, Amr, and Hal Daumé III. "Meta-Learning Effective Exploration Strategies for Contextual Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 11 (May 18, 2021): 9541–48. http://dx.doi.org/10.1609/aaai.v35i11.17149.
Du, Yihan, Siwei Wang, and Longbo Huang. "A One-Size-Fits-All Solution to Conservative Bandit Problems." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 8 (May 18, 2021): 7254–61. http://dx.doi.org/10.1609/aaai.v35i8.16891.
Varatharajah, Yogatheesan, and Brent Berry. "A Contextual-Bandit-Based Approach for Informed Decision-Making in Clinical Trials." Life 12, no. 8 (August 21, 2022): 1277. http://dx.doi.org/10.3390/life12081277.
Li, Jialian, Chao Du, and Jun Zhu. "A Bayesian Approach for Subset Selection in Contextual Bandits." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 9 (May 18, 2021): 8384–91. http://dx.doi.org/10.1609/aaai.v35i9.17019.
Qu, Jiaming. "Survey of dynamic pricing based on Multi-Armed Bandit algorithms." Applied and Computational Engineering 37, no. 1 (January 22, 2024): 160–65. http://dx.doi.org/10.54254/2755-2721/37/20230497.
Atsidakou, Alexia, Constantine Caramanis, Evangelia Gergatsouli, Orestis Papadigenopoulos, and Christos Tzamos. "Contextual Pandora’s Box." Proceedings of the AAAI Conference on Artificial Intelligence 38, no. 10 (March 24, 2024): 10944–52. http://dx.doi.org/10.1609/aaai.v38i10.28969.
Dissertations / Theses on the topic "Bandit Contextuel"
Sakhi, Otmane. "Offline Contextual Bandit: Theory and Large Scale Applications." Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAG011.
This thesis presents contributions to the problem of learning from logged interactions using the offline contextual bandit framework. We are interested in two related topics: (1) offline policy learning with performance certificates, and (2) fast and efficient policy learning applied to large-scale, real-world recommendation. For (1), we first leverage results from the distributionally robust optimisation framework to construct asymptotic, variance-sensitive bounds for evaluating policies' performances. These bounds lead to new, more practical learning objectives thanks to their composite nature and straightforward calibration. We then analyse the problem from the PAC-Bayesian perspective and provide tighter, non-asymptotic bounds on the performance of policies. Our results motivate new strategies that offer performance certificates before the policies are deployed online. The newly derived strategies rely on composite learning objectives that do not require additional tuning. For (2), we first propose a hierarchical Bayesian model that combines different signals to efficiently estimate the quality of recommendations. We provide the computational tools to scale the inference to real-world problems and demonstrate empirically the benefits of the approach in multiple scenarios. We then address the question of accelerating common policy optimisation approaches, focusing particularly on recommendation problems with catalogues of millions of items. We derive optimisation routines, based on new gradient approximations, that are computed in logarithmic time with respect to the catalogue size. Our approach improves on common linear-time gradient computations, yielding fast optimisation with no loss in the quality of the learned policies.
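For readers new to the framework this thesis builds on, a minimal sketch of the inverse propensity scoring (IPS) estimator at the heart of offline contextual bandit evaluation may help fix ideas. The synthetic log and all names below are illustrative assumptions, not code from the thesis:

```python
import numpy as np

def ips_estimate(contexts, actions, rewards, logging_probs, target_policy, clip=10.0):
    """Clipped IPS estimate of a target policy's value from logged bandit data.

    target_policy(x, a) must return the probability that the target policy
    plays action a in context x.
    """
    weights = np.array([target_policy(x, a) / p
                        for x, a, p in zip(contexts, actions, logging_probs)])
    weights = np.minimum(weights, clip)  # clipping trades variance for a small bias
    return float(np.mean(weights * rewards))

# Tiny synthetic log: 3 actions, uniform logging policy.
rng = np.random.default_rng(0)
n, n_actions = 10_000, 3
contexts = rng.normal(size=(n, 2))
actions = rng.integers(n_actions, size=n)
logging_probs = np.full(n, 1.0 / n_actions)
rewards = (actions == (contexts[:, 0] > 0).astype(int)).astype(float)

# Deterministic target policy: play action 1 if the first feature is positive, else 0.
def target_policy(x, a):
    return 1.0 if a == (1 if x[0] > 0 else 0) else 0.0

print(ips_estimate(contexts, actions, rewards, logging_probs, target_policy))
```

On this synthetic log the estimate is close to the target policy's true value of 1.0; the clipping constant is the usual knob for the bias/variance trade-off that variance-sensitive bounds like those above aim to handle more rigorously.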
Huix, Tom. "Variational Inference: theory and large scale applications." Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAX071.
This thesis explores Variational Inference methods for high-dimensional Bayesian learning. In machine learning, the Bayesian approach allows one to deal with epistemic uncertainty and provides better uncertainty quantification, which is necessary in many machine learning applications. However, Bayesian inference is often not feasible because the posterior distribution of the model parameters is generally intractable. Variational Inference (VI) overcomes this problem by approximating the posterior distribution with a simpler distribution called the variational distribution. In the first part of this thesis, we work on the theoretical guarantees of Variational Inference. First, we study VI when the variational distribution is Gaussian, in the overparameterized regime, i.e., when the models are high-dimensional. We then explore Gaussian-mixture variational distributions, a more expressive family, and study both the optimization error and the approximation error of this method. In the second part of the thesis, we study theoretical guarantees for contextual bandit problems using a Bayesian approach called Thompson Sampling. First, we explore the use of Variational Inference within the Thompson Sampling algorithm. We notably show that, in the linear framework, this approach obtains the same theoretical guarantees as if we had access to the true posterior distribution. Finally, we consider a variant of Thompson Sampling called Feel-Good Thompson Sampling (FG-TS), which enjoys better theoretical guarantees than the classical algorithm. We study the use of Markov chain Monte Carlo methods to approximate the posterior distribution: specifically, we incorporate into FG-TS a Langevin Monte Carlo algorithm and a Metropolized Langevin Monte Carlo algorithm, and obtain the same theoretical guarantees as for FG-TS with a known posterior distribution.
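To make the Thompson Sampling setting concrete, here is a minimal sketch of linear Thompson Sampling with an exact Gaussian posterior; the thesis studies what happens when this exact posterior is replaced by variational or Langevin approximations. All constants and names below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_arms, horizon, noise = 5, 4, 2000, 0.1
theta_true = rng.normal(size=d)

# Gaussian posterior N(mu, Sigma) over theta, conjugate for Gaussian rewards.
precision = np.eye(d)          # prior precision
b = np.zeros(d)                # running sum of x_t * r_t / noise^2
for t in range(horizon):
    arms = rng.normal(size=(n_arms, d))                  # fresh contextual features
    mu = np.linalg.solve(precision, b)
    theta_sample = rng.multivariate_normal(mu, np.linalg.inv(precision))
    a = int(np.argmax(arms @ theta_sample))              # act greedily on the sample
    r = float(arms[a] @ theta_true + noise * rng.normal())
    precision += np.outer(arms[a], arms[a]) / noise**2   # exact posterior update
    b += arms[a] * r / noise**2

print("parameter estimation error:", np.linalg.norm(np.linalg.solve(precision, b) - theta_true))
```

The exploration comes entirely from sampling theta from the posterior; a VI-based variant would sample from a fitted variational distribution instead of the exact Gaussian maintained here.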
Bouneffouf, Djallel. "DRARS, A Dynamic Risk-Aware Recommender System." PhD thesis, Institut National des Télécommunications, 2013. http://tel.archives-ouvertes.fr/tel-01026136.
Chia, John. "Non-linear contextual bandits." Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/42191.
Full textGalichet, Nicolas. "Contributions to Multi-Armed Bandits : Risk-Awareness and Sub-Sampling for Linear Contextual Bandits." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112242/document.
This thesis focuses on sequential decision making in unknown environments, and more particularly on the Multi-Armed Bandit (MAB) setting, introduced by Robbins in the 1950s. During the last decade, many theoretical and algorithmic studies have been aimed at the exploration vs exploitation tradeoff at the core of MABs, where Exploitation is biased toward the best options visited so far while Exploration is biased toward options rarely visited, to enforce the discovery of the true best choices. MAB applications range from medicine (the elicitation of the best prescriptions) to e-commerce (recommendations, advertisements) and optimal policies (e.g., in the energy domain). The contributions presented in this dissertation tackle the exploration vs exploitation dilemma from two angles. The first contribution is centered on risk avoidance. Exploration in unknown environments often has adverse effects: for instance, exploratory trajectories of a robot can entail physical damage to the robot or its environment. We thus define the exploration vs exploitation vs safety (EES) tradeoff, and propose three new algorithms addressing the EES dilemma. First, under strong assumptions, the MIN algorithm provides a robust behavior with guarantees of logarithmic regret, matching the state of the art with high robustness w.r.t. hyper-parameter setting (as opposed to, e.g., UCB (Auer 2002)). Second, the MARAB algorithm aims at optimizing the cumulative Conditional Value at Risk (CVaR) of the rewards, a criterion originating from economics, with excellent empirical performance compared to (Sani et al. 2012), though without theoretical guarantees. Finally, the MARABOUT algorithm modifies the CVaR estimation and yields both theoretical guarantees and good empirical behavior. The second contribution concerns the contextual bandit setting, where additional information is provided to support decision making, such as user details in the content recommendation domain, or patient history in the medical domain. The study focuses on how to choose between two arms with different numbers of samples. Traditionally, a confidence region is derived for each arm based on the associated samples, and the 'optimism in the face of uncertainty' principle implements the choice of the arm with the maximal upper confidence bound. An alternative, called BESA and pioneered by Baransi et al. (2014), instead subsamples the larger sample set without replacement. In this framework, we design a contextual bandit algorithm based on sub-sampling without replacement, relaxing the (unrealistic) assumption that all arm reward distributions rely on the same parameter. The CL-BESA algorithm yields both theoretical guarantees of logarithmic regret and good empirical behavior.
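The CVaR criterion behind MARAB and MARABOUT is easy to illustrate. Below is a simplified sketch of the empirical lower-tail CVaR of an arm's observed rewards and a greedy choice based on it; the conventions and numbers are assumptions for the example, not the thesis's exact algorithms:

```python
import numpy as np

def empirical_cvar(rewards, alpha=0.2):
    """Mean of the worst alpha-fraction of observed rewards (lower tail)."""
    rewards = np.sort(np.asarray(rewards, dtype=float))
    k = max(1, int(np.ceil(alpha * len(rewards))))
    return rewards[:k].mean()

# Greedy choice on empirical CVaR: prefers arms with a safe lower tail.
rng = np.random.default_rng(2)
arm_histories = [rng.normal(1.0, 2.0, size=50),   # higher mean, risky
                 rng.normal(0.7, 0.2, size=50)]   # lower mean, safe
scores = [empirical_cvar(h, alpha=0.2) for h in arm_histories]
print("CVaR scores:", scores, "-> pick arm", int(np.argmax(scores)))
```

A mean-maximizing rule would pick the first arm; the CVaR rule picks the second because its worst outcomes are far less severe, which is exactly the risk-averse behavior targeted by the EES tradeoff.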
Nicol, Olivier. "Data-driven evaluation of contextual bandit algorithms and applications to dynamic recommendation." Thesis, Lille 1, 2014. http://www.theses.fr/2014LIL10211/document.
The context of this thesis work is dynamic recommendation. Recommendation is the action, for an intelligent system, of supplying the user of an application with personalized content so as to enhance what is referred to as the "user experience", e.g. recommending a product on a merchant website or an article on a blog. Recommendation is considered dynamic when the content to recommend or the users' tastes evolve rapidly, e.g. in news recommendation. Many applications of interest to us generate a tremendous amount of data through their millions of online users. Nevertheless, using this data to evaluate a new recommendation technique, or even to compare two dynamic recommendation algorithms, is far from trivial. This is the problem we consider here. Some approaches have already been proposed, but they were not studied very thoroughly, either from a theoretical point of view (unquantified bias, loose convergence bounds...) or from an empirical one (experiments on private data only). In this work we start by filling many blanks in the theoretical analysis. We then comment on the results of an experiment of unprecedented scale in this area: a public challenge we organized. This challenge, along with some complementary experiments, revealed an unexpected source of huge bias: time acceleration. The rest of this work tackles this issue. We show that a bootstrap-based approach allows one to significantly reduce this bias and, more importantly, to control it.
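The standard data-driven evaluator in this line of work is the replay (rejection sampling) method popularised by Li et al. (2011): stream the logged events to the candidate policy and keep only those where it agrees with the logged action. A minimal sketch under a uniform logging policy, with all names invented for the example:

```python
import numpy as np

def replay_evaluate(policy, contexts, actions, rewards):
    """Replay evaluator: average reward over the logged events whose action
    matches the candidate policy's choice (unbiased under uniform logging)."""
    kept = []
    for x, a, r in zip(contexts, actions, rewards):
        if policy(x) == a:
            kept.append(r)
            # a learning (stateful) policy would be updated with (x, a, r) here
    return float(np.mean(kept)), len(kept)

rng = np.random.default_rng(3)
n, n_actions = 20_000, 5
contexts = rng.normal(size=(n, 3))
actions = rng.integers(n_actions, size=n)              # uniform logging policy
rewards = (actions == 0).astype(float) * (contexts[:, 0] > 0)

value, used = replay_evaluate(lambda x: 0, contexts, actions, rewards)
print(f"estimated value {value:.3f} from {used} replayed events")
```

Note that only about 1/n_actions of the log survives the rejection step; for a learning policy this compresses the timeline of updates, which is one way to see the "time acceleration" bias the thesis identifies and then controls with a bootstrap.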
May, Benedict C. "Bayesian sampling in contextual-bandit problems with extensions to unknown normal-form games." Thesis, University of Bristol, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.627937.
Ju, Weiyu. "Mobile Deep Neural Network Inference in Edge Computing with Resource Restrictions." Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/25038.
Full textBrégère, Margaux. "Stochastic bandit algorithms for demand side management Simulating Tariff Impact in Electrical Energy Consumption Profiles with Conditional Variational Autoencoders Online Hierarchical Forecasting for Power Consumption Data Target Tracking for Contextual Bandits : Application to Demand Side Management." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM022.
As electricity is hard to store, the balance between production and consumption must be strictly maintained. With the integration of intermittent renewable energies into the production mix, managing this balance becomes complex. At the same time, the deployment of smart meters opens the door to demand response: sending signals - such as changes in the price of electricity - would encourage users to modulate their consumption according to the production of electricity. The algorithms used to choose these signals have to learn consumer reactions and simultaneously optimize them (the exploration-exploitation trade-off). Our approach is based on bandit theory and formalizes this sequential learning problem. We propose a first algorithm to control the electrical demand of a homogeneous population of consumers and prove an upper bound of order T^(2/3) on its regret. Experiments on a real data set in which price incentives were offered illustrate these theoretical results. As a "full information" dataset is required to test bandit algorithms, we build a consumption data generator based on variational autoencoders. To drop the assumption of population homogeneity, we propose an approach to cluster households according to their consumption profiles. These different works are finally combined to propose and test a bandit algorithm for personalized demand side management.
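As a toy illustration of the bandit machinery underlying such demand-side management, the sketch below runs plain UCB1 over a handful of tariff signals. The Bernoulli response model and all numbers are invented for the example; the thesis's actual algorithm additionally handles contexts and tracks a consumption target:

```python
import numpy as np

rng = np.random.default_rng(4)
true_mean = np.array([0.3, 0.5, 0.45])   # avg. consumption reduction per tariff signal
counts = np.zeros(3)
sums = np.zeros(3)

for t in range(1, 5001):
    if t <= 3:
        a = t - 1                          # play each tariff once to initialise
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        a = int(np.argmax(ucb))            # optimism in the face of uncertainty
    r = float(rng.random() < true_mean[a])  # Bernoulli response of the population
    counts[a] += 1
    sums[a] += r

print("plays per tariff:", counts.astype(int))
```

Over time the plays concentrate on the best tariff while the exploration bonus keeps the alternatives from being abandoned too early, which is the exploration-exploitation trade-off described above.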
Wan, Hao. "Tutoring Students with Adaptive Strategies." Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-dissertations/36.
Books on the topic "Bandit Contextuel"
Pijnenburg, Huub, Jo Hermanns, Tom van Yperen, Giel Hutschemaekers, and Adri van Montfoort. Zorgen dat het werkt: Werkzame factoren in de zorg voor jeugd. 2nd ed. Uitgeverij SWP, 2011. http://dx.doi.org/10.36254/978-90-8850-131-9.
Book chapters on the topic "Bandit Contextuel"
Nguyen, Le Minh Duc, Fuhua Lin, and Maiga Chang. "Generating Learning Sequences Using Contextual Bandit Algorithms." In Generative Intelligence and Intelligent Tutoring Systems, 320–29. Cham: Springer Nature Switzerland, 2024. http://dx.doi.org/10.1007/978-3-031-63028-6_26.
Tavakol, Maryam, Sebastian Mair, and Katharina Morik. "HyperUCB: Hyperparameter Optimization Using Contextual Bandits." In Machine Learning and Knowledge Discovery in Databases, 44–50. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-43823-4_4.
Ma, Yuzhe, Kwang-Sung Jun, Lihong Li, and Xiaojin Zhu. "Data Poisoning Attacks in Contextual Bandits." In Lecture Notes in Computer Science, 186–204. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-01554-1_11.
Labille, Kevin, Wen Huang, and Xintao Wu. "Transferable Contextual Bandits with Prior Observations." In Advances in Knowledge Discovery and Data Mining, 398–410. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-75765-6_32.
Shirey, Heather. "19. Art in the Streets." In Play in a Covid Frame, 427–40. Cambridge, UK: Open Book Publishers, 2023. http://dx.doi.org/10.11647/obp.0326.19.
Liu, Weiwen, Shuai Li, and Shengyu Zhang. "Contextual Dependent Click Bandit Algorithm for Web Recommendation." In Lecture Notes in Computer Science, 39–50. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-319-94776-1_4.
Bouneffouf, Djallel, Romain Laroche, Tanguy Urvoy, Raphael Feraud, and Robin Allesiardo. "Contextual Bandit for Active Learning: Active Thompson Sampling." In Neural Information Processing, 405–12. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-12637-1_51.
Bouneffouf, Djallel, Amel Bouzeghoub, and Alda Lopes Gançarski. "Contextual Bandits for Context-Based Information Retrieval." In Neural Information Processing, 35–42. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-42042-9_5.
Delande, David, Patricia Stolf, Raphaël Feraud, Jean-Marc Pierson, and André Bottaro. "Horizontal Scaling in Cloud Using Contextual Bandits." In Euro-Par 2021: Parallel Processing, 285–300. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-85665-6_18.
Gampa, Phanideep, and Sumio Fujita. "BanditRank: Learning to Rank Using Contextual Bandits." In Advances in Knowledge Discovery and Data Mining, 259–71. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-75768-7_21.
Conference papers on the topic "Bandit Contextuel"
Chen, Zhaoxin. "Enhancing Recommendation Systems Through Contextual Bandit Models." In International Conference on Engineering Management, Information Technology and Intelligence, 622–27. SCITEPRESS - Science and Technology Publications, 2024. http://dx.doi.org/10.5220/0012960800004508.
Liu, Fangzhou, Zehua Pei, Ziyang Yu, Haisheng Zheng, Zhuolun He, Tinghuan Chen, and Bei Yu. "CBTune: Contextual Bandit Tuning for Logic Synthesis." In 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1–6. IEEE, 2024. http://dx.doi.org/10.23919/date58400.2024.10546766.
Zhang, Yufan, Honglin Wen, and Qiuwei Wu. "A Contextual Bandit Approach for Value-oriented Prediction Interval Forecasting." In 2024 IEEE Power & Energy Society General Meeting (PESGM), 1. IEEE, 2024. http://dx.doi.org/10.1109/pesgm51994.2024.10688595.
Li, Haowei, Mufeng Wang, Jiarui Zhang, Tianyu Shi, and Alaa Khamis. "A Contextual Multi-armed Bandit Approach to Personalized Trip Itinerary Planning." In 2024 IEEE International Conference on Smart Mobility (SM), 55–60. IEEE, 2024. http://dx.doi.org/10.1109/sm63044.2024.10733530.
Bouneffouf, Djallel, Irina Rish, Guillermo Cecchi, and Raphaël Féraud. "Context Attentive Bandits: Contextual Bandit with Restricted Context." In Twenty-Sixth International Joint Conference on Artificial Intelligence. California: International Joint Conferences on Artificial Intelligence Organization, 2017. http://dx.doi.org/10.24963/ijcai.2017/203.
Pase, Francesco, Deniz Gunduz, and Michele Zorzi. "Remote Contextual Bandits." In 2022 IEEE International Symposium on Information Theory (ISIT). IEEE, 2022. http://dx.doi.org/10.1109/isit50566.2022.9834399.
Lin, Baihan, Djallel Bouneffouf, Guillermo A. Cecchi, and Irina Rish. "Contextual Bandit with Adaptive Feature Extraction." In 2018 IEEE International Conference on Data Mining Workshops (ICDMW). IEEE, 2018. http://dx.doi.org/10.1109/icdmw.2018.00136.
Peng, Yi, Miao Xie, Jiahao Liu, Xuying Meng, Nan Li, Cheng Yang, Tao Yao, and Rong Jin. "A Practical Semi-Parametric Contextual Bandit." In Twenty-Eighth International Joint Conference on Artificial Intelligence (IJCAI-19). California: International Joint Conferences on Artificial Intelligence Organization, 2019. http://dx.doi.org/10.24963/ijcai.2019/450.
Zhang, Xiaoying, Hong Xie, Hang Li, and John C.S. Lui. "Conversational Contextual Bandit: Algorithm and Application." In WWW '20: The Web Conference 2020. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3366423.3380148.
Ban, Yikun, Jingrui He, and Curtiss B. Cook. "Multi-facet Contextual Bandits." In KDD '21: The 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM, 2021. http://dx.doi.org/10.1145/3447548.3467299.
Reports on the topic "Bandit Contextuel"
Yun, Seyoung, Jun Hyun Nam, Sangwoo Mo, and Jinwoo Shin. Contextual Multi-armed Bandits under Feature Uncertainty. Office of Scientific and Technical Information (OSTI), March 2017. http://dx.doi.org/10.2172/1345927.