Academic literature on the topic 'Bandits à plusieurs bras'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Bandits à plusieurs bras.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Journal articles on the topic "Bandits à plusieurs bras"
Leboyer, M., T. D’Amato, A. Malafosse, D. Campion, and F. Gheysen. "Génétique épidémiologique des troubles de l’humeur: une nouvelle voie de recherches?" Psychiatry and Psychobiology 4, no. 4 (1989): 193–202. http://dx.doi.org/10.1017/s0767399x00002753.
Bengeni, D., P. Lim, and A. Belaud. "Qualité des eaux de trois bras morts de la Garonne (variabilité spatio-temporelle)." Revue des sciences de l'eau 5, no. 2 (April 12, 2005): 131–56. http://dx.doi.org/10.7202/705125ar.
Marquis, Dominique. "Un homme et son journal : comment Jules-Paul Tardivel « domestiqua » La Vérité." Mens 13, no. 2 (July 23, 2014): 35–57. http://dx.doi.org/10.7202/1025982ar.
Dejean, Frédéric. "De la visibilité des lieux du religieux en contexte urbain : l’exemple des églises protestantes évangéliques à Montréal." Studies in Religion/Sciences Religieuses 49, no. 3 (June 9, 2020): 408–31. http://dx.doi.org/10.1177/0008429820924012.
Le Bras, Hervé. "Dix ans de perspectives de la population étrangère : une perspective." Population 52, no. 1 (January 1, 1997): 103–33. http://dx.doi.org/10.3917/popu.p1997.52n1.0133.
Dostie, Gaétane. "Considérations sur la forme et le sens. Pis en français québécois. Une simple variante de puis? Un simple remplaçant de et?" Journal of French Language Studies 14, no. 2 (July 2004): 113–28. http://dx.doi.org/10.1017/s0959269504001607.
Mokhtari, Mathieu. "Capitoline Wolf or Draco? Politicizing the Ancient Past and Materializing the Autochthony in Twenty-First Century Romania." Passés politisés, no. 9 (December 15, 2023): 31–46. http://dx.doi.org/10.35562/frontieres.1833.
Achilleas, Philippe. "La bataille de la 5G et le droit international." Annuaire français de droit international 66, no. 1 (2020): 709–31. http://dx.doi.org/10.3406/afdi.2020.5489.
Hanawalt, Barbara A., and Ben R. McRee. "The guilds of homo prudens in late medieval England." Continuity and Change 7, no. 2 (August 1992): 163–79. http://dx.doi.org/10.1017/s0268416000001557.
Lhomme, E., R. Sitta, V. Journot, C. Chazallon, D. Gabillard, L. Piroth, B. Lefèvre, et al. "Plateforme COVERAGE France : un essai clinique randomisé multicentrique utilisant un schéma adaptatif multi-bras multi-étape (MAMS) pour évaluer plusieurs traitements expérimentaux de la COVID-19 en ambulatoire." Revue d'Épidémiologie et de Santé Publique 69 (June 2021): S6. http://dx.doi.org/10.1016/j.respe.2021.04.005.
Dissertations / Theses on the topic "Bandits à plusieurs bras"
Hadiji, Hédi. "On some adaptivity questions in stochastic multi-armed bandits." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM021.
The main topics addressed in this thesis lie in the general domain of sequential learning, and in particular stochastic multi-armed bandits. The thesis is divided into four chapters and an introduction. In the first part of the main body of the thesis, we design a new algorithm achieving, simultaneously, distribution-dependent and distribution-free optimal guarantees. The next two chapters are devoted to adaptivity questions. First, in the context of continuum-armed bandits, we present a new algorithm which, for the first time, does not require knowledge of the regularity of the bandit problem it is facing. Then, we study the issue of adapting to the unknown support of the payoffs in bounded K-armed bandits. We provide a procedure that (almost) obtains the same guarantees as if it were given the support in advance. In the final chapter, we study a slightly different bandit setting, designed to enforce diversity-preserving conditions on the strategies. We show that the optimal regret in this setting grows at a speed that is quite different from that of the traditional bandit setting. In particular, we observe that bounded regret is possible under some specific hypotheses.
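The stochastic K-armed bandit setting underlying these theses can be made concrete with a small simulation. The sketch below is a generic UCB1 strategy on Bernoulli arms, given purely as an illustration of the framework (it is not the algorithm proposed in the thesis; the arm means, horizon, and exploration constant are arbitrary choices):

```python
import math
import random

def ucb1(means, horizon, seed=0):
    """Play UCB1 on Bernoulli arms; return the cumulative pseudo-regret."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k      # number of pulls per arm
    sums = [0.0] * k      # cumulated reward per arm
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # initialization: play each arm once
        else:
            # index = empirical mean + exploration bonus
            arm = max(range(k),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret
```

On an instance with a large gap between arms, the cumulative regret stays far below that of any strategy that keeps playing a suboptimal arm, which is the behavior the optimality guarantees above quantify.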
Gajane, Pratik. "Multi-armed bandits with unconventional feedback." Thesis, Lille 3, 2017. http://www.theses.fr/2017LIL30045/document.
The multi-armed bandit (MAB) problem is a mathematical formulation of the exploration-exploitation trade-off inherent to reinforcement learning, in which the learner chooses an action (symbolized by an arm) from a set of available actions in a sequence of trials in order to maximize their reward. In the classical MAB problem, the learner receives absolute bandit feedback, i.e., it receives as feedback the reward of the arm it selects. In many practical situations, however, a different kind of feedback is more readily available. In this thesis, we study two such kinds of feedback, namely relative feedback and corrupt feedback. The main practical motivation behind relative feedback arises from the task of online ranker evaluation. This task involves choosing the optimal ranker from a finite set of rankers using only pairwise comparisons, while minimizing the comparisons between sub-optimal rankers. This is formalized by the MAB problem with relative feedback, in which the learner selects two arms instead of one and receives the preference feedback. We consider the adversarial formulation of this problem, which circumvents the stationarity assumption on the mean rewards of the arms. We provide a lower bound on the performance measure for any algorithm for this problem. We also provide an algorithm called "Relative Exponential-weight algorithm for Exploration and Exploitation" with performance guarantees. We present a thorough empirical study on several information retrieval datasets that confirms the validity of these theoretical results. The motivating theme behind corrupt feedback is that the feedback the learner receives is a corrupted form of the corresponding reward of the selected arm. In practice, such feedback is available in tasks such as online advertising and recommender systems. We consider two goals for the MAB problem with corrupt feedback: best arm identification and exploration-exploitation. For both goals, we provide lower bounds on the performance measures for any algorithm. We also provide various algorithms for these settings. The main contribution of this part is the algorithms "KLUCB-CF" and "Thompson Sampling-CF", which asymptotically attain the best possible performance. We present experimental results to demonstrate the performance of these algorithms. We also show how this problem setting can be used for the practical application of enforcing differential privacy.
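For reference, plain Beta-Bernoulli Thompson sampling is the uncorrupted baseline on which a corrupt-feedback variant such as "Thompson Sampling-CF" builds. The sketch below is that plain baseline only, with arbitrary uniform priors, and makes no attempt to model feedback corruption:

```python
import random

def thompson_bernoulli(means, horizon, seed=0):
    """Plain Beta-Bernoulli Thompson sampling (not the corrupt-feedback
    variant): sample one value per arm from its Beta posterior and play
    the arm with the largest sample."""
    rng = random.Random(seed)
    k = len(means)
    successes = [1] * k   # Beta(1, 1) uniform priors
    failures = [1] * k
    pulls = [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(k)]
        arm = samples.index(max(samples))
        if rng.random() < means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        pulls[arm] += 1
    return pulls
```

The posterior concentrates on the best arm, so its pull count quickly dominates the others.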
Besson, Lilian. "Multi-Players Bandit Algorithms for Internet of Things Networks." Thesis, CentraleSupélec, 2019. http://www.theses.fr/2019CSUP0005.
In this PhD thesis, we study wireless networks and reconfigurable end-devices that can access Cognitive Radio networks, in unlicensed bands and without central control. We focus on Internet of Things (IoT) networks, with the objective of extending the devices’ battery life by equipping them with low-cost but efficient machine learning algorithms, in order to let them automatically improve the efficiency of their wireless communications. We propose different models of IoT networks, and we show empirically, on both numerical simulations and real-world validation, the possible gain of our methods, which use Reinforcement Learning. The different network access problems are modeled as Multi-Armed Bandits (MAB), but we found that analyzing the realistic models was intractable, because proving the convergence of many IoT devices playing a collaborative game, without communication or coordination, is hard when they all follow random activation patterns. The rest of this manuscript thus studies two restricted models: first multi-player bandits in stationary problems, then non-stationary single-player bandits. We also detail another contribution, SMPyBandits, our open-source Python library for numerical MAB simulations, which covers all the studied models and more.
Abeille, Marc. "Exploration-exploitation with Thompson sampling in linear systems." Thesis, Lille 1, 2017. http://www.theses.fr/2017LIL10182/document.
This dissertation is dedicated to the study of Thompson Sampling (TS) algorithms designed to address the exploration-exploitation dilemma that is inherent in sequential decision-making under uncertainty. As opposed to algorithms derived from the optimism-in-the-face-of-uncertainty (OFU) principle, where exploration is performed by selecting the most favorable model within the set of plausible ones, TS algorithms rely on randomization to enhance exploration, and thus are much more computationally efficient. We focus on linearly parametrized problems that allow for continuous state-action spaces, namely the Linear Bandit (LB) problems and the Linear Quadratic (LQ) control problems. We derive two novel analyses for the regret of TS algorithms in those settings. While the obtained regret bound for LB is similar to previous results, the proof sheds new light on the functioning of TS, and allows us to extend the analysis to LQ problems. As a result, we prove the first regret bound for TS in LQ, and show that the frequentist regret is of order O(sqrt(T)), which matches the existing guarantee for the regret of OFU algorithms in LQ. Finally, we propose an application of exploration-exploitation techniques to the practical problem of portfolio construction, and discuss the need for active exploration in this setting.
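A minimal sketch of Thompson sampling in a linear bandit, in the spirit of the LB setting discussed above: the posterior scale `v`, the noise level, the identity regularizer, and the arm set are illustrative assumptions, not parameters taken from the thesis.

```python
import numpy as np

def linear_ts(theta_star, arms, horizon, noise=0.1, v=0.5, seed=0):
    """Linear Thompson sampling: sample a parameter from a Gaussian
    centered at the regularized least-squares estimate, then play the
    arm maximizing the sampled linear reward."""
    rng = np.random.default_rng(seed)
    d = len(theta_star)
    B = np.eye(d)         # regularized design matrix
    f = np.zeros(d)       # sum of reward-weighted features
    chosen = []
    for _ in range(horizon):
        mean = np.linalg.solve(B, f)        # least-squares estimate
        cov = v ** 2 * np.linalg.inv(B)     # shrinks as data accumulates
        theta = rng.multivariate_normal(mean, cov)
        idx = int(np.argmax([x @ theta for x in arms]))
        x = arms[idx]
        reward = x @ theta_star + noise * rng.standard_normal()
        B += np.outer(x, x)
        f += reward * x
        chosen.append(idx)
    return chosen
```

The randomization over `theta` replaces the explicit confidence-set maximization of OFU approaches, which is the computational advantage mentioned in the abstract.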
Lagrée, Paul. "Méthodes adaptatives pour les applications d'accès à l'information centrées sur l'utilisateur." Thesis, Université Paris-Saclay (ComUE), 2017. http://www.theses.fr/2017SACLS341/document.
When users interact with modern Web systems, they leave numerous footprints, which we propose to exploit in order to develop better applications for information access. We study a family of user-centered techniques that take advantage of the many types of feedback to adapt and improve the services provided to users. We focus on applications like recommendation and influencer marketing, in which users generate discrete feedback (e.g. clicks, "likes", reposts, etc.) that we incorporate into our algorithms in order to deliver strongly contextualized services. The first part of this dissertation is dedicated to an approach for as-you-type search on social media. The problem consists in retrieving a set of k search results in a social-aware environment under the constraint that the query may be incomplete (e.g., if the last term is a prefix). Every time the user updates his/her query, the system updates the set of search results accordingly. We adopt a "network-aware" interpretation of information relevance, by which information produced by users who are closer to the user issuing a request is considered more relevant. Then, we study a generic version of influence maximization, in which we want to maximize the influence of marketing or information campaigns by adaptively selecting "spread seeds" from a small subset of the population. Influencer marketing is a straightforward application of this, in which the focus of a campaign is placed on precise key individuals who are typically able to reach millions of consumers. This represents an unprecedented tool for online marketing that we propose to improve using an adaptive approach. Notably, our approach makes no assumptions on the underlying diffusion model, and no diffusion network is needed. Finally, we propose to address the well-known cold-start problem faced by recommender systems with an adaptive approach. If no information is available regarding the user's appreciation of an item, the recommender system needs to gather feedback (e.g., clicks) so as to estimate the value of the item. However, in order to minimize "bad" recommendations, a well-designed system should not collect feedback carelessly. We introduce a dynamic algorithm that aims to intelligently achieve the balance between "bad" and "good" recommendations.
Ménard, Pierre. "Sur la notion d'optimalité dans les problèmes de bandit stochastique." Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30087/document.
The topics addressed in this thesis lie in statistical machine learning and sequential statistics. Our main framework is the stochastic multi-armed bandit problem. In this work we revisit lower bounds on the regret. We obtain non-asymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of the Kullback-Leibler divergence. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. Then, we propose algorithms for regret minimization in stochastic bandit models with exponential families of distributions, or with distributions only assumed to be supported on the unit interval, that are simultaneously asymptotically optimal (in the sense of the Lai and Robbins lower bound) and minimax optimal. We also analyze the sample complexity of sequentially identifying the distribution whose expectation is closest to some given threshold, with and without the assumption that the mean values of the distributions are increasing. This work is motivated by phase I clinical trials, a practically important setting where the arm means are increasing by nature. Finally, we extend Fano's inequality, which controls the average probability of (disjoint) events in terms of the average of some Kullback-Leibler divergences, to work with arbitrary unit-valued random variables. Several novel applications are provided, in which the consideration of random variables is particularly handy. The most important applications deal with the problem of Bayesian posterior concentration (minimax or distribution-dependent) rates and with a lower bound on the regret in non-stochastic sequential learning.
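The Kullback-Leibler machinery behind such bounds is easy to state for Bernoulli arms. Below is a sketch of the Bernoulli KL divergence and of a kl-UCB-style index computed by bisection; the exploration level log t is the simplest possible choice here (refined variants in the literature add lower-order terms), so this is an illustration of the principle rather than any thesis's exact index:

```python
import math

def kl_bernoulli(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def klucb_index(mean, pulls, t, iters=50):
    """Upper confidence bound for a Bernoulli arm: the largest q >= mean
    such that pulls * kl(mean, q) <= log(t), found by bisection (kl is
    increasing in q on [mean, 1])."""
    target = math.log(t) / pulls
    lo, hi = mean, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo
```

The index shrinks toward the empirical mean as the arm is pulled more often, which is exactly how the logarithmic regret regime described above emerges.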
Degenne, Rémy. "Impact of structure on the design and analysis of bandit algorithms." Thesis, Université de Paris (2019-....), 2019. http://www.theses.fr/2019UNIP7179.
In this thesis, we study sequential learning problems called stochastic multi-armed bandits. First, a new bandit algorithm is presented. The analysis of that algorithm uses confidence intervals on the means of the arms' reward distributions, as most bandit proofs do. In a parametric setting, we derive concentration inequalities which quantify the deviation between the mean parameter of a distribution and its empirical estimate, in order to obtain confidence intervals. These inequalities are presented as bounds on the Kullback-Leibler divergence. Three extensions of the stochastic multi-armed bandit problem are then studied. First, we study the so-called combinatorial semi-bandit problem, in which an algorithm chooses a set of arms and the reward of each of these arms is observed. The minimal attainable regret then depends on the correlation between the arm distributions. We then consider a setting in which the observation mechanism changes. One source of difficulty of the bandit problem is the scarcity of information: only the arm pulled is observed. We show how to make efficient use of eventual supplementary free information (which does not influence the regret). Finally, a new family of algorithms is introduced to obtain both regret minimization and best arm identification guarantees. Each algorithm of the family realizes a trade-off between regret and the time needed to identify the best arm. In a second part, we study the so-called pure exploration problem, in which an algorithm is evaluated not on its regret but on the probability that it returns a wrong answer to a question about the arm distributions. We determine the complexity of such problems and design algorithms with performance close to that complexity.
Couetoux, Adrien. "Monte Carlo Tree Search for Continuous and Stochastic Sequential Decision Making Problems." Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112192.
In this thesis, we study sequential decision-making problems, with a focus on the unit commitment problem. Traditionally solved by dynamic programming methods, this problem is still a challenge, due to its high dimension and to the sacrifices made on the accuracy of the model to apply state-of-the-art methods. We investigate the applicability of Monte Carlo Tree Search (MCTS) methods for this problem, and for other single-player, stochastic, continuous sequential decision-making problems. We started by extending the traditional finite-state MCTS to continuous domains, with a method called Double Progressive Widening (DPW). This method relies on two hyperparameters, and determines the ratio between width and depth in the nodes of the tree. We developed a heuristic called Blind Value (BV) to improve the exploration of new actions, using the information from past simulations. We also extended the RAVE heuristic to continuous domains. Finally, we proposed two new ways of backing up information through the tree that improved the convergence speed considerably on two test cases. An important part of our work was to propose a way to mix MCTS with existing powerful heuristics, with the application to energy management in mind. We did so by proposing a framework that allows a good default policy to be learned by Direct Policy Search (DPS) and included in MCTS. The experimental results are very positive. To extend the reach of MCTS, we showed how it could be used to solve Partially Observable Markov Decision Processes, with an application to the game of Minesweeper, for which no consistent method had been proposed before. Finally, we used MCTS in a meta-bandit framework to solve energy investment problems: the investment decision was handled by classical bandit algorithms, while the evaluation of each investment was done by MCTS. The most important takeaway is that continuous MCTS has almost no assumptions (besides the need for a generative model), is consistent, and can easily improve existing suboptimal solvers by using a method similar to what we proposed with DPS.
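The progressive widening rule at the heart of DPW controls, at each tree node, when a new child action may be sampled. A minimal sketch of the widening test follows; the constant `c` and exponent `alpha` stand in for the two hyperparameters the abstract refers to, with illustrative default values:

```python
def should_widen(num_children, visits, c=1.0, alpha=0.5):
    """Progressive widening rule: allow adding a new child action only
    while the number of children stays below c * visits**alpha, so the
    node's width grows sublinearly with its visit count."""
    return num_children < c * visits ** alpha
```

Applying the same test to sampled next states ("double" progressive widening) is what makes the tree usable on continuous and stochastic domains, where naive expansion would create a fresh child at every visit.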
Collet, Timothé. "Méthodes optimistes d’apprentissage actif pour la classification." Thesis, Université de Lorraine, 2016. http://www.theses.fr/2016LORR0084/document.
A classification problem makes use of a training set consisting of data labeled by an oracle. The larger the training set, the better the performance. However, requesting the oracle may be costly. The goal of Active Learning is thus to minimize the number of requests to the oracle while achieving the best performance. To do so, the data that are presented to the oracle must be carefully selected among a large number of unlabeled instances acquired at no cost. However, the true profitability of labeling a particular instance may not be known perfectly. It can therefore be estimated, along with a measure of uncertainty. To increase the precision of the estimate, we need to label more data. Thus, there is a dilemma between labeling data in order to increase the performance of the classifier and labeling data in order to better know how to select data. This dilemma is well studied in the context of finite-budget optimization under the name of the exploration versus exploitation dilemma. The most famous solutions make use of the principle of Optimism in the Face of Uncertainty. In this thesis, we show that it is possible to adapt this principle to the active learning problem for classification. Several algorithms have been developed for classifiers of increasing complexity, each one of them using the principle of Optimism in the Face of Uncertainty, and their performances have been empirically evaluated.
Jedor, Matthieu. "Bandit algorithms for recommender system optimization." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM027.
In this PhD thesis, we study the optimization of recommender systems, with the objective of providing more refined suggestions of items for a user to benefit from. The task is modeled using the multi-armed bandit framework. In a first part, we look at two problems that commonly occur in recommender systems: the large number of items to handle and the management of sponsored content. In a second part, we investigate the empirical performance of bandit algorithms, and especially how to tune conventional algorithms to improve results in the stationary and non-stationary environments that arise in practice. This leads us to analyze, both theoretically and empirically, the greedy algorithm, which in some cases outperforms the state of the art.
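The greedy (follow-the-leader) strategy analyzed above fits in a few lines. This is a stylized sketch only, with Bernoulli rewards and one initial pull per arm; it illustrates the absence of explicit exploration, not the exact setting of the thesis:

```python
import random

def greedy(means, horizon, seed=0):
    """Follow-the-leader: after one pull of each arm, always play the
    arm with the highest empirical mean (no exploration bonus)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k
    total = 0.0
    for t in range(horizon):
        if t < k:
            arm = t   # pull each arm once to initialize the estimates
        else:
            arm = max(range(k), key=lambda a: sums[a] / counts[a])
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts
```

Greedy can lock onto a suboptimal arm after an unlucky start, which is why its good empirical behavior in some recommendation settings, as studied above, is notable.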
Book chapters on the topic "Bandits à plusieurs bras"
"« The new forms of expression appearing among young Maghrebis in France often bear the mark of a long experience and of a deep feeling of social, economic and political exclusion. […] In the analysis of this feeling of exclusion expressed by a great number of these young people, several meanings appear: they feel excluded because they are of Maghrebi origin, children of labourers and workers, young people in an ageing society frightened by their youth; for some, this feeling of exclusion begins very early at school; then it is the place they live, the lack of leisure and means, the daily frustrations of their childhood desires and dreams that shape and partly condition their schooling and their access to the world of work. In a monograph, a young man who had lived during the sixties in the largest bidonville of the Paris region, « La Folie » in Nanterre, recounts: – « Really, I wonder who could have invented the bidonville. A sadist, certainly (…). The rubbish was left there; the rats were left there; the kids fell ill, they had no room to learn to walk. We were ashamed, we were dirty, and yet we tried to be clean so that nobody would know [2] that we were from the bidonville ». Several stories along the same lines are told by young people from the transit estates of the region. […] « …suicide, like quite a few girls from the estate. When you live in there, you are convinced that it was meant to be that way, that they sidelined you [4] so that you stay there, so that you never feel at home; you are there near the exit, and at any moment they can throw you straight out ». (Malika, 25, Marseille) For other young people, those who grew up in the large housing estates and the ZUPs [5] that were built at a furious pace [6] during the sixties, the feeling of being excluded is the same, but it is different in nature: if they were parked on the outskirts of the cities, it was not to exclude them totally from urban and social space, but to prevent them from entering it." In Francotheque: A resource for French studies, 61. Routledge, 2014. http://dx.doi.org/10.4324/978020378416-8.
Full textConference papers on the topic "Bandits à plusieurs bras"
Hascoet, E., G. Valette, G. Le Toux, and S. Boisramé. "Proposition d’un protocole de prise en charge implanto-portée de patients traités en oncologie tête et cou suite à une étude rétrospective au CHRU de Brest." In 66ème Congrès de la SFCO. Les Ulis, France: EDP Sciences, 2020. http://dx.doi.org/10.1051/sfco/20206602009.