Academic literature on the topic 'Bandit à plusieurs bras'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Bandit à plusieurs bras.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Journal articles on the topic "Bandit à plusieurs bras"
Leboyer, M., T. D’Amato, A. Malafosse, D. Campion, and F. Gheysen. "Génétique épidémiologique des troubles de l’humeur: une nouvelle voie de recherches?" Psychiatry and Psychobiology 4, no. 4 (1989): 193–202. http://dx.doi.org/10.1017/s0767399x00002753.
Breda, Giulia, and Swanie Potot. "Tri des migrants, racisme et solidarités aux frontières européennes : enquêtes en Pologne." Revue européenne des migrations internationales 40, no. 2-3 (2024): 149–69. http://dx.doi.org/10.4000/12hu3.
Bengeni, D., P. Lim, and A. Belaud. "Qualité des eaux de trois bras morts de la Garonne (variabilité spatio-temporelle)." Revue des sciences de l'eau 5, no. 2 (April 12, 2005): 131–56. http://dx.doi.org/10.7202/705125ar.
Marquis, Dominique. "Un homme et son journal : comment Jules-Paul Tardivel « domestiqua » La Vérité." Mens 13, no. 2 (July 23, 2014): 35–57. http://dx.doi.org/10.7202/1025982ar.
Dejean, Frédéric. "De la visibilité des lieux du religieux en contexte urbain : l'exemple des églises protestantes évangéliques à Montréal." Studies in Religion/Sciences Religieuses 49, no. 3 (June 9, 2020): 408–31. http://dx.doi.org/10.1177/0008429820924012.
Le Bras, Hervé. "Dix ans de perspectives de la population étrangère : une perspective." Population 52, no. 1 (January 1, 1997): 103–33. http://dx.doi.org/10.3917/popu.p1997.52n1.0133.
Dostie, Gaétane. "Considérations sur la forme et le sens. Pis en français québécois. Une simple variante de puis? Un simple remplaçant de et?" Journal of French Language Studies 14, no. 2 (July 2004): 113–28. http://dx.doi.org/10.1017/s0959269504001607.
Fronczak, Stéphane. "Ransomware : le dossier comprendre l'ennemi." Revue Cyber & Conformité 1, no. 1 (February 1, 2021): 25–30. http://dx.doi.org/10.3917/cyco.001.0027.
Mokhtari, Mathieu. "Capitoline Wolf or Draco? Politicizing the Ancient Past and Materializing the Autochthony in Twenty-First Century Romania." Passés politisés, no. 9 (December 15, 2023): 31–46. http://dx.doi.org/10.35562/frontieres.1833.
Achilleas, Philippe. "La bataille de la 5G et le droit international." Annuaire français de droit international 66, no. 1 (2020): 709–31. http://dx.doi.org/10.3406/afdi.2020.5489.
Full textDissertations / Theses on the topic "Bandit à plusieurs bras"
Robledo, Relaño Francisco. "Algorithmes d'apprentissage par renforcement avancé pour les problèmes bandits multi-arches." Electronic Thesis or Diss., Pau, 2024. http://www.theses.fr/2024PAUU3021.
This thesis presents advances in Reinforcement Learning (RL) algorithms for resource and policy management in Restless Multi-Armed Bandit (RMAB) problems. We develop algorithms through two approaches in this area. First, for problems with discrete and binary actions, which is the original RMAB setting, we have developed QWI and QWINN. These algorithms compute Whittle indices, a heuristic that decouples the different RMAB processes, thereby simplifying the policy determination. Second, for problems with continuous actions, which generalize to Weakly Coupled Markov Decision Processes (MDPs), we propose LPCA. This algorithm employs a Lagrangian relaxation to decouple the different MDPs. The QWI and QWINN algorithms are introduced as two-timescale methods for computing Whittle indices for RMAB problems. We show mathematically that the Whittle index estimates of QWI converge to the theoretical values. QWINN, an extension of QWI, incorporates neural networks to estimate the Q-values from which the Whittle indices are computed. We also present the local convergence properties of the neural network used in QWINN. Our results show how QWINN outperforms QWI in terms of convergence rates and scalability. In the continuous-action case, the LPCA algorithm applies a Lagrangian relaxation to decouple the linked decision processes, allowing for efficient computation of optimal policies under resource constraints. We propose two different optimization methods, differential evolution and greedy optimization strategies, to efficiently handle resource allocation. In our results, LPCA shows superior performance over other contemporary RL approaches. Empirical results from different simulated environments validate the effectiveness of the proposed algorithms. These algorithms represent a significant contribution to the field of resource allocation in RL and pave the way for future research into more generalized and scalable reinforcement learning frameworks.
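For readers unfamiliar with Whittle indices, the two-timescale idea behind QWI can be sketched on a toy restless arm: a fast Q-learning update estimates action values for the λ-subsidized arm, while a slower update moves λ toward the value that equalizes the active and passive actions in a chosen reference state. The dynamics, rewards and step sizes below are illustrative assumptions, not taken from the thesis, and the sketch computes the index for a single state only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy restless arm (all numbers are hypothetical): 2 states, one transition
# kernel per action (0 = passive, 1 = active), and a state-dependent reward.
P = {0: np.array([[0.9, 0.1], [0.3, 0.7]]),
     1: np.array([[0.4, 0.6], [0.1, 0.9]])}
r = np.array([0.0, 1.0])
gamma, s_ref = 0.9, 1          # discount factor, state whose index we estimate

Q = np.zeros((2, 2))           # Q(s, a) for the lambda-subsidized single arm
lam = 0.0                      # running Whittle index estimate for s_ref
s = 0
for n in range(1, 200_001):
    alpha, beta = n ** -0.6, 1.0 / n         # fast (Q) and slow (index) step sizes
    a = int(rng.integers(2))                 # explore both actions uniformly
    s_next = int(rng.choice(2, p=P[a][s]))
    subsidy = lam if a == 0 else 0.0         # passivity is subsidized by lambda
    target = r[s] + subsidy + gamma * Q[s_next].max()
    Q[s, a] += alpha * (target - Q[s, a])            # fast timescale: Q-learning
    lam += beta * (Q[s_ref, 1] - Q[s_ref, 0])        # slow timescale: equalize actions at s_ref
    s = s_next

print(f"Estimated Whittle index of state {s_ref}: {lam:.3f}")
```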
Hadiji, Hédi. "On some adaptivity questions in stochastic multi-armed bandits." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM021.
The main topics addressed in this thesis lie in the general domain of sequential learning, and in particular stochastic multi-armed bandits. The thesis is divided into four chapters and an introduction. In the first part of the main body of the thesis, we design a new algorithm achieving, simultaneously, distribution-dependent and distribution-free optimal guarantees. The next two chapters are devoted to adaptivity questions. First, in the context of continuum-armed bandits, we present a new algorithm which, for the first time, does not require knowledge of the regularity of the bandit problem it is facing. Then, we study the issue of adapting to the unknown support of the payoffs in bounded K-armed bandits. We provide a procedure that (almost) obtains the same guarantees as if it had been given the support in advance. In the final chapter, we study a slightly different bandit setting, designed to enforce diversity-preserving conditions on the strategies. We show that the optimal regret in this setting grows at a speed that is quite different from the traditional bandit setting. In particular, we observe that bounded regret is possible under some specific hypotheses.
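As background on what "simultaneously distribution-dependent and distribution-free optimal" means, the two classical benchmarks for a K-armed bandit with suboptimality gaps Δ_i and horizon T can be written as follows (standard orders of magnitude, not results specific to this thesis):

```latex
R_T = O\!\Big(\sum_{i:\Delta_i>0} \frac{\log T}{\Delta_i}\Big)
\qquad\text{(distribution-dependent)}
\qquad\text{and}\qquad
R_T = O\big(\sqrt{K\,T}\big)
\qquad\text{(distribution-free / minimax)}.
```

An algorithm is said to be simultaneously optimal when it attains both orders at once, without being told the gaps or the horizon in advance.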
Iacob, Alexandra. "Scalable Model-Free Algorithms for Influencer Marketing." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG012.
Motivated by scenarios of information diffusion and advertising in social media, we study an influence maximization (IM) problem in which little is assumed to be known about the diffusion network or about the model that determines how information may propagate. In such a highly uncertain environment, one can focus on multi-round diffusion campaigns, with the objective of maximizing the number of distinct users that are influenced or activated, starting from a known base of a few influential nodes. During a campaign, spread seeds are selected sequentially at consecutive rounds, and feedback is collected in the form of the activated nodes at each round. A round's impact (reward) is then quantified as the number of newly activated nodes. Overall, one must maximize the campaign's total spread, as the sum of the rounds' rewards. We consider two sub-classes of IM, CIMP and ECIMP, where (i) the reward of a given round of an ongoing campaign consists of only the new activations (not observed at previous rounds within that campaign), (ii) the round's context and the historical data from previous rounds can be exploited to learn the best policy, and (iii) ECIMP is CIMP repeated multiple times, offering the possibility of learning from previous campaigns as well. This problem is directly motivated by the real-world scenarios of information diffusion in influencer marketing, where (i) only a target user's first, unique activation is of interest (and this activation will persist as an acquired, latent one throughout the campaign), and (ii) valuable side-information is available to the learning agent. In this setting, an explore-exploit approach can be used to learn the key underlying diffusion parameters while running the campaigns. For CIMP, we describe and compare two methods of contextual multi-armed bandits, with upper-confidence bounds on the remaining potential of influencers: one using a generalized linear model and the Good-Turing estimator for remaining potential (glmucb), and another one that directly adapts the LinUCB algorithm to our setting (linucb). For ECIMP, we propose the algorithm lgtlsvi, which implements the optimism-in-the-face-of-uncertainty principle for episodic reinforcement learning with linear approximation. The learning agent estimates for each seed node its remaining potential with a Good-Turing estimator, modified by an estimated Q-function. We show that these methods outperform baseline methods using state-of-the-art ideas, on both synthetic and real-world data, while at the same time exhibiting different and complementary behavior, depending on the scenarios in which they are deployed.
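The Good-Turing idea used for the "remaining potential" of an influencer can be sketched as follows: the probability that a seed's next activation is a never-before-seen node is estimated by the fraction of its past activations that were hapaxes (nodes seen exactly once), and an optimistic exploration bonus is added on top. This is a simplified illustration with an assumed bonus form; it is not the exact estimators or confidence bounds developed in the thesis.

```python
import math
from collections import Counter

def remaining_potential_ucb(activations: list[str], t: int, c: float = 1.0) -> float:
    """Optimistic Good-Turing estimate of a seed's remaining potential.

    activations: all nodes this seed activated so far (with repetitions across rounds).
    t: current round index (>= 1), used in the exploration bonus.
    c: exploration scaling constant (assumed form of the bonus).
    """
    n = len(activations)
    if n == 0:
        return float("inf")                      # unexplored seeds are tried first
    counts = Counter(activations)
    hapaxes = sum(1 for v in counts.values() if v == 1)
    good_turing = hapaxes / n                    # estimated mass of still-unseen nodes
    bonus = c * math.sqrt(math.log(t) / n)       # optimism in the face of uncertainty
    return good_turing + bonus

# Toy usage: pick the seed with the largest optimistic remaining potential.
history = {
    "seed_a": ["u1", "u2", "u2", "u3"],          # hypothetical activation logs
    "seed_b": ["u4", "u4", "u4"],
}
best = max(history, key=lambda s: remaining_potential_ucb(history[s], t=5))
print(best)  # seed_a: more of its activations were new nodes
```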
Besson, Lilian. "Multi-Players Bandit Algorithms for Internet of Things Networks." Thesis, CentraleSupélec, 2019. http://www.theses.fr/2019CSUP0005.
In this PhD thesis, we study wireless networks and reconfigurable end-devices that can access Cognitive Radio networks, in unlicensed bands and without central control. We focus on Internet of Things (IoT) networks, with the objective of extending the devices' battery life by equipping them with low-cost but efficient machine learning algorithms, in order to let them automatically improve the efficiency of their wireless communications. We propose different models of IoT networks, and we show empirically, through both numerical simulations and real-world validation, the possible gains of our methods, which use Reinforcement Learning. The different network access problems are modeled as Multi-Armed Bandits (MAB), but we found that analyzing the realistic models was intractable, because proving the convergence of many IoT devices playing a collaborative game without communication or coordination is hard when they all follow random activation patterns. The rest of this manuscript thus studies two restricted models: first, multi-player bandits in stationary problems; then, non-stationary single-player bandits. We also detail another contribution, SMPyBandits, our open-source Python library for numerical MAB simulations, which covers all the studied models and more.
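The kind of single-device learning this work builds on can be illustrated with a plain UCB1 rule for choosing which radio channel to transmit on, treating each channel as an arm and an acknowledged transmission as reward 1. This is a generic sketch, not the SMPyBandits implementation nor the multi-player algorithms analyzed in the thesis; the channel success probabilities are hypothetical.

```python
import math
import random

class UCB1ChannelSelector:
    """UCB1 over K radio channels; reward 1 if the uplink was acknowledged, else 0."""

    def __init__(self, n_channels: int):
        self.counts = [0] * n_channels       # transmissions per channel
        self.means = [0.0] * n_channels      # empirical acknowledgement rates
        self.t = 0

    def choose(self) -> int:
        self.t += 1
        for k, n in enumerate(self.counts):  # play every channel once first
            if n == 0:
                return k
        ucb = [m + math.sqrt(2 * math.log(self.t) / n)
               for m, n in zip(self.means, self.counts)]
        return max(range(len(ucb)), key=ucb.__getitem__)

    def update(self, channel: int, acked: bool) -> None:
        self.counts[channel] += 1
        n = self.counts[channel]
        self.means[channel] += (float(acked) - self.means[channel]) / n

# Toy usage with hypothetical per-channel success probabilities.
probs = [0.2, 0.5, 0.8]
device = UCB1ChannelSelector(len(probs))
for _ in range(1000):
    c = device.choose()
    device.update(c, random.random() < probs[c])
print(device.counts)  # most transmissions should end up on the best channel
```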
Jedor, Matthieu. "Bandit algorithms for recommender system optimization." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM027.
In this PhD thesis, we study the optimization of recommender systems with the objective of providing more refined suggestions of items from which a user can benefit. The task is modeled using the multi-armed bandit framework. In a first part, we look at two problems that commonly occur in recommender systems: the large number of items to handle and the management of sponsored content. In a second part, we investigate the empirical performance of bandit algorithms, and especially how to tune conventional algorithms to improve results in the stationary and non-stationary environments that arise in practice. This leads us to analyze, both theoretically and empirically, the greedy algorithm, which in some cases outperforms the state of the art.
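The greedy baseline discussed here is simply "always recommend the item with the best empirical mean"; a minimal sketch of it next to an ε-greedy variant is given below, assuming Bernoulli click feedback and made-up click-through rates. It is an illustration of the baseline, not the tuned algorithms analyzed in the thesis.

```python
import random

def run_greedy(click_probs, horizon, epsilon=0.0, seed=0):
    """Greedy (epsilon=0) or epsilon-greedy recommendation over items with Bernoulli clicks."""
    rng = random.Random(seed)
    k = len(click_probs)
    counts, means, clicks = [0] * k, [0.0] * k, 0
    for t in range(horizon):
        if t < k:                      # show each item once to initialize
            item = t
        elif rng.random() < epsilon:   # occasional exploration
            item = rng.randrange(k)
        else:                          # exploit the best empirical click rate
            item = max(range(k), key=means.__getitem__)
        reward = 1 if rng.random() < click_probs[item] else 0
        counts[item] += 1
        means[item] += (reward - means[item]) / counts[item]
        clicks += reward
    return clicks

probs = [0.05, 0.08, 0.12]                        # hypothetical click-through rates
print(run_greedy(probs, 10_000, epsilon=0.0))     # pure greedy
print(run_greedy(probs, 10_000, epsilon=0.05))    # epsilon-greedy
```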
Ménard, Pierre. "Sur la notion d'optimalité dans les problèmes de bandit stochastique." Thesis, Toulouse 3, 2018. http://www.theses.fr/2018TOU30087/document.
The topics addressed in this thesis lie in statistical machine learning and sequential statistics. Our main framework is the stochastic multi-armed bandit problem. In this work we revisit lower bounds on the regret. We obtain non-asymptotic, distribution-dependent bounds and provide simple proofs based only on well-known properties of the Kullback-Leibler divergence. These bounds show in particular that in the initial phase the regret grows almost linearly, and that the well-known logarithmic growth of the regret only holds in a final phase. We then propose algorithms for regret minimization in stochastic bandit models with exponential families of distributions, or with distributions only assumed to be supported by the unit interval, that are simultaneously asymptotically optimal (in the sense of the Lai and Robbins lower bound) and minimax optimal. We also analyze the sample complexity of sequentially identifying the distribution whose expectation is closest to some given threshold, with and without the assumption that the mean values of the distributions are increasing. This work is motivated by phase I clinical trials, a practically important setting where the arm means are increasing by nature. Finally, we extend Fano's inequality, which controls the average probability of (disjoint) events in terms of the average of some Kullback-Leibler divergences, to work with arbitrary unit-valued random variables. Several novel applications are provided, in which the consideration of random variables is particularly handy. The most important applications deal with the problem of Bayesian posterior concentration (minimax or distribution-dependent) rates and with a lower bound on the regret in non-stochastic sequential learning.
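As background, the Lai and Robbins lower bound invoked here can be stated, for binary-reward bandits with arm means μ_1, …, μ_K and optimal mean μ*, as follows (standard result, reproduced for context rather than taken from the thesis):

```latex
\liminf_{T \to \infty} \frac{\mathbb{E}[R_T]}{\log T}
\;\ge\; \sum_{i \,:\, \mu_i < \mu^\star} \frac{\mu^\star - \mu_i}{\operatorname{kl}(\mu_i, \mu^\star)},
\qquad
\operatorname{kl}(p, q) = p \log\frac{p}{q} + (1-p)\log\frac{1-p}{1-q}.
```

An algorithm is asymptotically optimal, in the sense used above, when its regret matches this bound.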
Degenne, Rémy. "Impact of structure on the design and analysis of bandit algorithms." Thesis, Université de Paris (2019-....), 2019. http://www.theses.fr/2019UNIP7179.
In this thesis, we study sequential learning problems called stochastic multi-armed bandits. First, a new bandit algorithm is presented. The analysis of that algorithm uses confidence intervals on the means of the arms' reward distributions, as most bandit proofs do. In a parametric setting, we derive concentration inequalities which quantify the deviation between the mean parameter of a distribution and its empirical estimate in order to obtain confidence intervals. These inequalities are presented as bounds on the Kullback-Leibler divergence. Three extensions of the stochastic multi-armed bandit problem are then studied. First, we study the so-called combinatorial semi-bandit problem, in which an algorithm chooses a set of arms and the reward of each of these arms is observed. The minimal attainable regret then depends on the correlation between the arm distributions. We then consider a setting in which the observation mechanism changes. One source of difficulty of the bandit problem is the scarcity of information: only the arm pulled is observed. We show how to efficiently use possible supplementary free information (which does not influence the regret). Finally, a new family of algorithms is introduced to obtain both regret minimization and best arm identification guarantees. Each algorithm of the family realizes a trade-off between regret and the time needed to identify the best arm. In a second part, we study the so-called pure exploration problem, in which an algorithm is not evaluated on its regret but on the probability that it returns a wrong answer to a question on the arm distributions. We determine the complexity of such problems and design algorithms with performance close to that complexity.
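Confidence intervals written as bounds on a Kullback-Leibler divergence take the following familiar form for Bernoulli rewards: the upper confidence bound for an arm with empirical mean μ̂ after N pulls is the largest q such that N·kl(μ̂, q) stays below a logarithmic exploration term. A minimal sketch by bisection is given below (a kl-UCB-style index; the log(t) exploration function is an assumption for illustration, not the specific inequalities derived in the thesis).

```python
import math

def bernoulli_kl(p: float, q: float) -> float:
    """Kullback-Leibler divergence kl(p, q) between Bernoulli(p) and Bernoulli(q)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_upper_confidence(mean: float, pulls: int, t: int, precision: float = 1e-6) -> float:
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t), found by bisection."""
    if pulls == 0:
        return 1.0
    level = math.log(max(t, 2)) / pulls
    low, high = mean, 1.0
    while high - low > precision:
        mid = (low + high) / 2
        if bernoulli_kl(mean, mid) <= level:
            low = mid
        else:
            high = mid
    return low

print(kl_upper_confidence(mean=0.3, pulls=50, t=1000))  # optimistic estimate, shrinks as pulls grow
```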
Kaufmann, Emilie. "Analyse de stratégies bayésiennes et fréquentistes pour l'allocation séquentielle de ressources." Thesis, Paris, ENST, 2014. http://www.theses.fr/2014ENST0056/document.
In this thesis, we study strategies for sequential resource allocation, under the so-called stochastic multi-armed bandit model. In this model, when an agent draws an arm, he receives as a reward a realization from a probability distribution associated to the arm. In this document, we consider two different bandit problems. In the reward maximization objective, the agent aims at maximizing the sum of rewards obtained during his interaction with the bandit, whereas in the best arm identification objective, his goal is to find the set of m best arms (i.e. arms with highest mean reward), without suffering a loss when drawing 'bad' arms. For these two objectives, we propose strategies, also called bandit algorithms, that are optimal (or close to optimal), in a sense made precise below. Maximizing the sum of rewards is equivalent to minimizing a quantity called regret. Thanks to an asymptotic lower bound on the regret of any uniformly efficient algorithm given by Lai and Robbins, one can define asymptotically optimal algorithms as algorithms whose regret reaches this lower bound. In this thesis, we propose, for two Bayesian algorithms, Bayes-UCB and Thompson Sampling, a finite-time analysis, that is, a non-asymptotic upper bound on their regret, in the particular case of bandits with binary rewards. This upper bound allows us to establish the asymptotic optimality of both algorithms. In the best arm identification framework, a possible goal is to determine the number of samples of the arms needed to identify, with high probability, the set of m best arms. We define a notion of complexity for best arm identification in two different settings considered in the literature: the fixed-budget and fixed-confidence settings. We provide new lower bounds on these complexity terms and we analyse new algorithms, some of which reach the lower bound in particular cases of two-armed bandit models and are therefore optimal.
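Thompson Sampling, one of the two Bayesian algorithms analyzed here, admits a very short implementation in the binary-reward case studied in the thesis: keep a Beta posterior per arm, sample one value from each posterior, and pull the argmax. The sketch below uses uniform Beta(1, 1) priors and made-up arm means; it illustrates the algorithm, not the finite-time analysis itself.

```python
import random

def thompson_sampling(means, horizon, seed=0):
    """Thompson Sampling with Beta(1, 1) priors on Bernoulli arms."""
    rng = random.Random(seed)
    k = len(means)
    successes, failures = [1] * k, [1] * k      # Beta posterior parameters per arm
    total = 0
    for _ in range(horizon):
        samples = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        total += reward
    return total

print(thompson_sampling([0.2, 0.5, 0.6], 10_000))  # hypothetical arm means
```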
Clement, Benjamin. "Adaptive Personalization of Pedagogical Sequences using Machine Learning." Thesis, Bordeaux, 2018. http://www.theses.fr/2018BORD0373/document.
Can computers teach people? To answer this question, Intelligent Tutoring Systems are a rapidly expanding field of research in the Information and Communication Technologies for Education community. This subject brings together different issues and researchers from various fields, such as psychology, didactics, neurosciences and, particularly, machine learning. Digital technologies are becoming more and more a part of everyday life with the development of tablets and smartphones. It seems natural to consider using these technologies for educational purposes. This raises several questions, such as how to make user interfaces accessible to everyone, how to make educational content motivating, and how to customize it to individual learners. In this PhD, we developed methods, grouped in the aptly-named HMABITS framework, to adapt pedagogical activity sequences based on learners' performances and preferences, so as to maximize their learning speed and motivation. These methods use computational models of intrinsic motivation and curiosity-driven learning to identify the activities providing the highest learning progress, and use Multi-Armed Bandit algorithms to manage the exploration/exploitation trade-off inside the activity space. Activities of optimal interest are thus privileged, with the goal of keeping the learner in a state of Flow or in his or her Zone of Proximal Development. Moreover, some of our methods allow the student to make choices about contextual features or pedagogical content, which is a vector of self-determination and motivation. To evaluate the effectiveness and relevance of our algorithms, we carried out several types of experiments. We first evaluated these methods with numerical simulations before applying them to real teaching conditions. To do this, we developed multiple models of learners, since a single model never exactly replicates the behavior of a real learner. The simulation results show that the HMABITS framework achieves comparable, and in some cases better, learning results than an optimal solution or an expert sequence. We then developed our own pedagogical scenario and serious game to test our algorithms in classrooms with real students. We developed a game on the theme of number decomposition, through the manipulation of money, for children aged 6 to 8. We then worked with educational institutions and several schools in the Bordeaux school district. Overall, about 1000 students participated in trial lessons using the tablet application. The results of the real-world studies show that the HMABITS framework allows the students to do more diverse and difficult activities, to achieve better learning, and to be more motivated than with an Expert Sequence. The results show that this effect is even greater when the students have the possibility to make choices.
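The core loop described here, estimating the learning progress each activity currently yields and then favoring high-progress activities while keeping some exploration, can be sketched as follows. The progress measure (difference between recent and older success rates) and the ε-greedy choice rule are simplifications assumed for illustration; they are not the exact HMABITS algorithms.

```python
import random
from collections import deque

class LearningProgressBandit:
    """Pick pedagogical activities according to the learning progress they yield."""

    def __init__(self, n_activities: int, window: int = 10, epsilon: float = 0.2, seed: int = 0):
        self.rng = random.Random(seed)
        self.history = [deque(maxlen=2 * window) for _ in range(n_activities)]
        self.window = window
        self.epsilon = epsilon

    def progress(self, a: int) -> float:
        h = self.history[a]
        if len(h) < 2 * self.window:
            return float("inf")              # not enough data: treat as maximally interesting
        old = list(h)[: self.window]
        recent = list(h)[self.window :]
        return abs(sum(recent) / self.window - sum(old) / self.window)

    def choose(self) -> int:
        if self.rng.random() < self.epsilon:          # keep exploring the activity space
            return self.rng.randrange(len(self.history))
        return max(range(len(self.history)), key=self.progress)

    def update(self, a: int, success: bool) -> None:
        self.history[a].append(1.0 if success else 0.0)

# Toy usage: a hypothetical learner whose success rate on activity 1 is improving.
bandit = LearningProgressBandit(n_activities=3)
for step in range(200):
    a = bandit.choose()
    p = 0.3 + (0.5 * step / 200 if a == 1 else 0.0)
    bandit.update(a, random.random() < p)
```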
Book chapters on the topic "Bandit à plusieurs bras"
Lorre-Johnston, Christine. "Gayatri Chakravorty Spivak." In Gayatri Chakravorty Spivak, 67–87. Hermann, 2023. http://dx.doi.org/10.3917/herm.renau.2023.02.0067.
Full text"« Les nouvelles formes habitent et conditionne en le suicide, comme pas mal de d’expression qui apparaissent partie leur scolarité et leur filles de la cité. Quand tu vis là-chez les jeunes Maghrébins de accès au monde professionnel. dedans, tu es convaincue que France portent souvent la Dans une monographie, un ça été voulu comme ça, qu’on marque d’une longue jeune qui avait vécu dans les t’as mis sur la touche [4] pour expérience et d’un profond années soixante dans le plus que t’y restes, pour que tu te sentiment d’exclusion sociale, grand bidonville de la région sentes jamais chez toi, tu es là économique et politique. […] parisienne, « La Folie » à près de la sortie, et à tout Dans l’analyse de ce sentiment Nanterre, raconte: moment, on peut te mettre d’exclusion qu’expriment un – « Vraiment, je me carrément dehors ». (Malika, 25 grand nombre de ces jeunes, demande, qui est-ce qui a pu ans, Marseille) Pour d’autres plusieurs significations inventer le bidonville? Un jeunes, ceux qui ont grandi apparaissent: ils se sentent sadique certainement (…). Les dans les grands ensembles et exclus parce qu’ils sont ordures, on les laissait; les rats, les ZUP [5] qui ont été d’origine maghrébine, enfants on les laissait; les gosses construites à tour de bras [6] de manœuvres et d’ouvriers, tombaient malades, ils avaient dans les années soixante, le jeunes dans une société pas de place pour apprendre à sentiment d’être exclu est le vieillissante que leur jeunesse marcher. On avait honte, on même, mais il est différent effraie; ce sentiment était sales, et pourtant on dans sa nature: si on les a d’exclusion commence pour essayait d’être propres pour pas parqués à la périphérie des certains très tôt à l’école, qu’on sache [2] qu’on était du villes, ce n’est pas pour les ensuite, c’est le lieu bidonville ». exclure totalement de l’espace d’habitation, le manque de Plusieurs histoires allant urbain et social, mais pour les loisirs et de moyens, des dans le même sens sont empêcher d’y entrer. frustrations quotidiennes de racontées par des jeunes des leurs désirs et rêves d’enfants cités de transit de la région." In Francotheque: A resource for French studies, 61. Routledge, 2014. http://dx.doi.org/10.4324/978020378416-8.
Conference papers on the topic "Bandit à plusieurs bras"
Hascoet, E., G. Valette, G. Le Toux, and S. Boisramé. "Proposition d’un protocole de prise en charge implanto-portée de patients traités en oncologie tête et cou suite à une étude rétrospective au CHRU de Brest." In 66ème Congrès de la SFCO. Les Ulis, France: EDP Sciences, 2020. http://dx.doi.org/10.1051/sfco/20206602009.