Academic literature on the topic 'Bandit à plusieurs bra'
Journal articles on the topic "Bandit à plusieurs bra"
Fronczak, Stéphane. "Ransomware : le dossier comprendre l’ennemi." Revue Cyber & Conformité, no. 1 (February 1, 2021): 25–30. http://dx.doi.org/10.3917/cyco.001.0027.
Dissertations / Theses on the topic "Bandit à plusieurs bra"
Robledo, Relaño Francisco. "Algorithmes d'apprentissage par renforcement avancé pour les problèmes bandits multi-bras." Electronic Thesis or Diss., Pau, 2024. http://www.theses.fr/2024PAUU3021.
This thesis presents advances in Reinforcement Learning (RL) algorithms for resource and policy management in Restless Multi-Armed Bandit (RMAB) problems. We develop algorithms through two approaches. First, for problems with discrete, binary actions, the original RMAB setting, we develop QWI and QWINN. These algorithms compute Whittle indices, a heuristic that decouples the different RMAB processes and thereby simplifies policy determination. Second, for problems with continuous actions, which generalize to Weakly Coupled Markov Decision Processes (MDPs), we propose LPCA. This algorithm employs a Lagrangian relaxation to decouple the different MDPs.

QWI and QWINN are introduced as two-timescale methods for computing Whittle indices in RMAB problems. We show mathematically that the Whittle-index estimates of QWI converge to their theoretical values. QWINN, an extension of QWI, incorporates neural networks to estimate the Q-values from which the Whittle indices are computed. We establish local convergence properties of the neural network used in QWINN, and our results show that QWINN outperforms QWI in terms of convergence rate and scalability.

In the continuous-action case, the LPCA algorithm applies a Lagrangian relaxation to decouple the linked decision processes, allowing efficient computation of optimal policies under resource constraints. We propose two optimization methods, differential evolution and a greedy strategy, to handle resource allocation efficiently. In our results, LPCA shows superior performance over other contemporary RL approaches.

Empirical results from several simulated environments validate the effectiveness of the proposed algorithms. These algorithms represent a significant contribution to the field of resource allocation in RL and pave the way for future research into more generalized and scalable reinforcement learning frameworks.
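To make the two-timescale idea concrete, here is a minimal, self-contained Python sketch in the spirit of QWI for a single restless arm: a fast Q-learning loop for the subsidy-lambda MDP, and a slow update that drives the subsidy toward the value at which the active and passive actions tie at a reference state. The interface (P, R, s_ref), the exploration scheme, and the step sizes are illustrative assumptions, not the thesis' actual implementation.

import numpy as np

def whittle_index_two_timescale(P, R, s_ref, n_states, gamma=0.9,
                                n_iters=200_000, seed=0):
    """Two-timescale sketch (in the spirit of QWI) of the Whittle index
    of one reference state s_ref of a single restless arm.
    P[a][s] is a transition distribution and R[a][s] a mean reward for
    action a in {0: passive, 1: active}; both are hypothetical inputs
    chosen for illustration."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))   # fast loop: Q-values of the subsidized MDP
    lam = 0.0                     # slow loop: Whittle-index estimate for s_ref
    s = s_ref
    for t in range(1, n_iters + 1):
        alpha = (1.0 + 0.01 * t) ** -0.6   # faster step size for Q
        beta = (1.0 + 0.01 * t) ** -1.0    # slower step size for lam
        a = int(rng.integers(2))           # uniform exploration, for simplicity
        r = R[a][s] + lam * (1 - a)        # the passive action earns the subsidy
        s_next = int(rng.choice(n_states, p=P[a][s]))
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        # Drive Q(s_ref, active) = Q(s_ref, passive): at equality, lam is
        # (an estimate of) the Whittle index of s_ref.
        lam += beta * (Q[s_ref, 1] - Q[s_ref, 0])
        s = s_next
    return lam

The decoupling pays off because, once each state of each arm carries an index, the RMAB policy reduces to activating the arms with the largest current indices instead of solving the coupled problem.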
Azize, Achraf. "Privacy-Utility Trade-offs in Sequential Decision-Making under Uncertainty." Electronic Thesis or Diss., Université de Lille (2022-....), 2024. http://www.theses.fr/2024ULILB029.
The topics addressed in this thesis aim to characterise the privacy-utility trade-offs in sequential decision-making under uncertainty. The main privacy framework adopted is Differential Privacy (DP), and the main setting for studying utility is the stochastic Multi-Armed Bandit (MAB) problem. First, we propose different definitions that extend DP to the multi-armed bandit setting. Then, we quantify the hardness of private bandits by proving lower bounds on the performance of bandit algorithms satisfying the DP constraint. These bounds suggest the existence of two hardness regimes depending on the privacy budget and the reward distributions. We further propose a generic blueprint to design near-optimal DP extensions of bandit algorithms, and we instantiate it under different settings: finite-armed, linear and contextual bandits with regret as the utility measure, and finite-armed bandits with the sample complexity of identifying the optimal arm as the utility measure. The theoretical and experimental analysis of the proposed algorithms further validates the existence of the two hardness regimes.

In the second part of this thesis, we shift the view from privacy defences to attacks. Specifically, we study fixed-target Membership Inference (MI) attacks, where an adversary aims to infer whether a fixed target point was included in the input dataset of an algorithm. We define the target-dependent leakage of a data point as the advantage of the optimal adversary trying to infer the membership of that data point. We then quantify both the target-dependent leakage and the trade-off functions for the empirical mean and variants of interest, in terms of the Mahalanobis distance between the target point and the data-generating distribution. Our asymptotic analysis builds on a novel proof technique that combines an Edgeworth expansion of the Likelihood Ratio (LR) test with a Lindeberg-Feller central limit theorem, and it shows that the LR test for the empirical mean is a scalar-product attack corrected for the geometry of the data via the inverse of the covariance matrix. Finally, as by-products of our analysis, we propose a new covariance score and a new canary-selection strategy for auditing gradient descent algorithms in the white-box federated learning setting.
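As a toy illustration of the "DP extension of a bandit algorithm" idea (one standard ingredient, not the thesis' specific blueprint), the sketch below perturbs each arm's empirical mean with Laplace noise inside a UCB rule; the function name and noise calibration are assumptions for illustration. Practical DP bandit algorithms avoid re-adding fresh noise at every round, e.g. via doubling schedules or tree-based aggregation, precisely because naive repeated noising wastes privacy budget.

import numpy as np

def dp_ucb_pick(counts, sums, t, epsilon, rng):
    """Toy epsilon-DP-flavoured UCB step over K arms with rewards in [0, 1].
    counts[k] is the number of pulls of arm k, sums[k] the summed rewards.
    Hypothetical interface, for illustration only."""
    K = len(counts)
    scores = np.empty(K)
    for k in range(K):
        if counts[k] == 0:
            return k                                   # initialise each arm once
        # The mean of n rewards in [0, 1] has sensitivity 1/n, hence
        # Laplace noise of scale 1 / (epsilon * n) for a single release.
        noisy_mean = sums[k] / counts[k] + rng.laplace(0.0, 1.0 / (epsilon * counts[k]))
        bonus = np.sqrt(2 * np.log(t) / counts[k])     # usual UCB exploration term
        bonus += np.log(t) / (epsilon * counts[k])     # compensates the Laplace noise
        scores[k] = noisy_mean + bonus
    return int(np.argmax(scores))

The extra epsilon-dependent bonus hints at the two regimes the thesis describes: when epsilon is large the usual exploration term dominates (privacy is nearly free), and when epsilon is small the privacy term dominates the regret.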
Hadiji, Hédi. "On some adaptivity questions in stochastic multi-armed bandits." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM021.
The main topics addressed in this thesis lie in the general domain of sequential learning, and in particular stochastic multi-armed bandits. The thesis is divided into an introduction and four chapters. In the first part of the main body, we design a new algorithm that simultaneously achieves optimal distribution-dependent and distribution-free guarantees. The next two chapters are devoted to adaptivity questions. First, in the context of continuum-armed bandits, we present a new algorithm which, for the first time, does not require knowledge of the regularity of the bandit problem it is facing. Then, we study the issue of adapting to the unknown support of the payoffs in bounded K-armed bandits, and we provide a procedure that (almost) obtains the same guarantees as if the support were given in advance. In the final chapter, we study a slightly different bandit setting, designed to enforce diversity-preserving conditions on the strategies. We show that the optimal regret in this setting grows at a speed quite different from that of the traditional bandit setting; in particular, we observe that bounded regret is possible under some specific hypotheses.
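For reference, the two notions of optimality mentioned above correspond to the standard benchmark rates for K-armed stochastic bandits (classical background facts, not results of this thesis):

R_T = O\Big( \sum_{k :\, \Delta_k > 0} \frac{\log T}{\Delta_k} \Big)
\quad \text{(distribution-dependent)},
\qquad
R_T = \Theta\big( \sqrt{K T} \big)
\quad \text{(distribution-free / minimax)},

where \Delta_k is the gap between the mean reward of arm k and that of the best arm. Achieving both simultaneously is nontrivial because the first bound rewards aggressive exploitation on easy instances while the second must hold on worst-case instances.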
Iacob, Alexandra. "Scalable Model-Free Algorithms for Influencer Marketing." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG012.
Motivated by scenarios of information diffusion and advertising in social media, we study an influence maximization (IM) problem in which little is assumed to be known about the diffusion network or about the model that determines how information propagates. In such a highly uncertain environment, one can focus on multi-round diffusion campaigns, with the objective of maximizing the number of distinct users that are influenced or activated, starting from a known base of few influential nodes. During a campaign, spread seeds are selected sequentially at consecutive rounds, and feedback is collected in the form of the activated nodes at each round. A round's impact (reward) is then quantified as the number of newly activated nodes. Overall, one must maximize the campaign's total spread, as the sum of the rounds' rewards.

We consider two sub-classes of IM, CIMP and ECIMP, where (i) the reward of a given round of an ongoing campaign consists only of the new activations (those not observed at previous rounds within that campaign), (ii) the round's context and the historical data from previous rounds can be exploited to learn the best policy, and (iii) ECIMP is CIMP repeated multiple times, offering the possibility of learning from previous campaigns as well. This problem is directly motivated by real-world scenarios of information diffusion in influencer marketing, where (i) only a target user's first, unique activation is of interest (and this activation persists as an acquired, latent one throughout the campaign), and (ii) valuable side information is available to the learning agent. In this setting, an explore-exploit approach can be used to learn the key underlying diffusion parameters while running the campaigns.

For CIMP, we describe and compare two contextual multi-armed bandit methods with upper-confidence bounds on the remaining potential of influencers: one using a generalized linear model and the Good-Turing estimator for remaining potential (glmucb), and another that directly adapts the LinUCB algorithm to our setting (linucb). For ECIMP, we propose the algorithm lgtlsvi, which implements the optimism-in-the-face-of-uncertainty principle for episodic reinforcement learning with linear approximation. The learning agent estimates, for each seed node, its remaining potential with a Good-Turing estimator modified by an estimated Q-function. We show that these methods outperform baselines built on state-of-the-art ideas, on synthetic and real-world data, while exhibiting different and complementary behavior depending on the scenarios in which they are deployed.
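To illustrate the Good-Turing "remaining potential" ingredient shared by these methods, here is a minimal Python sketch. The data layout and the plain UCB index are illustrative assumptions, deliberately simpler than the thesis' glmucb and linucb, which additionally exploit context features.

import numpy as np
from collections import Counter

def good_turing_remaining_potential(activation_lists):
    """Good-Turing-style estimate of an influencer's remaining potential:
    the fraction of observed activations seen exactly once, a proxy for
    the probability mass of still-unseen nodes. activation_lists is a
    list of per-round activated-node lists (hypothetical interface)."""
    counts = Counter(u for round_nodes in activation_lists for u in round_nodes)
    total = sum(counts.values())
    if total == 0:
        return 1.0                       # nothing observed yet: full potential
    hapaxes = sum(1 for c in counts.values() if c == 1)
    return hapaxes / total

def ucb_pick(influencers, history, t, c=1.0):
    """Pick the influencer with the largest optimistic remaining potential.
    history maps each influencer to its list of per-round activation lists."""
    def index(k):
        n = max(len(history[k]), 1)      # rounds in which k was played
        return good_turing_remaining_potential(history[k]) + c * np.sqrt(np.log(t + 1) / n)
    return max(influencers, key=index)

The fraction of nodes activated exactly once estimates the mass of still-unseen nodes, which is exactly the "remaining potential" that an optimistic, UCB-style seed-selection rule deliberately over-estimates before committing to a seed.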