Theses on the topic "Bandit Contextuel"
Consult the top 30 theses for your research on the topic "Bandit Contextuel".
Sakhi, Otmane. "Offline Contextual Bandit : Theory and Large Scale Applications". Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAG011.
This thesis presents contributions to the problem of learning from logged interactions using the offline contextual bandit framework. We are interested in two related topics: (1) offline policy learning with performance certificates, and (2) fast and efficient policy learning applied to large scale, real world recommendation. For (1), we first leverage results from the distributionally robust optimisation framework to construct asymptotic, variance-sensitive bounds to evaluate policies' performances. These bounds lead to new, more practical learning objectives thanks to their composite nature and straightforward calibration. We then analyse the problem from the PAC-Bayesian perspective, and provide tighter, non-asymptotic bounds on the performance of policies. Our results motivate new strategies that offer performance certificates before deploying the policies online. The newly derived strategies rely on composite learning objectives that do not require additional tuning. For (2), we first propose a hierarchical Bayesian model that combines different signals to efficiently estimate the quality of recommendation. We provide proper computational tools to scale the inference to real world problems, and demonstrate empirically the benefits of the approach in multiple scenarios. We then address the question of accelerating common policy optimisation approaches, particularly focusing on recommendation problems with catalogues of millions of items. We derive optimisation routines, based on new gradient approximations, computed in logarithmic time with respect to the catalogue size. Our approach improves on common, linear time gradient computations, yielding fast optimisation with no loss in the quality of the learned policies.
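As an illustration of what learning from logged interactions involves, the following is a minimal sketch of off-policy evaluation with the standard inverse propensity scoring (IPS) estimator. It is not the method of the thesis above, which develops bounds and learning objectives of its own. The log format, the function name `ips_estimate`, the policy `always_action_one`, and the toy data are assumptions made for this example.

```python
from typing import Callable, Sequence, Tuple

# One logged interaction (hypothetical format): context, action taken,
# observed reward, and the probability with which the logging policy
# chose that action.
LoggedSample = Tuple[Sequence[float], int, float, float]


def ips_estimate(
    logs: Sequence[LoggedSample],
    target_prob: Callable[[Sequence[float], int], float],
) -> float:
    """Estimate the mean reward of a target policy from logged data."""
    total = 0.0
    for context, action, reward, logging_prob in logs:
        if logging_prob <= 0.0:
            raise ValueError("logging probabilities must be positive")
        # Reweight each logged reward by how much more (or less) likely
        # the target policy is to take the logged action.
        weight = target_prob(context, action) / logging_prob
        total += weight * reward
    return total / len(logs)


def always_action_one(context: Sequence[float], action: int) -> float:
    """Toy target policy: plays action 1 with probability 1."""
    return 1.0 if action == 1 else 0.0


# Toy log collected by a uniform logging policy over two actions.
logs = [
    ([1.0], 0, 0.0, 0.5),
    ([1.0], 1, 1.0, 0.5),
    ([0.0], 1, 0.0, 0.5),
    ([0.0], 0, 1.0, 0.5),
]
estimate = ips_estimate(logs, always_action_one)
```

The estimate is unbiased only when the logging policy gives positive probability to every action the target policy can take, and its variance grows as the logging probabilities shrink, which is one reason variance-sensitive bounds matter in this setting.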
Huix, Tom. "Variational Inference : theory and large scale applications". Electronic Thesis or Diss., Institut polytechnique de Paris, 2024. http://www.theses.fr/2024IPPAX071.
This thesis explores Variational Inference methods for high-dimensional Bayesian learning. In Machine Learning, the Bayesian approach allows one to deal with epistemic uncertainty and provides better uncertainty quantification, which is necessary in many machine learning applications. However, Bayesian inference is often not feasible because the posterior distribution of the model parameters is generally intractable. Variational Inference (VI) overcomes this problem by approximating the posterior distribution with a simpler distribution called the variational distribution. In the first part of this thesis, we worked on the theoretical guarantees of Variational Inference. First, we studied VI when the variational distribution is a Gaussian and in the overparameterized regime, i.e., when the models are high dimensional. Then, we explored Gaussian mixture variational distributions, which are more expressive. We studied both the optimization error and the approximation error of this method. In the second part of the thesis, we studied the theoretical guarantees for contextual bandit problems using a Bayesian approach called Thompson Sampling. First, we explored the use of Variational Inference for the Thompson Sampling algorithm. We notably showed that in the linear framework, this approach allows us to obtain the same theoretical guarantees as if we had access to the true posterior distribution. Finally, we considered a variant of Thompson Sampling called Feel-Good Thompson Sampling (FG-TS). This method provides better theoretical guarantees than the classical algorithm. We then studied the use of a Markov chain Monte Carlo method to approximate the posterior distribution. Specifically, we incorporated into FG-TS a Langevin Monte Carlo algorithm and a Metropolized Langevin Monte Carlo algorithm. Moreover, we obtained the same theoretical guarantees as for FG-TS when the posterior distribution is known.
Bouneffouf, Djallel. "DRARS, A Dynamic Risk-Aware Recommender System". Phd thesis, Institut National des Télécommunications, 2013. http://tel.archives-ouvertes.fr/tel-01026136.
Chia, John. "Non-linear contextual bandits". Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/42191.
Galichet, Nicolas. "Contributions to Multi-Armed Bandits : Risk-Awareness and Sub-Sampling for Linear Contextual Bandits". Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112242/document.
This thesis focuses on sequential decision making in unknown environments, and more particularly on the Multi-Armed Bandit (MAB) setting, defined by Lai and Robbins in the 50s. During the last decade, many theoretical and algorithmic studies have been aimed at the exploration vs exploitation tradeoff at the core of MABs, where Exploitation is biased toward the best options visited so far while Exploration is biased toward options rarely visited, to enforce the discovery of the true best choices. MAB applications range from medicine (the elicitation of the best prescriptions) to e-commerce (recommendations, advertisements) and optimal policies (e.g., in the energy domain). The contributions presented in this dissertation tackle the exploration vs exploitation dilemma under two angles. The first contribution is centered on risk avoidance. Exploration in unknown environments often has adverse effects: for instance, exploratory trajectories of a robot can entail physical damage for the robot or its environment. We thus define the exploration vs exploitation vs safety (EES) tradeoff, and propose three new algorithms addressing the EES dilemma. Firstly and under strong assumptions, the MIN algorithm provides a robust behavior with guarantees of logarithmic regret, matching the state of the art with a high robustness w.r.t. hyper-parameter setting (as opposed to, e.g., UCB (Auer 2002)). Secondly, the MARAB algorithm aims at optimizing the cumulative 'Conditional Value at Risk' (CVaR) rewards, a notion originating from the economics domain, with excellent empirical performance compared to (Sani et al. 2012), though without any theoretical guarantees. Finally, the MARABOUT algorithm modifies the CVaR estimation and yields both theoretical guarantees and a good empirical behavior.
The second contribution concerns the contextual bandit setting, where additional information is provided to support the decision making, such as the user details in the content recommendation domain, or the patient history in the medical domain. The study focuses on how to make a choice between two arms with different numbers of samples. Traditionally, a confidence region is derived for each arm based on the associated samples, and the 'Optimism in front of the unknown' principle implements the choice of the arm with maximal upper confidence bound. An alternative, pioneered by (Baransi et al. 2014), and called BESA, proceeds instead by subsampling the larger sample set without replacement. In this framework, we designed a contextual bandit algorithm based on sub-sampling without replacement, relaxing the (unrealistic) assumption that all arm reward distributions rely on the same parameter. The CL-BESA algorithm yields both theoretical guarantees of logarithmic regret and good empirical behavior.
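The sub-sampling idea described above can be sketched for the plain (non-contextual) two-armed case. This is an illustrative reading of the abstract, not the CL-BESA algorithm itself: the function names `besa_choose` and `run_besa`, the Bernoulli rewards, and the tie-breaking rule (favor the arm with fewer observations) are assumptions made for this example.

```python
import random
from statistics import fmean
from typing import List, Sequence


def besa_choose(rewards_a: Sequence[float], rewards_b: Sequence[float],
                rng: random.Random) -> int:
    """Return 0 to play arm A or 1 to play arm B."""
    # An arm with no observations yet is played first.
    if not rewards_a:
        return 0
    if not rewards_b:
        return 1
    n = min(len(rewards_a), len(rewards_b))
    # Sub-sample without replacement so both arms are compared on the
    # same number of rewards (the smaller history is used in full).
    mean_a = fmean(rng.sample(list(rewards_a), n))
    mean_b = fmean(rng.sample(list(rewards_b), n))
    if mean_a != mean_b:
        return 0 if mean_a > mean_b else 1
    # Tie-breaking is an assumption here: favor the less observed arm.
    return 0 if len(rewards_a) <= len(rewards_b) else 1


def run_besa(means: Sequence[float], horizon: int, seed: int = 0) -> List[int]:
    """Simulate two Bernoulli arms and return how often each was played."""
    rng = random.Random(seed)
    history: List[List[float]] = [[], []]
    for _ in range(horizon):
        arm = besa_choose(history[0], history[1], rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        history[arm].append(reward)
    return [len(h) for h in history]


pulls = run_besa([0.2, 0.8], horizon=500)
```

Unlike an upper-confidence-bound rule, this comparison needs no confidence width or exploration parameter: the randomness of the sub-sample is what drives exploration.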
Nicol, Olivier. "Data-driven evaluation of contextual bandit algorithms and applications to dynamic recommendation". Thesis, Lille 1, 2014. http://www.theses.fr/2014LIL10211/document.
The context of this thesis work is dynamic recommendation. Recommendation is the action, for an intelligent system, to supply a user of an application with personalized content so as to enhance what is referred to as "user experience", e.g. recommending a product on a merchant website or even an article on a blog. Recommendation is considered dynamic when the content to recommend or user tastes evolve rapidly, e.g. news recommendation. Many applications that are of interest to us generate a tremendous amount of data through the millions of online users they have. Nevertheless, using this data to evaluate a new recommendation technique or even compare two dynamic recommendation algorithms is far from trivial. This is the problem we consider here. Some approaches have already been proposed. Nonetheless, they were not studied very thoroughly, either from a theoretical point of view (unquantified bias, loose convergence bounds...) or from an empirical one (experiments on private data only). In this work we start by filling many blanks within the theoretical analysis. Then we comment on the result of an experiment of unprecedented scale in this area: a public challenge we organized. This challenge, along with some complementary experiments, revealed an unexpected source of a huge bias: time acceleration. The rest of this work tackles this issue. We show that a bootstrap-based approach significantly reduces this bias and, more importantly, makes it possible to control it.
May, Benedict C. "Bayesian sampling in contextual-bandit problems with extensions to unknown normal-form games". Thesis, University of Bristol, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.627937.
Ju, Weiyu. "Mobile Deep Neural Network Inference in Edge Computing with Resource Restrictions". Thesis, The University of Sydney, 2021. https://hdl.handle.net/2123/25038.
Brégère, Margaux. "Stochastic bandit algorithms for demand side management Simulating Tariff Impact in Electrical Energy Consumption Profiles with Conditional Variational Autoencoders Online Hierarchical Forecasting for Power Consumption Data Target Tracking for Contextual Bandits : Application to Demand Side Management". Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM022.
As electricity is hard to store, the balance between production and consumption must be strictly maintained. With the integration of intermittent renewable energies into the production mix, the management of the balance becomes complex. At the same time, the deployment of smart meters suggests demand response. More precisely, sending signals - such as changes in the price of electricity - would encourage users to modulate their consumption according to the production of electricity. The algorithms used to choose these signals have to learn consumer reactions and, at the same time, to optimize them (exploration-exploitation trade-off). Our approach is based on bandit theory and formalizes this sequential learning problem. We propose a first algorithm to control the electrical demand of a homogeneous population of consumers and provide a T^(2/3) upper bound on its regret. Experiments on a real data set in which price incentives were offered illustrate these theoretical results. As a “full information” dataset is required to test bandit algorithms, a consumption data generator based on variational autoencoders is built. In order to drop the assumption of the population homogeneity, we propose an approach to cluster households according to their consumption profile. These different works are finally combined to propose and test a bandit algorithm for personalized demand side management.
Wan, Hao. "Tutoring Students with Adaptive Strategies". Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-dissertations/36.
Texto completoAkhavanfoomani, Aria. "Derivative-free stochastic optimization, online learning and fairness". Electronic Thesis or Diss., Institut polytechnique de Paris, 2023. http://www.theses.fr/2023IPPAG001.
In this thesis, we first study the problem of zero-order optimization in the active setting for smooth and three different classes of functions: i) the functions that satisfy the Polyak-Łojasiewicz condition, ii) strongly convex functions, and iii) the larger class of highly smooth non-convex functions. Furthermore, we propose a novel algorithm that is based on l1-type randomization, and we study its properties for Lipschitz convex functions in an online optimization setting. Our analysis relies on deriving a new Poincaré type inequality for the uniform measure on the l1-sphere with explicit constants. Then, we study the zero-order optimization problem in the passive schemes. We propose a new method for estimating the minimizer and the minimum value of a smooth and strongly convex regression function f. We derive upper bounds for this algorithm and prove minimax lower bounds for such a setting. In the end, we study the linear contextual bandit problem under fairness constraints where an agent has to select one candidate from a pool, and each candidate belongs to a sensitive group. We propose a novel notion of fairness which is practical in the aforementioned example. We design a greedy policy that computes an estimate of the relative rank of each candidate using the empirical cumulative distribution function, and we prove its optimality.
Morant, Brigitte and Adoracion Vadillo. "Les Trabucayres : dans leur contexte économique, historique, sociologique et culturel : (1836-1846)". Perpignan, 1987. http://www.theses.fr/1987PERP0052.
Gutowski, Nicolas. "Recommandation contextuelle de services : application à la recommandation d'évènements culturels dans la ville intelligente". Thesis, Angers, 2019. http://www.theses.fr/2019ANGE0030.
Nowadays, Multi-Armed Bandit algorithms for context-aware recommendation systems are extensively studied. In order to meet the challenges underlying this field of research, our works and contributions have been organised according to three research directions: 1) recommendation systems; 2) Multi-Armed Bandit (MAB) and Contextual Multi-Armed Bandit (CMAB) algorithms; 3) context. The first part of our contributions focuses on MAB and CMAB algorithms for recommendation. It particularly addresses diversification of recommendations for improving individual accuracy. The second part is focused on context acquisition, on context reasoning for cultural events recommendation systems for Smart Cities, and on dynamic context enrichment for CMAB algorithms.
Wang, Yu-Xiang. "New Paradigms and Optimality Guarantees in Statistical Learning and Estimation". Research Showcase @ CMU, 2017. http://repository.cmu.edu/dissertations/1113.
Texto completoKhalaf, Ziad. "Contributions à l'étude de détection des bandes libres dans le contexte de la radio intelligente". Phd thesis, Supélec, 2013. http://tel.archives-ouvertes.fr/tel-00812666.
Texto completoTrizzulla, Caterina. "Appréhender la variété des modes de consommation culturelle en contextes présents et passés : le cas de la bande dessinée". Thesis, Université de Lorraine, 2018. http://www.theses.fr/2018LORR0156/document.
This doctoral research aims to discuss the construction and observation of plural cultural consumption patterns based on the case of comics. The literature review seems to highlight the importance of reporting both the present (synchronic) and temporal (diachronic) dimensions of the cultural practices observed. Lahire's (2005, 2013) dispositionalist and contextualist perspective seems to be a response to this requirement. Indeed, the cultural practices observed are never disconnected from the frameworks that participate in their construction or from those that allow them to be observed in context. To reflect this dual dimension of cultural practices, this work is based on the production of six sociological portraits (Lahire, 2005). They make it possible not only to identify the variety of frameworks behind the construction of observed consumption patterns, but also to describe the variety of their effects at the individual level: the dispositions.
Jouini, Wassim. "Contribution à l'apprentissage et à la prise de décision, dans des contextes d'incertitude, pour la radio intelligente". Phd thesis, Supélec, 2012. http://tel.archives-ouvertes.fr/tel-00765437.
Texto completoLe, Bras Hughes. "Etude des réseaux radio sur fibre dans le contexte des réseaux d'accès et privatifs". Paris 6, 2008. https://tel.archives-ouvertes.fr/tel-00812485.
Texto completoKetsea, Eftychia Vilelmini. "Les élèves créent des bandes dessinées pour l'apprentissage de la physique dans le contexte de la classe du secondaire : une analyse sémiotique permettant d'accéder aux processus d'apprentissage". Electronic Thesis or Diss., CY Cergy Paris Université, 2024. http://www.theses.fr/2024CYUN1294.
A design-based research study, comprising the design and implementation of a sequence of 10 lessons (combining physics and comics-making) and aiming at the attainment of physics learning objectives, provided the data corpus, namely the students' comics production, in a secondary school class. Semiotic theory applied to the visual language of sequential images provided the analytical tools that linked the signs in the comics to processes of learning and primarily to the students' modes of reasoning and their characteristics.
Le, Bras Hugues. "Étude des réseaux radio sur fibre dans le contexte des réseaux d'accès et privatifs". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2008. http://tel.archives-ouvertes.fr/tel-00812485.
Texto completoStanciu, Mihai Ionut. "Sur l'estimation aveugle de paramètres de signaux UWB impulsionnels dans un contexte de radio intelligente". Brest, 2011. http://www.theses.fr/2011BRES2023.
This thesis is concerned with the study of UWB systems, which represent a promising perspective in the field of low range radio systems. UWB technology is best suited to be used within ad-hoc Piconet radio networks, which must be highly flexible. Consequently, this thesis is focused on the one hand on the development of very low complexity blind parameter estimation methods, which can play an essential role in the synchronization stage, and on the other hand on the statistical characterization of the propagation channel, with the aim of establishing criteria to realize blind real-time adjustments of the digital transmission. The study is organized in three main directions. The first consists of developing a method to estimate the chip time, based on noisy time-of-arrival measurements, with false and missing observations. The main problem with this approach is that the considered statistical model of the times of arrival cannot realistically reflect indoor UWB channels. Therefore, a second direction is concerned with the development of a method to estimate the chip time based on energy measurements on the received UWB impulse radio signal. Using the well-known energy detector principle, this approach jointly estimates the chip time and the optimal integration window; its main advantage is that it allows considering propagation noise, multipath propagation and multiuser interference. The third direction deals with a statistical study of the multipath propagation interference of a UWB propagation channel.
Pagani, Pascal. "Caractérisation et modélisation du canal de propagation radio en contexte Ultra Large Bande". Phd thesis, INSA de Rennes, 2005. http://tel.archives-ouvertes.fr/tel-00011220.
The two proposed sounding techniques allow measurement of the static channel in the 3.1-11.1 GHz band and real-time sounding in the 4-5 GHz band. Several measurement campaigns carried out in an indoor office environment are detailed. Their analysis yields the large-scale parameters and the fast fading of the static channel, with a particular study of the influence of frequency. Specific studies are dedicated to the variations of the UWB channel due to antenna movement and to people passing by. A statistical model is proposed that reproduces the effects of the UWB propagation channel in static and dynamic configurations.
Abdi, Abyaneh Mohammad. "Génération des signaux agrégés en fréquences dans le contexte de LTE-A". Electronic Thesis or Diss., Paris, ENST, 2016. https://pastel.hal.science/tel-03689710.
In this dissertation, a signal generation structure is proposed in which a multi-tone Local Oscillator (LO) signal is created by a single I/Q modulator. These LOs can be used in the CA receivers to down-convert the transmitted component carriers simultaneously. The multi-tone LO signal structure is further developed to be used at the transmitter as a CA generation solution. Using the proposed structure, n component carriers can be generated in parallel. This structure requires lower sampling rates with respect to the case where RF signals are synthesized directly by digital-to-analog converters. Moreover, less circuitry is required, because one single I/Q modulator is used to generate n component carriers, instead of n I/Q modulators. This work then investigates the origin of impairments and mild nonlinearities in our I/Q modulator. To overcome these problems, we focus on the functionality of the overall system rather than each component of the I/Q modulator. This method is called behavioral modeling. Once the nonlinear model is obtained, if its inverse function is applied to the input, a linearized output is expected. The generation of the inverse function is called Digital Pre-Distortion (DPD). We propose a tri-band behavioral model for nonlinearities and impairments in tri-band CA using our I/Q modulator. Furthermore, the DPD of the model is evaluated in simulations and experiments.
Tchoffo, Talom Friedman. "Modélisation déterministe du canal de propagation indoor dans un contexte Ultra Wide Band". Phd thesis, INSA de Rennes, 2005. http://tel.archives-ouvertes.fr/tel-00012059.
Texto completoHenaut, Julien. "Architecture de traitement du signal pour les couches physiques très haut débit pour les réseaux de capteur : Application à la métrologie dans un contexte aéronautique et spatial". Phd thesis, INSA de Toulouse, 2013. http://tel.archives-ouvertes.fr/tel-00849338.
Chang, Ya-Hsuan and 張雅軒. "Study on Contextual Bandit Problem with Multiple Actions". Thesis, 2013. http://ndltd.ncl.edu.tw/handle/94665894891939536263.
國立臺灣大學
資訊工程學研究所
101
The contextual bandit problem is usually used to model online applications like article recommendation. However, the problem cannot fully meet some needs of these applications, such as making multiple actions at the same time. We propose a new Contextual Bandit Problem with Multiple Actions (CBMA), which is an extension of the traditional contextual bandit problem and fits the online applications better. We adapt some existing contextual bandit algorithms for our CBMA problem, and propose a new Pairwise Regression with Upper Confidence Bound (PairUCB) algorithm which utilizes the new properties of the CBMA problem. The experiment results demonstrate that PairUCB outperforms other algorithms.
Chou, Ku-Chun and 周谷駿. "Pseudo-reward Algorithms for Linear Contextual Bandit Problems". Thesis, 2013. http://ndltd.ncl.edu.tw/handle/48964441878502463981.
國立臺灣大學
資訊工程學研究所
101
We study the contextual bandit problem that arises in many real world applications such as advertising, recommendations, and other Web applications. One leading algorithm for contextual bandit is the linear upper confidence bound (LINUCB) approach, which is based on updating internal linear regression models with the partial feedback from the environment. Because of updating with only the partial feedback, LINUCB can be slow in converging to the optimal performance. In this work, we study techniques that improve LINUCB by updating the linear regression models with some additional feedback called the pseudo-reward. By choosing a proper pseudo-reward formula and implementing a forgetting mechanism to avoid being overly biased by the pseudo-rewards, we propose an improved algorithm that matches the regret guarantee of LINUCB in theory. Furthermore, we design a variant of the proposed algorithm that can be significantly more efficient than LINUCB during action selection, which directly implies faster response time in many applications. Extensive experimental results from both artificial data and the benchmark Yahoo! News recommendation data show that the proposed algorithm enjoys better performance than LINUCB and other contextual bandit algorithms.
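The baseline mentioned above, LINUCB, can be sketched as follows in its common per-arm ("disjoint") form with a ridge regression model for each arm. This shows only the baseline, not the pseudo-reward algorithm proposed in the thesis. The class name `LinUCBArm`, the helper functions, the exploration weight `alpha`, and the identity-matrix initialization are assumptions made for this example.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def mat_vec(m: Matrix, v: Sequence[float]) -> Vector:
    """Matrix-vector product for small dense matrices."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


class LinUCBArm:
    """Ridge regression model and confidence bound for a single arm."""

    def __init__(self, dim: int) -> None:
        # A starts as the identity, so its inverse is the identity too.
        self.a_inv: Matrix = [[1.0 if i == j else 0.0 for j in range(dim)]
                              for i in range(dim)]
        self.b: Vector = [0.0] * dim

    def ucb(self, x: Sequence[float], alpha: float) -> float:
        """Predicted reward plus an exploration bonus for context x."""
        theta = mat_vec(self.a_inv, self.b)
        width = math.sqrt(max(dot(x, mat_vec(self.a_inv, x)), 0.0))
        return dot(theta, x) + alpha * width

    def update(self, x: Sequence[float], reward: float) -> None:
        """Apply A += x x^T and b += reward * x, keeping A^{-1} current
        through the Sherman-Morrison rank-one update."""
        a_inv_x = mat_vec(self.a_inv, x)
        denom = 1.0 + dot(x, a_inv_x)
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
            self.b[i] += reward * x[i]


def select_arm(arms: Sequence[LinUCBArm], x: Sequence[float],
               alpha: float) -> int:
    """Pick the arm with the largest upper confidence bound."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=scores.__getitem__)


arms = [LinUCBArm(2), LinUCBArm(2)]
context = [1.0, 0.0]
chosen = select_arm(arms, context, alpha=1.0)
# Partial feedback: only the arm that was played observes a reward.
arms[chosen].update(context, reward=1.0)
```

Because only the played arm is updated in each round, the models of rarely played arms change slowly, which is the convergence issue the pseudo-reward idea is meant to address.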
Chien, Zhi-hua and 簡志樺. "Using Contextual Multi-Armed Bandit Algorithms for Recommending Investment in Stock Market". Thesis, 2016. http://ndltd.ncl.edu.tw/handle/n3qyn2.
國立中山大學
資訊管理學系研究所
104
The Contextual Multi-Armed Bandit (CMAB) problem is usually used to make recommendations in online applications for articles, music, movies, etc. One leading algorithm for contextual bandits is the LinUCB algorithm, which updates internal linear regression models using the partial feedback from the environment. However, we observe that CMAB is rarely used in stock recommendation, where most recommendations are made for the purpose of profit and ignore the investor's features (risk tolerance, investment features, and others). We propose a personalized recommendation system for stocks using a contextual multi-armed bandit algorithm. We take the investor's investment records as user features, and recommend the "arm", which is a type of stock, based on two kinds of analysis, technical and fundamental. For the chosen arm, we rank the stocks according to the similarity between the stock and the arm. Our experiment is based on a dataset of simulated investments collected from an online website, and the result demonstrates that our method outperforms other algorithms.
Saha, Aadirupa. "Battle of Bandits: Online Learning from Subsetwise Preferences and Other Structured Feedback". Thesis, 2020. https://etd.iisc.ac.in/handle/2005/5184.
Texto completo(9136835), Sungbum Jun. "SCHEDULING AND CONTROL WITH MACHINE LEARNING IN MANUFACTURING SYSTEMS". Thesis, 2020.
Buscar texto completo