Dissertations / Theses on the topic 'Reinforcement Learning, Multi-armed Bandits'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 37 dissertations / theses for your research on the topic 'Reinforcement Learning, Multi-armed Bandits.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Magureanu, Stefan. "Structured Stochastic Bandits." Licentiate thesis, KTH, Reglerteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-182816.
Talebi, Mazraeh Shahi Mohammad Sadegh. "Minimizing Regret in Combinatorial Bandits and Reinforcement Learning." Doctoral thesis, KTH, Reglerteknik, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-219970.
Hauser, Kristen. "Hyperparameter Tuning for Reinforcement Learning with Bandits and Off-Policy Sampling." Case Western Reserve University School of Graduate Studies / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=case1613034993418088.
Olkhovskaya, Julia. "Large-scale online learning under partial feedback." Doctoral thesis, Universitat Pompeu Fabra, 2022. http://hdl.handle.net/10803/673926.
Sequential decision making under uncertainty covers a broad class of problems. Real-world applications require algorithms that are computationally efficient and scalable. We study a range of sequential learning problems in which the learner observes only partial information about the rewards, and we develop algorithms that are robust and computationally efficient in large-scale settings. The first problem we consider is online influence maximization, in which a decision maker sequentially selects a node in a graph in order to spread information through the graph by placing the information at the chosen node. The available feedback is only some information about a small neighbourhood of the selected vertex. Our results show that such partial local observations can be sufficient for maximizing global influence. We propose sequential learning algorithms that aim at maximizing influence, and provide their theoretical analysis in both the subcritical and supercritical regimes of broadly studied graph models. These are the first algorithms in the sequential influence maximization setting that perform efficiently on graphs with a huge number of nodes. In another line of work, we study the contextual bandit problem, where the reward function is allowed to change in an adversarial manner and the learner only gets to observe the rewards associated with its actions. We assume that the number of arms is finite and that the context space can be infinite. We develop a computationally efficient algorithm under the assumption that the d-dimensional contexts are generated i.i.d. at random from a known distribution. We also propose an algorithm that is shown to be robust to misspecification in the setting where the true reward function is linear up to an additive nonlinear error. To our knowledge, our performance guarantees constitute the very first results in this problem setting. We also provide an extension to the case where the context is an element of a reproducing kernel Hilbert space. Finally, we consider an extension of the contextual bandit problem described above: a setting where the learner interacts with a Markov decision process in a sequence of episodes, where an adversary chooses the reward function and reward observations are available only for the selected action. We allow the state space to be arbitrarily large, but we assume that all action-value functions can be represented as linear functions of a known low-dimensional feature map, and that the learner at least has access to a simulator of the trajectories in the MDP. Our main contributions are the first algorithms that are shown to be robust and efficient in this problem setting.
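As a concrete picture of the linear contextual bandit protocol assumed in this abstract, here is a minimal per-arm ridge-regression learner with an epsilon-greedy rule; all constants, the noise model, and the action rule are illustrative assumptions, not the thesis's algorithm:

    import numpy as np

    rng = np.random.default_rng(0)
    d, n_arms, T, eps = 5, 4, 2000, 0.1       # illustrative sizes and horizon

    theta = rng.normal(size=(n_arms, d))      # hidden linear reward parameters (simulation only)

    A = np.stack([np.eye(d) for _ in range(n_arms)])  # per-arm ridge statistics: I + sum(x x^T)
    b = np.zeros((n_arms, d))                          # per-arm sum of r * x

    for t in range(T):
        x = rng.normal(size=d)                # i.i.d. context, as assumed in the abstract
        est = np.array([np.linalg.solve(A[k], b[k]) @ x for k in range(n_arms)])
        k = int(rng.integers(n_arms)) if rng.random() < eps else int(np.argmax(est))
        r = theta[k] @ x + 0.1 * rng.normal() # only the chosen arm's reward is observed
        A[k] += np.outer(x, x)
        b[k] += r * x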
Racey, Deborah Elaine. "EFFECTS OF RESPONSE FREQUENCY CONSTRAINTS ON LEARNING IN A NON-STATIONARY MULTI-ARMED BANDIT TASK." OpenSIUC, 2009. https://opensiuc.lib.siu.edu/dissertations/86.
Besson, Lilian. "Multi-Players Bandit Algorithms for Internet of Things Networks." Thesis, CentraleSupélec, 2019. http://www.theses.fr/2019CSUP0005.
In this PhD thesis, we study wireless networks and reconfigurable end-devices that can access Cognitive Radio networks, in unlicensed bands and without central control. We focus on Internet of Things (IoT) networks, with the objective of extending the devices' battery life by equipping them with low-cost but efficient machine learning algorithms, in order to let them automatically improve the efficiency of their wireless communications. We propose different models of IoT networks, and we show empirically, in both numerical simulations and real-world validation, the possible gain of our methods, which use Reinforcement Learning. The different network access problems are modeled as Multi-Armed Bandits (MAB), but we found that analyzing the realistic models was intractable, because proving the convergence of many IoT devices playing a collaborative game, without communication or coordination, is hard when they all follow random activation patterns. The rest of this manuscript thus studies two restricted models: first multi-player bandits in stationary problems, then non-stationary single-player bandits. We also detail another contribution, SMPyBandits, our open-source Python library for numerical MAB simulations, which covers all the studied models and more.
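A toy rendering of the multi-player bandit model underlying this work (a sketch under simplified assumptions, not the thesis's algorithms or the SMPyBandits API): each device independently runs a UCB-style index policy over shared channels, and simultaneous choices of the same channel collide and destroy the reward.

    import numpy as np

    rng = np.random.default_rng(1)
    n_channels, n_players, T = 6, 3, 5000
    mu = rng.uniform(0.2, 0.9, n_channels)        # channel qualities, unknown to players

    counts = np.ones((n_players, n_channels))     # one fake pull each to avoid division by zero
    means = np.full((n_players, n_channels), 0.5)

    for t in range(1, T + 1):
        ucb = means + np.sqrt(2 * np.log(t) / counts)
        choices = ucb.argmax(axis=1)              # each player picks independently, no coordination
        for p, c in enumerate(choices):
            collided = (choices == c).sum() > 1   # two players on one channel lose the round
            r = 0.0 if collided else float(rng.random() < mu[c])
            counts[p, c] += 1
            means[p, c] += (r - means[p, c]) / counts[p, c]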
Achab, Mastane. "Ranking and risk-aware reinforcement learning." Electronic Thesis or Diss., Institut polytechnique de Paris, 2020. http://www.theses.fr/2020IPPAT020.
This thesis is divided into two parts: the first part is on ranking and the second on risk-aware reinforcement learning. While binary classification is the flagship application of empirical risk minimization (ERM), the main paradigm of machine learning, more challenging problems such as bipartite ranking can also be expressed through that setup. In bipartite ranking, the goal is to order, by means of scoring methods, all the elements of some feature space based on a training dataset composed of feature vectors with their binary labels. This thesis extends this setting to the continuous ranking problem, a variant where the labels take continuous values instead of being simply binary. The analysis of ranking data, initiated in the 18th century in the context of elections, has led to another ranking problem using ERM, namely ranking aggregation, and more precisely Kemeny's consensus approach. From a training dataset made of ranking data, such as permutations or pairwise comparisons, the goal is to find the single "median permutation" that best corresponds to a consensus order. We present a less drastic dimensionality reduction approach where a distribution on rankings is approximated by a simpler distribution, which is not necessarily reduced to a Dirac mass as in ranking aggregation. For that purpose, we rely on mathematical tools from the theory of optimal transport, such as Wasserstein metrics. The second part of this thesis focuses on risk-aware versions of the stochastic multi-armed bandit problem and of reinforcement learning (RL), where an agent interacts with a dynamic environment by taking actions and receiving rewards, the objective being to maximize the total payoff. In particular, a novel atomic distributional RL approach is provided: the distribution of the total payoff is approximated by particles that correspond to trimmed means.
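For intuition on the trimmed-mean particles mentioned at the end of this abstract, one can summarise an empirical return distribution by the means of slices of the sorted sample; the helper below is an illustrative sketch under that simple reading, not the thesis's estimator:

    import numpy as np

    def trimmed_mean_atoms(samples, n_atoms):
        """Approximate a return distribution by n_atoms particles, each the
        mean of one slice of the sorted sample (a 'local' trimmed mean)."""
        s = np.sort(np.asarray(samples))
        return np.array([sl.mean() for sl in np.array_split(s, n_atoms)])

    returns = np.random.default_rng(2).normal(1.0, 2.0, size=10_000)
    print(trimmed_mean_atoms(returns, n_atoms=5))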
Barkino, Iliam. "Summary Statistic Selection with Reinforcement Learning." Thesis, Uppsala universitet, Avdelningen för beräkningsvetenskap, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-390838.
Jedor, Matthieu. "Bandit algorithms for recommender system optimization." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM027.
In this PhD thesis, we study the optimization of recommender systems with the objective of providing more refined suggestions of items for a user to benefit from. The task is modeled using the multi-armed bandit framework. In a first part, we look at two problems that commonly occur in recommendation systems: the large number of items to handle and the management of sponsored content. In a second part, we investigate the empirical performance of bandit algorithms, and especially how to tune conventional algorithms to improve results in the stationary and non-stationary environments that arise in practice. This leads us to analyze, both theoretically and empirically, the greedy algorithm which, in some cases, outperforms the state of the art.
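The greedy algorithm analysed here is, in its generic form, follow-the-empirical-leader; a minimal sketch (Bernoulli arms and all constants are illustrative, not the thesis's exact variant):

    import numpy as np

    rng = np.random.default_rng(3)
    mu, T = np.array([0.3, 0.5, 0.7]), 1000   # hidden arm means (simulation only)
    counts = np.ones(len(mu))                 # one forced initial pull per arm
    means = np.array([float(rng.random() < m) for m in mu])

    for _ in range(T):
        k = int(np.argmax(means))             # always exploit the empirical best arm
        r = float(rng.random() < mu[k])
        counts[k] += 1
        means[k] += (r - means[k]) / counts[k]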
Couetoux, Adrien. "Monte Carlo Tree Search for Continuous and Stochastic Sequential Decision Making Problems." Thesis, Paris 11, 2013. http://www.theses.fr/2013PA112192.
In this thesis, we study sequential decision making problems, with a focus on the unit commitment problem. Traditionally solved by dynamic programming methods, this problem is still a challenge, due to its high dimension and to the sacrifices made on the accuracy of the model to apply state-of-the-art methods. We investigate the applicability of Monte Carlo Tree Search (MCTS) methods for this problem, and for other single-player, stochastic, continuous sequential decision making problems. We started by extending traditional finite-state MCTS to continuous domains, with a method called Double Progressive Widening (DPW). This method relies on two hyperparameters that determine the ratio between width and depth in the nodes of the tree. We developed a heuristic called Blind Value (BV) to improve the exploration of new actions, using the information from past simulations. We also extended the RAVE heuristic to continuous domains. Finally, we proposed two new ways of backing up information through the tree that improved the convergence speed considerably on two test cases. An important part of our work was to propose a way to mix MCTS with existing powerful heuristics, with the application to energy management in mind. We did so by proposing a framework that allows a good default policy to be learned by Direct Policy Search (DPS) and included in MCTS. The experimental results are very positive. To extend the reach of MCTS, we showed how it could be used to solve Partially Observable Markov Decision Processes, with an application to the game of Minesweeper, for which no consistent method had been proposed before. Finally, we used MCTS in a meta-bandit framework to solve energy investment problems: the investment decision was handled by classical bandit algorithms, while the evaluation of each investment was done by MCTS. The most important takeaway is that continuous MCTS makes almost no assumptions (besides the need for a generative model), is consistent, and can easily improve existing suboptimal solvers by using a method similar to what we proposed with DPS.
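Double Progressive Widening can be sketched in a few lines: a node visited n times may have at most ceil(C * n**alpha) children, which controls the width/depth trade-off mentioned above. The constants and the skeleton below are illustrative assumptions, not the thesis code:

    import math, random

    C, ALPHA = 1.0, 0.5   # the two DPW hyperparameters (values are illustrative)

    class Node:
        """MCTS node whose allowed number of children grows as ceil(C * visits**ALPHA)."""
        def __init__(self):
            self.visits = 0
            self.children = {}   # action -> child Node

        def select_action(self, sample_new_action):
            self.visits += 1
            if len(self.children) < math.ceil(C * self.visits ** ALPHA):
                a = sample_new_action()               # widen: add a fresh continuous action
                self.children[a] = Node()
                return a
            return random.choice(list(self.children)) # reuse a known action (UCB in full MCTS)

    root = Node()
    for _ in range(100):
        root.select_action(lambda: random.uniform(-1.0, 1.0))
    print(len(root.children), "children after 100 visits")  # grows roughly like sqrt(visits)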
Bouneffouf, Djallel. "DRARS, A Dynamic Risk-Aware Recommender System." PhD thesis, Institut National des Télécommunications, 2013. http://tel.archives-ouvertes.fr/tel-01026136.
Ameen, S. A. "Optimizing deep learning networks using multi-armed bandits." Thesis, University of Salford, 2017. http://usir.salford.ac.uk/45018/.
Paraschakis, Dimitris. "Algorithmic and Ethical Aspects of Recommender Systems in e-Commerce." Licentiate thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-7792.
Audibert, Jean-Yves. "PAC-Bayesian aggregation and multi-armed bandits." Habilitation à diriger des recherches, Université Paris-Est, 2010. http://tel.archives-ouvertes.fr/tel-00843972.
Gutowski, Nicolas. "Recommandation contextuelle de services : application à la recommandation d'évènements culturels dans la ville intelligente." Thesis, Angers, 2019. http://www.theses.fr/2019ANGE0030.
Nowadays, Multi-Armed Bandit algorithms for context-aware recommendation systems are extensively studied. In order to meet the challenges underlying this field of research, our work and contributions have been organised along three research directions: 1) recommendation systems; 2) Multi-Armed Bandit (MAB) and Contextual Multi-Armed Bandit (CMAB) algorithms; 3) context. The first part of our contributions focuses on MAB and CMAB algorithms for recommendation. It particularly addresses diversification of recommendations for improving individual accuracy. The second part focuses on context acquisition, on context reasoning for cultural event recommendation systems for Smart Cities, and on dynamic context enrichment for CMAB algorithms.
Galichet, Nicolas. "Contributions to Multi-Armed Bandits : Risk-Awareness and Sub-Sampling for Linear Contextual Bandits." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112242/document.
This thesis focuses on sequential decision making in unknown environments, and more particularly on the Multi-Armed Bandit (MAB) setting, defined by Lai and Robbins in the 50s. During the last decade, many theoretical and algorithmic studies have been aimed at the exploration vs exploitation tradeoff at the core of MABs, where Exploitation is biased toward the best options visited so far while Exploration is biased toward options rarely visited, to enforce the discovery of the true best choices. MAB applications range from medicine (the elicitation of the best prescriptions) to e-commerce (recommendations, advertisements) and optimal policies (e.g., in the energy domain). The contributions presented in this dissertation tackle the exploration vs exploitation dilemma from two angles. The first contribution is centered on risk avoidance. Exploration in unknown environments often has adverse effects: for instance, exploratory trajectories of a robot can entail physical damage to the robot or its environment. We thus define the exploration vs exploitation vs safety (EES) tradeoff, and propose three new algorithms addressing the EES dilemma. Firstly, and under strong assumptions, the MIN algorithm provides a robust behavior with guarantees of logarithmic regret, matching the state of the art with high robustness w.r.t. hyper-parameter setting (as opposed to, e.g., UCB (Auer 2002)). Secondly, the MARAB algorithm aims at optimizing the cumulative 'Conditional Value at Risk' (CVaR) rewards, a criterion originating in the economics domain, with excellent empirical performance compared to (Sani et al. 2012), though without any theoretical guarantees. Finally, the MARABOUT algorithm modifies the CVaR estimation and yields both theoretical guarantees and good empirical behavior. The second contribution concerns the contextual bandit setting, where additional information is provided to support decision making, such as the user's details in the content recommendation domain, or the patient history in the medical domain. The study focuses on how to make a choice between two arms with different numbers of samples. Traditionally, a confidence region is derived for each arm based on the associated samples, and the 'Optimism in front of the unknown' principle implements the choice of the arm with maximal upper confidence bound. An alternative, pioneered by (Baransi et al. 2014) and called BESA, proceeds instead by subsampling without replacement from the larger sample set. In this framework, we designed a contextual bandit algorithm based on sub-sampling without replacement, relaxing the (unrealistic) assumption that all arm reward distributions rely on the same parameter. The CL-BESA algorithm yields both theoretical guarantees of logarithmic regret and good empirical behavior.
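For intuition on the Conditional Value at Risk criterion targeted by MARAB and MARABOUT: the empirical CVaR of a sample is the mean of its worst alpha-fraction. A minimal sketch of the estimator only (the thesis's bandit indices built on top of it are not shown):

    import numpy as np

    def empirical_cvar(rewards, alpha=0.2):
        """Mean of the worst alpha-fraction of observed rewards: a conservative,
        risk-aware alternative to the plain empirical mean."""
        r = np.sort(np.asarray(rewards))
        k = max(1, int(np.ceil(alpha * len(r))))
        return r[:k].mean()

    rng = np.random.default_rng(4)
    print(empirical_cvar(rng.uniform(0, 1, 1000), alpha=0.2))  # well below the mean 0.5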
Cella, Leonardo. "EFFICIENCY AND REALISM IN STOCHASTIC BANDITS." Doctoral thesis, Università degli Studi di Milano, 2021. http://hdl.handle.net/2434/807862.
Liu, Fang. "Efficient Online Learning with Bandit Feedback." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1587680990430268.
Allesiardo, Robin. "Bandits Manchots sur Flux de Données Non Stationnaires." Thesis, Université Paris-Saclay (ComUE), 2016. http://www.theses.fr/2016SACLS334/document.
The multi-armed bandit is a framework allowing the study of the trade-off between exploration and exploitation under partial feedback. At each turn t ∈ [1, T] of the game, a player chooses an arm k_t from a set of K arms and receives a reward y_{k_t} drawn from a reward distribution D(µ_{k_t}) of mean µ_{k_t} and support [0, 1]. This is a challenging problem, as the player only observes the reward of the played arm and does not know what the reward would have been had she played another arm. Before each play, she is confronted with the dilemma between exploration and exploitation: exploring increases the confidence of the reward estimators, while exploiting increases the cumulative reward by playing the empirical best arm (under the assumption that the empirical best arm is indeed the actual best arm). In the first part of the thesis, we tackle the multi-armed bandit problem when reward distributions are non-stationary. Firstly, we study the case where, even if reward distributions change during the game, the best arm stays the same. Secondly, we study the case where the best arm changes during the game. The second part of the thesis tackles the contextual bandit problem, where the means of the reward distributions depend on the environment's current state. We study the use of neural networks and random forests in the case of contextual bandits. We then propose a meta-bandit-based approach for selecting online the best-performing expert during its learning.
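The protocol in this abstract translates directly into a simulation loop; the sketch below uses an ε-greedy player as a stand-in for the policies studied in the thesis (Bernoulli rewards and all constants are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(5)
    K, T, eps = 4, 10_000, 0.05
    mu = rng.uniform(size=K)                  # means of the K reward distributions (hidden)

    counts, means = np.zeros(K), np.zeros(K)
    for t in range(T):
        if t < K or rng.random() < eps:       # explore: refine the reward estimators
            k = t % K if t < K else int(rng.integers(K))
        else:                                 # exploit: play the empirical best arm
            k = int(np.argmax(means))
        y = float(rng.random() < mu[k])       # reward in [0, 1] with mean mu[k]
        counts[k] += 1
        means[k] += (y - means[k]) / counts[k]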
McInerney, Robert E. "Decision making under uncertainty." Thesis, University of Oxford, 2014. http://ora.ox.ac.uk/objects/uuid:a34e87ad-8330-42df-8ba6-d55f10529331.
Talebi, Mazraeh Shahi Mohammad Sadegh. "Online Combinatorial Optimization under Bandit Feedback." Licentiate thesis, KTH, Reglerteknik, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-181321.
Chafaa, Irched. "Machine learning for beam alignment in mmWave networks." Electronic Thesis or Diss., université Paris-Saclay, 2021. http://www.theses.fr/2021UPASG044.
To cope with the ever-increasing mobile data traffic, an envisioned solution for future wireless networks is to exploit the large available spectrum in the millimeter wave (mmWave) band. However, communicating at these high frequencies is very challenging, as the transmitted signal suffers from strong attenuation, which leads to a limited propagation range and few multipath components (sparse mmWave channels). Hence, highly directional beams have to be employed to focus the signal energy towards the intended user and compensate for all those losses. Such beams need to be steered appropriately to guarantee a reliable communication link. This is the so-called beam alignment problem, where the beams of the transmitter and the receiver need to be constantly aligned. Moreover, beam alignment policies need to support device mobility and the unpredictable dynamics of the network, which result in significant signaling and training overhead affecting the overall performance. In the first part of the thesis, we formulate the beam alignment problem via the adversarial multi-armed bandit framework, which copes with arbitrary network dynamics, including non-stationary or adversarial components. We propose online and adaptive beam alignment policies relying only on one-bit feedback to steer the beams of both nodes of the communication link in a distributed manner. Building on the well-known exponential weights algorithm (EXP3) and exploiting the sparse nature of mmWave channels, we propose a modified policy (MEXP3) that comes with optimal theoretical guarantees in terms of asymptotic regret. Moreover, for finite horizons, our regret upper bound is tighter than that of the original EXP3, suggesting better performance in practice. We then introduce an additional modification that accounts for the temporal correlation between successive beams and propose another beam alignment policy (NBT-MEXP3). In the second part of the thesis, deep learning tools are investigated to select mmWave beams in an access point -- user link. We leverage unsupervised deep learning to exploit the channel knowledge at sub-6 GHz and predict beamforming vectors in the mmWave band; this complex channel-to-beam mapping is learned from data drawn from the DeepMIMO dataset, without ground truth. We also show how to choose an optimal size of our neural network depending on the number of transmit and receive antennas at the access point. Furthermore, we investigate the impact of training data availability and introduce a federated learning (FL) approach to predict the beams of multiple links by sharing only the parameters of the locally trained neural networks (and not the local data). We investigate both synchronous and asynchronous FL methods. Our numerical simulations show the high potential of our approach, especially when the locally available data is scarce or imperfect (noisy). Finally, we compare our proposed deep learning methods with the reinforcement learning methods derived in the first part. Simulations show that choosing an appropriate beam steering method depends on the target application and is a tradeoff between rate performance and computational complexity.
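The EXP3-style update at the heart of MEXP3 can be pictured with a minimal sketch; the beam count, success rates, and learning rate eta are illustrative, and the uniform-exploration mixing of full EXP3 as well as the thesis's sparsity and correlation modifications are omitted:

    import numpy as np

    rng = np.random.default_rng(7)
    K, T, eta = 8, 5000, 0.05                 # K candidate beams (assumed size)
    weights = np.zeros(K)                     # log-weights over beams

    for t in range(T):
        p = np.exp(weights - weights.max())   # sampling distribution (softmax of log-weights)
        p /= p.sum()
        k = rng.choice(K, p=p)
        r = float(rng.random() < 0.5 + 0.4 * (k == 3))  # one-bit feedback (ACK/NACK), beam 3 is best
        weights[k] += eta * r / p[k]          # importance-weighted gain update (EXP3)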
Wilhelmi, Roca Francesc. "Towards spatial reuse in future wireless local area networks: a sequential learning approach." Doctoral thesis, Universitat Pompeu Fabra, 2020. http://hdl.handle.net/10803/669970.
The spatial reuse (SR) operation is gaining momentum in the latest family of IEEE 802.11 standards because of the overwhelming requirements posed by next-generation wireless networks. In particular, the growing traffic demand and the number of concurrent devices compromise the efficiency of increasingly crowded wireless local area networks (WLANs) and call their decentralized nature into question. The SR operation, initially introduced by the IEEE 802.11ax-2021 amendment and later studied in IEEE 802.11be-2024, aims to increase the number of concurrent transmissions in an overlapping basic service set (OBSS) by adjusting sensitivity and transmit power control, thereby improving spectral efficiency. Our study of the SR operation shows outstanding potential for increasing the number of simultaneous transmissions in crowded deployments, thus contributing to the development of low-latency next-generation applications. However, the potential benefits of SR are currently limited by the rigidity of the mechanism introduced for 11ax and by the lack of coordination among the BSSs that implement it. The SR operation is evolving toward coordinated schemes in which different BSSs cooperate. Coordination, however, entails communication and synchronization overhead, which impacts WLAN performance. Moreover, the coordinated scheme is incompatible with devices using earlier IEEE 802.11 versions, which could degrade the performance of existing networks. For these reasons, this thesis assesses the feasibility of decentralized mechanisms for SR and thoroughly analyzes the main impediments and shortcomings that may arise. Our goal is to shed light on the future shape of WLANs with respect to SR optimization, and on whether their decentralized nature should be preserved or an evolution toward coordinated and centralized deployments is preferable. To address SR in a decentralized way, we focus on Artificial Intelligence (AI) and propose using a class of sequential learning-based methods called Multi-Armed Bandits (MABs). The MAB framework fits the decentralized SR problem because it addresses the uncertainty caused by the simultaneous operation of multiple devices (i.e., a multi-player setting) and the resulting lack of information. MABs can cope with the complexity behind the spatial interactions among devices that result from modifying their sensitivity and transmit power. In this regard, our results indicate significant performance gains (up to 100%) in highly dense deployments. Nevertheless, applying multi-agent machine learning raises several concerns that may compromise the performance of network devices (definition of joint objectives, convergence horizon, scalability aspects, or non-stationarity). In addition, our study of multi-agent learning for SR covers infrastructure aspects for next-generation networks that integrate AI natively.
Alotaibi, Faisal F. "CONTENT TRADING AND PRIVACY-AWARE PRICING FOR EFFICIENT SPECTRUM UTILIZATION." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu1574698784641394.
Cayci, Semih. "Online Learning for Optimal Control of Communication and Computing Systems." The Ohio State University, 2020. http://rave.ohiolink.edu/etdc/view?acc_num=osu1595516470389826.
Selent, Douglas A. "Creating Systems and Applying Large-Scale Methods to Improve Student Remediation in Online Tutoring Systems in Real-time and at Scale." Digital WPI, 2017. https://digitalcommons.wpi.edu/etd-dissertations/308.
Degenne, Rémy. "Impact of structure on the design and analysis of bandit algorithms." Thesis, Université de Paris (2019-....), 2019. http://www.theses.fr/2019UNIP7179.
In this thesis, we study sequential learning problems called stochastic multi-armed bandits. First, a new bandit algorithm is presented. The analysis of that algorithm uses confidence intervals on the means of the arms' reward distributions, as most bandit proofs do. In a parametric setting, we derive concentration inequalities which quantify the deviation between the mean parameter of a distribution and its empirical estimate, in order to obtain confidence intervals. These inequalities are presented as bounds on the Kullback-Leibler divergence. Three extensions of the stochastic multi-armed bandit problem are then studied. First, we study the so-called combinatorial semi-bandit problem, in which an algorithm chooses a set of arms and the reward of each of these arms is observed. The minimal attainable regret then depends on the correlation between the arm distributions. We then consider a setting in which the observation mechanism changes. One source of difficulty of the bandit problem is the scarcity of information: only the arm pulled is observed. We show how to make efficient use of any supplementary free information (which does not influence the regret). Finally, a new family of algorithms is introduced to obtain both regret minimization and best arm identification guarantees. Each algorithm of the family realizes a trade-off between regret and the time needed to identify the best arm. In a second part, we study the so-called pure exploration problem, in which an algorithm is evaluated not on its regret but on the probability that it returns a wrong answer to a question about the arm distributions. We determine the complexity of such problems and design algorithms with performance close to that complexity.
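The KL-divergence confidence bounds described here can be made concrete for Bernoulli rewards: the upper confidence bound is the largest mean q whose divergence from the empirical mean is still plausible given n samples. A minimal bisection sketch (the exploration level log(t) and all constants are illustrative, not the thesis's exact bounds):

    import numpy as np

    def kl_bernoulli(p, q):
        """KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability."""
        eps = 1e-12
        p, q = min(max(p, eps), 1 - eps), min(max(q, eps), 1 - eps)
        return p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))

    def kl_ucb(mean, n, t, iters=30):
        """Largest q >= mean with n * KL(mean, q) <= log(t), found by bisection."""
        lo, hi = mean, 1.0
        for _ in range(iters):
            mid = (lo + hi) / 2
            if n * kl_bernoulli(mean, mid) <= np.log(t):
                lo = mid
            else:
                hi = mid
        return lo

    print(kl_ucb(0.4, n=100, t=1000))  # optimistic estimate shrinks toward 0.4 as n grows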
Jouini, Wassim. "Contribution to learning and decision making under uncertainty for Cognitive Radio." Thesis, Supélec, 2012. http://www.theses.fr/2012SUPL0010/document.
During the last century, most of the meaningful frequency bands were licensed to emerging wireless applications. Because of the static model of frequency allocation, the growing number of spectrum-demanding services led to spectrum scarcity. However, recent series of measurements of spectrum utilization showed that the different frequency bands were underutilized (sometimes even unoccupied), and thus that the scarcity of the spectrum resource is virtual and due only to the static allocation of the different bands to specific wireless services. Moreover, the underutilization of the spectrum resource varies on different scales in time and space, offering many opportunities for an unlicensed user or network to access the spectrum. Cognitive Radio (CR) and Opportunistic Spectrum Access (OSA) were introduced as possible solutions to alleviate the spectrum scarcity issue. In this dissertation, we aim at enabling CR equipment to autonomously exploit communication opportunities found in its vicinity. For that purpose, we suggest decision making mechanisms designed and/or adapted to answer CR-related problems in general and, more specifically, OSA-related scenarios. We argue that OSA scenarios can be modeled as Multi-Armed Bandit (MAB) problems. As a matter of fact, within OSA contexts, CR equipment is assumed to have no prior knowledge of its environment, and acquiring the necessary information relies on a sequential interaction between the CR equipment and its environment. The CR equipment is thus modeled as a cognitive agent whose purpose is to learn while providing an improving service to its user. First, we analyze the performance of the UCB1 algorithm when dealing with OSA problems with imperfect sensing. More specifically, we show that UCB1 can efficiently cope with sensing errors. We prove its convergence to the optimal channel and quantify its loss of performance compared to the case with perfect sensing. Second, we combine the UCB1 algorithm with collaborative and coordination mechanisms to model a secondary network (i.e., several SUs). We show that within this complex scenario, a coordinated learning mechanism can lead to efficient secondary networks. These scenarios assume that a SU can efficiently detect incumbent users' activity while having no prior knowledge of their characteristics. Usually, energy detection is suggested as a possible approach to handle this task. Unfortunately, energy detection is known to perform poorly when dealing with uncertainty. Consequently, in this Ph.D. we revisit the problem of energy detection limits under uncertainty. We present new results on its performance as well as its limits when the noise level is uncertain and the uncertainty is modeled by a log-normal distribution (as suggested by Alexander Sonnenschein and Philip M. Fishman in 1992). Within OSA contexts, we address a final problem where a sensor aims at quantifying the quality of a channel in fading environments. In such contexts, UCB1 algorithms seem to fail. Consequently, we design a new algorithm called Multiplicative UCB (MUCB) and prove its convergence. Moreover, we prove that MUCB algorithms are order optimal (i.e., the order of their learning rate is optimal). This last contribution goes beyond CR and OSA, as MUCB algorithms are introduced and solved within a general MAB framework.
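A minimal sketch of UCB1 channel selection under imperfect sensing, the first scenario studied in this thesis; the error model (a symmetric sensing flip with probability p_err) and all constants are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(6)
    K, T, p_err = 5, 20_000, 0.1   # p_err: assumed sensing error probability
    mu = rng.uniform(0.3, 0.9, K)  # channel availability probabilities (unknown)

    counts = np.ones(K)            # one fake pull per arm to initialise
    means = np.full(K, 0.5)

    for t in range(1, T + 1):
        k = int(np.argmax(means + np.sqrt(2 * np.log(t) / counts)))  # UCB1 index
        free = rng.random() < mu[k]
        sensed = free if rng.random() > p_err else not free          # imperfect sensing flips
        r = float(sensed)
        counts[k] += 1
        means[k] += (r - means[k]) / counts[k]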
Collet, Timothé. "Méthodes optimistes d’apprentissage actif pour la classification." Thesis, Université de Lorraine, 2016. http://www.theses.fr/2016LORR0084/document.
A classification problem makes use of a training set consisting of data labeled by an oracle. The larger the training set, the better the performance. However, querying the oracle may be costly. The goal of active learning is thus to minimize the number of requests to the oracle while achieving the best performance. To do so, the data presented to the oracle must be carefully selected among a large number of unlabeled instances acquired at no cost. However, the true profitability of labeling a particular instance may not be known perfectly. It can therefore be estimated, along with a measure of uncertainty. To increase the precision of this estimate, we need to label more data. Thus, there is a dilemma between labeling data in order to increase the performance of the classifier and labeling data to better know how to select data. This dilemma is well studied in the context of finite-budget optimization under the name of the exploration versus exploitation dilemma. The most famous solutions use the principle of Optimism in the Face of Uncertainty. In this thesis, we show that it is possible to adapt this principle to the active learning problem for classification. Several algorithms have been developed for classifiers of increasing complexity, each of them using the principle of Optimism in the Face of Uncertainty, and their performances have been evaluated empirically.
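One way to picture optimism-driven active learning (a generic illustration under simplified assumptions, not the thesis's algorithms): treat regions of the instance space as arms, maintain optimistic estimates of their error rates, and query labels where misclassification looks most likely:

    import numpy as np

    rng = np.random.default_rng(8)
    n_regions, budget = 5, 200
    err = rng.uniform(0.05, 0.4, n_regions)   # true error rate per region (hidden)

    counts, means = np.ones(n_regions), np.full(n_regions, 0.5)
    for t in range(1, budget + 1):
        ucb = means + np.sqrt(np.log(t) / (2 * counts))  # optimistic error estimate
        g = int(np.argmax(ucb))               # query where labeling looks most profitable
        label_was_wrong = float(rng.random() < err[g])
        counts[g] += 1
        means[g] += (label_was_wrong - means[g]) / counts[g]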
Guillou, Frédéric. "On recommendation systems in a sequential context." Thesis, Lille 3, 2016. http://www.theses.fr/2016LIL30041/document.
This thesis is dedicated to the study of recommendation systems in a sequential setting, where the feedback given by users on items arrives in the system one piece after another. After each feedback, the system has to integrate it and try to improve future recommendations. Many techniques and evaluation methods have already been proposed to study the recommendation problem. Despite that, such a sequential setting, which is more realistic and closer to the evaluation of a real recommendation system, has surprisingly been left aside. In a sequential context, recommendation techniques need to take into consideration several aspects which are not visible in a fixed setting. The first one is the exploration-exploitation dilemma: the model making recommendations needs to find a good balance between gathering information about users' tastes or items through exploratory recommendation steps, and exploiting its current knowledge of the users and items to try to maximize the feedback received. We highlight the importance of this point through a first evaluation study and propose a simple yet efficient approach to make effective recommendations, based on Matrix Factorization and Multi-Armed Bandit algorithms. The second aspect emphasized by the sequential context appears when a list of items is recommended to the user instead of a single item. In such a case, the feedback given by the user includes two parts: the explicit feedback, i.e. the rating, but also the implicit feedback given by clicking (or not clicking) on other items of the list. By integrating both kinds of feedback into a Matrix Factorization model, we propose an approach that can suggest better-ranked lists of items, and we evaluate it in a particular setting.
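A minimal sketch of the kind of Matrix Factorization plus bandit hybrid described here: an ε-greedy choice over predicted ratings combined with an SGD factorization update, in a simulated setting. All sizes, rates, and the reward model are illustrative assumptions, not the thesis's approach:

    import numpy as np

    rng = np.random.default_rng(9)
    n_users, n_items, d, lr, eps = 50, 100, 8, 0.05, 0.1
    P = rng.normal(0, 0.1, (n_users, d))      # learned user factors
    Q = rng.normal(0, 0.1, (n_items, d))      # learned item factors
    tP = rng.normal(0, 1, (n_users, d))       # hidden true factors (simulation only)
    tQ = rng.normal(0, 1, (n_items, d))

    for t in range(20_000):
        u = int(rng.integers(n_users))
        scores = P[u] @ Q.T                   # predicted ratings for all items
        i = int(rng.integers(n_items)) if rng.random() < eps else int(np.argmax(scores))
        r = tP[u] @ tQ[i] + 0.1 * rng.normal()  # observed feedback on the chosen item
        e = r - P[u] @ Q[i]
        pu = P[u].copy()
        P[u] += lr * e * Q[i]                 # SGD matrix-factorization step
        Q[i] += lr * e * pu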
Ray Chowdhury, Sayak. "Reinforcement Learning in Large and Structured Environments." Thesis, 2021. https://etd.iisc.ac.in/handle/2005/5744.
Lattimore, Finnian Rachel. "Learning how to act: making good decisions with machine learning." PhD thesis, 2017. http://hdl.handle.net/1885/144602.
Palanisamy Sundar, Agnideven. "Learning-based Attack and Defense on Recommender Systems." Thesis, 2021. http://dx.doi.org/10.7912/C2/65.
The internet is home to massive volumes of valuable data constantly being created, making it difficult for users to find information relevant to them. In recent times, online users have been relying on the recommendations made by websites to narrow down the options. Online reviews have also become an increasingly important factor in the final choice of a customer. Unfortunately, attackers have found ways to manipulate both reviews and recommendations to mislead users. A recommendation system is a special type of information filtering system adapted by online vendors to provide suggestions to their customers based on their requirements. Collaborative filtering is one of the most widely used recommendation systems; unfortunately, it is prone to shilling/profile injection attacks. Such attacks alter the recommendation process to promote or demote a particular product. On the other hand, many spammers write deceptive reviews to change the credibility of a product/service. This work aims to address these issues by treating the review manipulation and shilling attack scenarios independently. For the shilling attacks, we build an efficient Reinforcement Learning-based shilling attack method. This method reduces the uncertainty associated with the item selection process and finds the most optimal items to enhance attack reach while treating the recommender system as a black box. Such practical online attacks open new avenues for research in building more robust recommender systems. When it comes to review manipulations, we introduce a method that uses a deep structure embedding approach, preserving highly nonlinear structural information and the dynamic aspects of user reviews, to identify and cluster spam users. It is worth mentioning that, in experiments with real datasets, our method captures about 92% of all spam reviewers using an unsupervised learning approach.
Full text"Sample-efficient learning algorithms for multi-armed bandits and tensor completion: 基於有限樣本的機器學習算法." 2015. http://repository.lib.cuhk.edu.hk/en/item/cuhk-1291960.
Full textThesis Ph.D. Chinese University of Hong Kong 2015.
Includes bibliographical references (leaves 211-228).
Abstracts also in Chinese.
Title from PDF title page (viewed on 07, December, 2016).
Chen, Shouyuan.
Narayanaprasad, Karthik Periyapattana. "Sequential Controlled Sensing to Detect an Anomalous Process." Thesis, 2021. https://etd.iisc.ac.in/handle/2005/5514.