A selection of scholarly literature on the topic "Upper Confidence Bound"

Format your citation in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, dissertations, conference papers, and other scholarly sources on the topic "Upper Confidence Bound".

Next to each work in the list of references there is an "Add to bibliography" button. Use it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication in .pdf format and read its abstract online, whenever these are available in the metadata.

Journal articles on the topic "Upper Confidence Bound"

1

Francisco-Valencia, Iván, José Raymundo Marcial-Romero, and Rosa María Valdovinos-Rosas. "Upper Confidence Bound o Upper Cofidence Bound Tuned para General Game Playing: Un estudio empírico." Research in Computing Science 147, no. 8 (December 31, 2018): 301–9. http://dx.doi.org/10.13053/rcs-147-8-23.

2

Saffidine, Abdallah, Tristan Cazenave, and Jean Méhat. "UCD : Upper confidence bound for rooted directed acyclic graphs." Knowledge-Based Systems 34 (October 2012): 26–33. http://dx.doi.org/10.1016/j.knosys.2011.11.014.

3

Bollobás, Béla, and Alan Stacey. "Approximate upper bounds for the critical probability of oriented percolation in two dimensions based on rapidly mixing Markov chains." Journal of Applied Probability 34, no. 4 (December 1997): 859–67. http://dx.doi.org/10.2307/3215002.

Abstract:
We develop a technique for establishing statistical tests with precise confidence levels for upper bounds on the critical probability in oriented percolation. We use it to give p_c < 0.647 with a 99.999967% confidence. As Monte Carlo simulations suggest that p_c ≈ 0.6445, this bound is fairly tight.
4

Bollobás, Béla, and Alan Stacey. "Approximate upper bounds for the critical probability of oriented percolation in two dimensions based on rapidly mixing Markov chains." Journal of Applied Probability 34, no. 4 (December 1997): 859–67. http://dx.doi.org/10.1017/s0021900200101573.

Abstract:
We develop a technique for establishing statistical tests with precise confidence levels for upper bounds on the critical probability in oriented percolation. We use it to give p_c < 0.647 with a 99.999967% confidence. As Monte Carlo simulations suggest that p_c ≈ 0.6445, this bound is fairly tight.
5

Cruse, Thomas A., and Jeffrey M. Brown. "Confidence Interval Simulation for Systems of Random Variables." Journal of Engineering for Gas Turbines and Power 129, no. 3 (October 11, 2005): 836–42. http://dx.doi.org/10.1115/1.2718217.

Abstract:
Bayesian network models are seen as important tools in probabilistic design assessment for complex systems. Such network models for system reliability analysis provide a single probability of failure value whether the experimental data used to model the random variables in the problem are perfectly known or derive from limited experimental data. The values of the probability of failure for each of those two cases are not the same, of course, but the point is that there is no way to derive a Bayesian type of confidence interval from such reliability network models. Bayesian confidence (or belief) intervals for a probability of failure are needed for complex system problems in order to extract information on which random variables are dominant, not just for the expected probability of failure but also for some upper bound, such as for a 95% confidence upper bound. We believe that such confidence bounds on the probability of failure will be needed for certifying turbine engine components and systems based on probabilistic design methods. This paper reports on a proposed use of a two-step Bayesian network modeling strategy that provides a full cumulative distribution function for the probability of failure, conditioned by the experimental evidence for the selected random variables. The example is based on a hypothetical high-cycle fatigue design problem for a transport aircraft engine application.
6

Ottens, Brammert, Christos Dimitrakakis, and Boi Faltings. "DUCT: An Upper Confidence Bound Approach to Distributed Constraint Optimization Problems." Proceedings of the AAAI Conference on Artificial Intelligence 26, no. 1 (September 20, 2021): 528–34. http://dx.doi.org/10.1609/aaai.v26i1.8129.

Abstract:
The Upper Confidence Bounds (UCB) algorithm is a well-known near-optimal strategy for the stochastic multi-armed bandit problem. Its extensions to trees, such as the Upper Confidence Tree (UCT) algorithm, have resulted in good solutions to the problem of Go. This paper introduces DUCT, a distributed algorithm inspired by UCT, for solving Distributed Constraint Optimization Problems (DCOP). Bounds on the solution quality are provided, and experiments show that, compared to existing DCOP approaches, DUCT is able to solve very large problems much more efficiently, or to find significantly higher quality solutions.
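The abstract above invokes the classical Upper Confidence Bound index for stochastic multi-armed bandits. As a generic illustration of that index only (not the DUCT algorithm of the paper), a minimal UCB1 loop in Python might look as follows; the Bernoulli reward model and the exploration constant c are assumptions made for the example.

import math
import random

def ucb1(n_arms, pull, horizon, c=2.0):
    # Play the arm with the highest empirical mean plus exploration bonus.
    counts = [0] * n_arms        # how many times each arm has been played
    sums = [0.0] * n_arms        # cumulative reward per arm
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1          # play every arm once to initialise the indices
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(c * math.log(t) / counts[a]))
        r = pull(arm)            # observe a stochastic reward in [0, 1]
        counts[arm] += 1
        sums[arm] += r
        total += r
    return total

# Example: three Bernoulli arms with unknown means.
means = [0.3, 0.5, 0.7]
print(ucb1(3, lambda a: float(random.random() < means[a]), horizon=10000))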
7

Radović, Nevena, and Milena Erceg. "Hardware implementation of the upper confidence-bound algorithm for reinforcement learning." Computers & Electrical Engineering 96 (December 2021): 107537. http://dx.doi.org/10.1016/j.compeleceng.2021.107537.

8

Melesko, Jaroslav, and Vitalij Novickij. "Computer Adaptive Testing Using Upper-Confidence Bound Algorithm for Formative Assessment." Applied Sciences 9, no. 20 (October 14, 2019): 4303. http://dx.doi.org/10.3390/app9204303.

Abstract:
There is strong support for including formative assessment in learning processes, with the main emphasis on corrective feedback for students. However, traditional testing and Computer Adaptive Testing can be problematic to implement in the classroom. Paper-based tests are logistically inconvenient and hard to personalize, and thus must be longer to accurately assess every student in the classroom. Computer Adaptive Testing can mitigate these problems by making use of Multi-Dimensional Item Response Theory, at the cost of introducing several new problems, the most problematic of which are the greater test-creation complexity, owing to the need for question-pool calibration, and the debatable premise that different questions measure one common latent trait. In this paper, a new approach of modelling formative assessment as a multi-armed bandit problem is proposed and solved using the Upper-Confidence Bound algorithm. The method, in combination with the e-learning paradigm, has the potential to mitigate such problems as question-item calibration and lengthy tests, while providing accurate formative assessment feedback for students. A number of simulation and empirical-data experiments (with 104 students) are carried out to explore and measure the potential of this application, with positive results.
9

Dzhoha, Andrii, and Iryna Rozora. "Beta Upper Confidence Bound Policy for the Design of Clinical Trials." Austrian Journal of Statistics 52, SI (August 15, 2023): 26–39. http://dx.doi.org/10.17713/ajs.v52isi.1751.

Abstract:
The multi-armed bandit problem is a classic example of the exploration-exploitation trade-off well suited to model sequential resource allocation under uncertainty. One of its typical motivating applications is the adaptive designs in clinical trials which modify the trial's course in accordance with the pre-specified objective by utilizing results accumulating in the trial. Since the response to a procedure in clinical trials is not immediate, the multi-armed bandit policies require adaptation to delays to retain their theoretical guarantees. In this work, we show the importance of such adaptation by evaluating policies using the publicly available dataset The International Stroke Trial, a randomized trial of aspirin and subcutaneous heparin among 19,435 patients with acute ischaemic stroke. In addition to adapted policies, we analyze the Upper Confidence Bound policy with the beta feedback to mitigate delays when the certainty evidence of successful treatment is available in a relatively short-term period after the procedure.
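As a rough sketch of the kind of Beta upper-confidence index described in the abstract above, the fragment below scores each treatment arm by a high quantile of a Beta posterior over its success probability; the uniform Beta(1, 1) prior, the quantile schedule, and the convention that delayed (not yet observed) outcomes are simply left out of the counts are assumptions made for illustration, not the authors' exact policy.

from scipy.stats import beta

def beta_ucb_index(successes, failures, t):
    # Upper-confidence index: a high quantile of the Beta posterior over the success rate.
    # Only outcomes already observed (i.e. not still delayed) are counted.
    if successes + failures == 0:
        return 1.0                        # force initial exploration of untried arms
    level = 1.0 - 1.0 / (t + 1)           # confidence level grows with the round index t
    return beta.ppf(level, successes + 1, failures + 1)

# Example: choose between two arms at round t = 200 given the outcomes observed so far.
arms = {"treatment": (60, 40), "control": (55, 45)}
t = 200
print(max(arms, key=lambda a: beta_ucb_index(*arms[a], t)))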
10

Wallinga, J., D. Lévy-Bruhl, N. J. Gay, and C. H. Wachmann. "Estimation of measles reproduction ratios and prospects for elimination of measles by vaccination in some Western European countries." Epidemiology and Infection 127, no. 2 (October 2001): 281–95. http://dx.doi.org/10.1017/s095026880100601x.

Abstract:
The objective of this study is to estimate the measles reproduction ratio for eight Western European vaccination programmes. Because many plausible age-structured transmission patterns result in a similar description of the observations, it is not possible to estimate a unique value of the reproduction ratio. A method is developed to estimate bounds and confidence intervals for plausible values of the reproduction ratios using maximum likelihood methods. Lower and upper bounds for plausible values of the basic reproduction ratio are estimated to be 7·17 (95% CI 7·14–7·20) and 45·41 (95% CI 9·77–49·57), corresponding to lower and upper bounds on critical vaccine coverage of 86·6% and 98·1%. Of the eight evaluated vaccination programmes, four have vaccine coverage below the lower bound and allow measles to persist, and four have vaccine coverage at the upper bound and may eventually eliminate measles.

Dissertations and theses on the topic "Upper Confidence Bound"

1

Modi, Navikkumar. "Machine Learning and Statistical Decision Making for Green Radio." Thesis, CentraleSupélec, 2017. http://www.theses.fr/2017SUPL0002/document.

Abstract:
Future cellular network technologies are targeted at delivering self-organizable and ultra-high capacity networks, while reducing their energy consumption. This thesis studies intelligent spectrum and topology management through cognitive radio techniques to improve the capacity density and Quality of Service (QoS) as well as to reduce the cooperation overhead and energy consumption. This thesis investigates how reinforcement learning can be used to improve the performance of a cognitive radio system. In this dissertation, we deal with the problem of opportunistic spectrum access in infrastructureless cognitive networks. We assume that there is no information exchange between users, and they have no knowledge of channel statistics and other users' actions. This particular problem is modeled within a multi-user restless Markov multi-armed bandit framework, in which multiple users collect a priori unknown rewards by selecting a channel. The main contribution of the dissertation is to propose a learning policy for distributed users that takes into account not only the availability criterion of a band but also a quality metric linked to the interference power from the neighboring cells experienced on the sensed band. We also prove that the policy, named distributed restless QoS-UCB (RQoS-UCB), achieves at most logarithmic order regret. Moreover, numerical studies show that the performance of the cognitive radio system can be significantly enhanced by utilizing the proposed learning policies, since the cognitive devices are able to identify the appropriate resources more efficiently. This dissertation also introduces reinforcement learning and transfer learning frameworks to improve the energy efficiency (EE) of heterogeneous cellular networks. Specifically, we formulate and solve an energy efficiency maximization problem pertaining to dynamic base station (BS) switching operation, which is identified as a combinatorial learning problem within the restless Markov multi-armed bandit framework. Furthermore, dynamic topology management using the previously defined algorithm, RQoS-UCB, is introduced to intelligently control the working modes of BSs, based on traffic load and capacity in multiple cells. Moreover, to cope with initial reward loss and to speed up the learning process, a transfer RQoS-UCB policy, which benefits from the transferred knowledge observed in historical periods, is proposed and provably converges. The proposed dynamic BS switching operation is then demonstrated to reduce the number of activated BSs while maintaining an adequate QoS. Extensive numerical simulations demonstrate that transfer learning significantly reduces the QoS fluctuation during traffic variation, contributes to a performance jump-start, and presents significant EE improvement under various practical traffic load profiles. Finally, a proof-of-concept is developed to verify the performance of the proposed learning policies in a real radio environment and on a real measurement database in the HF band. Results show that the proposed multi-armed bandit learning policies, which use dual-criterion (availability and quality) optimization for opportunistic spectrum access, are not only superior in terms of spectrum utilization but also energy efficient.
2

Hadiji, Hédi. "On some adaptivity questions in stochastic multi-armed bandits." Thesis, université Paris-Saclay, 2020. http://www.theses.fr/2020UPASM021.

Abstract:
The main topics addressed in this thesis lie in the general domain of sequential learning, and in particular stochastic multi-armed bandits. The thesis is divided into four chapters and an introduction. In the first part of the main body of the thesis, we design a new algorithm achieving, simultaneously, distribution-dependent and distribution-free optimal guarantees. The next two chapters are devoted to adaptivity questions. First, in the context of continuum-armed bandits, we present a new algorithm which, for the first time, does not require knowledge of the regularity of the bandit problem it is facing. Then, we study the issue of adapting to the unknown support of the payoffs in bounded K-armed bandits. We provide a procedure that (almost) obtains the same guarantees as if it were given the support in advance. In the final chapter, we study a slightly different bandit setting, designed to enforce diversity-preserving conditions on the strategies. We show that the optimal regret in this setting grows at a rate that is quite different from that of the traditional bandit setting. In particular, we observe that bounded regret is possible under some specific hypotheses.
3

Iacob, Alexandra. "Scalable Model-Free Algorithms for Influencer Marketing." Electronic Thesis or Diss., université Paris-Saclay, 2023. http://www.theses.fr/2023UPASG012.

Abstract:
Motivated by scenarios of information diffusion and advertising in social media, we study an influence maximization (IM) problem in which little is assumed to be known about the diffusion network or about the model that determines how information may propagate. In such a highly uncertain environment, one can focus on multi-round diffusion campaigns, with the objective of maximizing the number of distinct users that are influenced or activated, starting from a known base of few influential nodes. During a campaign, spread seeds are selected sequentially at consecutive rounds, and feedback is collected in the form of the activated nodes at each round. A round's impact (reward) is then quantified as the number of newly activated nodes. Overall, one must maximize the campaign's total spread, as the sum of the rounds' rewards. We consider two sub-classes of IM, CIMP and ECIMP, where (i) the reward of a given round of an ongoing campaign consists of only the new activations (not observed at previous rounds within that campaign), (ii) the round's context and the historical data from previous rounds can be exploited to learn the best policy, and (iii) ECIMP is CIMP repeated multiple times, offering the possibility of learning from previous campaigns as well. This problem is directly motivated by the real-world scenarios of information diffusion in influencer marketing, where (i) only a target user's first / unique activation is of interest (and this activation will persist as an acquired, latent one throughout the campaign), and (ii) valuable side-information is available to the learning agent. In this setting, an explore-exploit approach could be used to learn the key underlying diffusion parameters while running the campaigns. For CIMP, we describe and compare two methods of contextual multi-armed bandits, with upper-confidence bounds on the remaining potential of influencers, one using a generalized linear model and the Good-Turing estimator for remaining potential (glmucb), and another one that directly adapts the LinUCB algorithm to our setting (linucb). For ECIMP, we propose the algorithm lgtlsvi, which implements the optimism-in-the-face-of-uncertainty principle for episodic reinforcement learning with linear approximation. The learning agent estimates for each seed node its remaining potential with a Good-Turing estimator, modified by an estimated Q-function. We show that they outperform baseline methods using state-of-the-art ideas, on synthetic and real-world data, while at the same time exhibiting different and complementary behavior, depending on the scenarios in which they are deployed.
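The abstract above mentions a Good-Turing estimator of an influencer's remaining potential. A minimal sketch of that idea (the fraction of activations observed exactly once estimates the probability mass of still-unseen nodes) could look as follows; it is illustrative only and not code from the thesis.

from collections import Counter

def good_turing_remaining(activations):
    # `activations` lists the node ids this seed has activated so far, possibly with repeats.
    # The Good-Turing estimate of the chance that the next activation is a new node is N1 / N,
    # where N1 counts nodes seen exactly once and N is the total number of activations.
    n = len(activations)
    if n == 0:
        return 1.0                        # nothing observed yet: full remaining potential
    n1 = sum(1 for c in Counter(activations).values() if c == 1)
    return n1 / n

print(good_turing_remaining(["u1", "u2", "u2", "u3", "u4", "u4", "u4"]))  # 2/7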
4

Piette, Eric. "Une nouvelle approche au General Game Playing dirigée par les contraintes." Thesis, Artois, 2016. http://www.theses.fr/2016ARTO0401/document.

Abstract:
The ability of a computer program to effectively play any strategic game, often referred to as General Game Playing (GGP), is a key challenge in AI. The GGP competitions, where any game is represented according to a set of logical rules in the Game Description Language (GDL), have led researchers to compare various approaches, including Monte Carlo methods, automatic construction of evaluation functions, logic programming, and answer set programming, through general game players. In this thesis, we offer a new approach driven by stochastic constraints. We first focus on a translation process from GDL to stochastic constraint networks (SCSP) in order to provide compact representations of strategic games and to model strategies. In a second part, we exploit a fragment of SCSP through an algorithm called MAC-UCB, coupling the MAC (Maintaining Arc Consistency) algorithm, used to solve each stage of the SCSP in turn, with the UCB (Upper Confidence Bound) policy for approximating the values of those strategies obtained at the last stage of the sequence. The efficiency of this technique over other GGP approaches is confirmed by WoodStock, which implements MAC-UCB and is the current leader of the GGP Continuous Tournament. Finally, in the last part, we propose an alternative approach to symmetry detection in stochastic games, inspired by constraint programming techniques. We demonstrate experimentally that MAC-UCB, coupled with our constraint-based symmetry detection approach, significantly outperforms the best approaches and made WoodStock the 2016 GGP champion.
5

Liao, Jhan-Yi, and 廖展逸. "Use Upper Confidence Bound for Tree in Chinese Chess." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/b8wbzp.

Abstract:
Master's thesis, National Dong Hwa University, Department of Computer Science and Information Engineering, 2008.
Chinese Chess is one of the oldest Chinese games. In artificial intelligence, many researchers study computer Chinese Chess, and there are many techniques, based on alpha-beta search, for obtaining the scores of the deepest board positions more quickly, e.g. internal iterative deepening, the history heuristic, and the killer heuristic. Our method differs from the traditional way of searching: it is based on the winning rate and, while searching, balances accumulated experience against testing other paths. This search style is similar to human thinking.
6

Chou, Cheng-Wei, and 周政緯. "Design and Implementation of a Computer GO Program based on UCT (Upper Confidence Bound for Tree Search)." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/04854200470817945378.

Abstract:
Master's thesis, Fu Jen Catholic University, Department of Computer Science and Information Engineering, 2008.
Computer Go, one of the fields related to artificial intelligence, has been an arduous challenge following chess and Chinese chess. UCT is a tree search algorithm based on the Monte Carlo method, and it is very effective when used for the global tree search of a computer Go program. JIMMY is a computer Go program whose main design principle is to emulate the way humans think. In this paper, we introduce how to combine JIMMY and UCT to increase the strength of JIMMY.
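Both theses above rely on UCT, i.e. the UCB rule applied inside Monte Carlo tree search. The standard child-selection step can be sketched as below; the exploration constant and the (wins, visits) bookkeeping are generic assumptions, not the exact programs the theses describe.

import math

def uct_select(children, exploration=1.4):
    # `children` is a list of (wins, visits) pairs for one node; unvisited children go first.
    parent_visits = sum(visits for _, visits in children)
    best, best_score = None, float("-inf")
    for i, (wins, visits) in enumerate(children):
        if visits == 0:
            return i                      # always expand an unvisited child first
        score = wins / visits + exploration * math.sqrt(math.log(parent_visits) / visits)
        if score > best_score:
            best, best_score = i, score
    return best

print(uct_select([(7, 10), (3, 4), (0, 0)]))  # returns 2: the unvisited child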
7

Siddartha, Y. R. "Learning Tournament Solutions from Preference-based Multi-Armed Bandits." Thesis, 2017. https://etd.iisc.ac.in/handle/2005/4698.

Abstract:
We consider the dueling bandits problem, a sequential decision task where the goal is to learn to pick 'good' arms out of an available pool by actively querying for and observing relative preferences between selected pairs of arms. The noisy observed preferences are assumed to be generated by a fixed but unknown stochastic preference model. Motivated by applications in information retrieval, e-commerce, crowdsourcing, etc., a number of bandit algorithms have been proposed in recent years for this task. These have mostly addressed restricted settings wherein the underlying preference model satisfies various structural assumptions, such as being based on a random utility function or a feature-space embedding, or satisfying transitivity or sparsity properties, or at least possessing a Condorcet winner: a single 'best' arm that is preferred to all others. In seeking to move beyond such restricted settings, there has been a recent shift towards alternative notions of 'good' arms (including Borda, Copeland and von Neumann winners). In this work, we extend the dueling bandits problem by adopting, as the desired target set of good (or 'winning') arms, a number of tournament solutions that have been proposed in social choice and voting theory literature as embodying principled and natural criteria for identifying good arms based on preference relations. We then propose a family of upper confidence bound (UCB) based dueling bandit algorithms that learn to play winning arms from several popular tournament solutions: the top cycle, uncovered set, Banks set and Copeland set. We derive these algorithms by first proposing a generic UCB-based framework algorithm that can be instantiated for different tournament solutions by means of appropriately designed selection procedures. We show sufficiency conditions for the resulting dueling bandit algorithms to satisfy distribution-dependent, horizon-free bounds on natural regret measures defined w.r.t. the target tournament solutions. In contrast to previous work, these bounds do not require restrictive structural assumptions on the preference model and hold for a range of different tournament solutions. We develop selection procedures that satisfy the sufficiency conditions for a number of popular tournament solutions, yielding the dueling bandit algorithms UCB-TC, UCB-UC, UCB-BA and UCB-CO for the top cycle, uncovered set, Banks set and the Copeland set respectively. The O(K^2 ln T / g^2) bounds we derive are optimal in their dependence on the time horizon T. We show that for all of these tournament solutions, the distribution-dependent 'margin' g is lower bounded by the separation or the relative advantage of top-cycle arms over non-top-cycle arms. While O(K ln T) bounds are known for Condorcet models, our O(K^2 ln T) bounds extend to more general models as well as other tournament solutions. We empirically validate these claims and evaluate the proposed algorithms, comparing them to the dueling bandit algorithms RUCB, SAVAGE and BTMB over synthetic and real-world preference models. We show that the UCB-TS algorithms perform competitively over models that possess a Condorcet winner, but out-perform the other algorithms over more general models that do not possess a Condorcet winner.
8

Chatterjee, Aritra. "A Study of Thompson Sampling Approach for the Sleeping Multi-Armed Bandit Problem." Thesis, 2017. http://etd.iisc.ac.in/handle/2005/3631.

Abstract:
The multi-armed bandit (MAB) problem provides a convenient abstraction for many online decision problems arising in modern applications including Internet display advertising, crowdsourcing, online procurement, smart grids, etc. Several variants of the MAB problem have been proposed to extend the basic model to a variety of practical and general settings. The sleeping multi-armed bandit (SMAB) problem is one such variant where the set of available arms varies with time. This study is focused on analyzing the efficacy of the Thompson Sampling algorithm for solving the SMAB problem. Any algorithm for the classical MAB problem is expected to choose one of K available arms (actions) in each of T consecutive rounds. Each choice of an arm generates a stochastic reward from an unknown but fixed distribution. The goal of the algorithm is to maximize the expected sum of rewards over the T rounds (or equivalently minimize the expected total regret), relative to the best fixed action in hindsight. In many real-world settings, however, not all arms may be available in any given round. For example, in Internet display advertising, some advertisers might choose to stay away from the auction due to budget constraints; in crowdsourcing, some workers may not be available at a given time due to timezone difference, etc. Such situations give rise to the sleeping MAB abstraction. In the literature, several upper confidence bound (UCB)-based approaches have been proposed and investigated for the SMAB problem. Our contribution is to investigate the efficacy of a Thomp-son Sampling-based approach. Our key finding is to establish a logarithmic regret bound, which non-trivially generalizes a similar bound known for this approach in the classical MAB setting. Our bound also matches (up to constants) the best-known lower bound for the SMAB problem. Furthermore, we show via detailed simulations, that the Thompson Sampling approach in fact outperforms the known algorithms for the SMAB problem.
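For contrast with the UCB-based policies mentioned in the abstract above, one round of Thompson Sampling for a sleeping Bernoulli bandit can be sketched as follows: a Beta posterior sample is drawn for each arm that happens to be available, and the highest sample is played. The Beta(1, 1) priors and the data structures are assumptions made for illustration.

import random

def sleeping_thompson_step(posteriors, available):
    # `posteriors` maps arm -> [alpha, beta] of its Beta posterior;
    # `available` is the set of arms that can be played this round.
    samples = {a: random.betavariate(*posteriors[a]) for a in available}
    return max(samples, key=samples.get)

def update(posteriors, arm, reward):
    # Bernoulli reward in {0, 1}: update the Beta posterior of the played arm.
    posteriors[arm][0] += reward
    posteriors[arm][1] += 1 - reward

# Example: three arms, of which only two are awake this round.
posteriors = {a: [1, 1] for a in ("a", "b", "c")}
arm = sleeping_thompson_step(posteriors, available={"a", "c"})
update(posteriors, arm, reward=1)
print(arm, posteriors)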
9

Chatterjee, Aritra. "A Study of Thompson Sampling Approach for the Sleeping Multi-Armed Bandit Problem." Thesis, 2017. http://etd.iisc.ernet.in/2005/3631.

Abstract:
The multi-armed bandit (MAB) problem provides a convenient abstraction for many online decision problems arising in modern applications including Internet display advertising, crowdsourcing, online procurement, smart grids, etc. Several variants of the MAB problem have been proposed to extend the basic model to a variety of practical and general settings. The sleeping multi-armed bandit (SMAB) problem is one such variant where the set of available arms varies with time. This study is focused on analyzing the efficacy of the Thompson Sampling algorithm for solving the SMAB problem. Any algorithm for the classical MAB problem is expected to choose one of K available arms (actions) in each of T consecutive rounds. Each choice of an arm generates a stochastic reward from an unknown but fixed distribution. The goal of the algorithm is to maximize the expected sum of rewards over the T rounds (or equivalently minimize the expected total regret), relative to the best fixed action in hindsight. In many real-world settings, however, not all arms may be available in any given round. For example, in Internet display advertising, some advertisers might choose to stay away from the auction due to budget constraints; in crowdsourcing, some workers may not be available at a given time due to timezone difference, etc. Such situations give rise to the sleeping MAB abstraction. In the literature, several upper confidence bound (UCB)-based approaches have been proposed and investigated for the SMAB problem. Our contribution is to investigate the efficacy of a Thomp-son Sampling-based approach. Our key finding is to establish a logarithmic regret bound, which non-trivially generalizes a similar bound known for this approach in the classical MAB setting. Our bound also matches (up to constants) the best-known lower bound for the SMAB problem. Furthermore, we show via detailed simulations, that the Thompson Sampling approach in fact outperforms the known algorithms for the SMAB problem.
10

方裕欽. "Research on Applicabilities and Improved Strategies of the Upper Confidence Bounds Applied to Trees Algorithm on Othello." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/85144592400294467104.


Book chapters on the topic "Upper Confidence Bound"

1

Drugan, Mădălina M. "Scalarized Lower Upper Confidence Bound Algorithm." In Lecture Notes in Computer Science, 229–35. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-19084-6_21.

2

Garivier, Aurélien, and Eric Moulines. "On Upper-Confidence Bound Policies for Switching Bandit Problems." In Lecture Notes in Computer Science, 174–88. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-24412-4_16.

3

Francisco-Valencia, Iván, José Raymundo Marcial-Romero, and Rosa María Valdovinos-Rosas. "Some Variations of Upper Confidence Bound for General Game Playing." In Lecture Notes in Computer Science, 68–79. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-21077-9_7.

4

Contal, Emile, David Buffoni, Alexandre Robicquet, and Nicolas Vayatis. "Parallel Gaussian Process Optimization with Upper Confidence Bound and Pure Exploration." In Advanced Information Systems Engineering, 225–40. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-40988-2_15.

5

Carpentier, Alexandra, Alessandro Lazaric, Mohammad Ghavamzadeh, Rémi Munos, and Peter Auer. "Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits." In Lecture Notes in Computer Science, 189–203. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011. http://dx.doi.org/10.1007/978-3-642-24412-4_17.

6

Roy, Kaushik, Qi Zhang, Manas Gaur, and Amit Sheth. "Knowledge Infused Policy Gradients with Upper Confidence Bound for Relational Bandits." In Machine Learning and Knowledge Discovery in Databases. Research Track, 35–50. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-86486-6_3.

7

Huang, Kuan-Hao, and Hsuan-Tien Lin. "Linear Upper Confidence Bound Algorithm for Contextual Bandit Problem with Piled Rewards." In Advances in Knowledge Discovery and Data Mining, 143–55. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-31750-2_12.

8

Gonçalves, Richard A., Carolina P. Almeida, and Aurora Pozo. "Upper Confidence Bound (UCB) Algorithms for Adaptive Operator Selection in MOEA/D." In Lecture Notes in Computer Science, 411–25. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-15934-8_28.

9

Liang, Yuan. "Fatigue-Aware Event-Participant Arrangement in Event-Based Social Networks: An Upper Confidence Bound Method." In Lecture Notes in Networks and Systems, 780–96. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-16078-3_54.

10

Wu, Lin, Ying Li, Chao Deng, Lei Chen, Meiyu Yuan, and Hong Jiang. "Implementation and Performance Evaluation of the Fully Enclosed Region Upper Confidence Bound Applied to Trees Algorithm." In Proceedings of the 4th International Conference on Computer Engineering and Networks, 163–69. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-11104-9_19.


Conference papers on the topic "Upper Confidence Bound"

1

Ma, Rui, Xvhong Zhou, Xiajing Wang, Zheng Zhang, Jinman Jiang, and Wei Huo. "pAFL: Adaptive Energy Allocation with Upper Confidence Bound." In ICCNS 2023: 2023 13th International Conference on Communication and Network Security. New York, NY, USA: ACM, 2023. http://dx.doi.org/10.1145/3638782.3638792.

2

Saffidine, Abdallah, Tristan Cazenave, and Jean Mehat. "UCD: Upper Confidence Bound for Rooted Directed Acyclic Graphs." In 2010 International Conference on Technologies and Applications of Artificial Intelligence (TAAI). IEEE, 2010. http://dx.doi.org/10.1109/taai.2010.79.

3

Melian-Gutierrez, Laura, Navikkumar Modi, Christophe Moy, Ivan Perez-Alvarez, Faouzi Bader, and Santiago Zazo. "Upper Confidence Bound learning approach for real HF measurements." In 2015 ICC - 2015 IEEE International Conference on Communications Workshops (ICC). IEEE, 2015. http://dx.doi.org/10.1109/iccw.2015.7247209.

4

Berk, Julian, Sunil Gupta, Santu Rana, and Svetha Venkatesh. "Randomised Gaussian Process Upper Confidence Bound for Bayesian Optimisation." In Twenty-Ninth International Joint Conference on Artificial Intelligence and Seventeenth Pacific Rim International Conference on Artificial Intelligence {IJCAI-PRICAI-20}. California: International Joint Conferences on Artificial Intelligence Organization, 2020. http://dx.doi.org/10.24963/ijcai.2020/316.

Abstract:
In order to improve the performance of Bayesian optimisation, we develop a modified Gaussian process upper confidence bound (GP-UCB) acquisition function. This is done by sampling the exploration-exploitation trade-off parameter from a distribution. We prove that this allows the expected trade-off parameter to be altered to better suit the problem without compromising a bound on the function's Bayesian regret. We also provide results showing that our method achieves better performance than GP-UCB in a range of real-world and synthetic problems.
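As an illustration of the acquisition rule summarized in the abstract above, the sketch below computes a GP-UCB score mu(x) + sqrt(kappa) * sigma(x) with the trade-off parameter kappa redrawn from a distribution at every call; the exponential sampling distribution and the scikit-learn surrogate are assumptions for this example, not necessarily the authors' exact choices.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def randomised_gp_ucb(gp, candidates, rng):
    # Return the index of the candidate point maximising a randomised GP-UCB score.
    mu, sigma = gp.predict(candidates, return_std=True)
    kappa = rng.exponential(scale=1.0)    # trade-off parameter sampled afresh each call
    return int(np.argmax(mu + np.sqrt(kappa) * sigma))

# Example: fit a GP to a few noisy observations of a 1-d function, then pick the next query point.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(5, 1))
y = np.sin(6 * X).ravel() + 0.1 * rng.standard_normal(5)
gp = GaussianProcessRegressor().fit(X, y)
grid = np.linspace(0, 1, 200).reshape(-1, 1)
print(grid[randomised_gp_ucb(gp, grid, rng)])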
5

Liu, Guangwu, Wen Shi, and Kun Zhang. "An Upper Confidence Bound Approach to Estimating Coherent Risk Measures." In 2019 Winter Simulation Conference (WSC). IEEE, 2019. http://dx.doi.org/10.1109/wsc40007.2019.9004921.

6

Wan, Yuchen, L. Jeff Hong, and Weiwei Fan. "Upper-Confidence-Bound Procedure for Robust Selection of The Best." In 2023 Winter Simulation Conference (WSC). IEEE, 2023. http://dx.doi.org/10.1109/wsc60868.2023.10407226.

7

Bonnefoi, Remi, Lilian Besson, Julio Manco-Vasquez, and Christophe Moy. "Upper-Confidence Bound for Channel Selection in LPWA Networks with Retransmissions." In 2019 IEEE Wireless Communications and Networking Conference Workshop (WCNCW). IEEE, 2019. http://dx.doi.org/10.1109/wcncw.2019.8902891.

8

Jouini, Wassim, Christophe Moy, and Jacques Palicot. "Upper Confidence Bound Algorithm for Opportunistic Spectrum Access with Sensing Errors." In 6th International ICST Conference on Cognitive Radio Oriented Wireless Networks and Communications. IEEE, 2011. http://dx.doi.org/10.4108/icst.crowncom.2011.245851.

9

Kao, Kuo-Yuan, and I.-Hao Chen. "Maximal expectation as upper confidence bound for multi-armed bandit problems." In 2014 IEEE 7th Joint International Information Technology and Artificial Intelligence Conference (ITAIC). IEEE, 2014. http://dx.doi.org/10.1109/itaic.2014.7065060.

10

Imagaw, Takahisa, and Tomoyuki Kaneko. "Estimating the Maximum Expected Value through Upper Confidence Bound of Likelihood." In 2017 Conference on Technologies and Applications of Artificial Intelligence (TAAI). IEEE, 2017. http://dx.doi.org/10.1109/taai.2017.19.


Institutional reports on the topic "Upper Confidence Bound"

1

Wright, T. Rare attributes in finite universe: Hypotheses testing specification and exact randomized upper confidence bounds. Office of Scientific and Technical Information (OSTI), March 1993. http://dx.doi.org/10.2172/10157055.

2

Wright, T. Rare attributes in finite universe: Hypotheses testing specification and exact randomized upper confidence bounds. Office of Scientific and Technical Information (OSTI), March 1993. http://dx.doi.org/10.2172/6379060.

