Journal articles on the topic "Multiarmed Bandits"

Consult the top 50 journal articles for your research on the topic "Multiarmed Bandits".

1

Righter, Rhonda, and J. George Shanthikumar. "Independently Expiring Multiarmed Bandits". Probability in the Engineering and Informational Sciences 12, no. 4 (October 1998): 453–68. http://dx.doi.org/10.1017/s0269964800005325.

Abstract: We give conditions on the optimality of an index policy for multiarmed bandits when arms expire independently. We also give a new simple proof of the optimality of the Gittins index policy for the classic multiarmed bandit problem.
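As background (not part of the article's abstract), the Gittins index referenced here is usually defined as a ratio of expected discounted reward to expected discounted time, maximized over stopping times:

\[
\nu_i(x) \;=\; \sup_{\tau \ge 1} \frac{\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} \beta^{t}\, r_i(X_t) \,\middle|\, X_0 = x\right]}{\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} \beta^{t} \,\middle|\, X_0 = x\right]},
\]

and the Gittins index policy always plays an arm whose current state has maximal index.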
2

Gao, Xiujuan, Hao Liang, and Tong Wang. "A Common Value Experimentation with Multiarmed Bandits". Mathematical Problems in Engineering 2018 (July 30, 2018): 1–8. http://dx.doi.org/10.1155/2018/4791590.

Abstract: We study common value experimentation with multiarmed bandits and give an application of the experimentation. The second derivative of the value functions at the cutoffs is investigated when an agent switches action with multiarmed bandits. If consumers have an identical but unknown preference and purchase products from only two sellers among multiple sellers, we obtain the necessary and sufficient conditions for the common experimentation. The Markov perfect equilibrium and the socially effective allocation in K-armed markets are discussed.
3

Kalathil, Dileep, Naumaan Nayyar, and Rahul Jain. "Decentralized Learning for Multiplayer Multiarmed Bandits". IEEE Transactions on Information Theory 60, no. 4 (April 2014): 2331–45. http://dx.doi.org/10.1109/tit.2014.2302471.
4

Cesa-Bianchi, Nicolò. "Multiarmed Bandits in the Worst Case". IFAC Proceedings Volumes 35, no. 1 (2002): 91–96. http://dx.doi.org/10.3182/20020721-6-es-1901.01001.
5

Bray, Robert L., Decio Coviello, Andrea Ichino, and Nicola Persico. "Multitasking, Multiarmed Bandits, and the Italian Judiciary". Manufacturing & Service Operations Management 18, no. 4 (October 2016): 545–58. http://dx.doi.org/10.1287/msom.2016.0586.
6

Denardo, Eric V., Haechurl Park, and Uriel G. Rothblum. "Risk-Sensitive and Risk-Neutral Multiarmed Bandits". Mathematics of Operations Research 32, no. 2 (May 2007): 374–94. http://dx.doi.org/10.1287/moor.1060.0240.
7

Weber, Richard. "On the Gittins Index for Multiarmed Bandits". Annals of Applied Probability 2, no. 4 (November 1992): 1024–33. http://dx.doi.org/10.1214/aoap/1177005588.
8

Drugan, Madalina M. "Covariance Matrix Adaptation for Multiobjective Multiarmed Bandits". IEEE Transactions on Neural Networks and Learning Systems 30, no. 8 (August 2019): 2493–502. http://dx.doi.org/10.1109/tnnls.2018.2885123.
9

Burnetas, Apostolos N., and Michael N. Katehakis. "Asymptotic Bayes Analysis for the Finite-Horizon One-Armed-Bandit Problem". Probability in the Engineering and Informational Sciences 17, no. 1 (January 2003): 53–82. http://dx.doi.org/10.1017/s0269964803171045.

Abstract: The multiarmed-bandit problem is often taken as a basic model for the trade-off between the exploration and utilization required for efficient optimization under uncertainty. In this article, we study the situation in which the unknown performance of a new bandit is to be evaluated and compared with that of a known one over a finite horizon. We assume that the bandits represent random variables with distributions from the one-parameter exponential family. When the objective is to maximize the Bayes expected sum of outcomes over a finite horizon, it is shown that optimal policies tend to simple limits when the length of the horizon is large.
10

Nayyar, Naumaan, Dileep Kalathil, and Rahul Jain. "On Regret-Optimal Learning in Decentralized Multiplayer Multiarmed Bandits". IEEE Transactions on Control of Network Systems 5, no. 1 (March 2018): 597–606. http://dx.doi.org/10.1109/tcns.2016.2635380.
11

Reverdy, Paul B., Vaibhav Srivastava, and Naomi Ehrich Leonard. "Modeling Human Decision Making in Generalized Gaussian Multiarmed Bandits". Proceedings of the IEEE 102, no. 4 (April 2014): 544–71. http://dx.doi.org/10.1109/jproc.2014.2307024.
12

Krishnamurthy, Vikram, and Bo Wahlberg. "Partially Observed Markov Decision Process Multiarmed Bandits—Structural Results". Mathematics of Operations Research 34, no. 2 (May 2009): 287–302. http://dx.doi.org/10.1287/moor.1080.0371.
13

Camerlenghi, Federico, Bianca Dumitrascu, Federico Ferrari, Barbara E. Engelhardt, and Stefano Favaro. "Nonparametric Bayesian multiarmed bandits for single-cell experiment design". Annals of Applied Statistics 14, no. 4 (December 2020): 2003–19. http://dx.doi.org/10.1214/20-aoas1370.
14

Mintz, Yonatan, Anil Aswani, Philip Kaminsky, Elena Flowers, and Yoshimi Fukuoka. "Nonstationary Bandits with Habituation and Recovery Dynamics". Operations Research 68, no. 5 (September 2020): 1493–516. http://dx.doi.org/10.1287/opre.2019.1918.

Abstract: In many sequential decision-making settings where there is uncertainty about the reward of each action, frequent selection of specific actions may reduce expected reward, while choosing less frequently selected actions could lead to an increase. These effects are commonly observed in settings ranging from personalized healthcare interventions to targeted online advertising. To address this problem, the authors propose a new class of models called ROGUE (reducing or gaining unknown efficacy) multiarmed bandits. The paper presents a maximum likelihood approach to estimate the parameters of these models and shows that these estimates can be used to construct upper confidence bound and epsilon-greedy algorithms for optimizing these models with strong theoretical guarantees. The authors conclude with a simulation study showing that these algorithms perform better than current nonstationary bandit algorithms in terms of both cumulative regret and average reward.
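The abstract mentions upper confidence bound and epsilon-greedy algorithms. As a generic point of reference (the standard UCB1 rule, not the ROGUE-specific construction in the paper; all names and the arm probabilities below are illustrative assumptions), UCB-style arm selection looks roughly like this:

```python
import math
import random

def ucb1_select(counts, means, t):
    """Pick the arm maximizing empirical mean + exploration bonus (standard UCB1)."""
    for arm, n in enumerate(counts):
        if n == 0:                      # play every arm once before using the bound
            return arm
    return max(range(len(counts)),
               key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]))

# Tiny usage sketch with hypothetical Bernoulli arms (success probabilities are made up).
probs = [0.2, 0.5, 0.7]
counts, means = [0] * len(probs), [0.0] * len(probs)
for t in range(1, 1001):
    arm = ucb1_select(counts, means, t)
    reward = 1.0 if random.random() < probs[arm] else 0.0
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]   # incremental mean update
print(counts)   # most pulls should concentrate on the best arm
```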
15

Glazebrook, K. D., D. Ruiz-Hernandez, and C. Kirkbride. "Some indexable families of restless bandit problems". Advances in Applied Probability 38, no. 3 (September 2006): 643–72. http://dx.doi.org/10.1239/aap/1158684996.

Abstract: In 1988 Whittle introduced an important but intractable class of restless bandit problems which generalise the multiarmed bandit problems of Gittins by allowing state evolution for passive projects. Whittle's account deployed a Lagrangian relaxation of the optimisation problem to develop an index heuristic. Despite a developing body of evidence (both theoretical and empirical) which underscores the strong performance of Whittle's index policy, a continuing challenge to implementation is the need to establish that the competing projects all pass an indexability test. In this paper we employ Gittins' index theory to establish the indexability of (inter alia) general families of restless bandits which arise in problems of machine maintenance and stochastic scheduling problems with switching penalties. We also give formulae for the resulting Whittle indices. Numerical investigations testify to the outstandingly strong performance of the index heuristics concerned.
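As background for the indexability question discussed here (standard definitions, not taken from the paper): with a subsidy \(\lambda\) paid whenever a single project is kept passive, the project's discounted dynamic program is

\[
V_\lambda(x) \;=\; \max\Big\{ \lambda + \beta\, \mathbb{E}\big[V_\lambda(X') \mid x,\ \text{passive}\big],\;\; r(x) + \beta\, \mathbb{E}\big[V_\lambda(X') \mid x,\ \text{active}\big] \Big\}.
\]

The project is indexable if the set of states in which the passive action is optimal grows monotonically from the empty set to the whole state space as \(\lambda\) increases, and the Whittle index \(W(x)\) is the smallest subsidy for which passivity is optimal in state \(x\).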
16

Glazebrook, K. D., D. Ruiz-Hernandez, and C. Kirkbride. "Some indexable families of restless bandit problems". Advances in Applied Probability 38, no. 03 (September 2006): 643–72. http://dx.doi.org/10.1017/s000186780000121x.

Abstract: In 1988 Whittle introduced an important but intractable class of restless bandit problems which generalise the multiarmed bandit problems of Gittins by allowing state evolution for passive projects. Whittle's account deployed a Lagrangian relaxation of the optimisation problem to develop an index heuristic. Despite a developing body of evidence (both theoretical and empirical) which underscores the strong performance of Whittle's index policy, a continuing challenge to implementation is the need to establish that the competing projects all pass an indexability test. In this paper we employ Gittins' index theory to establish the indexability of (inter alia) general families of restless bandits which arise in problems of machine maintenance and stochastic scheduling problems with switching penalties. We also give formulae for the resulting Whittle indices. Numerical investigations testify to the outstandingly strong performance of the index heuristics concerned.
17

Meshram, Rahul, D. Manjunath, and Aditya Gopalan. "On the Whittle Index for Restless Multiarmed Hidden Markov Bandits". IEEE Transactions on Automatic Control 63, no. 9 (September 2018): 3046–53. http://dx.doi.org/10.1109/tac.2018.2799521.
18

Caro, Felipe, and Onesun Steve Yoo. "Indexability of Bandit Problems with Response Delays". Probability in the Engineering and Informational Sciences 24, no. 3 (April 23, 2010): 349–74. http://dx.doi.org/10.1017/s0269964810000021.

Abstract: This article considers an important class of discrete-time restless bandits, given by the discounted multiarmed bandit problems with response delays. The delays in each period are independent random variables, in which the delayed responses do not cross over. For a bandit arm in this class, we use a coupling argument to show that in each state there is a unique subsidy that equates the pulling and nonpulling actions (i.e., the bandit satisfies the indexability criterion introduced by Whittle (1988)). The result allows for an infinite or finite horizon and holds for arbitrary delay lengths and infinite state spaces. We compute the resulting marginal productivity indexes (MPI) for the Beta-Bernoulli Bayesian learning model, formulate and compute a tractable upper bound, and compare the suboptimality gap of the MPI policy to those of other heuristics derived from different closed-form indexes. The MPI policy performs near optimally and provides a theoretical justification for the use of the other heuristics.
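For readers unfamiliar with the Beta-Bernoulli learning model mentioned in the abstract (a standard fact, not specific to this paper): if an arm's unknown success probability carries a Beta(a, b) prior, then observing a success or a failure updates the posterior as

\[
\text{Beta}(a, b) \xrightarrow{\ \text{success}\ } \text{Beta}(a+1,\, b), \qquad
\text{Beta}(a, b) \xrightarrow{\ \text{failure}\ } \text{Beta}(a,\, b+1),
\]

with posterior mean \(a/(a+b)\); the response delays studied in the paper postpone when these updates can be applied.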
19

Glazebrook, K. D., and R. Minty. "A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements". Mathematics of Operations Research 34, no. 1 (February 2009): 26–44. http://dx.doi.org/10.1287/moor.1080.0342.
20

Farias, Vivek F., and Ritesh Madan. "The Irrevocable Multiarmed Bandit Problem". Operations Research 59, no. 2 (April 2011): 383–99. http://dx.doi.org/10.1287/opre.1100.0891.
21

Auer, Peter, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. "The Nonstochastic Multiarmed Bandit Problem". SIAM Journal on Computing 32, no. 1 (January 2002): 48–77. http://dx.doi.org/10.1137/s0097539701398375.
22

Peköz, Erol A. "Some memoryless bandit policies". Journal of Applied Probability 40, no. 1 (March 2003): 250–56. http://dx.doi.org/10.1239/jap/1044476838.

Abstract: We consider a multiarmed bandit problem, where each arm when pulled generates independent and identically distributed nonnegative rewards according to some unknown distribution. The goal is to maximize the long-run average reward per pull with the restriction that any previously learned information is forgotten whenever a switch between arms is made. We present several policies and a peculiarity surrounding them.
23

Peköz, Erol A. "Some memoryless bandit policies". Journal of Applied Probability 40, no. 01 (March 2003): 250–56. http://dx.doi.org/10.1017/s0021900200022373.

Abstract: We consider a multiarmed bandit problem, where each arm when pulled generates independent and identically distributed nonnegative rewards according to some unknown distribution. The goal is to maximize the long-run average reward per pull with the restriction that any previously learned information is forgotten whenever a switch between arms is made. We present several policies and a peculiarity surrounding them.
24

Dayanik, Savas, Warren Powell, and Kazutoshi Yamazaki. "Index policies for discounted bandit problems with availability constraints". Advances in Applied Probability 40, no. 2 (June 2008): 377–400. http://dx.doi.org/10.1239/aap/1214950209.

Abstract: A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that the arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multiarmed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.
25

Dayanik, Savas, Warren Powell, and Kazutoshi Yamazaki. "Index policies for discounted bandit problems with availability constraints". Advances in Applied Probability 40, no. 02 (June 2008): 377–400. http://dx.doi.org/10.1017/s0001867800002573.

Abstract: A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that the arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multiarmed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.
26

Tsitsiklis, J. "A lemma on the multiarmed bandit problem". IEEE Transactions on Automatic Control 31, no. 6 (June 1986): 576–77. http://dx.doi.org/10.1109/tac.1986.1104332.
27

Reverdy, Paul, Vaibhav Srivastava, and Naomi Ehrich Leonard. "Corrections to “Satisficing in Multiarmed Bandit Problems”". IEEE Transactions on Automatic Control 66, no. 1 (January 2021): 476–78. http://dx.doi.org/10.1109/tac.2020.2981433.
28

Frostig, Esther, and Gideon Weiss. "Four proofs of Gittins’ multiarmed bandit theorem". Annals of Operations Research 241, no. 1-2 (January 7, 2014): 127–65. http://dx.doi.org/10.1007/s10479-013-1523-0.
29

Ishikida, Takashi, and Yat-wah Wan. "Scheduling Jobs That Are Subject to Deterministic Due Dates and Have Deteriorating Expected Rewards". Probability in the Engineering and Informational Sciences 11, no. 1 (January 1997): 65–78. http://dx.doi.org/10.1017/s026996480000468x.

Abstract: A single server processes jobs that can yield rewards but expire on predetermined dates. Expected immediate rewards from each job are deteriorating. The instance is formulated as a multiarmed bandit problem, and an index-based scheduling policy is shown to maximize the expected total reward.
30

Jiang, Weijin, Pingping Chen, Wanqing Zhang, Yongxia Sun, Chen Junpeng, and Qing Wen. "User Recruitment Algorithm for Maximizing Quality under Limited Budget in Mobile Crowdsensing". Discrete Dynamics in Nature and Society 2022 (January 20, 2022): 1–13. http://dx.doi.org/10.1155/2022/4804231.

Abstract: In mobile crowdsensing task assignment, when the data platform knows neither the users' perceived quality nor their cost values, the key issue addressed in this article is how to establish a suitable user recruitment mechanism. The platform must learn each user's perceived quality during task execution while still ensuring the efficiency and profit of the mobile crowdsensing platform. This paper therefore proposes a mobile crowdsensing user recruitment algorithm based on the Combinatorial Multiarmed Bandit (CMAB) to solve the recruitment problem with both known and unknown user costs. First, the user recruitment process is modeled as a combinatorial multiarmed bandit in which each arm corresponds to selecting a different user and the reward obtained represents that user's perceived quality. Second, an upper confidence bound (UCB) algorithm is proposed that updates each user's perceived quality according to the completion of the assigned task: users are ranked by perceived quality, and, under the budget constraint, those with the largest ratio of perceived quality to recruitment cost are selected, assigned tasks, and have their quality estimates updated. Finally, the paper introduces the regret to measure the efficiency of the user recruitment algorithm and conducts extensive simulations on real data sets to verify its feasibility and effectiveness. The experimental results show that the recruitment algorithm with known user costs is close to the optimal algorithm, that the algorithm with unknown user costs attains more than 75% of the optimal result, with the gap shrinking as the budget increases, and that it exceeds the other comparison algorithms by 25%, which indicates good learning ability and the capacity to select high-quality users for task assignment autonomously.
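A minimal sketch of the kind of budgeted, ratio-based UCB selection the abstract describes (the function names, parameters, and structure are illustrative assumptions, not the paper's exact algorithm):

```python
import math

def recruit(budget, costs, quality_est, pulls, round_no):
    """Greedily recruit users by UCB-quality / cost ratio until the budget runs out."""
    ucb = [q + math.sqrt(2 * math.log(max(round_no, 2)) / max(n, 1))
           for q, n in zip(quality_est, pulls)]            # optimistic quality estimates
    order = sorted(range(len(costs)), key=lambda u: ucb[u] / costs[u], reverse=True)
    chosen, spent = [], 0.0
    for u in order:
        if spent + costs[u] <= budget:                     # respect the budget constraint
            chosen.append(u)
            spent += costs[u]
    return chosen

def update(quality_est, pulls, user, observed_quality):
    """Incrementally update a recruited user's empirical quality after task completion."""
    pulls[user] += 1
    quality_est[user] += (observed_quality - quality_est[user]) / pulls[user]
```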
31

Zeng, Fanzi, and Xinwang Shen. "Channel Selection Based on Trust and Multiarmed Bandit in Multiuser, Multichannel Cognitive Radio Networks". Scientific World Journal 2014 (2014): 1–6. http://dx.doi.org/10.1155/2014/916156.

Abstract: This paper proposes a channel selection scheme for multiuser, multichannel cognitive radio networks. The scheme formulates channel selection as a multiarmed bandit problem, in which cognitive radio users play the role of the players and channels that of the arms. By simulated negotiation we obtain the potential reward of each channel after it is selected for transmission; the channel with the maximum accumulated reward is then formally chosen. To further improve performance, a trust model is proposed and combined with the multiarmed bandit to address the channel selection problem. Simulation results validate the proposed scheme.
32

Mersereau, A. J., P. Rusmevichientong, and J. N. Tsitsiklis. "A Structured Multiarmed Bandit Problem and the Greedy Policy". IEEE Transactions on Automatic Control 54, no. 12 (December 2009): 2787–802. http://dx.doi.org/10.1109/tac.2009.2031725.
33

Varaiya, P., J. Walrand, and C. Buyukkoc. "Extensions of the multiarmed bandit problem: The discounted case". IEEE Transactions on Automatic Control 30, no. 5 (May 1985): 426–39. http://dx.doi.org/10.1109/tac.1985.1103989.
34

Martin, David M., and Fred A. Johnson. "A Multiarmed Bandit Approach to Adaptive Water Quality Management". Integrated Environmental Assessment and Management 16, no. 6 (August 14, 2020): 841–52. http://dx.doi.org/10.1002/ieam.4302.
35

Kang, Xiaohan, Hong Ri, Mohd Nor Akmal Khalid, and Hiroyuki Iida. "Addictive Games: Case Study on Multi-Armed Bandit Game". Information 12, no. 12 (December 15, 2021): 521. http://dx.doi.org/10.3390/info12120521.

Abstract: The attraction of games comes from the player being able to have fun in them. Gambling games based on the variable-ratio schedule in Skinner's experiment are the most typical addictive games, and it is necessary to clarify why such games are simple yet addictive. The multiarmed bandit game is a typical test for the Skinner box design and is very popular in gambling houses, which makes it a good example to analyze. This article focuses on extending the motion-in-mind model to the setting of multiarmed bandit games, quantifying the player's psychological inclination with simulation data. By relating the quantification of player satisfaction and play comfort, the feeling of expectation is discussed from an energy perspective. Two different energies are proposed: player-side energy (Er) and game-side energy (Ei), whose difference, denoted Ed, expresses the player's psychological gap. Ten settings of the bandit's mass parameter were simulated. It was found that the setting of player confidence (Er) and entry difficulty (Ei) can balance player expectation. The simulation results show that at m = 0.3 and m = 0.7 the player has the largest psychological gap, meaning the player is motivated by not being reconciled, and that addiction is likely to occur when m ∈ [0.5, 0.7]. Such an approach can also help developers and educators increase the efficiency of edutainment games and make games more attractive.
36

Meng, Hao, Wasswa Shafik, S. Mojtaba Matinkhah, and Zubair Ahmad. "A 5G Beam Selection Machine Learning Algorithm for Unmanned Aerial Vehicle Applications". Wireless Communications and Mobile Computing 2020 (August 1, 2020): 1–16. http://dx.doi.org/10.1155/2020/1428968.

Abstract: Unmanned aerial vehicles (UAVs) have emerged as a promising research trend in recent years, with current and future networks expected to use them to enhance connectivity in fields such as medicine, communication, and search and rescue operations. Current technologies rely on fixed base stations operating on-site and off-site from fixed positions, with associated problems such as poor connectivity. This opens the door for UAV technology to be used as a mobile alternative that increases accessibility, with fifth-generation (5G) connectivity focused on greater availability and connectivity. Wireless technologies have so far seen limited use in the medical field. This paper first presents a study of deep learning applied to the medical field in general and details the steps of the multiarmed bandit (MAB) approach to solving the exploration-exploitation dilemma for UAV-based biomedical engineering devices and medical exploration. The paper further describes how the bandit framework can achieve near-optimal performance and efficiency for networked medical devices. The simulation results show that a multiarmed bandit approach can be applied to optimize the performance of a networked medical device, compared with Thompson sampling, a Bayesian algorithm, and the ε-greedy algorithm, achieving near-optimal average performance through deep learning of realistic medical situations.
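As a point of reference for one of the baselines named above (a generic Bernoulli Thompson sampling sketch, not code from the paper; the arm probabilities and names are made up for illustration):

```python
import random

def thompson_select(successes, failures):
    """Sample a success probability for each arm from its Beta posterior and pick the best."""
    samples = [random.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return samples.index(max(samples))

# Hypothetical arms with made-up true success probabilities.
probs = [0.3, 0.55, 0.6]
succ, fail = [0] * len(probs), [0] * len(probs)
for _ in range(2000):
    arm = thompson_select(succ, fail)
    if random.random() < probs[arm]:
        succ[arm] += 1
    else:
        fail[arm] += 1
print(succ, fail)   # pulls concentrate on the arm with the highest success probability
```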
37

Chang, Hyeong Soo, and Sanghee Choe. "Combining Multiple Strategies for Multiarmed Bandit Problems and Asymptotic Optimality". Journal of Control Science and Engineering 2015 (2015): 1–7. http://dx.doi.org/10.1155/2015/264953.

Abstract: This brief paper provides a simple algorithm that, at each time, selects a strategy from a given set of multiple strategies for stochastic multiarmed bandit problems and then plays the arm chosen by the selected strategy. The algorithm follows the idea of the probabilistic ϵt-switching in the ϵt-greedy strategy and is asymptotically optimal in the sense that the selected strategy converges to the best in the set, under some conditions on the strategies in the set and on the sequence {ϵt}.
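A minimal sketch of the kind of ϵt-switching over strategies described above (illustrative only; the decay schedule for ϵt and the function name are assumptions, and the convergence conditions are the subject of the paper):

```python
import random

def pick_strategy(avg_reward, t):
    """With probability eps_t explore a random strategy, otherwise exploit the best one so far."""
    eps_t = min(1.0, 10.0 / (t + 1))          # an illustrative decaying schedule for eps_t
    if random.random() < eps_t:
        return random.randrange(len(avg_reward))
    return max(range(len(avg_reward)), key=lambda s: avg_reward[s])

# At each time step: strategy = pick_strategy(avg_reward, t); that strategy then picks the arm,
# the observed reward updates avg_reward[strategy], and play continues.
```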
38

Yoshida, Y. "Optimal stopping problems for multiarmed bandit processes with arms' independence". Computers & Mathematics with Applications 26, no. 12 (December 1993): 47–60. http://dx.doi.org/10.1016/0898-1221(93)90058-4.
39

Gokcesu, Kaan, and Suleyman Serdar Kozat. "An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem". IEEE Transactions on Neural Networks and Learning Systems 29, no. 11 (November 2018): 5565–80. http://dx.doi.org/10.1109/tnnls.2018.2806006.
40

Misra, Kanishka, Eric M. Schwartz, and Jacob Abernethy. "Dynamic Online Pricing with Incomplete Information Using Multiarmed Bandit Experiments". Marketing Science 38, no. 2 (March 2019): 226–52. http://dx.doi.org/10.1287/mksc.2018.1129.
41

Toelch, Ulf, Matthew J. Bruce, Marius T. H. Meeus, and Simon M. Reader. "Humans copy rapidly increasing choices in a multiarmed bandit problem". Evolution and Human Behavior 31, no. 5 (September 2010): 326–33. http://dx.doi.org/10.1016/j.evolhumbehav.2010.03.002.
42

Muqattash, Isa, and Jiaqiao Hu. "An ϵ-Greedy Multiarmed Bandit Approach to Markov Decision Processes". Stats 6, no. 1 (January 1, 2023): 99–112. http://dx.doi.org/10.3390/stats6010006.

Abstract: We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ-greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimate of the "reward-to-go" value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2^10 states. The results show that REGA and RASA achieve similar performance. Moreover, REGA and RASA empirically outperform an implementation that uses the "original" ϵ-greedy algorithm commonly found in the literature.
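A rough sketch of the recursive idea described above (an illustrative reconstruction, not the authors' REGA implementation; the `simulate` model, function name, and parameters are assumptions, and the recursion is exponential in the horizon, so it is purely for exposition):

```python
import random

def estimate_q(state, stage, horizon, actions, simulate, samples=20, eps=0.1):
    """Estimate the reward-to-go at (state, stage) by running an eps-greedy bandit over actions."""
    if stage == horizon:
        return 0.0
    counts = {a: 0 for a in actions}
    means = {a: 0.0 for a in actions}
    for _ in range(samples):
        if random.random() < eps or min(counts.values()) == 0:
            a = random.choice(actions)                  # explore
        else:
            a = max(actions, key=lambda x: means[x])    # exploit the current best action
        reward, next_state = simulate(state, a)         # one-step simulation model (assumed given)
        value = reward + estimate_q(next_state, stage + 1, horizon, actions, simulate, samples, eps)
        counts[a] += 1
        means[a] += (value - means[a]) / counts[a]      # incremental mean of sampled returns
    return max(means.values())
```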
43

Mansour, Yishay, Aleksandrs Slivkins, and Vasilis Syrgkanis. "Bayesian Incentive-Compatible Bandit Exploration". Operations Research 68, no. 4 (July 2020): 1132–61. http://dx.doi.org/10.1287/opre.2019.1949.

Abstract: As self-interested individuals ("agents") make decisions over time, they utilize information revealed by other agents in the past and produce information that may help agents in the future. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as in medical decisions. Each agent would like to exploit (select the best action given the current information), but would prefer the previous agents to explore (try out various alternatives to collect information). A social planner, by means of a carefully designed recommendation policy, can incentivize the agents to balance exploration and exploitation so as to maximize social welfare. We model the planner's recommendation policy as a multiarm bandit algorithm under incentive-compatibility constraints induced by agents' Bayesian priors. We design a bandit algorithm which is incentive-compatible and has asymptotically optimal performance, as expressed by regret. Further, we provide a black-box reduction from an arbitrary multiarm bandit algorithm to an incentive-compatible one, with only a constant multiplicative increase in regret. This reduction works for very general bandit settings that incorporate contexts and arbitrary partial feedback.
44

Uriarte, Alberto, and Santiago Ontañón. "Improving Monte Carlo Tree Search Policies in StarCraft via Probabilistic Models Learned from Replay Data". Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 12, no. 1 (June 25, 2021): 100–106. http://dx.doi.org/10.1609/aiide.v12i1.12852.

Abstract: Applying game-tree search techniques to RTS games poses a significant challenge, given the large branching factors involved. This paper studies an approach to incorporate knowledge learned offline from game replays to guide the search process. Specifically, we propose to learn Naive Bayesian models predicting the probability of action execution in different game states, and use them to inform the search process of Monte Carlo Tree Search. We evaluate the effect of incorporating these models into several multiarmed bandit policies for MCTS in the context of StarCraft, showing a significant improvement in gameplay performance.
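For context (the standard UCT selection rule that multiarmed bandit policies for MCTS typically build on; not a formula quoted from the paper): at a tree node visited \(N\) times, the action \(a\) with empirical mean reward \(\bar{X}_a\) and visit count \(n_a\) is selected by

\[
a^{*} \;=\; \arg\max_{a}\; \bar{X}_a + c \sqrt{\frac{\ln N}{n_a}},
\]

where \(c > 0\) trades off exploration and exploitation; the replay-learned models in this paper are used to inform bandit policies of this kind.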
45

Qu, Yuben, Chao Dong, Dawei Niu, Hai Wang, and Chang Tian. "A Two-Dimensional Multiarmed Bandit Approach to Secondary Users with Network Coding in Cognitive Radio Networks". Mathematical Problems in Engineering 2015 (2015): 1–10. http://dx.doi.org/10.1155/2015/672837.

Abstract: We study how to utilize network coding to improve the throughput of secondary users (SUs) in cognitive radio networks (CRNs) when the channel quality is unavailable at the SUs. We use a two-dimensional multiarmed bandit (MAB) approach to solve the problem of SUs with network coding under unknown channel quality in CRNs. We analytically prove the asymptotic throughput optimality of the proposed two-dimensional MAB algorithm. Simulation results show that our proposed algorithm achieves comparable throughput performance, compared to both the theoretical upper bound and the scheme assuming known channel quality information.
46

Bao, Wenqing, Xiaoqiang Cai, and Xianyi Wu. "A General Theory of MultiArmed Bandit Processes with Constrained Arm Switches". SIAM Journal on Control and Optimization 59, no. 6 (January 2021): 4666–88. http://dx.doi.org/10.1137/19m1282386.
47

Drabik, Ewa. "On nearly selfoptimizing strategies for multiarmed bandit problems with controlled arms". Applicationes Mathematicae 23, no. 4 (1996): 449–73. http://dx.doi.org/10.4064/am-23-4-449-473.
48

Liu, Haoyang, Keqin Liu, and Qing Zhao. "Learning in a Changing World: Restless Multiarmed Bandit With Unknown Dynamics". IEEE Transactions on Information Theory 59, no. 3 (March 2013): 1902–16. http://dx.doi.org/10.1109/tit.2012.2230215.
49

Agrawal, Himanshu, and Krishna Asawa. "Decentralized Learning for Opportunistic Spectrum Access: Multiuser Restless Multiarmed Bandit Formulation". IEEE Systems Journal 14, no. 2 (June 2020): 2485–96. http://dx.doi.org/10.1109/jsyst.2019.2943361.
50

Nakayama, Kazuaki, Ryuzo Nakamura, Masato Hisakado, and Shintaro Mori. "Optimal learning dynamics of multiagent system in restless multiarmed bandit game". Physica A: Statistical Mechanics and its Applications 549 (July 2020): 124314. http://dx.doi.org/10.1016/j.physa.2020.124314.