Journal articles on the topic 'Multiarmed Bandits'


Consult the top 50 journal articles for your research on the topic 'Multiarmed Bandits.'


1

Righter, Rhonda, and J. George Shanthikumar. "Independently Expiring Multiarmed Bandits." Probability in the Engineering and Informational Sciences 12, no. 4 (October 1998): 453–68. http://dx.doi.org/10.1017/s0269964800005325.

Abstract:
We give conditions on the optimality of an index policy for multiarmed bandits when arms expire independently. We also give a new simple proof of the optimality of the Gittins index policy for the classic multiarmed bandit problem.
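For reference, the Gittins index mentioned here has a standard form: for a single arm with state process x_t, per-pull reward r(·), and discount factor β ∈ (0, 1), it is the largest achievable ratio of expected discounted reward to expected discounted time over stopping times (generic notation below, not taken from the paper):

```latex
\nu(x) \;=\; \sup_{\tau \ge 1}
\frac{\mathbb{E}\left[\sum_{t=0}^{\tau-1} \beta^{t}\, r(x_t) \,\middle|\, x_0 = x\right]}
     {\mathbb{E}\left[\sum_{t=0}^{\tau-1} \beta^{t} \,\middle|\, x_0 = x\right]}
```

The Gittins index policy pulls, at each step, an arm whose current state has the largest index ν.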
2

Gao, Xiujuan, Hao Liang, and Tong Wang. "A Common Value Experimentation with Multiarmed Bandits." Mathematical Problems in Engineering 2018 (July 30, 2018): 1–8. http://dx.doi.org/10.1155/2018/4791590.

Abstract:
We study common value experimentation with multiarmed bandits and give an application of the experimentation. The second derivative of the value functions at the cutoffs is investigated when an agent switches actions with multiarmed bandits. If consumers have an identical but unknown preference and purchase products from only two sellers among multiple sellers, we obtain necessary and sufficient conditions for the common experimentation. The Markov perfect equilibrium and the socially effective allocation in K-armed markets are discussed.
3

Kalathil, Dileep, Naumaan Nayyar, and Rahul Jain. "Decentralized Learning for Multiplayer Multiarmed Bandits." IEEE Transactions on Information Theory 60, no. 4 (April 2014): 2331–45. http://dx.doi.org/10.1109/tit.2014.2302471.

4

Cesa-Bianchi, Nicolò. "MULTIARMED BANDITS IN THE WORST CASE." IFAC Proceedings Volumes 35, no. 1 (2002): 91–96. http://dx.doi.org/10.3182/20020721-6-es-1901.01001.

5

Bray, Robert L., Decio Coviello, Andrea Ichino, and Nicola Persico. "Multitasking, Multiarmed Bandits, and the Italian Judiciary." Manufacturing & Service Operations Management 18, no. 4 (October 2016): 545–58. http://dx.doi.org/10.1287/msom.2016.0586.

6

Denardo, Eric V., Haechurl Park, and Uriel G. Rothblum. "Risk-Sensitive and Risk-Neutral Multiarmed Bandits." Mathematics of Operations Research 32, no. 2 (May 2007): 374–94. http://dx.doi.org/10.1287/moor.1060.0240.

7

Weber, Richard. "On the Gittins Index for Multiarmed Bandits." Annals of Applied Probability 2, no. 4 (November 1992): 1024–33. http://dx.doi.org/10.1214/aoap/1177005588.

8

Drugan, Madalina M. "Covariance Matrix Adaptation for Multiobjective Multiarmed Bandits." IEEE Transactions on Neural Networks and Learning Systems 30, no. 8 (August 2019): 2493–502. http://dx.doi.org/10.1109/tnnls.2018.2885123.

9

Burnetas, Apostolos N., and Michael N. Katehakis. "ASYMPTOTIC BAYES ANALYSIS FOR THE FINITE-HORIZON ONE-ARMED-BANDIT PROBLEM." Probability in the Engineering and Informational Sciences 17, no. 1 (January 2003): 53–82. http://dx.doi.org/10.1017/s0269964803171045.

Abstract:
The multiarmed-bandit problem is often taken as a basic model for the trade-off between the exploration and utilization required for efficient optimization under uncertainty. In this article, we study the situation in which the unknown performance of a new bandit is to be evaluated and compared with that of a known one over a finite horizon. We assume that the bandits represent random variables with distributions from the one-parameter exponential family. When the objective is to maximize the Bayes expected sum of outcomes over a finite horizon, it is shown that optimal policies tend to simple limits when the length of the horizon is large.
10

Nayyar, Naumaan, Dileep Kalathil, and Rahul Jain. "On Regret-Optimal Learning in Decentralized Multiplayer Multiarmed Bandits." IEEE Transactions on Control of Network Systems 5, no. 1 (March 2018): 597–606. http://dx.doi.org/10.1109/tcns.2016.2635380.

11

Reverdy, Paul B., Vaibhav Srivastava, and Naomi Ehrich Leonard. "Modeling Human Decision Making in Generalized Gaussian Multiarmed Bandits." Proceedings of the IEEE 102, no. 4 (April 2014): 544–71. http://dx.doi.org/10.1109/jproc.2014.2307024.

12

Krishnamurthy, Vikram, and Bo Wahlberg. "Partially Observed Markov Decision Process Multiarmed Bandits—Structural Results." Mathematics of Operations Research 34, no. 2 (May 2009): 287–302. http://dx.doi.org/10.1287/moor.1080.0371.

13

Camerlenghi, Federico, Bianca Dumitrascu, Federico Ferrari, Barbara E. Engelhardt, and Stefano Favaro. "Nonparametric Bayesian multiarmed bandits for single-cell experiment design." Annals of Applied Statistics 14, no. 4 (December 2020): 2003–19. http://dx.doi.org/10.1214/20-aoas1370.

14

Mintz, Yonatan, Anil Aswani, Philip Kaminsky, Elena Flowers, and Yoshimi Fukuoka. "Nonstationary Bandits with Habituation and Recovery Dynamics." Operations Research 68, no. 5 (September 2020): 1493–516. http://dx.doi.org/10.1287/opre.2019.1918.

Abstract:
In many sequential decision-making settings where there is uncertainty about the reward of each action, frequently selecting specific actions may reduce their expected reward, while choosing less frequently selected actions can lead to an increase. These effects are commonly observed in settings ranging from personalized healthcare interventions to targeted online advertising. To address this problem, the authors propose a new class of models called ROGUE (reducing or gaining unknown efficacy) multiarmed bandits. They present a maximum likelihood approach to estimate the parameters of these models and show that these estimates can be used to construct upper confidence bound and epsilon-greedy algorithms for optimizing these models with strong theoretical guarantees. The authors conclude with a simulation study showing that these algorithms outperform current nonstationary bandit algorithms in terms of both cumulative regret and average reward.
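To make the habituation/recovery effect concrete, here is a minimal toy simulation in which an arm's mean reward drops each time it is pulled and drifts back toward a baseline when rested, played by a plain UCB1 learner. The dynamics, parameters, and use of UCB1 are illustrative assumptions for exposition, not the authors' ROGUE model or algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy habituation/recovery dynamics (illustrative only):
# a pulled arm's mean reward drops; rested arms drift back toward a baseline.
n_arms, horizon = 3, 2000
baseline = np.array([0.7, 0.5, 0.6])   # assumed long-run means
means = baseline.copy()
habituation, recovery = 0.05, 0.01     # assumed per-step effects

counts = np.zeros(n_arms)
estimates = np.zeros(n_arms)
total = 0.0

for t in range(1, horizon + 1):
    # Plain UCB1 on the (nonstationary) rewards, just to drive the simulation.
    if t <= n_arms:
        arm = t - 1
    else:
        ucb = estimates + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))

    reward = float(rng.random() < means[arm])   # Bernoulli reward with current mean
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]
    total += reward

    # Habituation on the pulled arm, recovery on the rested arms.
    means[arm] = max(0.05, means[arm] - habituation)
    rested = np.arange(n_arms) != arm
    means[rested] += recovery * (baseline[rested] - means[rested])

print(f"average reward: {total / horizon:.3f}")
```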
15

Glazebrook, K. D., D. Ruiz-Hernandez, and C. Kirkbride. "Some indexable families of restless bandit problems." Advances in Applied Probability 38, no. 3 (September 2006): 643–72. http://dx.doi.org/10.1239/aap/1158684996.

Abstract:
In 1988 Whittle introduced an important but intractable class of restless bandit problems which generalise the multiarmed bandit problems of Gittins by allowing state evolution for passive projects. Whittle's account deployed a Lagrangian relaxation of the optimisation problem to develop an index heuristic. Despite a developing body of evidence (both theoretical and empirical) which underscores the strong performance of Whittle's index policy, a continuing challenge to implementation is the need to establish that the competing projects all pass an indexability test. In this paper we employ Gittins' index theory to establish the indexability of (inter alia) general families of restless bandits which arise in problems of machine maintenance and stochastic scheduling problems with switching penalties. We also give formulae for the resulting Whittle indices. Numerical investigations testify to the outstandingly strong performance of the index heuristics concerned.
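As background for the Whittle index used in this and several later abstracts: a subsidy λ is attached to the passive action of a single restless arm; the arm is indexable if the set of states in which passivity is optimal grows monotonically as λ increases, and the Whittle index of a state is the smallest subsidy that makes the passive action as attractive as the active one (generic notation, not the authors'):

```latex
W(x) \;=\; \inf\left\{\lambda \;:\; Q_{\lambda}(x,\text{passive}) \;\ge\; Q_{\lambda}(x,\text{active})\right\}
```

Here Q_λ denotes the optimal action values of the subsidized single-arm problem; the index heuristic then activates, at each epoch, the arms whose current states have the largest indices.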
16

Glazebrook, K. D., D. Ruiz-Hernandez, and C. Kirkbride. "Some indexable families of restless bandit problems." Advances in Applied Probability 38, no. 03 (September 2006): 643–72. http://dx.doi.org/10.1017/s000186780000121x.

Abstract:
In 1988 Whittle introduced an important but intractable class of restless bandit problems which generalise the multiarmed bandit problems of Gittins by allowing state evolution for passive projects. Whittle's account deployed a Lagrangian relaxation of the optimisation problem to develop an index heuristic. Despite a developing body of evidence (both theoretical and empirical) which underscores the strong performance of Whittle's index policy, a continuing challenge to implementation is the need to establish that the competing projects all pass an indexability test. In this paper we employ Gittins' index theory to establish the indexability of (inter alia) general families of restless bandits which arise in problems of machine maintenance and stochastic scheduling problems with switching penalties. We also give formulae for the resulting Whittle indices. Numerical investigations testify to the outstandingly strong performance of the index heuristics concerned.
17

Meshram, Rahul, D. Manjunath, and Aditya Gopalan. "On the Whittle Index for Restless Multiarmed Hidden Markov Bandits." IEEE Transactions on Automatic Control 63, no. 9 (September 2018): 3046–53. http://dx.doi.org/10.1109/tac.2018.2799521.

18

Caro, Felipe, and Onesun Steve Yoo. "INDEXABILITY OF BANDIT PROBLEMS WITH RESPONSE DELAYS." Probability in the Engineering and Informational Sciences 24, no. 3 (April 23, 2010): 349–74. http://dx.doi.org/10.1017/s0269964810000021.

Abstract:
This article considers an important class of discrete-time restless bandits, given by the discounted multiarmed bandit problems with response delays. The delays in each period are independent random variables, in which the delayed responses do not cross over. For a bandit arm in this class, we use a coupling argument to show that in each state there is a unique subsidy that equates the pulling and nonpulling actions (i.e., the bandit satisfies the indexability criterion introduced by Whittle (1988)). The result allows for infinite or finite horizon and holds for arbitrary delay lengths and infinite state spaces. We compute the resulting marginal productivity indexes (MPI) for the Beta-Bernoulli Bayesian learning model, formulate and compute a tractable upper bound, and compare the suboptimality gap of the MPI policy to those of other heuristics derived from different closed-form indexes. The MPI policy performs near optimally and provides a theoretical justification for the use of the other heuristics.
19

Glazebrook, K. D., and R. Minty. "A Generalized Gittins Index for a Class of Multiarmed Bandits with General Resource Requirements." Mathematics of Operations Research 34, no. 1 (February 2009): 26–44. http://dx.doi.org/10.1287/moor.1080.0342.

20

Farias, Vivek F., and Ritesh Madan. "The Irrevocable Multiarmed Bandit Problem." Operations Research 59, no. 2 (April 2011): 383–99. http://dx.doi.org/10.1287/opre.1100.0891.

21

Auer, Peter, Nicolò Cesa-Bianchi, Yoav Freund, and Robert E. Schapire. "The Nonstochastic Multiarmed Bandit Problem." SIAM Journal on Computing 32, no. 1 (January 2002): 48–77. http://dx.doi.org/10.1137/s0097539701398375.

22

Peköz, Erol A. "Some memoryless bandit policies." Journal of Applied Probability 40, no. 1 (March 2003): 250–56. http://dx.doi.org/10.1239/jap/1044476838.

Abstract:
We consider a multiarmed bandit problem, where each arm when pulled generates independent and identically distributed nonnegative rewards according to some unknown distribution. The goal is to maximize the long-run average reward per pull with the restriction that any previously learned information is forgotten whenever a switch between arms is made. We present several policies and a peculiarity surrounding them.
23

Peköz, Erol A. "Some memoryless bandit policies." Journal of Applied Probability 40, no. 01 (March 2003): 250–56. http://dx.doi.org/10.1017/s0021900200022373.

Abstract:
We consider a multiarmed bandit problem, where each arm when pulled generates independent and identically distributed nonnegative rewards according to some unknown distribution. The goal is to maximize the long-run average reward per pull with the restriction that any previously learned information is forgotten whenever a switch between arms is made. We present several policies and a peculiarity surrounding them.
24

Dayanik, Savas, Warren Powell, and Kazutoshi Yamazaki. "Index policies for discounted bandit problems with availability constraints." Advances in Applied Probability 40, no. 2 (June 2008): 377–400. http://dx.doi.org/10.1239/aap/1214950209.

Abstract:
A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that the arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multiarmed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.
25

Dayanik, Savas, Warren Powell, and Kazutoshi Yamazaki. "Index policies for discounted bandit problems with availability constraints." Advances in Applied Probability 40, no. 02 (June 2008): 377–400. http://dx.doi.org/10.1017/s0001867800002573.

Abstract:
A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that the arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multiarmed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.
26

Tsitsiklis, J. "A lemma on the multiarmed bandit problem." IEEE Transactions on Automatic Control 31, no. 6 (June 1986): 576–77. http://dx.doi.org/10.1109/tac.1986.1104332.

27

Reverdy, Paul, Vaibhav Srivastava, and Naomi Ehrich Leonard. "Corrections to “Satisficing in Multiarmed Bandit Problems”." IEEE Transactions on Automatic Control 66, no. 1 (January 2021): 476–78. http://dx.doi.org/10.1109/tac.2020.2981433.

28

Frostig, Esther, and Gideon Weiss. "Four proofs of Gittins’ multiarmed bandit theorem." Annals of Operations Research 241, no. 1-2 (January 7, 2014): 127–65. http://dx.doi.org/10.1007/s10479-013-1523-0.

29

Ishikida, Takashi, and Yat-wah Wan. "Scheduling Jobs That Are Subject to Deterministic Due Dates and Have Deteriorating Expected Rewards." Probability in the Engineering and Informational Sciences 11, no. 1 (January 1997): 65–78. http://dx.doi.org/10.1017/s026996480000468x.

Abstract:
A single server processes jobs that can yield rewards but expire on predetermined dates. Expected immediate rewards from each job are deteriorating. The instance is formulated as a multiarmed bandit problem, and an index-based scheduling policy is shown to maximize the expected total reward.
30

Jiang, Weijin, Pingping Chen, Wanqing Zhang, Yongxia Sun, Chen Junpeng, and Qing Wen. "User Recruitment Algorithm for Maximizing Quality under Limited Budget in Mobile Crowdsensing." Discrete Dynamics in Nature and Society 2022 (January 20, 2022): 1–13. http://dx.doi.org/10.1155/2022/4804231.

Abstract:
In mobile crowdsensing task assignment, the critical issue addressed in this article is how to establish a suitable user recruitment mechanism when the data platform knows neither the users' perceived quality nor their cost values. The platform must learn each user's perceived quality during task execution while still ensuring the efficiency and profit of the mobile crowdsensing platform. This paper therefore proposes a mobile crowdsensing user recruitment algorithm based on the Combinatorial Multiarmed Bandit (CMAB) to solve the recruitment problem with both known and unknown user costs. Firstly, the user recruitment process is modeled as a combinatorial multiarmed bandit, where each arm represents the selection of a different user and the reward obtained represents that user's perceived quality. Secondly, an upper confidence bound (UCB) algorithm is proposed, which updates each user's perceived quality according to task completion: it sorts the users' estimated quality values from high to low, selects, within the budget, the users with the largest ratio of perceived quality to recruitment cost, assigns the tasks, and updates the quality estimates. Finally, the paper introduces regret to measure the efficiency of the user recruitment algorithm and conducts extensive simulations on real data sets to verify the feasibility and effectiveness of the algorithm. The experimental results show that the recruitment algorithm with known user costs performs close to the optimal algorithm, while the algorithm with unknown user costs achieves more than 75% of the optimal result, with the gap shrinking as the budget increases; it also outperforms the comparison algorithms by 25%, which demonstrates that the proposed algorithm has good learning ability and can independently select high-quality users to complete task assignment.
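A minimal sketch of the kind of budget-limited, UCB-driven selection the abstract describes, assuming known costs and Bernoulli-distributed quality; the variable names, quality model, and budget rule below are illustrative assumptions rather than the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: each user has an unknown quality in [0, 1] and a known cost.
true_quality = np.array([0.9, 0.6, 0.8, 0.3, 0.5])
costs        = np.array([1.5, 1.0, 1.2, 0.6, 0.8])
n_users = len(costs)

budget = 200.0
est = np.zeros(n_users)      # estimated quality per user
cnt = np.zeros(n_users)      # times each user was recruited
t = 0

# Recruit each user once to initialise the estimates.
for u in range(n_users):
    t += 1
    cnt[u] += 1
    est[u] = float(rng.random() < true_quality[u])
    budget -= costs[u]

# Then repeatedly recruit the affordable user with the best UCB(quality)/cost ratio.
while budget >= costs.min():
    t += 1
    ucb = np.minimum(est + np.sqrt(2 * np.log(t) / cnt), 1.0)
    ratio = np.where(costs <= budget, ucb / costs, -np.inf)  # only affordable users
    u = int(np.argmax(ratio))
    reward = float(rng.random() < true_quality[u])
    cnt[u] += 1
    est[u] += (reward - est[u]) / cnt[u]
    budget -= costs[u]

print("recruitment counts per user:", cnt)
```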
31

Zeng, Fanzi, and Xinwang Shen. "Channel Selection Based on Trust and Multiarmed Bandit in Multiuser, Multichannel Cognitive Radio Networks." Scientific World Journal 2014 (2014): 1–6. http://dx.doi.org/10.1155/2014/916156.

Abstract:
This paper proposes a channel selection scheme for multiuser, multichannel cognitive radio networks. The scheme formulates channel selection as a multiarmed bandit problem, where cognitive radio users correspond to the players and channels to the arms. Through simulated negotiation, the potential reward of each channel after it is selected for transmission is obtained; the channel with the maximum accumulated reward is then chosen. To further improve performance, a trust model is proposed and combined with the multiarmed bandit to address the channel selection problem. Simulation results validate the proposed scheme.
32

Mersereau, A. J., P. Rusmevichientong, and J. N. Tsitsiklis. "A Structured Multiarmed Bandit Problem and the Greedy Policy." IEEE Transactions on Automatic Control 54, no. 12 (December 2009): 2787–802. http://dx.doi.org/10.1109/tac.2009.2031725.

33

Varaiya, P., J. Walrand, and C. Buyukkoc. "Extensions of the multiarmed bandit problem: The discounted case." IEEE Transactions on Automatic Control 30, no. 5 (May 1985): 426–39. http://dx.doi.org/10.1109/tac.1985.1103989.

34

Martin, David M., and Fred A. Johnson. "A Multiarmed Bandit Approach to Adaptive Water Quality Management." Integrated Environmental Assessment and Management 16, no. 6 (August 14, 2020): 841–52. http://dx.doi.org/10.1002/ieam.4302.

35

Kang, Xiaohan, Hong Ri, Mohd Nor Akmal Khalid, and Hiroyuki Iida. "Addictive Games: Case Study on Multi-Armed Bandit Game." Information 12, no. 12 (December 15, 2021): 521. http://dx.doi.org/10.3390/info12120521.

Abstract:
The attraction of games comes from the fun players are able to have in them. Gambling games based on the variable-ratio schedule of Skinner's experiments are the most typical addictive games, and it is necessary to clarify why such simple games are addictive. The multiarmed bandit game is a typical test of the Skinner-box design and is popular in gambling houses, which makes it a good example to analyze. This article focuses on extending the motion-in-mind model to the setting of multiarmed bandit games, quantifying the player's psychological inclination from simulation data. By relating it to quantified player satisfaction and play comfort, the feeling of expectation is discussed from an energy perspective. Two different energies are proposed: player-side energy (Er) and game-side energy (Ei). Their difference, denoted Ed, captures the player's psychological gap. Ten settings of the bandit's mass parameter were simulated. It was found that an appropriate setting of player confidence (Er) and entry difficulty (Ei) can balance player expectation. The simulation results show that at m = 0.3 and m = 0.7 the player has the largest psychological gap, expressing that the player is motivated by not being reconciled to the outcome. Moreover, addiction is likely to occur when m ∈ [0.5, 0.7]. Such an approach can also help developers and educators increase the efficiency of edutainment games and make games more attractive.
36

Meng, Hao, Wasswa Shafik, S. Mojtaba Matinkhah, and Zubair Ahmad. "A 5G Beam Selection Machine Learning Algorithm for Unmanned Aerial Vehicle Applications." Wireless Communications and Mobile Computing 2020 (August 1, 2020): 1–16. http://dx.doi.org/10.1155/2020/1428968.

Abstract:
Unmanned aerial vehicles (UAVs) have emerged as a promising research trend in recent years, with current and future networks expected to use them for enhanced connectivity in fields such as medicine, communication, and search-and-rescue operations, among others. Current technologies use fixed base stations that operate on-site and off-site from fixed positions, with associated problems such as poor connectivity. This opens the door for UAV technology to be used as a mobile alternative that increases accessibility, with fifth-generation (5G) connectivity focused on increased availability and connectivity. Wireless technologies have so far seen limited use in the medical field. This paper first presents a study of deep learning applied to the medical field in general and details the steps involved in a multiarmed bandit (MAB) approach to solving the exploration-exploitation dilemma for UAV-based biomedical engineering devices and medical exploration. The paper further describes how the bandit framework can be applied to achieve near-optimal performance and efficiency of engineered medical devices. The simulation results show that a multiarmed bandit approach can be applied to optimize the performance of networked medical devices, compared against Thompson sampling, a Bayesian algorithm, and the ε-greedy algorithm. The results obtained further illustrate the optimized utilization of biomedical engineering technology systems, achieving close to optimal average performance through deep learning of realistic medical situations.
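As a rough illustration of the comparison mentioned at the end of the abstract, the sketch below runs ε-greedy and Thompson sampling on a toy Bernoulli bandit; the arm probabilities, horizon, and ε are made-up values, and this is not the paper's simulation setup.

```python
import numpy as np

rng = np.random.default_rng(2)
probs = np.array([0.25, 0.45, 0.65])   # assumed Bernoulli arm means
horizon = 5000

def eps_greedy(eps=0.1):
    est, cnt, total = np.zeros(3), np.zeros(3), 0.0
    for _ in range(horizon):
        arm = rng.integers(3) if rng.random() < eps else int(np.argmax(est))
        r = float(rng.random() < probs[arm])
        cnt[arm] += 1
        est[arm] += (r - est[arm]) / cnt[arm]
        total += r
    return total / horizon

def thompson():
    alpha, beta, total = np.ones(3), np.ones(3), 0.0
    for _ in range(horizon):
        arm = int(np.argmax(rng.beta(alpha, beta)))  # sample from Beta posteriors
        r = float(rng.random() < probs[arm])
        alpha[arm] += r
        beta[arm] += 1 - r
        total += r
    return total / horizon

print("epsilon-greedy average reward  :", round(eps_greedy(), 3))
print("Thompson sampling average reward:", round(thompson(), 3))
```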
37

Chang, Hyeong Soo, and Sanghee Choe. "Combining Multiple Strategies for Multiarmed Bandit Problems and Asymptotic Optimality." Journal of Control Science and Engineering 2015 (2015): 1–7. http://dx.doi.org/10.1155/2015/264953.

Abstract:
This brief paper provides a simple algorithm that, at each time step, selects a strategy from a given set of multiple strategies for stochastic multiarmed bandit problems and plays the arm chosen by the selected strategy. The algorithm follows the idea of the probabilistic ϵt-switching in the ϵt-greedy strategy and is asymptotically optimal in the sense that the selected strategy converges to the best in the set under some conditions on the strategies in the set and the sequence {ϵt}.
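A hedged sketch of the probabilistic ϵt-switching idea: with probability ϵt a strategy is picked uniformly at random, otherwise the strategy with the best empirical average reward so far is followed. The two member strategies, the ϵt schedule, and the toy bandit below are illustrative assumptions, not the paper's construction or conditions.

```python
import numpy as np

rng = np.random.default_rng(3)
probs = np.array([0.3, 0.5, 0.7])   # assumed Bernoulli arm means

def greedy(est, cnt, t):            # strategy 1: pure greedy on the estimates
    return int(np.argmax(est))

def ucb1(est, cnt, t):              # strategy 2: UCB1
    return int(np.argmax(est + np.sqrt(2 * np.log(t + 1) / np.maximum(cnt, 1))))

strategies = [greedy, ucb1]
strat_reward = np.zeros(2)          # empirical mean reward of each strategy
strat_count = np.zeros(2)
est, cnt = np.zeros(3), np.zeros(3)

for t in range(5000):
    eps_t = min(1.0, 10.0 / (t + 1))            # assumed decreasing schedule
    if rng.random() < eps_t:
        s = rng.integers(len(strategies))       # explore among strategies
    else:
        s = int(np.argmax(strat_reward))        # follow the best strategy so far
    arm = strategies[s](est, cnt, t)
    r = float(rng.random() < probs[arm])
    cnt[arm] += 1
    est[arm] += (r - est[arm]) / cnt[arm]
    strat_count[s] += 1
    strat_reward[s] += (r - strat_reward[s]) / strat_count[s]

print("times each strategy was chosen:", strat_count)
```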
38

Yoshida, Y. "Optimal stopping problems for multiarmed bandit processes with arms' independence." Computers & Mathematics with Applications 26, no. 12 (December 1993): 47–60. http://dx.doi.org/10.1016/0898-1221(93)90058-4.

39

Gokcesu, Kaan, and Suleyman Serdar Kozat. "An Online Minimax Optimal Algorithm for Adversarial Multiarmed Bandit Problem." IEEE Transactions on Neural Networks and Learning Systems 29, no. 11 (November 2018): 5565–80. http://dx.doi.org/10.1109/tnnls.2018.2806006.

40

Misra, Kanishka, Eric M. Schwartz, and Jacob Abernethy. "Dynamic Online Pricing with Incomplete Information Using Multiarmed Bandit Experiments." Marketing Science 38, no. 2 (March 2019): 226–52. http://dx.doi.org/10.1287/mksc.2018.1129.

41

Toelch, Ulf, Matthew J. Bruce, Marius T. H. Meeus, and Simon M. Reader. "Humans copy rapidly increasing choices in a multiarmed bandit problem." Evolution and Human Behavior 31, no. 5 (September 2010): 326–33. http://dx.doi.org/10.1016/j.evolhumbehav.2010.03.002.

42

Muqattash, Isa, and Jiaqiao Hu. "An ϵ-Greedy Multiarmed Bandit Approach to Markov Decision Processes." Stats 6, no. 1 (January 1, 2023): 99–112. http://dx.doi.org/10.3390/stats6010006.

Abstract:
We present REGA, a new adaptive-sampling-based algorithm for the control of finite-horizon Markov decision processes (MDPs) with very large state spaces and small action spaces. We apply a variant of the ϵ-greedy multiarmed bandit algorithm to each stage of the MDP in a recursive manner, thus computing an estimation of the “reward-to-go” value at each stage of the MDP. We provide a finite-time analysis of REGA. In particular, we provide a bound on the probability that the approximation error exceeds a given threshold, where the bound is given in terms of the number of samples collected at each stage of the MDP. We empirically compare REGA against another sampling-based algorithm called RASA by running simulations against the SysAdmin benchmark problem with 2^10 states. The results show that REGA and RASA achieved similar performance. Moreover, REGA and RASA empirically outperformed an implementation of the algorithm that uses the “original” ϵ-greedy algorithm that commonly appears in the literature.
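A rough sketch of the recursion the abstract describes: at each stage an ϵ-greedy bandit over the (small) action set estimates the reward-to-go by sampling a transition and recursing on the next stage. The simulator interface, the sample budget, and ϵ are assumptions for illustration and do not reproduce the REGA specification or its finite-time guarantees.

```python
import random

def estimate_reward_to_go(state, stage, horizon, actions, simulate,
                          samples=20, eps=0.2):
    """Crude epsilon-greedy estimate of the reward-to-go from `state` at `stage`.

    `simulate(state, action)` is an assumed user-supplied function returning
    (reward, next_state); it is not part of the paper.
    """
    if stage == horizon:
        return 0.0
    est = {a: 0.0 for a in actions}
    cnt = {a: 0 for a in actions}
    for i in range(samples):
        if i < len(actions):                 # try every action once
            a = actions[i]
        elif random.random() < eps:          # explore
            a = random.choice(actions)
        else:                                # exploit current estimates
            a = max(est, key=est.get)
        reward, nxt = simulate(state, a)
        value = reward + estimate_reward_to_go(nxt, stage + 1, horizon,
                                               actions, simulate, samples, eps)
        cnt[a] += 1
        est[a] += (value - est[a]) / cnt[a]
    return max(est.values())
```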
43

Mansour, Yishay, Aleksandrs Slivkins, and Vasilis Syrgkanis. "Bayesian Incentive-Compatible Bandit Exploration." Operations Research 68, no. 4 (July 2020): 1132–61. http://dx.doi.org/10.1287/opre.2019.1949.

Abstract:
As self-interested individuals (“agents”) make decisions over time, they utilize information revealed by other agents in the past and produce information that may help agents in the future. This phenomenon is common in a wide range of scenarios in the Internet economy, as well as in medical decisions. Each agent would like to exploit (select the best action given the current information) but would prefer the previous agents to explore (try out various alternatives to collect information). A social planner, by means of a carefully designed recommendation policy, can incentivize the agents to balance exploration and exploitation so as to maximize social welfare. We model the planner’s recommendation policy as a multiarmed bandit algorithm under incentive-compatibility constraints induced by agents’ Bayesian priors. We design a bandit algorithm which is incentive-compatible and has asymptotically optimal performance, as expressed by regret. Further, we provide a black-box reduction from an arbitrary multiarmed bandit algorithm to an incentive-compatible one, with only a constant multiplicative increase in regret. This reduction works for very general bandit settings that incorporate contexts and arbitrary partial feedback.
44

Uriarte, Alberto, and Santiago Ontañón. "Improving Monte Carlo Tree Search Policies in StarCraft via Probabilistic Models Learned from Replay Data." Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment 12, no. 1 (June 25, 2021): 100–106. http://dx.doi.org/10.1609/aiide.v12i1.12852.

Abstract:
Applying game-tree search techniques to RTS games poses a significant challenge, given the large branching factors involved. This paper studies an approach to incorporate knowledge learned offline from game replays to guide the search process. Specifically, we propose to learn Naive Bayesian models predicting the probability of action execution in different game states, and use them to inform the search process of Monte Carlo Tree Search. We evaluate the effect of incorporating these models into several Multiarmed Bandit policies for MCTS in the context of StarCraft, showing a significant improvement in gameplay performance.
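For context, the baseline multiarmed bandit policy used inside MCTS is UCB1 applied at each tree node; a minimal child-selection routine is sketched below. The node structure is an assumption, and the paper's actual contribution, biasing such policies with Naive Bayesian action-probability models learned from replays, is not shown here.

```python
import math

def ucb1_select(children, exploration=1.4):
    """Pick the child maximising the UCB1 score.

    `children` is assumed to be a list of objects with `visits` and
    `total_reward` attributes; unvisited children are tried first.
    """
    parent_visits = sum(c.visits for c in children)
    best, best_score = None, float("-inf")
    for c in children:
        if c.visits == 0:
            return c                      # expand unvisited children immediately
        score = (c.total_reward / c.visits
                 + exploration * math.sqrt(math.log(parent_visits) / c.visits))
        if score > best_score:
            best, best_score = c, score
    return best
```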
45

Qu, Yuben, Chao Dong, Dawei Niu, Hai Wang, and Chang Tian. "A Two-Dimensional Multiarmed Bandit Approach to Secondary Users with Network Coding in Cognitive Radio Networks." Mathematical Problems in Engineering 2015 (2015): 1–10. http://dx.doi.org/10.1155/2015/672837.

Abstract:
We study how to utilize network coding to improve the throughput of secondary users (SUs) in cognitive radio networks (CRNs) when the channel quality is unavailable at the SUs. We use a two-dimensional multiarmed bandit (MAB) approach to solve the problem of SUs with network coding under unknown channel quality in CRNs. We analytically prove the asymptotic throughput optimality of the proposed two-dimensional MAB algorithm. Simulation results show that our proposed algorithm achieves comparable throughput performance, compared to both the theoretical upper bound and the scheme assuming known channel quality information.
46

Bao, Wenqing, Xiaoqiang Cai, and Xianyi Wu. "A General Theory of MultiArmed Bandit Processes with Constrained Arm Switches." SIAM Journal on Control and Optimization 59, no. 6 (January 2021): 4666–88. http://dx.doi.org/10.1137/19m1282386.

47

Drabik, Ewa. "On nearly selfoptimizing strategies for multiarmed bandit problems with controlled arms." Applicationes Mathematicae 23, no. 4 (1996): 449–73. http://dx.doi.org/10.4064/am-23-4-449-473.

48

Liu, Haoyang, Keqin Liu, and Qing Zhao. "Learning in a Changing World: Restless Multiarmed Bandit With Unknown Dynamics." IEEE Transactions on Information Theory 59, no. 3 (March 2013): 1902–16. http://dx.doi.org/10.1109/tit.2012.2230215.

49

Agrawal, Himanshu, and Krishna Asawa. "Decentralized Learning for Opportunistic Spectrum Access: Multiuser Restless Multiarmed Bandit Formulation." IEEE Systems Journal 14, no. 2 (June 2020): 2485–96. http://dx.doi.org/10.1109/jsyst.2019.2943361.

50

Nakayama, Kazuaki, Ryuzo Nakamura, Masato Hisakado, and Shintaro Mori. "Optimal learning dynamics of multiagent system in restless multiarmed bandit game." Physica A: Statistical Mechanics and its Applications 549 (July 2020): 124314. http://dx.doi.org/10.1016/j.physa.2020.124314.
