Abstract
In this article, we consider a sequential sampling scheme for efficient estimation of the difference between the means of two independent treatments when the population variances are unequal across groups. The sampling scheme proposed is based on a solution to bandit problems called Thompson sampling. While this approach is most often used to maximize the cumulative payoff over competing treatments, we show that the same method can also be used to balance exploration and exploitation when the aim of the experimenter is to efficiently increase estimation precision. We introduce this novel design optimization method and, by simulation, show its effectiveness.
Notes
The uniform Beta(1, 1) prior is not the most uninformative prior; higher variance Beta priors contain less information (see Zhu & Lu, 2004). However, it is maximally uninformative given the assumption of binary outcomes.
The above method of addressing the explore–exploit trade-off in the two-arm binomial bandit case is easy to implement in currently standard analysis packages such as the [R] language for statistical computing. Appendix 1 contains an example implementation of the two-armed bandit problem described in this section.
In the unlikely event that \( X_A = X_B \), we randomly choose one treatment with equal probability and subsequently assign that treatment to the remaining \( M-N \) units.
Here, one could also fit a linear model with an intercept and one indicator variable. The method proceeds in the same way (and leads to the same results). However, we feel that this is less intuitive as an explanation.
Note that the above update cannot be computed from a single data point, since the sample variance is then undefined.
Note that the above update cannot be computed from a single data point, since the variance is then unknown.
References
Agrawal, S., & Goyal, N. (2011). Analysis of Thompson sampling for the multi-armed bandit problem. arXiv preprint arXiv:1111.1797.
Agrawal, S., & Goyal, N. (2012). Further optimal regret bounds for Thompson sampling. arXiv preprint arXiv:1209.3353.
Allen, T. T., Yu, L., & Schmitz, J. (2003). An experimental design criterion for minimizing meta-model prediction errors applied to die casting process design. Journal of the Royal Statistical Society: Series C: Applied Statistics, 52(1), 103–117.
Antille, G., & Weinberg, A. (2000). A study of D-optimal designs efficiency for polynomial regression. Université de Genève, Faculté des sciences économiques et sociales, Département d’économétrie.
Atkinson, A. C., Donev, A. N., & Tobias, R. D. (2007). Optimum experimental designs, with SAS (Vol. 34). Oxford: Oxford University Press.
Audibert, J.-Y., Munos, R., & Szepesvári, C. (2009). Exploration-exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19), 1876–1902.
Auer, P., & Ortner, R. (2010). UCB Revisited: Improved Regret Bounds for the Stochastic Multi-Armed Bandit Problem. Periodica Mathematica Hungarica, 61(1), 1–11.
Bardsley, W. G., Wood, R. M. W., & Melikhova, E. M. (1996). Optimal design: A computer program to study the best possible spacing of design points for model discrimination. Computers & Chemistry, 20(2), 145–157.
Berry, D. A., & Fristedt, B. (1985). Bandit problems: Sequential allocation of experiments (Monographs on Statistics and Applied Probability). Springer.
Box, J. F. (1987). Guinness, Gosset, Fisher, and Small Samples. Statistical Science, 2(1), 45–52.
Box, G. E. P., & Hill, W. J. (1967). Discrimination among mechanistic models. Technometrics, 9(1), 57–71.
Brezzi, M., & Lai, T. L. (2000). Incomplete Learning from Endogenous Data in Dynamic Allocation. Econometrica, 68(6), 1511–1516.
Burnetas, A. N., & Katehakis, M. N. (1996). Optimal adaptive policies for sequential allocation problems. Advances in Applied Mathematics, 17(2), 122–142.
Chaloner, K., & Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science, 10(3), 273–304.
Chapelle, O., & Li, L. (2011). An empirical evaluation of Thompson sampling. In J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (pp. 2249–2257).
Daw, N. D., O’Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876–879.
Dunlop, M. (2009). Paper Rejected (p > 0.05): An Introduction to the Debate on Appropriateness of Null-Hypothesis Testing. International Journal of Mobile Human Computer Interaction, 1(3), 1–8.
Garivier, A., & Cappé, O. (2011). The KL-UCB Algorithm for Bounded Stochastic Bandits and Beyond. Bernoulli, 19(1), 13.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press.
Gittins, J. C. (1979). Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society: Series B Methodological, 41(2), 148–177.
Gittins, J., & Wang, Y. G. (1992). The learning component of dynamic allocation indices. Annals of Statistics, 20(3), 12.
Goodman, S. N. (1999). Toward evidence-based medical statistics. 2: The Bayes factor. Annals of Internal Medicine, 130(12), 1005–1013.
Hauser, J. R., Urban, G. L., Liberali, G., & Braun, M. (2009). Website Morphing. Marketing Science, 28(2), 202–223.
Hutchinson, J. W., Kamakura, W. A., & Lynch, J. J. G. (2001). Unobserved Heterogeneity as an Alternative Explanation for “Reversal” Effects in Behavioral Research. Journal of Consumer Research, 27(3), 324–344.
Keller, G., & Rady, S. (2010). Strategic experimentation with Poisson bandits. Theoretical Economics, 5(2), 275–311.
Kruschke, J. K. (2012). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 31, 1–33.
Kuck, H., de Freitas, N., & Doucet, A. (2006). SMC samplers for Bayesian optimal nonlinear design. In Nonlinear Statistical Signal Processing Workshop, 2006 IEEE (pp. 99–102). IEEE.
Lai, T. L. (1987). Adaptive Treatment Allocation and the Multi-Armed Bandit Problem. The Annals of Statistics, 15(3), 1091–1114.
Lai, T., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1), 4–22.
McClelland, G. H. (1997). Optimal design in psychological research. Psychological Methods, 2(1), 3–19.
McCloskey, D. N., & Ziliak, S. T. (2009). The Unreasonable Ineffectiveness of Fisherian ‘Tests’ in Biology, and Especially in Medicine. Biological Theory, 4(1), 44–53.
Meehl, P. E. (1967). Theory testing in psychology and physics: a methodological paradox. Philosophy of Science, 34(74), 103–115.
Myung, J., & Pitt, M. (2009a). Bayesian adaptive optimal design of psychology experiments. In Proceedings of the 2nd International Workshop in Sequential Methodologies (IWSM2009) (pp. 1–6).
Myung, J. I., & Pitt, M. A. (2009b). Optimal experimental design for model discrimination. Psychological Review, 116(3), 499.
Myung, J. I., Cavagnaro, D. R., & Pitt, M. A. (2013). A tutorial on adaptive design optimization. Journal of Mathematical Psychology, 57(3), 53–67.
Niño-Mora, J. (2007). A (2/3)n³ fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain. INFORMS Journal on Computing, 19(4), 596–606.
Nowak, M., & Sigmund, K. (1993). A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner’s Dilemma game. Nature, 364(6432), 56–58.
O’Brien, T. E., & Funk, G. M. (2003). A gentle introduction to optimal design for regression models. The American Statistician, 57(4), 265–267.
Ortega, P., & Braun, D. (2013). Generalized Thompson sampling for sequential decision-making and causal inference. arXiv preprint arXiv:1303.4431.
Press, W. H. (2009). Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research. Proceedings of the National Academy of Sciences of the United States of America, 106(52), 22387–22392.
Robbins, H. (1985). Some aspects of the sequential design of experiments. In Herbert Robbins Selected Papers (pp. 169–177). Springer.
Rosenthal, R., Cooper, H., & Hedges, L. (1994). Parametric measures of effect size. In The Handbook of Research Synthesis (pp. 231–244).
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237.
Scott, S. L. (2010). A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 26(6), 639–658.
Shieh, G. (2013). On using a pilot sample variance for sample size determination in the detection of differences between two means: Power consideration. Psicológica, 34, 125–143.
Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant. Psychological Science, 22(11), 1359–1366.
Sonin, I. M. (2008). A generalized Gittins index for a Markov chain and its recursive calculation. Statistics & Probability Letters, 78(12), 1526–1533.
Steyvers, M., Lee, M. D., & Wagenmakers, E.-J. (2009). A Bayesian analysis of human decision-making on bandit problems. Journal of Mathematical Psychology, 53(3), 168–179.
Sutton, R. S., & Barto, A. G. (1998). Introduction to reinforcement learning. MIT Press.
Thompson, W. R. (1933). On the Likelihood that one Unknown Probability Exceeds Another in view of the Evidence of two Samples. Biometrika, 25(3–4), 285–294.
Vermorel, J., & Mohri, M. (2005). Multi-armed bandit algorithms and empirical evaluation. In Machine Learning: ECML 2005 (pp. 437–448). Springer.
Wagenmakers, E., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: The case of psi: Comment on Bem (2011). Journal of Personality and Social Psychology, 100(3), 426–432.
Welch, B. (1951). On the comparison of several mean values: an alternative approach. Biometrika, 38(3–4), 330–336.
Whittle, P. (1973). Some General Points in the Theory of Optimal Experimental Design. Journal of the Royal Statistical Society: Series B Methodological, 35(1), 123–130.
Whittle, P. (1980). Multi-armed bandits and the Gittins index. Journal of the Royal Statistical Society: Series B Methodological, 42(2), 143–149.
Wilcox, R. R. (1981). A Review of the Beta-Binomial Model and Its Extensions. Journal of Educational Statistics, 6, 3–32.
Zhang, S., & Lee, M. D. (2010). Optimal experimental design for a class of bandit problems. Journal of Mathematical Psychology, 54(6), 499–508.
Zhu, M., & Lu, A. Y. (2004). The Counter-intuitive Non-informative Prior for the Bernoulli Family. Journal of Statistics Education, 12(2), 1–10.
Appendices
Appendix 1 [R]-code for the binomial model
In this section we describe how one can use the [R] language for statistical computing to decide on the allocation of subjects using Thompson sampling in the two-armed binomial bandit case (i.e., a case in which two treatments with binary outcomes are compared).
To set up the problem, we need to specify our prior beliefs regarding the effectiveness of each treatment A and B. For the prior, we use a Beta(1, 1) distribution, independently for each treatment. The density of this prior belief is displayed in the top row of Fig. 1 and places equal weight on all possible outcomes. We first set up two lists to store the initial parameters:
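A sketch of this initialization (the original listing is not reproduced here, so object names are illustrative) might look as follows; `alpha` counts successes and `beta` counts failures, each starting at the Beta(1, 1) prior value:

```r
# Illustrative initialization of the two treatments' Beta parameters.
# Beta(1, 1) is the uniform prior: one pseudo-success, one pseudo-failure.
betaA <- list(alpha = 1, beta = 1)
betaB <- list(alpha = 1, beta = 1)
```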
Next, to decide which treatment to select, we perform a random draw from each of the two Beta posteriors, \( Beta_A(\cdot) \) and \( Beta_B(\cdot) \):
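In [R], one way to implement these two posterior draws (assuming the illustrative parameter lists set up above) uses `rbeta`:

```r
betaA <- list(alpha = 1, beta = 1)  # illustrative prior parameters
betaB <- list(alpha = 1, beta = 1)
# Thompson sampling step: one random draw from each Beta posterior.
draw.A <- rbeta(1, betaA$alpha, betaA$beta)
draw.B <- rbeta(1, betaB$alpha, betaB$beta)
```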
Then, to select which treatment to allocate our next subject to (or “which arm to play” in the canonical version of the problem), we compare our two draws and select the highest:
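A minimal sketch of this comparison (with made-up example draws; `allocate.to` is the variable name used throughout the appendices):

```r
draw.A <- 0.42  # example posterior draws
draw.B <- 0.67
# which.max returns the index of the largest draw:
# 1 selects treatment A, 2 selects treatment B.
allocate.to <- which.max(c(draw.A, draw.B))
```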
If allocate.to = 1, we allocate the next subject to treatment A, and if allocate.to = 2, we allocate the next subject to treatment B. After assigning the subject to a treatment, the response is observed, and the prior belief is updated with the data (either 0 for a failure or 1 for a success). A simple [R] function suffices:
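Since the original listing is not shown here, the following is a sketch of such an update function (the name `update.beta` is illustrative); it implements the conjugate Beta-binomial update, in which a success increments alpha and a failure increments beta:

```r
# Conjugate Beta-binomial update: outcome is 1 (success) or 0 (failure).
update.beta <- function(params, outcome) {
  params$alpha <- params$alpha + outcome
  params$beta  <- params$beta + (1 - outcome)
  params
}
updated <- update.beta(list(alpha = 1, beta = 1), outcome = 1)
```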
Suppose that we allocated a subject to treatment B (allocate.to = 2) and we observed a success; then, we update our belief about treatment B:
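With the illustrative update function sketched above, this amounts to:

```r
update.beta <- function(params, outcome) {  # as sketched above
  params$alpha <- params$alpha + outcome
  params$beta  <- params$beta + (1 - outcome)
  params
}
betaB <- list(alpha = 1, beta = 1)
betaB <- update.beta(betaB, outcome = 1)  # a success observed on B
```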
After updating our belief, we decide on the allocation of subject 2:
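The decision repeats the sampling step, now with the updated parameters for treatment B (a sketch with illustrative names):

```r
betaA <- list(alpha = 1, beta = 1)  # still at its prior
betaB <- list(alpha = 2, beta = 1)  # updated after B's success
draw.A <- rbeta(1, betaA$alpha, betaA$beta)
draw.B <- rbeta(1, betaB$alpha, betaB$beta)
allocate.to <- which.max(c(draw.A, draw.B))
```

Because treatment B's posterior now has more mass on higher success probabilities, B is more likely (though not certain) to be selected for subject 2.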
Appendix 2 [R]-code for the normal optimization
Implementing the normal-inverse-\( \chi^2 \) model takes a bit more code than the binomial update described in Appendix 1. However, more and more packages are available that allow researchers to update prior beliefs “off-the-shelf” for all kinds of distributions (see, e.g., the LearnBayes package). For the reference prior used in the article, the update can be done using only two simple [R] functions:
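A sketch of two such functions (names are illustrative, not the article's listing): under the reference prior p(mu, sigma²) proportional to 1/sigma², the posterior of sigma² is scaled-inverse-chi-squared with n − 1 degrees of freedom, and mu given sigma² is normal around the sample mean:

```r
# Draw sigma^2 from its scaled-inverse-chi-squared posterior:
# (n - 1) * s^2 / chisq(n - 1). Undefined for n = 1, since var(y) is
# then NA (see Note 5).
draw.sigma2 <- function(y) {
  n <- length(y)
  (n - 1) * var(y) / rchisq(1, df = n - 1)
}
# Draw mu given sigma^2: normal around the sample mean.
draw.mu <- function(y, sigma2) {
  rnorm(1, mean = mean(y), sd = sqrt(sigma2 / length(y)))
}
```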
For each treatment A and B, one collects a vector of observations, which might look as follows (see Note 5):
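For instance (these values are purely illustrative; the article's observations are not shown here):

```r
# One vector of observed continuous outcomes per treatment.
y.A <- c(10.2, 11.5, 9.8, 10.9, 10.4)
y.B <- c(12.1, 11.8, 13.0, 12.4)
```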
Deciding which treatment the next subject should be allocated to, when optimizing the treatment effect, then proceeds as follows:
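Combining the illustrative helper functions and data from above, the Thompson sampling step draws one posterior mean per treatment and plays the arm with the higher draw:

```r
draw.sigma2 <- function(y) (length(y) - 1) * var(y) / rchisq(1, df = length(y) - 1)
draw.mu <- function(y, sigma2) rnorm(1, mean(y), sqrt(sigma2 / length(y)))
y.A <- c(10.2, 11.5, 9.8, 10.9, 10.4)  # illustrative data
y.B <- c(12.1, 11.8, 13.0, 12.4)
# One draw from each posterior of the mean; allocate to the highest.
mu.A <- draw.mu(y.A, draw.sigma2(y.A))
mu.B <- draw.mu(y.B, draw.sigma2(y.B))
allocate.to <- which.max(c(mu.A, mu.B))
```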
The process is repeated after the new data point is observed and added to the vector of observations for the treatment that was selected.
Appendix 3 [R]-code for optimal experiment
The step from optimization of the treatment effect as described in Appendix 2 to minimization of the estimation error is simple, and the same [R] functions can be used (see Appendix 2, code lines 15–24).
Again, for each treatment A and B, one collects a vector of observations, which might look as follows (see Note 6):
Deciding on the next treatment for minimization of the estimation error now entails first obtaining a draw from the posterior variances,
and next deciding which treatment should be selected to minimize the \( SE\left(\widehat{\delta}\right) \):
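One way to implement this comparison (a sketch under the illustrative names used above, not the article's listing): since \( SE{\left(\widehat{\delta}\right)}^2 = \sigma_A^2/n_A + \sigma_B^2/n_B \), allocating one more unit to treatment k reduces the corresponding term by \( \sigma_k^2/\left({n}_k\left({n}_k+1\right)\right) \), so we allocate to the treatment with the larger expected reduction under the posterior draws:

```r
draw.sigma2 <- function(y) (length(y) - 1) * var(y) / rchisq(1, df = length(y) - 1)
y.A <- c(10.2, 11.5, 9.8, 10.9, 10.4)  # illustrative data
y.B <- c(12.1, 11.8, 13.0, 12.4)
# Draws from the posterior variances of the two treatments.
s2.A <- draw.sigma2(y.A)
s2.B <- draw.sigma2(y.B)
n.A <- length(y.A); n.B <- length(y.B)
# Adding one unit to treatment k lowers SE(delta-hat)^2 by
# s2.k / (n.k * (n.k + 1)); allocate where the reduction is largest.
allocate.to <- which.max(c(s2.A / (n.A * (n.A + 1)),
                           s2.B / (n.B * (n.B + 1))))
```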
Here, the interpretation of the allocate.to variable is the same as in Appendices 1 and 2.
Kaptein, M. The use of Thompson sampling to increase estimation precision. Behav Res 47, 409–423 (2015). https://doi.org/10.3758/s13428-014-0480-0