The use of Thompson sampling to increase estimation precision

Behavior Research Methods

Abstract

In this article, we consider a sequential sampling scheme for efficient estimation of the difference between the means of two independent treatments when the population variances are unequal across groups. The sampling scheme proposed is based on a solution to bandit problems called Thompson sampling. While this approach is most often used to maximize the cumulative payoff over competing treatments, we show that the same method can also be used to balance exploration and exploitation when the aim of the experimenter is to efficiently increase estimation precision. We introduce this novel design optimization method and, by simulation, show its effectiveness.


Notes

  1. The uniform Beta(1, 1) prior is not the most uninformative prior; higher variance Beta priors contain less information (see Zhu & Lu, 2004). However, it is maximally uninformative given the assumption of binary outcomes.

  2. The above method of addressing the explore–exploit trade-off in the two-arm binomial bandit case is easy to implement in currently standard analysis packages such as the [R] language for statistical computing. Appendix 1 contains an example of the implementation of the two-armed bandit problem described in this section.

  3. In the unlikely event that X_A = X_B, we randomly choose one treatment with equal probabilities for both and subsequently assign that treatment to the remaining M − N units.

  4. Here, one could also fit a linear model with an intercept and one indicator variable. The method proceeds in the same way (and leads to the same results). However, we feel that this is less intuitive as an explanation.

  5. Do note that the above update will not be computable with a single data point, since it has undefined variance.

  6. Do note that the above update will not be computable with a single data point, since it has undefined variance.

References

  • Agrawal, S., & Goyal, N. (2011). Analysis of Thompson sampling for the multi-armed bandit problem. arXiv preprint arXiv:1111.1797.

  • Agrawal, S., & Goyal, N. (2012). Further optimal regret bounds for Thompson sampling. arXiv preprint arXiv:1209.3353.

  • Allen, T. T., Yu, L., & Schmitz, J. (2003). An experimental design criterion for minimizing meta-model prediction errors applied to die casting process design. Journal of the Royal Statistical Society: Series C (Applied Statistics), 52(1), 103–117.

  • Antille, G., & Weinberg, A. (2000). A study of D-optimal designs efficiency for polynomial regression. Université de Genève, Faculté des sciences économiques et sociales, Département d'économétrie.

  • Atkinson, A. C., Donev, A. N., & Tobias, R. D. (2007). Optimum experimental designs, with SAS (Vol. 34). Oxford: Oxford University Press.

  • Audibert, J.-Y., Munos, R., & Szepesvári, C. (2009). Exploration–exploitation tradeoff using variance estimates in multi-armed bandits. Theoretical Computer Science, 410(19), 1876–1902.

  • Auer, P., & Ortner, R. (2010). UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Periodica Mathematica Hungarica, 61(1), 1–11.

  • Bardsley, W. G., Wood, R. M. W., & Melikhova, E. M. (1996). Optimal design: A computer program to study the best possible spacing of design points for model discrimination. Computers & Chemistry, 20(2), 145–157.

  • Berry, D. A., & Fristedt, B. (1985). Bandit problems: Sequential allocation of experiments (Monographs on Statistics and Applied Probability). Springer.

  • Box, G. E. P., & Hill, W. J. (1967). Discrimination among mechanistic models. Technometrics, 9(1), 57–71.

  • Box, J. F. (1987). Guinness, Gosset, Fisher, and small samples. Statistical Science, 2(1), 45–52.

  • Brezzi, M., & Lai, T. L. (2000). Incomplete learning from endogenous data in dynamic allocation. Econometrica, 68(6), 1511–1516.

  • Burnetas, A. N., & Katehakis, M. N. (1996). Optimal adaptive policies for sequential allocation problems. Advances in Applied Mathematics, 17(2), 122–142.

  • Chaloner, K., & Verdinelli, I. (1995). Bayesian experimental design: A review. Statistical Science, 10(3), 273–304.

  • Chapelle, O., & Li, L. (2011). An empirical evaluation of Thompson sampling. In J. Shawe-Taylor, R. S. Zemel, P. Bartlett, F. C. N. Pereira, & K. Q. Weinberger (Eds.), Advances in Neural Information Processing Systems (pp. 2249–2257).

  • Daw, N. D., O'Doherty, J. P., Dayan, P., Seymour, B., & Dolan, R. J. (2006). Cortical substrates for exploratory decisions in humans. Nature, 441(7095), 876–879.

  • Dunlop, M. (2009). Paper rejected (p > 0.05): An introduction to the debate on appropriateness of null-hypothesis testing. International Journal of Mobile Human Computer Interaction, 1(3), 1–8.

  • Garivier, A., & Cappé, O. (2011). The KL-UCB algorithm for bounded stochastic bandits and beyond. Bernoulli, 19(1), 13.

  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis. CRC Press.

  • Gittins, J. C. (1979). Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society: Series B (Methodological), 41(2), 148–177.

  • Gittins, J., & Wang, Y. G. (1992). The learning component of dynamic allocation indices. Annals of Statistics, 20(3), 12.

  • Goodman, S. N. (1999). Toward evidence-based medical statistics. 2: The Bayes factor. Annals of Internal Medicine, 130(12), 1005–1013.

  • Hauser, J. R., Urban, G. L., Liberali, G., & Braun, M. (2009). Website morphing. Marketing Science, 28(2), 202–223.

  • Hutchinson, J. W., Kamakura, W. A., & Lynch, J. J. G. (2001). Unobserved heterogeneity as an alternative explanation for "reversal" effects in behavioral research. Journal of Consumer Research, 27(3), 324–344.

  • Keller, G., & Rady, S. (2010). Strategic experimentation with Poisson bandits. Theoretical Economics, 5(2), 275–311.

  • Kruschke, J. K. (2012). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 31, 1–33.

  • Kuck, H., de Freitas, N., & Doucet, A. (2006). SMC samplers for Bayesian optimal nonlinear design. In Nonlinear Statistical Signal Processing Workshop, 2006 IEEE (pp. 99–102). IEEE.

  • Lai, T. L. (1987). Adaptive treatment allocation and the multi-armed bandit problem. The Annals of Statistics, 15(3), 1091–1114.

  • Lai, T., & Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Advances in Applied Mathematics, 6(1), 4–22.

  • McClelland, G. H. (1997). Optimal design in psychological research. Psychological Methods, 2(1), 3–19.

  • McCloskey, D. N., & Ziliak, S. T. (2009). The unreasonable ineffectiveness of Fisherian 'tests' in biology, and especially in medicine. Biological Theory, 4(1), 44–53.

  • Meehl, P. E. (1967). Theory testing in psychology and physics: A methodological paradox. Philosophy of Science, 34(74), 103–115.

  • Myung, J., & Pitt, M. (2009a). Bayesian adaptive optimal design of psychology experiments. In Proceedings of the 2nd International Workshop in Sequential Methodologies (IWSM2009) (pp. 1–6).

  • Myung, J. I., & Pitt, M. A. (2009b). Optimal experimental design for model discrimination. Psychological Review, 116(3), 499.

  • Myung, J. I., Cavagnaro, D. R., & Pitt, M. A. (2013). A tutorial on adaptive design optimization. Journal of Mathematical Psychology, 57(3), 53–67.

  • Niño-Mora, J. (2007). A (2/3)n³ fast-pivoting algorithm for the Gittins index and optimal stopping of a Markov chain. INFORMS Journal on Computing, 19(4), 596–606.

  • Nowak, M., & Sigmund, K. (1993). A strategy of win-stay, lose-shift that outperforms tit-for-tat in the Prisoner's Dilemma game. Nature, 364(6432), 56–58.

  • O'Brien, T. E., & Funk, G. M. (2003). A gentle introduction to optimal design for regression models. The American Statistician, 57(4), 265–267.

  • Ortega, P., & Braun, D. (2013). Generalized Thompson sampling for sequential decision-making and causal inference. arXiv preprint arXiv:1303.4431.

  • Press, W. H. (2009). Bandit solutions provide unified ethical models for randomized clinical trials and comparative effectiveness research. Proceedings of the National Academy of Sciences of the United States of America, 106(52), 22387–22392.

  • Robbins, H. (1985). Some aspects of the sequential design of experiments. In Herbert Robbins selected papers (pp. 169–177). Springer.

  • Rosenthal, R., Cooper, H., & Hedges, L. (1994). Parametric measures of effect size. In The handbook of research synthesis (pp. 231–244).

  • Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237.

  • Scott, S. L. (2010). A modern Bayesian look at the multi-armed bandit. Applied Stochastic Models in Business and Industry, 26(6), 639–658.

  • Shieh, G. (2013). On using a pilot sample variance for sample size determination in the detection of differences between two means: Power consideration. Psicológica, 34, 125–143.

  • Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359–1366.

  • Sonin, I. M. (2008). A generalized Gittins index for a Markov chain and its recursive calculation. Statistics & Probability Letters, 78(12), 1526–1533.

  • Steyvers, M., Lee, M. D., & Wagenmakers, E.-J. (2009). A Bayesian analysis of human decision-making on bandit problems. Journal of Mathematical Psychology, 53(3), 168–179.

  • Sutton, R. S., & Barto, A. G. (1998). Introduction to reinforcement learning. MIT Press.

  • Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika, 25(3–4), 285–294.

  • Vermorel, J., & Mohri, M. (2005). Multi-armed bandit algorithms and empirical evaluation. In Machine learning: ECML 2005 (pp. 437–448). Springer.

  • Wagenmakers, E., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: The case of psi: Comment on Bem (2011). Journal of Personality and Social Psychology, 100(3), 426–432.

  • Welch, B. (1951). On the comparison of several mean values: An alternative approach. Biometrika, 38(3–4), 330–336.

  • Whittle, P. (1973). Some general points in the theory of optimal experimental design. Journal of the Royal Statistical Society: Series B (Methodological), 35(1), 123–130.

  • Whittle, P. (1980). Multi-armed bandits and the Gittins index. Journal of the Royal Statistical Society: Series B (Methodological), 42(2), 143–149.

  • Wilcox, R. R. (1981). A review of the beta-binomial model and its extensions. Journal of Educational Statistics, 6, 3–32.

  • Zhang, S., & Lee, M. D. (2010). Optimal experimental design for a class of bandit problems. Journal of Mathematical Psychology, 54(6), 499–508.

  • Zhu, M., & Lu, A. Y. (2004). The counter-intuitive non-informative prior for the Bernoulli family. Journal of Statistics Education, 12(2), 1–10.


Author information

Corresponding author

Correspondence to Maurits Kaptein.

Appendices

Appendix 1 [R]-code for the binomial model

In this section, we describe how one can use the [R] language for statistical computing to decide on the allocation of subjects using Thompson sampling in the two-armed binomial bandit case (i.e., a case in which two treatments with binary outcomes are compared).

To set up the problem, we need to specify our prior beliefs regarding the effectiveness of each treatment, A and B. For the prior, we use a Beta(1, 1) distribution, independently for each treatment. The density of this prior belief is displayed in the top row of Fig. 1 and places equal weight on all possible outcomes. We first set up two lists to store the initial parameters:
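The original listing is not reproduced on this page; a minimal sketch of the setup, with illustrative names (betaA, betaB), might look as follows:

```r
# Prior parameters of the Beta(1, 1) distributions for treatments A and B,
# stored as lists so they can be updated as observations come in
betaA <- list(alpha = 1, beta = 1)
betaB <- list(alpha = 1, beta = 1)
```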

Next, to decide on which treatment to select, we perform a random draw from both Beta_A and Beta_B:
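In code, this can be sketched as (reusing the illustrative betaA and betaB lists assumed above):

```r
betaA <- list(alpha = 1, beta = 1)
betaB <- list(alpha = 1, beta = 1)

# One random draw from each treatment's current Beta distribution
draw.A <- rbeta(1, betaA$alpha, betaA$beta)
draw.B <- rbeta(1, betaB$alpha, betaB$beta)
```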

Then, to select which treatment to allocate our next subject to (or “which arm to play” in the canonical version of the problem), we compare our two draws and select the highest:
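A sketch of this comparison (names are illustrative, not the article's exact listing):

```r
draw.A <- rbeta(1, 1, 1)
draw.B <- rbeta(1, 1, 1)

# which.max returns 1 if the draw for A is highest, 2 if the draw for B is
allocate.to <- which.max(c(draw.A, draw.B))
```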

If allocate.to = 1, we allocate the next subject to treatment A, and if allocate.to = 2, we allocate the next subject to treatment B. After assigning to a treatment, the response is observed, and the prior belief is updated by the data (which is either 0 for a failure or 1 for a success). A simple [R] function suffices:
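One way to write such a function (the name update.beta is illustrative):

```r
# Conjugate Beta-binomial update: a success (y = 1) increments alpha,
# a failure (y = 0) increments beta
update.beta <- function(prior, y) {
  list(alpha = prior$alpha + y, beta = prior$beta + 1 - y)
}
```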

Suppose that we allocated a subject to treatment B (allocate.to = 2) and we observed a success; then, we update our belief about treatment B:
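With the illustrative update function assumed above, this step might look like:

```r
update.beta <- function(prior, y) {
  list(alpha = prior$alpha + y, beta = prior$beta + 1 - y)
}
betaB <- list(alpha = 1, beta = 1)

# A success (y = 1) on treatment B: the belief becomes Beta(2, 1)
betaB <- update.beta(betaB, 1)
```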

After updating our belief, we decide on the allocation of subject 2:
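That is, the draw-and-compare step is simply repeated with the updated parameters (a sketch with illustrative names):

```r
betaA <- list(alpha = 1, beta = 1)  # still the prior
betaB <- list(alpha = 2, beta = 1)  # updated after the success on B

# Repeat the Thompson step with the current beliefs
allocate.to <- which.max(c(rbeta(1, betaA$alpha, betaA$beta),
                           rbeta(1, betaB$alpha, betaB$beta)))
```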

Appendix 2 [R]-code for the normal optimization

Implementing the normal-inverse-χ² model takes a bit more code than implementing the binomial update described in Appendix 1. However, more and more packages that allow researchers to update prior beliefs "off-the-shelf," for all kinds of distributions, are available (see, e.g., the LearnBayes package). For the reference prior that is used in the article, the update can be done using only two simple [R] functions:
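The two functions are not shown on this page. Under the reference prior p(μ, σ²) ∝ 1/σ², the marginal posterior of σ² is a scaled inverse-χ² distribution and μ given σ² is normal, so a sketch (with illustrative function names) might be:

```r
# Marginal posterior of sigma^2 under the reference prior: scaled
# inverse-chi^2 with n - 1 degrees of freedom and scale s^2 = var(y)
draw.sigma2 <- function(y) {
  n <- length(y)
  (n - 1) * var(y) / rchisq(1, df = n - 1)
}

# Conditional posterior of mu given sigma^2 is N(mean(y), sigma^2 / n);
# drawing sigma^2 first and then mu gives a draw from the joint posterior
draw.mu <- function(y) {
  n <- length(y)
  rnorm(1, mean = mean(y), sd = sqrt(draw.sigma2(y) / n))
}
```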

For each treatment A and B, one collects a vector of observations, which might look like this:Footnote 5
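For instance (the values below are invented purely for illustration):

```r
yA <- c(1.02, 0.77, 1.31, 0.92, 1.10)  # observations under treatment A
yB <- c(0.41, 0.98, 0.66, 1.55, 0.23)  # observations under treatment B
```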

Deciding on the next treatment when optimizing the treatment effect then amounts to:
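That is, one draws a mean from each treatment's posterior and allocates to the treatment with the highest draw — a sketch, reusing the illustrative draw.mu function and invented data from above:

```r
yA <- c(1.02, 0.77, 1.31, 0.92, 1.10)
yB <- c(0.41, 0.98, 0.66, 1.55, 0.23)

draw.mu <- function(y) {
  n <- length(y)
  s2 <- (n - 1) * var(y) / rchisq(1, df = n - 1)  # scaled inverse-chi^2 draw
  rnorm(1, mean = mean(y), sd = sqrt(s2 / n))
}

# Thompson step: one posterior draw per treatment; play the highest
allocate.to <- which.max(c(draw.mu(yA), draw.mu(yB)))
```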

The process is repeated after the new data point is observed and added to the vector of observations for the treatment that was selected.

Appendix 3 [R]-code for optimal experiment

The step from optimization of the treatment effect as described in Appendix 2 to minimization of the estimation error is simple, and the same [R] functions can be used (see Appendix 2, code lines 15–24).

Again, for each treatment A and B, one collects a vector of observations, which might look like this:Footnote 6
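For instance (the same invented illustrative values as in Appendix 2):

```r
yA <- c(1.02, 0.77, 1.31, 0.92, 1.10)  # observations under treatment A
yB <- c(0.41, 0.98, 0.66, 1.55, 0.23)  # observations under treatment B
```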

Deciding on the next treatment for minimization of the estimation error now entails first obtaining a draw from the posterior variances,

and next deciding which treatment should be selected to minimize SE(δ̂):
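Since SE(δ̂)² = σ²_A/n_A + σ²_B/n_B, one more observation on arm k shrinks that arm's contribution by σ²_k / (n_k(n_k + 1)). The exact listing is not reproduced here; one reasonable sketch, reusing the illustrative draw.sigma2 function and invented data, allocates where the sampled reduction is largest:

```r
yA <- c(1.02, 0.77, 1.31, 0.92, 1.10)
yB <- c(0.41, 0.98, 0.66, 1.55, 0.23)

draw.sigma2 <- function(y) {
  n <- length(y)
  (n - 1) * var(y) / rchisq(1, df = n - 1)  # scaled inverse-chi^2 draw
}

# One posterior draw per treatment variance
s2.A <- draw.sigma2(yA)
s2.B <- draw.sigma2(yB)

# Adding a unit to arm k reduces its SE^2 contribution by
# sigma_k^2 / (n_k * (n_k + 1)); allocate where the sampled reduction is largest
nA <- length(yA)
nB <- length(yB)
allocate.to <- which.max(c(s2.A / (nA * (nA + 1)),
                           s2.B / (nB * (nB + 1))))
```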

Here, the interpretation of the allocate.to variable is the same as in Appendices 1 and 2.


Cite this article

Kaptein, M. The use of Thompson sampling to increase estimation precision. Behav Res 47, 409–423 (2015). https://doi.org/10.3758/s13428-014-0480-0
