Optimal design of experiments to identify latent behavioral types


Bayesian optimal experiments that maximize the information gained from collected data are critical for efficiently identifying behavioral models. We extend a seminal method for designing Bayesian optimal experiments by introducing two computational improvements that make the procedure tractable: (1) a search algorithm from artificial intelligence that efficiently explores the space of possible design parameters, and (2) a sampling procedure that evaluates each design parameter combination more efficiently. We apply our procedure to a game of imperfect information to evaluate and quantify the computational improvements. We then collect data across five different experimental designs and compare how well the optimal experimental design discriminates among competing behavioral models relative to the designs chosen by a “wisdom of experts” prediction experiment. We find that the experiment suggested by the optimal design approach requires significantly less data to distinguish behavioral models (i.e., test hypotheses) than the experiment suggested by experts. Substantively, we find that reinforcement learning best explains human decision-making in the imperfect information game and that behavior is not adequately described by the Bayesian Nash equilibrium. Our procedure is general and computationally efficient and can be applied to dynamically optimize online experiments.


Notes

  1. It is important to note that this difference metric is a directed, asymmetric measure. Wang et al. (2010) use a similar metric but propose the average KL divergence, which can be expressed as \(I(\theta) = \sum_{i=1}^{n} p_i \, I(i;\theta)\).

  2. Following the design by El-Gamal and Palfrey (1996), Player 2 is asked to make a decision even if Player 1 chooses Stop.

  3. Wording as used in El-Gamal and Palfrey (1996). See the Appendix for a more detailed description.

  4. We closely follow the implementation of the Stop-Go game by El-Gamal and Palfrey (1996), where Player 2 makes a decision even if Player 1 chooses Stop.

  5. In fact, we could construct the information surface for choosing the optimal design parameters only for a simpler two-player version of our game, and it took approximately 72 h on the following supercomputing cluster: four hundred parallel R v.3.x jobs, distributed across 56-core x86_64 little-endian Intel(R) Xeon(R) CPUs (E5-2680 v4 @ 2.40 GHz; L1d cache: 32K, L1i cache: 32K, L2 cache: 256K, L3 cache: 35840K).

  6. All code is available at http://github.com/shakty/optimal-design.

  7. Non-uniform sampling can be used when the experimenter has a prior over the distribution of model parameters.

  8. Note: the optimal experiment we report in Fig. 3 (\(A \approx 2.0\) and \(\pi \approx 0.5\)) is generated from an information surface comparing three models. The same coordinate is the optimal experiment when we include all four models. See Fig. A.3 in the Appendix for the full information surface.
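
Note 1 describes the difference metric as directed and asymmetric and gives the averaged form \(I(\theta) = \sum_i p_i I(i;\theta)\). The following is a minimal sketch of both points, not the authors' implementation. It assumes discrete outcome distributions and assumes that \(I(i;\theta)\) is the KL divergence from model i's predicted outcome distribution to the prior-weighted mixture of all models' predictions. The function names, priors, and prediction values are hypothetical.

```python
import math


def kl_divergence(p, q):
    """Directed KL divergence D(p || q) in nats for discrete distributions.

    Assumes q[k] > 0 wherever p[k] > 0; terms with p[k] == 0 contribute 0.
    """
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def mixture(priors, predictions):
    """Prior-weighted average of the models' predicted outcome distributions."""
    n_outcomes = len(predictions[0])
    return [
        sum(w * pred[k] for w, pred in zip(priors, predictions))
        for k in range(n_outcomes)
    ]


def average_information(priors, predictions):
    """I(theta) = sum_i p_i * I(i; theta).

    Assumption for this sketch: I(i; theta) = D(prediction_i || mixture).
    """
    mix = mixture(priors, predictions)
    return sum(w * kl_divergence(pred, mix) for w, pred in zip(priors, predictions))


# Hypothetical predictions of three models over two outcomes (e.g. Stop, Go)
# at one fixed design theta, with a uniform prior over models.
priors = [1 / 3, 1 / 3, 1 / 3]
predictions = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]

forward = kl_divergence(predictions[0], predictions[1])   # D(model 1 || model 2)
backward = kl_divergence(predictions[1], predictions[0])  # D(model 2 || model 1)
info = average_information(priors, predictions)

print("forward:", forward, "backward:", backward, "average information:", info)
```

Swapping the arguments of `kl_divergence` changes the value, which is the asymmetry the note refers to. Under the mixture assumption above, the averaged quantity equals the mutual information between the model index and the observed outcome, so it is zero when all models predict the same distribution.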


References

  1. Azevedo, E. M., Deng, A., Olea, J. L. M., Rao, J., & Weyl, E. G. (2019). A/B testing with fat tails. Journal of Political Economy. https://doi.org/10.1086/710607.

  2. Bakshy, E., Dworkin, L., Karrer, B., Kashin, K., Letham, B., Murthy, A., & Singh, S. (2018). AE: A domain-agnostic platform for adaptive experimentation. In Conference on Neural Information Processing Systems (pp. 1–8). http://eytan.github.io/papers/ae_workshop.pdf.

  3. Balietti, S. (2017). nodeGame: Real-time, synchronous, online experiments in the browser. Behavior Research Methods, 1–31. https://doi.org/10.3758/s13428-016-0824-z.

  4. Berman, R. (2018). Beyond the last touch: Attribution in online advertising. Marketing Science, 37(5), 771–792. https://doi.org/10.1287/mksc.2018.1104.

  5. Berman, R., Pekelis, L., Scott, A., & Van den Bulte, C. (2018). p-Hacking and false discovery in A/B testing. Available at SSRN. https://doi.org/10.2139/ssrn.3204791.

  6. Bramoullé, Y., Djebbari, H., & Fortin, B. (2020). Peer effects in networks: A survey. CEPR Discussion Paper No. DP14260. http://ftp.iza.org/dp12947.pdf.

  7. Camerer, C. F. (2011). Behavioral game theory: Experiments in strategic interaction. Princeton University Press. http://psycnet.apa.org/record/2003-06054-000.

  8. Camerer, C. F., & Ho, T.-H. (1999). Experience-weighted attraction learning in normal form games. Econometrica, 67(4), 827–874. https://doi.org/10.1111/1468-0262.00054.

  9. Chapman, J., Snowberg, E., Wang, S., & Camerer, C. (2018). Loss attitudes in the US population: Evidence from dynamically optimized sequential experimentation (DOSE). National Bureau of Economic Research, 1–55. https://doi.org/10.3386/w25072.

  10. Contal, E., Buffoni, D., Robicquet, A., & Vayatis, N. (2013). Parallel Gaussian process optimization with upper confidence bound and pure exploration. In Machine Learning and Knowledge Discovery in Databases (pp. 225–240). Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-40988-2_15.

  11. David, P. A. (1985). Clio and the economics of QWERTY. American Economic Review, 75(2), 332–337.

  12. De Freitas, N., Smola, A. J., & Zoghi, M. (2012). Exponential regret bounds for Gaussian process bandits with deterministic observations. In Proceedings of the 29th International Conference on Machine Learning (pp. 955–962). https://doi.org/10.5555/3042573.3042697.

  13. DellaVigna, S., & Pope, D. (2017). What motivates effort? Evidence and expert forecasts. Review of Economic Studies, 85(2), 1029–1069. https://doi.org/10.1093/restud/rdx033.

  14. Eckles, D., & Kaptein, M. C. (2014). Thompson sampling with the online bootstrap. arXiv, 1–13. arxiv:1410.4009.

  15. El-Gamal, M. A., & Palfrey, T. R. (1996). Economical experiments: Bayesian efficient experimental design. International Journal of Game Theory, 25, 495–517. https://doi.org/10.1007/BF01803953.

  16. Erev, I., & Roth, A. E. (1998). Predicting how people play games: Reinforcement learning in games with unique strategy equilibrium. American Economic Review, 88(4), 848–881.

  17. Erev, I., & Roth, A. E. (2014). Maximization, learning, and economic behavior. Proceedings of the National Academy of Sciences, 111, 10818–10825. https://doi.org/10.1073/pnas.1402846111.

  18. Feltovich, N. (2000). Reinforcement-based versus belief-based learning models in experimental asymmetric-information games. Econometrica, 68(3), 605–641. https://doi.org/10.1111/1468-0262.00125.

  19. Fershtman, C., & Pakes, A. (2012). Dynamic games with asymmetric information: A framework for empirical work. The Quarterly Journal of Economics, 127(4), 1611–1661. https://doi.org/10.1093/qje/qjs025.

  20. Fisher, R. A. (1936). The design of experiments. American Mathematical Monthly, 43(3), 180. https://doi.org/10.2307/2300364.

  21. Foley, M., Forber, P., Smead, R., & Riedl, C. (2018). Conflict and convention in dynamic networks. Journal of the Royal Society Interface, 15(140), 20170835. https://doi.org/10.1098/rsif.2017.0835.

  22. Fudenberg, D., & Tirole, J. (1991). Game theory. Cambridge: MIT Press.

  23. Gale, J., Binmore, K. G., & Samuelson, L. (1995). Learning to be imperfect: The ultimatum game. Games and Economic Behavior, 8(1), 56–90. https://doi.org/10.1016/S0899-8256(05)80017-X.

  24. Gilchrist, D. S., & Sands, E. G. (2016). Something to talk about: Social spillovers in movie consumption. Journal of Political Economy, 124(5), 1339–1382. https://doi.org/10.1086/688177.

  25. Goldman, M., & Rao, J. (2016). Experiments as instruments: Heterogeneous position effects in sponsored search auctions. EAI Endorsed Transactions on Serious Games. https://doi.org/10.4108/eai.8-8-2015.2261043.

  26. Görtler, J., Kehlbeck, R., & Deussen, O. (2019). A visual exploration of Gaussian processes. Distill. https://doi.org/10.23915/distill.00017.

  27. Harsanyi, J. C. (1967). Games with incomplete information played by “Bayesian” players, Part I. The Basic Model. Management Science, 14(3), 159–182. https://doi.org/10.1287/mnsc.1040.0270.

  28. Hertwig, R., & Ortmann, A. (2001). Experimental practices in economics: A methodological challenge for psychologists? Behavioral and Brain Sciences, 24(3), 383–403. https://doi.org/10.1037/e683322011-032.

  29. Hill, T. P. (1995). A statistical derivation of the significant-digit law. Statistical Science, 10(4), 354–363. https://doi.org/10.2307/2246134.

  30. Ho, T.-H., Wang, X., & Camerer, C. F. (2008). Individual differences in EWA learning with partial payoff information. The Economic Journal, 118(525), 37–59. https://doi.org/10.1111/j.1468-0297.2007.02103.x.

  31. Horton, J. J., Rand, D. G., & Zeckhauser, R. J. (2011). The online laboratory: Conducting experiments in a real labor market. Experimental Economics, 14(3), 399–425. https://doi.org/10.1007/s10683-011-9273-9.

  32. Imai, T., & Camerer, C. F. (2018). Estimating time preferences from budget set choices using optimal adaptive design. working paper. http://taisukeimai.com/files/adaptive_ctb.pdf.

  33. Kachelmeier, S. J., & Towry, K. L. (2005). The limitations of experimental design: A case study involving monetary incentive effects in laboratory markets. Experimental Economics, 8(1), 21–33. https://doi.org/10.1007/s10683-005-0435-5.

  34. Katz, M. L., & Shapiro, C. (1985). Network externalities, competition, and compatibility. American Economic Review, 75(3), 424–440.

  35. Knez, M., & Camerer, C. F. (1994). Creating expectational assets in the laboratory: Coordination in ‘weakest-link’ games. Strategic Management Journal, 15(1 S), 101–119. https://doi.org/10.1002/smj.4250150908.

  36. Kohavi, R., Longbotham, R., Sommerfield, D., & Henne, R. M. (2009). Controlled experiments on the web: Survey and practical guide. Data Mining and Knowledge Discovery, 18(1), 140–181. https://doi.org/10.1007/s10618-008-0114-1.

  37. Kohavi, R., & Thomke, S. (2017). The surprising power of online experiments. Harvard Business Review, 95(5), 2–9.

  38. Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79–86. https://doi.org/10.1214/aoms/1177729694.

  39. Letham, B., Karrer, B., Ottoni, G., & Bakshy, E. (2017). Constrained Bayesian optimization with noisy experiments. arXiv, 1–20. arxiv:1706.07094.

  40. Mason, W., & Suri, S. (2012). Conducting behavioral research on Amazon’s Mechanical Turk. Behavior Research Methods, 44(1), 1–23. https://doi.org/10.3758/s13428-011-0124-6.

  41. McIntyre, D. P., & Chintakananda, A. (2014). Competing in network markets: Can the winner take all? Business Horizons, 57(1), 117–125. https://doi.org/10.1016/j.bushor.2013.09.005.

  42. Paolacci, G., Chandler, J., & Ipeirotis, P. G. (2010). Running experiments on Amazon Mechanical Turk. Judgment and Decision Making, 5(5), 411–419.

  43. Parker, B. M., Gilmour, S. G., & Schormans, J. (2017). Optimal design of experiments on connected units with application to social networks. Journal of the Royal Statistical Society C, 66(3), 455–480. https://doi.org/10.1111/rssc.12170.

  44. Phan, T. Q., & Airoldi, E. M. (2015). A natural experiment of social network formation and dynamics. Proceedings of the National Academy of Sciences, 112(21), 6595–6600. https://doi.org/10.1073/pnas.1404770112.

  45. Pooseh, S., Bernhardt, N., Guevara, A., Huys, Q. J. M., & Smolka, M. N. (2018). Value-based decision-making battery: A Bayesian adaptive approach to assess impulsive and risky behavior. Behavior Research Methods, 50(1), 236–249. https://doi.org/10.3758/s13428-017-0866-x.

  46. Rasmussen, C. E., & Williams, C. K. I. (2005). Gaussian processes for machine learning. The MIT Press. http://gaussianprocess.org/gpml/.

  47. Rzhetsky, A., Foster, J. G., Foster, I. T., & Evans, J. A. (2015). Choosing experiments to accelerate collective discovery. In Proceedings of the National Academy of Sciences (pp. 1–6). https://doi.org/10.1073/pnas.1509757112.

  48. Salmon, T. C. (2001). An evaluation of econometric models of adaptive learning. Econometrica, 69(6), 1597–1628. https://doi.org/10.1111/1468-0262.00258.

  49. Sarin, R., & Vahid, F. (2001). Predicting how people play games: A simple dynamic model of choice. Games and Economic Behavior, 34(1), 104–122. https://doi.org/10.1006/game.1999.0783.

  50. Schwartz, E. M., Bradlow, E. T., & Fader, P. S. (2017). Customer acquisition via display advertising using multi-armed bandit experiments. Marketing Science, 36(4), 500–522. https://doi.org/10.1287/mksc.2016.1023.

  51. Sobol, I. M. (1998). On quasi-Monte Carlo integrations. Mathematics and Computers in Simulation, 47(2), 103–112. https://doi.org/10.1016/S0378-4754(98)00096-2.

  52. Srinivas, N., Krause, A., Kakade, S. M., & Seeger, M. (2010). Gaussian process optimization in the bandit setting: No regret and experimental design. In Proceedings of the 27th International Conference on Machine Learning (pp. 1015–1022). https://doi.org/10.5555/3104322.3104451.

  53. Stefanakis, T. S., Contal, E., Vayatis, N., Dias, F., & Synolakis, C. E. (2014). Can small islands protect nearby coasts from tsunamis? An active experimental design approach. Proceedings of the Royal Society A, 470(2172), 1–20. https://doi.org/10.1098/rspa.2014.0575.

  54. Tauber, E. M. (1972). Why do people shop? Journal of Marketing, 36(4), 46–49. https://doi.org/10.2307/1250426.

  55. Wang, S. W., Filiba, M., & Camerer, C. F. (2010). Dynamically optimized sequential experimentation (DOSE) for estimating economic preference parameters. arXiv, 1–41. http://pdfs.semanticscholar.org/1707/ded4fdc981aedc2a2f6bab077fcf37acb7d5.pdf.

  56. Zhou, S., Valentine, M., & Bernstein, M. S. (2018). In search of the dream team: Temporally constrained multi-armed bandits for identifying effective team structures. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (pp. 1–13). https://doi.org/10.1145/3173574.3173682.


Acknowledgements

The authors acknowledge Mahmoud El-Gamal for helpful correspondence and Stephanie W. Wang for useful comments on the design and implementation. This work was supported in part by the Office of Naval Research (N00014-16-1-3005 and N00014-17-1-2542) and the National Defense Science & Engineering Graduate Fellowship (NDSEG) Program.

Author information

All authors contributed to the study conception and design. All authors contributed to analyses and preparation of the manuscript. S.B. ran the online experiments, and all authors contributed to the expert surveying.

Corresponding author

Correspondence to Christoph Riedl.

Ethics declarations

Conflicts of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Balietti, S., Klein, B. & Riedl, C. Optimal design of experiments to identify latent behavioral types. Exp Econ (2020). https://doi.org/10.1007/s10683-020-09680-w


Keywords

  • Optimal experimental design
  • Behavioral types
  • Expert prediction
  • Active learning

JEL Classification

  • C90
  • C80
  • C72