The statistical power of individual-level risk preference estimation


Accurately estimating risk preferences is of critical importance when evaluating data from many economic experiments or strategic interactions. I use a simulation model to conduct power analyses over two lottery batteries designed to classify individual subjects as being best explained by one of a number of alternative specifications of risk preference models. I propose a case in which there are only two possible alternatives for classification and find that the statistical methods used to classify subjects result in type I and type II errors at rates far beyond traditionally acceptable levels. These results suggest that subjects in experiments must make significantly more choices, or that traditional lottery pair batteries need to be substantially redesigned to make accurate inferences about the risk preference models that characterize a subject’s choices.




Notes

  1.

    Though the method proposed by De Long and Lang (1992) also addresses the issue of type II errors, they do so using a meta-analysis of the published literature, whereas I employ a power analysis through simulation to determine the probabilities of type I and type II errors directly.
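The simulation approach can be illustrated with a minimal sketch (a hypothetical toy model, not the paper's lottery batteries): simulate many data sets under a known data-generating process, apply the test to each, and report the fraction of rejections. Under the null that fraction estimates the type I error rate; under an alternative it estimates power.

```python
# Minimal power analysis by simulation (toy binomial model, illustrative only):
# estimate rejection rates of a Wald test of H0: p = 0.5 from T binary choices.
import math
import random

def wald_rejects(p_true, T, rng, z_crit=1.96):
    """Simulate T binary choices and Wald-test H0: p = 0.5."""
    k = sum(rng.random() < p_true for _ in range(T))
    p_hat = k / T
    se = math.sqrt(p_hat * (1 - p_hat) / T) or 1e-12  # guard p_hat in {0, 1}
    return abs(p_hat - 0.5) / se > z_crit

def rejection_rate(p_true, T, sims=2000, seed=7):
    """Fraction of simulated samples in which the test rejects H0."""
    rng = random.Random(seed)
    return sum(wald_rejects(p_true, T, rng) for _ in range(sims)) / sims

type1 = rejection_rate(0.5, T=80)  # rejections under H0: type I error rate
power = rejection_rate(0.6, T=80)  # rejections under an alternative: power
```

The same logic carries over to the structural models in the paper: only the data-generating process and the test applied to each simulated sample change.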

  2.

    HO use a “Strong utility” stochastic specification, so-called because it implies “strong stochastic transitivity”, whereas the CU model implies “moderate stochastic transitivity”. Differences in stochastic specifications can lead to wholly different inferences drawn from the structural model of risk preferences. Wilcox (2008) provides an in-depth review of the implications of different stochastic specifications and the results of an experiment designed to test these implications.

  3.

    It may be the case that 100 observations are too few to satisfy the asymptotic properties of ML, but this is not the focus of this paper.

  4.

    The Akaike information criterion is given by \({\mathrm{AIC}} = -2 \log L({\hat{\alpha }}) / T + 2k / T\), where \(L({\hat{\alpha }})\) is the likelihood of the model at its estimated maximum, k is the number of parameters of that model, and T is the number of observations.
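As a concrete illustration, this per-observation AIC can be computed directly; the log-likelihood values below are hypothetical, not estimates from the paper.

```python
def aic_per_obs(log_likelihood, k, T):
    """Per-observation AIC as in the footnote: AIC = (-2 log L + 2 k) / T.
    Lower values indicate a better trade-off of fit against parameters."""
    return (-2.0 * log_likelihood + 2.0 * k) / T

# Toy comparison (hypothetical log-likelihoods): a 2-parameter EUT-style
# model vs. a 4-parameter RDU-style model on the same T = 100 choices.
aic_eut = aic_per_obs(log_likelihood=-55.0, k=2, T=100)  # (110 + 4) / 100
aic_rdu = aic_per_obs(log_likelihood=-52.0, k=4, T=100)  # (104 + 8) / 100
preferred = "EUT" if aic_eut < aic_rdu else "RDU"
```

Here the RDU-style model's better fit outweighs its two extra parameters, so it has the lower AIC.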

  5.

    As with HO, it may be the case that 80 observations are too few to satisfy the asymptotic properties of ML for these tests.

  6.

    Typically, when a test indicates the probability of a type I error to be less than 5%, social scientists consider this result “statistically significant,” and when researchers engage in ex ante power analysis, they typically aim for a probability of a type II error less than 20% (Cohen 1988; Gelman and Loken 2014). These values are based on convention, and are somewhat arbitrary. Ronald Fisher disagreed with picking the same level of statistical significance for every analysis: “[...] no scientific worker has a fixed level of significance at which from year to year, and in all circumstances, he rejects hypotheses; he rather gives his mind to each particular case in the light of his evidence and his ideas” (Fisher 1956).

  7.

    Consider a choice probability calculated to be 0.90 for option A, and therefore 0.10 for option B. A random number drawn from a univariate uniform distribution on [0, 1] has a 90% chance of being less than or equal to 0.90, so option A would be chosen 90% of the time by the simulated subject.

  8.

    See the Appendix of HN for estimates of typical university students in the United States, and Harrison and Rutström (2008) for additional reviews of studies with human subjects.

  9.

    While we may expect the probability that an RDU subject is correctly classified to vary somewhat with r and \(\lambda\), how the probability of correct classification changes with the probability weighting parameters, \(\phi\) and \(\eta\), is of greater interest, as these parameters are what distinguish RDU from EUT.

  10.

    The r parameter of subject 8, given as an example in HN (p. 104), would fall in this range.

  11.

    Recall that a type II error in this analysis is 1 minus the probability of correctly classifying an RDU subject.

  12.

    These are the values estimated for subject 98 in HN (p. 104) whom HN classified as RDU with a Prelec (1998) PWF. HN report in Appendix C that the estimated r parameter is 0.3473 for this subject, which is near the \(r = 0.5\) restriction for this simulation, but do not report the estimated \(\lambda\) parameter.

  13.

    Subject 94 from HN was classified as RDU with estimated parameters \(r = 0.4461\), \(\phi = 1.3907\), and \(\eta = 0.6883\), which fall in this range.
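A sketch of the two-parameter Prelec (1998) probability weighting function referenced in these footnotes, assuming the common form \(w(p) = \exp (-\eta (-\ln p)^{\phi })\); the mapping of HN's \(\phi\) and \(\eta\) onto the curvature and elevation roles here is my assumption.

```python
import math

def prelec_w(p, phi, eta):
    """Two-parameter Prelec (1998) weighting function, assuming the form
    w(p) = exp(-eta * (-ln p)**phi); how (phi, eta) map onto curvature
    and elevation is an assumption, not taken from the paper."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    return math.exp(-eta * (-math.log(p)) ** phi)

# With phi = eta = 1 the function reduces to w(p) = p, i.e. no probability
# weighting, so RDU coincides with EUT. Subject 94's estimates imply a
# nonlinear weighting of probabilities.
w_linear = prelec_w(0.5, 1.0, 1.0)
w_subj94 = prelec_w(0.5, 1.3907, 0.6883)
```

This makes concrete why \(\phi\) and \(\eta\) are the parameters of interest for classification: they control how far \(w(p)\) departs from the identity.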

  14.

    While the analyses above use the Wald test to perform classification, HO use the likelihood ratio test. Analyses in Online Supplement A using the likelihood ratio test to classify subjects show no qualitative differences in the rates of type I and type II errors: both types of errors still occur at exceedingly high rates when the likelihood ratio test is used instead of the Wald test.


References

  1. Andersen, S., Fountain, J., Harrison, G. W., & Elisabet Rutström, E. (2014). Estimating subjective probabilities. Journal of Risk and Uncertainty, 48(3), 207–229.

  2. Andersen, S., Harrison, G. W., Lau, M. I., & Elisabet Rutström, E. (2008). Eliciting risk and time preferences. Econometrica, 76(3), 583–618.

  3. Bell, D. E. (1982). Regret in decision making under uncertainty. Operations Research, 30(5), 961–981.

  4. Cohen, J. (1988). Statistical power analysis for the behavioral sciences (Vol. 2). New York: Academic Press.

  5. De Long, J. B., & Lang, K. (1992). Are all economic hypotheses false? Journal of Political Economy, 100(6), 1257–1272.

  6. Feiveson, A. H. (2002). Power by simulation. Stata Journal, 2(2), 107–124.

  7. Fisher, R. (1956). Statistical methods and scientific inference (p. 175). Edinburgh: Oliver & Boyd.

  8. Gelman, A., & Loken, E. (2014). The statistical crisis in science. American Scientist, 102, 460–465.

  9. Harrison, G. W., & Elisabet Rutström, E. (2008). Risk aversion in the laboratory. In J. C. Cox & G. W. Harrison (Eds.), Research in experimental economics (Vol. 12, pp. 41–196). Bingley: Emerald Group Publishing Limited.

  10. Harrison, G. W., Martínez-Correa, J., & Todd Swarthout, J. (2015). Reduction of compound lotteries with objective probabilities: Theory and evidence. Journal of Economic Behavior and Organization, 119, 32–55.

  11. Harrison, G. W., & Ng, J. M. (2016). Evaluating the expected welfare gain from insurance. Journal of Risk and Insurance, 83(1), 91–120.

  12. Hey, J. D., & Orme, C. (1994). Investigating generalizations of expected utility theory using experimental data. Econometrica, 62(6), 1291–1326.

  13. Ioannidis, J. P. A. (2005). Why most published research findings are false. Chance, 18(4), 40–47.

  14. Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica, 47(2), 263–292.

  15. Loomes, G., & Sugden, R. (1982). Regret theory: An alternative theory of rational choice under uncertainty. Economic Journal, 92(368), 805–824.

  16. Loomes, G., & Sugden, R. (1998). Testing different stochastic specifications of risky choice. Economica, 65, 581–598.

  17. McCloskey, D. N., & Ziliak, S. T. (1996). The standard error of regressions. Journal of Economic Literature, 34, 97–114.

  18. Prelec, D. (1998). The probability weighting function. Econometrica, 66(3), 497–527.

  19. Quiggin, J. (1982). A theory of anticipated utility. Journal of Economic Behavior & Organization, 3, 323–343.

  20. Wilcox, N. T. (2008). Stochastic models for binary discrete choice under risk: A critical primer and econometric comparison. In J. C. Cox & G. W. Harrison (Eds.), Research in experimental economics (Vol. 12, pp. 197–292). Bingley: Emerald Group Publishing Limited.

  21. Wilcox, N. T. (2011). ‘Stochastically more risk averse:’ A contextual theory of stochastic discrete choice under risk. Journal of Econometrics, 162(1), 89–104.

  22. Zhang, L., & Ortmann, A. (2013). Exploring the meaning of significance in experimental economics. Working paper, Australian School of Business, University of New South Wales.


Author information



Corresponding author

Correspondence to Brian Albert Monroe.

Additional information


Special thanks to Glenn Harrison, Don Ross, and Andre Hofmeyr for providing comments and feedback on this paper.

Electronic supplementary material


Supplementary material 1 (pdf 1430 KB)


About this article


Cite this article

Monroe, B.A. The statistical power of individual-level risk preference estimation. J Econ Sci Assoc (2020).



Keywords

  • Power analysis
  • Risk preferences
  • Experimental economics
  • Expected utility theory
  • Rank dependent utility

JEL Classification

  • C12
  • C13
  • C18
  • C52
  • C90