Theory and Decision, Volume 80, Issue 3, pp 363–387

In search of good probability assessors: an experimental comparison of elicitation rules for confidence judgments

  • Guillaume Hollard
  • Sébastien Massoni
  • Jean-Christophe Vergnaud


In this paper, we use an experimental design to compare the performance of elicitation rules for subjective beliefs. Contrary to previous works in which elicited beliefs are compared to an objective benchmark, we consider a purely subjective belief framework (confidence in one’s own performance in a cognitive task and a perceptual task). The performance of different elicitation rules is assessed according to the accuracy of stated beliefs in predicting success. We measure this accuracy using two main factors: calibration and discrimination. For each of them, we propose two statistical indexes and we compare the rules’ performance on each measure. The matching probability method provides more accurate beliefs in terms of discrimination, while the quadratic scoring rule reduces overconfidence, and the free rule, a simple rule with no incentives, succeeds in eliciting accurate beliefs. Nevertheless, the matching probability appears to be the best mechanism for eliciting beliefs, owing to its performance in terms of calibration and discrimination, its ability to elicit consistent beliefs across measures and across tasks, and its empirical and theoretical properties.
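To make the two accuracy factors concrete, here is a minimal Python sketch of how calibration and discrimination could be computed from stated confidences and observed outcomes. The abstract does not specify which statistical indexes the paper uses, so the choices below are assumptions made for illustration: an overconfidence bias and a grouped calibration index for calibration, and an area-under-the-ROC-curve statistic for discrimination. All function names and the sample data are hypothetical.

```python
from collections import defaultdict


def quadratic_loss(confidence, success):
    """Quadratic (Brier-type) loss for one trial: 0 is best, 1 is worst."""
    return (confidence - success) ** 2


def overconfidence(confidences, successes):
    """Mean stated confidence minus observed success rate (positive = overconfident)."""
    n = len(confidences)
    return sum(confidences) / n - sum(successes) / n


def calibration_index(confidences, successes):
    """Weighted mean squared gap between each stated confidence level and
    the success rate observed at that level (0 = perfectly calibrated)."""
    by_level = defaultdict(list)
    for c, s in zip(confidences, successes):
        by_level[c].append(s)
    n = len(confidences)
    return sum(len(v) * (c - sum(v) / len(v)) ** 2 for c, v in by_level.items()) / n


def discrimination_auc(confidences, successes):
    """Probability that a random success carries higher confidence than a
    random failure, counting ties as one half (0.5 = no discrimination)."""
    hits = [c for c, s in zip(confidences, successes) if s == 1]
    misses = [c for c, s in zip(confidences, successes) if s == 0]
    if not hits or not misses:
        raise ValueError("need at least one success and one failure")
    wins = sum((h > m) + 0.5 * (h == m) for h in hits for m in misses)
    return wins / (len(hits) * len(misses))


# Hypothetical data: eight trials, confidence stated as a probability of success.
confidences = [0.5, 0.5, 0.75, 0.75, 1.0, 1.0, 1.0, 1.0]
successes = [1, 0, 1, 0, 1, 1, 1, 0]

mean_loss = sum(quadratic_loss(c, s) for c, s in zip(confidences, successes)) / 8
print(mean_loss)                                    # 0.265625
print(overconfidence(confidences, successes))       # 0.1875
print(calibration_index(confidences, successes))    # 0.046875
print(discrimination_auc(confidences, successes))   # about 0.633
```

In this toy sample the subject is overconfident (mean confidence 0.8125 against a success rate of 0.625) but discriminates somewhat better than chance, which shows why the two factors have to be measured separately.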


Belief elicitation · Scoring rules · Confidence · Calibration · Discrimination · Incentives



The authors are grateful for insightful comments from Karim N’Diaye, Steve Fleming, Thibault Gajdos and Peter Wakker, and from participants at the ESE Conference in Rotterdam, the EBIM Workshop in Paris, the LabSi Workshop in Siena, the FUR XIV in Newcastle, the ESA 2010 in Copenhagen, the SABE 2010 in San Diego, the EMPG 2011 in Paris, and the North American ES Winter Meeting 2012 in Chicago.



Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Guillaume Hollard (1, 2)
  • Sébastien Massoni (3)
  • Jean-Christophe Vergnaud (2, 4)

  1. Département d’Economie, Ecole Polytechnique, Palaiseau, France
  2. CNRS, Paris, France
  3. QuBE - School of Economics and Finance, Queensland University of Technology, Brisbane, Australia
  4. Centre d’Economie de la Sorbonne, University of Paris 1, Paris, France
