# In search of good probability assessors: an experimental comparison of elicitation rules for confidence judgments

## Abstract

In this paper, we use an experimental design to compare the performance of elicitation rules for subjective beliefs. Unlike previous work, in which elicited beliefs are compared to an objective benchmark, we consider a purely subjective belief framework (confidence in one’s own performance in a cognitive task and a perceptual task). The performance of the different elicitation rules is assessed according to the accuracy of stated beliefs in predicting success. We measure this accuracy along two main dimensions: calibration and discrimination. For each of them, we propose two statistical indexes and compare the rules’ performance on each measure. The matching probability method provides more accurate beliefs in terms of discrimination, while the quadratic scoring rule reduces overconfidence, and the free rule, a simple rule with no incentives, also succeeds in eliciting accurate beliefs. Nevertheless, the matching probability appears to be the best mechanism for eliciting beliefs because of its performance in terms of calibration and discrimination, its ability to elicit consistent beliefs across measures and across tasks, and its empirical and theoretical properties.
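To make the two accuracy notions concrete, the following is a minimal Python sketch under stated assumptions. It does not reproduce the statistical indexes proposed in the paper, which the abstract does not specify. As stand-ins it uses mean confidence minus success rate as a calibration measure (overconfidence) and the area under the ROC curve as a discrimination measure. The function names and the payoff normalization of the quadratic score are illustrative choices.

```python
from typing import Sequence


def quadratic_score(p: float, success: bool) -> float:
    """Quadratic (Brier-type) score, normalized so that 1 is best and 0 is worst.

    The normalization is an assumption made for illustration.
    """
    outcome = 1.0 if success else 0.0
    return 1.0 - (outcome - p) ** 2


def overconfidence(conf: Sequence[float], correct: Sequence[bool]) -> float:
    """Calibration stand-in: mean stated confidence minus observed success rate.

    Positive values indicate overconfidence, negative values underconfidence.
    """
    return sum(conf) / len(conf) - sum(correct) / len(correct)


def discrimination_auc(conf: Sequence[float], correct: Sequence[bool]) -> float:
    """Discrimination stand-in: area under the ROC curve.

    Equals the probability that a randomly drawn successful trial received
    higher confidence than a randomly drawn failed trial (ties count 1/2).
    """
    hits = [c for c, ok in zip(conf, correct) if ok]
    misses = [c for c, ok in zip(conf, correct) if not ok]
    if not hits or not misses:
        raise ValueError("need at least one success and one failure")
    wins = sum(
        1.0 if h > m else 0.5 if h == m else 0.0
        for h in hits
        for m in misses
    )
    return wins / (len(hits) * len(misses))


# Hypothetical data: four trials with stated confidence and realized success.
confidence = [0.9, 0.8, 0.6, 0.7]
success = [True, True, False, False]

bias = overconfidence(confidence, success)      # about 0.25: overconfident
auc = discrimination_auc(confidence, success)   # 1.0: perfect discrimination
```

The hypothetical subject above is overconfident on average yet discriminates perfectly between successes and failures, which shows that the two dimensions can diverge and are worth measuring separately.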

## Keywords

Belief elicitation, Scoring rules, Confidence, Calibration, Discrimination, Incentives

## Notes

### Acknowledgments

The authors are grateful for insightful comments from Karim N’Diaye, Steve Fleming, Thibault Gajdos, and Peter Wakker, and from participants at the ESE Conference in Rotterdam, the EBIM Workshop in Paris, the LabSi Workshop in Siena, the FUR XIV in Newcastle, the ESA 2010 in Copenhagen, the SABE 2010 in San Diego, the EMPG 2011 in Paris, and the North American ES Winter Meeting 2012 in Chicago.
