Scoring rules and the evaluation of probabilities

Summary

In Bayesian inference and decision analysis, inferences and predictions are inherently probabilistic. Scoring rules, which involve the computation of a score based on probability forecasts and what actually occurs, can be used to evaluate probabilities and to provide appropriate incentives for “good” probabilities. This paper reviews scoring rules and some related measures for evaluating probabilities, including decompositions of scoring rules and attributes of “goodness” of probabilities, comparability of scores, and the design of scoring rules for specific inferential and decision-making problems.
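
As an illustration of what computing "a score based on probability forecasts and what actually occurs" can look like, the following minimal sketch implements two widely used rules, the quadratic (Brier) score and the logarithmic score, for a single forecast over a finite set of outcomes. The function names, the list-based representation of a forecast, and the sign conventions are assumptions made for this example and are not taken from the paper. The last function gives a small numerical check of the incentive property: under the quadratic rule, a forecaster who believes the outcome probabilities are `belief` expects a better (lower) score from reporting `belief` than from reporting something else.

```python
import math

def brier_score(forecast, outcome_index):
    """Quadratic (Brier) score for one forecast; lower is better.

    forecast: list of probabilities over the possible outcomes (sums to 1).
    outcome_index: index of the outcome that actually occurred.
    """
    return sum(
        (p - (1.0 if i == outcome_index else 0.0)) ** 2
        for i, p in enumerate(forecast)
    )

def log_score(forecast, outcome_index):
    """Logarithmic score for one forecast; higher (closer to 0) is better.

    Raises ValueError if the realized outcome was assigned probability 0.
    """
    return math.log(forecast[outcome_index])

def expected_brier(reported, belief):
    """Expected Brier score of `reported` when outcomes follow `belief`."""
    return sum(q * brier_score(reported, j) for j, q in enumerate(belief))

if __name__ == "__main__":
    belief = [0.7, 0.3]    # what the forecaster actually believes
    hedged = [0.9, 0.1]    # an overconfident report of the same belief
    print(brier_score(belief, 0))          # score if outcome 0 occurs
    print(log_score(belief, 0))            # log score if outcome 0 occurs
    print(expected_brier(belief, belief))  # honest report
    print(expected_brier(hedged, belief))  # overconfident report scores worse
```

In this convention the quadratic score is a penalty and the logarithmic score is a reward; other sign and scaling conventions are in common use.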

Additional information

Read before the Spanish Statistical Society at a meeting organized by the Universitat de València on Tuesday, April 23, 1996

About this article

Cite this article

Winkler, R.L., Muñoz, J., Cervera, J.L. et al. Scoring rules and the evaluation of probabilities. Test 5, 1–60 (1996). https://doi.org/10.1007/BF02562681

  • DOI: https://doi.org/10.1007/BF02562681
