17:211

Assessing probabilistic forecasts of multivariate quantities, with an application to ensemble predictions of surface winds

  • Tilmann Gneiting
  • Larissa I. Stanberry
  • Eric P. Grimit
  • Leonhard Held
  • Nicholas A. Johnson
Invited Paper


We discuss methods for evaluating probabilistic predictions of vector-valued quantities, which can take the form of a discrete forecast ensemble or a density forecast. In particular, we propose a multivariate version of the univariate verification rank histogram, or Talagrand diagram, that can be used to check the calibration of ensemble forecasts. In the case of density forecasts, Box’s density ordinate transform provides an attractive alternative. The multivariate energy score generalizes the continuous ranked probability score. It addresses both calibration and sharpness, and can be used to compare deterministic forecasts, ensemble forecasts and density forecasts using a single proper loss function. An application to the University of Washington mesoscale ensemble points to strengths and deficiencies of probabilistic short-range forecasts of surface wind vectors over the North American Pacific Northwest.
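As an illustration of how the energy score can be evaluated for a discrete forecast ensemble, the following is a minimal sketch, not the authors' code. It assumes the usual ensemble form of the score, the mean Euclidean distance between the members and the observation minus half the mean pairwise distance between members. The function name `energy_score` and the array layout (one ensemble member per row) are assumptions made for this example. The score is negatively oriented, so smaller values are better.

```python
import numpy as np


def energy_score(ensemble, obs):
    """Energy score of an m-member ensemble against one observed vector.

    ensemble: array of shape (m, d), one member per row (assumed layout);
              a 1-D array is treated as m univariate members.
    obs:      array of shape (d,) or a scalar when d == 1.
    """
    ens = np.asarray(ensemble, dtype=float)
    if ens.ndim == 1:
        ens = ens[:, None]  # univariate case: m members of dimension 1
    y = np.atleast_1d(np.asarray(obs, dtype=float))
    m = ens.shape[0]

    # Mean Euclidean distance between ensemble members and the observation.
    dist_to_obs = np.linalg.norm(ens - y, axis=1).mean()

    # Half the mean pairwise distance between members (all m * m pairs).
    pairwise = np.linalg.norm(ens[:, None, :] - ens[None, :, :], axis=2)
    spread = pairwise.sum() / (2.0 * m * m)

    return dist_to_obs - spread


# A single-member ensemble is a deterministic forecast: the score reduces
# to the Euclidean error, here the distance from (3, 4) to the origin.
print(energy_score([[3.0, 4.0]], [0.0, 0.0]))

# Two members of a wind-vector ensemble bracketing the observation.
print(energy_score([[0.0, 0.0], [2.0, 0.0]], [1.0, 0.0]))
```

With a single member the spread term vanishes, which is why deterministic and ensemble forecasts can be compared on the same scale. In one dimension the same expression gives the continuous ranked probability score of the empirical ensemble distribution.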


Calibration; Density forecast; Ensemble postprocessing; Exchangeability; Forecast verification; Probability integral transform; Proper scoring rule; Sharpness; Rank histogram

Mathematics Subject Classification (2000)

62H99 62P12 



Copyright information

© Sociedad de Estadística e Investigación Operativa 2008

Authors and Affiliations

  • Tilmann Gneiting (1)
  • Larissa I. Stanberry (1)
  • Eric P. Grimit (2)
  • Leonhard Held (3)
  • Nicholas A. Johnson (4)

  1. Department of Statistics, University of Washington, Seattle, USA
  2. 3Tier Environmental Forecast Group, Seattle, USA
  3. Institut für Sozial- und Präventivmedizin, Abteilung Biostatistik, Universität Zürich, Zürich, Switzerland
  4. Department of Statistics, Stanford University, Stanford, USA
