Volume 23, Issue 4, pp 787–805

Calibration tests for count data

  • Wei Wei
  • Leonhard Held
Original Paper


Calibration, the statistical consistency of forecast distributions and observations, is a central requirement for probabilistic predictions. Calibration of continuous forecasts has been widely discussed, and significance tests are commonly used to detect whether a prediction model is miscalibrated. However, calibration tests for discrete forecasts are rare, especially for distributions with unlimited support. In this paper, we propose two types of calibration tests for count data: tests based on conditional exceedance probabilities and tests based on proper scoring rules. For the latter, three scoring rules are considered: the ranked probability score, the logarithmic score and the Dawid-Sebastiani score. Simulation studies show that all of the tests control the type I error rate well and have sufficient power under miscalibration. As an illustration, we apply the methodology to weekly data on meningococcal disease incidence in Germany, 2001–2006. The results show that the test approach is powerful in detecting miscalibrated forecasts.
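To make the three scoring rules concrete, the following is a minimal sketch that evaluates them for Poisson forecasts, using their standard definitions for count data: the logarithmic score -log P(Y = y), the Dawid-Sebastiani score ((y - μ)/σ)² + 2 log σ, and the ranked probability score Σ_k (F(k) - 1{y ≤ k})². The Poisson assumption, all function names, and the truncation tolerance are illustrative choices, not taken from the paper. The `calibration_z` helper shows one generic way to standardise a mean score against its expectation under the forecast distribution; it is not necessarily the exact form of the tests proposed by the authors.

```python
import math


def poisson_log_pmf(k, lam):
    """log P(Y = k) for Y ~ Poisson(lam), lam > 0 (assumed forecast family)."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def log_score(y, lam):
    """Logarithmic score: negative log predictive probability of y."""
    return -poisson_log_pmf(y, lam)


def dawid_sebastiani_score(y, lam):
    """Dawid-Sebastiani score ((y - mu) / sigma)^2 + 2 * log(sigma).

    For a Poisson forecast mu = sigma^2 = lam, so 2 * log(sigma) = log(lam).
    """
    return (y - lam) ** 2 / lam + math.log(lam)


def ranked_probability_score(y, lam, tol=1e-12, max_k=100_000):
    """Ranked probability score: sum over k of (F(k) - 1{y <= k})^2.

    The infinite sum is truncated once k >= y and the upper tail is below tol.
    """
    cdf, total = 0.0, 0.0
    for k in range(max_k):
        cdf += math.exp(poisson_log_pmf(k, lam))
        indicator = 1.0 if y <= k else 0.0
        total += (cdf - indicator) ** 2
        if k >= y and 1.0 - cdf < tol:
            break
    return total


def score_moments(score, lam, tol=1e-12, max_k=100_000):
    """Mean and variance of score(Y, lam) when Y ~ Poisson(lam) (truncated sums)."""
    mean = second = mass = 0.0
    for k in range(max_k):
        p = math.exp(poisson_log_pmf(k, lam))
        s = score(k, lam)
        mean += p * s
        second += p * s * s
        mass += p
        if 1.0 - mass < tol:
            break
    return mean, second - mean ** 2


def calibration_z(ys, lams, score):
    """Standardised difference between the observed and expected mean score.

    Hypothetical helper: assumes independent one-step-ahead Poisson forecasts.
    """
    n = len(ys)
    observed = sum(score(y, lam) for y, lam in zip(ys, lams)) / n
    moments = [score_moments(score, lam) for lam in lams]
    expected = sum(m for m, _ in moments) / n
    std_err = math.sqrt(sum(v for _, v in moments)) / n
    return (observed - expected) / std_err
```

For example, `calibration_z(ys, lams, log_score)` takes a list of observed counts `ys` and the matching forecast means `lams`. Under the assumptions above, a value far from zero relative to standard normal quantiles would suggest that the observations are inconsistent with the forecast distributions.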


Keywords

Calibration test · Count data · Predictive distribution · Proper scoring rules

Mathematics Subject Classification (2000)

62M20 Prediction 



We thank two referees for helpful comments and suggestions. Financial support by the Swiss National Science Foundation (SNF) is gratefully acknowledged.

Supplementary material

Supplementary material 1: 11749_2014_380_MOESM1_ESM.pdf (PDF, 97 KB)



Copyright information

© Sociedad de Estadística e Investigación Operativa 2014

Authors and Affiliations

  1. Division of Biostatistics, Institute of Social and Preventive Medicine, University of Zurich, Zurich, Switzerland
