Abstract
Calibration, the statistical consistency of forecast distributions and observations, is a central requirement for probabilistic predictions. Calibration of continuous forecasts has been widely discussed, and significance tests are commonly used to detect whether a prediction model is miscalibrated. However, calibration tests for discrete forecasts are rare, especially for distributions with unlimited support. In this paper, we propose two types of calibration tests for count data: tests based on conditional exceedance probabilities and tests based on proper scoring rules. For the latter, three scoring rules are considered: the ranked probability score, the logarithmic score and the Dawid-Sebastiani score. Simulation studies show that all tests control the type I error rate well and have sufficient power under miscalibration. As an illustration, we apply the methodology to weekly data on meningococcal disease incidence in Germany, 2001–2006. The results show that the testing approach is powerful in detecting miscalibrated forecasts.
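For readers unfamiliar with the three scoring rules named above, the following is a minimal sketch of how each can be computed for a single count observation. It assumes a Poisson predictive distribution and uses the standard textbook definitions of the scores (negatively oriented, so smaller is better). The function names, the Poisson assumption and the truncation tolerance are illustrative choices, not taken from the paper, and the sketch covers only the scores, not the calibration tests built on them.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Poisson(lam), computed on the log scale for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def log_score(y, lam):
    """Logarithmic score: minus the log predictive probability of the observation."""
    return -math.log(poisson_pmf(y, lam))


def ranked_probability_score(y, lam, tol=1e-12, max_k=100000):
    """Ranked probability score: sum over k >= 0 of (F(k) - 1{y <= k})^2.

    The infinite sum is truncated once k has passed y and the remaining
    upper-tail mass 1 - F(k) falls below `tol` (an assumed tolerance).
    """
    rps = 0.0
    cdf = 0.0
    for k in range(max_k):
        cdf += poisson_pmf(k, lam)
        indicator = 1.0 if y <= k else 0.0
        rps += (cdf - indicator) ** 2
        if k >= y and 1.0 - cdf < tol:
            break
    return rps


def dawid_sebastiani_score(y, mean, var):
    """Dawid-Sebastiani score: ((y - mean) / sd)^2 + 2 * log(sd)."""
    return (y - mean) ** 2 / var + math.log(var)


if __name__ == "__main__":
    y, lam = 3, 2.5  # hypothetical observed count and forecast mean
    print("logS:", log_score(y, lam))
    print("RPS: ", ranked_probability_score(y, lam))
    # For a Poisson forecast the predictive mean and variance both equal lam.
    print("DSS: ", dawid_sebastiani_score(y, lam, lam))
```

The Dawid-Sebastiani score depends on the forecast only through its mean and variance, which is why it takes those two moments as arguments rather than the full distribution.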
Acknowledgments
We thank two referees for helpful comments and suggestions. Financial support by the Swiss National Science Foundation (SNF) is gratefully acknowledged.
Cite this article
Wei, W., Held, L. Calibration tests for count data. TEST 23, 787–805 (2014). https://doi.org/10.1007/s11749-014-0380-8
DOI: https://doi.org/10.1007/s11749-014-0380-8