
Post-processing Multiensemble Temperature and Precipitation Forecasts Through an Exchangeable Normal-Gamma Model and Its Tobit Extension

Published in the Journal of Agricultural, Biological and Environmental Statistics.

Abstract

Meteorological ensemble members are a collection of scenarios for future weather issued by a meteorological center. Such ensembles nowadays form the main source of valuable information for probabilistic forecasting, which aims at producing a predictive probability distribution of the quantity of interest instead of a single best-guess pointwise estimate. Unfortunately, ensemble members cannot generally be considered as a sample from such a predictive probability distribution without a preliminary post-processing treatment to recalibrate the ensemble. Two main families of post-processing methods can be found in the literature: competing methods, such as Bayesian model averaging (BMA), and collaborative methods, such as ensemble model output statistics (EMOS). This paper proposes a mixed-effect model belonging to the collaborative family. The structure of the model is formally justified by Bruno de Finetti's representation theorem, which shows how to construct operational statistical models of ensembles based on judgments of invariance under relabeling of the members. Its main specificities are as follows: (1) exchangeability contributes to parsimony, with the latent pivot of the ensemble interpreted as a statistical synthesis of the essential meteorological features of the ensemble members; (2) a multiensemble implementation is straightforward, making it possible to take advantage of several sources of information so as to increase the sharpness of the forecasting procedure. Focus is placed on normal statistical structures, first with a direct application to temperatures, then with a convenient Tobit extension for precipitation. Inference is performed by expectation-maximization (EM) algorithms, with both steps leading to explicit analytic expressions in the Gaussian temperature case, while recourse is made to stochastic conditional simulations in the zero-inflated precipitation case.
After checking its good behavior on artificial data, the proposed post-processing technique is applied to temperature and precipitation ensemble forecasts produced for lead times from 1 to 9 days over five river basins managed by Hydro-Québec, which ranks among the world's largest electric companies. These ensemble forecasts, provided by three global meteorological forecast centers (Canadian, US and European), were extracted from the THORPEX Interactive Grand Global Ensemble (TIGGE) database. The results indicate that the post-processed ensembles are calibrated and generally sharper than the raw ensembles for the five watersheds under study.
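The generative structure summarized above (a latent pivot shared by exchangeable members, with a common random precision) can be sketched as a small simulation. This is an illustrative reconstruction using the appendix notation (Z_t, omega_t, a_e, b_e, c_e); all parameter values and the function name are placeholders of ours, not the paper's estimates:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_ensemble(n, K, a, b, c, alpha=3.0, beta=2.0, lam=1.0):
    """Simulate n dates of multiensemble members (illustrative sketch).
    omega_t^{-2} ~ Gamma(alpha, rate=beta) is a common precision,
    Z_t is the latent pivot, and the members of center e are exchangeable
    draws around a_e + b_e * Z_t with scale c_e * omega_t."""
    prec = rng.gamma(alpha, 1.0 / beta, size=n)        # omega_t^{-2}
    omega2 = 1.0 / prec                                # omega_t^2
    Z = rng.normal(0.0, np.sqrt(lam * omega2))         # pivot Z_t | omega_t
    X = [a[e] + b[e] * Z[:, None]
         + rng.normal(0.0, np.sqrt(c[e] ** 2 * omega2)[:, None], size=(n, K[e]))
         for e in range(len(K))]                       # exchangeable within center e
    return Z, X

Z, X = simulate_ensemble(5, K=[20, 10], a=[0.0, 0.5], b=[1.0, 0.9], c=[1.0, 1.2])
```

Within each center e the K_e members are conditionally i.i.d. given (Z_t, omega_t), which is exactly the invariance-under-relabeling judgment that de Finetti's theorem formalizes.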

Supplementary materials accompanying this paper appear on-line.


Notes

  1. CMC: Canadian Meteorological Center.

  2. NCEP: National Center for Environmental Prediction; GEFS: Global Ensemble Forecast System.

  3. http://collaboration.cmc.ec.gc.ca/cmc/ensemble/doc/info_geps_e.pdf.

References

  • D. Allard. Modeling spatial and spatio-temporal non Gaussian processes. In Advances and Challenges in Space-time Modelling of Natural Events, pages 141–164. Springer, 2012.

  • Z. Ben Bouallègue. Calibrated short-range ensemble precipitation forecasts using extended logistic regression with interaction terms. Weather and Forecasting, 28(2):515–524, 2013.

  • P. Bougeault, Z. Toth, C. Bishop, B. Brown, D. Burridge, D. H. Chen, B. Ebert, M. Fuentes, T. M. Hamill, K. Mylne, et al. The THORPEX Interactive Grand Global Ensemble. Bulletin of the American Meteorological Society, 91(8):1059–1072, 2010.

  • G. E. Box and D. R. Cox. An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), pages 211–252, 1964.

  • M. Broniatowski, G. Celeux, and J. Diebolt. Reconnaissance de mélanges de densités par un algorithme d'apprentissage probabiliste. Data Analysis and Informatics, 3:359–373, 1983.

  • R. Buizza, M. Leutbecher, and L. Isaksen. Potential use of an ensemble of analyses in the ECMWF Ensemble Prediction System. Quarterly Journal of the Royal Meteorological Society, 134(637):2051–2066, 2008.

  • G. Celeux and J. Diebolt. The SEM algorithm: a probabilistic teacher algorithm derived from the EM algorithm for the mixture problem. Computational Statistics Quarterly, 2(1):73–82, 1985.

  • M. Courbariaux, P. Barbillon, and É. Parent. Water flow probabilistic predictions based on a rainfall–runoff simulator: a two-regime model with variable selection. Journal of Agricultural, Biological and Environmental Statistics, 22(2):194–219, 2017.

  • B. de Finetti. Funzione caratteristica di un fenomeno aleatorio. 1931.

  • B. de Finetti. La prévision: ses lois logiques, ses sources subjectives. In Annales de l'institut Henri Poincaré, volume 7, pages 1–68, 1937.

  • A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 1–38, 1977.

  • C. Fraley, A. E. Raftery, and T. Gneiting. Calibrating multimodel forecast ensembles with exchangeable and missing members using Bayesian model averaging. Monthly Weather Review, 138(1):190–202, 2010.

  • R. Garçon. Prévision opérationnelle des apports de la Durance à Serre-Ponçon à l'aide du modèle MORDOR. Bilan de l'année 1994–1995. La Houille Blanche, (5):71–76, 1996.

  • A. E. Gelfand and A. F. Smith. Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410):398–409, 1990.

  • T. Gneiting, A. E. Raftery, A. H. Westveld, and T. Goldman. Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Monthly Weather Review, 133(5):1098–1118, 2005.

  • T. Gneiting, F. Balabdaoui, and A. E. Raftery. Probabilistic forecasts, calibration and sharpness. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 69(2):243–268, 2007.

  • C. Guay, M. Minville, and I. Chartier. Hsami+: Guide théorique. Technical report, Institut de recherche d'Hydro-Québec, Varennes, QC, Canada, 2018.

  • T. M. Hamill. Interpretation of rank histograms for verifying ensemble forecasts. Monthly Weather Review, 129(3):550–560, 2001.

  • T. M. Hamill and S. J. Colucci. Verification of Eta-RSM short-range ensemble forecasts. Monthly Weather Review, 125(6):1312–1327, 1997.

  • H. Hersbach. Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15(5):559–570, 2000.

  • E. Hewitt and L. J. Savage. Symmetric measures on Cartesian products. Transactions of the American Mathematical Society, 80(2):470–501, 1955.

  • S. Khajehei, A. Ahmadalipour, and H. Moradkhani. An effective post-processing of the North American Multi-Model Ensemble (NMME) precipitation forecasts over the continental US. Climate Dynamics, 51(1–2):457–472, 2018.

  • R. Krzysztofowicz and C. J. Maranzano. Bayesian processor of output for probabilistic quantitative precipitation forecasts. Manuscript in review, 2006.

  • W. Li, Q. Duan, C. Miao, A. Ye, W. Gong, and Z. Di. A review on statistical postprocessing methods for hydrometeorological ensemble forecasting. Wiley Interdisciplinary Reviews: Water, 4(6):e1246, 2017.

  • D. V. Lindley. Understanding Uncertainty. John Wiley & Sons, 2013.

  • N. Meinshausen. Quantile regression forests. Journal of Machine Learning Research, 7:983–999, 2006.

  • J. W. Messner, G. J. Mayr, A. Zeileis, and D. S. Wilks. Heteroscedastic extended logistic regression for postprocessing of ensemble guidance. Monthly Weather Review, 142(1):448–456, 2014.

  • A. O'Hagan. Research in elicitation. University of Sheffield, Department of Probability and Statistics, School of Mathematics, 2005.

  • Y.-Y. Park, R. Buizza, and M. Leutbecher. TIGGE: preliminary results on comparing and combining ensembles. Quarterly Journal of the Royal Meteorological Society, 134(637):2029–2050, 2008. https://doi.org/10.1002/qj.334.

  • L. Perreault. Post-traitement statistique des prévisions météorologiques d'ensemble pour le complexe Manicouagan: les températures. Rapport scientifique IREQ-2017-0057, Institut de recherche d'Hydro-Québec, 2017.

  • A. E. Raftery, T. Gneiting, F. Balabdaoui, and M. Polakowski. Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review, 133(5), 2005.

  • R. Schefzik, T. L. Thorarinsdottir, T. Gneiting, et al. Uncertainty quantification in complex simulation models using ensemble copula coupling. Statistical Science, 28(4):616–640, 2013.

  • M. Scheuerer. Probabilistic quantitative precipitation forecasting using ensemble model output statistics. Quarterly Journal of the Royal Meteorological Society, 140(680):1086–1096, 2014.

  • M. Scheuerer and T. M. Hamill. Statistical postprocessing of ensemble precipitation forecasts by fitting censored, shifted gamma distributions. Monthly Weather Review, 143(11):4578–4596, 2015.

  • P. Schultz, H. Yuan, M. Charles, R. Krzysztofowicz, and Z. Toth. Pseudo-precipitation: a continuous variable for statistical post-processing. In 20th Conference on Probability and Statistics in the Atmospheric Sciences, 2010.

  • J. M. L. Sloughter, A. E. Raftery, T. Gneiting, and C. Fraley. Probabilistic quantitative precipitation forecasting using Bayesian model averaging. Monthly Weather Review, 135(9):3209–3220, 2007.

  • M. Taillardat, O. Mestre, M. Zamo, and P. Naveau. Calibrated ensemble forecasts using quantile regression forests and ensemble model output statistics. Monthly Weather Review, 144(6):2375–2393, 2016.

  • C. Tebaldi and R. Knutti. The use of the multi-model ensemble in probabilistic climate projections. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 365(1857):2053–2075, 2007.

  • T. L. Thorarinsdottir and T. Gneiting. Probabilistic forecasts of wind speed: ensemble model output statistics by using heteroscedastic censored regression. Journal of the Royal Statistical Society: Series A (Statistics in Society), 173(2):371–388, 2010.

  • D. S. Wilks. Extending logistic regression to provide full-probability-distribution MOS forecasts. Meteorological Applications, 16(3):361–368, 2009.


Acknowledgements

This work was supported by Électricité de France and by Hydro-Québec [Research Grant Number 694R] through the Ph.D. thesis of M. Courbariaux. We would like to thank Jacques Bernier, Joël Gailhard, Anne-Catherine Favre and Vincent Fortin for their unfailing help and constructive comments regarding this work. The forecasting and development teams at EDF-DTG and Hydro-Québec provided the necessary material and case studies as well as much valuable advice: we thank in particular Fabian Tito Arandia Martinez and Éric Crobeddu from Hydro-Québec, and Fabien Rinaldi and Rémy Garçon from EDF-DTG.

Author information

Correspondence to Pierre Barbillon.


Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (zip 760 KB)

Details on Inference in the Gaussian Case

E-step  

We need to compute the conditional distributions \(\left[ Z_t|\omega _t^{-2},{\mathbf {X}}_t,Y_t\right] \) and \(\left[ \omega _t^{-2}|{\mathbf {X}}_t,Y_t\right] \) for each time t of the training set. We interpret the joint distribution at time t

$$\begin{aligned} \left[ Z_t,\omega _t^{-2},{\mathbf {X}}_t,Y_t\right]&=[{\mathbf {X}}_t,Y_t|Z_t,\omega _t^{-2}][\omega _t^{-2},Z_t]\\&=[{\mathbf {X}}_t|Z_t,\omega _t^{-2}][Y_t|Z_t,\omega _t^{-2}][\omega _t^{-2},Z_t] \end{aligned}$$

as a function of \(\left( \omega _t^{-2},Z_t\right) ,\) and try to recognize the probability distribution function (pdf) of \(\left( Z_t,\omega _t^{-2}|{\mathbf {X}}_t,Y_t\right) \) up to a multiplicative constant, since \(\left[ Z_t,\omega _t^{-2}|{\mathbf {X}}_t,Y_t\right] =[\omega _t^{-2},Z_t,{\mathbf {X}}_t,Y_t]\times \left( \frac{1}{[{\mathbf {X}}_t,Y_t]}\right) \).

The complete deviance (minus twice the complete log-likelihood) at time t can be written as a quadratic form in \({\mathbf {X}}_t\) and \(Y_t\), up to known normalizing constants:

$$\begin{aligned}&\sum _{e=0}^{E}\left\{ \sum _{k=1}^{K_{e}}(X_{e,k,t}-b_{e}Z_t-a_{e})^{2}\omega _t^{-2}c_{e}^{-2}-K_{e}\log \left( \omega _t^{-2}\right) -K_{e}\log \left( c_{e}^{-2}\right) \right\} \nonumber \\&\qquad +\,Z_t^{2}\lambda ^{-1}\omega _t^{-2}-\log (\lambda ^{-1})-\log \left( \omega _t^{-2}\right) \nonumber \\&\quad \quad +\,2\beta \omega _t^{-2}-2(\alpha -1)\log \left( \omega _t^{-2}\right) -2\log \left( \frac{\beta ^{\alpha }}{\Gamma (\alpha )}\right) . \end{aligned}$$
(8)

The pdf we are looking for can be further decomposed as:

$$\begin{aligned} \left[ Z_t,\omega _t^{-2}|{\mathbf {X}}_t,Y_t\right] =[Z_t|\omega _t^{-2},{\mathbf {X}}_t,Y_t][\omega _t^{-2}|{\mathbf {X}}_t,Y_t]\,. \end{aligned}$$

We now check whether we can still benefit from a conjugate situation, i.e., whether, given \(\left( {\mathbf {X}}_t,Y_t\right) \), we would still get a normal pdf for \(\left( Z_t|\omega _t^{-2},{\mathbf {X}}_t,Y_t\right) \) under the form \({\mathcal {N}}(m_t^{\prime },\lambda ^{\prime }\omega _t^{2})\) and a gamma pdf \(\Gamma \left( \alpha ^{\prime },\beta _t^{\prime }\right) \) for \(\left( \omega _t^{-2}|{\mathbf {X}}_t,Y_t\right) \). The deviance of \(\left[ Z_t,\omega _t^{-2}|{\mathbf {X}}_t,Y_t\right] \), expressed as a function of \(\left( Z_t,\omega _t^{-2}\right) \), exhibits the following shape:

$$\begin{aligned} (Z_t-m_t^{\prime })^{2}\lambda ^{\prime -1}\omega _t^{-2}-\log (\omega _t^{-2})-\log (\lambda ^{\prime -1})-2(\alpha ^{\prime }-1)\log (\omega _t^{-2})+2\beta _t^{\prime }\,(\omega _t^{-2})\,. \end{aligned}$$

We proceed by trying to identify parameters \(\alpha ^{\prime },\beta _t^{\prime },\lambda ^{\prime },m_t^{\prime }\) in the above equation to match the deviance for their joint distribution given by Eq. (8).

By matching both expressions, we obtain:

$$\begin{aligned} \begin{array}{lcl} \lambda ^{\prime -1}&{}=&{}\sum _{e=0}^{E}K_{e}b_{e}^{2}c_{e}^{-2}+\lambda ^{-1},\\ m_t^{\prime }&{}=&{}\lambda ^{\prime }\cdot \sum _{e=0}^{E}c_{e}^{-2}b_{e}K_{e}\left( {\bar{X}}_{e,t}-a_{e}\right) \\ \alpha ^{\prime }&{}=&{}\alpha +\frac{\sum _{e=0}^{E}K_{e}}{2},\\ \beta _t^{\prime }&{}=&{}\beta +\frac{1}{2}\left\{ \sum _{e=0}^{E}\sum _{k=1}^{K_{e}}c_{e}^{-2}(X_{e,k,t}-a_{e})^{2}-m_t^{\prime 2}\lambda ^{\prime -1}\right\} , \end{array} \end{aligned}$$

where \({\bar{X}}_{e,t}=\frac{1}{K_{e}}\sum _{k=1}^{K_{e}}X_{e,k,t}\). Therefore, we are in a conjugate situation since the conditional pdf \(\left[ Z_t,\omega _t^{-2}|{\mathbf {X}}_t,Y_t\right] \) is in the normal-gamma model as is the marginal pdf \(\left[ Z_t,\omega _t^{-2}\right] \).
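These closed-form expressions translate directly into vectorized code. Below is a sketch (function and variable names are ours); X is a list of (n, K_e) arrays of members, one per block e = 0, …, E, all blocks being treated uniformly as in the sums above:

```python
import numpy as np

def posterior_params(X, a, b, c, alpha, beta, lam):
    """Conjugate posterior parameters of (Z_t, omega_t^{-2}) given the data,
    following the matching formulas: lambda', m'_t, alpha', beta'_t."""
    K = np.array([Xe.shape[1] for Xe in X])
    lam_p = 1.0 / (np.sum(K * b ** 2 / c ** 2) + 1.0 / lam)            # lambda'
    m_t = lam_p * sum(K[e] * b[e] / c[e] ** 2
                      * (X[e].mean(axis=1) - a[e]) for e in range(len(X)))
    alpha_p = alpha + K.sum() / 2.0                                     # alpha'
    ss = sum(((X[e] - a[e]) ** 2).sum(axis=1) / c[e] ** 2 for e in range(len(X)))
    beta_t = beta + 0.5 * (ss - m_t ** 2 / lam_p)                       # beta'_t
    return lam_p, m_t, alpha_p, beta_t

lam_p, m_t, alpha_p, beta_t = posterior_params(
    [np.ones((2, 2)), np.zeros((2, 3))],
    a=np.array([0.0, 0.0]), b=np.array([1.0, 1.0]), c=np.array([1.0, 1.0]),
    alpha=2.0, beta=1.0, lam=1.0)
```

Note that \(\lambda ^{\prime }\) and \(\alpha ^{\prime }\) do not depend on t, so only \(m_t^{\prime }\) and \(\beta _t^{\prime }\) need to be stored per record.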

Denoting by \(\phi (\cdot )\) the digamma function, i.e., the first derivative of \(\log \left\{ \Gamma (\cdot )\right\} \), the moments necessary for performing the E-step are:

$$\begin{aligned} \begin{array}{lcl} {\mathbb {E}}\left( \log (\omega _t^{-2})|{\mathbf {X}}_t,Y_t\right) &{} =&{}-\log (\beta _t^{\prime })+\phi (\alpha ^{\prime })\,,\\ {\mathbb {E}}(\omega _t^{-2}|{\mathbf {X}}_t,Y_t) &{}=&{}\frac{\alpha ^{\prime }}{\beta _t^{\prime }}\,,\\ {\mathbb {E}}(Z_t^{2}\omega _t^{-2}|{\mathbf {X}}_t,Y_t) &{}=&{}\lambda ^{\prime }+m_t^{\prime 2}\frac{\alpha ^{\prime }}{\beta _t^{\prime }}\,,\\ {\mathbb {E}}(Z_t \omega _t^{-2}|{\mathbf {X}}_t,Y_t) &{} =&{}m_t^{\prime }\frac{\alpha ^{\prime }}{\beta _t^{\prime }}\,. \end{array} \end{aligned}$$
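Given the posterior parameters, these four moments are one-liners; \(\phi \) is available as `scipy.special.digamma` (a sketch, with names of our choosing):

```python
import numpy as np
from scipy.special import digamma

def e_step_moments(lam_p, m_t, alpha_p, beta_t):
    """Conditional moments required by the E-step (appendix notation)."""
    E_prec = alpha_p / beta_t                          # E[omega_t^{-2} | X_t, Y_t]
    E_log_prec = -np.log(beta_t) + digamma(alpha_p)    # E[log omega_t^{-2} | X_t, Y_t]
    E_Z2_prec = lam_p + m_t ** 2 * E_prec              # E[Z_t^2 omega_t^{-2} | X_t, Y_t]
    E_Z_prec = m_t * E_prec                            # E[Z_t omega_t^{-2} | X_t, Y_t]
    return E_log_prec, E_prec, E_Z2_prec, E_Z_prec

mom = e_step_moments(2.0, np.array([0.0]), 1.0, np.array([1.0]))
```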

M-step  

We write the complete deviance \(D(\varvec{\theta })=D(\alpha ,\beta ,\lambda ,{\mathbf {a}},{\mathbf {b}},{\mathbf {c}})\), where n denotes the number of records in the dataset (each indexed by t):

$$\begin{aligned} \begin{aligned}D(\varvec{\theta })&= \sum _{t=1}^{n} \Bigg \{ Z_{t}^{2}\lambda ^{-1}\omega _{t}^{-2}-\log (\lambda ^{-1})-2\alpha \log (\omega _{t}^{-2})+2\beta \omega _{t}^{-2}-2\alpha \log (\beta )\\&\qquad +2\log \left\{ \Gamma (\alpha )\right\} \\&\qquad + \sum _{e=0}^{E}\left\{ \sum _{k=1}^{K_{e}}\left( X_{e,k,t}-a_{e}-b_{e}Z_{t}\right) ^{2}c_{e}^{-2}\omega _{t}^{-2}-\log \left( c_{e}^{-2}\right) \right\} \Bigg \}+ \text {Cst} \,, \end{aligned} \end{aligned}$$

where \(\text {Cst}\) is a constant term with respect to the parameters to estimate.

First, the expectation of \(D(\varvec{\theta })\) is computed using the moments obtained in the E-step. This expectation is then differentiated with respect to the parameters to be updated, which leads to the following explicit update formulas, where the subscript new indicates the updated value of a parameter:

$$\begin{aligned} \begin{array}{lcl} b_{e,new} &{} =&{}\frac{\frac{D_{e}}{B}-\frac{C_{e}}{G}}{\frac{G}{B}-\frac{H}{G}-\frac{n\lambda ^{\prime }}{G\alpha ^{\prime }}}\quad \text { for } e\in \{1,\ldots ,E\},\\ a_{e,new} &{} =&{}\frac{D_{e}}{B}-b_{e,new}\frac{G}{B} \quad \text { for } e\in \{0,\ldots ,E\},\\ c_{e,new}^{2}&{}=&{}K_{e}b_{e,new}^{2}\lambda ^{\prime }+\frac{1}{n}\sum _{t=1}^{n}\frac{\alpha ^{\prime }}{\beta _{t}^{\prime }}\sum _{k=1}^{K_{e}}\left( X_{e,k,t}-a_{e,new}-b_{e,new}m_{t}^{\prime }\right) ^{2}\quad \text { for } e\in \{1,\ldots ,E\},\\ \lambda _{new}&{}=&{}\lambda ^{\prime }+\frac{\alpha ^{\prime }}{n}\sum _{t=1}^{n}\frac{m_{t}^{\prime 2}}{\beta _{t}^{\prime }},\\ \beta _{new} &{} =&{}\frac{n\alpha _{new}}{\alpha ^{\prime }\sum _{t=1}^{n}\frac{1}{\beta _{t}^{\prime }}}, \end{array} \end{aligned}$$

where \(G=\sum _{t=1}^{n}\frac{m_{t}^{\prime }}{\beta _{t}^{\prime }}\), \(B=\sum _{t=1}^{n}\frac{1}{\beta _{t}^{\prime }}\), \(C_{e}=\sum _{t=1}^{n}\frac{m_{t}^{\prime }{\bar{X}}_{e,t}}{\beta _{t}^{\prime }}\), \(D_{e}=\sum _{t=1}^{n}\frac{{\bar{X}}_{e,t}}{\beta _{t}^{\prime }}\) and \(H=\sum _{t=1}^{n}\frac{m_{t}^{\prime 2}}{\beta _{t}^{\prime }}\). The update of \(\alpha \) is obtained by numerically solving the following equation:

$$\begin{aligned} \log \left( \frac{n\alpha _{new}}{\alpha ^{\prime }\sum _{t=1}^{n}\frac{1}{\beta _{t}^{\prime }}}\right) -\phi (\alpha _{new}) =\frac{1}{n}\sum _{t=1}^{n}\log (\beta _{t}^{\prime })-\phi (\alpha ^{\prime }). \end{aligned}$$
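Putting the explicit updates and the one-dimensional solve for \(\alpha \) together gives the whole M-step. The sketch below transcribes the formulas above (names are ours; keeping \(b_0\), the slope of the e = 0 block, fixed is our assumption for identifiability, and the paper only says a numeric solver is used for \(\alpha \)). A bracketing root finder is safe here because \(\log \alpha -\phi (\alpha )\) is positive and strictly decreasing:

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

def m_step(X, m_t, beta_t, lam_p, alpha_p, b0=1.0):
    """One explicit M-step. X is a list of (n, K_e) member arrays;
    m_t, beta_t are the per-record posterior parameters from the E-step."""
    n = len(m_t)
    w = 1.0 / beta_t
    G, B, H = np.sum(m_t * w), np.sum(w), np.sum(m_t ** 2 * w)
    a_new, b_new, c2_new = [], [], []
    for e, Xe in enumerate(X):
        K_e = Xe.shape[1]
        Xbar = Xe.mean(axis=1)
        C_e, D_e = np.sum(m_t * Xbar * w), np.sum(Xbar * w)
        if e == 0:
            be = b0                    # b_0 kept fixed (our assumption)
        else:
            be = (D_e / B - C_e / G) / (G / B - H / G - n * lam_p / (G * alpha_p))
        ae = D_e / B - be * G / B
        resid2 = ((Xe - ae - be * m_t[:, None]) ** 2).sum(axis=1)
        c2 = K_e * be ** 2 * lam_p + np.mean(alpha_p / beta_t * resid2)
        a_new.append(ae); b_new.append(be); c2_new.append(c2)
    lam_new = lam_p + alpha_p / n * np.sum(m_t ** 2 / beta_t)
    # alpha_new solves the last displayed equation; the bracket is generous
    # because log(a) - digamma(a) decreases from +inf to 0 on (0, inf)
    rhs = np.mean(np.log(beta_t)) - digamma(alpha_p)
    f = lambda a: np.log(n * a / (alpha_p * B)) - digamma(a) - rhs
    alpha_new = brentq(f, 1e-8, 1e8)
    beta_new = n * alpha_new / (alpha_p * B)
    return (np.array(a_new), np.array(b_new), np.array(c2_new),
            lam_new, alpha_new, beta_new)

rng = np.random.default_rng(1)
X = [rng.normal(size=(4, 3)), 0.5 + rng.normal(size=(4, 2))]
m_t = np.array([1.0, 0.5, -0.3, 0.8])
beta_t = np.array([1.0, 2.0, 1.5, 0.7])
a_new, b_new, c2_new, lam_new, alpha_new, beta_new = m_step(
    X, m_t, beta_t, lam_p=0.5, alpha_p=3.0)
```

Iterating `posterior_params`-style E-steps with this M-step until the parameters stabilize yields the EM estimate.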

About this article

Cite this article

Courbariaux, M., Barbillon, P., Perreault, L. et al. Post-processing Multiensemble Temperature and Precipitation Forecasts Through an Exchangeable Normal-Gamma Model and Its Tobit Extension. JABES 24, 309–345 (2019). https://doi.org/10.1007/s13253-019-00358-2
