Water flow probabilistic predictions based on a rainfall–runoff simulator: a two-regime model with variable selection


Probabilistic forecasting aims to produce a predictive distribution of the quantity of interest instead of a single best-guess point estimate. For water flow forecasts, the two main sources of uncertainty are the unknown future rainfall and temperature (input error, i.e., meteorological uncertainty) and the inadequacy of the deterministic simulator that mimics the rainfall–runoff (RR) transformation (hydrological uncertainty, or RR error). These two sources can be dealt with separately, and only the latter is considered here. Only hydrological uncertainty is at stake when recorded meteorological data (instead of meteorological forecasts) are fed to the RR simulator (RRS) for probabilistic predictions. The predictive performance of the RRS may depend strongly on the hydrological regime: rapid flood variations induce large anticipation errors, whereas a series of dry events translates into a much smoother sequence of river levels, because the emptying of the soil reservoir is easy to predict. Consequently, a model with several regimes adapted to different error structures is a natural way to cope with nonstationary predictive variance. The river regime is modeled as a latent variable whose distribution depends on additional outputs of the RRS, which have to be selected. Inference is performed with the EM algorithm, both steps of which lead to explicit analytic expressions. Asymptotic confidence regions for the estimates are provided within the same EM framework. Model selection is also performed; it covers the length of the model memory as well as the choice of explanatory variables for the latent regimes. The model is applied to a series of water flow forecasts routinely issued by two hydroelectricity producers, in France and in Québec, and compared with their present operational forecasting methods.

References


  1. Ailliot, P. and Monbet, V. (2012). Markov-switching autoregressive models for wind time series. Environmental Modelling & Software, 30:92–101.

  2. Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88(422):669–679.

  3. Andreassian, V., Bergstrom, S., Chahinian, N., Duan, Q., Gusev, Y., Littlewood, I., Mathevet, T., Michel, C., Montanari, A., Moretti, G., et al. (2006). Catalogue of the models used in MOPEX 2004/2005. IAHS publication, 307:41.

  4. Bates, B. C. and Campbell, E. P. (2001). A Markov chain Monte Carlo scheme for parameter estimation and inference in conceptual rainfall-runoff modeling. Water Resources Research, 37(4):937–947.

  5. Box, G. and Jenkins, G. (1970). Time Series Analysis: Forecasting and Control. Holden–Day, San Francisco, Ca.

  6. Box, G. E. and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society. Series B (Methodological), pages 211–252.

  7. Chib, S. (1996). Calculating posterior distributions and modal estimates in Markov mixture models. Journal of Econometrics, 75(1):79–97.

  8. Collet, J., Épiard, X., and Coudray, P. (2009). Simulating hydraulic inflows using PCA and ARMAX. The European Physical Journal-Special Topics, 174(1):125–134.

  9. Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 1–38.

  10. Engeland, K., Renard, B., Steinsland, I., and Kolberg, S. (2010). Evaluation of statistical models for forecast errors from the HBV model. Journal of Hydrology, 384(1):142–155.

  11. Evin, G., Kavetski, D., Thyer, M., and Kuczera, G. (2013). Pitfalls and improvements in the joint inference of heteroscedasticity and autocorrelation in hydrological model calibration. Water Resources Research, 49(7):4518–4524.

  12. Evin, G., Thyer, M., Kavetski, D., McInerney, D., and Kuczera, G. (2014). Comparison of joint versus postprocessor approaches for hydrological uncertainty estimation accounting for error autocorrelation and heteroscedasticity. Water Resources Research, 50(3):2350–2375.

  13. Fortin, V. (2000). Le modèle météo-apport HSAMI: historique, théorie et application. Institut de recherche d’Hydro-Québec, Varennes.

  14. Furrer, E. M., Jacques, C., and Favre, A.-C. (2006). Short term discharge prediction using a Markovian regime switching model. Technical report, INRS-ETE.

  15. Gailhard, J. (2014). Algorithme de recalage associé à MORDOR diagnostic et proposition d’améliorations. Note Technique Interne H-44200965-2014-000075, EDF-DTG.

  16. Gelfand, A. E. and Smith, A. F. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85(410):398–409.

  17. Gneiting, T. and Raftery, A. E. (2007). Strictly proper scoring rules, prediction, and estimation. Journal of the American Statistical Association, 102(477):359–378.

  18. Gneiting, T., Raftery, A. E., Westveld, A. H., and Goldman, T. (2005). Calibrated probabilistic forecasting using ensemble model output statistics and minimum CRPS estimation. Monthly Weather Review, 133(5):1098–1118.

  19. Hemri, S., Fundel, F., and Zappa, M. (2013). Simultaneous calibration of ensemble river flow predictions over an entire range of lead times. Water Resources Research, 49(10):6744–6755.

  20. Hemri, S., Lisniak, D., and Klein, B. (2015). Multivariate postprocessing techniques for probabilistic hydrological forecasting. Water Resources Research, 51(9):7436–7451.

  21. Hersbach, H. (2000). Decomposition of the continuous ranked probability score for ensemble prediction systems. Weather and Forecasting, 15(5):559–570.

  22. Johnson, N. L., Kotz, S., and Balakrishnan, N. (1994). Continuous Univariate Distributions, vol. 1–2. New York: John Wiley & Sons.

  23. Krzysztofowicz, R. (2002). Bayesian system for probabilistic river stage forecasting. Journal of Hydrology, 268(1):16–40.

  24. Kuczera, G. (1983). Improved parameter inference in catchment models: 1. evaluating parameter uncertainty. Water Resources Research, 19(5):1151–1162.

  25. Li, M., Wang, Q., Bennett, J., and Robertson, D. (2015). A strategy to overcome adverse effects of autoregressive updating of streamflow forecasts. Hydrology and Earth System Sciences, 19(1):1–15.

  26. Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B, 44(2):226–233.

  27. Lu, Z.-Q. and Berliner, L. M. (1999). Markov switching time series models with application to a daily runoff series. Water Resources Research, 35(2):523–534.

  28. Matheson, J. E. and Winkler, R. L. (1976). Scoring rules for continuous probability distributions. Management Science, 22(10):1087–1096.

  29. Mathevet, T. (2010). Erreur empirique de modèle. Note Technique Interne D4165/NT/2010-00395-A, EDF-DTG.

  30. Morawietz, M., Xu, C.-Y., Gottschalk, L., and Tallaksen, L. M. (2011). Systematic evaluation of autoregressive error models as post-processors for a probabilistic streamflow forecast system. Journal of Hydrology, 407(1):58–72.

  31. Perreault, L., Garçon, R., and Gaudet, J. (2007). Analyse de séquences de variables aléatoires hydrologiques à l’aide de modèles de changement de régime exploitant des variables atmosphériques. La Houille Blanche (6):111–123.

  32. Pianosi, F. and Raso, L. (2012). Dynamic modeling of predictive uncertainty by regression on absolute errors. Water Resources Research, 48(3).

  33. Raftery, A. E., Gneiting, T., Balabdaoui, F., and Polakowski, M. (2005). Using Bayesian model averaging to calibrate forecast ensembles. Monthly Weather Review, 133(5).

  34. Schaefli, B., Talamba, D. B., and Musy, A. (2007). Quantifying hydrological modeling errors through a mixture of normal distributions. Journal of Hydrology, 332(3):303–315.

  35. Schoups, G. and Vrugt, J. A. (2010). A formal likelihood function for parameter and predictive inference of hydrologic models with correlated, heteroscedastic, and non-Gaussian errors. Water Resources Research, 46(10).

  36. Schwarz, G. (1978). Estimating the Dimension of a Model. The Annals of Statistics, 6(2):461–464.

  37. Sorooshian, S. and Dracup, J. A. (1980). Stochastic parameter estimation procedures for hydrologic rainfall-runoff models: Correlated and heteroscedastic error cases. Water Resources Research, 16(2):430–442.

  38. Thyer, M., Kuczera, G., and Wang, Q. (2002). Quantifying parameter uncertainty in stochastic models using the Box-Cox transformation. Journal of Hydrology, 265(1):246–257.

  39. Todini, E. (2008). A model conditional processor to assess predictive uncertainty in flood forecasting. International Journal of River Basin Management, 6(2):123–137.

  40. Vrugt, J. A. and Robinson, B. A. (2007). Treatment of uncertainty using ensemble methods: Comparison of sequential data assimilation and Bayesian model averaging. Water Resources Research, 43(1).

  41. Wang, Q., Shrestha, D. L., Robertson, D., and Pokhrel, P. (2012). A log-sinh transformation for data normalization and variance stabilization. Water Resources Research, 48(5).


Acknowledgements

This work was supported by Électricité de France and by Hydro-Québec [research Grant Number 694R] through the thesis of M. Courbariaux. We would like to thank Anne-Catherine Favre, Joël Gailhard and Luc Perreault for their unfailing help and constructive comments on earlier drafts of the article. The forecasting and development teams at EDF-DTG and Hydro-Québec provided the necessary material and case studies as well as much valuable advice; we thank in particular Catherine Guay, Isabelle Chartier and Marie Minville from IREQ, and Rémy Garçon, Matthieu Le-Lay and Federico Garavaglia from EDF-DTG. We also thank Joan Sobota for English proofreading. Finally, we thank the Associate Editor and the two reviewers for their comments and questions, which helped us improve the paper.

Author information



Corresponding author

Correspondence to Pierre Barbillon.


Appendix 1: Operational predictive method

EDF’s operational predictive method consists of three independent modules: a deterministic model, an error model and an empirical copula.

Deterministic model (Gailhard 2014). The deterministic model in use at EDF is an autoregressive model combined with exponential smoothing. The strength of the autocorrelation is assumed to increase with the rate of water flow coming from the deep reservoirs of the watershed.

Error model (Mathevet 2010). The error model is a heteroscedastic conditional normal model derived for each forecasting lead time h (after normalization):

$$\begin{aligned} \left( Y_{t+h}|X_{t+h}=x\right) =b_{h}\left( x\right) +x+\sigma _{h}\left( x\right) \varepsilon ,\;\;\varepsilon \sim \mathcal {N}\left( 0,1\right) , \end{aligned}$$

where \(b_{h}\) and \(\sigma _{h}\) are tabulated functions of x.
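As an illustration only, the error model above can be sampled as follows for a single lead time h. This is a minimal sketch: the piecewise-constant tabulation, the bin edges and all numerical values are hypothetical and are not taken from the article or from EDF's operational tables, and the normalization step mentioned in the text is assumed to have been applied already.

```python
import bisect
import random

# Hypothetical tabulation for one lead time h (illustrative values only).
# Bins on the simulator output x: x < 10, 10 <= x < 50, 50 <= x < 200, x >= 200.
X_EDGES = [10.0, 50.0, 200.0]
B_TABLE = [0.0, -0.5, -2.0, -5.0]      # assumed bias b_h(x) in each bin
SIGMA_TABLE = [0.5, 2.0, 8.0, 20.0]    # assumed std. deviation sigma_h(x) in each bin


def lookup(table, x):
    """Piecewise-constant lookup of a tabulated function at x."""
    return table[bisect.bisect_right(X_EDGES, x)]


def sample_flow(x, n, rng):
    """Draw n values of Y_{t+h} given X_{t+h} = x: b_h(x) + x + sigma_h(x) * eps."""
    b = lookup(B_TABLE, x)
    s = lookup(SIGMA_TABLE, x)
    return [b + x + s * rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)                  # fixed seed for reproducibility
draws = sample_flow(x=80.0, n=5000, rng=rng)
mean_draw = sum(draws) / len(draws)     # should be close to x + b_h(x)
```

The predictive distribution is thus centered on the bias-corrected simulator output, with a spread that depends on the flow level through the tabulated \(\sigma _{h}\).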

Empirical copula. Finally, an empirical copula is used to obtain samples from a multivariate distribution over space and time from samples of the marginal (lead time by lead time) distributions.
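One common way to impose an empirical dependence structure on marginal samples is rank reordering against a template of past trajectories. The sketch below illustrates that generic idea along the lead-time dimension only; it is an assumption made for illustration and not a description of EDF's actual procedure, and the function names and toy data are hypothetical.

```python
def ranks(values):
    """Rank of each value (0 = smallest); ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def reorder_with_template(marginal_samples, template):
    """Build trajectories whose rank structure follows the template.

    marginal_samples[h]: n independent draws for lead time h.
    template[h]: n past values for lead time h; member j across all h
                 forms one observed trajectory (assumed dependence template).
    Returns a list of n trajectories, each of length len(template).
    """
    n = len(template[0])
    trajectories = [[0.0] * len(template) for _ in range(n)]
    for h, (draws, past) in enumerate(zip(marginal_samples, template)):
        sorted_draws = sorted(draws)
        for j, r in enumerate(ranks(past)):
            # Member j receives the draw having the same rank as its template value.
            trajectories[j][h] = sorted_draws[r]
    return trajectories


# Toy example: two lead times, three members.
samples = [[3.0, 1.0, 2.0], [30.0, 10.0, 20.0]]
past = [[0.1, 0.5, 0.3], [5.0, 9.0, 7.0]]
trajectories = reorder_with_template(samples, past)
```

The reordering leaves each marginal sample unchanged as a set and only decides which draws are joined into the same trajectory.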

Appendix 2: Fisher information matrix

$$\begin{aligned} G_{\gamma _k}(\mathbf {Y},\mathbf {Z})&=\sum _{t>t_{\min }} \mathbb {I}_{\{S_{t}=k\}}\left( \mathbf {U}_t/\sigma _k^2 \cdot Y_t +0 \cdot Y_t^2-\mathbf {U}_t\mathbf {U}_t^T\varvec{\gamma }_k/\sigma ^2_k \right) \\&=\sum _{t>t_{\min }} \mathbb {I}_{\{S_{t}=k\}}\left( \mathbf {U}_t/\sigma _k^2\cdot Y_t-\mathbf {U}_t\mathbf {U}_t^T\varvec{\gamma }_k/\sigma ^2_k \right) ,\\ G_{\sigma ^2_k}(\mathbf {Y},\mathbf {Z})&=\sum _{t>t_{\min }} \mathbb {I}_{\{S_{t}=k\}}\left( \frac{\left( Y_t-\varvec{\gamma }_k^T\mathbf {U}_t \right) ^2}{2\sigma ^4_k}-\frac{1}{2\sigma _k^2}\right) ,\\ G_{\mathbf {B}}(\mathbf {Y},\mathbf {Z})&= \sum _{t>t_{\min }} (Z_t-\mathbf {B}^T\mathbf {V}_t)\mathbf {V}_t. \end{aligned}$$

For any k, \(k'\not =k\),

$$\begin{aligned} \frac{\partial }{\partial \varvec{\gamma }_k}G_{\varvec{\gamma }_k}(\mathbf {Y},\mathbf {Z})= & {} -\sum _{t>t_{\min }} \mathbb {I}_{\{S_{t}=k\}} \mathbf {U}_t\mathbf {U}_t^T/\sigma ^2_k,\\ \frac{\partial }{\partial \sigma _k^2}G_{\sigma ^2_k}(\mathbf {Y},\mathbf {Z})= & {} \sum _{t>t_{\min }} \mathbb {I}_{\{S_{t}=k\}} \left( \frac{-\left( Y_t-\varvec{\gamma }_k^T\mathbf {U}_t \right) ^2}{\sigma ^6_k}+\frac{1}{2\sigma _k^4}\right) ,\\ \frac{\partial }{\partial \varvec{\gamma }_k}G_{\sigma ^2_k}(\mathbf {Y},\mathbf {Z})= & {} \sum _{t>t_{\min }} \mathbb {I}_{\{S_{t}=k\}}\left( \frac{-\mathbf {U}_t \left( Y_t-\varvec{\gamma }_k^T\mathbf {U}_t \right) }{\sigma ^4_k}\right) ,\\ \frac{\partial }{\partial \theta _{k'}}G_{\theta _k}(\mathbf {Y},\mathbf {Z})= & {} 0,\\ \frac{\partial }{\partial \mathbf {B}}G_{\mathbf {B}}(\mathbf {Y},\mathbf {Z})= & {} -\sum _{t>t_{\min }}\mathbf {V}_t\mathbf {V}_t^T,\\ \frac{\partial }{\partial \theta _k}G_{\mathbf {B}}(\mathbf {Y},\mathbf {Z})= & {} 0. \end{aligned}$$

Then, the first term in the Louis decomposition can be computed, since \(\mathbb {E}(\mathbb {I}_{\{S_{t}=k\}}|\mathbf {Y};(\varvec{\theta },\mathbf {B}))=\tau _{kt}\) was computed above.
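For orientation, the decomposition referred to here is the identity of Louis (1982) for the observed information matrix. The display below is a sketch in notation assumed for this note: \(G\) stacks the gradient components \(G_{\varvec{\gamma }_k}\), \(G_{\sigma ^2_k}\) and \(G_{\mathbf {B}}\) listed above, read as the gradient of the complete-data log-likelihood, and \(I_{\text {obs}}\) is a symbol introduced here for the observed information.

$$\begin{aligned} I_{\text {obs}}(\varvec{\theta },\mathbf {B})= \mathbb {E}\left( -\frac{\partial G}{\partial (\varvec{\theta },\mathbf {B})}\,\Big |\,\mathbf {Y};\varvec{\theta },\mathbf {B}\right) -\left[ \mathbb {E}\left( GG^T|\mathbf {Y};\varvec{\theta },\mathbf {B}\right) -\mathbb {E}\left( G|\mathbf {Y};\varvec{\theta },\mathbf {B}\right) \mathbb {E}\left( G|\mathbf {Y};\varvec{\theta },\mathbf {B}\right) ^T\right] . \end{aligned}$$

At a maximum likelihood estimate the conditional expectation of \(G\) vanishes, so the bracketed term reduces to \(\mathbb {E}(GG^T|\mathbf {Y};\varvec{\theta },\mathbf {B})\). The first term only requires conditional expectations of the second derivatives, which are linear in the indicators, whereas the bracketed term involves products of indicators and of the \(Z_t\).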

For the second term in the Louis decomposition, we notice that:

$$\begin{aligned} \mathbb {E}(\mathbb {I}_{\{S_{t}=k\}}^2|\mathbf {Y};\varvec{\theta },\mathbf {B})= & {} \mathbb {E}(\mathbb {I}_{\{S_{t}=k\}}|\mathbf {Y};\varvec{\theta },\mathbf {B})=\tau _{kt},\\ \mathbb {E}(\mathbb {I}_{\{S_{t}=k\}}\mathbb {I}_{\{S_{t}=k'\}}|\mathbf {Y};\varvec{\theta },\mathbf {B})= & {} 0 \text { for } k\not =k',\\ \mathbb {E}(\mathbb {I}_{\{S_{t}=k\}}\mathbb {I}_{\{S_{t'}=k\}}|\mathbf {Y};\varvec{\theta },\mathbf {B})= & {} \mathbb {E}(\mathbb {I}_{\{S_{t}=k\}}|\mathbf {Y};\varvec{\theta },\mathbf {B})\mathbb {E}(\mathbb {I}_{\{S_{t'}=k\}}|\mathbf {Y};\varvec{\theta },\mathbf {B})\text { for } t\not =t' \text { by independence}. \end{aligned}$$

We also need to compute:

$$\begin{aligned} \mathbb {E}(Z_t|\mathbf {Y};\varvec{\theta },\mathbf {B})= & {} \sum _k \tau _{kt}E_{kt}\quad \text {and}\\ \mathbb {E}\left( (Z_t-\mathbf {B}^T\mathbf {V}_t)^2|\mathbf {Y};\varvec{\theta },\mathbf {B}\right)= & {} \text {var}(Z_t|\mathbf {Y};\varvec{\theta },\mathbf {B})=\sum _k \tau _{kt}\varsigma _{kt}, \end{aligned}$$

where


$$\begin{aligned} \varsigma _{1t}= & {} 1-\frac{\phi (-\mathbf {B}^T \mathbf {V}_t)}{\varPhi (-\mathbf {B}^T\mathbf {V}_t)}\left( \frac{\phi (-\mathbf {B}^T\mathbf {V}_t)}{\varPhi (-\mathbf {B}^T\mathbf {V}_t)}-\mathbf {B}^T\mathbf {V}_t\right) ,\\ \varsigma _{0t}= & {} 1-\frac{\phi (-\mathbf {B}^T \mathbf {V}_t)}{1-\varPhi (-\mathbf {B}^T\mathbf {V}_t)}\left( \frac{\phi (-\mathbf {B}^T\mathbf {V}_t)}{1-\varPhi (-\mathbf {B}^T\mathbf {V}_t)}+\mathbf {B}^T\mathbf {V}_t\right) . \end{aligned}$$

Again, we rely on the independence between the \(Z_t\)s.
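The two expressions for \(\varsigma _{1t}\) and \(\varsigma _{0t}\) have the form of variances of a unit-variance normal variable with mean \(\mathbf {B}^T\mathbf {V}_t\) truncated to negative and to positive values, respectively. The following minimal sketch evaluates them exactly as written above; the function names are hypothetical, and the naive ratios are not numerically robust for very large \(|\mathbf {B}^T\mathbf {V}_t|\).

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def Phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def varsigma(mu):
    """Return (varsigma_1t, varsigma_0t) for mu = B^T V_t, as written in the text."""
    ratio_1 = phi(-mu) / Phi(-mu)            # phi / Phi at -B^T V_t
    ratio_0 = phi(-mu) / (1.0 - Phi(-mu))    # phi / (1 - Phi) at -B^T V_t
    varsigma_1 = 1.0 - ratio_1 * (ratio_1 - mu)
    varsigma_0 = 1.0 - ratio_0 * (ratio_0 + mu)
    return varsigma_1, varsigma_0


v1, v0 = varsigma(0.7)   # example value of B^T V_t, chosen arbitrarily
```

Both quantities lie strictly between 0 and 1, since truncation can only reduce the unit variance of the untruncated variable. For \(\mathbf {B}^T\mathbf {V}_t=0\) they coincide and equal \(1-2/\pi \).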

The remaining terms are easily evaluated:

$$\begin{aligned} \mathbb {E}(\mathbb {I}_{\{S_{t'}=k\}}Z_t|\mathbf {Y};\varvec{\theta },\mathbf {B})= & {} \tau _{kt'}\mathbb {E}(Z_t|\mathbf {Y};\varvec{\theta },\mathbf {B})\text { by independence},\\ \mathbb {E}(\mathbb {I}_{\{S_{t}=k\}}Z_t|\mathbf {Y};\varvec{\theta },\mathbf {B})= & {} \tau _{kt}E_{kt}. \end{aligned}$$

Cite this article

Courbariaux, M., Barbillon, P. & Parent, É. Water flow probabilistic predictions based on a rainfall–runoff simulator: a two-regime model with variable selection. JABES 22, 194–219 (2017). https://doi.org/10.1007/s13253-017-0278-5

Keywords


  • EM algorithm
  • Probit model
  • Model uncertainty
  • Probabilistic forecasts
  • Hydrology
  • Rainfall–runoff model