Statistics for sample splitting for the calibration and validation of hydrological models

  • Dedi Liu
  • Shenglian Guo
  • Zhaoli Wang
  • Pan Liu
  • Xixuan Yu
  • Qin Zhao
  • Hui Zou
Original Paper


Hydrological models are widely applied in flood forecasting, water resources management, and other environmental sciences. Most hydrological models calibrate and validate their parameters against available records, so the first step of any hydrological simulation is to split the samples quantitatively and objectively into calibration and validation sets. In this paper, we propose a framework that addresses this issue by combining a hierarchical scheme, applied through trial and error, for the systematic testing of hydrological models with hypothesis testing to check the statistical significance of goodness-of-fit indices. That is, the framework evaluates the performance of a hydrological model under a given sample-splitting scheme for calibration and validation, and assesses the statistical significance of the Nash–Sutcliffe efficiency index (E_f), which is commonly used to measure the performance of hydrological models. A sample-splitting scheme is judged acceptable if the E_f values exceed the threshold of the hypothesis test. Following the requirements of the hierarchical scheme for systematic testing of hydrological models, cross calibration and validation help to increase the reliability of the splitting scheme and to reduce the effective range of sample sizes for both calibration and validation. We show that the threshold of E_f depends on the significance level, the evaluation criteria (both regarded as the population), the distribution type, and the sample size, while the performance rating of E_f depends largely on the evaluation criteria. Three types of distributions, based on an approximately standard normal distribution, a chi-square distribution, and a bootstrap method, are used to investigate their effects on the thresholds at two commonly used significance levels.
The bootstrap method yields the highest threshold, the approximately standard normal distribution the middle one, and the chi-square distribution the lowest. We also found that the smaller the sample size, the higher the threshold values. Sample splitting was improved by providing more records. In addition, outliers with a large bias between simulation and observation can affect the sample values of E_f, and hence the outcome of the sample-splitting scheme, so the physical hydrological processes and the purpose of the model should be considered carefully when assessing outliers. The proposed framework cannot guarantee the best splitting scheme, but the results establish necessary conditions, from a statistical point of view, for splitting schemes used to calibrate and validate hydrological models.
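The E_f index and a bootstrap-derived significance threshold of the kind described above can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: the pairwise resampling of observation–simulation pairs, the number of resamples, and the use of the lower alpha-quantile as the threshold are all assumptions made for the example.

```python
import numpy as np

def nash_sutcliffe(obs, sim):
    """Nash-Sutcliffe efficiency E_f = 1 - SSE / sum of squared
    deviations of the observations from their mean."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def bootstrap_ef_threshold(obs, sim, alpha=0.05, n_boot=2000, seed=0):
    """Hypothetical helper: approximate the sampling distribution of E_f
    by resampling (obs, sim) pairs with replacement, then return the
    lower alpha-quantile as a significance threshold."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(obs)
    samples = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample paired indices
        samples[b] = nash_sutcliffe(obs[idx], sim[idx])
    return np.quantile(samples, alpha)
```

A splitting scheme would then be accepted when the E_f value obtained on the calibration (or validation) subsample exceeds the threshold returned for the chosen significance level; note that E_f is undefined for a constant observation series, so realistic records are assumed.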


Keywords: Sample splitting · Model calibration and validation · Hypothesis testing · Hydrological model · Nash–Sutcliffe efficiency index



The authors gratefully acknowledge the financial support from the National Natural Science Foundation of China (Nos. 51579183, 51379148, 91647106 and 51525902) and the Science and Technology Program of Guangzhou City (No. 201707010072).



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Dedi Liu (1)
  • Shenglian Guo (1)
  • Zhaoli Wang (2)
  • Pan Liu (1)
  • Xixuan Yu (3)
  • Qin Zhao (1)
  • Hui Zou (1)
  1. State Key Laboratory of Water Resources and Hydropower Engineering Science, Wuhan University, Wuhan, China
  2. State Key Laboratory of Subtropical Building Science, South China University of Technology, Guangzhou, China
  3. Agricultural and Environmental Sciences, McGill University, Ste. Anne de Bellevue, Canada
