Abstract
Real-time error correction is an effective way to improve forecast accuracy. This paper develops a real-time error correction model based on a machine learning (ML) ensemble method, comprising the construction of base learners, a heterogeneity test, and Stacking combination. A Copula-based Bayesian processor of forecast (BPF) is then adopted to produce probabilistic forecasts that quantify the uncertainty of the corrected results. Finally, model performance is evaluated comprehensively in terms of accuracy and stability, using deterministic metrics to evaluate the correction effect and uncertainty metrics to assess the probabilistic forecasts. The proposed model is applied to the multireservoir system in the Huai River Basin, and the results reveal the following: (1) Unlike individual ML algorithms, whose performance oscillates, the Stacking ensemble method aggregates the advantages of multiple learners and shows a robust correction effect and high adaptability across all data samples. (2) The Stacking method significantly reduces the forecasting error, raising the average Nash–Sutcliffe efficiency coefficient (\(NSE\)) above 0.9, which is 4.93% higher than that of the autoregressive (AR) method. The Stacking method also outperforms the AR method on the remaining evaluation metrics. Moreover, as the lead time increases, the performance of the Stacking method declines more slowly than that of the AR method. (3) Changes in the structure of the Stacking method have a relatively small influence on forecast uncertainty, with all containing ratio (\(CR\)) values above 80% across the different samples. The flexible combination of ML algorithms in the ensemble method does not introduce additional sources of uncertainty and ensures the stability of the correction performance. 
The framework for real-time error correction and its uncertainty assessment achieves the best overall correction performance while requiring less observed data and lower computational cost, which makes it promising for further improving the accuracy and reliability of real-time flood forecasting.
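The abstract evaluates the corrected forecasts with deterministic metrics such as \(NSE\), assesses the probabilistic forecasts with the containing ratio \(CR\), and uses an autoregressive (AR) updating model as the benchmark. The Python sketch below illustrates these three pieces under stated assumptions: \(CR\) is taken as the fraction of observations falling inside the prediction bounds, and the AR benchmark is assumed, hypothetically, to be a first-order model of the forecast error (observed minus simulated) fitted by least squares; the paper's actual AR order and interval level are not stated here. It does not reproduce the authors' Stacking ensemble or Copula-based BPF, and the toy numbers are illustrative only, not data from the study.

```python
from typing import Sequence


def nse(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash–Sutcliffe efficiency: 1 - SSE / SST (1.0 means a perfect fit)."""
    if not observed or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    if sst == 0:
        raise ValueError("observed series has zero variance; NSE is undefined")
    return 1.0 - sse / sst


def containing_ratio(observed: Sequence[float],
                     lower: Sequence[float],
                     upper: Sequence[float]) -> float:
    """Fraction of observations that fall inside their [lower, upper] bounds."""
    if not observed or not (len(observed) == len(lower) == len(upper)):
        raise ValueError("series must be non-empty and of equal length")
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


def ar1_correct(past_errors: Sequence[float], forecast: float, lead_time: int) -> float:
    """Hypothetical AR(1) error-updating baseline (an assumption, not the paper's code).

    Errors are observed minus simulated flow. phi is fitted by least squares on
    e_t = phi * e_{t-1} (no intercept); the k-step-ahead forecast is shifted by
    phi**k times the latest known error.
    """
    if len(past_errors) < 2 or lead_time < 1:
        raise ValueError("need at least two past errors and lead_time >= 1")
    num = sum(cur * prev for cur, prev in zip(past_errors[1:], past_errors[:-1]))
    den = sum(prev * prev for prev in past_errors[:-1])
    phi = num / den if den else 0.0
    return forecast + phi ** lead_time * past_errors[-1]


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the paper.
    obs = [100.0, 120.0, 150.0, 130.0, 110.0]
    sim = [98.0, 125.0, 145.0, 128.0, 112.0]
    lower = [90.0, 110.0, 140.0, 135.0, 100.0]
    upper = [110.0, 130.0, 160.0, 145.0, 120.0]
    print(f"NSE = {nse(obs, sim):.3f}")                        # NSE = 0.958
    print(f"CR  = {containing_ratio(obs, lower, upper):.2f}")  # CR  = 0.80
    errors = [4.0, 2.0, 1.0]  # least-squares fit gives phi = 0.5
    print(ar1_correct(errors, 100.0, 1), ar1_correct(errors, 100.0, 2))  # 100.5 100.25
```

For a stationary AR(1) error process, |phi| < 1, so the correction term phi**k times the latest error shrinks geometrically with lead time k. This is one simple way to see why an AR-based update tends to lose skill as the lead time grows, consistent with the abstract's observation that the AR method declines faster than the Stacking method.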
Funding
This work was supported by the National Natural Science Foundation of China (Grant Nos. 52079037, 52009029); the Fundamental Research Funds for the Central Universities (Grant No. B200202032); the China Postdoctoral Science Foundation (Grant No. 2020T130169).
Author information
Contributions
Methodology, Xu C. J.; conceptualization, Zhong P. A.; software, Xu C. J. and Zhu F. L.; validation, Yang L. H. and Wang S.; formal analysis, Xu C. J. and Wang S.; writing-original draft, Xu C. J. and Wang Y. W.; writing-review and editing, Yang L. H.; visualization, Wang Y. W.; supervision, Zhong P. A.; funding acquisition, Zhong P. A.
Ethics declarations
Conflict of interest
There is no conflict of interest regarding the publication of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xu, C., Zhong, Pa., Zhu, F. et al. Real-time error correction for flood forecasting based on machine learning ensemble method and its uncertainty assessment. Stoch Environ Res Risk Assess 37, 1557–1577 (2023). https://doi.org/10.1007/s00477-022-02336-6
DOI: https://doi.org/10.1007/s00477-022-02336-6