Hybrid Iterative and Tree-Based Machine Learning Algorithms for Lake Water Level Forecasting

Water Resources Management

Abstract

Accurate forecasting of lake water level (WL) fluctuations is essential for the effective development and management of water resource systems. This study applies the Random Tree (RT) algorithm and the Iterative Classifier Optimizer (ICO), which uses the Alternating Model Tree (AMT) as an iterative regressor, to forecast WL up to three months ahead for Lake Superior and Lake Michigan. To enhance the accuracy of these machine learning (ML) algorithms, each is combined with an ensemble algorithm, either Bagging (BA) or Additive Regression (AR), resulting in the BA-RT, BA-ICO, AR-RT, and AR-ICO models. The most effective inputs for WL forecasting are determined using a nonlinear input variable selection method, partial mutual information selection (PMIS), considering lagged WL values up to 24 months. Forecasting models for each lake are developed using a training subset spanning 1918 to 1988. Model parameters are tuned using a validation subset covering 1989 to 2003. Finally, model performance is evaluated using a testing subset from 2004 to 2018. Statistical metrics and visual analysis of the testing data are used to validate the performance of the developed algorithms. Additionally, results from Seasonal Autoregressive Integrated Moving Average (SARIMA) time series models serve as benchmarks for comparison with the ML results. The findings demonstrate that the ML models outperform the SARIMA models in terms of error values: RMSPE ranges between 3.9% and 11.3% for Lake Michigan and between 2.3% and 9.2% for Lake Superior. Furthermore, both hybrid ensemble algorithms improve the performance of the individual ML algorithms; however, the BA algorithm achieves better overall performance than the AR algorithm. As a novel approach to forecasting problems, the ICO algorithm based on AMT shows great potential for generating accurate multistep forecasts of lake WL. It demonstrates high generalization and low variance compared to the RT model.
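The data preparation described in the abstract (lagged WL inputs up to 24 months, a chronological split into 1918 to 1988, 1989 to 2003 and 2004 to 2018, and a percentage error metric) can be illustrated with a minimal sketch. It is not the authors' implementation. The series below is synthetic, the helper names `make_lagged` and `rmspe` are hypothetical, the RMSPE formula is the common root-mean-square-percentage-error definition and is assumed rather than taken from the paper, and a simple persistence forecast stands in for the ML and SARIMA models.

```python
import numpy as np


def make_lagged(series, max_lag=24, horizon=1):
    """Build a lagged input matrix for a `horizon`-step-ahead forecast.

    Column j of X holds the value observed (horizon + j) months before the
    target, so column 0 is the most recent value available at forecast time.
    Returns X, the targets y, and the index of each target in `series`.
    """
    series = np.asarray(series, dtype=float)
    first = horizon + max_lag - 1  # first target with a full set of lags
    target_idx = np.arange(first, len(series))
    X = np.stack(
        [series[target_idx - horizon - lag] for lag in range(max_lag)], axis=1
    )
    y = series[target_idx]
    return X, y, target_idx


def rmspe(observed, predicted):
    """Root mean square percentage error in percent (assumed definition)."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.sqrt(np.mean(((observed - predicted) / observed) ** 2))


# Synthetic monthly water levels for 1918-2018 (illustrative only, not lake data).
rng = np.random.default_rng(0)
n_months = (2018 - 1918 + 1) * 12
t = np.arange(n_months)
levels = 183.0 + 0.15 * np.sin(2 * np.pi * t / 12) + np.cumsum(
    rng.normal(0.0, 0.02, n_months)
)

X, y, target_idx = make_lagged(levels, max_lag=24, horizon=1)

# Chronological split by the calendar year of each target month.
years = 1918 + target_idx // 12
train = years <= 1988
val = (years >= 1989) & (years <= 2003)
test = years >= 2004

# Persistence baseline: forecast equals the most recent observed value.
score = rmspe(y[test], X[test, 0])
print(f"train={train.sum()} val={val.sum()} test={test.sum()} RMSPE={score:.3f}%")
```

In this sketch the full 24-lag matrix would be passed to an input selection step such as PMIS before model fitting; splitting by year rather than at random keeps the testing period strictly after the training and validation periods.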

Availability of Data

Data are available from the authors upon reasonable request.

Acknowledgements

The authors would like to express their gratitude to Dr. Rahim Barzegar, Dr. John Quilty, Prof. Jan Adamowski, and Dr. Homa Kheirollahpour for their support during the study.

Author information

Contributions

Elham Fijani: Writing – original draft, Conceptualization, Visualization, review & editing. Khabat Khosravi: Formal analysis, Software, review & editing.

Corresponding author

Correspondence to Elham Fijani.

Ethics declarations

Ethical Approval

Not applicable because this article does not contain any studies with human or animal subjects.

Consent to Participate

The authors declare that they are aware of and consent to their participation in this paper.

Consent to Publish

The authors declare that they consent to the publication of this paper.

Competing Interests

The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Fijani, E., Khosravi, K. Hybrid Iterative and Tree-Based Machine Learning Algorithms for Lake Water Level Forecasting. Water Resour Manage 37, 5431–5457 (2023). https://doi.org/10.1007/s11269-023-03613-x

  • DOI: https://doi.org/10.1007/s11269-023-03613-x
