Abstract
Accurate forecasting of lake water level (WL) fluctuations is essential for the effective development and management of water resource systems. This study applies the Random Tree (RT) algorithm and the Iterative Classifier Optimizer (ICO), with the Alternating Model Tree (AMT) as its iterative regressor, to forecast WL up to three months ahead for Lake Superior and Lake Michigan. To improve the accuracy of these machine learning (ML) algorithms, each is coupled with one of two ensemble algorithms, Bagging (BA) or Additive Regression (AR), yielding the hybrid BA-RT, BA-ICO, AR-RT, and AR-ICO models. The most effective inputs for WL forecasting are identified with partial mutual information selection (PMIS), a nonlinear input variable selection method, from candidate inputs consisting of WL values lagged by up to 24 months. For each lake, the forecasting models are trained on data from 1918 to 1988, their parameters are tuned on a validation subset covering 1989 to 2003, and their performance is evaluated on a testing subset covering 2004 to 2018. Model performance on the testing data is assessed with statistical metrics and visual analysis, and Seasonal Autoregressive Integrated Moving Average (SARIMA) time series models serve as benchmarks for the ML results. The findings show that the ML models outperform the SARIMA models in terms of error: the root mean square percentage error (RMSPE) ranges from 3.9% to 11.3% for Lake Michigan and from 2.3% to 9.2% for Lake Superior. Both ensemble algorithms improve the performance of the individual ML algorithms; however, BA achieves better overall performance than AR. As a novel approach to forecasting problems, the AMT-based ICO algorithm shows strong potential for generating accurate multistep forecasts of lake WL, with higher generalization ability and lower variance than the RT model.
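To make the bagging (BA) hybrid scheme more concrete, here is a minimal pure-Python sketch under several stated assumptions; it is not the authors' implementation. The series is synthetic rather than the Lake Superior or Lake Michigan record, the lags (3, 4 and 12 months) are fixed by hand instead of being selected by PMIS, a single chronological train/test cut stands in for the paper's separate training, validation and testing periods, and a depth-one regression tree (the hypothetical `fit_stump` helper) replaces RT and ICO as the base learner. The helpers `make_lagged`, `bagging` and `rmspe` are likewise illustrative names, and RMSPE uses one common definition, 100 * sqrt(mean(((obs - pred) / obs)^2)), which may differ from the formula used in the paper.

```python
import math
import random


def make_lagged(series, lags, horizon=1):
    """Pair each target series[t] with the inputs series[t - lag] for every lag.

    For a direct `horizon`-month-ahead forecast, every lag must be >= horizon,
    so only values already observed `horizon` months before t are used.
    """
    if min(lags) < horizon:
        raise ValueError("all lags must be >= horizon")
    max_lag = max(lags)
    X, y = [], []
    for t in range(max_lag, len(series)):
        X.append([series[t - lag] for lag in lags])
        y.append(series[t])
    return X, y


def fit_stump(X, y):
    """Hypothetical stand-in base learner: a depth-one regression tree.

    Tries every (feature, threshold) split and keeps the one with the lowest
    sum of squared errors; each side predicts the mean of its training targets.
    """
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= thr]
            right = [yi for row, yi in zip(X, y) if row[j] > thr]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, thr, ml, mr)
    if best is None:  # no valid split (all inputs identical): predict the mean
        mean = sum(y) / len(y)
        return lambda row: mean
    _, j, thr, ml, mr = best
    return lambda row: ml if row[j] <= thr else mr


def bagging(X, y, n_models=20, seed=0):
    """Bagging: fit one base learner per bootstrap resample, then average them."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(m(row) for m in models) / len(models)


def rmspe(obs, pred):
    """Root mean square percentage error, in percent (one common definition)."""
    return 100.0 * math.sqrt(
        sum(((o - p) / o) ** 2 for o, p in zip(obs, pred)) / len(obs)
    )


# Synthetic monthly "water level": annual cycle plus noise (not real lake data).
rng = random.Random(42)
series = [10.0 + 2.0 * math.sin(2 * math.pi * t / 12) + rng.gauss(0, 0.3)
          for t in range(240)]

# Three-month-ahead forecast from lags 3, 4 and 12 (fixed here, not PMIS-selected).
X, y = make_lagged(series, lags=[3, 4, 12], horizon=3)
split = int(0.8 * len(X))  # chronological cut: earlier months train, later months test
model = bagging(X[:split], y[:split])
preds = [model(row) for row in X[split:]]
score = rmspe(y[split:], preds)
print(f"Test RMSPE: {score:.2f}%")
```

The split is deliberately chronological rather than shuffled, so no test month informs training. Additive Regression would differ from this sketch in one key respect: instead of averaging independently fitted bootstrap members, it fits each new member to the residuals left by the running sum of earlier members, typically with a shrinkage factor.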
Availability of Data
Data are available from the authors upon reasonable request.
Acknowledgements
The authors would like to express their gratitude to Dr. Rahim Barzegar, Dr. John Quilty, Prof. Jan Adamowski, and Dr. Homa Kheirollahpour for their support during the study.
Author information
Contributions
Elham Fijani: Writing – original draft, Conceptualization, Visualization, Writing – review & editing. Khabat Khosravi: Formal analysis, Software, Writing – review & editing.
Ethics declarations
Ethical Approval
Not applicable because this article does not contain any studies with human or animal subjects.
Consent to Participate
The authors declare that they are aware of and consent to their participation in this paper.
Consent to Publish
The authors declare that they consent to the publication of this paper.
Competing Interests
The authors declare that they have no known competing financial interests or personal relationships that could have influenced the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fijani, E., Khosravi, K. Hybrid Iterative and Tree-Based Machine Learning Algorithms for Lake Water Level Forecasting. Water Resour Manage 37, 5431–5457 (2023). https://doi.org/10.1007/s11269-023-03613-x
DOI: https://doi.org/10.1007/s11269-023-03613-x