Hyperparameter tuning of supervised bagging ensemble machine learning model using Bayesian optimization for estimating stormwater quality

Moeini, Mohammadreza

doi:10.1007/s40899-024-01064-9

Original Article
Published: 14 March 2024

Volume 10, article number 83 (2024)
Sustainable Water Resources Management

Mohammadreza Moeini ORCID: orcid.org/0000-0003-3844-8874¹

Abstract

Physically based models (PBMs), such as the Storm Water Management Model (SWMM), require large amounts of in situ data and expertise to predict water quality in urban watersheds. In recent years, data-driven models have increasingly been used as an alternative for predicting pollutant concentrations, and supervised machine learning (ML) models have been applied to estimate stormwater quality parameters. However, optimizing the structure (hyperparameters) of such ML models has rarely been considered. This study comprehensively evaluates the optimization of a supervised bagging ensemble ML model for forecasting stormwater quality using Bayesian optimization (BO), an ML-based optimization method. To that end, a bagging ensemble model, namely random forest (RF), was first developed to estimate total suspended solids (TSS) concentrations in urban watersheds. Eleven factors, including drainage area, land-use types, impervious area, rainfall depth, runoff volume, and antecedent dry days, were used as predictive features, with data acquired from the National Stormwater Quality Database (NSQD). Four RF hyperparameters were optimized using BO: the number of base estimators, the number of features selected for building each base estimator, the subsample size, and the maximum depth of the base learners. A sensitivity analysis was performed on both the ML model and the BO settings, including the acquisition function, the number of initial points, and the number of realizations. The results indicated that the accuracy of the RF model depends on all of these RF hyperparameters. The best-performing RF model performed satisfactorily in both the training and testing steps, achieving R² values of 0.955 and 0.915, respectively. The study demonstrates the potential of combining RF models with BO to accurately predict stormwater quality parameters.
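The RF-plus-BO tuning loop summarized in the abstract can be sketched as follows. This is a minimal illustration, not the author's code: it assumes scikit-learn's `RandomForestRegressor` as the RF implementation, maps the four tuned quantities to its `n_estimators`, `max_features`, `max_samples`, and `max_depth` parameters (an assumed correspondence), uses a Gaussian-process surrogate with an expected-improvement acquisition function, and substitutes synthetic regression data for the NSQD records. The search bounds and the helper names `to_unit`, `decode`, `objective`, `expected_improvement`, and `bayes_opt` are hypothetical.

```python
import numpy as np
from scipy.stats import norm
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Hypothetical search space; bounds are illustrative, not the paper's values.
# Columns: n_estimators, max_features, max_samples (fraction), max_depth.
LOW = np.array([10.0, 1.0, 0.3, 2.0])
HIGH = np.array([200.0, 11.0, 1.0, 20.0])


def to_unit(pts):
    """Rescale points to the unit cube so the GP sees comparable input ranges."""
    return (pts - LOW) / (HIGH - LOW)


def decode(x):
    """Map a continuous BO point to valid RandomForestRegressor hyperparameters."""
    return {
        "n_estimators": int(round(x[0])),
        "max_features": int(round(x[1])),
        "max_samples": float(x[2]),  # fraction of rows drawn per bootstrap sample
        "max_depth": int(round(x[3])),
    }


def objective(x, data):
    """Validation R^2 of a random forest built with the decoded hyperparameters."""
    X_tr, X_va, y_tr, y_va = data
    model = RandomForestRegressor(random_state=0, **decode(x))
    model.fit(X_tr, y_tr)
    return r2_score(y_va, model.predict(X_va))


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement over the best observed score (maximisation)."""
    sigma = np.maximum(sigma, 1e-9)  # avoid division by zero
    gain = mu - best - xi
    z = gain / sigma
    return gain * norm.cdf(z) + sigma * norm.pdf(z)


def bayes_opt(n_init=5, n_iter=10, n_candidates=2000, seed=0):
    """Maximise validation R^2 with a GP surrogate and EI acquisition."""
    rng = np.random.default_rng(seed)
    # Synthetic placeholder for the 11 predictors and the TSS target.
    X, y = make_regression(n_samples=200, n_features=11, noise=10.0, random_state=seed)
    data = train_test_split(X, y, test_size=0.25, random_state=seed)

    xs = rng.uniform(LOW, HIGH, size=(n_init, LOW.size))  # random initial points
    ys = np.array([objective(x, data) for x in xs])

    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True, random_state=seed)
    for _ in range(n_iter):
        gp.fit(to_unit(xs), ys)  # refit the surrogate on all observations so far
        cand = rng.uniform(LOW, HIGH, size=(n_candidates, LOW.size))
        mu, sd = gp.predict(to_unit(cand), return_std=True)
        x_next = cand[np.argmax(expected_improvement(mu, sd, ys.max()))]
        xs = np.vstack([xs, x_next])
        ys = np.append(ys, objective(x_next, data))

    best = int(np.argmax(ys))
    return decode(xs[best]), float(ys[best]), ys


if __name__ == "__main__":
    params, score, history = bayes_opt()
    print("best hyperparameters:", params)
    print(f"best validation R^2: {score:.3f} after {history.size} evaluations")
```

For brevity, the acquisition function is maximized over a random candidate pool rather than with a dedicated inner optimizer. Rerunning `bayes_opt` with different seeds loosely mirrors the "realizations" in the sensitivity analysis, and changing `n_init` or swapping `expected_improvement` for another acquisition function corresponds to the other BO settings mentioned above. A separate, untouched test split would still be needed to report test-step R² as the study does.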

Data availability

The datasets generated and/or analyzed during the current study are publicly available on the International Stormwater BMP database (https://bmpdatabase.org/).

Acknowledgements

The author would like to thank the National Science Foundation (NSF), the University of Illinois Chicago, the Department of Civil, Materials, and Environmental Engineering, and Dr. Ahmed Abokifa for their support during his ongoing Ph.D. studies. The author also expresses his sincere gratitude to Dr. Ahmed Abokifa for his advice on this study.

Author information

Authors and Affiliations

Department of Civil, Materials, and Environmental Engineering, University of Illinois at Chicago, Chicago, IL, 60607, USA
Mohammadreza Moeini

Contributions

Mohammadreza Moeini conceived the study, curated and reviewed the data, conducted the investigation, wrote the original draft, edited the manuscript, and supervised the work.

Corresponding author

Correspondence to Mohammadreza Moeini.

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Ethical approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Moeini, M. Hyperparameter tuning of supervised bagging ensemble machine learning model using Bayesian optimization for estimating stormwater quality. Sustain. Water Resour. Manag. 10, 83 (2024). https://doi.org/10.1007/s40899-024-01064-9

Received: 30 August 2023
Accepted: 01 February 2024
Published: 14 March 2024
DOI: https://doi.org/10.1007/s40899-024-01064-9