Abstract
Predicting turbidity (T), which represents the amount of fine sediment in water, is essential for effective water quality management. In this study, two ensemble learning models, XGBoost and light gradient boosting decision tree (LGB), were employed to predict T using discharge (Q) as the independent variable. The input data were classified into three groups based on the flow phase (rising limb, falling limb, and base flow), and datasets at different time frequencies (2, 8, and 24 h) were used to develop the models. In the first model set (Model 1), a separate model was trained for each phase, and its performance was tested by applying it to the Q of the corresponding phase. For comparison, a second model set (Model 2) was developed with XGBoost and LGB using the entire period without classification. The results demonstrated that Model 1, which used data classified into three phases, outperformed Model 2. Further analysis of the flood phase and of hysteresis in the relationship between Q and T showed that the different data distributions in the three phases accounted for the performance differences between the two model sets; because Model 1 takes these differences into account, it performed better than Model 2. The Shapley additive explanation (SHAP), a novel explainable artificial intelligence method, provided a reasonable interpretation of the difference in predictions between Models 1 and 2.
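The phase-based grouping behind Model 1 can be illustrated with a minimal sketch. The abstract does not state how the rising limb, falling limb, and base flow were identified, so the rule below (a base-flow threshold plus the sign of the change in Q) is an assumption made only for illustration. The function names, the threshold value, and the sample Q and T values are all hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple


def classify_flow_phase(q: List[float], base_threshold: float) -> List[str]:
    """Label each discharge sample as 'base', 'rising', or 'falling'.

    Assumed rule (not from the study): discharge at or below
    `base_threshold` is base flow; above it, the sign of the change
    from the previous sample decides the limb.
    """
    phases: List[str] = []
    for i, value in enumerate(q):
        if value <= base_threshold:
            phases.append("base")
            continue
        previous = q[i - 1] if i > 0 else value
        change = value - previous
        if change > 0:
            phases.append("rising")
        elif change < 0:
            phases.append("falling")
        else:
            # No change: keep the current limb, defaulting to 'rising'.
            keep = phases[-1] if phases and phases[-1] != "base" else "rising"
            phases.append(keep)
    return phases


def split_by_phase(
    q: List[float], t: List[float], phases: List[str]
) -> Dict[str, List[Tuple[float, float]]]:
    """Group (Q, T) pairs by phase so one model can be fitted per group."""
    groups: Dict[str, List[Tuple[float, float]]] = {
        "base": [],
        "rising": [],
        "falling": [],
    }
    for q_i, t_i, phase in zip(q, t, phases):
        groups[phase].append((q_i, t_i))
    return groups


# Made-up discharge and turbidity values for a single flood event.
q = [5.0, 5.2, 12.0, 30.0, 22.0, 14.0, 6.0, 5.1]
t = [3.0, 3.1, 40.0, 95.0, 35.0, 15.0, 4.0, 3.2]

phases = classify_flow_phase(q, base_threshold=6.0)
groups = split_by_phase(q, t, phases)
```

Under this assumed rule, each of the three groups would be passed to its own regressor (Model 1), whereas the Model 2 analogue would fit a single regressor to all (Q, T) pairs at once. In the made-up data, T is higher on the rising limb than on the falling limb at comparable Q, which is the kind of hysteresis that a single unclassified model cannot distinguish from Q alone.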
Data Availability
The data are presented in the manuscript itself; additional data will be made available on reasonable request.
Funding
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2022R1F1A1065518) (50%). This study [G21S302588802] was supported by the technology development project of the Ministry of SMEs in 2022 (50%).
Author information
Contributions
JP: conceptualization, modeling and data analysis, investigation, methodology, writing (original draft), writing (review and editing). WHL: conceptualization, investigation, writing (review and editing). IK: conceptualization, investigation, writing (review and editing). JCJ: conceptualization, methodology, writing (review and editing).
Ethics declarations
Conflict of interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Park, J., Joo, J.C., Kang, I. et al. The use of explainable artificial intelligence for interpreting the effect of flow phase and hysteresis on turbidity prediction. Environ Earth Sci 82, 375 (2023). https://doi.org/10.1007/s12665-023-11056-1
DOI: https://doi.org/10.1007/s12665-023-11056-1