Abstract
Rising water pollution caused by anthropogenic factors motivates further research into water quality prediction models. Available models have certain limitations owing to short data records and their inability to provide empirical expressions. This study aims to model surface water quality in the upper Indus River basin by applying machine learning techniques to a 30-year dataset, to derive empirical equations, and to identify the most reliable model for accurately predicting river water quality. Total dissolved solids (TDS) and electrical conductivity (EC) were used as dependent variables and eight parameters as independent variables, with 70% and 30% of the data used for model training and testing, respectively. Model performance was assessed with the Nash-Sutcliffe efficiency (NSE), root mean square error (RMSE), coefficient of determination (R2), and mean absolute error (MAE). The models were further validated by k-fold cross-validation using R2 and RMSE. All developed models showed a strong correlation, with both NSE and R2 above 0.85. Gene expression programming (GEP) outperformed the artificial neural network (ANN) and the linear and non-linear regression models for both TDS and EC. Sensitivity and parametric analyses revealed that bicarbonate is the parameter to which both the TDS and EC models are most sensitive. Two empirical equations were derived from the GEP model to help authorities monitor river water quality effectively.
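The evaluation criteria and validation scheme named in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. It assumes that R2 is the squared Pearson correlation between observed and predicted values (the abstract does not define it), that the 70/30 split is a random shuffle rather than a chronological split, and that k = 10 folds are used; all function names are hypothetical.

```python
import math
import random


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 - (sum of squared errors) / (sum of squared deviations of obs)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - s) ** 2 for o, s in zip(obs, sim))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst


def rmse(obs, sim):
    """Root mean square error, in the units of the target (e.g. mg/L for TDS)."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def r2(obs, sim):
    """Assumed definition: squared Pearson correlation of observed vs. predicted."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_s = sum((s - ms) ** 2 for s in sim)
    return cov * cov / (var_o * var_s)


def train_test_split(n, train_frac=0.7, seed=0):
    """Shuffle indices 0..n-1 and split them 70/30 (random split is an assumption)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n)
    return idx[:cut], idx[cut:]


def k_fold(n, k=10, seed=0):
    """Yield (train, test) index lists; every index lands in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


if __name__ == "__main__":
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    for name, fn in [("NSE", nse), ("RMSE", rmse), ("MAE", mae), ("R2", r2)]:
        print(name, round(fn(observed, predicted), 3))
    # NSE 0.98, RMSE 0.158, MAE 0.15, R2 0.982
```

Because the squared correlation ignores systematic offsets and scaling errors while NSE penalizes them, a model can score a high R2 but a lower NSE; reporting both, as the study does, helps expose such bias.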
Data Availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgments
The authors acknowledge the support of the Water and Power Development Authority (WAPDA), Pakistan, for providing the water quality data for the Indus River.
Funding
This research received no external funding.
Author information
Contributions
Conceptualization, data collection, and writing (original draft preparation): Muhammad Izhar Shah; data analysis, modeling, review, and editing: Muhammad Faisal Javed; validation, data curation, and manuscript revision: Taher Abunama. All authors approved the final manuscript.
Ethics declarations
Competing interests
The authors declare that they have no competing interests.
Ethical approval
Not applicable
Consent to participate
Not applicable
Consent to publish
Not applicable
Additional information
Responsible Editor: Marcus Schulz
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A. Expression tree diagrams
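GEP models are usually presented as expression trees that are then decoded into the final equations. As a hedged sketch of how that decoding works in Ferreira's Karva notation, the Python below reads a gene level by level (breadth-first) into a tree, prints it as an infix formula, and evaluates it. The function set, the symbol `Q` for square root, and the terminals `a`, `b`, `c` (standing in for input parameters) are illustrative assumptions; the gene shown is not one of the study's evolved models.

```python
import math

# Hypothetical function set: symbol -> arity. "Q" denotes square root.
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "Q": 1}


def decode_karva(kexpr):
    """Decode a Karva-notation gene into a tree of [symbol, children] nodes.

    Children are assigned breadth-first: each function node, taken in reading
    order, consumes the next `arity` unused symbols. Symbols never reached
    form the non-coding region of the gene.
    """
    nodes = [[sym, []] for sym in kexpr]
    next_free = 1  # index of the next symbol not yet attached to the tree
    i = 0
    while i < next_free:
        for _ in range(ARITY.get(nodes[i][0], 0)):
            nodes[i][1].append(nodes[next_free])
            next_free += 1
        i += 1
    return nodes[0]


def to_infix(node):
    """Render the expression tree as a fully parenthesized formula."""
    sym, children = node
    if not children:
        return sym
    if sym == "Q":
        return f"sqrt({to_infix(children[0])})"
    return f"({to_infix(children[0])} {sym} {to_infix(children[1])})"


def evaluate(node, env):
    """Evaluate the tree, looking terminal values up in `env`."""
    sym, children = node
    if not children:
        return env[sym]
    args = [evaluate(child, env) for child in children]
    if sym == "Q":
        return math.sqrt(args[0])
    ops = {
        "+": lambda x, y: x + y,
        "-": lambda x, y: x - y,
        "*": lambda x, y: x * y,
        "/": lambda x, y: x / y,
    }
    return ops[sym](args[0], args[1])


if __name__ == "__main__":
    tree = decode_karva("+*Qabca")
    print(to_infix(tree))  # ((a * b) + sqrt(c))
    print(evaluate(tree, {"a": 2.0, "b": 3.0, "c": 16.0}))  # 10.0
```

The gene `+*Qabca` has a head of length h = 3 and a tail of length t = h(n_max - 1) + 1 = 4, where n_max = 2 is the largest arity in the function set; the trailing `a` is a non-coding symbol that the decoder never reaches.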
About this article
Cite this article
Shah, M.I., Javed, M.F. & Abunama, T. Proposed formulation of surface water quality and modelling using gene expression, machine learning, and regression techniques. Environ Sci Pollut Res 28, 13202–13220 (2021). https://doi.org/10.1007/s11356-020-11490-9
Received:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11356-020-11490-9