Abstract
Monitoring river water is necessary to determine its quality and pollution level so that human health and the environment can be protected. The present study assessed the water quality of the Narmada River in India. Water samples were collected from 13 sites during the pre- and post-monsoon seasons and analyzed for a range of physicochemical parameters. The results were used to develop the entropy-river water quality index (ERWQI), which was then used to estimate the river's water quality for two different uses: drinking after disinfection (ERWQId) and bathing (ERWQIb). Five machine-learning classification models, namely Logistic Regression (LR), Support Vector (SV), K-Nearest Neighbor (KNN), Random Forest (RF), and Gradient Boosting (GB), were examined for predicting and classifying the ERWQI. Precision, recall, F1 score, and the confusion matrix were used to evaluate the performance of the models. The LR model was identified as the most accurate classifier, with the highest accuracy score for both the ERWQId and the ERWQIb. The study also found that the water of the Narmada River was unsuitable for drinking after disinfection alone and therefore requires treatment by conventional or advanced techniques before any further use. For bathing, however, the ERWQIb categorized the river water as excellent to fair. This study has broad implications for the classification of river water quality and can provide useful information to monitoring agencies and policymakers.
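The abstract does not give the ERWQI formula, so the following is only a minimal sketch of the generic Shannon-entropy weighting scheme that is commonly used for entropy-based water quality indices. It should not be read as the authors' method. The function names `entropy_weights` and `entropy_index`, the rating rule (100 × concentration / permissible limit), and the toy data are all assumptions made for illustration.

```python
import math


def entropy_weights(samples):
    """Return one weight per parameter from a sites-by-parameters table.

    Assumption: `samples` is a list of rows (one per sampling site), each
    holding strictly positive parameter values. This is the generic
    entropy-weight method, not necessarily the paper's exact procedure.
    """
    m = len(samples)       # number of sites
    n = len(samples[0])    # number of parameters
    diversities = []
    for j in range(n):
        column = [row[j] for row in samples]
        total = sum(column)
        probs = [value / total for value in column]
        # Shannon entropy of the column, scaled to [0, 1] by ln(m)
        e_j = -sum(p * math.log(p) for p in probs if p > 0) / math.log(m)
        # Parameters that vary more across sites carry more information
        diversities.append(max(0.0, 1.0 - e_j))
    norm = sum(diversities)
    if norm == 0:
        # Every parameter is constant across sites: fall back to equal weights
        return [1.0 / n] * n
    return [d / norm for d in diversities]


def entropy_index(sample, limits, weights):
    """Weighted sum of quality ratings for one sample.

    Assumption: the rating of each parameter is 100 * value / limit.
    """
    return sum(w * 100.0 * c / s for w, c, s in zip(weights, sample, limits))


# Hypothetical toy data: 3 sites, 2 parameters (not taken from the study)
toy_samples = [[1.0, 10.0], [1.0, 20.0], [1.0, 30.0]]
toy_weights = entropy_weights(toy_samples)
toy_index = entropy_index([1.0, 20.0], [1.0, 20.0], toy_weights)
```

In this toy table the first parameter is identical at every site, so almost all of the weight goes to the second parameter. A sample whose values equal the permissible limits scores 100 under the assumed rating rule.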
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Author information
Contributions
Conceptualization, methodology, data collection, writing original draft: D.G.; Supervision, validation, editing and critical review: V.K.M.
Ethics declarations
Competing interests
The authors declare no competing interest.
About this article
Cite this article
Gupta, D., Mishra, V.K. Development of entropy-river water quality index for predicting water quality classification through machine learning approach. Stoch Environ Res Risk Assess 37, 4249–4271 (2023). https://doi.org/10.1007/s00477-023-02506-0
DOI: https://doi.org/10.1007/s00477-023-02506-0