Abstract
This paper demonstrates the application of data mining techniques to predict the river water quality index. The usefulness of these techniques lies in the automated extraction of novel knowledge from data to improve decision-making. Six popular classification techniques, namely k-nearest neighbor, decision trees, Naive Bayes, artificial neural networks, rule-based classifiers and support vector machines, were used to develop a predictive environment that classifies water quality into understandable terms based on the Overall Index of Pollution. Experimentation was conducted on two types of data sets: synthetic and real. A repeated k-fold cross-validation procedure was followed to design the learning and testing frameworks of the predictive environment. Based on the validation results, the error rates in identifying the true water quality class on the synthetic and real data sets, respectively, were 20% and 28% for k-nearest neighbor, 11% and 24% for Naive Bayes, 1% and 38% for the artificial neural network, and 10% and 20% for the rule-based classifier. The decision tree and support vector machine classifiers were found to be the best predictive models, with 0% error rates during automated extraction of the water quality class. This study reveals that data mining techniques have the potential to quickly predict the water quality class, provided that the data given truly represent the domain knowledge.
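The repeated k-fold cross-validation procedure mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy data generator, the two features, the class labels and their thresholds, and the function names `make_synthetic`, `knn_predict` and `repeated_kfold_error` are all assumptions made for illustration, and a plain k-nearest neighbor vote stands in for the classifiers studied in the paper.

```python
import random
from collections import Counter


def make_synthetic(n, seed=0):
    """Toy data (hypothetical): two sub-index features in [0, 16].

    The class is assigned from the mean of the two features using
    illustrative thresholds, not the Overall Index of Pollution ranges.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.uniform(0, 16), rng.uniform(0, 16))
        mean = sum(x) / len(x)
        if mean < 4:
            label = "clean"
        elif mean < 10:
            label = "polluted"
        else:
            label = "heavily polluted"
        data.append((x, label))
    return data


def knn_predict(train, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def repeated_kfold_error(data, n_folds=5, n_repeats=3, k=3, seed=0):
    """Mean misclassification rate over n_repeats reshuffled k-fold splits."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_repeats):
        shuffled = data[:]
        rng.shuffle(shuffled)
        # Fold i takes every n_folds-th sample, starting at offset i.
        folds = [shuffled[i::n_folds] for i in range(n_folds)]
        for i, test_fold in enumerate(folds):
            # Train on all folds except the one held out for testing.
            train = [item for j, fold in enumerate(folds) if j != i
                     for item in fold]
            wrong = sum(knn_predict(train, x, k) != y for x, y in test_fold)
            errors.append(wrong / len(test_fold))
    return sum(errors) / len(errors)


if __name__ == "__main__":
    data = make_synthetic(200)
    print(f"mean error rate: {repeated_kfold_error(data):.2%}")
```

Each repeat reshuffles the data before splitting, so every sample is used for testing once per repeat, and the error rate is averaged over all folds and repeats. Error rates of the kind reported in the abstract are averages of this sort, although the exact settings used in the study are not stated here.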
Cite this article
Babbar, R., Babbar, S. Predicting river water quality index using data mining techniques. Environ Earth Sci 76, 504 (2017). https://doi.org/10.1007/s12665-017-6845-9
DOI: https://doi.org/10.1007/s12665-017-6845-9