Predicting river water quality index using data mining techniques

  • Original Article
Environmental Earth Sciences

Abstract

This paper demonstrates the application of data mining techniques to predict the river water quality index. The usefulness of these techniques lies in the automated extraction of novel knowledge from data to improve decision-making. Six popular classification techniques, namely k-nearest neighbor, decision trees, Naive Bayes, artificial neural networks, rule-based classifiers and support vector machines, were used to develop a predictive environment that classifies water quality into understandable terms based on the Overall Index of Pollution. Experiments were conducted on two types of data sets: synthetic and real. A repeated k-fold cross-validation procedure was followed to design the learning and testing frameworks of the predictive environment. Based on the validation results, the error rates in identifying the true water quality class on the synthetic and real data sets, respectively, were 20 and 28% for k-nearest neighbor, 11 and 24% for Naive Bayes, 1 and 38% for the artificial neural network and 10 and 20% for the rule-based classifier. The decision tree and support vector machine classifiers were found to be the best predictive models, with 0% error rates during automated extraction of the water quality class. This study reveals that data mining techniques have the potential to quickly predict the water quality class, provided the data given are a true representation of the domain knowledge.
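
To make the validation procedure concrete, the following is a minimal sketch of repeated k-fold cross-validation wrapped around a simple k-nearest neighbor classifier, written with the Python standard library only. It is not the authors' implementation: the function names, the toy two-feature data, the class labels ("clean", "moderate", "polluted") and all parameter values are assumptions made purely for illustration, and they do not correspond to the Overall Index of Pollution classes or to the data sets used in the study.

```python
import random
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], query)),
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def repeated_kfold_error(data, n_splits=5, n_repeats=3, k=3, seed=0):
    """Mean misclassification rate over n_repeats reshuffled k-fold splits."""
    rng = random.Random(seed)
    fold_errors = []
    for _ in range(n_repeats):
        rows = data[:]          # copy so the caller's list is not reordered
        rng.shuffle(rows)       # a fresh shuffle for every repetition
        folds = [rows[i::n_splits] for i in range(n_splits)]
        for i, test in enumerate(folds):
            # Train on every fold except the held-out one.
            train = [r for j, f in enumerate(folds) if j != i for r in f]
            wrong = sum(knn_predict(train, x, k) != y for x, y in test)
            fold_errors.append(wrong / len(test))
    return sum(fold_errors) / len(fold_errors)


def make_toy_data(n_per_class=30, seed=1):
    """Hypothetical two-feature samples for three made-up quality classes."""
    rng = random.Random(seed)
    centers = {"clean": (1.0, 1.0), "moderate": (5.0, 5.0), "polluted": (9.0, 9.0)}
    return [
        ((cx + rng.gauss(0, 0.5), cy + rng.gauss(0, 0.5)), label)
        for label, (cx, cy) in centers.items()
        for _ in range(n_per_class)
    ]


if __name__ == "__main__":
    data = make_toy_data()
    print(f"mean error rate: {repeated_kfold_error(data):.3f}")
```

Each repetition reshuffles the samples before splitting them into folds, so the reported error rate is averaged over several different partitions rather than depending on a single split. In practice a library routine such as scikit-learn's `RepeatedKFold` would likely replace the hand-written splitting shown here.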

Author information

Corresponding author

Correspondence to Sakshi Babbar.

About this article

Cite this article

Babbar, R., Babbar, S. Predicting river water quality index using data mining techniques. Environ Earth Sci 76, 504 (2017). https://doi.org/10.1007/s12665-017-6845-9

  • DOI: https://doi.org/10.1007/s12665-017-6845-9