Assessment the performance of classification methods in water quality studies, A case study in Karaj River

Environmental Monitoring and Assessment

Abstract

To evaluate the performance of classification methods in water quality studies, linear discriminant analysis and Naïve Bayesian classification were applied to data from nine sampling stations on the Karaj River, Iran, using four parameters (COD, nitrite, nitrate, and total coliforms) selected from ten water quality variables. The sampling stations were first separated into two groups by cluster analysis. Rural wastewater was the main source of pollution in the first group, whereas water quality in the second group was degraded mainly by organic and agricultural pollution. To provide an independent test set against which classifier performance could be judged, three validation schemes were used: twofold cross-validation, leave-one-out cross-validation, and the holdout method. For linear discriminant analysis, performance under validation matched that on the training data set except for the leave-one-out method, which gave a misclassification error of 11.1 %. Linear discriminant analysis therefore outperformed the Naïve Bayesian classifier. However, where the correlation among the parameters is low, the latter method can perform as well as linear discriminant analysis. A sensitivity analysis using all ten water quality variables (pH, COD, EC, TDA, turbidity, nitrate, nitrite, sulfate, TC, and FC) was carried out to identify the variables most important to the classification of the Karaj River; it showed that turbidity, together with COD, pH, nitrate, and sulfate, contributed most.
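
The leave-one-out comparison described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the nine (COD, nitrate) pairs are made-up values, only two of the four parameters are used so that the pooled covariance matrix can be inverted by hand, and all function names are hypothetical. With nine stations, each misclassified held-out station adds 1/9 (about 11.1 %) to the leave-one-out error, which is consistent with the figure reported above.

```python
import math

# Hypothetical (COD, nitrate) values for nine stations; NOT data from the study.
X = [(12.0, 1.1), (14.0, 1.6), (11.0, 1.3), (13.0, 0.9),               # group 0
     (25.0, 3.9), (28.0, 4.6), (24.0, 4.3), (27.0, 3.6), (26.0, 4.1)]  # group 1
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

def mean(values):
    return sum(values) / len(values)

def fit_nb(X, y):
    """Gaussian Naive Bayes: per-class prior plus per-feature mean and variance."""
    model = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        stats = []
        for col in zip(*rows):
            m = mean(col)
            stats.append((m, sum((v - m) ** 2 for v in col) / (len(col) - 1)))
        model[c] = (len(rows) / len(y), stats)
    return model

def predict_nb(model, x):
    def log_posterior(c):
        prior, stats = model[c]
        score = math.log(prior)
        for v, (m, var) in zip(x, stats):  # features treated as independent
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score
    return max(model, key=log_posterior)

def fit_lda(X, y):
    """Two-feature LDA: class means, priors, and inverse pooled covariance."""
    classes = sorted(set(y))
    means, priors = {}, {}
    sxx = sxy = syy = 0.0
    for c in classes:
        rows = [x for x, t in zip(X, y) if t == c]
        mx, my = mean([r[0] for r in rows]), mean([r[1] for r in rows])
        means[c], priors[c] = (mx, my), len(rows) / len(y)
        for a, b in rows:
            sxx += (a - mx) ** 2
            sxy += (a - mx) * (b - my)
            syy += (b - my) ** 2
    dof = len(y) - len(classes)
    sxx, sxy, syy = sxx / dof, sxy / dof, syy / dof
    det = sxx * syy - sxy ** 2
    inv = ((syy / det, -sxy / det), (-sxy / det, sxx / det))
    return means, priors, inv

def predict_lda(model, x):
    means, priors, inv = model
    def score(c):
        mx, my = means[c]
        w0 = inv[0][0] * mx + inv[0][1] * my  # w = inverse covariance times mean
        w1 = inv[1][0] * mx + inv[1][1] * my
        return x[0] * w0 + x[1] * w1 - 0.5 * (mx * w0 + my * w1) + math.log(priors[c])
    return max(means, key=score)

def loo_error(fit, predict, X, y):
    """Hold out each station once, train on the rest, count misclassifications."""
    errors = 0
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        if predict(model, X[i]) != y[i]:
            errors += 1
    return errors / len(X)

print("LDA leave-one-out error:", loo_error(fit_lda, predict_lda, X, y))
print("Naive Bayes leave-one-out error:", loo_error(fit_nb, predict_nb, X, y))
```

The two classifiers differ only in how they model covariance: the Naive Bayes sketch ignores correlation between features, while the LDA sketch uses a pooled covariance matrix shared by both groups. This is why the two methods would be expected to behave alike when the parameters are weakly correlated.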

Acknowledgments

Financial support for field sampling and laboratory analysis was provided by the Alborz Department of Environment. The author is grateful to Ms. Elahe Pourkarimi for her kind help during the execution of this project.

Author information

Corresponding author

Correspondence to Mohamad Sakizadeh.

Cite this article

Sakizadeh, M. Assessment the performance of classification methods in water quality studies, A case study in Karaj River. Environ Monit Assess 187, 573 (2015). https://doi.org/10.1007/s10661-015-4761-6

  • DOI: https://doi.org/10.1007/s10661-015-4761-6
