Abstract
Multivariate analysis (MV) and data mining (DM) techniques were applied to a small water quality dataset from surface waters at three water quality monitoring stations in the Petaquilla River Basin, Panama, covering the hydrological period 2008 through 2011, to assess and understand the ongoing environmental stress within the river basin. Principal component analysis and factor analysis (PCA/FA) indicated that the factors driving changes in water quality differed between the two seasons. During the dry (low-flow) season, water quality was strongly influenced by turbidity (NTU) and total suspended solids (TSS) concentrations. In contrast, during the wet (high-flow) season the main changes in water quality were characterized by an inverse relation of NTU and TSS with electrical conductivity (EC) and chlorides (CL), followed by significant sources of agricultural pollution. To complement the MV analysis, DM techniques such as cluster analysis (CA) and classification (CLA) were applied to the data. Cluster analysis was used to separate the stations by pollution level, and the C5.0 algorithm was used to classify stations of unknown origin into one of several known groups of water quality constituents. The study demonstrated that the major water pollution threats to the Petaquilla River Basin are industrial and urban development, together with agricultural and grazing land uses, which are defined as non-point sources. Given the limited data, these methodologies are regarded as useful in helping water managers implement water monitoring campaigns and set priorities for improving and protecting water quality sources impaired by land disturbances from anthropogenic activities.
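To make the wet-season finding concrete, the following is a minimal sketch of how an inverse relation between NTU/TSS and EC/CL can surface on the first principal component of a correlation matrix. It is not the authors' implementation: the sample values are invented for illustration, the function names are hypothetical, and power iteration is used here only to keep the example dependency-free.

```python
import math

# Hypothetical wet-season samples (invented values, NOT data from the study).
# Columns: turbidity (NTU), TSS (mg/L), EC (uS/cm), chlorides (mg/L).
NAMES = ["NTU", "TSS", "EC", "CL"]
SAMPLES = [
    [12.0, 18.0, 95.0, 6.1],
    [35.0, 52.0, 70.0, 4.0],
    [8.0, 11.0, 102.0, 6.8],
    [60.0, 88.0, 55.0, 3.1],
    [22.0, 30.0, 84.0, 5.2],
    [47.0, 66.0, 61.0, 3.6],
]


def standardize(rows):
    """Return the columns as z-scores (mean 0, sample standard deviation 1)."""
    n = len(rows)
    zcols = []
    for col in zip(*rows):
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (n - 1))
        zcols.append([(x - mean) / sd for x in col])
    return zcols


def correlation_matrix(zcols):
    """Pearson correlation matrix computed from z-scored columns."""
    n = len(zcols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in zcols]
            for ci in zcols]


def first_component(corr, iterations=200):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    p = len(corr)
    v = [1.0] + [0.0] * (p - 1)  # start on the first variable's axis
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue for the unit vector v.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(p))
                     for i in range(p))
    return v, eigenvalue


corr = correlation_matrix(standardize(SAMPLES))
weights, eigenvalue = first_component(corr)
explained = eigenvalue / len(NAMES)  # trace of a correlation matrix equals p

for name, weight in zip(NAMES, weights):
    print(f"{name}: {weight:+.2f}")
print(f"share of variance on the first component: {explained:.2f}")
```

With data shaped like this, NTU and TSS receive weights of one sign and EC and CL the opposite sign, which is the pattern the abstract describes as an inverse relation. A full PCA/FA would also extract further components and apply a rotation, which this sketch omits.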
Acknowledgement
The authors would like to express their appreciation to the Environmental Department of Minera Panama S.A. for providing the necessary data. This work has been partially supported by the Spanish MICINN under projects TRA2015-63708-R and TRA2016-78886-C3-1-R.
Copyright information
© 2018 Springer International Publishing AG
About this paper
Cite this paper
Simmonds, J., Gómez, J.A., Ledezma, A. (2018). Knowledge Inference from a Small Water Quality Dataset with Multivariate Statistics and Data-Mining. In: Angelov, P., Iglesias, J., Corrales, J. (eds) Advances in Information and Communication Technologies for Adapting Agriculture to Climate Change. AACC'17 2017. Advances in Intelligent Systems and Computing, vol 687. Springer, Cham. https://doi.org/10.1007/978-3-319-70187-5_1
DOI: https://doi.org/10.1007/978-3-319-70187-5_1
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-70186-8
Online ISBN: 978-3-319-70187-5
eBook Packages: Engineering, Engineering (R0)