
On the use of multivariate statistical methods for combining in-stream monitoring data and spatial analysis to characterize water quality conditions in the White River Basin, Indiana, USA

Environmental Monitoring and Assessment

Abstract

Mechanistic hydrologic and water quality models provide useful alternatives for estimating water quality in unmonitored streams. However, developing these elaborate models for large watersheds can be time-consuming and expensive, and calibration becomes difficult when monitored in-stream water quality data are limited in space and/or time. The main objective of this research was to investigate different approaches for developing multivariate analysis models as alternative methods for rapidly assessing relationships between spatio-temporal physical attributes of the watershed and water quality conditions in monitored streams, and then using those relationships to estimate water quality conditions in unmonitored streams. The study compares the use of various statistical estimates (mean, geometric mean, trimmed mean, and median) of monitored water quality variables to represent annual and seasonal water quality conditions. The relationship between these estimates and the spatial data is then modeled via linear and non-linear multivariate methods. Overall, the non-linear classification techniques outperformed the linear techniques, with an average cross-validation accuracy of 79.7%. Additionally, the geometric-mean-based models outperformed models based on the other statistical indicators, with an average cross-validation accuracy of 80.2%. Dividing the data into annual and quarterly datasets also offered important insights into the behavior of certain water quality variables affected by seasonal variations. The research provides useful guidance on the use and interpretation of the various statistical estimates and statistical models for multivariate water quality analyses.
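The four statistical estimates compared in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code: the sample values are hypothetical, the 10% trim fraction is an assumption (the abstract does not state one), and the function names `trimmed_mean` and `summarize` are introduced here only for illustration.

```python
import statistics


def trimmed_mean(values, proportion=0.1):
    """Mean after dropping `proportion` of the values from each tail.

    The 10% default is an assumption for illustration only.
    """
    if not 0 <= proportion < 0.5:
        raise ValueError("proportion must be in [0, 0.5)")
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each end
    return statistics.fmean(ordered[k:len(ordered) - k])


def summarize(values, trim=0.1):
    """Return the four central-tendency estimates named in the abstract."""
    return {
        "mean": statistics.fmean(values),
        # geometric_mean requires strictly positive values
        "geometric_mean": statistics.geometric_mean(values),
        "trimmed_mean": trimmed_mean(values, trim),
        "median": statistics.median(values),
    }


# Hypothetical concentrations (e.g. mg/L) with one extreme high value.
samples = [1.0, 2.0, 2.0, 4.0, 4.0, 8.0, 8.0, 16.0, 16.0, 1000.0]
summary = summarize(samples)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In this made-up sample the single extreme value pulls the arithmetic mean far above the other three estimates, which is the kind of sensitivity that can matter when choosing one number to represent a year or a season of skewed water quality measurements.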

Author information

Corresponding author

Correspondence to Meghna Babbar-Sebens.

About this article

Cite this article

Gamble, A., Babbar-Sebens, M. On the use of multivariate statistical methods for combining in-stream monitoring data and spatial analysis to characterize water quality conditions in the White River Basin, Indiana, USA. Environ Monit Assess 184, 845–875 (2012). https://doi.org/10.1007/s10661-011-2005-y

  • DOI: https://doi.org/10.1007/s10661-011-2005-y
