A practitioner’s guide for exploring water quality patterns using principal components analysis and Procrustes

Environmental Monitoring and Assessment

Abstract

To design sustainable water quality monitoring programs, practitioners must choose meaningful variables, justify the temporal and spatial extent of measurements, and demonstrate that program objectives are successfully achieved after implementation. Consequently, data must be analyzed across several variables and often from multiple sites and seasons. Multivariate techniques such as ordination are common throughout the water quality literature, but methods vary widely and could benefit from greater standardization. We have found little clear guidance and open source code for efficiently conducting ordination to explore water quality patterns. Practitioners unfamiliar with techniques such as principal components analysis (PCA) are faced with a steep learning curve to summarize expansive data sets in periodic reports and manuscripts. Here, we present a seven-step framework for conducting PCA and associated tests. The last step is dedicated to conducting Procrustes analysis, a valuable but rarely used test within the water quality field that describes the degree of concordance between separate multivariate data matrices and provides residual values for similar points across each matrix. We illustrate the utility of these tools using three increasingly complex water quality case studies in US parklands. The case studies demonstrate how PCA and Procrustes analysis answer common applied monitoring questions such as (1) do data from separate monitoring locations describe similar water quality regimes, and (2) what time periods exhibit the greatest water quality regime variability? We provide data sets and annotated R code for recreating case study results and as a base for crafting new code for similar monitoring applications.
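As a rough illustration of what Procrustes analysis reports, the sketch below superimposes two sets of two-dimensional ordination scores (for example, the first two PCA axes from two monitoring sites sampled on the same dates) and returns a concordance statistic plus one residual per paired point. It is a minimal Python sketch, not the authors' annotated R code from the supplementary material. The function names, the restriction to two retained axes, and the example coordinates are all assumptions made for illustration, and the permutation test normally used to judge significance is omitted.

```python
import math


def center_and_scale(points):
    """Center 2-D points on their centroid and scale to unit sum of squares."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    centered = [(p[0] - mean_x, p[1] - mean_y) for p in points]
    norm = math.sqrt(sum(x * x + y * y for x, y in centered))
    if norm == 0.0:
        raise ValueError("all points are identical; nothing to superimpose")
    return [(x / norm, y / norm) for x, y in centered]


def procrustes_2d(target, other):
    """Fit `other` onto `target` by translation, scaling, and rotation/reflection.

    Returns (m2, residuals): m2 is the Procrustes sum of squares in [0, 1]
    (0 means perfect concordance) and residuals holds one distance per pair.
    Hypothetical helper for illustration; handles two axes only.
    """
    if len(target) != len(other):
        raise ValueError("both configurations need the same number of points")
    X = center_and_scale(target)
    Y = center_and_scale(other)

    # Entries of the 2x2 cross-product matrix M = X^T Y.
    m11 = sum(x[0] * y[0] for x, y in zip(X, Y))
    m12 = sum(x[0] * y[1] for x, y in zip(X, Y))
    m21 = sum(x[1] * y[0] for x, y in zip(X, Y))
    m22 = sum(x[1] * y[1] for x, y in zip(X, Y))

    # Best achievable trace(M R) for a pure rotation and for a reflection.
    rotation_fit = math.hypot(m11 + m22, m12 - m21)
    reflection_fit = math.hypot(m11 - m22, m12 + m21)

    if rotation_fit >= reflection_fit:
        scale = rotation_fit
        theta = math.atan2(m12 - m21, m11 + m22)
        c, s = math.cos(theta), math.sin(theta)
        fitted = [(scale * (y0 * c + y1 * s), scale * (-y0 * s + y1 * c))
                  for y0, y1 in Y]
    else:
        scale = reflection_fit
        theta = math.atan2(m12 + m21, m11 - m22)
        c, s = math.cos(theta), math.sin(theta)
        fitted = [(scale * (y0 * c + y1 * s), scale * (y0 * s - y1 * c))
                  for y0, y1 in Y]

    # Per-point residuals: distance between each target point and its fitted match.
    residuals = [math.hypot(x0 - f0, x1 - f1)
                 for (x0, x1), (f0, f1) in zip(X, fitted)]
    m2 = max(0.0, 1.0 - scale * scale)  # clamp tiny negative rounding error
    return m2, residuals


# Made-up scores: site_b is site_a rotated 90 degrees, tripled, and shifted,
# so the two configurations should match almost exactly.
site_a = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
site_b = [(5.0, -2.0), (5.0, 1.0), (2.0, 1.0), (-1.0, -2.0)]
m2_same, residuals_same = procrustes_2d(site_a, site_b)

# A differently shaped configuration gives a larger m2 and uneven residuals.
site_c = [(0.0, 0.0), (2.0, 0.0), (1.0, 3.0), (0.0, 1.0)]
m2_diff, residuals_diff = procrustes_2d(site_a, site_c)
```

Large residuals flag the paired observations (for instance, sampling dates) where the two configurations disagree most, which is the kind of information the abstract describes for comparing water quality regimes among sites or time periods.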

Acknowledgments

This article was conceived during a workshop funded by the National Park Service Inventory and Monitoring Program. A. Larsen contributed initial article ideas. K. Sherrill and J. Best, Jr. provided valuable preliminary reviews of the manuscript. P. Lisi shared R code to create color ramp biplots for Fig. 5. The views expressed in this article do not necessarily represent the views of the US National Park Service. Any use of trade, firm, or product names is for descriptive purposes only and does not imply endorsement by the US Government.

Author information

Corresponding author

Correspondence to C. J. Sergeant.

Electronic supplementary material

The electronic supplementary material is listed below.

Online Resource 1

(1_CheckNorm_RCode.R): R script assessing data normality (“check.norm” function) (TXT 3 kb)

Online Resource 2

(2_BrokenStick.xlsx): Spreadsheet calculating the broken-stick model (XLSX 15 kb)

Online Resource 3

(3_CaseStudyA.xlsx): Case study A data (Northern Great Plains Network) (XLSX 10 kb)

Online Resource 4

(4_CaseStudyA_RCode.R): Case study A. R script with detailed annotations (TXT 3 kb)

Online Resource 5

(5_CaseStudyB.xlsx): Case study B data (Upper Columbia Basin Network) (XLSX 12 kb)

Online Resource 6

(6_CaseStudyB_RCode.R): Case study B. R script with detailed annotations (TXT 9 kb)

Online Resource 7

(7_CaseStudyC.xlsx): Case study C data (Southwest Alaska Network) (XLSX 13 kb)

Online Resource 8

(8_CaseStudyC_Env.xlsx): Case study C environmental variables data matrix (XLSX 12 kb)

Online Resource 9

(9_CaseStudyC_RCode.R): Case study C. R script with detailed annotations (TXT 18 kb)

Cite this article

Sergeant, C.J., Starkey, E.N., Bartz, K.K. et al. A practitioner’s guide for exploring water quality patterns using principal components analysis and Procrustes. Environ Monit Assess 188, 249 (2016). https://doi.org/10.1007/s10661-016-5253-z

  • DOI: https://doi.org/10.1007/s10661-016-5253-z
