
Correlation Analysis for Compositional Data

Mathematical Geosciences

Abstract

Compositional data need special treatment prior to correlation analysis. In this paper we argue why standard transformations for compositional data are not suitable for computing correlations, and why the use of raw or log-transformed data is not meaningful either. As a solution, a procedure based on balances is outlined, leading to sensible correlation measures. The construction of the balances is demonstrated using a real data example from geochemistry. It is shown that the considered correlation measures are invariant with respect to the choice of the binary partitions forming the balances. Robust counterparts to the classical, non-robust correlation measures are introduced and applied. By using appropriate graphical representations, it is shown how the resulting correlation coefficients can be interpreted.
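
As a rough illustration of what a balance is, the sketch below uses the usual definition from the compositional data literature: a normalized log-ratio of the geometric means of two groups of parts. It is not the specific construction developed in the paper, which the abstract does not spell out. The function names (`geometric_mean`, `balance`, `pearson`) and the five sample compositions are hypothetical and chosen only for demonstration.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(composition, group_r, group_s):
    """Balance between two disjoint groups of parts (given as index lists).

    Uses the usual definition sqrt(r*s/(r+s)) * ln(g(x_R) / g(x_S)),
    where g is the geometric mean and r, s are the group sizes.
    """
    r, s = len(group_r), len(group_s)
    g_r = geometric_mean([composition[i] for i in group_r])
    g_s = geometric_mean([composition[i] for i in group_s])
    return math.sqrt(r * s / (r + s)) * math.log(g_r / g_s)


def pearson(u, v):
    """Classical (non-robust) Pearson correlation of two equal-length lists."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


# Made-up 4-part compositions (in percent); illustrative only, not real data.
samples = [
    [10, 20, 30, 40],
    [15, 25, 20, 40],
    [5, 30, 35, 30],
    [20, 20, 25, 35],
    [12, 18, 40, 30],
]

# One balance per observation: part 0 vs. part 1, and part 2 vs. part 3.
b1 = [balance(x, [0], [1]) for x in samples]
b2 = [balance(x, [2], [3]) for x in samples]
print(pearson(b1, b2))
```

Because a balance depends only on ratios of parts, rescaling a composition (for example from percent to mg/kg) leaves it unchanged, which is one reason correlations computed on balances avoid the closure effects that distort correlations of raw parts.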

Author information

Corresponding author

Correspondence to Peter Filzmoser.

About this article

Cite this article

Filzmoser, P., Hron, K. Correlation Analysis for Compositional Data. Math Geosci 41, 905–919 (2009). https://doi.org/10.1007/s11004-008-9196-y

  • DOI: https://doi.org/10.1007/s11004-008-9196-y
