Mathematical Geosciences, Volume 50, Issue 3, pp 273–298

Advances in Principal Balances for Compositional Data

  • J. A. Martín-Fernández
  • V. Pawlowsky-Glahn
  • J. J. Egozcue
  • R. Tolosana-Delgado
Article

Abstract

Compositional data analysis requires selecting an orthonormal basis with which to work on coordinates. In most cases this selection is based on a data-driven criterion. Principal component analysis provides bases that are, in general, functions of all the original parts, each with a different weight, which hinders their interpretation. For interpretative purposes, it would be better to have each basis component as a ratio or balance of the geometric means of two groups of parts, leaving irrelevant parts with a zero weight. This is the role of principal balances, defined as a sequence of orthonormal balances which successively maximize the explained variance in a data set. The new algorithm to compute principal balances requires an exhaustive search over all the possible sets of orthonormal balances. To reduce computational time, the sets of possible partitions for up to 15 parts are stored. Two other suboptimal, but feasible, algorithms are also introduced: (i) a new search for balances following a constrained principal component approach and (ii) the hierarchical cluster analysis of variables. The latter is a new approach based on the relation between the variation matrix and the Aitchison distance. The properties and performance of these three algorithms are illustrated using a typical data set of geochemical compositions and a simulation exercise.
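
To make the notion of a balance concrete, the following is a minimal sketch in plain Python and is not the authors' implementation. It assumes the standard isometric logratio form of a balance between a group of r parts and a group of s parts, sqrt(rs/(r+s)) times the log of the ratio of their geometric means, and it searches exhaustively only for the first balance (the single balance of largest variance), not over full sets of orthonormal balances as the abstract describes. The function names and the toy data are hypothetical.

```python
import math
from itertools import product
from statistics import variance


def balance(x, num, den):
    """Balance of one composition x between the parts indexed by num and den."""
    r, s = len(num), len(den)
    # The log of a geometric mean is the mean of the logs.
    log_gm_num = sum(math.log(x[i]) for i in num) / r
    log_gm_den = sum(math.log(x[i]) for i in den) / s
    return math.sqrt(r * s / (r + s)) * (log_gm_num - log_gm_den)


def balance_variance(data, num, den):
    """Sample variance of a balance over all compositions in data."""
    return variance([balance(x, num, den) for x in data])


def total_variance(data):
    """Total variance, computed as the sum of variances of the clr coordinates."""
    n_parts = len(data[0])
    clr_rows = []
    for x in data:
        logs = [math.log(v) for v in x]
        mean_log = sum(logs) / n_parts
        clr_rows.append([v - mean_log for v in logs])
    return sum(variance([row[j] for row in clr_rows]) for j in range(n_parts))


def first_balance_exhaustive(data):
    """Brute-force search for the balance of largest variance.

    Each part is assigned +1 (numerator), -1 (denominator) or 0 (left out),
    so the cost grows as 3**D; this is only practical for a few parts.
    """
    n_parts = len(data[0])
    best = None
    for signs in product((1, -1, 0), repeat=n_parts):
        num = [i for i, sgn in enumerate(signs) if sgn == 1]
        den = [i for i, sgn in enumerate(signs) if sgn == -1]
        if not num or not den:
            continue  # a balance needs two non-empty groups
        if num[0] > den[0]:
            continue  # skip the mirrored copy (same balance, opposite sign)
        var = balance_variance(data, num, den)
        if best is None or var > best[0]:
            best = (var, num, den)
    return best


# Hypothetical toy data: parts 0 and 1 keep a constant ratio, part 2 varies.
toy = [[1, 2, 1], [1, 2, 4], [1, 2, 9], [1, 2, 0.5]]
var, num, den = first_balance_exhaustive(toy)
print(num, den, var / total_variance(toy))
```

On this toy data the search groups parts 0 and 1 against part 2, and that single balance accounts for the whole total variance, because the ratio between parts 0 and 1 never changes. Parts given a zero sign are simply left out of the balance, which is the zero-weight property the abstract mentions.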

Keywords

Aitchison norm · Cluster analysis · Compositions · Isometric logratio coordinates · Principal component analysis · Simplex

Notes

Acknowledgements

This research has been supported by the Spanish Ministry of Economy and Competitiveness under the project CODA-RETOS (Ref: MTM2015-65016-C2-1(2)-R); and by the Agència de Gestió d’Ajuts Universitaris i de Recerca of the Generalitat de Catalunya under the project COSDA (Ref: 2014SGR551). The authors gratefully acknowledge the constructive comments of the anonymous referees which have undoubtedly helped to significantly improve the quality of the paper.

Copyright information

© International Association for Mathematical Geosciences 2017

Authors and Affiliations

  1. Dept. Informàtica, Matemàtica Aplicada, i Estadística, Universitat de Girona, Girona, Spain
  2. Dept. d’Enginyeria Civil i Ambiental, U. Politècnica de Catalunya, Barcelona, Spain
  3. Dept. Modelling and Evaluation, Helmholtz-Zentrum Dresden-Rossendorf, Helmholtz-Institut Freiberg for Resource Technology, Freiberg, Germany
