Advances in Principal Balances for Compositional Data
Compositional data analysis requires selecting an orthonormal basis in which to work on coordinates. In most cases this selection is based on a data-driven criterion. Principal component analysis provides bases that are, in general, functions of all the original parts, each with a different weight, which hinders their interpretation. For interpretative purposes, it would be better for each basis component to be a ratio, or balance, of the geometric means of two groups of parts, with irrelevant parts receiving zero weight. This is the role of principal balances, defined as a sequence of orthonormal balances that successively maximize the explained variance in a data set. The new algorithm for computing principal balances requires an exhaustive search over all possible sets of orthonormal balances. To reduce computational time, the sets of possible partitions for up to 15 parts are stored. Two other algorithms, suboptimal but feasible, are also introduced: (i) a new search for balances following a constrained principal component approach and (ii) hierarchical cluster analysis of the variables. The latter is a new approach based on the relation between the variation matrix and the Aitchison distance. The properties and performance of these three algorithms are illustrated using a typical data set of geochemical compositions and a simulation exercise.
Keywords: Aitchison norm; Cluster analysis; Compositions; Isometric logratio coordinates; Principal component analysis; Simplex
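The idea of a balance and of an exhaustive search can be illustrated with a minimal sketch. It assumes the standard balance between a group R of r parts and a group S of s parts, b = sqrt(rs/(r+s)) ln(g(x_R)/g(x_S)), where g is the geometric mean. It also uses the identity var(b) = -1/2 a'Ta, which holds for any zero-sum log-contrast with coefficient vector a, where T is the variation matrix with entries t_ij = var(ln(x_i/x_j)). The sketch covers only the first step: it tries every split of all D parts into two non-empty groups (2^(D-1) - 1 candidates, so the cost grows exponentially with D) and keeps the balance with the largest variance. It is not the authors' full algorithm, which searches over whole sets of orthonormal balances, and it does not implement the constrained-PCA or clustering variants. All function names are hypothetical, and part indices are 0-based.

```python
import math
from itertools import combinations


def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def balance(x, num, den):
    """Balance of composition x between parts `num` (R) and `den` (S)."""
    r, s = len(num), len(den)
    g_num = geometric_mean([x[i] for i in num])
    g_den = geometric_mean([x[i] for i in den])
    return math.sqrt(r * s / (r + s)) * math.log(g_num / g_den)


def variance(values):
    """Sample variance with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def variation_matrix(data):
    """T[i][j] = var(ln(x_i / x_j)) over the rows of `data`."""
    D = len(data[0])
    return [[variance([math.log(row[i] / row[j]) for row in data])
             for j in range(D)] for i in range(D)]


def total_variance(T):
    """Total variance = 1/(2D) * sum over all i, j of t_ij."""
    D = len(T)
    return sum(sum(row) for row in T) / (2 * D)


def balance_variance(T, num, den):
    """Variance of a balance from T alone, using var(a'ln x) = -1/2 a'Ta."""
    r, s = len(num), len(den)
    k = math.sqrt(r * s / (r + s))
    # Log-contrast coefficients: positive on R, negative on S, summing to zero.
    a = {i: k / r for i in num}
    a.update({j: -k / s for j in den})
    return -0.5 * sum(a[i] * a[j] * T[i][j] for i in a for j in a)


def first_principal_balance(data):
    """Exhaustive search over all bipartitions of the D parts.

    Returns (variance, num, den, total_variance) for the split whose
    balance has the largest variance.
    """
    T = variation_matrix(data)
    D = len(T)
    best = None
    # Part 0 is always placed in the numerator: swapping R and S only flips
    # the sign of the balance, so each split is visited exactly once.
    for size in range(1, D):
        for rest in combinations(range(1, D), size - 1):
            num = (0,) + rest
            den = tuple(i for i in range(D) if i not in num)
            v = balance_variance(T, num, den)
            if best is None or v > best[0]:
                best = (v, num, den)
    return best[0], best[1], best[2], total_variance(T)
```

Dividing the returned variance by the total variance gives the proportion of variance explained by that balance. Working from the variation matrix rather than recomputing each balance over the data keeps each candidate's cost independent of the number of observations, although the number of candidates still doubles with every added part.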
This research has been supported by the Spanish Ministry of Economy and Competitiveness under the project CODA-RETOS (Ref: MTM2015-65016-C2-1(2)-R), and by the Agència de Gestió d’Ajuts Universitaris i de Recerca of the Generalitat de Catalunya under the project COSDA (Ref: 2014SGR551). The authors gratefully acknowledge the constructive comments of the anonymous referees, which have undoubtedly helped to improve the quality of the paper significantly.