Abstract
Data describing amounts of components of specimens are compositional if the size of each specimen is constant or irrelevant. Ideally, compositional data are given as relative portions summing to 1 or 100 %. More often, however, compositional data appear disguised in several ways: different components might be reported in different physical units, different cases might sum to different totals, and the relevant components are almost never all reported. Nevertheless, the constant-sum constraint and the relative meaning of the portions have important implications for their statistical analysis: they contradict the typical assumptions of the usual uni- and multivariate statistical methods and thus render the direct application of those methods spurious. A comprehensive statistical methodology, based on a vector space structure of the mathematical simplex, has only been developed very recently, and several software packages are now available to treat compositional data within it. This book is at the same time a textbook on compositional data analysis from a modern perspective and a sort of manual for the R package “compositions”; both R and “compositions” are available for download as free software. This chapter discusses the need for a dedicated statistical methodology for compositions, the historical background of compositional data analysis, and the software needs of compositional data analysis.
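The idea that specimen size is irrelevant can be illustrated with a minimal sketch, written here in plain Python rather than R. The helper name `closure` and all numbers are assumptions made up for illustration; this is not the API of the “compositions” package. The sketch rescales amounts to a constant sum and shows that the same hypothetical specimen, reported with different totals, yields the same relative portions, and that dropping an unreported component changes the portions but not the ratios between the remaining ones.

```python
from math import isclose


def closure(parts, total=1.0):
    """Rescale positive amounts so that they sum to `total` (hypothetical helper)."""
    s = sum(parts)
    if s <= 0:
        raise ValueError("amounts must have a positive sum")
    return [total * p / s for p in parts]


# One made-up specimen reported in three different ways.
in_grams = [12.0, 30.0, 18.0]       # absolute amounts, total 60
in_percent = [20.0, 50.0, 30.0]     # already scaled to 100 %
double_sample = [24.0, 60.0, 36.0]  # a specimen twice as large

closed = [closure(x) for x in (in_grams, in_percent, double_sample)]
for row in closed:
    print([round(v, 3) for v in row])

# Leaving out the third component and closing again changes the portions,
# but the ratio between the first two components is unchanged.
full = closure(in_grams)
sub = closure(in_grams[:2])
print(full[0] / full[1], sub[0] / sub[1])
```

Because only ratios between components survive the rescaling, any analysis that depends on the absolute values or on the arbitrary total is answering a question the data cannot pose.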
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
van den Boogaart, K.G., Tolosana-Delgado, R. (2013). Introduction. In: Analyzing Compositional Data with R. Use R!. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36809-7_1
DOI: https://doi.org/10.1007/978-3-642-36809-7_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-36808-0
Online ISBN: 978-3-642-36809-7
eBook Packages: Mathematics and Statistics (R0)