Introduction

Analyzing Compositional Data with R

Part of the book series: Use R! (USE R)

Abstract

Data describing the amounts of the components of specimens are compositional if the size of each specimen is constant or irrelevant. Ideally, compositional data are given as relative portions summing to 1 or 100 %. More often, however, compositional data appear in disguise: different components may be reported in different physical units, different cases may sum to different totals, and the relevant components are almost never all reported. Nevertheless, the constant-sum constraint and the purely relative meaning of the portions have important implications for statistical analysis. They contradict the typical assumptions of the usual univariate and multivariate statistical methods, so that applying those methods directly yields spurious results. A comprehensive statistical methodology, based on a vector space structure of the mathematical simplex, has been developed only very recently, and several software packages are now available to treat compositional data within it. This book is both a textbook on compositional data analysis from a modern perspective and a manual of sorts for the R package “compositions”; both R and “compositions” are available for download as free software. This chapter discusses why compositions need a statistical methodology of their own, the historical background of compositional data analysis, and the software requirements of compositional data analysis.
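
The effect of the constant-sum constraint can be illustrated with a small simulation. The sketch below is not taken from the book and does not use the “compositions” package; it is written in plain Python, and the helper names `closure` and `pearson`, the uniform distribution, and the sample size are assumptions chosen only for illustration. It draws three independent positive amounts per specimen, rescales each specimen to sum to 1, and compares the correlation between the first two components before and after rescaling.

```python
import random


def closure(amounts):
    """Rescale positive amounts so that they sum to 1 (hypothetical helper)."""
    total = sum(amounts)
    return [a / total for a in amounts]


def pearson(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Illustrative data: 5000 specimens with three independent components each.
rng = random.Random(42)
raw = [[rng.uniform(1.0, 10.0) for _ in range(3)] for _ in range(5000)]

# Closing every specimen to a unit total makes the data compositional.
closed = [closure(row) for row in raw]

r_raw = pearson([row[0] for row in raw], [row[1] for row in raw])
r_closed = pearson([row[0] for row in closed], [row[1] for row in closed])

print(f"correlation of raw amounts:     {r_raw:+.3f}")
print(f"correlation of closed portions: {r_closed:+.3f}")
```

The raw amounts are independent, so their sample correlation is near zero. After closure the three portions must sum to 1, which forces negative dependence between them; for three exchangeable components the theoretical correlation between any two portions is -1/2. This is the kind of spurious result the abstract refers to when standard methods are applied directly to closed data.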


Notes

  1. http://www.stat.boogaart.de/compositionsRBook.

  2. http://www.cran.r-project.org.

  3. http://www.stat.boogaart.de/compositionsRBook.



Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

van den Boogaart, K.G., Tolosana-Delgado, R. (2013). Introduction. In: Analyzing Compositional Data with R. Use R!. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-36809-7_1
