Compositional Data Analysis: Where Are We and Where Should We Be Heading?

  • Published in: Mathematical Geology

Abstract

We take stock of the present position of compositional data analysis, of what has been achieved in the last 20 years, and then make suggestions as to what may be sensible avenues of future research. We take an uncompromisingly applied mathematical view, that the challenge of solving practical problems should motivate our theoretical research; and that any new theory should be thoroughly investigated to see if it may provide answers to previously abandoned practical considerations.

Author information

Corresponding author

Correspondence to J. Aitchison.

About this article

Cite this article

Aitchison, J., Egozcue, J. J. Compositional Data Analysis: Where Are We and Where Should We Be Heading? Math Geol 37, 829–850 (2005). https://doi.org/10.1007/s11004-005-7383-7

  • DOI: https://doi.org/10.1007/s11004-005-7383-7
