Distributional Equivalence and Subcompositional Coherence in the Analysis of Compositional Data, Contingency Tables and Ratio-Scale Measurements

Journal of Classification

Abstract

We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyze the ratios of the data values. A common approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property and can be applied to a wider class of methods. This weighted log-ratio analysis is theoretically equivalent to “spectral mapping”, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modeling. The weighted log-ratio methodology is used here to visualize frequency data in linguistics and chemical compositional data in archeology.
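The role of the weights can be illustrated with a small numerical sketch. It is not taken from the article: it assumes that weighted log-ratio analysis amounts to double-centring the log-transformed table using the row and column masses as weights and then taking a singular value decomposition, and that the unweighted variant uses equal weights. The function names `log_ratio_analysis` and `row_distances` and the 4 x 3 table are hypothetical. The check splits one column into two proportional columns, which under distributional equivalence should leave the distances between rows unchanged.

```python
import numpy as np


def log_ratio_analysis(N, weighted=True):
    """Log-ratio analysis of a strictly positive two-way table N.

    Assumed formulation (not spelled out in the abstract): the log table is
    double-centred with row weights r and column weights c, then decomposed
    by SVD. weighted=True uses the table margins as weights; weighted=False
    uses equal weights. Returns row principal coordinates and singular values.
    """
    N = np.asarray(N, dtype=float)
    if np.any(N <= 0):
        raise ValueError("all entries must be strictly positive")
    n_rows, n_cols = N.shape
    if weighted:
        P = N / N.sum()
        r, c = P.sum(axis=1), P.sum(axis=0)
    else:
        r = np.full(n_rows, 1.0 / n_rows)
        c = np.full(n_cols, 1.0 / n_cols)

    L = np.log(N)
    row_mean = L @ c            # weighted mean of each row over columns
    col_mean = r @ L            # weighted mean of each column over rows
    grand_mean = r @ L @ c      # weighted overall mean
    Y = L - row_mean[:, None] - col_mean[None, :] + grand_mean

    # Weighted SVD: scale by square roots of the weights before decomposing.
    S = np.sqrt(r)[:, None] * Y * np.sqrt(c)[None, :]
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    F = (U / np.sqrt(r)[:, None]) * sv   # row principal coordinates
    return F, sv


def row_distances(F):
    """Euclidean distances between rows in the full principal-coordinate space."""
    diff = F[:, None, :] - F[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))


# Hypothetical 4 x 3 table of positive values (illustrative data only).
table = np.array([[20.0, 5.0, 10.0],
                  [8.0, 12.0, 30.0],
                  [15.0, 15.0, 6.0],
                  [4.0, 25.0, 18.0]])

# Split the last column into two proportional columns (30% and 70%).
split = np.column_stack([table[:, :2], 0.3 * table[:, 2], 0.7 * table[:, 2]])

d_weighted = row_distances(log_ratio_analysis(table)[0])
d_weighted_split = row_distances(log_ratio_analysis(split)[0])
d_unweighted = row_distances(log_ratio_analysis(table, weighted=False)[0])
d_unweighted_split = row_distances(log_ratio_analysis(split, weighted=False)[0])

print("weighted distances unchanged by split:  ",
      np.allclose(d_weighted, d_weighted_split))
print("unweighted distances unchanged by split:",
      np.allclose(d_unweighted, d_unweighted_split))
```

In this sketch the split columns carry identical log-ratios, so with margin weights their combined weight equals that of the original column and the row distances are preserved. With equal weights the duplicated column is counted twice, which is why the unweighted analysis does not satisfy the principle.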

Author information

Corresponding author

Correspondence to Michael Greenacre.

Additional information

The first author acknowledges research support from the Fundación BBVA in Madrid as well as partial support by the Spanish Ministry of Education and Science, grant MEC-SEJ2006-14098. The constructive comments of the referees, who also brought additional relevant literature to our attention, significantly improved our article.

About this article

Cite this article

Greenacre, M., Lewi, P. Distributional Equivalence and Subcompositional Coherence in the Analysis of Compositional Data, Contingency Tables and Ratio-Scale Measurements. J Classif 26, 29–54 (2009). https://doi.org/10.1007/s00357-009-9027-y

  • DOI: https://doi.org/10.1007/s00357-009-9027-y
