Journal of Classification, Volume 26, Issue 1, pp 29–54

Distributional Equivalence and Subcompositional Coherence in the Analysis of Compositional Data, Contingency Tables and Ratio-Scale Measurements

  • Michael Greenacre
  • Paul Lewi

Abstract

We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyze the ratios of the data values. A common approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property and can be applied to a wider class of methods. This weighted log-ratio analysis is theoretically equivalent to “spectral mapping”, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modeling. The weighted log-ratio methodology is used here to visualize frequency data in linguistics and chemical compositional data in archeology.
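The weighted log-ratio analysis described in the abstract can be illustrated with a minimal NumPy sketch. It rests on assumptions not stated in the abstract: the row and column weights are taken to be the relative margins of the table, the log-transformed table is double-centered using those weights, and the result is decomposed by a singular value decomposition. The function name `weighted_lra` and the small example table are hypothetical and are not taken from the article. The example also checks distributional equivalence numerically: merging two proportional columns should leave the nonzero singular values unchanged.

```python
import numpy as np

def weighted_lra(N):
    """Sketch of a weighted log-ratio analysis of a strictly positive table N.

    Assumption: row and column weights are the relative margins of N.
    Returns the singular values and the row principal coordinates.
    """
    N = np.asarray(N, dtype=float)
    P = N / N.sum()
    r = P.sum(axis=1)                      # row weights (masses)
    c = P.sum(axis=0)                      # column weights (masses)
    L = np.log(N)
    row_means = L @ c                      # weighted mean of each row
    col_means = r @ L                      # weighted mean of each column
    grand = r @ L @ c                      # weighted grand mean
    # Weighted double-centering removes row and column main effects,
    # leaving only log-ratio (interaction) structure.
    Y = L - row_means[:, None] - col_means[None, :] + grand
    # Weight the centered matrix before the SVD.
    S = np.sqrt(r)[:, None] * Y * np.sqrt(c)[None, :]
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    row_coords = (U / np.sqrt(r)[:, None]) * sv
    return sv, row_coords

# Hypothetical table in which column 3 is exactly twice column 2,
# so the two columns have identical profiles.
N = np.array([[10.0, 4.0,  8.0, 3.0],
              [ 5.0, 6.0, 12.0, 9.0],
              [ 8.0, 2.0,  4.0, 7.0],
              [ 3.0, 9.0, 18.0, 4.0]])

# Merge the two proportional columns by adding them.
merged = np.column_stack([N[:, 0], N[:, 1] + N[:, 2], N[:, 3]])

sv_full, _ = weighted_lra(N)
sv_merged, _ = weighted_lra(merged)

# Under distributional equivalence the total variance and the
# nonzero singular values agree before and after merging.
print(np.sum(sv_full ** 2), np.sum(sv_merged ** 2))
print(sv_full[:2], sv_merged[:2])
```

Replacing the margins `r` and `c` with equal weights gives the unweighted log-ratio analysis, for which the same merge would generally change the singular values. That difference is the contrast the abstract draws between the two methods.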

Keywords

Association models; Biplot; Correspondence analysis; Log-ratio analysis; Singular value decomposition; Spectral mapping

Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Departament d’Economia i Empresa, Universitat Pompeu Fabra, Barcelona, Spain
  2. Katholieke Universiteit Leuven and Vrije Universiteit Brussel, Brussel, Belgium
