Distributional Equivalence and Subcompositional Coherence in the Analysis of Compositional Data, Contingency Tables and Ratio-Scale Measurements
- 277 Downloads
We consider two fundamental properties in the analysis of two-way tables of positive data: the principle of distributional equivalence, one of the cornerstones of correspondence analysis of contingency tables, and the principle of subcompositional coherence, which forms the basis of compositional data analysis. For an analysis to be subcompositionally coherent, it suffices to analyze the ratios of the data values. A common approach to dimension reduction in compositional data analysis is to perform principal component analysis on the logarithms of ratios, but this method does not obey the principle of distributional equivalence. We show that by introducing weights for the rows and columns, the method achieves this desirable property and can be applied to a wider class of methods. This weighted log-ratio analysis is theoretically equivalent to “spectral mapping”, a multivariate method developed almost 30 years ago for displaying ratio-scale data from biological activity spectra. The close relationship between spectral mapping and correspondence analysis is also explained, as well as their connection with association modeling. The weighted log-ratio methodology is used here to visualize frequency data in linguistics and chemical compositional data in archeology.
KeywordsAssociation models Biplot Correspondence analysis Log-ratio analysis Singular value decomposition Spectral mapping
Unable to display preview. Download preview PDF.
- BAVAUD, F. (2002), “Quotient Dissimilarities, Euclidean Embeddability, and Huygens’ Weak Principle,” in Classification, Clustering and Data Analysis, eds. K. Jajuga, A. Sokolowski and H.-H.Bock, New York: Springer, pp. 195–202.Google Scholar
- BAVAUD, F. (2004), “Generalized Factor Analyses for Contingency Tables,” in Classification, Clustering, and Data Mining Applications, eds. D. Banks, L. House, F.R. McMorris, P. Arabie and W. Gaul, New York: Springer, pp. 597–606.Google Scholar
- BEARDAH, C.C., BAXTER, M.J., COOL, H.E.M., and JACKSON, C.M. (2003), “Compositional Data Analysis of Archaeological Glass: Problems and Possible Solutions,” in: Proceedings of the First Compositional Data Analysis Workshop, Girona, Spain, http://ima.udg.edu/Activitats/CoDaWork03/paper_baxter_Beardah2.pdf
- BENZÉCRI, J.-P. (1973), L’Analyse des Données, Tôme I: La Classification, Tôme II: L’Analyse des Correspondances, Paris : Dunod.Google Scholar
- CUADRAS, C., and FORTIANA, J. (1998), “Visualizing Categorical Data with Related Metric Scaling,” in Visualization of Categorical Data, eds. J. Blasius and M.J. Greenacre, San Diego: Academic Press, pp. 112–129.Google Scholar
- ESCOFIER, B. (1978), “Analyse factorielle et distances répondant au principe d’équivalence distributionelle,” Revue de Statistique Appliquée, 26, 29–37.Google Scholar
- GREENACRE, M.J. (2008), “Power Transformations in Correspondence Analysis,” accepted for publication in Computational Statistics and Data Analysis, downloadable at http://www.econ.upf.edu/en/research/onepaper.php?id=1044
- GREENACRE, M.J., and BLASIUS, J. (eds) (1994), Correspondence Analysis in the Social Sciences, London: Academic Press.Google Scholar
- LEWI, P.J. (1976), “Spectral Mapping, A Technique for Classifying Biological Activity Profiles of Chemical Compounds,” Arzneimittel Forschung, 26, 1295–1300.Google Scholar
- LEWI, P.J. (1980), “Multivariate Data Analysis in APL,” in Proceedings of APL-80 Conference, ed. G.A. van der Linden, Amsterdam: North-Holland, pp. 267–271.Google Scholar
- LEWI, P.J. (1998), “Analysis of Contingency Tables,” in Handbook of Chemometrics and Qualimetrics: Part B, eds. B.G.M. Vandeginste, D.L. Massart, L.M.C. Buydens, S. de Jong, P.J. Lewi, and J. Smeyers-Verbeke, Amsterdam: Elsevier, pp. 161–206.Google Scholar
- NENADIĆ, O., and GREENACRE, M.J. (2007), “Correspondence Analysis in R, with Two- and Three-dimensional Graphics: The ca Package,” Journal of Statistical Software 20(3), http://www.jstatsoft.org/v20/i03/.
- R DEVELOPMENT CORE TEAM (2007), “R: A Language and Environment for Statistical Computing,” R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, http://www.R-project.org.
- S-PLUS, VERSION 7 (2007). Insightful Corporation, Seattle, USA, http://www.insightful.com.
- VERMUNT, J.K. (1997), “LEM: A General Program for the Analysis of Categorical Data,” The Netherlands: Department of Methodology and Statistics, Tilburg University.Google Scholar