Mathematical Geology, Volume 35, Issue 3, pp 253–278

Dealing with Zeros and Missing Values in Compositional Data Sets Using Nonparametric Imputation

  • J. A. Martín-Fernández
  • C. Barceló-Vidal
  • V. Pawlowsky-Glahn
Article

Abstract

The statistical analysis of compositional data based on logratios of parts is not suitable when zeros are present in a data set, because the logarithm of zero is undefined. Nevertheless, if one wishes to use this modeling approach, several strategies published in the specialized literature can be applied. In particular, substitution or imputation strategies are available for rounded zeros. In this paper, existing nonparametric imputation methods, both for the additive and the multiplicative approach, are reviewed, and essential properties of the latter method are given. For missing values, a generalization of the multiplicative approach is proposed.
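To make the contrast between the additive and multiplicative approaches concrete, here is a minimal Python sketch, not the authors' code. The function names, the single imputation value `delta` used for every zero, and the example composition are hypothetical. The multiplicative version sets each rounded zero to `delta` and rescales the nonzero parts by a common factor so the total is preserved; the additive version uses the constants commonly quoted for Aitchison's additive replacement, which should be treated as an assumption to check against the paper.

```python
import math


def multiplicative_replacement(x, delta, total=1.0):
    """Multiplicative replacement of rounded zeros (sketch).

    Assumes `x` is a composition whose parts sum to `total` and uses one
    hypothetical imputation value `delta` for every zero part.
    """
    n_zeros = sum(1 for v in x if v == 0)
    # Shrink all nonzero parts by the same factor so the closed sum is preserved.
    factor = 1.0 - n_zeros * delta / total
    return [delta if v == 0 else v * factor for v in x]


def additive_replacement(x, delta):
    """Additive replacement of rounded zeros (sketch; constants as commonly quoted)."""
    d = len(x)
    n_zeros = sum(1 for v in x if v == 0)
    added = delta * (n_zeros + 1) * (d - n_zeros) / d**2
    subtracted = delta * (n_zeros + 1) * n_zeros / d**2
    # Every nonzero part loses the same absolute amount, so their ratios change.
    return [added if v == 0 else v - subtracted for v in x]


def clr(x):
    """Centred logratio transform; math.log raises ValueError on a zero part."""
    logs = [math.log(v) for v in x]
    mean = sum(logs) / len(logs)
    return [value - mean for value in logs]


if __name__ == "__main__":
    x = [0.6, 0.3, 0.1, 0.0]  # hypothetical 4-part composition with one rounded zero
    delta = 0.005             # hypothetical value below an assumed detection limit
    mult = multiplicative_replacement(x, delta)
    add = additive_replacement(x, delta)
    print("multiplicative:", mult, "ratio x1/x2 =", mult[0] / mult[1])
    print("additive:      ", add, "ratio x1/x2 =", add[0] / add[1])
    print("clr after multiplicative replacement:", clr(mult))
```

In this example the ratio of the first two parts stays at 2 under multiplicative replacement but drifts to roughly 2.002 under additive replacement, which illustrates the ratio-preserving behaviour of the multiplicative approach. The clr transform, undefined on the original composition because of its zero, can be computed once the zero has been replaced.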

Keywords

Aitchison distance, detection limit, logratio transformation, simplex, stress, threshold

Copyright information

© International Association for Mathematical Geology 2003

Authors and Affiliations

  • J. A. Martín-Fernández (1)
  • C. Barceló-Vidal (1)
  • V. Pawlowsky-Glahn (1)
  1. Dept. Informàtica i Matemàtica Aplicada, Universitat de Girona, Girona, Spain