
Mathematical Geosciences, Volume 45, Issue 4, pp 487–498

Covariance-Based Variable Selection for Compositional Data

  • Karel Hron (corresponding author)
  • Peter Filzmoser
  • Sandra Donevska
  • Eva Fišerová

Abstract

Omitting variables in compositional data analysis may substantially change the results of a multivariate statistical analysis. This is particularly the case for principal component analysis and the compositional biplot, where the interpretation of both the loadings and the scores of the remaining subcomposition is affected. A stepwise procedure is introduced that reduces the original composition to a subcomposition while avoiding a substantial change in the information, such as that carried by the compositional biplot. The resulting subcomposition is easier to handle and interpret. Numerical results give evidence of the usefulness of the procedure.
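The abstract does not state the selection criterion, so the following is only a minimal sketch of stepwise (backward) covariance-based selection under an assumed criterion. At each step it drops the part whose removal least changes the clr covariance structure of the remaining parts. The criterion (Frobenius distance between the clr covariance of the subcomposition and the matching block of the full clr covariance) and the function names `covariance_change` and `backward_select` are hypothetical illustrations, not the authors' method.

```python
import numpy as np


def clr(x):
    """Centred log-ratio transform applied to each row of a positive array."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=1, keepdims=True)


def covariance_change(x, keep):
    """ASSUMED criterion (hypothetical, not from the paper): Frobenius distance
    between the clr covariance of the subcomposition x[:, keep] and the
    matching block of the clr covariance of the full composition."""
    full_block = np.cov(clr(x), rowvar=False)[np.ix_(keep, keep)]
    sub_cov = np.cov(clr(x[:, keep]), rowvar=False)
    return np.linalg.norm(sub_cov - full_block, ord="fro")


def backward_select(x, n_keep):
    """Greedy backward elimination: repeatedly drop the part whose removal
    gives the smallest covariance_change, until n_keep parts remain."""
    if not 2 <= n_keep <= x.shape[1]:
        raise ValueError("n_keep must be between 2 and the number of parts")
    keep = list(range(x.shape[1]))
    dropped = []
    while len(keep) > n_keep:
        scores = [covariance_change(x, [j for j in keep if j != k]) for k in keep]
        to_drop = keep[int(np.argmin(scores))]  # its removal changes the covariance least
        keep.remove(to_drop)
        dropped.append(to_drop)
    return keep, dropped


# Usage on simulated data (random, for illustration only).
rng = np.random.default_rng(0)
x = rng.lognormal(size=(100, 6))
x = x / x.sum(axis=1, keepdims=True)  # closure; clr is unaffected by it
keep, dropped = backward_select(x, n_keep=3)
print("kept parts:", keep, "dropped in order:", dropped)
```

Because the clr vector of a subcomposition is the row-wise recentred block of the full clr vector, its covariance equals P Σ_SS P, where Σ_SS is the retained block of the full clr covariance and P = I − J/s for s retained parts. The assumed criterion therefore measures how much this recentring distorts the covariance block that a clr-based biplot would display.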

Keywords

Aitchison geometry on the simplex; Centered log-ratio transformation; Isometric log-ratio transformation; Variable selection
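The two log-ratio transformations named in the keywords can be sketched as follows. The clr follows the standard definition (log of each part minus the mean log of all parts); for the ilr we assume pivot coordinates, one particular orthonormal basis, which may differ from the basis used in the paper. The helper names `clr` and `pivot_ilr` are hypothetical. The final check illustrates the isometry: Euclidean distances between ilr coordinates equal those between clr vectors (the Aitchison distance).

```python
import numpy as np


def clr(x):
    """Centred log-ratio: log of each part minus the mean log of the parts (last axis)."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean(axis=-1, keepdims=True)


def pivot_ilr(x):
    """Pivot coordinates, one choice of isometric log-ratio (ilr) coordinates.

    For parts x_1..x_D (1-based j = 1..D-1):
        z_j = sqrt((D-j)/(D-j+1)) * ln(x_j / gm(x_{j+1}, ..., x_D)),
    where gm is the geometric mean.
    """
    logx = np.log(np.asarray(x, dtype=float))
    D = logx.shape[-1]
    z = np.empty(logx.shape[:-1] + (D - 1,))
    for i in range(D - 1):  # zero-based i corresponds to z_{i+1}
        k = D - i - 1  # number of parts after the pivot, i.e. D - j
        log_gm_rest = logx[..., i + 1:].mean(axis=-1)  # log of their geometric mean
        z[..., i] = np.sqrt(k / (k + 1)) * (logx[..., i] - log_gm_rest)
    return z


# Isometry check on two example compositions (the values are arbitrary).
x = np.array([0.1, 0.2, 0.3, 0.4])
y = np.array([0.4, 0.3, 0.2, 0.1])
d_ilr = np.linalg.norm(pivot_ilr(x) - pivot_ilr(y))
d_clr = np.linalg.norm(clr(x) - clr(y))
print(d_ilr, d_clr)  # the two distances agree up to rounding
```

Both transformations are scale invariant, so closing the data to a constant sum beforehand does not change the coordinates.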

Notes

Acknowledgements

The authors are grateful to the referee for helpful comments and suggestions. The authors also gratefully acknowledge the support of the Operational Program Education for Competitiveness (European Social Fund, project CZ.1.07/2.3.00/20.0170 of the Ministry of Education, Youth, and Sports of the Czech Republic) and of Grant No. PrF-2012-017 of the Internal Grant Agency of Palacký University in Olomouc.


Copyright information

© International Association for Mathematical Geosciences 2013

Authors and Affiliations

  • Karel Hron (1, 2; corresponding author)
  • Peter Filzmoser (3)
  • Sandra Donevska (1, 2)
  • Eva Fišerová (1, 2)
  1. Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University, Olomouc, Czech Republic
  2. Department of Geoinformatics, Faculty of Science, Palacký University, Olomouc, Czech Republic
  3. Department of Statistics and Probability Theory, Vienna University of Technology, Vienna, Austria
