First Steps for a Statistical Analysis

Applied Compositional Data Analysis

Part of the book series: Springer Series in Statistics (SSS)

Abstract

Consistently following the principles of compositional data analysis has serious consequences for distributional modeling and for statistical processing in general. In particular, owing to the lack of scale invariance, the well-known Dirichlet distribution is no longer the obligatory choice as the underlying distribution of compositions. The normal distribution on the simplex is preferred instead, because its appropriateness can be verified with a standard normality test in coordinates and its parameters are easy to interpret. It can therefore serve as the underlying distribution for a wide range of popular methods and tests, including Hotelling tests and MANOVA models, in any orthonormal coordinate representation. Because compositional data frequently contain outliers, data inconsistencies, rounding effects, dependencies among the observations, and similar problems, it is advisable in practice to apply robust counterparts to the classical methods. Either univariate or multivariate robust statistical processing can be performed, based on whichever logratio coordinate representation serves the purpose of the analysis. The classical estimators of location and scale, the sample mean and the sample covariance matrix, are highly sensitive to outliers. As robust alternatives, affine equivariant estimators (such as the minimum covariance determinant, MCD, estimator) are preferred, because they can be computed in any coordinate representation. Robust estimators of location and scale can then be used to compute Mahalanobis distances in order to identify multivariate outliers.
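The outlier-detection idea summarized in the abstract can be illustrated with a minimal Python sketch; it is not the chapter's own code. It assumes pivot coordinates as the orthonormal logratio representation, uses an invented eight-row data set of 3-part compositions (the last row is a planted outlier), and computes a raw MCD by brute force over all subsets of size h, without the consistency correction or reweighting step that practical implementations usually add. All function names are hypothetical.

```python
import math
from itertools import combinations

def pivot_coordinates(x):
    """Map a D-part composition to D-1 orthonormal logratio (pivot) coordinates."""
    logs = [math.log(v) for v in x]
    coords = []
    for i in range(len(logs) - 1):
        rest = logs[i + 1:]
        scale = math.sqrt(len(rest) / (len(rest) + 1))
        coords.append(scale * (logs[i] - sum(rest) / len(rest)))
    return coords

def mean_and_cov(points):
    """Sample mean and 2x2 sample covariance (denominator n - 1) of 2-D points."""
    n = len(points)
    m = [sum(p[k] for p in points) / n for k in (0, 1)]
    s = [[sum((p[a] - m[a]) * (p[b] - m[b]) for p in points) / (n - 1)
          for b in (0, 1)] for a in (0, 1)]
    return m, s

def det2(s):
    """Determinant of a 2x2 matrix."""
    return s[0][0] * s[1][1] - s[0][1] * s[1][0]

def squared_mahalanobis(points, m, s):
    """Squared Mahalanobis distance of each 2-D point from m with scatter s."""
    d = det2(s)
    out = []
    for p in points:
        u, v = p[0] - m[0], p[1] - m[1]
        out.append((s[1][1] * u * u - 2 * s[0][1] * u * v + s[0][0] * v * v) / d)
    return out

def raw_mcd(points, h):
    """Raw MCD by exhaustive search: mean and covariance of the h-subset
    whose sample covariance matrix has the smallest determinant."""
    best = min(combinations(points, h),
               key=lambda sub: det2(mean_and_cov(sub)[1]))
    return mean_and_cov(best)

# Invented 3-part compositions (assumption, for illustration only);
# the last row is a deliberately planted outlier.
compositions = [
    (0.60, 0.30, 0.10), (0.62, 0.27, 0.11), (0.57, 0.33, 0.10),
    (0.61, 0.31, 0.08), (0.58, 0.29, 0.13), (0.64, 0.26, 0.10),
    (0.59, 0.32, 0.09), (0.20, 0.30, 0.50),
]

z = [pivot_coordinates(x) for x in compositions]
classical = squared_mahalanobis(z, *mean_and_cov(z))
robust = squared_mahalanobis(z, *raw_mcd(z, h=6))

# Same analysis with the parts reordered, i.e. in another orthonormal basis.
z_perm = [pivot_coordinates((x[2], x[0], x[1])) for x in compositions]
robust_perm = squared_mahalanobis(z_perm, *raw_mcd(z_perm, h=6))

for i, (c, r, rp) in enumerate(zip(classical, robust, robust_perm)):
    print(f"obs {i}: classical={c:.2f}  robust={r:.2f}  robust (permuted parts)={rp:.2f}")
```

In this toy example the classical distance of the planted outlier is capped by the sample size, because the outlier inflates the covariance estimate it is measured against, whereas the robust distance is not. The robust distances also agree across the two coordinate systems, which is what affine equivariance provides. Exhaustive search is feasible only for very small samples; for real data an established MCD implementation, whose distances are typically compared against a chi-square quantile, would be the more reasonable choice.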




© 2018 Springer Nature Switzerland AG

Cite this chapter

Filzmoser, P., Hron, K., Templ, M. (2018). First Steps for a Statistical Analysis. In: Applied Compositional Data Analysis. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-96422-5_5
