Abstract
Adhering consistently to the principles of compositional data analysis has serious consequences for distributional modeling and for statistical processing in general. In particular, because it lacks scale invariance, the well-known Dirichlet distribution is no longer the obligatory choice as the underlying distribution of compositions. The normal distribution on the simplex is preferred instead, because its appropriateness can be verified with a standard normality test applied to the coordinates, and its parameters are easy to interpret. Consequently, it can serve as the underlying distribution for a wide range of popular methods and tests, including Hotelling tests and MANOVA models, in any orthonormal coordinate representation. Because compositional data frequently contain outliers, data inconsistencies, rounding effects, dependencies among the observations, etc., it is advisable in practice to apply robust counterparts to the classical methods. Either univariate or multivariate robust statistical processing can be performed, based on whichever logratio coordinate representation serves the purpose of the analysis. Even the classical estimators of location and scale, the sample mean and the sample covariance matrix, are highly sensitive to outliers. As robust alternatives, affine equivariant estimators (such as the minimum covariance determinant, MCD, estimator) are preferred, as they can be computed in any coordinate representation. Robust estimators of location and scale can then be used to compute Mahalanobis distances in order to identify multivariate outliers.
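The last steps of the abstract (logratio coordinates, a robust location and covariance estimate, Mahalanobis distances) can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration and is not taken from the chapter: the toy three-part data set, the helper names `pivot_coordinates`, `exact_mcd` and `mahalanobis_sq`, the subset size `h = 6`, and the 0.975 chi-square cutoff. The MCD estimate is found here by exhaustive search over all subsets, which is only feasible for tiny samples, and no consistency factor or reweighting step is applied.

```python
import math
from itertools import combinations

def pivot_coordinates(x):
    """Map a D-part composition to D-1 orthonormal (pivot) logratio coordinates."""
    logs = [math.log(v) for v in x]
    z = []
    for i in range(len(x) - 1):
        rest = logs[i + 1:]          # log-parts that follow the i-th part
        k = len(rest)
        z.append(math.sqrt(k / (k + 1)) * (logs[i] - sum(rest) / k))
    return z

def mean_cov(points):
    """Sample mean and covariance (s00, s01, s11) of 2-D points."""
    n = len(points)
    m0 = sum(p[0] for p in points) / n
    m1 = sum(p[1] for p in points) / n
    s00 = sum((p[0] - m0) ** 2 for p in points) / (n - 1)
    s11 = sum((p[1] - m1) ** 2 for p in points) / (n - 1)
    s01 = sum((p[0] - m0) * (p[1] - m1) for p in points) / (n - 1)
    return (m0, m1), (s00, s01, s11)

def exact_mcd(points, h):
    """Raw MCD: mean and covariance of the h-subset with the smallest
    covariance determinant, found by brute force (tiny samples only)."""
    best = None
    for subset in combinations(points, h):
        mean, cov = mean_cov(subset)
        det = cov[0] * cov[2] - cov[1] ** 2
        if best is None or det < best[0]:
            best = (det, mean, cov)
    return best[1], best[2]

def mahalanobis_sq(p, mean, cov):
    """Squared Mahalanobis distance of a 2-D point, using the 2x2 inverse."""
    s00, s01, s11 = cov
    det = s00 * s11 - s01 ** 2
    d0, d1 = p[0] - mean[0], p[1] - mean[1]
    return (s11 * d0 * d0 - 2 * s01 * d0 * d1 + s00 * d1 * d1) / det

# Hypothetical 3-part compositions (percentages); the last row is a planted outlier.
compositions = [
    [30, 30, 40], [40, 30, 30], [30, 40, 30], [35, 25, 40],
    [25, 35, 40], [40, 35, 25], [33, 33, 34], [99.0, 0.5, 0.5],
]

coords = [pivot_coordinates(x) for x in compositions]
mean, cov = exact_mcd(coords, h=6)

# 0.975 quantile of the chi-square distribution with 2 degrees of freedom,
# which has the closed form -2 * ln(1 - p).
cutoff = -2.0 * math.log(1 - 0.975)

dist2 = [mahalanobis_sq(z, mean, cov) for z in coords]
outliers = [i for i, d in enumerate(dist2) if d > cutoff]
print(outliers)
```

The planted observation (index 7) is flagged. Because the raw estimate is not corrected for consistency, some regular observations may exceed the cutoff as well. Rescaling every composition by a constant leaves the coordinates unchanged, and reordering the parts only rotates the coordinate system, so the distances stay the same; this is the practical meaning of computing an affine equivariant estimator "in any coordinate representation".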
Copyright information
© 2018 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Filzmoser, P., Hron, K., Templ, M. (2018). First Steps for a Statistical Analysis. In: Applied Compositional Data Analysis. Springer Series in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-96422-5_5
DOI: https://doi.org/10.1007/978-3-319-96422-5_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-96420-1
Online ISBN: 978-3-319-96422-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)