Abstract
Compositional data, i.e. data carrying only relative information, need to be transformed before applying standard discriminant analysis methods, which are designed for Euclidean space. For linear, quadratic, and Fisher discriminant analysis, it is investigated here which of the transformations lead to invariant discriminant rules. Moreover, it is shown that robust parameter estimation requires not only an appropriate transformation but also affine equivariant estimators of location and covariance. An example and simulated data demonstrate the effects of performing discriminant analysis in an inappropriate space.
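The invariance statement can be illustrated with a small sketch. It assumes that the transformations in question include the additive (alr) and isometric (ilr) logratio transformations, and it uses hypothetical simulated data, equal prior probabilities, and classical (non-robust) estimates of the group means and pooled covariance; all function names are illustrative. Because the alr and ilr coordinates of a composition are related by a nonsingular linear map, the linear discriminant scores, and therefore the assigned groups, should coincide.

```python
import math
import random

def closure(x):
    """Rescale a composition to sum 1; only the ratios between parts carry information."""
    s = sum(x)
    return [v / s for v in x]

def alr(x):
    """Additive logratio coordinates: log of each part relative to the last part."""
    return [math.log(v / x[-1]) for v in x[:-1]]

def ilr(x):
    """Isometric logratio coordinates using one particular orthonormal (pivot) basis."""
    D = len(x)
    z = []
    for i in range(D - 1):
        rest = x[i + 1:]
        log_gmean = sum(math.log(v) for v in rest) / len(rest)
        k = len(rest)  # number of parts after the pivot part x[i]
        z.append(math.sqrt(k / (k + 1)) * (math.log(x[i]) - log_gmean))
    return z

def mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def pooled_cov(groups):
    """Classical pooled within-group covariance matrix (not robust)."""
    p = len(groups[0][0])
    S = [[0.0] * p for _ in range(p)]
    dof = 0
    for g in groups:
        m = mean(g)
        for r in g:
            d = [a - b for a, b in zip(r, m)]
            for i in range(p):
                for j in range(p):
                    S[i][j] += d[i] * d[j]
        dof += len(g) - 1
    return [[v / dof for v in row] for row in S]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * e for a, e in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def lda_fit(groups):
    """Scores d_k(z) = m_k' S^-1 z - m_k' S^-1 m_k / 2 (equal priors assumed)."""
    S = pooled_cov(groups)
    params = []
    for g in groups:
        m = mean(g)
        a = solve(S, m)  # a = S^-1 m_k
        params.append((a, -0.5 * sum(ai * mi for ai, mi in zip(a, m))))
    return params

def lda_predict(params, z):
    """Assign z to the group with the largest linear discriminant score."""
    scores = [sum(ai * zi for ai, zi in zip(a, z)) + c for a, c in params]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical simulated data: two groups of 4-part compositions.
rng = random.Random(0)

def simulate(center, n):
    return [closure([c * math.exp(rng.gauss(0, 0.3)) for c in center]) for _ in range(n)]

groups = [simulate([0.5, 0.3, 0.15, 0.05], 30), simulate([0.3, 0.3, 0.3, 0.1], 30)]
data = [x for g in groups for x in g]

# Fit and apply the same LDA rule in alr and in ilr coordinates.
pred = {}
for name, f in [("alr", alr), ("ilr", ilr)]:
    params = lda_fit([[f(x) for x in g] for g in groups])
    pred[name] = [lda_predict(params, f(x)) for x in data]

print(pred["alr"] == pred["ilr"])
```

The final line is expected to print `True`, since the scores agree up to floating-point rounding. Replacing `mean` and `pooled_cov` with robust estimators would preserve this agreement only if those estimators are affine equivariant, which matches the abstract's point about robust parameter estimation.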
Cite this article
Filzmoser, P., Hron, K. & Templ, M. Discriminant analysis for compositional data and robust parameter estimation. Comput Stat 27, 585–604 (2012). https://doi.org/10.1007/s00180-011-0279-8