
Discriminant analysis for compositional data and robust parameter estimation

Original Paper, Computational Statistics

Abstract

Compositional data, i.e. data that carry only relative information, need to be transformed before standard discriminant analysis methods, which are designed for the Euclidean space, can be applied. For linear, quadratic, and Fisher discriminant analysis, it is investigated which of these transformations make the resulting discriminant rules invariant. Moreover, it is shown that robust parameter estimation requires not only an appropriate transformation but also affine equivariant estimators of location and covariance. An example and simulated data demonstrate the effects of performing discriminant analysis in an inappropriate space.
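The invariance point for the linear rule can be checked numerically. The following Python sketch (numpy only) is an illustrative assumption, not the authors' code. It simulates hypothetical two-group compositions with four parts, expresses them in pivot ilr coordinates for two different orderings of the parts and in alr coordinates, and fits classical LDA (sample means, pooled within-group covariance) in each system. Because these coordinate systems are related by nonsingular linear maps and the classical estimators are affine equivariant, the discriminant scores should coincide. The helper names `clr`, `pivot_basis`, `lda_fit` and `lda_scores` are hypothetical. clr coordinates are not fed to LDA directly because each clr row sums to zero, so their covariance matrix is singular and cannot be inverted.

```python
import numpy as np

def clr(x):
    """Centred logratio transform, applied row-wise to strictly positive parts."""
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)

def pivot_basis(D):
    """D x (D-1) orthonormal matrix whose columns sum to 0; clr(x) @ V gives pivot ilr coordinates."""
    V = np.zeros((D, D - 1))
    for j in range(D - 1):
        k = D - j - 1  # number of parts after part j
        V[j, j] = np.sqrt(k / (k + 1))
        V[j + 1:, j] = -1.0 / np.sqrt(k * (k + 1))
    return V

def lda_fit(Z, y):
    """Classical LDA: group means, priors and inverse pooled within-group covariance."""
    classes = np.unique(y)
    means = np.array([Z[y == g].mean(axis=0) for g in classes])
    priors = np.array([np.mean(y == g) for g in classes])
    resid = np.vstack([Z[y == g] - means[i] for i, g in enumerate(classes)])
    S = resid.T @ resid / (len(Z) - len(classes))
    return classes, means, priors, np.linalg.inv(S)

def lda_scores(model, Z):
    """Linear discriminant scores z' S^-1 m_g - m_g' S^-1 m_g / 2 + log(pi_g)."""
    _, means, priors, S_inv = model
    quad = np.sum((means @ S_inv) * means, axis=1)
    return Z @ S_inv @ means.T - 0.5 * quad + np.log(priors)

# Hypothetical simulated data: two groups of compositions with D = 4 parts.
rng = np.random.default_rng(0)
D, n = 4, 60
shift = np.repeat(np.array([[0.0, 0.0, 0.0, 0.0], [1.0, 0.0, -0.5, 0.0]]), n, axis=0)
raw = np.exp(rng.normal(size=(2 * n, D)) + shift)
X = raw / raw.sum(axis=1, keepdims=True)  # closure: each row sums to 1
y = np.repeat([0, 1], n)

V = pivot_basis(D)
Z_ilr = clr(X) @ V                     # ilr coordinates, pivot basis
Z_perm = clr(X[:, ::-1]) @ V           # ilr coordinates, parts in reversed order
Z_alr = np.log(X[:, :-1] / X[:, -1:])  # alr coordinates, last part as denominator

s_ilr, s_perm, s_alr = (lda_scores(lda_fit(Z, y), Z) for Z in (Z_ilr, Z_perm, Z_alr))
pred_ilr = np.argmax(s_ilr, axis=1)
pred_alr = np.argmax(s_alr, axis=1)
print(np.allclose(s_ilr, s_perm), np.allclose(s_ilr, s_alr))  # both should print True
print("agreement:", np.mean(pred_ilr == pred_alr))
```

Replacing the sample mean and covariance with an estimator that is not affine equivariant, such as coordinatewise medians with a diagonal MAD-based scatter, would in general break this agreement between coordinate systems, which is the abstract's point about robust estimation; an affine equivariant robust estimator such as the MCD would preserve it. The paper's own robust procedure is not reproduced here.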




Author information

Corresponding author: Peter Filzmoser.


About this article

Cite this article

Filzmoser, P., Hron, K. & Templ, M. Discriminant analysis for compositional data and robust parameter estimation. Comput Stat 27, 585–604 (2012). https://doi.org/10.1007/s00180-011-0279-8


  • DOI: https://doi.org/10.1007/s00180-011-0279-8
