High-Breakdown Estimators of Multivariate Location and Scatter

Chapter in: Robustness and Complex Data Structures

Abstract

This contribution gives a brief summary of robust estimators of multivariate location and scatter. We assume that the original (uncontaminated) data follow an elliptical distribution with location vector μ and positive definite scatter matrix Σ. Robust methods aim to estimate μ and Σ even though the data have been contaminated by outliers. The well-known multivariate M-estimators can break down when the outlier fraction exceeds 1/(p+1), where p is the number of variables. We describe several robust estimators that can withstand a high fraction (up to 50%) of outliers, such as the minimum covariance determinant (MCD) estimator, the Stahel–Donoho estimator, S-estimators and MM-estimators. We also discuss faster methods that are only approximately equivariant under linear transformations, such as the orthogonalized Gnanadesikan–Kettenring estimator and the deterministic MCD algorithm.
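
To make the two breakdown figures in the abstract concrete, the following is a minimal sketch and not the chapter's own implementation. The function names `m_estimator_breakdown_bound` and `univariate_mcd` are hypothetical. The first simply evaluates the 1/(p+1) bound quoted above. The second illustrates the MCD idea only in the special case p = 1, where the h-subset with the smallest variance is a contiguous run of the sorted data and can be found exactly by scanning windows. The default h = floor(n/2) + 1 is an assumed choice aimed at high breakdown, and no consistency or small-sample correction factor is applied.

```python
from statistics import mean, pvariance


def m_estimator_breakdown_bound(p):
    """Return the bound 1/(p+1) on the breakdown point of multivariate
    M-estimators, where p is the number of variables."""
    if p < 1:
        raise ValueError("p must be a positive integer")
    return 1.0 / (p + 1)


def univariate_mcd(values, h=None):
    """Raw MCD location and variance for one-dimensional data (p = 1).

    In one dimension the h-subset with the smallest variance is a
    contiguous window of the sorted data, so scanning the n - h + 1
    windows gives the exact minimizer. The variance returned is the
    uncorrected variance of that window (assumption: no consistency
    factor is applied here).
    """
    x = sorted(values)
    n = len(x)
    if h is None:
        h = n // 2 + 1  # assumed default, roughly half of the data
    if not 1 < h <= n:
        raise ValueError("h must satisfy 1 < h <= n")
    windows = (x[i:i + h] for i in range(n - h + 1))
    best = min(windows, key=pvariance)
    return mean(best), pvariance(best)


if __name__ == "__main__":
    # The M-estimator bound shrinks quickly as the dimension grows.
    for p in (1, 2, 5, 10, 50):
        print(p, round(m_estimator_breakdown_bound(p), 4))

    # Seven clean points near 10 and three gross outliers (30 % contamination).
    data = [9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 50.0, 55.0, 60.0]
    print("classical mean:", mean(data))
    print("raw MCD (location, variance):", univariate_mcd(data))
```

In this toy data set the classical mean is pulled far from the bulk of the observations, whereas the window selected by the sketch contains only clean points. For p > 1 the search over h-subsets is combinatorial, which is why approximate algorithms are used in practice.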

Corresponding author

Correspondence to Peter Rousseeuw.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Rousseeuw, P., Hubert, M. (2013). High-Breakdown Estimators of Multivariate Location and Scatter. In: Becker, C., Fried, R., Kuhnt, S. (eds) Robustness and Complex Data Structures. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-35494-6_4
