Abstract
Detecting outliers in a multivariate point cloud is not trivial, especially when dealing with a sizable fraction of contamination. Over time, it has increasingly been recognized that the safest and most feasible approach to exposing outliers starts by computing a highly robust estimator of location and scatter that can withstand a large proportion of contamination. Many such estimators have been proposed in recent years. We will compare the worst-case bias of several prominent robust multivariate estimators by means of simulation. We also propose a new tool to compare robust estimators on real data sets, and illustrate it.
About this article
Cite this article
Hubert, M., Rousseeuw, P. & Vakili, K. Shape bias of robust covariance estimators: an empirical study. Stat Papers 55, 15–28 (2014). https://doi.org/10.1007/s00362-013-0544-8
DOI: https://doi.org/10.1007/s00362-013-0544-8