The minimum regularized covariance determinant estimator

  • Kris Boudt
  • Peter J. Rousseeuw
  • Steven Vanduffel
  • Tim Verdonck

Abstract

The minimum covariance determinant (MCD) approach estimates the location and scatter matrix using the subset of a given size whose sample covariance matrix has the lowest determinant. Its main drawback is that it cannot be applied when the dimension exceeds the subset size. We propose the minimum regularized covariance determinant (MRCD) approach, which differs from the MCD in that the scatter matrix is a convex combination of a target matrix and the sample covariance matrix of the subset. A data-driven procedure sets the weight of the target matrix, so that regularization is only applied when needed. The MRCD estimator is defined in any dimension, is well-conditioned by construction and preserves the good robustness properties of the MCD. We prove that so-called concentration steps (C-steps) reduce the MRCD objective function, and we exploit this fact to construct a fast algorithm. We verify the accuracy and robustness of the MRCD estimator in a simulation study and illustrate its practical use for outlier detection and regression analysis on real-life high-dimensional data sets from chemistry and criminology.
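The convex combination and the concentration step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`regularized_scatter`, `choose_rho`, `c_step`, `mrcd_sketch`) are hypothetical, the target matrix is taken to be the identity (as if the data were already standardized), the weight ρ is picked by an assumed rule (the smallest value on a grid that brings the condition number down to at most 50), ρ is held fixed during the C-steps, the search starts from a single random subset, and consistency factors and reweighting are omitted.

```python
import numpy as np


def regularized_scatter(X_sub, T, rho):
    """rho * T + (1 - rho) * S, where S is the sample covariance of X_sub (rows = observations)."""
    S = np.cov(X_sub, rowvar=False)
    return rho * T + (1.0 - rho) * S


def choose_rho(X_sub, T, max_cond=50.0, grid=None):
    """Assumed rule: smallest rho on a grid whose regularized scatter has condition number <= max_cond."""
    if grid is None:
        grid = np.linspace(0.0, 1.0, 101)
    for rho in grid:
        if np.linalg.cond(regularized_scatter(X_sub, T, rho)) <= max_cond:
            return float(rho)
    return 1.0  # fall back to the target matrix alone


def c_step(X, H, T, rho):
    """One concentration step: keep the len(H) points closest to the current fit."""
    X_sub = X[H]
    m = X_sub.mean(axis=0)
    K = regularized_scatter(X_sub, T, rho)
    diff = X - m
    # Squared Mahalanobis distances (x_i - m)^T K^{-1} (x_i - m), via a linear solve
    d2 = np.sum(diff * np.linalg.solve(K, diff.T).T, axis=1)
    return np.sort(np.argsort(d2)[: len(H)])


def mrcd_sketch(X, h, T, rho, n_iter=100, seed=0):
    """Iterate C-steps from one random h-subset until the subset stops changing."""
    rng = np.random.default_rng(seed)
    H = np.sort(rng.choice(X.shape[0], size=h, replace=False))
    for _ in range(n_iter):
        H_new = c_step(X, H, T, rho)
        if np.array_equal(H_new, H):
            break
        H = H_new
    X_sub = X[H]
    return X_sub.mean(axis=0), regularized_scatter(X_sub, T, rho), H
```

With ρ and T fixed, each call to `c_step` keeps the h observations with the smallest regularized Mahalanobis distances, which does not increase the determinant of the regularized scatter matrix. When the dimension exceeds h, the plain subset covariance is singular, whereas any ρ > 0 with a positive definite T keeps the regularized matrix invertible.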

Keywords

Breakdown value, High-dimensional data, Regularization, Robust covariance estimation

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Department of Economics, Ghent University, Ghent, Belgium
  2. Solvay Business School, Vrije Universiteit Brussel, Brussels, Belgium
  3. School of Business and Economics, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
  4. Department of Mathematics, KU Leuven, Leuven, Belgium
  5. Department of Mathematics, Universiteit Antwerpen, Antwerpen, Belgium