The minimum weighted covariance determinant estimator for high-dimensional data

Kalina, Jan; Tichavský, Jan

doi:10.1007/s11634-021-00471-6

Regular Article
Published: 07 October 2021

Volume 16, pages 977–999, (2022)

Advances in Data Analysis and Classification


Abstract

In many applications, it is desirable to analyze high-dimensional measurements robustly, without the analysis being harmed by a possibly large percentage of outlying measurements. The minimum weighted covariance determinant (MWCD) estimator, based on implicit weights assigned to individual observations, is a promising and flexible extension of the popular minimum covariance determinant (MCD) estimator of the expectation and scatter matrix of multivariate data. In this work, a regularized version of the MWCD, denoted as the minimum regularized weighted covariance determinant (MRWCD) estimator, is proposed, together with an accompanying outlier detection procedure. The novel MRWCD estimator outperforms other available robust estimators in several simulation scenarios, especially in estimating the scatter matrix of contaminated high-dimensional data.
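As a rough illustration of the two ingredients named in the abstract (implicit, rank-based weights and regularization of the scatter matrix), the following Python sketch may help. It is not the authors' MRWCD algorithm, whose details are not given on this page: the linearly decreasing weights, the shrinkage target (a scaled identity), the intensity `rho`, and the function names are all assumptions made for this example.

```python
import numpy as np


def _weighted_estimates(X, weights, rho):
    """Weighted mean and shrinkage-regularized weighted scatter (illustrative)."""
    p = X.shape[1]
    mean = weights @ X
    centered = X - mean
    cov = (centered * weights[:, None]).T @ centered
    # Assumed regularization: shrink toward a scaled identity matrix so that
    # the result is positive definite even when p exceeds n.
    target = (np.trace(cov) / p) * np.eye(p)
    return mean, (1.0 - rho) * cov + rho * target


def regularized_weighted_scatter(X, rho=0.1, n_iter=20):
    """Iteratively reweighted location/scatter with rank-based weights.

    Hypothetical sketch: `rho` and the weight sequence are assumptions,
    not values taken from the article.
    """
    n = X.shape[0]
    # Assumed weight sequence: linearly decreasing in rank, summing to 1.
    w_by_rank = np.arange(n, 0, -1, dtype=float)
    w_by_rank /= w_by_rank.sum()

    weights = np.full(n, 1.0 / n)  # start from equal weights
    for _ in range(n_iter):
        mean, scatter = _weighted_estimates(X, weights, rho)
        centered = X - mean
        # Squared Mahalanobis distances under the current estimates.
        d2 = np.einsum("ij,ij->i", centered @ np.linalg.inv(scatter), centered)
        # Implicit weighting: the smallest distance gets the largest weight.
        ranks = np.argsort(np.argsort(d2))
        new_weights = w_by_rank[ranks]
        if np.allclose(new_weights, weights):
            break
        weights = new_weights

    # Recompute so the returned estimates match the returned weights.
    mean, scatter = _weighted_estimates(X, weights, rho)
    return mean, scatter, weights
```

Observations with small final weights would be natural candidates for closer inspection as outliers, although the article's own detection procedure may differ from this heuristic.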



Acknowledgements

The research was supported by the projects GA21-05325S and GA19-05704S of the Czech Science Foundation. The authors would like to thank Jurjen Duintjer Tebbens for discussion, and would like to thank the anonymous referees, an associate editor, and the editor-in-chief for their time and constructive advice.

Author information

Authors and Affiliations

The Czech Academy of Sciences, Institute of Computer Science, Prague 8, Czech Republic
Jan Kalina & Jan Tichavský
The Czech Academy of Sciences, Institute of Information Theory and Automation, Prague 8, Czech Republic
Jan Kalina


Corresponding author

Correspondence to Jan Kalina.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Kalina, J., Tichavský, J. The minimum weighted covariance determinant estimator for high-dimensional data. Adv Data Anal Classif 16, 977–999 (2022). https://doi.org/10.1007/s11634-021-00471-6


Received: 14 January 2021
Revised: 25 August 2021
Accepted: 27 September 2021
Published: 07 October 2021
Issue Date: December 2022
DOI: https://doi.org/10.1007/s11634-021-00471-6
