Abstract
Effective estimation of covariance matrices is crucial for statistical analysis and its applications. In this paper, we focus on robust estimation of the covariance matrix for interval-valued data in low and moderately high dimensions. In the low-dimensional setting, we extend the Minimum Covariance Determinant (MCD) estimator to interval-valued data. We derive an iterative algorithm for computing this estimator, show that it converges, and establish theoretically that the estimator retains the high breakdown point of the MCD estimator. We then propose two extensions of the MCD estimator to moderately high-dimensional settings: a projection-based estimator and a regularization-based estimator. For each, we develop an efficient iterative algorithm and establish its convergence. Extensive simulation studies and a real data analysis validate the finite-sample properties of the proposed estimators.
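The abstract does not spell out the iterative algorithm. For orientation only, the sketch below shows the classical concentration step (C-step) used by the FAST-MCD algorithm for point data. The C-step fits a mean and scatter matrix to an h-subset, keeps the h observations with the smallest Mahalanobis distances, and repeats; each step cannot increase the determinant of the subset's scatter matrix. The interval encoding (each interval replaced by its center and radius, giving a 2p-vector), the function names, and the synthetic data are hypothetical illustrations chosen for this example. This is not the authors' interval-valued MCD estimator.

```python
import random

def interval_to_vector(obs):
    """HYPOTHETICAL encoding: p intervals (lo, up) -> 2p-vector [centers..., radii...]."""
    centers = [(lo + up) / 2.0 for lo, up in obs]
    radii = [(up - lo) / 2.0 for lo, up in obs]
    return centers + radii

def mean_vec(rows):
    n, p = len(rows), len(rows[0])
    return [sum(r[j] for r in rows) / n for j in range(p)]

def cov_mat(rows, mu):
    """Scatter matrix with divisor n (raw MCD-style scaling)."""
    n, p = len(rows), len(mu)
    S = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [r[j] - mu[j] for j in range(p)]
        for a in range(p):
            for b in range(p):
                S[a][b] += d[a] * d[b] / n
    return S

def det_and_inverse(A):
    """Gauss-Jordan elimination with partial pivoting; returns (det(A), inverse of A)."""
    p = len(A)
    M = [list(A[i]) + [1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    det = 1.0
    for col in range(p):
        piv = max(range(col, p), key=lambda i: abs(M[i][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("scatter matrix of the h-subset is singular")
        if piv != col:
            M[col], M[piv] = M[piv], M[col]
            det = -det  # a row swap flips the sign of the determinant
        pivot = M[col][col]
        det *= pivot
        M[col] = [v / pivot for v in M[col]]
        for i in range(p):
            if i != col:
                f = M[i][col]
                M[i] = [a - f * b for a, b in zip(M[i], M[col])]
    return det, [row[p:] for row in M]

def c_step(X, subset):
    """One concentration step: fit on the h-subset, keep the h smallest distances."""
    H = [X[i] for i in subset]
    mu = mean_vec(H)
    det, inv = det_and_inverse(cov_mat(H, mu))

    def d2(x):  # squared Mahalanobis distance w.r.t. the subset fit
        d = [xi - mi for xi, mi in zip(x, mu)]
        return sum(d[a] * inv[a][b] * d[b] for a in range(len(d)) for b in range(len(d)))

    dist = [d2(x) for x in X]
    new = sorted(range(len(X)), key=lambda i: dist[i])[:len(subset)]
    return sorted(new), det

def concentrate(X, subset, max_iter=100):
    """Iterate C-steps until the h-subset stops changing; record each subset's determinant."""
    subset, dets = sorted(subset), []
    for _ in range(max_iter):
        new, det = c_step(X, subset)
        dets.append(det)
        if new == subset:
            break
        subset = new
    return subset, dets

def make_demo_data(seed=0):
    """Synthetic data: 35 regular observations and 5 identical outliers, 2 intervals each."""
    rng = random.Random(seed)
    data = []
    for _ in range(35):
        obs = []
        for _ in range(2):
            c, r = rng.gauss(0.0, 1.0), rng.uniform(0.5, 1.5)
            obs.append((c - r, c + r))
        data.append(interval_to_vector(obs))
    data += [interval_to_vector([(49.0, 51.0), (49.0, 51.0)]) for _ in range(5)]
    return data

X = make_demo_data()
# Start from a contaminated h-subset (h = 25) that contains outlier index 35.
subset, dets = concentrate(X, list(range(11, 36)))
print("final h-subset:", subset)
print("determinant per step:", [round(d, 6) for d in dets])
```

In this demo, the recorded determinants should not increase from one step to the next, and the planted outliers (indices 35 to 39) should leave the subset after the first step. Applying the idea to the paper's setting would require replacing the placeholder `interval_to_vector` and `cov_mat` with the interval-valued covariance the authors actually define.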
Funding
This work was supported by the National Natural Science Foundation of China (No. 72071008) and the Open Research Fund of the Key Laboratory of Advanced Theory and Application in Statistics and Data Science (East China Normal University), Ministry of Education.
Author information
Contributions
WT: Conceptualization, Methodology, Software, Validation, Formal analysis, Investigation, Writing (original draft). ZQ: Conceptualization, Methodology, Validation, Supervision, Writing (review & editing).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animal rights
This article does not contain any studies with human participants performed by any of the authors.
Informed consent
Informed consent was obtained from all individual participants included in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Electronic supplementary material is available in the online version of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Tian, W., Qin, Z. The minimum covariance determinant estimator for interval-valued data. Stat Comput 34, 80 (2024). https://doi.org/10.1007/s11222-024-10386-9
DOI: https://doi.org/10.1007/s11222-024-10386-9