Abstract
Outliers contaminating data sets are a challenge to statistical estimators. Even a small fraction of outlying observations can heavily influence most classical statistical methods. In this paper we propose generalized spherical principal component analysis, a new robust version of principal component analysis that is based on the generalized spatial sign covariance matrix. Theoretical properties of the proposed method including influence functions, breakdown values and asymptotic efficiencies are derived. These theoretical results are complemented with an extensive simulation study and two real-data examples. We illustrate that generalized spherical principal component analysis can combine great robustness with solid efficiency properties, in addition to a low computational cost.
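The idea behind the method can be illustrated with a minimal sketch. It assumes that the generalized spatial sign covariance matrix is the average of outer products of centered observations after each has been rescaled by a radial function of its distance to the center, and that the principal components are the leading eigenvectors of that matrix. The function names `winsor_weights` and `gsscm_pca`, the coordinatewise median as center, the Winsor-type radial function, and the cutoff at the median distance are all illustrative assumptions, not the choices made in the paper.

```python
import numpy as np


def winsor_weights(r, cutoff):
    """Radial function: weight 1 inside the cutoff, cutoff / r outside (assumed form)."""
    w = np.ones_like(r)
    outside = r > cutoff
    w[outside] = cutoff / r[outside]
    return w


def gsscm_pca(X, n_components=2):
    """PCA on a generalized spatial sign covariance matrix (illustrative sketch)."""
    # Assumption: the coordinatewise median serves as a simple robust center.
    center = np.median(X, axis=0)
    Z = X - center
    r = np.linalg.norm(Z, axis=1)
    # Assumption: the cutoff is the median distance to the center.
    cutoff = np.median(r)
    # Rescale each centered observation by its radial weight.
    G = Z * winsor_weights(r, cutoff)[:, None]
    # Average of outer products of the rescaled observations.
    S = G.T @ G / X.shape[0]
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1][:n_components]
    return eigvals[order], eigvecs[:, order], center


# Synthetic example: clean data spread mainly along the first axis,
# plus a small cluster of far outliers along the second axis.
rng = np.random.default_rng(0)
clean = rng.normal(size=(500, 3)) * np.array([5.0, 1.0, 0.5])
outliers = rng.normal(loc=[0.0, 40.0, 0.0], scale=0.5, size=(25, 3))
X = np.vstack([clean, outliers])

eigvals, components, center = gsscm_pca(X, n_components=2)
print(components[:, 0])  # leading direction estimated from the contaminated data
```

Because observations far from the center are shrunk onto a sphere of radius `cutoff`, the outlying cluster can contribute only a bounded amount to the scatter matrix, which is why the leading direction is expected to stay close to the first axis here.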
Acknowledgements
The first author is supported by Fonds Wetenschappelijk Onderzoek - Vlaanderen (FWO) as a PhD fellow Fundamental Research (PhD fellowship 11K5523N).
About this article
Cite this article
Leyder, S., Raymaekers, J. & Verdonck, T. Generalized spherical principal component analysis. Stat Comput 34, 104 (2024). https://doi.org/10.1007/s11222-024-10413-9
DOI: https://doi.org/10.1007/s11222-024-10413-9