Abstract
The geometric median covariation matrix is a robust multivariate indicator of dispersion that can be extended to infinite-dimensional spaces. We define estimators, based on recursive algorithms, that can be updated simply at each new observation and can rapidly process large samples of high-dimensional data without storing all the data in memory. Asymptotic convergence properties of the recursive algorithms are studied under weak conditions in general separable Hilbert spaces. The computation of the principal components can also be performed online, and this approach can be useful for online outlier detection. A simulation study shows that this robust indicator is a competitive alternative to the minimum covariance determinant when the dimension of the data is small, and to robust principal components analysis based on projection pursuit and spherical projections for high-dimensional data. An illustration on a large, high-dimensional dataset consisting of individual TV audiences measured at a minute scale over a period of 24 h confirms the interest of robust principal components analysis based on the median covariation matrix. All studied algorithms are available in the R package Gmedian on CRAN.
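The recursive updates mentioned in the abstract can be illustrated with a small sketch. It assumes the usual averaged stochastic gradient form with step size gamma_n = c_gamma * n^(-alpha), which the abstract itself does not spell out. The function names, the default values `c_gamma=2.0` and `alpha=0.75`, and the toy data are all illustrative assumptions; this is not the API of the Gmedian package.

```python
import math
import random


def online_median_and_mcm(stream, dim, c_gamma=2.0, alpha=0.75):
    """One-pass averaged stochastic gradient sketch (assumed form).

    Returns averaged estimates of the geometric median and of the
    median covariation matrix (MCM), using O(dim^2) memory.
    """
    m = [0.0] * dim                              # gradient iterate, median
    m_bar = [0.0] * dim                          # averaged median estimate
    V = [[0.0] * dim for _ in range(dim)]        # gradient iterate, MCM
    V_bar = [[0.0] * dim for _ in range(dim)]    # averaged MCM estimate
    for n, x in enumerate(stream, start=1):
        gamma = c_gamma / n ** alpha             # decreasing step size
        # MCM step: move V towards (x - m_bar)(x - m_bar)^T by a step
        # of Frobenius length gamma (bounded influence of each point).
        d = [x[i] - m_bar[i] for i in range(dim)]
        R = [[d[i] * d[j] - V[i][j] for j in range(dim)] for i in range(dim)]
        norm_R = math.sqrt(sum(v * v for row in R for v in row))
        if norm_R > 0.0:
            for i in range(dim):
                for j in range(dim):
                    V[i][j] += gamma * R[i][j] / norm_R
        # Median step: move m towards x by a step of length gamma.
        r = [x[i] - m[i] for i in range(dim)]
        norm_r = math.sqrt(sum(v * v for v in r))
        if norm_r > 0.0:
            for i in range(dim):
                m[i] += gamma * r[i] / norm_r
        # Running averages of the iterates.
        for i in range(dim):
            m_bar[i] += (m[i] - m_bar[i]) / n
            for j in range(dim):
                V_bar[i][j] += (V[i][j] - V_bar[i][j]) / n
    return m_bar, V_bar


def simulate(n, seed=0):
    """Toy stream (assumption): Gaussian cloud around (1, -2), 5% outliers."""
    rng = random.Random(seed)
    for i in range(n):
        if i % 20 == 19:
            yield [50.0, 50.0]                   # gross outlier
        else:
            yield [1.0 + 2.0 * rng.gauss(0, 1), -2.0 + 0.5 * rng.gauss(0, 1)]


median, mcm = online_median_and_mcm(simulate(5000), dim=2)
print("median estimate:", median)
print("MCM estimate:", mcm)
```

Each observation is used once and then discarded, so memory does not grow with the sample size. Because every update has a norm bounded by the step size, a single gross outlier can move the estimates only slightly, unlike the sample mean and covariance.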
Notes
In this subsection, vectors and matrices will be denoted with bold symbols and letters to make a clear distinction with functions and operators.
Acknowledgements
We thank the two anonymous referees for their comments and suggestions that helped us to improve the presentation of the paper. We thank the company Médiamétrie for allowing us to illustrate our methodologies with their data. We also thank Dr. Peggy Cénac for a careful reading of the proofs.
About this article
Cite this article
Cardot, H., Godichon-Baggioni, A. Fast estimation of the median covariation matrix with application to online robust principal components analysis. TEST 26, 461–480 (2017). https://doi.org/10.1007/s11749-016-0519-x
DOI: https://doi.org/10.1007/s11749-016-0519-x