TEST, Volume 26, Issue 3, pp 461–480

Fast estimation of the median covariation matrix with application to online robust principal components analysis

  • Hervé Cardot (corresponding author)
  • Antoine Godichon-Baggioni
Original Paper

Abstract

The geometric median covariation matrix is a robust multivariate indicator of dispersion that can be extended to infinite-dimensional spaces. We define estimators, based on recursive algorithms, that can be updated simply at each new observation and can rapidly handle large samples of high-dimensional data without storing all the data in memory. Asymptotic convergence properties of the recursive algorithms are studied under weak conditions in general separable Hilbert spaces. The principal components can also be computed online, and this approach can be useful for online outlier detection. A simulation study shows that this robust indicator is a competitive alternative to the minimum covariance determinant when the dimension of the data is small, and to robust principal components analysis based on projection pursuit and spherical projections for high-dimensional data. An illustration on a large, high-dimensional dataset consisting of individual TV audiences measured at a minute scale over a period of 24 h confirms the value of robust principal components analysis based on the median covariation matrix. All studied algorithms are available in the R package Gmedian on CRAN.
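
To make the idea of a recursive, one-pass estimator concrete, here is a minimal sketch in Python. It is not the code of the Gmedian package. It assumes an averaged stochastic gradient scheme with step size c / n^alpha, and it assumes that the median covariation matrix is the geometric median, in Frobenius norm, of the matrices (X - m)(X - m)^T, where m is the geometric median of X. The function name `online_median_covariation` and the default values of `c` and `alpha` are illustrative choices, not taken from the paper.

```python
import math


def online_median_covariation(stream, dim, c=2.0, alpha=0.75):
    """One pass over `stream` (an iterable of length-`dim` sequences).

    Returns (m_bar, V_bar): the averaged estimates of the geometric median
    and of the median covariation matrix. Sketch only; the step size
    c / n**alpha and the zero initialisation are assumptions.
    """
    m = [0.0] * dim                              # gradient iterate, median
    m_bar = [0.0] * dim                          # running average of m
    V = [[0.0] * dim for _ in range(dim)]        # gradient iterate, matrix
    V_bar = [[0.0] * dim for _ in range(dim)]    # running average of V

    for n, x in enumerate(stream, start=1):
        gamma = c / n ** alpha

        # Matrix step: centre x at the current averaged median, then move V
        # a distance gamma (Frobenius norm) towards the rank-one matrix d d^T.
        d = [xi - mi for xi, mi in zip(x, m_bar)]
        G = [[d[i] * d[j] - V[i][j] for j in range(dim)] for i in range(dim)]
        norm_G = math.sqrt(sum(g * g for row in G for g in row))
        if norm_G > 0.0:
            V = [[V[i][j] + gamma * G[i][j] / norm_G for j in range(dim)]
                 for i in range(dim)]
        V_bar = [[V_bar[i][j] + (V[i][j] - V_bar[i][j]) / n
                  for j in range(dim)] for i in range(dim)]

        # Median step: move m a distance gamma towards the new observation.
        r = [xi - mi for xi, mi in zip(x, m)]
        norm_r = math.sqrt(sum(ri * ri for ri in r))
        if norm_r > 0.0:
            m = [mi + gamma * ri / norm_r for mi, ri in zip(m, r)]
        m_bar = [mb + (mi - mb) / n for mb, mi in zip(m_bar, m)]

    return m_bar, V_bar
```

Each observation is read once and then discarded, so memory use is of order dim^2 whatever the sample size. Every update has norm at most gamma, which bounds the influence of a single outlying observation. Robust principal components could then be obtained from an eigendecomposition of `V_bar`.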

Keywords

Functional data · Geometric median · \(L_1\)-median · Recursive robust estimation · Stochastic gradient

Mathematics Subject Classification

62G05 62L20 

Notes

Acknowledgements

We thank the two anonymous referees for their comments and suggestions that helped us to improve the presentation of the paper. We thank the company Médiamétrie for allowing us to illustrate our methodologies with their data. We also thank Dr. Peggy Cénac for a careful reading of the proofs.

Supplementary material

Supplementary material 1 (PDF, 329 KB): 11749_2016_519_MOESM1_ESM.pdf

Copyright information

© Sociedad de Estadística e Investigación Operativa 2016

Authors and Affiliations

  1. Institut de Mathématiques de Bourgogne, Université de Bourgogne Franche-Comté, Dijon, France