Fast estimation of the median covariation matrix with application to online robust principal components analysis

  • Original Paper
  • Journal: TEST

Abstract

The geometric median covariation matrix is a robust multivariate indicator of dispersion that can be extended to infinite-dimensional spaces. We define estimators, based on recursive algorithms, that can be updated simply at each new observation and can rapidly handle large samples of high-dimensional data without storing all the data in memory. Asymptotic convergence properties of the recursive algorithms are studied under weak conditions in general separable Hilbert spaces. The principal components can also be computed online, and this approach can be useful for online outlier detection. A simulation study shows that this robust indicator is a competitive alternative to the minimum covariance determinant when the dimension of the data is small, and to robust principal components analysis based on projection pursuit and spherical projections for high-dimensional data. An illustration on a large, high-dimensional dataset consisting of individual TV audiences measured at a minute scale over a period of 24 h confirms the value of robust principal components analysis based on the median covariation matrix. All studied algorithms are available in the R package Gmedian on CRAN.
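
To make the idea of a recursive, one-pass update concrete, here is a minimal Python sketch of an averaged stochastic gradient scheme for the geometric median and the median covariation matrix. It is an illustration under stated assumptions, not the Gmedian package's code or API: the function name `online_median_covariation`, the step-size constants `c_gamma` and `alpha`, the zero initialisation, and the simulated Gaussian data are all hypothetical choices made for this example.

```python
import math
import random


def online_median_covariation(stream, dim, c_gamma=2.0, alpha=0.75):
    """One-pass averaged stochastic gradient sketch.

    Returns (averaged median estimate, averaged median covariation estimate).
    The step sizes c_gamma * n**(-alpha) are illustrative assumptions.
    """
    m = [0.0] * dim                            # current median iterate
    m_bar = [0.0] * dim                        # running average of median iterates
    V = [[0.0] * dim for _ in range(dim)]      # current covariation iterate
    V_bar = [[0.0] * dim for _ in range(dim)]  # running average of V iterates
    for n, x in enumerate(stream, start=1):
        gamma = c_gamma * n ** (-alpha)
        # Covariation step: move V a distance gamma toward the rank-one matrix
        # (x - m_bar)(x - m_bar)^T, along the unit Frobenius-norm direction.
        y = [xi - mi for xi, mi in zip(x, m_bar)]
        diff = [[y[i] * y[j] - V[i][j] for j in range(dim)] for i in range(dim)]
        norm_f = math.sqrt(sum(v * v for row in diff for v in row))
        if norm_f > 0.0:
            for i in range(dim):
                for j in range(dim):
                    V[i][j] += gamma * diff[i][j] / norm_f
        # Median step: move m a distance gamma toward x.
        r = [xi - mi for xi, mi in zip(x, m)]
        norm_r = math.sqrt(sum(v * v for v in r))
        if norm_r > 0.0:
            m = [mi + gamma * ri / norm_r for mi, ri in zip(m, r)]
        # Averaging step: update the running means of the iterates.
        m_bar = [mb + (mi - mb) / n for mb, mi in zip(m_bar, m)]
        for i in range(dim):
            for j in range(dim):
                V_bar[i][j] += (V[i][j] - V_bar[i][j]) / n
    return m_bar, V_bar


# Usage on simulated data (hypothetical example): independent Gaussian
# coordinates centred at (1, -1) with standard deviations 2 and 1.
random.seed(0)
data = ([random.gauss(1.0, 2.0), random.gauss(-1.0, 1.0)] for _ in range(20000))
median, mcm = online_median_covariation(data, dim=2)
print(median)
print(mcm)
```

Each observation is read once and then discarded, so memory use depends only on the dimension and not on the sample size. Because every update moves the iterate by a bounded step `gamma`, a single extreme observation cannot shift the estimate by more than that amount.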

Notes

  1. In this subsection, vectors and matrices are denoted by bold symbols and letters to distinguish them clearly from functions and operators.

Acknowledgements

We thank the two anonymous referees for their comments and suggestions that helped us to improve the presentation of the paper. We thank the company Médiamétrie for allowing us to illustrate our methodologies with their data. We also thank Dr. Peggy Cénac for a careful reading of the proofs.

Author information

Corresponding author

Correspondence to Hervé Cardot.

Electronic supplementary material

Supplementary material 1 (pdf 329 KB)

About this article

Cite this article

Cardot, H., Godichon-Baggioni, A. Fast estimation of the median covariation matrix with application to online robust principal components analysis. TEST 26, 461–480 (2017). https://doi.org/10.1007/s11749-016-0519-x

  • DOI: https://doi.org/10.1007/s11749-016-0519-x
