Probability Theory and Related Fields, Volume 161, Issue 3–4, pp 781–815

Optimal estimation and rank detection for sparse spiked covariance matrices

  • Tony Cai
  • Zongming Ma
  • Yihong Wu


Abstract

This paper considers a sparse spiked covariance matrix model in the high-dimensional setting and studies minimax estimation of the covariance matrix and of the principal subspace, as well as minimax rank detection. We establish the optimal rate of convergence for estimating the spiked covariance matrix under the spectral norm, which requires techniques significantly different from those used for other structured covariance matrices, such as bandable or sparse covariance matrices. We also establish the minimax rate under the spectral norm for estimating the principal subspace, the primary object of interest in principal component analysis. In addition, we obtain the optimal rate for the rank detection boundary. This result also closes a gap in a recent paper by Berthet and Rigollet (Ann Stat 41(4):1780–1815, 2013), which considers the special case of rank one.
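As a minimal illustrative sketch of the model studied here (not the authors' estimator or rates), the following NumPy snippet draws data from a rank-one spiked covariance matrix Σ = λ v vᵀ + I_p whose spike direction v is supported on only k of the p coordinates, then runs naive PCA on the sample covariance; all variable names (`lam`, `v_hat`, the chosen p, n, k) are hypothetical choices for this toy example:

```python
import numpy as np

# Toy sparse spiked covariance model: Sigma = lam * v v^T + I_p,
# where the unit-norm spike v is supported on k of the p coordinates.
rng = np.random.default_rng(0)
p, n, k, lam = 200, 100, 5, 4.0

v = np.zeros(p)
v[:k] = 1.0 / np.sqrt(k)              # sparse, unit-norm spike direction
Sigma = lam * np.outer(v, v) + np.eye(p)

# Draw n i.i.d. samples from N(0, Sigma).
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

# Naive PCA: leading eigenvector of the sample covariance matrix.
S = X.T @ X / n
eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues in ascending order
v_hat = eigvecs[:, -1]

# |<v_hat, v>| in [0, 1] measures recovery of the principal direction;
# sparsity-exploiting estimators improve on naive PCA when p >> n.
alignment = abs(v_hat @ v)
print(alignment)
```

In the high-dimensional regime the paper addresses (p comparable to or larger than n), the naive estimate degrades, which is what motivates procedures that exploit the group sparsity of the principal subspace.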


Keywords: Covariance matrix · Group sparsity · Low-rank matrix · Minimax rate of convergence · Sparse principal component analysis · Principal subspace · Rank detection

Mathematics Subject Classification (2010)

Primary 62H12; Secondary 62F12, 62G09


References

  1. Berthet, Q., Rigollet, P.: Complexity theoretic lower bounds for sparse principal component detection. J. Mach. Learn. Res. Workshop Conf. Proc. 30, 1–21 (2013)
  2. Berthet, Q., Rigollet, P.: Optimal detection of sparse principal components in high dimension. Ann. Stat. 41(4), 1780–1815 (2013)
  3. Bickel, P., Levina, E.: Covariance regularization by thresholding. Ann. Stat. 36(6), 2577–2604 (2008)
  4. Bickel, P., Levina, E.: Regularized estimation of large covariance matrices. Ann. Stat. 36(1), 199–227 (2008)
  5. Birnbaum, A., Johnstone, I., Nadler, B., Paul, D.: Minimax bounds for sparse PCA with noisy high-dimensional data. Ann. Stat. 41, 1055–1084 (2013)
  6. Bretagnolle, J., Massart, P.: Hungarian constructions from the nonasymptotic viewpoint. Ann. Probab. 17(1), 239–256 (1989)
  7. Bunea, F., Xiao, L.: On the sample covariance matrix estimator of reduced effective rank population matrices, with applications to fPCA. arXiv preprint arXiv:1212.5321 (2012)
  8. Cai, T., Liu, W.: Adaptive thresholding for sparse covariance matrix estimation. J. Am. Stat. Assoc. 106, 672–684 (2011)
  9. Cai, T., Liu, W., Zhou, H.: Estimating sparse precision matrix: optimal rates of convergence and adaptive estimation. arXiv preprint arXiv:1212.2882 (2012)
  10. Cai, T., Ma, Z., Wu, Y.: Sparse PCA: optimal rates and adaptive estimation. Ann. Stat. 41, 3074–3110 (2013)
  11. Cai, T., Ren, Z., Zhou, H.: Optimal rates of convergence for estimating Toeplitz covariance matrices. Probab. Theory Relat. Fields, 1–43 (2012)
  12. Cai, T., Zhang, C.H., Zhou, H.: Optimal rates of convergence for covariance matrix estimation. Ann. Stat. 38(4), 2118–2144 (2010)
  13. Cai, T., Zhou, H.: Optimal rates of convergence for sparse covariance matrix estimation. Ann. Stat. 40, 2389–2420 (2012)
  14. Davidson, K., Szarek, S.: Local operator theory, random matrices and Banach spaces. In: Handbook of the Geometry of Banach Spaces, vol. 1, pp. 317–366. Elsevier Science (2001)
  15. Davis, C., Kahan, W.: The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal. 7(1), 1–46 (1970)
  16. Fan, J., Fan, Y., Lv, J.: High dimensional covariance matrix estimation using a factor model. J. Econom. 147(1), 186–197 (2008)
  17. Friedman, J., Hastie, T., Tibshirani, R.: Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9(3), 432–441 (2008)
  18. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge University Press, Cambridge (1990)
  19. Johnstone, I.: On the distribution of the largest eigenvalue in principal component analysis. Ann. Stat. 29, 295–327 (2001)
  20. Johnstone, I., Lu, A.: On consistency and sparsity for principal components analysis in high dimensions. J. Am. Stat. Assoc. 104, 682–693 (2009)
  21. Jung, S., Marron, J.: PCA consistency in high dimension, low sample size context. Ann. Stat. 37(6B), 4104–4130 (2009)
  22. Karoui, N.: Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Stat. 36, 2717–2756 (2008)
  23. Klenke, A., Mattner, L.: Stochastic ordering of classical discrete distributions. Adv. Appl. Probab. 42(2), 392–410 (2010)
  24. Kritchman, S., Nadler, B.: Determining the number of components in a factor model from limited noisy data. Chemom. Intell. Lab. Syst. 94(1), 19–32 (2008)
  25. Kritchman, S., Nadler, B.: Non-parametric detection of the number of signals: hypothesis testing and random matrix theory. IEEE Trans. Signal Process. 57(10), 3930–3941 (2009)
  26. Lam, C., Fan, J.: Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Stat. 37, 4254–4278 (2009)
  27. Le Cam, L.: Asymptotic Methods in Statistical Decision Theory. Springer-Verlag, New York (1986)
  28. Lounici, K.: High-dimensional covariance matrix estimation with missing observations. arXiv preprint arXiv:1201.2577 (2012)
  29. Lounici, K.: Sparse principal component analysis with missing observations. arXiv preprint arXiv:1205.7060 (2012)
  30. Lounici, K., Pontil, M., van de Geer, S., Tsybakov, A.: Oracle inequalities and optimal inference under group sparsity. Ann. Stat. 39(4), 2164–2204 (2011)
  31. Ma, Z.: Sparse principal component analysis and iterative thresholding. Ann. Stat. 41, 772–801 (2013)
  32. Onatski, A.: Asymptotics of the principal components estimator of large factor models with weak factors. J. Econom. 168, 244–258 (2012)
  33. Onatski, A., Moreira, M., Hallin, M.: Signal detection in high dimension: the multispiked case. Ann. Stat. 42, 225–254 (2014)
  34. Patterson, N., Price, A., Reich, D.: Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006). doi:10.1371/journal.pgen.0020190
  35. Paul, D.: Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Stat. Sinica 17(4), 1617–1642 (2007)
  36. Ravikumar, P., Wainwright, M., Raskutti, G., Yu, B.: High-dimensional covariance estimation by minimizing \(l_{1}\)-penalized log-determinant divergence. Electron. J. Stat. 5, 935–980 (2011)
  37. Stewart, G., Sun, J.G.: Matrix Perturbation Theory. Academic Press, San Diego (1990)
  38. Tsybakov, A.: Introduction to Nonparametric Estimation. Springer, Berlin (2009)
  39. Vu, V., Lei, J.: Minimax rates of estimation for sparse PCA in high dimensions. In: Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS 2012) (2012)
  40. Vu, V., Lei, J.: Minimax sparse principal subspace estimation in high dimensions. Ann. Stat. 41, 2905–2947 (2013)
  41. Yuan, M.: High dimensional inverse covariance matrix estimation via linear programming. J. Mach. Learn. Res. 99, 2261–2286 (2010)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  1. Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, USA
  2. Department of Electrical and Computer Engineering, University of Illinois Urbana-Champaign, Urbana, USA
