Variance Variation Criterion and Consistency in Estimating the Number of Significant Signals of High-dimensional PCA

Acta Mathematicae Applicatae Sinica, English Series

Abstract

In this paper, we propose a criterion based on the variance variation of the sample eigenvalues to consistently estimate the number of significant components in high-dimensional principal component analysis (PCA). This number equals the number of significant eigenvalues of the covariance matrix of the p-dimensional variables. Using random matrix theory, we establish the consistency of the proposed criterion in two settings: when the significant population eigenvalues tend to infinity, and when they are bounded. Numerical simulations show that, under a finite fourth moment condition and as the dominant population eigenvalues tend to infinity, the probability that our variance variation criterion selects the correct number converges to 1 faster than it does for AIC, BIC and the criterion of Passemier and Yao (PYC) [Estimation of the number of spikes, possibly equal, in the high-dimensional case. J. Multivariate Anal., (2014)]. Moreover, when the maximum eigenvalue is bounded and the gap condition is satisfied, the convergence to 1 is faster than for PYC and AIC, and the advantage over AIC is largest when the sample size is small. Notably, the variance variation criterion substantially improves model selection accuracy over PYC and AIC when the random variables are heavy-tailed or have no finite fourth moment.
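The abstract works in the spiked population model. In this model, a few population eigenvalues (the spikes) stand above an otherwise flat spectrum. The sketch below simulates that setting for illustration only. It is not the variance variation criterion, whose definition is not given in this abstract.

The sketch makes the following assumptions:

* The data are Gaussian, so the finite fourth moment condition holds.
* The noise variance is 1.
* With c = p/n, it uses the known limit from Baik and Silverstein (2006) and Paul (2007). A spike ℓ > 1 + √c produces a sample eigenvalue near ℓ(1 + c/(ℓ − 1)), while the remaining sample eigenvalues stay near or below the bulk edge (1 + √c)².

The function `gap_estimator` is a hypothetical, simplified gap-thresholding rule in the spirit of PYC. Its fixed threshold of 1.0 is an ad hoc assumption, not the calibrated threshold of Passemier and Yao. All function names are illustrative.

```python
import numpy as np


def spiked_sample_eigenvalues(n, p, spikes, rng):
    """Descending eigenvalues of the sample covariance of n draws from
    N(0, Sigma), Sigma = diag(spikes..., 1, ..., 1) (spiked population model)."""
    pop = np.ones(p)
    pop[: len(spikes)] = spikes
    x = rng.standard_normal((n, p)) * np.sqrt(pop)  # scale each column to its variance
    s = x.T @ x / n                                  # sample covariance (mean known to be 0)
    return np.linalg.eigvalsh(s)[::-1]               # eigvalsh is ascending; reverse it


def bbp_limit(ell, c):
    """Limit of the sample eigenvalue for a population spike ell (noise variance 1,
    c = p/n): ell * (1 + c / (ell - 1)) above the threshold 1 + sqrt(c), otherwise
    the bulk edge (1 + sqrt(c))**2."""
    if ell > 1 + np.sqrt(c):
        return ell * (1 + c / (ell - 1))
    return (1 + np.sqrt(c)) ** 2


def gap_estimator(eigs, threshold, kmax):
    """Hypothetical gap-thresholding rule: return the smallest j such that the
    gap lambda_{j+1} - lambda_{j+2} (1-based) falls below threshold."""
    gaps = eigs[:kmax] - eigs[1 : kmax + 1]          # gaps[j] = eigs[j] - eigs[j + 1]
    for j, g in enumerate(gaps):
        if g < threshold:
            return j
    return kmax


rng = np.random.default_rng(0)
n, p, spikes = 2000, 500, [30.0, 15.0, 8.0]
c = p / n
eigs = spiked_sample_eigenvalues(n, p, spikes, rng)
for i, ell in enumerate(spikes):
    print(f"spike {ell}: sample {eigs[i]:.2f}, predicted {bbp_limit(ell, c):.2f}")
print("bulk edge:", (1 + np.sqrt(c)) ** 2, "largest non-spike:", round(eigs[len(spikes)], 3))
print("estimated number of spikes:", gap_estimator(eigs, threshold=1.0, kmax=10))
```

With distinct, well-separated spikes such as 30, 15 and 8, the large gaps occur between the spikes and between the smallest spike and the bulk. Inside the bulk the gaps are tiny, so the rule returns 3. Equal spikes would produce a near-zero gap among the leading eigenvalues and break this naive rule. That is the "possibly equal" case treated by Passemier and Yao (2014).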

References

  1. Akaike, H. Information theory and an extension of the maximum likelihood principle. In: 2nd International Symposium on Information Theory. Springer, Berlin, 1973

  2. Anderson, T.W. Asymptotic theory for principal components. Annals of Mathematical Statistics, 34: 122–148 (1963)

  3. Bai, Z.D., Choi, K.P., Fujikoshi, Y. Consistency of AIC and BIC in estimating the number of significant components in high-dimensional principal component analysis. Annals of Statistics, 46: 1050–1076 (2018)

  4. Bai, Z.D., Silverstein, J.W. Spectral analysis of large dimensional random matrices (Second Edition). Springer, Berlin, 2010

  5. Bai, Z.D., Yao, J.F. Central limit theorems for eigenvalues in a spiked population model. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 44: 447–474 (2008)

  6. Bai, Z.D., Yao, J.F. On sample eigenvalues in a generalized spiked population model. Journal of Multivariate Analysis, 106: 167–177 (2012)

  7. Bai, Z.D., Silverstein, J.W. CLT for linear spectral statistics of large-dimensional sample covariance matrices. Annals of Probability, 32: 553–605 (2004)

  8. Baik, J., Silverstein, J.W. Eigenvalues of large sample covariance matrices of spiked population models. Journal of Multivariate Analysis, 97: 1382–1408 (2006)

  9. Chakraborty, A., Mukherjee, S.S., Chakrabarti, A. High dimensional PCA: a new model selection criterion. arXiv:2011.04470v1, 2020

  10. Ferre, L. Selection of components in principal component analysis: A comparison of methods. Computational Statistics & Data Analysis, 19: 669–682 (1995)

  11. Fujikoshi, Y., Sakurai, T. Some properties of estimation criteria for dimensionality in principal component analysis. American Journal of Mathematical and Management Sciences, 35: 133–142 (2016)

  12. Fujikoshi, Y., Sakurai, T., Yanagihara, H. Consistency of high-dimensional AIC-type and Cp-type criteria in multivariate linear regression. Journal of Multivariate Analysis, 123: 184–200 (2014)

  13. Hotelling, H. Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology, 24: 417–441 (1933)

  14. Johnstone, I.M. On the distribution of the largest eigenvalues in principal components analysis. Annals of Statistics, 29: 295–327 (2001)

  15. Jolliffe, I.T. Principal component analysis (2nd ed.). Springer, Berlin, 2002

  16. Kim, Y., Kwon, S., Choi, H. Consistent model selection criteria on high dimensions. Journal of Machine Learning Research, 13: 1037–1057 (2012)

  17. Luo, W., Li, B. Combining eigenvalues and variation of eigenvectors for order determination. Biometrika, 103: 875–887 (2016)

  18. Luo, W., Li, B. On order determination by predictors augmentation. Biometrika, 108: 557–574 (2021)

  19. Paul, D. Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statistica Sinica, 17: 1617–1642 (2007)

  20. Passemier, D., Yao, J.F. On determining the number of spikes in a high-dimensional spiked population model. Random Matrices: Theory and Applications, 1: 1150002 (2012)

  21. Passemier, D., Yao, J.F. Estimation of the number of spikes, possibly equal, in the high-dimensional case. Journal of Multivariate Analysis, 127: 173–183 (2014)

  22. Pearson, K. On lines and planes of closest fit to systems of points in space. Philosophical Magazine, 11: 559–572 (1901)

  23. Schwarz, G. Estimating the dimension of a model. Annals of Statistics, 6: 461–464 (1978)

  24. Shao, J. An asymptotic theory for linear model selection. Statistica Sinica, 7: 221–264 (1997)

  25. Shibata, R. Selection of the order of an autoregressive model by Akaike information criterion. Biometrika, 63: 117–126 (1976)

  26. Wang, Q., Yao, J. On the sphericity test with large-dimensional observations. Electronic Journal of Statistics, 7: 2164–2192 (2013)

  27. Yanagihara, H., Wakaki, H., Fujikoshi, Y. A consistency property of the AIC for multivariate linear models when the dimension and the sample size are large. Electronic Journal of Statistics, 43: 231–245 (2015)

  28. Yang, Y. Can the strengths of AIC and BIC be shared? A conflict between model identification and regression. Biometrika, 92: 937–950 (2005)

Acknowledgments

The authors sincerely thank the editor, the associate editor and the two referees for their constructive comments and suggestions that have substantially improved the original manuscript.

Author information

Correspondence to Heng-jian Cui.

Additional information

This work is partly supported by National Natural Science Foundation of China (Nos: 12031016, 11971324, 11471223); Foundations of Science and Technology Innovation Service Capacity Building, Interdisciplinary Construction of Bioinformatics and Statistics, and Academy for Multidisciplinary Studies, Capital Normal University, Beijing.

About this article

Cite this article

Wang, Gp., Cui, Hj. Variance Variation Criterion and Consistency in Estimating the Number of Significant Signals of High-dimensional PCA. Acta Math. Appl. Sin. Engl. Ser. 38, 513–531 (2022). https://doi.org/10.1007/s10255-022-1094-4

  • DOI: https://doi.org/10.1007/s10255-022-1094-4
