Abstract
Principal component analysis (PCA) is a widely used statistical tool for dimension reduction. An important issue in PCA is determining the rank, i.e., the number of dominant eigenvalues of the covariance matrix. Among information-based criteria, the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are the two most common. Both use the number of free parameters to assess model complexity, which requires the validity of the simple spiked covariance model. Consequently, AIC and BIC may suffer from model misspecification when the tail eigenvalues do not follow the simple spiked model assumption. To alleviate this difficulty, we adopt the idea of the generalized information criterion (GIC) to propose a model complexity measure for PCA rank selection. The proposed measure takes the sizes of the eigenvalues into account and is therefore more robust to model misspecification. Asymptotic properties of our GIC are established under the high-dimensional setting, where \(n\rightarrow \infty \) and \(p/n\rightarrow c >0\). Our asymptotic results show that GIC is better than AIC at excluding noise eigenvalues, and is more sensitive than BIC in detecting signal eigenvalues. Numerical studies and a real data example are presented.
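To make the rank-selection setting concrete, the following is a minimal sketch of information-criterion rank selection for PCA in the classical Wax–Kailath style, where the fit term compares the geometric and arithmetic means of the tail eigenvalues and the penalty counts the free parameters of the rank-\(k\) spiked model. It is not the paper's GIC: the function name `select_rank`, the parameter count `d`, and the AIC/BIC penalty constants are illustrative assumptions, shown only to fix ideas about how AIC- and BIC-type criteria trade off fit against the number of free parameters.

```python
import numpy as np

def select_rank(X, penalty="bic"):
    """Select a PCA rank by an AIC/BIC-style criterion (illustrative sketch,
    Wax-Kailath form; NOT the GIC proposed in the paper)."""
    n, p = X.shape
    S = np.cov(X, rowvar=False)                   # p x p sample covariance
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]    # eigenvalues, descending
    lam = np.maximum(lam, 1e-12)                  # guard tiny negative values
    crits = []
    for k in range(p):                            # candidate ranks 0..p-1
        tail = lam[k:]                            # the p-k smallest eigenvalues
        log_g = np.log(tail).mean()               # log geometric mean of tail
        log_a = np.log(tail.mean())               # log arithmetic mean of tail
        # -2 log-likelihood of the tail under isotropic noise (>= 0 by AM-GM)
        neg2ll = -2.0 * n * (p - k) * (log_g - log_a)
        # free parameters of the rank-k spiked model (up to an additive constant)
        d = k * (2 * p - k + 1) / 2
        pen = np.log(n) if penalty == "bic" else 2.0
        crits.append(neg2ll + pen * d)
    return int(np.argmin(crits))
```

Because the penalty grows with \(k\) while the fit term shrinks, a larger penalty constant (BIC's \(\log n\) versus AIC's 2) can only select a rank that is no larger, which mirrors the abstract's point that AIC is less conservative than BIC.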
Funding
Funding was provided by the Ministry of Science and Technology, Taiwan (110-2118-M-002-001-MY3 for HH; MOST 107-2118-M-001-012-MY3 for SYH).
Hung, H., Huang, SY. & Ing, CK. A generalized information criterion for high-dimensional PCA rank selection. Stat Papers 63, 1295–1321 (2022). https://doi.org/10.1007/s00362-021-01276-7