Determining the number of canonical correlation pairs for high-dimensional vectors

Annals of the Institute of Statistical Mathematics

Abstract

For two random vectors whose dimensions are both proportional to the sample size, this paper proposes two ridge ratio criteria to determine the number of canonical correlation pairs. The two criteria are based on eigenvalue difference-based and centered eigenvalue-based ridge ratios, respectively. Unlike existing methods, the criteria make the ratio at the target index stand out in a visible “valley-cliff” pattern, and can therefore avoid the locally optimal solutions that often arise when eigenvalues have multiplicity greater than one. Numerical studies also suggest that the proposed criteria have advantages over existing methods: the scree plot-based method, which is not a visualization method and underestimates the number of pairs more seriously than the proposed criteria; the AIC and \(C_p\) criteria, which often severely overestimate the number; and the BIC criterion, which suffers from very serious underestimation. A real data set is analyzed for illustration.
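
To make the idea of a difference-based ridge ratio concrete, the following is a minimal sketch and not the paper's exact criteria, whose definitions are not given in this abstract. It assumes a generic form: with ordered eigenvalues \(\lambda_1 \ge \lambda_2 \ge \dots\) and differences \(\delta_j = \lambda_j - \lambda_{j+1}\), the ratio is \(r_j = (\delta_{j+1} + c)/(\delta_j + c)\), and the estimate is the largest index whose ratio falls below a threshold. The function names, the ridge value `ridge` (the \(c\) above) and the threshold `tau` are all hypothetical choices made for illustration.

```python
import numpy as np


def squared_canonical_correlations(X, Y):
    """Squared sample canonical correlations, largest first.

    Uses the QR/SVD route: the singular values of Qx^T Qy are the sample
    canonical correlations when the centered X and Y have full column rank
    (this needs more rows than columns in each block).
    """
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    # Clip guards against tiny floating-point overshoot beyond 1.
    return np.clip(s, 0.0, 1.0) ** 2


def ridge_ratio_estimate(eigvals, ridge, tau):
    """Illustrative difference-based ridge ratio rule (assumed form).

    ratios[i] corresponds to r_{i+1} = (delta_{i+2} + ridge) / (delta_{i+1} + ridge).
    Returns the largest 1-based index whose ratio is <= tau (0 if none),
    together with the ratios themselves.
    """
    lam = np.asarray(eigvals, dtype=float)
    diffs = lam[:-1] - lam[1:]
    ratios = (diffs[1:] + ridge) / (diffs[:-1] + ridge)
    below = np.nonzero(ratios <= tau)[0]
    k_hat = int(below[-1]) + 1 if below.size else 0
    return k_hat, ratios


# Two eigenvalues separated from a tightly packed bulk.
lam = [0.9, 0.8, 0.3, 0.29, 0.28, 0.27]
k_hat, ratios = ridge_ratio_estimate(lam, ridge=0.01, tau=0.5)
print(k_hat)  # 2
```

In this toy input the ratio at index 2 is small (the "valley") and the ratios after it sit near 1, because the ridge term dominates the tiny differences in the bulk. This is the kind of pattern the abstract describes, under the assumptions stated above.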

Acknowledgements

The authors would like to thank the editor, the associate editor and two anonymous referees for their constructive suggestions and comments, which led to the improvement of an earlier version of the manuscript.

Funding

The research described herein was supported by a grant (HKBU12303419) from The University Grants Council of Hong Kong, and a grant from The National Natural Science Foundation of China (NSFC11671042).

Author information

Corresponding author

Correspondence to Lixing Zhu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 58 KB)

About this article

Cite this article

Zheng, J., Zhu, L. Determining the number of canonical correlation pairs for high-dimensional vectors. Ann Inst Stat Math 73, 737–756 (2021). https://doi.org/10.1007/s10463-020-00776-x

  • DOI: https://doi.org/10.1007/s10463-020-00776-x
