Canonical Dependency Analysis Using a Bias-Corrected \(\chi ^2\) Statistics Matrix

  • Original Article
Journal of Statistical Theory and Practice

Abstract

Canonical correlation analysis and canonical covariance analysis are popular dimension-reduction methods for analyzing two datasets. Because these methods seek a subspace that maximizes correlation (or covariance), they are not suitable for datasets with a non-linear relationship. To address this issue, canonical dependency analysis (CDA), which seeks a subspace maximizing dependency, has been proposed. However, applying CDA to datasets containing categorical variables may be inappropriate because existing CDA methods do not handle categorical variables directly; moreover, some of these methods require time-consuming hyper-parameter tuning. We therefore propose a quantification method comprising a CDA that minimizes the distance between the dependency matrix and its estimate, together with a procedure for computing a bias-corrected \(\chi ^2\) statistics matrix in a moderate amount of time. We derive an explicit update formula for the parameter estimates using the majorization technique and compute the bias-corrected \(\chi ^2\) statistics matrix without hyper-parameters. We then apply the method to both simulated and real datasets. On the simulated data, the proposed method performs best when the data include categorical variables, and the application to real data yields reasonable results.
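
As a rough illustration of the kind of pairwise dependency measure that such a matrix can collect, the sketch below computes a bias-corrected statistic in the spirit of Bergsma's (2013) correction of Cramér's \(V\), built from the Pearson \(\chi ^2\) statistic of two categorical variables, and assembles these values over all column pairs of two datasets. This is only a minimal sketch under assumed conditions (every column is treated as categorical; the function names and the use of Cramér's \(V\) normalization are illustrative choices), not the authors' exact construction.

```python
import numpy as np

def bias_corrected_cramers_v(x, y):
    """Bias-corrected Cramer's V (Bergsma, 2013) between two categorical vectors."""
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    r, c, n = xi.max() + 1, yi.max() + 1, len(x)
    # Observed contingency table and expected counts under independence.
    obs = np.zeros((r, c))
    np.add.at(obs, (xi, yi), 1)
    exp = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / n
    chi2 = ((obs - exp) ** 2 / exp).sum()  # Pearson chi-squared statistic
    # Bias-corrected phi^2 and bias-corrected numbers of levels (Bergsma, 2013).
    phi2 = max(0.0, chi2 / n - (r - 1) * (c - 1) / (n - 1))
    r_t = r - (r - 1) ** 2 / (n - 1)
    c_t = c - (c - 1) ** 2 / (n - 1)
    denom = min(r_t - 1, c_t - 1)
    return np.sqrt(phi2 / denom) if denom > 0 else 0.0

def dependency_matrix(X, Y):
    """p x q matrix of pairwise bias-corrected statistics between columns of X and Y."""
    return np.array([[bias_corrected_cramers_v(X[:, j], Y[:, k])
                      for k in range(Y.shape[1])]
                     for j in range(X.shape[1])])
```

The correction term \((r-1)(c-1)/(n-1)\) removes the upward bias that \(\chi ^2/n\) exhibits under independence in finite samples, which is why no hyper-parameter (e.g., a kernel bandwidth) is needed; in the proposed method, a matrix of this kind serves as the dependency target that the estimated matrix is fitted to.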



Author information

Corresponding author

Correspondence to Jun Tsuchida.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tsuchida, J., Yadohisa, H. Canonical Dependency Analysis Using a Bias-Corrected \(\chi ^2\) Statistics Matrix. J Stat Theory Pract 18, 7 (2024). https://doi.org/10.1007/s42519-023-00360-5
