Canonical Dependency Analysis Using a Bias-Corrected $\chi ^2$ Statistics Matrix

Tsuchida, Jun; Yadohisa, Hiroshi

doi:10.1007/s42519-023-00360-5

Original Article
Published: 08 January 2024

Journal of Statistical Theory and Practice, volume 18, article number 7 (2024)

Abstract

Canonical correlation analysis and canonical covariance analysis are popular dimension reduction methods for analyzing two datasets jointly. Because these methods seek a subspace that maximizes correlation (or covariance), they are unsuitable for datasets with a non-linear relationship. To address this issue, researchers have proposed canonical dependency analysis (CDA), which seeks a subspace that maximizes dependency. However, CDA may not be appropriate for datasets with categorical variables because it does not handle categorical variables directly. Moreover, some methods require time-consuming hyper-parameter tuning. We therefore propose a quantification method that combines a CDA, which minimizes the distance between the dependency matrix and the estimated matrix, with a method for calculating a bias-corrected $\chi ^2$ statistics matrix in a moderate amount of time. We derive an explicit update formula for parameter estimation using the majorization technique and calculate the bias-corrected $\chi ^2$ statistics matrix without hyper-parameters. We apply the method to both simulated and real datasets. On the simulated data, the proposed method performs best when the data include some categorical variables. On the real datasets, it yields reasonable results.
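
The abstract does not give the formula for the bias-corrected $\chi ^2$ statistics matrix, so the following is only a minimal sketch of one plausible building block. It assumes the bias correction for Cramér's V published by Bergsma (2013), and it handles only pairs of categorical variables. The function names `bias_corrected_cramers_v` and `dependency_matrix` are hypothetical and do not come from the article. Like the approach described in the abstract, the computation needs no hyper-parameters.

```python
import math
from collections import Counter


def bias_corrected_cramers_v(x, y):
    """Bias-corrected Cramér's V for two categorical sequences.

    Assumption: uses Bergsma's (2013) correction; the article's exact
    statistic may differ.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    joint = Counter(zip(x, y))   # observed cell counts
    row = Counter(x)             # row marginals
    col = Counter(y)             # column marginals
    r, c = len(row), len(col)

    # Pearson chi-squared statistic, summed over all cells (empty ones too).
    chi2 = 0.0
    for a, n_a in row.items():
        for b, n_b in col.items():
            expected = n_a * n_b / n
            observed = joint.get((a, b), 0)
            chi2 += (observed - expected) ** 2 / expected

    phi2 = chi2 / n
    # Subtract the expected value of phi^2 under independence, floor at zero.
    phi2_corr = max(0.0, phi2 - (r - 1) * (c - 1) / (n - 1))
    # Corrected numbers of rows and columns.
    r_corr = r - (r - 1) ** 2 / (n - 1)
    c_corr = c - (c - 1) ** 2 / (n - 1)

    denom = min(r_corr - 1, c_corr - 1)
    if denom <= 0:               # a constant variable carries no dependency
        return 0.0
    return math.sqrt(phi2_corr / denom)


def dependency_matrix(columns):
    """Pairwise matrix of bias-corrected Cramér's V over categorical columns."""
    p = len(columns)
    return [
        [bias_corrected_cramers_v(columns[i], columns[j]) for j in range(p)]
        for i in range(p)
    ]
```

Calling `dependency_matrix([x1, x2, y1])` on a list of equal-length categorical columns returns a symmetric matrix with entries in [0, 1]. How such a matrix would feed into the CDA step is not described in the abstract and is not sketched here.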

Author information

Authors and Affiliations

Faculty of Data Science, Kyoto Women’s University, Kyoto, Japan
Jun Tsuchida
Faculty of Culture and Information Science, Doshisha University, Kyoto, Japan
Hiroshi Yadohisa

Corresponding author

Correspondence to Jun Tsuchida.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Tsuchida, J., Yadohisa, H. Canonical Dependency Analysis Using a Bias-Corrected $\chi ^2$ Statistics Matrix. J Stat Theory Pract 18, 7 (2024). https://doi.org/10.1007/s42519-023-00360-5

Accepted: 02 December 2023
Published: 08 January 2024
DOI: https://doi.org/10.1007/s42519-023-00360-5
