Abstract
Vichi (Advances in Data Analysis and Classification, 11:563–591, 2017) proposed disjoint factor analysis (DFA), a factor analysis procedure constrained so that the variables are mutually disjoint: in a DFA solution, each variable loads only a single factor among the multiple ones, which implies that the variables are clustered into exclusive groups. Such variable clustering is considered useful for high-dimensional data with far more variables than observations. However, the feasibility of DFA for high-dimensional data was not examined in Vichi (2017). Thus, one purpose of this paper is to show the feasibility and usefulness of DFA for high-dimensional data. Another purpose is to propose a new computational procedure for DFA in which an EM algorithm is used. This procedure, called EM-DFA, serves the same purpose as the original procedure in Vichi (2017), but more efficiently. Numerical studies demonstrate that both DFA and EM-DFA cluster variables fairly well, with EM-DFA the more computationally efficient.
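To make the disjointness constraint concrete, the following minimal sketch (assuming NumPy; the loading values are illustrative, not from the paper) shows a loading matrix in which every row (variable) has exactly one nonzero entry, so the column index of that entry partitions the variables into exclusive groups:

```python
import numpy as np

# Hypothetical disjoint loading matrix: 6 variables, 2 factors.
# Each row (variable) loads exactly one factor, so the variables
# fall into the exclusive groups {1, 2, 3} and {4, 5, 6}.
A = np.array([
    [0.9, 0.0],
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.0, 0.8],
    [0.0, 0.9],
])

# Disjointness check: exactly one nonzero loading per row.
assert (np.count_nonzero(A, axis=1) == 1).all()

# The implied variable clustering: index of the single loaded factor.
clusters = np.argmax(np.abs(A), axis=1)
print(clusters.tolist())  # [0, 0, 0, 1, 1, 1]
```

A non-disjoint loading matrix would have cross-loadings (two or more nonzeros in a row), and no such clean partition of the variables would follow.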
References
Adachi, K. (2013). Factor analysis with EM algorithm never gives improper solutions when sample covariance and initial parameter matrices are proper. Psychometrika, 78, 380–394.
Adachi, K. (2016). Three-way principal component analysis with its applications to psychology. In T. Sakata (Ed.), Applied matrix and tensor variate data analysis (pp. 1–21). Springer.
Adachi, K. (2019). Factor analysis: Latent variable, matrix decomposition, and constrained uniqueness formulations. WIREs Computational Statistics. https://onlinelibrary.wiley.com/doi/abs/10.1002/wics.1458. Accessed 19 Mar 2019.
Adachi, K., & Trendafilov, N. T. (2016). Sparse principal component analysis subject to prespecified cardinality of loadings. Computational Statistics, 31, 1403–1427.
Adachi, K., & Trendafilov, N. T. (2018a). Sparsest factor analysis for clustering variables: A matrix decomposition approach. Advances in Data Analysis and Classification, 12, 559–585.
Adachi, K., & Trendafilov, N. T. (2018b). Some mathematical properties of the matrix decomposition solution in factor analysis. Psychometrika, 83, 407–424.
Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317–332.
Bartholomew, D., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach (3rd ed.). Wiley.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39, 1–38.
Gan, G., Ma, C., & Wu, J. (2007). Data clustering: Theory, algorithms, and applications. Society for Industrial and Applied Mathematics (SIAM).
Guttman, L. (1954). Some necessary conditions for common-factor analysis. Psychometrika, 19, 149–160.
Hirose, K., & Yamamoto, M. (2015). Sparse estimation via nonconcave penalized likelihood in factor analysis model. Statistics and Computing, 25, 863–875.
Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32, 443–482.
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141–151.
Koch, I. (2014). Analysis of multivariate and high-dimensional data. Cambridge University Press.
Konishi, S., & Kitagawa, G. (2007). Information criteria and statistical modeling. Springer.
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of meaning. University of Illinois Press.
Rubin, D. B., & Thayer, D. T. (1982). EM algorithms for ML factor analysis. Psychometrika, 47, 69–76.
Seber, G. A. F. (2008). A matrix handbook for statisticians. Wiley.
Stegeman, A. (2016). A new method for simultaneous estimation of the factor model parameters, factor scores, and unique parts. Computational Statistics & Data Analysis, 99, 189–203.
Vichi, M. (2017). Disjoint factor analysis with cross-loadings. Advances in Data Analysis and Classification, 11, 563–591.
Vichi, M., & Saporta, G. (2009). Clustering and disjoint principal component analysis with cross-loadings. Computational Statistics & Data Analysis, 53, 3194–3208.
Yanai, H., & Ichikawa, M. (2007). Factor analysis. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics, vol. 26: Psychometrics (pp. 257–296). Elsevier.
Yeung, K. Y., & Ruzzo, W. L. (2001). Principal component analysis for clustering gene expression data. Bioinformatics, 17, 763–774.
Appendix

Here, we consider minimizing loss function (7) over \(\mathbf{b}_k\) with the other parameters kept fixed. For the sake of simplicity, let us omit the subscript k from the symbols in (7). Then, the function simplifies to \(f(\mathbf{b}) = \log|\mathbf{b}\mathbf{b}' + \boldsymbol{\Psi}| + \mathrm{tr}\,\mathbf{S}(\mathbf{b}\mathbf{b}' + \boldsymbol{\Psi})^{-1}\). Using \(\mathbf{W} = \boldsymbol{\Psi}^{-1/2}\mathbf{S}\boldsymbol{\Psi}^{-1/2}\) and

\[ \varphi(\mathbf{b}) = 1 + \mathbf{b}'\boldsymbol{\Psi}^{-1}\mathbf{b}, \quad (20) \]

the loss function can be rewritten as

\[ f(\mathbf{b}) = \log \varphi(\mathbf{b}) + \log|\mathbf{S}| - \log|\mathbf{W}| + \mathrm{tr}\,\mathbf{W} - \varphi(\mathbf{b})^{-1}\mathbf{b}'\boldsymbol{\Psi}^{-1}\mathbf{S}\boldsymbol{\Psi}^{-1}\mathbf{b}, \quad (21) \]

where we have used \(|\mathbf{b}\mathbf{b}' + \boldsymbol{\Psi}| = \varphi(\mathbf{b})|\boldsymbol{\Psi}|\) (Seber, 2008, p. 312), \((\mathbf{b}\mathbf{b}' + \boldsymbol{\Psi})^{-1} = \boldsymbol{\Psi}^{-1} - \varphi(\mathbf{b})^{-1}\boldsymbol{\Psi}^{-1}\mathbf{b}\mathbf{b}'\boldsymbol{\Psi}^{-1}\) (Seber, 2008, p. 309), and \(|\boldsymbol{\Psi}| = |\mathbf{S}|/|\mathbf{W}|\).

Using \(\boldsymbol{\Sigma} = \mathbf{b}\mathbf{b}' + \boldsymbol{\Psi}\), (21) is also expressed as \(f(\mathbf{b}) = \log|\boldsymbol{\Sigma}| + \mathrm{tr}\,\mathbf{S}\boldsymbol{\Sigma}^{-1}\). Its minimizer must satisfy \(\partial f(\mathbf{b})/\partial \mathbf{b} = (\boldsymbol{\Sigma}^{-1} - \boldsymbol{\Sigma}^{-1}\mathbf{S}\boldsymbol{\Sigma}^{-1})\mathbf{b} = \mathbf{0}_m\), or equivalently,

\[ \mathbf{S}\boldsymbol{\Sigma}^{-1}\mathbf{b} = \mathbf{b}. \quad (22) \]

Multiplying both sides by \(\mathbf{b}'\boldsymbol{\Psi}^{-1}\mathbf{b}\) leads to

\[ \mathbf{S}\boldsymbol{\Psi}^{-1}\mathbf{b} = (1 + \mathbf{b}'\boldsymbol{\Psi}^{-1}\mathbf{b})\mathbf{b}, \quad (23) \]

where \(\mathbf{b}\mathbf{b}' = \boldsymbol{\Sigma} - \boldsymbol{\Psi}\) and (22) have been used. We can use (20) to rewrite (23) as \(\mathbf{S}\boldsymbol{\Psi}^{-1}\mathbf{b} = \varphi(\mathbf{b})\mathbf{b}\) and premultiply both sides by \(\boldsymbol{\Psi}^{-1/2}\) to have \(\boldsymbol{\Psi}^{-1/2}\mathbf{S}\boldsymbol{\Psi}^{-1/2}\boldsymbol{\Psi}^{-1/2}\mathbf{b} = \varphi(\mathbf{b})\boldsymbol{\Psi}^{-1/2}\mathbf{b}\), i.e.,

\[ \mathbf{W}\boldsymbol{\Psi}^{-1/2}\mathbf{b} = \varphi(\mathbf{b})\,\boldsymbol{\Psi}^{-1/2}\mathbf{b}. \quad (24) \]

This is an eigen equation showing that, when \(\mathbf{b}\) is optimal, (20) is an eigenvalue of \(\mathbf{W}\) with the corresponding eigenvector \(\boldsymbol{\Psi}^{-1/2}\mathbf{b}\).

Using (20) and (24), we can rewrite the final term on the right side of (21) as \(\varphi(\mathbf{b})^{-1}\mathbf{b}'\boldsymbol{\Psi}^{-1/2}\mathbf{W}\boldsymbol{\Psi}^{-1/2}\mathbf{b} = \varphi(\mathbf{b})^{-1}\mathbf{b}'\boldsymbol{\Psi}^{-1/2}\{\varphi(\mathbf{b})\,\boldsymbol{\Psi}^{-1/2}\mathbf{b}\} = \mathbf{b}'\boldsymbol{\Psi}^{-1}\mathbf{b} = \varphi(\mathbf{b}) - 1\). Thus, (21) is rewritten as

\[ f(\mathbf{b}) = \log \varphi(\mathbf{b}) - \varphi(\mathbf{b}) + 1 + \log|\mathbf{S}| - \log|\mathbf{W}| + \mathrm{tr}\,\mathbf{W}. \quad (25) \]

This shows that minimizing (21) over \(\mathbf{b}\) amounts to maximizing \(g = \varphi(\mathbf{b}) - \log \varphi(\mathbf{b}) - 1\). Here, \(dg/d\varphi(\mathbf{b}) = 1 - 1/\varphi(\mathbf{b}) \geq 0\), since (20) implies \(\varphi(\mathbf{b}) \geq 1\); thus a larger \(\varphi(\mathbf{b})\) leads to a greater \(g\), with \(\mathbf{b}\) subject to (20) being an eigenvalue of \(\mathbf{W}\). Therefore, the maximization is attained by selecting \(\mathbf{b}\) so that (20) equals the largest eigenvalue of \(\mathbf{W}\): \(\lambda_{\max}(\mathbf{W}) = 1 + \mathbf{b}'\boldsymbol{\Psi}^{-1}\mathbf{b}\). This holds for \(\mathbf{b} = \boldsymbol{\Psi}^{1/2}\mathbf{u}\{\lambda_{\max}(\mathbf{W}) - 1\}^{1/2}\), with \(\mathbf{u}\) the unit-length eigenvector of \(\mathbf{W}\) corresponding to \(\lambda_{\max}(\mathbf{W})\), which gives the minimizer of (7).
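This result can be checked numerically. The sketch below (assuming NumPy; the data-generating values are illustrative) builds S from a one-factor model so that the check is exact, forms W = Ψ^(−1/2) S Ψ^(−1/2), and verifies that b = Ψ^(1/2) u {λmax(W) − 1}^(1/2) satisfies the eigen condition S Ψ^(−1) b = φ(b) b with φ(b) = λmax(W):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
b_true = rng.normal(size=p)                   # hypothetical true loading vector
psi = rng.uniform(0.5, 1.5, size=p)           # unique variances (diagonal of Psi)
S = np.outer(b_true, b_true) + np.diag(psi)   # covariance under a one-factor model

# W = Psi^{-1/2} S Psi^{-1/2} (Psi is diagonal, so this is elementwise scaling)
W = S / np.sqrt(np.outer(psi, psi))
eigvals, eigvecs = np.linalg.eigh(W)          # eigenvalues in ascending order
lam_max, u = eigvals[-1], eigvecs[:, -1]      # largest eigenpair of W

# Candidate minimizer: b = Psi^{1/2} u {lam_max - 1}^{1/2}
b = np.sqrt(psi) * u * np.sqrt(lam_max - 1.0)

# phi(b) = 1 + b' Psi^{-1} b must equal lam_max, and S Psi^{-1} b = lam_max b
phi = 1.0 + b @ (b / psi)
assert np.isclose(phi, lam_max)
assert np.allclose(S @ (b / psi), lam_max * b)
```

In this noiseless setting, b also recovers b_true up to sign, since W = vv′ + I with v = Ψ^(−1/2) b_true, whose leading eigenpair is (1 + v′v, v/‖v‖).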
Cai, J., Adachi, K. High-dimensional disjoint factor analysis with its EM algorithm version. Jpn J Stat Data Sci 4, 427β448 (2021). https://doi.org/10.1007/s42081-021-00119-x