Abstract
Vichi (Advances in Data Analysis and Classification, 11:563–591, 2017) proposed disjoint factor analysis (DFA), a factor analysis procedure constrained so that the variables are mutually disjoint: in a DFA solution, each variable loads only a single factor among the multiple ones, which implies that the variables are clustered into exclusive groups. Such variable clustering is considered useful for high-dimensional data with far more variables than observations. However, the feasibility of DFA for high-dimensional data was not examined in Vichi (2017). Thus, one purpose of this paper is to show the feasibility and usefulness of DFA for high-dimensional data. Another purpose is to propose a new computational procedure for DFA in which an EM algorithm is used. This procedure, called EM-DFA, serves the same purpose as the original procedure in Vichi (2017), but more efficiently. Numerical studies demonstrate that both DFA and EM-DFA cluster variables fairly well, with EM-DFA the more computationally efficient.
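To make the disjointness constraint concrete, the following minimal sketch (assuming NumPy; the loading values are illustrative, not from the paper) shows a loading matrix in which every row (variable) has exactly one nonzero entry, so the column index of that entry partitions the variables into exclusive groups:

```python
import numpy as np

# Hypothetical disjoint loading matrix: 6 variables, 2 factors.
# Each row (variable) loads exactly one factor, so the variables
# fall into the exclusive groups {1, 2, 3} and {4, 5, 6}.
A = np.array([
    [0.9, 0.0],
    [0.8, 0.0],
    [0.7, 0.0],
    [0.0, 0.6],
    [0.0, 0.8],
    [0.0, 0.9],
])

# Disjointness check: exactly one nonzero loading per row.
assert (np.count_nonzero(A, axis=1) == 1).all()

# The implied variable clustering: index of the single loaded factor.
clusters = np.argmax(np.abs(A), axis=1)
print(clusters.tolist())  # [0, 0, 0, 1, 1, 1]
```

A non-disjoint loading matrix would have cross-loadings (two or more nonzeros in a row), and no such clean partition of the variables would follow.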
References
Adachi, K. (2013). Factor analysis with EM algorithm never gives improper solutions when sample covariance and initial parameter matrices are proper. Psychometrika, 78, 380–394.
Adachi, K. (2016). Three-way principal component analysis with its applications to psychology. In T. Sakata (Ed.), Applied matrix and tensor variate data analysis (pp. 1–21). Springer.
Adachi, K. (2019). Factor analysis: Latent variable, matrix decomposition, and constrained uniqueness formulations. WIREs Computational Statistics. https://onlinelibrary.wiley.com/doi/abs/10.1002/wics.1458. Accessed 19 Mar 2019.
Adachi, K., & Trendafilov, N. T. (2016). Sparse principal component analysis subject to prespecified cardinality of loadings. Computational Statistics, 31, 1403–1427.
Adachi, K., & Trendafilov, N. T. (2018a). Sparsest factor analysis for clustering variables: A matrix decomposition approach. Advances in Data Analysis and Classification, 12, 559–585.
Adachi, K., & Trendafilov, N. T. (2018b). Some mathematical properties of the matrix decomposition solution in factor analysis. Psychometrika, 83, 407–424.
Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317–332.
Bartholomew, D., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach (3rd ed.). Wiley.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39, 1–38.
Gan, G., Ma, C., & Wu, J. (2007). Data clustering: Theory, algorithms, and applications. Society for Industrial and Applied Mathematics (SIAM).
Guttman, L. (1954). Some necessary conditions for common-factor analysis. Psychometrika, 19, 149–160.
Hirose, K., & Yamamoto, M. (2015). Sparse estimation via nonconcave penalized likelihood in factor analysis model. Statistics and Computing, 25, 863–875.
Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32, 443–482.
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141–151.
Koch, I. (2014). Analysis of multivariate and high-dimensional data. Cambridge University Press.
Konishi, S., & Kitagawa, G. (2007). Information criteria and statistical modeling. Springer.
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of meaning. University of Illinois Press.
Rubin, D. B., & Thayer, D. T. (1982). EM algorithms for ML factor analysis. Psychometrika, 47, 69–76.
Seber, G. A. F. (2008). A matrix handbook for statisticians. Wiley.
Stegeman, A. (2016). A new method for simultaneous estimation of the factor model parameters, factor scores, and unique parts. Computational Statistics & Data Analysis, 99, 189–203.
Vichi, M. (2017). Disjoint factor analysis with cross-loadings. Advances in Data Analysis and Classification, 11, 563–591.
Vichi, M., & Saporta, G. (2009). Clustering and disjoint principal component analysis with cross-loadings. Computational Statistics & Data Analysis, 53, 3194–3208.
Yanai, H., & Ichikawa, M. (2007). Factor analysis. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics, vol. 26: Psychometrics (pp. 257–296). Elsevier.
Yeung, K. Y., & Ruzzo, W. L. (2001). Principal component analysis for clustering gene expression data. Bioinformatics, 17, 763–774.
Appendix

Here, we consider minimizing loss function (7) over \(\mathbf{b}_k\) with the other parameters kept fixed. For the sake of simplicity, let us omit the subscript k from the symbols in (7). Then, the function simplifies to \(f(\mathbf{b}) = \log|\mathbf{b}\mathbf{b}' + \boldsymbol{\Psi}| + \mathrm{tr}\,\mathbf{S}(\mathbf{b}\mathbf{b}' + \boldsymbol{\Psi})^{-1}\). Using \(\mathbf{W} = \boldsymbol{\Psi}^{-1/2}\mathbf{S}\boldsymbol{\Psi}^{-1/2}\) and

\[ \varphi(\mathbf{b}) = 1 + \mathbf{b}'\boldsymbol{\Psi}^{-1}\mathbf{b}, \quad (20) \]

the loss function can be rewritten as

\[ f(\mathbf{b}) = \log \varphi(\mathbf{b}) + \log|\mathbf{S}| - \log|\mathbf{W}| + \mathrm{tr}\,\mathbf{W} - \varphi(\mathbf{b})^{-1}\mathbf{b}'\boldsymbol{\Psi}^{-1}\mathbf{S}\boldsymbol{\Psi}^{-1}\mathbf{b}, \quad (21) \]

where we have used \(|\mathbf{b}\mathbf{b}' + \boldsymbol{\Psi}| = \varphi(\mathbf{b})|\boldsymbol{\Psi}|\) (Seber, 2008, p. 312), \((\mathbf{b}\mathbf{b}' + \boldsymbol{\Psi})^{-1} = \boldsymbol{\Psi}^{-1} - \varphi(\mathbf{b})^{-1}\boldsymbol{\Psi}^{-1}\mathbf{b}\mathbf{b}'\boldsymbol{\Psi}^{-1}\) (Seber, 2008, p. 309), and \(|\boldsymbol{\Psi}| = |\mathbf{S}|/|\mathbf{W}|\).

Using \(\boldsymbol{\Sigma} = \mathbf{b}\mathbf{b}' + \boldsymbol{\Psi}\), (21) is also expressed as \(f(\mathbf{b}) = \log|\boldsymbol{\Sigma}| + \mathrm{tr}\,\mathbf{S}\boldsymbol{\Sigma}^{-1}\). Its minimizer must satisfy \(\partial f(\mathbf{b})/\partial \mathbf{b} = (\boldsymbol{\Sigma}^{-1} - \boldsymbol{\Sigma}^{-1}\mathbf{S}\boldsymbol{\Sigma}^{-1})\mathbf{b} = \mathbf{0}_m\), or equivalently,

\[ \mathbf{S}\boldsymbol{\Sigma}^{-1}\mathbf{b} = \mathbf{b}. \quad (22) \]

Multiplying both sides by \(\mathbf{b}'\boldsymbol{\Psi}^{-1}\mathbf{b}\) leads to

\[ \mathbf{S}\boldsymbol{\Psi}^{-1}\mathbf{b} = (1 + \mathbf{b}'\boldsymbol{\Psi}^{-1}\mathbf{b})\mathbf{b}, \quad (23) \]

where \(\mathbf{b}\mathbf{b}' = \boldsymbol{\Sigma} - \boldsymbol{\Psi}\) and (22) have been used. We can use (20) to rewrite (23) as \(\mathbf{S}\boldsymbol{\Psi}^{-1}\mathbf{b} = \varphi(\mathbf{b})\mathbf{b}\) and premultiply both sides by \(\boldsymbol{\Psi}^{-1/2}\) to have \(\boldsymbol{\Psi}^{-1/2}\mathbf{S}\boldsymbol{\Psi}^{-1/2}\boldsymbol{\Psi}^{-1/2}\mathbf{b} = \varphi(\mathbf{b})\boldsymbol{\Psi}^{-1/2}\mathbf{b}\), i.e.,

\[ \mathbf{W}\boldsymbol{\Psi}^{-1/2}\mathbf{b} = \varphi(\mathbf{b})\,\boldsymbol{\Psi}^{-1/2}\mathbf{b}. \quad (24) \]

This is an eigen equation showing that, when \(\mathbf{b}\) is optimal, (20) is an eigenvalue of \(\mathbf{W}\) with the corresponding eigenvector \(\boldsymbol{\Psi}^{-1/2}\mathbf{b}\).

Using (20) and (24), we can rewrite the final term on the right side of (21) as \(\varphi(\mathbf{b})^{-1}\mathbf{b}'\boldsymbol{\Psi}^{-1/2}\mathbf{W}\boldsymbol{\Psi}^{-1/2}\mathbf{b} = \varphi(\mathbf{b})^{-1}\mathbf{b}'\boldsymbol{\Psi}^{-1/2}\{\varphi(\mathbf{b})\,\boldsymbol{\Psi}^{-1/2}\mathbf{b}\} = \mathbf{b}'\boldsymbol{\Psi}^{-1}\mathbf{b} = \varphi(\mathbf{b}) - 1\). Thus, (21) is rewritten as

\[ f(\mathbf{b}) = \log \varphi(\mathbf{b}) - \varphi(\mathbf{b}) + 1 + \log|\mathbf{S}| - \log|\mathbf{W}| + \mathrm{tr}\,\mathbf{W}. \quad (25) \]

This shows that minimizing (21) over \(\mathbf{b}\) amounts to maximizing \(g = \varphi(\mathbf{b}) - \log \varphi(\mathbf{b}) - 1\). Here, \(dg/d\varphi(\mathbf{b}) = 1 - 1/\varphi(\mathbf{b}) \geq 0\), since (20) implies \(\varphi(\mathbf{b}) \geq 1\); thus a larger \(\varphi(\mathbf{b})\) leads to a greater \(g\), with \(\mathbf{b}\) subject to (20) being an eigenvalue of \(\mathbf{W}\). Therefore, the maximization is attained by selecting \(\mathbf{b}\) so that (20) equals the largest eigenvalue of \(\mathbf{W}\): \(\lambda_{\max}(\mathbf{W}) = 1 + \mathbf{b}'\boldsymbol{\Psi}^{-1}\mathbf{b}\). This holds for \(\mathbf{b} = \boldsymbol{\Psi}^{1/2}\mathbf{u}\{\lambda_{\max}(\mathbf{W}) - 1\}^{1/2}\), with \(\mathbf{u}\) the unit-length eigenvector of \(\mathbf{W}\) corresponding to \(\lambda_{\max}(\mathbf{W})\), which gives the minimizer of (7).
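This result can be checked numerically. The sketch below (assuming NumPy; the data-generating values are illustrative) builds S from a one-factor model so that the check is exact, forms W = Ψ^(−1/2) S Ψ^(−1/2), and verifies that b = Ψ^(1/2) u {λmax(W) − 1}^(1/2) satisfies the eigen condition S Ψ^(−1) b = φ(b) b with φ(b) = λmax(W):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
b_true = rng.normal(size=p)                   # hypothetical true loading vector
psi = rng.uniform(0.5, 1.5, size=p)           # unique variances (diagonal of Psi)
S = np.outer(b_true, b_true) + np.diag(psi)   # covariance under a one-factor model

# W = Psi^{-1/2} S Psi^{-1/2} (Psi is diagonal, so this is elementwise scaling)
W = S / np.sqrt(np.outer(psi, psi))
eigvals, eigvecs = np.linalg.eigh(W)          # eigenvalues in ascending order
lam_max, u = eigvals[-1], eigvecs[:, -1]      # largest eigenpair of W

# Candidate minimizer: b = Psi^{1/2} u {lam_max - 1}^{1/2}
b = np.sqrt(psi) * u * np.sqrt(lam_max - 1.0)

# phi(b) = 1 + b' Psi^{-1} b must equal lam_max, and S Psi^{-1} b = lam_max b
phi = 1.0 + b @ (b / psi)
assert np.isclose(phi, lam_max)
assert np.allclose(S @ (b / psi), lam_max * b)
```

In this noiseless setting, b also recovers b_true up to sign, since W = vv′ + I with v = Ψ^(−1/2) b_true, whose leading eigenpair is (1 + v′v, v/‖v‖).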
Cai, J., Adachi, K. High-dimensional disjoint factor analysis with its EM algorithm version. Jpn J Stat Data Sci 4, 427β448 (2021). https://doi.org/10.1007/s42081-021-00119-x