Abstract
Vichi (Advances in Data Analysis and Classification, 11:563–591, 2017) proposed disjoint factor analysis (DFA), a factor analysis procedure subject to the constraint that the variables are mutually disjoint: in the DFA solution, each variable loads only a single factor among the multiple ones, which implies that the variables are clustered into exclusive groups. Such variable clustering is considered useful for high-dimensional data with many more variables than observations. However, the feasibility of DFA for high-dimensional data was not considered in Vichi (2017). Thus, one purpose of this paper is to show the feasibility and usefulness of DFA for high-dimensional data. Another purpose is to propose a new computational procedure for DFA that uses an EM algorithm. This procedure, called EM-DFA, serves the same purpose as the original one in Vichi (2017), but more efficiently. Numerical studies demonstrate that both DFA and EM-DFA cluster variables fairly well, with EM-DFA being the more computationally efficient.
References
Adachi, K. (2013). Factor analysis with EM algorithm never gives improper solutions when sample covariance and initial parameter matrices are proper. Psychometrika, 78, 380–394.
Adachi, K. (2016). Three-way principal component analysis with its applications to psychology. In T. Sakata (Ed.), Applied matrix and tensor variate data analysis (pp. 1–21). Springer.
Adachi, K. (2019). Factor analysis: Latent variable, matrix decomposition, and constrained uniqueness formulations. WIREs Computational Statistics, https://onlinelibrary.wiley.com/doi/abs/10.1002/wics.1458. Accessed 19 Mar 2019
Adachi, K., & Trendafilov, N. T. (2016). Sparse principal component analysis subject to prespecified cardinality of loadings. Computational Statistics, 31, 1403–1427.
Adachi, K., & Trendafilov, N. T. (2018a). Sparsest factor analysis for clustering variables: A matrix decomposition approach. Advances in Data Analysis and Classification, 12, 559–585.
Adachi, K., & Trendafilov, N. T. (2018b). Some mathematical properties of the matrix decomposition solution in factor analysis. Psychometrika, 83, 407–424.
Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317–332.
Bartholomew, D., Knott, M., & Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach (3rd ed.). Wiley.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society: Series B, 39, 1–38.
Gan, G., Ma, C., & Wu, J. (2007). Data clustering: Theory, algorithms, and applications. Society for Industrial and Applied Mathematics (SIAM).
Guttman, L. (1954). Some necessary conditions for common-factor analysis. Psychometrika, 19, 149–160.
Hirose, K., & Yamamoto, M. (2015). Sparse estimation via nonconcave penalized likelihood in factor analysis model. Statistics and Computing, 25, 863–875.
Jöreskog, K. G. (1967). Some contributions to maximum likelihood factor analysis. Psychometrika, 32, 443–482.
Kaiser, H. F. (1960). The application of electronic computers to factor analysis. Educational and Psychological Measurement, 20, 141–151.
Koch, I. (2014). Analysis of multivariate and high-dimensional data. Cambridge University Press.
Konishi, S., & Kitagawa, G. (2007). Information criteria and statistical modeling. Springer.
Osgood, C. E., Suci, G. J., & Tannenbaum, P. H. (1957). The measurement of meaning. University of Illinois Press.
Rubin, D. B., & Thayer, D. T. (1982). EM algorithms for ML factor analysis. Psychometrika, 47, 69–76.
Seber, G. A. F. (2008). A matrix handbook for statisticians. Wiley.
Stegeman, A. (2016). A new method for simultaneous estimation of the factor model parameters, factor scores, and unique parts. Computational Statistics & Data Analysis, 99, 189–203.
Vichi, M. (2017). Disjoint factor analysis with cross-loadings. Advances in Data Analysis and Classification, 11, 563–591.
Vichi, M., & Saporta, G. (2009). Clustering and disjoint principal component analysis with cross-loadings. Computational Statistics & Data Analysis, 53, 3194–3208.
Yanai, H., & Ichikawa, M. (2007). Factor analysis. In C. R. Rao & S. Sinharay (Eds.), Handbook of statistics, vol. 26: Psychometrics (pp. 257–296). Elsevier.
Yeung, K. Y., & Ruzzo, W. L. (2001). Principal component analysis for clustering gene expression data. Bioinformatics, 17, 763–774.
Appendix
Here, we consider minimizing loss function (7) over bk with the other parameters kept fixed. For simplicity, let us omit the subscript k from the symbols in (7). Then, the function is simplified as f(b) = log|bb′ + Ψ| + trS(bb′ + Ψ)−1. Using W = Ψ−1/2SΨ−1/2 and

ϕ(b) = 1 + b′Ψ−1b, (20)

the loss function can be rewritten as

f(b) = logϕ(b) + log|S| − log|W| + trW − ϕ(b)−1b′Ψ−1/2WΨ−1/2b, (21)
where we have used |bb′ + Ψ| = ϕ(b)|Ψ| (Seber, 2008, p. 312), (bb′ + Ψ)−1 = Ψ−1 − ϕ(b)−1Ψ−1bb′Ψ−1 (Seber, 2008, p. 309), and |Ψ| = |S|/|W|.
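The two matrix identities invoked here can be checked numerically. The following is a minimal sketch, assuming NumPy and a randomly generated loading vector b and diagonal Ψ (these random inputs are illustrative assumptions, not data from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 5
b = rng.normal(size=(p, 1))                   # loading vector for a single factor
psi = np.diag(rng.uniform(0.5, 2.0, size=p))  # diagonal unique-variance matrix Psi
psi_inv = np.linalg.inv(psi)

phi = 1.0 + float(b.T @ psi_inv @ b)          # phi(b) = 1 + b'Psi^{-1}b
sigma = b @ b.T + psi                         # Sigma = bb' + Psi

# determinant identity: |bb' + Psi| = phi(b)|Psi|
assert np.isclose(np.linalg.det(sigma), phi * np.linalg.det(psi))

# inverse identity: (bb' + Psi)^{-1} = Psi^{-1} - phi(b)^{-1} Psi^{-1} bb' Psi^{-1}
lhs = np.linalg.inv(sigma)
rhs = psi_inv - (psi_inv @ b @ b.T @ psi_inv) / phi
assert np.allclose(lhs, rhs)
print("identities verified")
```

Both identities are rank-one special cases of the matrix determinant lemma and the Sherman–Morrison formula, which is why the correction terms involve only the scalar ϕ(b).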
Using Σ = bb′ + Ψ, (21) is also expressed as f(b) = log|Σ| + trSΣ−1. The minimizer must satisfy ∂f(b)/∂b = (Σ−1 − Σ−1SΣ−1)b = 0m, or equivalently,

SΣ−1b = b. (22)
Noting Ψ−1b = Σ−1(bb′ + Ψ)Ψ−1b = Σ−1b(1 + b′Ψ−1b) and premultiplying both sides by S leads to

SΨ−1b = (1 + b′Ψ−1b)b, (23)

where bb′ = Σ − Ψ and (22) have been used. We can use (20) to rewrite (23) as SΨ−1b = ϕ(b)b and premultiply both sides by Ψ−1/2 to have Ψ−1/2SΨ−1/2(Ψ−1/2b) = ϕ(b)(Ψ−1/2b), i.e.,

W(Ψ−1/2b) = ϕ(b)(Ψ−1/2b). (24)
This is an eigenequation showing that, when b is optimal, ϕ(b) in (20) is an eigenvalue of W.
Using (20) and (24), we can rewrite the final term on the right side of (21) as ϕ(b)−1b′Ψ−1/2WΨ−1/2b = ϕ(b)−1b′Ψ−1/2{ϕ(b)(Ψ−1/2b)} = b′Ψ−1b = ϕ(b) − 1. Thus, (21) is rewritten as

f(b) = log|S| − log|W| + trW − {ϕ(b) − logϕ(b) − 1}. (25)
This shows that minimizing (21) over b amounts to maximizing g = ϕ(b) − logϕ(b) − 1. Here, dg/dϕ(b) = 1 − 1/ϕ(b) is positive for ϕ(b) > 1, showing that a larger ϕ(b) leads to a greater g, while b is subject to (20) being an eigenvalue of W. Thus, the maximization is attained by selecting b so that (20) is the largest eigenvalue of W: ρmax(W) = 1 + b′Ψ−1b. This holds for b = Ψ1/2u{ρmax(W) − 1}1/2, with u the unit-length eigenvector of W corresponding to ρmax(W), i.e., (7).
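This optimal choice can also be illustrated numerically. The sketch below, assuming NumPy and randomly generated S and Ψ (illustrative assumptions, not the paper's data), builds b from the largest eigenpair of W and checks that it satisfies the stationary condition SΨ−1b = ϕ(b)b and attains the smallest loss among the eigenvector-based candidates:

```python
import numpy as np

rng = np.random.default_rng(1)
p = 6
A = rng.normal(size=(p, p))
S = A @ A.T + p * np.eye(p)                   # a proper "sample covariance" matrix
psi = np.diag(rng.uniform(0.5, 1.5, size=p))  # diagonal unique-variance matrix Psi
psi_half = np.sqrt(psi)                       # Psi^{1/2} (elementwise sqrt works: diagonal)
psi_inv = np.linalg.inv(psi)

W = np.linalg.inv(psi_half) @ S @ np.linalg.inv(psi_half)
evals, evecs = np.linalg.eigh(W)              # eigenvalues in ascending order
rho_max = evals[-1]
u = evecs[:, [-1]]                            # unit eigenvector for the largest eigenvalue

b = psi_half @ u * np.sqrt(rho_max - 1.0)     # b = Psi^{1/2} u {rho_max(W) - 1}^{1/2}

phi = 1.0 + float(b.T @ psi_inv @ b)
assert np.isclose(phi, rho_max)               # (20) equals the largest eigenvalue of W
assert np.allclose(S @ psi_inv @ b, phi * b)  # stationarity: S Psi^{-1} b = phi(b) b

def f(bv):
    # loss f(b) = log|bb' + Psi| + tr S (bb' + Psi)^{-1}
    sigma = bv @ bv.T + psi
    return np.log(np.linalg.det(sigma)) + np.trace(S @ np.linalg.inv(sigma))

# candidates built from each eigenpair; the largest eigenvalue gives the smallest loss
losses = [f(psi_half @ evecs[:, [k]] * np.sqrt(max(evals[k] - 1.0, 0.0)))
          for k in range(p)]
assert int(np.argmin(losses)) == p - 1
print("largest eigenpair minimizes the loss")
```

Here S is constructed so that every eigenvalue of W exceeds 1, which keeps ρ − 1 positive for all candidates; in the derivation above this corresponds to ϕ(b) > 1.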
Cai, J., Adachi, K. High-dimensional disjoint factor analysis with its EM algorithm version. Jpn J Stat Data Sci 4, 427–448 (2021). https://doi.org/10.1007/s42081-021-00119-x