Abstract
Sparse principal component analysis has been a very active research area over the last decade. It produces component loadings with many zero entries, which facilitates their interpretation and helps avoid redundant variables. Classic factor analysis is another popular dimension reduction technique that shares similar interpretation problems and could greatly benefit from sparse solutions. Unfortunately, very few works consider sparse versions of classic factor analysis. Our goal is to contribute further in this direction. We revisit the most popular procedures for exploratory factor analysis: maximum likelihood and least squares. Sparse factor loadings are obtained by, first, adopting a special reparameterization and, second, introducing additional \(\ell _1\)-norm penalties into the standard factor analysis problems. As a result, we propose sparse versions of the major factor analysis procedures. We illustrate the developed algorithms on well-known psychometric problems, and our sparse solutions are critically compared to those obtained by other existing methods.
References
Absil, P.-A., Mahony, R., & Sepulchre, R. (2008). Optimization algorithms on matrix manifolds. Princeton, NJ: Princeton University Press.
Boumal, N., Mishra, B., Absil, P.-A., & Sepulchre, R. (2014). MANOPT: a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15, 1455–1459.
Choi, J., Zou, H., & Oehlert, G. (2011). A penalized maximum likelihood approach to sparse factor analysis. Statistics and Its Interface, 3, 429–436.
Del Buono, N., & Lopez, L. (2001). Runge–Kutta type methods based on geodesics for systems of ODEs on the Stiefel manifold. BIT Numerical Mathematics, 41(5), 912–923.
Edelman, A., Arias, T. A., & Smith, S. T. (1998). The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20, 303–353.
Fontanella, S., Trendafilov, N., & Adachi, K. (2014). Sparse exploratory factor analysis. In Proceedings of COMPSTAT, 2014 (pp. 281–288).
Hage, C., & Kleinsteuber, M. (2014). Robust PCA and subspace tracking from incomplete observations using \(\ell _0\)-surrogates. Computational Statistics, 29, 467–487.
Harman, H. H. (1976). Modern factor analysis (3rd ed.). Chicago, IL: University of Chicago Press.
Hirose, K., & Yamamoto, M. (2014). Estimation of an oblique structure via penalized likelihood factor analysis. Computational Statistics and Data Analysis, 79, 120–132.
Hirose, K., & Yamamoto, M. (2015). Sparse estimation via nonconcave penalized likelihood in a factor analysis model. Statistics and Computing, 25, 863–875.
Jolliffe, I. T. (2002). Principal component analysis (2nd ed.). New York, NY: Springer-Verlag.
Jöreskog, K. G. (1977). Factor analysis by least-squares and maximum likelihood methods. In K. Enslein, A. Ralston, & H. S. Wilf (Eds.), Mathematical methods for digital computers (pp. 125–153). New York, NY: John Wiley & Sons.
Luss, R., & Teboulle, M. (2013). Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint. SIAM Review, 55, 65–98.
MATLAB. (2014). MATLAB R2014b. New York, NY: The MathWorks Inc.
Mulaik, S. A. (2010). The foundations of factor analysis (2nd ed.). Boca Raton, FL: Chapman and Hall/CRC.
Ning, N., & Georgiou, T. T. (2011). Sparse factor analysis via likelihood and \(\ell _1\)-regularization. In 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), Orlando, FL, USA, December 12–15, 2011.
Trendafilov, N. T. (2003). Dynamical system approach to factor analysis parameter estimation. British Journal of Mathematical and Statistical Psychology, 56, 27–46.
Trendafilov, N. T. (2014). From simple structure to sparse components: A review. Computational Statistics, 29, 431–454.
Trendafilov, N. T., & Adachi, K. (2015). Sparse versus simple structure loadings. Psychometrika, 80, 776–790.
Trendafilov, N. T., & Jolliffe, I. T. (2006). Projected gradient approach to the numerical solution of the SCoTLASS. Computational Statistics and Data Analysis, 50, 242–253.
Wen, Z., & Yin, W. (2013). A feasible method for optimization with orthogonality constraints. Mathematical Programming, 142, 397–434.
Acknowledgements
We are grateful to the Reviewers for the careful reading of the manuscript and their helpful comments. We also thank Dr Kei Hirose, Osaka University, for his help with fanc.
Additional information
This work is supported by a Grant RPG-2013-211 from The Leverhulme Trust, UK.
Appendices
Appendix 1
Here, we find the gradient of the penalty term \(P_\tau (Q)^\top P_\tau (Q)\) in (11) and (12), which can then be combined with the gradients of the objective functions of ML-, LS-, or GLS-EFA. Let us start with
which requires the calculation of \(d(P_\tau )\). At this point, we need an approximation of \(\text{sign}(x)\); we employ the one already used in Trendafilov and Jolliffe (2006), namely \(\text{sign}(x) \approx \tanh (\gamma x)\) for some large \(\gamma > 0\), abbreviated \(\text{th}(\gamma x)\). See also Hage and Kleinsteuber (2014) and Luss and Teboulle (2013). Then
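As a quick numerical illustration (a Python sketch, not part of the original derivation), the quality of the \(\tanh\) surrogate for \(\text{sign}\) can be checked directly:

```python
import numpy as np

# Smooth surrogate for sign(x): th(gamma * x) = tanh(gamma * x).
# For large gamma the approximation is sharp away from the kink at x = 0.
gamma = 1000.0
x = np.linspace(-1.0, 1.0, 201)

approx = np.tanh(gamma * x)
exact = np.sign(x)

# Away from zero the two agree to machine-level accuracy
mask = np.abs(x) > 0.05
max_err = np.max(np.abs(approx[mask] - exact[mask]))
print(max_err)
```

Unlike \(\text{sign}\), the surrogate is differentiable everywhere, which is what makes the gradient computations below possible.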
where \(1_r\) is an \(r \times 1\) vector with unit entries. The next differential to be found is:
where \(1_{p \times r}\) is a \(p \times r\) matrix with unit entries.
Now we are ready to find the gradient \(\nabla _Q\) of the penalty term with respect to Q. To simplify notation, let
and
Going back to (14) and (15), we find that:
making use of the identity \(\text{trace}((A \odot B) C) = \text{trace}(A (B^\top \odot C))\). Thus, the gradient \(\nabla _Q\) of the penalty term with respect to Q is:
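The trace identity is easy to verify numerically (a Python sketch with random matrices, dimensions chosen so that all products are defined):

```python
import numpy as np

rng = np.random.default_rng(1)
p, r = 5, 3
A = rng.standard_normal((p, r))
B = rng.standard_normal((p, r))
C = rng.standard_normal((r, p))

# trace((A o B) C) == trace(A (B^T o C)), with "o" the Hadamard product:
# both sides expand to sum_{i,j} A_ij * B_ij * C_ji
lhs = np.trace((A * B) @ C)
rhs = np.trace(A @ (B.T * C))
print(abs(lhs - rhs))
```

Expanding both traces entrywise shows each side equals \(\sum _{i,j} A_{ij} B_{ij} C_{ji}\), which is why the identity holds for any conformable A, B, C.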
Appendix 2
Here, we summarize some technical details related to the numerical solutions employed in the work.
The gradients of the ML-, LS- and GLS-EFA objective functions with respect to the unknowns \(\{Q, D, \Psi \}\) are given in Trendafilov (2003) as the following block-matrix: \(( -YQD^{2},\; -Q^{\top } Y Q \odot D,\; -Y \odot \Psi )\). For ML-EFA, one has \(Y = 2 R_{ZZ}^{-1} (R-R_{ZZ}) R_{ZZ}^{-1}\), and for LS- and GLS-EFA, it changes to \(Y = 4 (R - R_{ZZ})V^2\). Additionally, we need the gradient \(\nabla _Q\) of the penalty term \(P_\tau (Q)^\top P_\tau (Q)\) with respect to Q, which should be added to \(-YQD^{2}\). Its derivation is given in detail in Appendix 1.
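The LS gradient block for D can be checked against finite differences. The Python sketch below assumes the reparameterized model \(R_{ZZ} = Q D^2 Q^\top + \Psi ^2\) with \(V = I\), which is our reading of Trendafilov (2003); it compares the analytic block \(-Q^\top Y Q \odot D\) with a central difference of the LS objective:

```python
import numpy as np

rng = np.random.default_rng(2)
p, r = 6, 2

# A Stiefel point Q, diagonal scales d (of D), and uniquenesses psi
Q, _ = np.linalg.qr(rng.standard_normal((p, r)))
d = rng.uniform(0.5, 1.5, r)
psi = rng.uniform(0.3, 0.8, p)
R = np.corrcoef(rng.standard_normal((300, p)), rowvar=False)

def ls_objective(d_):
    """LS-EFA discrepancy ||R - (Q diag(d)^2 Q^T + diag(psi)^2)||_F^2."""
    Rzz = Q @ np.diag(d_**2) @ Q.T + np.diag(psi**2)
    return np.sum((R - Rzz)**2)

# Analytic gradient block for D: -(Q^T Y Q) o D, with Y = 4 (R - R_zz)
Rzz = Q @ np.diag(d**2) @ Q.T + np.diag(psi**2)
Y = 4.0 * (R - Rzz)
grad_d = -np.diag(Q.T @ Y @ Q) * d

# Central finite differences on each diagonal entry of D
h = 1e-6
fd = np.empty(r)
for i in range(r):
    dp, dm = d.copy(), d.copy()
    dp[i] += h
    dm[i] -= h
    fd[i] = (ls_objective(dp) - ls_objective(dm)) / (2 * h)

rel_err = np.max(np.abs(grad_d - fd) / (np.abs(fd) + 1e-12))
print(rel_err)  # small: analytic and numerical gradients agree
```

The same finite-difference check applies to the \(\Psi\) block \(-Y \odot \Psi\), and, with the penalty gradient of Appendix 1 added, to the Q block as well.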
The dynamical system approach employed in Trendafilov (2003) can be readily applied to solving (11) and (12). It involves numerical integration of matrix ordinary differential equations (ODEs) for \(\{Q, D, \Psi \}\) defined by their projected gradients. In particular, it involves a projected gradient dynamical system for Q on the Stiefel manifold of all \(p \times r\) orthonormal matrices. A number of specialized numerical methods exist for solving such problems, listed in Trendafilov (2003), e.g., Del Buono and Lopez (2001). In contrast to the standard EFA alternating approaches (Jöreskog, 1977; Mulaik, 2010), the dynamical system approach gives matrix algorithms that produce a simultaneous solution for \(\{Q, D, \Psi \}\), exploiting the geometry of their specific matrix structures. Moreover, such algorithms are globally convergent, i.e., convergence is reached independently of the starting (initial) point (Absil et al., 2008; Trendafilov, 2003).
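For completeness, the basic tangent-space projection underlying such projected gradient methods can be sketched as follows (Python; a generic Euclidean-metric projection and QR retraction in the sense of Edelman et al. (1998) and Absil et al. (2008), not the authors' actual integrator):

```python
import numpy as np

def project_to_tangent(Q, G):
    """Project an ambient gradient G onto the tangent space of the
    Stiefel manifold at Q: xi = G - Q sym(Q^T G)."""
    A = Q.T @ G
    return G - Q @ ((A + A.T) / 2.0)

def qr_retraction(Q, xi, step):
    """Map a tangent descent step back onto the manifold via thin QR."""
    Qnew, _ = np.linalg.qr(Q - step * xi)
    return Qnew

rng = np.random.default_rng(3)
p, r = 8, 3
Q, _ = np.linalg.qr(rng.standard_normal((p, r)))
G = rng.standard_normal((p, r))   # some ambient (Euclidean) gradient

xi = project_to_tangent(Q, G)
Qnext = qr_retraction(Q, xi, 0.1)

# xi is tangent (Q^T xi is skew-symmetric) and Qnext is again orthonormal
tangency = np.max(np.abs(Q.T @ xi + xi.T @ Q))
ortho = np.max(np.abs(Qnext.T @ Qnext - np.eye(r)))
print(tangency, ortho)
```

The continuous-time dynamical system follows the projected gradient flow itself, whereas an iterative scheme of this kind takes discrete steps followed by a retraction; both keep Q on the Stiefel manifold throughout.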
The numerical ODE solvers currently available in MATLAB (MATLAB, 2014) are not suitable for solving large optimization problems. They track the whole trajectory defined by the ODE, which is time-consuming and undesirable when only the asymptotic state is of interest. This limits the application of the proposed approach to solving (11) and (12) for rather small data sets.
An alternative is to employ iterative algorithms working directly on matrix manifolds (Absil et al., 2008; Edelman et al., 1998; Wen & Yin, 2013). The gradients listed above can be readily used for solving (11) and (12) with MANOPT, a free MATLAB-based software package for optimization on matrix manifolds (Boumal et al., 2014). The MANOPT code for solving (11) and (12) can be obtained from the authors upon request and will be available online. Note that by choosing \(\mu = 0\), one can obtain solutions for the standard ML-, LS- and GLS-EFA problems (9) and (10).
Trendafilov, N.T., Fontanella, S. & Adachi, K. Sparse Exploratory Factor Analysis. Psychometrika 82, 778–794 (2017). https://doi.org/10.1007/s11336-017-9575-8