
Sparse Exploratory Factor Analysis


Abstract

Sparse principal component analysis has been a very active research area over the last decade. It produces component loadings with many zero entries, which facilitates their interpretation and helps avoid redundant variables. Classic factor analysis is another popular dimension reduction technique that shares similar interpretation problems and could greatly benefit from sparse solutions. Unfortunately, there are very few works considering sparse versions of classic factor analysis. Our goal is to contribute further in this direction. We revisit the most popular procedures for exploratory factor analysis: maximum likelihood and least squares. Sparse factor loadings are obtained for them by, first, adopting a special reparameterization and, second, introducing additional \(\ell _1\)-norm penalties into the standard factor analysis problems. As a result, we propose sparse versions of the major factor analysis procedures. We illustrate the developed algorithms on well-known psychometric problems, and critically compare our sparse solutions to those obtained by other existing methods.


References

  • Absil, P.-A., Mahony, R., & Sepulchre, R. (2008). Optimization algorithms on matrix manifolds. Princeton, NJ: Princeton University Press.

  • Boumal, N., Mishra, B., Absil, P.-A., & Sepulchre, R. (2014). MANOPT: A Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15, 1455–1459.

  • Choi, J., Zou, H., & Oehlert, G. (2011). A penalized maximum likelihood approach to sparse factor analysis. Statistics and Its Interface, 3, 429–436.

  • Del Buono, N., & Lopez, L. (2001). Runge–Kutta type methods based on geodesics for systems of ODEs on the Stiefel manifold. BIT Numerical Mathematics, 41(5), 912–923.

  • Edelman, A., Arias, T. A., & Smith, S. T. (1998). The geometry of algorithms with orthogonality constraints. SIAM Journal on Matrix Analysis and Applications, 20, 303–353.

  • Fontanella, S., Trendafilov, N., & Adachi, K. (2014). Sparse exploratory factor analysis. In Proceedings of COMPSTAT 2014 (pp. 281–288).

  • Hage, C., & Kleinsteuber, M. (2014). Robust PCA and subspace tracking from incomplete observations using \(\ell _0\)-surrogates. Computational Statistics, 29, 467–487.

  • Harman, H. H. (1976). Modern factor analysis (3rd ed.). Chicago, IL: University of Chicago Press.

  • Hirose, K., & Yamamoto, M. (2014). Estimation of an oblique structure via penalized likelihood factor analysis. Computational Statistics and Data Analysis, 79, 120–132.

  • Hirose, K., & Yamamoto, M. (2015). Sparse estimation via nonconcave penalized likelihood in a factor analysis model. Statistics and Computing, 25, 863–875.

  • Jolliffe, I. T. (2002). Principal component analysis (2nd ed.). New York, NY: Springer-Verlag.

  • Jöreskog, K. G. (1977). Factor analysis by least-squares and maximum likelihood methods. In K. Enslein, A. Ralston, & H. S. Wilf (Eds.), Mathematical methods for digital computers (pp. 125–153). New York, NY: John Wiley & Sons.

  • Luss, R., & Teboulle, M. (2013). Conditional gradient algorithms for rank-one matrix approximations with a sparsity constraint. SIAM Review, 55, 65–98.

  • MATLAB. (2014). MATLAB R2014b. Natick, MA: The MathWorks Inc.

  • Mulaik, S. A. (2010). The foundations of factor analysis (2nd ed.). Boca Raton, FL: Chapman and Hall/CRC.

  • Ning, N., & Georgiou, T. T. (2011). Sparse factor analysis via likelihood and \(\ell _1\)-regularization. In 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), Orlando, FL, USA, December 12–15, 2011.

  • Trendafilov, N. T. (2003). Dynamical system approach to factor analysis parameter estimation. British Journal of Mathematical and Statistical Psychology, 56, 27–46.

  • Trendafilov, N. T. (2014). From simple structure to sparse components: A review. Computational Statistics, 29, 431–454.

  • Trendafilov, N. T., & Adachi, K. (2015). Sparse versus simple structure loadings. Psychometrika, 80, 776–790.

  • Trendafilov, N. T., & Jolliffe, I. T. (2006). Projected gradient approach to the numerical solution of the SCoTLASS. Computational Statistics and Data Analysis, 50, 242–253.

  • Wen, Z., & Yin, W. (2013). A feasible method for optimization with orthogonality constraints. Mathematical Programming, 142, 397–434.


Acknowledgements

We are grateful to the reviewers for their careful reading of the manuscript and their helpful comments. We also thank Dr Kei Hirose, Osaka University, for his help with fanc.

Author information


Corresponding author

Correspondence to Nickolay T. Trendafilov.

Additional information

This work is supported by Grant RPG-2013-211 from The Leverhulme Trust, UK.

Appendices

Appendix 1

Here, we find the gradient of the penalty term \(P_\tau (Q)^\top P_\tau (Q)\) in (11) and (12), which can then be combined with the gradients of the objective functions of ML-, LS-, or GLS-EFA. Let us start with

$$\begin{aligned} d(P_\tau (Q)^\top P_\tau (Q)) = 2 d(P_\tau (Q))^\top P_\tau (Q) = 2 (dP_\tau )^\top P_\tau , \end{aligned}$$
(14)

which requires the calculation of \(d(P_\tau )\). At this point, we need an approximation of \(\text{ sign }(x)\), and we employ the one already used by Trendafilov and Jolliffe (2006), namely \(\text{ sign }(x) \approx \text{ tanh }(\gamma x)\) for some large \(\gamma > 0\), written \(\text{ th }(\gamma x)\) for short. See also Hage and Kleinsteuber (2014) and Luss and Teboulle (2013). Then

$$\begin{aligned} 2(dP_\tau )= & {} (d{\mathbf {q}}_\tau ) \odot [1_r + \text{ th }(\gamma {\mathbf {q}}_\tau )] + {\mathbf {q}}_\tau \odot [1_r - \text{ th }^2(\gamma {\mathbf {q}}_\tau )] \odot \gamma (d{\mathbf {q}}_\tau ),\nonumber \\= & {} (d{\mathbf {q}}_\tau ) \odot \left\{ 1_r + \text{ th }(\gamma {\mathbf {q}}_\tau ) + \gamma {\mathbf {q}}_\tau \odot [1_r - \text{ th }^2(\gamma {\mathbf {q}}_\tau )] \right\} , \end{aligned}$$
(15)

where \(1_r\) is an \(r \times 1\) vector with unit entries. The next differential to be found is:

$$\begin{aligned} d{\mathbf {q}}_\tau= & {} 1_p^\top \left\{ (dQ) \odot \text{ th }(\gamma Q) + Q \odot [1_{p \times r} - \text{ th }^2(\gamma Q)] \odot \gamma (dQ)\right\} \nonumber \\= & {} 1_p^\top \left\{ (dQ) \odot \left\{ \text{ th }(\gamma Q) + (\gamma Q) \odot [1_{p \times r} - \text{ th }^2(\gamma Q)] \right\} \right\} , \end{aligned}$$
(16)

where \(1_{p \times r}\) is a \(p \times r\) matrix with unit entries.

Now we are ready to find the gradient \(\nabla _Q\) of the penalty term with respect to Q. To simplify the notation, let

$$\begin{aligned} {{\mathbf {w}}} = 1_r + \text{ th }(\gamma {{\mathbf {q}}}_\tau ) + (\gamma {{\mathbf {q}}}_\tau ) \odot [1_r - \text{ th }^2(\gamma {{\mathbf {q}}}_\tau )], \end{aligned}$$
(17)

and

$$\begin{aligned} W = \text{ th }(\gamma Q) + (\gamma Q) \odot [1_{p \times r} - \text{ th }^2(\gamma Q)]. \end{aligned}$$
(18)

Going back to (14) and (15), we find that:

$$\begin{aligned} 2 (dP_\tau )^\top P_\tau= & {} \text{ trace } [(d{{\mathbf {q}}}_\tau ) \odot {{\mathbf {w}}} ]^\top P_\tau = \text{ trace } (d{{\mathbf {q}}}_\tau )^\top ({{\mathbf {w}}} \odot P_\tau ) \nonumber \\= & {} \text{ trace }\{ 1_p^\top [(dQ) \odot W] \}^\top ({{\mathbf {w}}} \odot P_\tau ) \nonumber \\= & {} \text{ trace } [(dQ)^\top \odot W^\top ] 1_p ({\mathbf {w}} \odot P_\tau ) \nonumber \\= & {} \text{ trace } (dQ)^\top \{W \odot [1_p ({\mathbf {w}} \odot P_\tau )]\}, \end{aligned}$$
(19)

making use of the identity \(\text{ trace } (A \odot B) C = \text{ trace } A (B^\top \odot C)\). Thus, the gradient \(\nabla _Q\) of the penalty term with respect to Q is:

$$\begin{aligned} \nabla _Q = W \odot [1_p ({\mathbf {w}} \odot P_\tau )]. \end{aligned}$$
(20)
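
For concreteness, (16)–(20) translate into a few lines of MATLAB. The sketch below is ours, not the authors' released code: consistently with the differential (16), it assumes \({\mathbf {q}}_\tau (Q) = [1_p^\top (Q \odot \text{ th }(\gamma Q))]^\top - \tau 1_r\), i.e. smoothed column-wise \(\ell _1\)-norms of Q shifted by the tuning value \(\tau\), and, consistently with (15), \(P_\tau = {\mathbf {q}}_\tau \odot [1_r + \text{ th }(\gamma {\mathbf {q}}_\tau )]/2\), a smoothed positive part of \({\mathbf {q}}_\tau\). Storing \({\mathbf {q}}_\tau\) as a column explains the transpose in the last line.

    % Gradient (20) of the penalty term P_tau(Q)'*P_tau(Q) with respect to Q.
    % Q is p x r orthonormal, tau is the sparsity tuning value, and g is the
    % large constant gamma in the tanh approximation of sign().
    function G = penalty_gradient(Q, tau, g)
        p    = size(Q, 1);
        thQ  = tanh(g*Q);                           % th(gamma*Q)
        qtau = sum(Q .* thQ, 1)' - tau;             % assumed q_tau, r x 1
        thq  = tanh(g*qtau);
        Ptau = qtau .* (1 + thq)/2;                 % assumed smooth positive part
        w    = 1 + thq + (g*qtau) .* (1 - thq.^2);  % w in (17)
        W    = thQ + (g*Q) .* (1 - thQ.^2);         % W in (18)
        G    = W .* (ones(p, 1) * (w .* Ptau)');    % (20)
    end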

Appendix 2

Here, we summarize some technical details of the numerical procedures employed in this work.

The gradients of the ML-, LS- and GLS-EFA objective functions with respect to the unknowns \(\{Q, D, \Psi \}\) are given in Trendafilov (2003) as the following block matrix: \(( -YQD^{2}, -Q^{\top } Y Q \odot D, -Y \odot \Psi )\). For ML-EFA, one has \(Y = 2 R_{ZZ}^{-1} (R-R_{ZZ}) R_{ZZ}^{-1}\), and for LS- and GLS-EFA it changes to \(Y = 4 (R - R_{ZZ})V^2\). Additionally, we need the gradient \(\nabla _Q\) of the penalty term \(P_\tau (Q)^\top P_\tau (Q)\) with respect to Q, which should be added to \(-YQD^{2}\); its derivation is given in detail in Appendix 1. A MATLAB transcription of these blocks is sketched below.
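
As an illustration only, the sketch assumes the reparameterized model correlation matrix \(R_{ZZ} = QD^2Q^\top + \Psi ^2\) (with loadings \(\Lambda = QD\)), and a weight matrix V for LS- and GLS-EFA, with \(V = I_p\) giving plain LS; these modeling details are our reading of Trendafilov (2003), not a verbatim excerpt.

    % Euclidean gradient blocks (-Y*Q*D^2, -(Q'*Y*Q) o D, -Y o Psi) of the
    % ML-, LS- and GLS-EFA objective functions; D and Psi are diagonal.
    function [gQ, gD, gPsi] = efa_gradients(R, Q, D, Psi, method, V)
        Rzz = Q*D^2*Q' + Psi^2;                % assumed model correlation matrix
        if strcmp(method, 'ml')
            Y = 2 * (Rzz \ (R - Rzz) / Rzz);   % 2*inv(Rzz)*(R-Rzz)*inv(Rzz)
        else                                   % 'ls' (V = I) or 'gls'
            Y = 4 * (R - Rzz) * V^2;
        end
        gQ   = -Y*Q*D^2;                       % add penalty gradient (20) here
        gD   = -(Q'*Y*Q) .* D;
        gPsi = -Y .* Psi;
    end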

The dynamical system approach employed in Trendafilov (2003) can be readily applied to solving (11) and (12). It involves numerical integration of matrix ordinary differential equations (ODEs) for \(\{Q, D, \Psi \}\) defined by their projected gradients; in particular, a projected gradient dynamical system for Q on the Stiefel manifold of all \(p \times r\) orthonormal matrices. A number of specialized numerical methods exist for such problems; see those listed in Trendafilov (2003), e.g. Del Buono and Lopez (2001). In contrast to the standard EFA alternating approaches (Jöreskog, 1977; Mulaik, 2010), the dynamical system approach gives matrix algorithms which produce a simultaneous solution for \(\{Q, D, \Psi \}\) by exploiting the geometry of their specific matrix structures. Moreover, such algorithms are globally convergent, i.e. convergence is reached independently of the starting (initial) point (Absil et al., 2008; Trendafilov, 2003).
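
The tangent-space projection underlying such a gradient flow is standard (Edelman et al., 1998); a minimal sketch, with the flows for D and \(\Psi\) omitted for brevity:

    % Projection of a Euclidean gradient G onto the tangent space of the
    % Stiefel manifold at Q: subtract Q times the symmetric part of Q'*G.
    function T = stiefel_proj(Q, G)
        S = (Q'*G + G'*Q)/2;
        T = G - Q*S;
    end

The projected gradient flow \(\dot{Q} = -\Pi _Q(\nabla _Q)\) then evolves on the manifold, and the specialized integrators cited above keep Q orthonormal along the whole trajectory.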

The numerical ODE solvers currently available in MATLAB (MATLAB, 2014) are not well suited to large optimization problems: they track the whole trajectory defined by the ODE, which is time-consuming and wasteful when only the asymptotic state is of interest. This limits the application of the proposed approach for solving (11) and (12) to rather small data sets.

An alternative is to employ iterative algorithms working directly on matrix manifolds (Absil et al., 2008; Edelman et al., 1998; Wen & Yin, 2013). The gradients listed above can be readily used for solving (11) and (12) with MANOPT, a free MATLAB-based toolbox for optimization on matrix manifolds (Boumal et al., 2014). The MANOPT code for solving (11) and (12) can be obtained from the authors upon request and will be made available online. Note that by choosing \(\mu = 0\), one obtains solutions to the standard ML-, LS- and GLS-EFA problems (9) and (10). A minimal MANOPT setup along these lines is sketched below.
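
The factories and the trustregions solver in this sketch are part of MANOPT (Boumal et al., 2014); the LS cost with an exact (non-smoothed) positive part is an illustrative stand-in for the penalized problem (11), not the authors' released code.

    % Sketch of penalized LS-EFA in MANOPT. Assumes MANOPT is on the path and
    % that R (p x p sample correlation matrix), r, mu, tau and g are defined.
    p = size(R, 1);
    elements.Q   = stiefelfactory(p, r);      % orthonormal p x r matrices
    elements.d   = euclideanfactory(r, 1);    % diagonal of D
    elements.psi = euclideanfactory(p, 1);    % diagonal of Psi
    problem.M    = productmanifold(elements);

    rzz  = @(x) x.Q * diag(x.d.^2) * x.Q' + diag(x.psi.^2);
    qtau = @(Q) sum(Q .* tanh(g*Q), 1)' - tau;     % assumed q_tau, as above
    Pt   = @(Q) max(qtau(Q), 0);                   % exact positive part
    problem.cost = @(x) norm(R - rzz(x), 'fro')^2 + mu * (Pt(x.Q)' * Pt(x.Q));

    x = trustregions(problem);   % MANOPT warns and approximates the gradient
                                 % numerically unless problem.egrad is supplied

Supplying problem.egrad from the gradient blocks above (with the penalty gradient (20) added to the Q block) avoids the numerical approximation and speeds up the solver.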


About this article


Cite this article

Trendafilov, N.T., Fontanella, S. & Adachi, K. Sparse Exploratory Factor Analysis. Psychometrika 82, 778–794 (2017). https://doi.org/10.1007/s11336-017-9575-8
