Abstract
We consider the problem of sparse estimation in a factor analysis model. A traditional estimation procedure is the following two-step approach: the model is first estimated by the maximum likelihood method, and then a rotation technique is applied to find sparse factor loadings. However, the maximum likelihood estimates cannot be obtained when the number of variables is much larger than the number of observations. Furthermore, even if the maximum likelihood estimates are available, the rotation technique often fails to produce a sufficiently sparse solution. To handle these problems, this paper introduces a penalized likelihood procedure that imposes a nonconvex penalty on the factor loadings. We show that the penalized likelihood procedure can be viewed as a generalization of the traditional two-step approach and that the proposed methodology can produce sparser solutions than the rotation technique. A new algorithm based on the EM algorithm along with coordinate descent is introduced to compute the entire solution path, which permits the application of a wide variety of convex and nonconvex penalties. Monte Carlo simulations are conducted to investigate the performance of our modeling strategy. A real data example is also given to illustrate our procedure.
Acknowledgments
The authors would like to thank anonymous reviewers for the constructive and helpful comments that improved the quality of the paper considerably. We also thank Professor Yutaka Kano for the helpful discussions.
Appendices
Appendix A: Derivation of complete-data penalized log-likelihood function in EM algorithm
In order to apply the EM algorithm, we first regard the common factors \(\varvec{f}_n\) as missing data and maximize the complete-data penalized log-likelihood function
where the density function \(f(\varvec{x}_n,\varvec{f}_n)\) is defined by
Then, the expectation of \({l}_{\rho }^{C}({\varvec{\varLambda }},\varvec{\varPsi }) \) is taken with respect to the distribution \(f(\varvec{f}_n | \varvec{x}_n,{\varvec{\varLambda }},\varvec{\varPsi })\),
For given \({\varvec{\varLambda }}_\text {old}\) and \(\varvec{\varPsi }_\text {old}\), the posterior \(f(\varvec{f}_n | \varvec{x}_n,{\varvec{\varLambda }}_\text {old}, \varvec{\varPsi }_\text {old})\) is normal with mean \(E[\varvec{F}_n|\varvec{x}_n] = \varvec{M}^{-1}{\varvec{\varLambda }}_\text {old}^T\varvec{\varPsi }_\text {old}^{-1} \varvec{x}_n\) and second moment \(E[\varvec{F}_n\varvec{F}_n^T|\varvec{x}_n] = \varvec{M} ^{-1} + E[\varvec{F}_n|\varvec{x}_n] E[\varvec{F}_n|\varvec{x}_n] ^T\), where \(\varvec{M} = {\varvec{\varLambda }}_\text {old}^T\varvec{\varPsi }_\text {old}^{-1}{\varvec{\varLambda }}_\text {old} + \varvec{I}_m\). Then, we have
Let \(\varvec{b}_i = \varvec{M}^{-1}{\varvec{\varLambda }}_\text {old}^T\varvec{\varPsi }_\text {old}^{-1}\varvec{s}_i\) and \(\varvec{A} = \varvec{M} ^{-1} + \varvec{M}^{-1}{\varvec{\varLambda }}_\text {old}^T\varvec{\varPsi }_\text {old}^{-1}\varvec{S}\varvec{\varPsi }_\text {old}^{-1}{\varvec{\varLambda }}_\text {old}\varvec{M}^{-1}\). Then the expectation of \({l}_{\rho }^{C}({\varvec{\varLambda }},\varvec{\varPsi }) \) in (7) can be derived.
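As an illustration, the E-step quantities above can be sketched in NumPy as follows. This is a minimal sketch, not the authors' implementation; the dimensions and parameter values are made up for demonstration, but the formulas for \(\varvec{M}\), \(\varvec{b}_i\), and \(\varvec{A}\) follow the expressions in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: p observed variables, m factors, N observations.
p, m, N = 6, 2, 100

# Current parameter values (Lambda_old, Psi_old); random here for illustration.
Lambda_old = rng.standard_normal((p, m))
Psi_old = np.diag(rng.uniform(0.5, 1.5, size=p))
Psi_inv = np.linalg.inv(Psi_old)

# Data matrix (rows are observations) and its sample covariance S.
X = rng.standard_normal((N, p))
S = np.cov(X, rowvar=False, bias=True)

# M = Lambda_old^T Psi_old^{-1} Lambda_old + I_m
M = Lambda_old.T @ Psi_inv @ Lambda_old + np.eye(m)
M_inv = np.linalg.inv(M)

# b_i = M^{-1} Lambda_old^T Psi_old^{-1} s_i, where s_i is the i-th column of S.
# Stacking all columns at once: row i of B is b_i^T.
B = (M_inv @ Lambda_old.T @ Psi_inv @ S).T

# A = M^{-1} + M^{-1} Lambda_old^T Psi_old^{-1} S Psi_old^{-1} Lambda_old M^{-1}
A = M_inv + M_inv @ Lambda_old.T @ Psi_inv @ S @ Psi_inv @ Lambda_old @ M_inv
```

Note that \(\varvec{A}\) is symmetric by construction (both \(\varvec{M}^{-1}\) and \(\varvec{S}\) are symmetric), which the sketch preserves numerically.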
Appendix B: Proof of Lemma 1
The proof is by contradiction. Assume that \(\hat{{\varvec{\varLambda }}}\) and \(\hat{\varvec{\varPsi }}\) are the solution of (6) and that the \(j\)th column of \(\hat{{\varvec{\varLambda }}}\) has only one nonzero element, say \(\hat{\lambda }_{aj}\). Define another pair of parameters \(\hat{{\varvec{\varLambda }}}^*\) and \(\hat{\varvec{\varPsi }}^*\), where \(\hat{{\varvec{\varLambda }}}^*\) is the same as \(\hat{{\varvec{\varLambda }}}\) but with the \(aj\)th element set to zero, and \(\hat{\varvec{\varPsi }}^*\) is the same as \(\hat{\varvec{\varPsi }}\) but with the \(a\)th diagonal element being \(\hat{\psi }_a+\hat{\lambda }_{aj}^2\) (zeroing \(\hat{\lambda }_{aj}\) changes only the \((a,a)\) entry of \(\hat{{\varvec{\varLambda }}}\hat{{\varvec{\varLambda }}}^T\), so the lost variance is absorbed into the \(a\)th uniqueness). The two parameterizations then have the same covariance structure, i.e., \(\hat{{\varvec{\varLambda }}}\hat{{\varvec{\varLambda }}}^T+\hat{\varvec{\varPsi }}=\hat{{\varvec{\varLambda }}}^*\hat{{\varvec{\varLambda }}}^{*T}+\hat{\varvec{\varPsi }}^*\), which implies \(\ell (\hat{{\varvec{\varLambda }}}, \hat{\varvec{\varPsi }}) = \ell (\hat{{\varvec{\varLambda }}}^*, \hat{\varvec{\varPsi }}^*)\), whereas the penalty term \( \sum _{i=1}^p\sum _{j=1}^m \rho P(|\hat{\lambda }_{ij}|)\) is larger than \(\sum _{i=1}^p\sum _{j=1}^m \rho P(|\hat{\lambda }^*_{ij}|)\). This means \(\ell _{\rho }(\hat{{\varvec{\varLambda }}}, \hat{\varvec{\varPsi }}) < \ell _{\rho }(\hat{{\varvec{\varLambda }}}^*, \hat{\varvec{\varPsi }}^*)\), which contradicts the assumption that \(\hat{{\varvec{\varLambda }}}\) and \(\hat{\varvec{\varPsi }}\) are the penalized maximum likelihood estimates.
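The covariance-preserving reparameterization used in the proof can be verified numerically. The following is a minimal sketch with made-up loading and uniqueness values (not taken from the paper): column \(j\) of \(\varvec{\varLambda }\) has a single nonzero loading in row \(a\), which is zeroed and absorbed into the \(a\)th uniqueness.

```python
import numpy as np

# Hypothetical example: p = 4 variables, m = 2 factors. Column j = 1
# (0-based) of Lambda has a single nonzero loading, in row a = 2.
Lam = np.array([[0.8, 0.0],
                [0.7, 0.0],
                [0.5, 0.9],
                [0.6, 0.0]])
a, j = 2, 1
Psi = np.diag([0.3, 0.4, 0.2, 0.5])

# Construct (Lambda*, Psi*): set lambda_{aj} to zero and add its square
# to the a-th diagonal element of Psi.
Lam_star = Lam.copy()
Lam_star[a, j] = 0.0
Psi_star = Psi.copy()
Psi_star[a, a] += Lam[a, j] ** 2

# Both parameterizations yield the same model covariance, so the
# likelihood is unchanged while the penalty on the loadings decreases.
Sigma = Lam @ Lam.T + Psi
Sigma_star = Lam_star @ Lam_star.T + Psi_star
print(np.allclose(Sigma, Sigma_star))  # True
```

Since any penalty \(P\) that is increasing in \(|\lambda_{ij}|\) assigns a smaller value to the zeroed loading, the starred parameterization strictly improves the penalized likelihood, as the proof argues.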
Cite this article
Hirose, K., Yamamoto, M. Sparse estimation via nonconcave penalized likelihood in factor analysis model. Stat Comput 25, 863–875 (2015). https://doi.org/10.1007/s11222-014-9458-0