Abstract
Joint maximum likelihood (JML) estimation of item factor analysis (IFA) has recently attracted renewed interest, primarily because of its efficiency in handling high-dimensional data with numerous latent factors. It has been established under mild assumptions that the JML estimator is consistent as the numbers of respondents and items both tend to infinity. The current work presents an efficient Riemannian optimization algorithm for JML estimation of exploratory IFA with dichotomous response data, which exploits the differential geometry of the fixed-rank matrix manifold. The proposed algorithm converges in substantially less time than a benchmark method that alternates between gradient ascent steps for person and item parameters. The performance of the proposed algorithm in recovering latent dimensionality, response probabilities, item parameters, and factor scores is evaluated via simulations.
Notes
de Leeuw (2006) used the term binary principal component analysis instead of one-bit matrix completion.
The nuclear norm of a matrix is the sum of its singular values.
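As a concrete illustration of this definition (not part of the original note), the nuclear norm can be computed from a singular value decomposition; the NumPy sketch below uses a small diagonal matrix whose singular values can be read off directly.

```python
import numpy as np

# Illustrative example: the nuclear norm is the sum of singular values.
# For this diagonal matrix the singular values are simply 4 and 3.
A = np.array([[3.0, 0.0],
              [0.0, 4.0]])
nuclear_norm = np.linalg.svd(A, compute_uv=False).sum()
# nuclear_norm equals 7.0 here
```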
This definition is a special case of the more general version in Absil et al. (2008, Sect. 3.3).
Members of \(\mathcal{T}_{{\varvec{\Theta }}}\mathcal{M}_{k}(d_1, d_2)\) are referred to as tangent vectors, although they are in fact matrices.
The term is used when the optimization problem involves constraints in addition to those imposed by the manifold (e.g., the infinity-norm constraint in JML estimation). Similarly, an unconstrained manifold optimization problem involves only manifold constraints.
For the Riemannian optimization algorithm, the total number of inner (Riemannian CG) iterations was reported.
References
Absil, P.-A., Mahony, R., & Sepulchre, R. (2008). Optimization algorithms on matrix manifolds. Princeton: Princeton University Press.
Absil, P.-A., & Malick, J. (2012). Projection-like retractions on matrix manifolds. SIAM Journal on Optimization, 22(1), 135–158.
Andersen, E. B. (1970). Asymptotic properties of conditional maximum-likelihood estimators. Journal of the Royal Statistical Society: Series B (Methodological), 32(2), 283–301.
Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques. Boca Raton: CRC Press.
Bartholomew, D. J., Steele, F., Galbraith, J., & Moustaki, I. (2008). Analysis of multivariate social science data. Boca Raton: CRC Press.
Bertsekas, D. P. (1999). Nonlinear programming. Belmont: Athena Scientific.
Björck, A., & Golub, G. H. (1973). Numerical methods for computing angles between linear subspaces. Mathematics of Computation, 27(123), 579–594.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.
Bock, R. D., & Lieberman, M. (1970). Fitting a response model for \(n\) dichotomously scored items. Psychometrika, 35(2), 179–197.
Borckmans, P. B., Selvan, S. E., Boumal, N., & Absil, P.-A. (2014). A Riemannian subgradient algorithm for economic dispatch with valve-point effect. Journal of Computational and Applied Mathematics, 255, 848–866.
Browne, M. W. (2001). An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36(1), 111–150.
Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75(1), 33–57.
Cai, T., & Zhou, W. X. (2013). A max-norm constrained minimization approach to 1-bit matrix completion. The Journal of Machine Learning Research, 14(1), 3619–3647.
Candès, E. J., & Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717–772.
Carpentier, A., Klopp, O., Löffler, M., & Nickl, R. (2018). Adaptive confidence sets for matrix completion. Bernoulli, 24(4A), 2429–2460.
Chen, Y., Li, X., & Zhang, S. (2018). Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika (Advance Online Publication). https://doi.org/10.1007/s11336-018-9646-5.
Chen, Y., Li, X., & Zhang, S. (2019). Structured latent factor analysis for large-scale data: Identifiability, estimability, and their implications. Journal of the American Statistical Association (Advance Online Publication). https://doi.org/10.1080/01621459.2019.1635485
Curran, P. J., & Hussong, A. M. (2009). Integrative data analysis: The simultaneous analysis of multiple data sets. Psychological Methods, 14(2), 81–100.
Davenport, M. A., Plan, Y., Van Den Berg, E., & Wootters, M. (2014). 1-bit matrix completion. Information and Inference: A Journal of the IMA, 3(3), 189–223.
de Leeuw, J. (2006). Principal component analysis of binary data by iterated singular value decomposition. Computational Statistics & Data Analysis, 50(1), 21–39.
Fan, J., Gong, W., & Zhu, Z. (2019). Generalized high-dimensional trace regression via nuclear norm regularization. Journal of Econometrics, 212(1), 177–202.
Fan, J., & Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 32(3), 928–961.
Fox, J.-P. (2005). Multilevel IRT using dichotomous and polytomous response data. British Journal of Mathematical and Statistical Psychology, 58(1), 145–172.
Golub, G., & Van Loan, C. (2013). Matrix computations (4th ed.). Baltimore: Johns Hopkins University Press.
Haberman, S. J. (2006). Adaptive quadrature for item response models (Tech. Rep. No. RR-06-29). Princeton: ETS.
Hofer, S. M., & Piccinin, A. M. (2009). Integrative data analysis through coordination of measurement and analysis protocol across independent longitudinal studies. Psychological Methods, 14(2), 150.
Huang, W., Gallivan, K. A., & Absil, P.-A. (2015). A Broyden class of quasi-Newton methods for Riemannian optimization. SIAM Journal on Optimization, 25(3), 1660–1685.
Jeon, M., Kaufman, C., & Rabe-Hesketh, S. (2019). Monte Carlo local likelihood approximation. Biostatistics, 20(1), 164–179.
Klopp, O. (2015). Matrix completion by singular value thresholding: Sharp bounds. Electronic Journal of Statistics, 9(2), 2348–2369.
Klopp, O., Lafond, J., Moulines, É., & Salmon, J. (2015). Adaptive multinomial matrix completion. Electronic Journal of Statistics, 9(2), 2950–2975.
Koopmans, T. C., & Reiersøl, O. (1950). The identification of structural characteristics. The Annals of Mathematical Statistics, 21(2), 165–181.
Liu, C., & Boumal, N. (2019). Simple algorithms for optimization on Riemannian manifolds with constraints. Applied Mathematics & Optimization. https://doi.org/10.1007/s00245-019-09564-3.
Liu, Y., Magnus, B., Quinn, H., & Thissen, D. (2018). Multidimensional item response theory. In D. Hughes, P. Irwing, & T. Booth (Eds.), Handbook of psychometric testing. Hoboken: Wiley.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Mahwah: Routledge.
Magnus, J., & Neudecker, H. (1999). Matrix differential calculus with applications in statistics and econometrics. New York: Wiley.
McDonald, R. P. (1981). The dimensionality of tests and items. British Journal of Mathematical and Statistical Psychology, 34(1), 100–117.
Monroe, S., & Cai, L. (2014). Estimation of a Ramsay-curve item response theory model by the Metropolis–Hastings Robbins–Monro algorithm. Educational and Psychological Measurement, 74(2), 343–369.
Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica: Journal of the Econometric Society, 16(1), 1–32.
O’Rourke, S., Vu, V., & Wang, K. (2018). Random perturbation of low rank matrices: Improving classical bounds. Linear Algebra and its Applications, 540, 26–59.
Pinar, M. Ç., & Zenios, S. A. (1994). On smoothing exact penalty functions for convex constrained optimization. SIAM Journal on Optimization, 4(3), 486–511.
Polak, E., & Ribière, G. (1969). Note sur la convergence de méthodes de directions conjuguées. Revue Française d’Informatique et de Recherche Opérationnelle. Série Rouge, 3(16), 35–43.
R Core Team. (2018). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. https://www.R-project.org/
Reckase, M. (2009). Multidimensional item response theory. New York: Springer.
Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual differences in cognition: New methods for examining the personality-cognition link. In A. Gruszka, G. Matthews, & B. Szymura (Eds.), Handbook of individual differences in cognition (pp. 27–49). Berlin: Springer.
Schilling, S., & Bock, R. D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70(3), 533–555.
Shalit, U., Weinshall, D., & Chechik, G. (2012). Online learning in the embedded manifold of low-rank matrices. Journal of Machine Learning Research, 13(Feb), 429–458.
Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52(3), 393–408.
Thissen, D., & Steinberg, L. (2009). Item response theory. In R. Millsap & A. Maydeu-Olivares (Eds.), The Sage handbook of quantitative methods in psychology (pp. 148–177). London: Sage Publications.
Vandereycken, B. (2013). Low-rank matrix completion by Riemannian optimization. SIAM Journal on Optimization, 23(2), 1214–1236.
Wirth, R., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12(1), 58–79.
Woods, C. M., & Lin, N. (2009). Item response theory with estimation of the latent density using Davidian curves. Applied Psychological Measurement, 33(2), 102–117.
Woods, C. M., & Thissen, D. (2006). Item response theory with estimation of the latent population distribution using spline-based densities. Psychometrika, 71(2), 281–301.
Yu, Y., Wang, T., & Samworth, R. J. (2015). A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102(2), 315–323.
Zhang, S., Chen, Y., & Liu, Y. (2020). An improved stochastic EM algorithm for large-scale full-information item factor analysis. British Journal of Mathematical and Statistical Psychology, 73(1), 44–71.
Appendix A: Proofs
1.1 A.1 Proof of Proposition 1
Let \(\bar{\mathbf{U}}_0^{} = \bar{\mathbf{U}}_{0*}^{}\bar{\mathbf{R}}_{0u}^{}\) be the QR factorization of the true person parameter matrix: \(\bar{\mathbf{U}}_{0*} = (\mathbf{1}_{d_1}/\sqrt{d_1}, \mathbf{U}_{0*}^{})^\top \) is orthonormal, i.e., \(\bar{\mathbf{U}}_{0*}^\top \bar{\mathbf{U}}_{0*}^{} = \mathbf{I}_{k + 1}\), and \(\bar{\mathbf{R}}_{0u}^{} = (\mathbf{e}_{k + 1}^{}, \mathbf{R}_{0u}^{})\) is upper-triangular with normalized columns by Assumption ii). The block matrix inversion formula (Magnus and Neudecker 1999, p. 12) implies that \(\bar{\mathbf{R}}_{0u}^{-1} = (\mathbf{e}_{k + 1}^{}, \mathbf{R}_{0u}^-)\), i.e., the first column remains \(\mathbf{e}_{k + 1}\). By Lemma C.1 of Chen et al. (2018), there exist constants \(C_1, C_2>0\) and a \((k + 1)\times (k + 1)\) orthogonal rotation matrix of the form \(\bar{\mathbf{Q}}_*^{} = 1\oplus \mathbf{Q}_*^{}\), where \(\oplus \) denotes the (matrix) direct sum and \(\mathbf{Q}_*^\top \mathbf{Q}_*^{} = \mathbf{I}_k^{}\), such that
with probability greater than \(1 - C_2/(d_1 + d_2)\). Now, let
Note that the first row of \(\bar{\mathbf{Q}}^{}\) is \(\mathbf{e}_{k + 1}^\top \) due to the partitioned structure of \(\bar{\mathbf{Q}}_*^{}\) and \(\bar{\mathbf{R}}_{0u}^{-1}\). Moreover, \(\bar{\mathbf{Q}}^{}\) is indeed an oblique rotation matrix, because \(\bar{\mathbf{Q}}^{-1} = \bar{\mathbf{R}}_{0u}^\top \bar{\mathbf{Q}}_*^\top \), and thus \(\mathrm {diag}(\bar{\mathbf{Q}}^{-1}\bar{\mathbf{Q}}^{-\top }) = \mathrm {diag}(\bar{\mathbf{R}}_{0u}^\top \bar{\mathbf{R}}_{0u}^{}) = \mathbf{1}_{k + 1}\). Finally, the Cauchy–Schwarz inequality implies that
As \(\bar{\mathbf{R}}_{0u}\) and \(\bar{\mathbf{U}}_0\) share the same set of singular values, \(\Vert \bar{\mathbf{R}}_{0u}^{-\top }\Vert _F^2 \le (k + 1)/\sigma _{k + 1}^2(\bar{\mathbf{U}}_0^{}) \le (k + 1)/c_1^2\) by Assumption iii). Equation 11 then follows from Eqs. A1 and A3.
To establish Eq. 12, notice that
Because \(\bar{\mathbf{R}}_{0u}^{}\) is column-wise normalized, \(\Vert \bar{\mathbf{R}}_{0u}^{}\Vert _F^2 = k + 1\). The remaining task is to bound \(\Vert \tilde{\mathbf{U}}\bar{\mathbf{Q}}_*^{} - \bar{\mathbf{U}}_{0*}^{}\Vert _F^2\); because the leading columns of \(\tilde{\mathbf{U}}\) and \(\bar{\mathbf{U}}_{0*}^{}\) are identical, it further suffices to bound \(\Vert \hat{\mathbf{U}}_*^{}\mathbf{Q}_*^{} - \mathbf{U}_{0*}^{}\Vert _F^2\). By the choice of \(\mathbf{Q}_*^{} = \hat{\mathbf{U}}_*^\top \mathbf{U}_{0*}^{} (\mathbf{U}_{0*}^\top \hat{\mathbf{U}}_*^{}\hat{\mathbf{U}}_*^\top \mathbf{U}_{0*}^{})^{-1/2}\) in Chen et al. (2018, Eq. C.12),
in which \(\angle _l(\hat{\mathbf{U}}_*^{}, \mathbf{U}_{0*}^{})\), \(l = 1,\dots , k\), denotes the principal angles between \(\mathrm {span}(\hat{\mathbf{U}}_*^{})\) and \(\mathrm {span}(\mathbf{U}_{0*}^{})\), and the inequality follows from the fact that \(\sigma _l(\hat{\mathbf{U}}_*^\top \mathbf{U}_{0*}^{}) = \cos \angle _l(\hat{\mathbf{U}}_*^{}, \mathbf{U}_{0*}^{})\) (Björck and Golub 1973). The right-hand side of Eq. A5 converges to 0 in \(P_{{{\varvec{\Theta }}}_0}\)-probability by Equation C.10 in Chen et al. (2018). The proof is now complete.
1.2 A.2 Proof of Proposition 2
Let \({{\varvec{\Theta }}}\in \mathcal{M}_{k}(d_1, d_2)\) and \({\varvec{\gamma }}: \mathcal{R}\rightarrow \mathcal{M}_{k}(d_1, d_2)\) be a smooth curve such that \({\varvec{\gamma }}(0) = {{\varvec{\Theta }}}\). There exists \(\mathbf{w}(t)\in \mathcal{R}^{d_2}\), \(\mathbf{U}(t)\in \mathcal{R}_*^{d_1\times k}\), and \(\mathbf{V}(t)\in \mathcal{R}_*^{d_2\times k}\) such that
for t in some neighborhood of 0. Differentiating Eq. A6 with respect to t and evaluating at \(t = 0\) yield
Given a choice of orthonormal basis matrices \((\mathbf{1}_{d_1}/\sqrt{d_1}, \mathbf{U}_*^{})\) and \(\mathbf{V}_*^{}\) corresponding to \((\mathbf{1}_{d_1}, \mathbf{U}(0))\) and \(\mathbf{V}(0)\), there exist fixed \(\mathbf{m}\in \mathcal{R}^{k}\) and \(\mathbf{M}, \mathbf{N}\in \mathcal{R}_*^{k\times k}\) such that
Because \(\mathbf{1}_{d_1}\) is perpendicular to \(\mathbf{U}_*^{}\), it is possible to select \(\mathbf{U}_\perp ^{}= (\mathbf{1}_{d_1}/\sqrt{d_1}, \mathbf{U}_\dagger ^{})\) where \(\mathbf{U}_{\dagger }^\top \mathbf{1}_{d_1} = \mathbf{0}_{d_1 - k - 1}\), \(\mathbf{U}_{\dagger }^\top \mathbf{U}_*^{}= \mathbf{0}_{(d_1 - k - 1)\times k}\), and \(\mathbf{U}_{\dagger }^\top \mathbf{U}_{\dagger }^{} = \mathbf{I}_{d_1 - k - 1}\). The vector/matrix derivatives in Eq. A7 can then be expanded on the orthonormal bases:
in which \(\mathbf{a}_1, \mathbf{b}_2\in \mathcal{R}^{k}\), \(\mathbf{a}_2\in \mathcal{R}^{d_2 - k}\), \(\mathbf{B}_1, \mathbf{C}_1\in \mathcal{R}^{k\times k}\), \(\mathbf{B}_3\in \mathcal{R}^{(d_1 - k - 1)\times k}\), and \(\mathbf{C}_2\in \mathcal{R}^{(d_2 - k)\times k}\). Plugging Eqs. A8 and A9 into Eq. A7 gives
which reduces to Eq. 13 upon identifying \(\mathbf{A}= \sqrt{d_1}(\mathbf{a}_2^{} + \mathbf{C}_2^{}\mathbf{m})\), \(\mathbf{B}= \mathbf{B}_1^{}\mathbf{N}^\top + \mathbf{M}\mathbf{C}_1^\top \), \(\mathbf{C}= (\mathbf{a}_1^\top + \mathbf{b}_2^\top \mathbf{N}^\top + \mathbf{m}^\top \mathbf{C}_1^\top , \mathbf{B}_3^{}\mathbf{N}^\top )\), and \(\mathbf{D}= \mathbf{C}_2^{}\mathbf{M}^\top \). Therefore, every tangent vector can be expressed as a member of \(\mathcal{T}_{{\varvec{\Theta }}}\mathcal{M}_{k}(d_1, d_2)\) (Eq. 13). Conversely, let \({{\varvec{\Xi }}}\) be a member of Eq. 13. Equation A14 in “Appendix A.4” implies that the curve \({\varvec{\gamma }}: t\mapsto R_{{\varvec{\Theta }}}(t{{\varvec{\Xi }}})\) passes through \({{\varvec{\Theta }}}\) at \(t = 0\) and \(\dot{{\varvec{\gamma }}}(0) = {{\varvec{\Xi }}}\), so Eq. 13 is a subset of the tangent space. In conclusion, Eq. 13 gives a representation of the tangent space of \(\mathcal{M}_{k}(d_1, d_2)\) at \({{\varvec{\Theta }}}\).
1.3 A.3 Proof of Proposition 3
It suffices to verify that \(\langle \mathbf{G}- {{\varvec{\Xi }}}, {{\varvec{\Xi }}}\rangle = 0\). Note that \(\mathbf{I}_{d_1} = \mathbf{U}_*^{}\mathbf{U}_*^\top + (\mathbf{I}_{d_1} - \mathbf{U}_*^{}\mathbf{U}_*^\top ) = \mathbf{U}_*^{}\mathbf{U}_*^\top + \mathbf{U}_\perp ^{}\mathbf{U}_\perp ^\top \) and \(\mathbf{I}_{d_2} = \mathbf{V}_*^{}\mathbf{V}_*^\top + (\mathbf{I}_{d_2} - \mathbf{V}_*^{}\mathbf{V}_*^\top ) = \mathbf{V}_*^{}\mathbf{V}_*^\top + \mathbf{V}_\perp ^{}\mathbf{V}_\perp ^\top \), which yields the following decomposition of \(\mathbf{G}\in \mathcal{R}^{d_1\times d_2}\):
Partition \(\mathbf{U}_\perp ^{}= (\mathbf{1}_{d_1}/\sqrt{d_1}, \mathbf{U}_\dagger ^{})\) as in Sect. A.2. The orthogonal decomposition
implies that \(\mathbf{G}- {{\varvec{\Xi }}}= \mathbf{U}_\dagger ^{}\mathbf{U}_\dagger ^\top \mathbf{G}\mathbf{V}_\perp ^{}\mathbf{V}_\perp ^\top \); therefore, \(\langle \mathbf{G}- {{\varvec{\Xi }}}, {{\varvec{\Xi }}}\rangle = 0\).
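The projection verified above can be sketched numerically. The NumPy fragment below (an illustration, not the paper's exact operator) implements the standard orthogonal projection onto the tangent space of the plain fixed-rank manifold, omitting the intercept direction \(\mathbf{1}_{d_1}/\sqrt{d_1}\) that the proposition carries in its column basis, and checks the orthogonality property \(\langle \mathbf{G}-{{\varvec{\Xi }}},{{\varvec{\Xi }}}\rangle = 0\).

```python
import numpy as np

def project_tangent(G, U, V):
    """Orthogonally project G onto the tangent space of the rank-k matrix
    manifold at a point with orthonormal column basis U (d1 x k) and row
    basis V (d2 x k). Standard fixed-rank version: the proposition's
    operator additionally keeps the 1/sqrt(d1) intercept direction."""
    PU = U @ U.T  # projector onto span(U)
    PV = V @ V.T  # projector onto span(V)
    return PU @ G + (np.eye(G.shape[0]) - PU) @ G @ PV

rng = np.random.default_rng(0)
d1, d2, k = 6, 5, 2
U, _ = np.linalg.qr(rng.standard_normal((d1, k)))  # orthonormal bases
V, _ = np.linalg.qr(rng.standard_normal((d2, k)))
G = rng.standard_normal((d1, d2))
Xi = project_tangent(G, U, V)
# The projection is idempotent, and the residual G - Xi is orthogonal
# to the projected matrix Xi, as in the proof above.
assert np.allclose(project_tangent(Xi, U, V), Xi)
assert abs(np.sum((G - Xi) * Xi)) < 1e-10
```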
1.4 A.4 Proof of Proposition 4
Expand and rearrange the right-hand side of Eq. 17:
For s belonging to some neighborhood of 0, it follows that
The centering condition (Eq. 15) follows by setting \(s = 0\) in Eq. A14. Equation A14 also shows that \(R_{{{\varvec{\Theta }}}}(s{{\varvec{\Xi }}})\) is quadratic in s; differentiating with respect to s and evaluating at \(s = 0\) therefore yields the local rigidity condition (Eq. 16). It is then concluded that \(R_{{{\varvec{\Theta }}}}({{\varvec{\Xi }}})\) is a valid retraction.
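For comparison, a widely used projection-like retraction in the sense of Absil and Malick (2012) maps \({{\varvec{\Theta }}}+{{\varvec{\Xi }}}\) back onto the rank-\(k\) manifold by truncated SVD. The sketch below illustrates that alternative (it is not the paper's Eq. 17) and checks only the centering condition \(R_{{\varvec{\Theta }}}(\mathbf{0}) = {{\varvec{\Theta }}}\).

```python
import numpy as np

def svd_retraction(Theta, Xi, k):
    """Metric-projection retraction: map Theta + Xi back onto the rank-k
    matrix manifold via a rank-k truncated SVD. One common projection-like
    retraction; the paper's Eq. 17 is a different, factor-based one."""
    U, s, Vt = np.linalg.svd(Theta + Xi, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(1)
d1, d2, k = 6, 5, 2
# A generic rank-k point on the manifold.
Theta = rng.standard_normal((d1, k)) @ rng.standard_normal((k, d2))
# Centering condition: retracting the zero tangent vector returns Theta.
assert np.allclose(svd_retraction(Theta, np.zeros_like(Theta), k), Theta)
```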
Liu, Y. A Riemannian Optimization Algorithm for Joint Maximum Likelihood Estimation of High-Dimensional Exploratory Item Factor Analysis. Psychometrika 85, 439–468 (2020). https://doi.org/10.1007/s11336-020-09711-8