A Riemannian Optimization Algorithm for Joint Maximum Likelihood Estimation of High-Dimensional Exploratory Item Factor Analysis


Abstract

Joint maximum likelihood (JML) estimation of item factor analysis (IFA) has recently regained attention, primarily because of its efficiency in handling high-dimensional data and large numbers of latent factors. It has been established under mild assumptions that the JML estimator is consistent as the numbers of respondents and items both tend to infinity. The current work presents an efficient Riemannian optimization algorithm for JML estimation of exploratory IFA with dichotomous response data, which takes advantage of the differential geometry of the fixed-rank matrix manifold. The proposed algorithm takes substantially less time to converge than a benchmark method that alternates between gradient ascent steps for person and item parameters. The performance of the proposed algorithm in the recovery of latent dimensionality, response probabilities, item parameters, and factor scores is evaluated via simulations.


Notes

  1. Some authors prefer to use IFA and item response theory for two different parameterizations of the same (or approximately the same) model; see Takane and de Leeuw (1987) and Wirth and Edwards (2007) for more details.

  2. de Leeuw (2006) used the term binary principal component analysis instead of one-bit matrix completion.

  3. The nuclear norm of a matrix is the sum of its singular values.

  4. This definition is a special case of the more general version in Absil et al. (2008, Sect. 3.3).

  5. Members of \(\mathcal{T}_{{\varvec{\Theta }}}\mathcal{M}_{k}(d_1, d_2)\) are referred to as tangent vectors, although they are in fact matrices.

  6. The orthogonal projection typically approximates the exponential map to second order, which is even stronger than the conditions in Eqs. 15 and 16.

  7. The term is used when the optimization problem involves constraints in addition to those imposed by the manifold (e.g., the infinity-norm constraint in JML estimation). Similarly, an unconstrained manifold optimization problem involves only manifold constraints.

  8. For the Riemannian optimization algorithm, the total number of inner (Riemannian CG) iterations was reported.

References

  • Absil, P.-A., Mahony, R., & Sepulchre, R. (2008). Optimization algorithms on matrix manifolds. Princeton: Princeton University Press.

  • Absil, P.-A., & Malick, J. (2012). Projection-like retractions on matrix manifolds. SIAM Journal on Optimization, 22(1), 135–158.

  • Andersen, E. B. (1970). Asymptotic properties of conditional maximum-likelihood estimators. Journal of the Royal Statistical Society: Series B (Methodological), 32(2), 283–301.

  • Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques. Boca Raton: CRC Press.

  • Bartholomew, D. J., Steele, F., Galbraith, J., & Moustaki, I. (2008). Analysis of multivariate social science data. Boca Raton: CRC Press.

  • Bertsekas, D. P. (1999). Nonlinear programming. Belmont: Athena Scientific.

  • Björck, A., & Golub, G. H. (1973). Numerical methods for computing angles between linear subspaces. Mathematics of Computation, 27(123), 579–594.

  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.

  • Bock, R. D., & Lieberman, M. (1970). Fitting a response model for \(n\) dichotomously scored items. Psychometrika, 35(2), 179–197.

  • Borckmans, P. B., Selvan, S. E., Boumal, N., & Absil, P.-A. (2014). A Riemannian subgradient algorithm for economic dispatch with valve-point effect. Journal of Computational and Applied Mathematics, 255, 848–866.

  • Browne, M. W. (2001). An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36(1), 111–150.

  • Cai, L. (2010). High-dimensional exploratory item factor analysis by a Metropolis–Hastings Robbins–Monro algorithm. Psychometrika, 75(1), 33–57.

  • Cai, T., & Zhou, W. X. (2013). A max-norm constrained minimization approach to 1-bit matrix completion. The Journal of Machine Learning Research, 14(1), 3619–3647.

  • Candès, E. J., & Recht, B. (2009). Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6), 717–772.

  • Carpentier, A., Klopp, O., Löffler, M., & Nickl, R. (2018). Adaptive confidence sets for matrix completion. Bernoulli, 24(4A), 2429–2460.

  • Chen, Y., Li, X., & Zhang, S. (2018). Joint maximum likelihood estimation for high-dimensional exploratory item factor analysis. Psychometrika (Advance Online Publication). https://doi.org/10.1007/s11336-018-9646-5

  • Chen, Y., Li, X., & Zhang, S. (2019). Structured latent factor analysis for large-scale data: Identifiability, estimability, and their implications. Journal of the American Statistical Association (Advance Online Publication). https://doi.org/10.1080/01621459.2019.1635485

  • Curran, P. J., & Hussong, A. M. (2009). Integrative data analysis: The simultaneous analysis of multiple data sets. Psychological Methods, 14(2), 81–100.

  • Davenport, M. A., Plan, Y., Van Den Berg, E., & Wootters, M. (2014). 1-bit matrix completion. Information and Inference: A Journal of the IMA, 3(3), 189–223.

  • de Leeuw, J. (2006). Principal component analysis of binary data by iterated singular value decomposition. Computational Statistics & Data Analysis, 50(1), 21–39.

  • Fan, J., Gong, W., & Zhu, Z. (2019). Generalized high-dimensional trace regression via nuclear norm regularization. Journal of Econometrics, 212(1), 177–202.

  • Fan, J., & Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 32(3), 928–961.

  • Fox, J.-P. (2005). Multilevel IRT using dichotomous and polytomous response data. British Journal of Mathematical and Statistical Psychology, 58(1), 145–172.

  • Golub, G., & Van Loan, C. (2013). Matrix computations (4th ed.). Baltimore: Johns Hopkins University Press.

  • Haberman, S. J. (2006). Adaptive quadrature for item response models (Tech. Rep. No. RR-06-29). Princeton: ETS.

  • Hofer, S. M., & Piccinin, A. M. (2009). Integrative data analysis through coordination of measurement and analysis protocol across independent longitudinal studies. Psychological Methods, 14(2), 150.

  • Huang, W., Gallivan, K. A., & Absil, P.-A. (2015). A Broyden class of quasi-Newton methods for Riemannian optimization. SIAM Journal on Optimization, 25(3), 1660–1685.

  • Jeon, M., Kaufman, C., & Rabe-Hesketh, S. (2019). Monte Carlo local likelihood approximation. Biostatistics, 20(1), 164–179.

  • Klopp, O. (2015). Matrix completion by singular value thresholding: Sharp bounds. Electronic Journal of Statistics, 9(2), 2348–2369.

  • Klopp, O., Lafond, J., Moulines, É., & Salmon, J. (2015). Adaptive multinomial matrix completion. Electronic Journal of Statistics, 9(2), 2950–2975.

  • Koopmans, T. C., & Reiersøl, O. (1950). The identification of structural characteristics. The Annals of Mathematical Statistics, 21(2), 165–181.

  • Liu, C., & Boumal, N. (2019). Simple algorithms for optimization on Riemannian manifolds with constraints. Applied Mathematics & Optimization. https://doi.org/10.1007/s00245-019-09564-3

  • Liu, Y., Magnus, B., Quinn, H., & Thissen, D. (2018). Multidimensional item response theory. In D. Hughes, P. Irwing, & T. Booth (Eds.), Handbook of psychometric testing. Hoboken: Wiley.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Mahwah: Routledge.

  • Magnus, J., & Neudecker, H. (1999). Matrix differential calculus with applications in statistics and econometrics. New York: Wiley.

  • McDonald, R. P. (1981). The dimensionality of tests and items. British Journal of Mathematical and Statistical Psychology, 34(1), 100–117.

  • Monroe, S., & Cai, L. (2014). Estimation of a Ramsay-curve item response theory model by the Metropolis–Hastings Robbins–Monro algorithm. Educational and Psychological Measurement, 74(2), 343–369.

  • Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica: Journal of the Econometric Society, 16(1), 1–32.

  • O’Rourke, S., Vu, V., & Wang, K. (2018). Random perturbation of low rank matrices: Improving classical bounds. Linear Algebra and its Applications, 540, 26–59.

  • Pinar, M. Ç., & Zenios, S. A. (1994). On smoothing exact penalty functions for convex constrained optimization. SIAM Journal on Optimization, 4(3), 486–511.

  • Polak, E., & Ribière, G. (1969). Note sur la convergence de méthodes de directions conjuguées. Revue Française d’Informatique et de Recherche Opérationnelle, Série Rouge, 3(16), 35–43.

  • R Core Team. (2018). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. https://www.R-project.org/

  • Reckase, M. (2009). Multidimensional item response theory. New York: Springer.

  • Revelle, W., Wilt, J., & Rosenthal, A. (2010). Individual differences in cognition: New methods for examining the personality-cognition link. In A. Gruszka, G. Matthews, & B. Szymura (Eds.), Handbook of individual differences in cognition (pp. 27–49). Berlin: Springer.

  • Schilling, S., & Bock, R. D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70(3), 533–555.

  • Shalit, U., Weinshall, D., & Chechik, G. (2012). Online learning in the embedded manifold of low-rank matrices. Journal of Machine Learning Research, 13(Feb), 429–458.

  • Takane, Y., & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52(3), 393–408.

  • Thissen, D., & Steinberg, L. (2009). Item response theory. In R. Millsap & A. Maydeu-Olivares (Eds.), The SAGE handbook of quantitative methods in psychology (pp. 148–177). London: Sage Publications.

  • Vandereycken, B. (2013). Low-rank matrix completion by Riemannian optimization. SIAM Journal on Optimization, 23(2), 1214–1236.

  • Wirth, R., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12(1), 58–79.

  • Woods, C. M., & Lin, N. (2009). Item response theory with estimation of the latent density using Davidian curves. Applied Psychological Measurement, 33(2), 102–117.

  • Woods, C. M., & Thissen, D. (2006). Item response theory with estimation of the latent population distribution using spline-based densities. Psychometrika, 71(2), 281–301.

  • Yu, Y., Wang, T., & Samworth, R. J. (2015). A useful variant of the Davis–Kahan theorem for statisticians. Biometrika, 102(2), 315–323.

  • Zhang, S., Chen, Y., & Liu, Y. (2020). An improved stochastic EM algorithm for large-scale full-information item factor analysis. British Journal of Mathematical and Statistical Psychology, 73(1), 44–71.


Author information

Correspondence to Yang Liu.


Electronic supplementary material

Supplementary material 1 (R, 14 KB)

Appendix A: Proofs

A.1 Proof of Proposition 1

Let \(\bar{\mathbf{U}}_0^{} = \bar{\mathbf{U}}_{0*}^{}\bar{\mathbf{R}}_{0u}^{}\) be the QR factorization of the true person parameter matrix: \(\bar{\mathbf{U}}_{0*}^{} = (\mathbf{1}_{d_1}/\sqrt{d_1}, \mathbf{U}_{0*}^{})\) is orthonormal, i.e., \(\bar{\mathbf{U}}_{0*}^\top \bar{\mathbf{U}}_{0*}^{} = \mathbf{I}_{k + 1}\), and \(\bar{\mathbf{R}}_{0u}^{} = (\mathbf{e}_{k + 1}^{}, \mathbf{R}_{0u}^{})\) is upper-triangular with normalized columns by Assumption ii). The block matrix inversion formula (Magnus and Neudecker 1999, p. 12) implies that \(\bar{\mathbf{R}}_{0u}^{-1} = (\mathbf{e}_{k + 1}^{}, \mathbf{R}_{0u}^-)\), i.e., the first column remains \(\mathbf{e}_{k + 1}\). By Lemma C.1 of Chen et al. (2018), there exist constants \(C_1, C_2>0\) and a \((k + 1)\times (k + 1)\) orthogonal rotation matrix of the form \(\bar{\mathbf{Q}}_*^{} = 1\oplus \mathbf{Q}_*^{}\), where \(\oplus \) denotes the (matrix) direct sum and \(\mathbf{Q}_*^\top \mathbf{Q}_*^{} = \mathbf{I}_k^{}\), such that

$$\begin{aligned} \frac{\Vert \tilde{\mathbf{V}}\bar{\mathbf{Q}}_*- \bar{\mathbf{V}}_0^{}\bar{\mathbf{R}}_{0u}^\top \Vert _F^2}{d_1d_2}\le C_1\sqrt{\frac{d_1 + d_2}{n}} \end{aligned}$$
(A1)

with probability greater than \(1 - C_2/(d_1 + d_2)\). Now, let

$$\begin{aligned} \bar{\mathbf{Q}}^{} = \bar{\mathbf{Q}}_*^{}\bar{\mathbf{R}}_{0u}^{-\top }. \end{aligned}$$
(A2)

Note that the first row of \(\bar{\mathbf{Q}}^{}\) is \(\mathbf{e}_{k + 1}^\top \) due to the partitioned structure of \(\bar{\mathbf{Q}}_*^{}\) and \(\bar{\mathbf{R}}_{0u}^{-1}\). \(\bar{\mathbf{Q}}^{}\) is indeed an oblique rotation matrix, because \(\bar{\mathbf{Q}}^{-1} = \bar{\mathbf{R}}_{0u}^\top \bar{\mathbf{Q}}_*^\top \), and thus \(\mathrm {diag}(\bar{\mathbf{Q}}^{-1}\bar{\mathbf{Q}}^{-\top }) = \mathrm {diag}(\bar{\mathbf{R}}_{0u}^\top \bar{\mathbf{R}}_{0u}^{}) = \mathbf{1}_{k + 1}\). Finally, the Cauchy–Schwarz inequality implies that

$$\begin{aligned} \Vert \tilde{\mathbf{V}}\bar{\mathbf{Q}}^{} - \bar{\mathbf{V}}_0^{}\Vert _F^2 = \Vert (\tilde{\mathbf{V}}\bar{\mathbf{Q}}_*- \bar{\mathbf{V}}_0^{}\bar{\mathbf{R}}_{0u}^\top )\bar{\mathbf{R}}_{0u}^{-\top }\Vert _F^2 \le \Vert \tilde{\mathbf{V}}\bar{\mathbf{Q}}_*^{} - \bar{\mathbf{V}}_0^{}\bar{\mathbf{R}}_{0u}^\top \Vert _F^2 \Vert \bar{\mathbf{R}}_{0u}^{-\top }\Vert _F^2. \end{aligned}$$
(A3)

As \(\bar{\mathbf{R}}_{0u}\) and \(\bar{\mathbf{U}}_0\) share the same set of singular values, \(\Vert \bar{\mathbf{R}}_{0u}^{-\top }\Vert _F^2 \le (k + 1)/\sigma _{k + 1}^2(\bar{\mathbf{U}}_0^{}) \le (k + 1)/c_1^2\) by Assumption iii). Equation 11 then follows from Eqs. A1 and A3.

To establish Eq. 12, notice that

$$\begin{aligned} \Vert \tilde{\mathbf{U}}\bar{\mathbf{Q}}^{-\top } - \bar{\mathbf{U}}_0^{}\Vert _F^2 = \Vert \big (\tilde{\mathbf{U}}\bar{\mathbf{Q}}_*^{} - \bar{\mathbf{U}}_{0*}^{}\big )\bar{\mathbf{R}}_{0u}^{}\Vert _F^2\le \Vert \tilde{\mathbf{U}}\bar{\mathbf{Q}}_*^{} - \bar{\mathbf{U}}_{0*}^{}\Vert _F^2\Vert \bar{\mathbf{R}}_{0u}^{}\Vert _F^2. \end{aligned}$$
(A4)

Because \(\bar{\mathbf{R}}_{0u}^{}\) is column-wise normalized, \(\Vert \bar{\mathbf{R}}_{0u}^{}\Vert _F^2 = k + 1\). The remaining task is to bound \(\Vert \tilde{\mathbf{U}}\bar{\mathbf{Q}}_*^{} - \bar{\mathbf{U}}_{0*}^{}\Vert _F^2\); because the leading columns of \(\tilde{\mathbf{U}}\) and \(\bar{\mathbf{U}}_{0*}^{}\) are identical, it further suffices to bound \(\Vert \hat{\mathbf{U}}_*^{}\mathbf{Q}_*^{} - \mathbf{U}_{0*}^{}\Vert _F^2\). By the choice of \(\mathbf{Q}_*^{} = \hat{\mathbf{U}}_*^\top \mathbf{U}_{0*}^{} (\mathbf{U}_{0*}^\top \hat{\mathbf{U}}_*^{}\hat{\mathbf{U}}_*^\top \mathbf{U}_{0*}^{})^{-1/2}\) in Chen et al. (2018, Eq. C.12),

$$\begin{aligned} \big \Vert \hat{\mathbf{U}}_*^{}\mathbf{Q}_*^{} - \mathbf{U}_{0*}^{}\big \Vert _F^2 =&\big \Vert \hat{\mathbf{U}}_*^{}\mathbf{Q}_*^{}\big \Vert _F^2 + \big \Vert \mathbf{U}_{0*}^{}\big \Vert _F^2 - 2\mathrm {tr}\big (\mathbf{Q}_*^\top \hat{\mathbf{U}}_*^\top \mathbf{U}_{0*}^{}\big ) = 2\left[ k - \sum _{l=1}^k\sigma _l\big (\hat{\mathbf{U}}_*^\top \mathbf{U}_{0*}^{}\big )\right] \nonumber \\ \le&2\left[ k - \sum _{l=1}^k\sigma _l^2\big (\hat{\mathbf{U}}_*^\top \mathbf{U}_{0*}^{}\big )\right] = 2\sum _{l=1}^k\sin ^2\angle _l\big (\hat{\mathbf{U}}_*^{}, \mathbf{U}_{0*}^{}\big ), \end{aligned}$$
(A5)

in which \(\angle _l(\hat{\mathbf{U}}_*^{}, \mathbf{U}_{0*}^{})\), \(l = 1,\dots , k\), denote the principal angles between \(\mathrm {span}(\hat{\mathbf{U}}_*^{})\) and \(\mathrm {span}(\mathbf{U}_{0*}^{})\), and the inequality follows from the fact that \(\sigma _l(\hat{\mathbf{U}}_*^\top \mathbf{U}_{0*}^{}) = \cos \angle _l(\hat{\mathbf{U}}_*^{}, \mathbf{U}_{0*}^{})\) (Björck and Golub 1973). The right-hand side of Eq. A5 converges to 0 in \(P_{{{\varvec{\Theta }}}_0}\)-probability by Eq. C.10 in Chen et al. (2018). The proof is now complete.
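
As a side note, the chain of relations in Eq. A5 can be checked numerically. The R sketch below (R being the language of the supplementary material; the dimensions, matrices, and variable names are illustrative rather than taken from the paper) forms \(\mathbf{Q}_*^{}\) as the orthogonal polar factor of \(\hat{\mathbf{U}}_*^\top \mathbf{U}_{0*}^{}\) and recovers the principal angles from its singular values, as in Björck and Golub (1973).

```r
## Numerical check of Eq. A5 (a sketch; all objects are illustrative)
set.seed(1)
d1 <- 200; k <- 4

## Two random d1 x k matrices with orthonormal columns
U_hat <- qr.Q(qr(matrix(rnorm(d1 * k), d1, k)))   # plays the role of U-hat_*
U_0   <- qr.Q(qr(matrix(rnorm(d1 * k), d1, k)))   # plays the role of U_0*

## Q_* = M (M'M)^{-1/2} with M = U-hat_*' U_0*: the orthogonal polar factor of M,
## computed from the SVD M = P diag(s) Q'
M      <- crossprod(U_hat, U_0)
sv     <- svd(M)
Q_star <- sv$u %*% t(sv$v)

## Left-hand side of Eq. A5 and the two closed-form expressions
lhs   <- norm(U_hat %*% Q_star - U_0, "F")^2
mid   <- 2 * (k - sum(sv$d))      # 2[k - sum_l sigma_l(M)]
bound <- 2 * (k - sum(sv$d^2))    # 2 sum_l sin^2 angle_l, since sigma_l = cos angle_l

all.equal(lhs, mid)               # TRUE up to rounding
mid <= bound + 1e-12              # TRUE, since sigma_l <= 1

## Principal angles between span(U_hat) and span(U_0) (Björck & Golub, 1973)
angles <- acos(pmin(pmax(sv$d, -1), 1))
all.equal(bound, 2 * sum(sin(angles)^2))   # TRUE
```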

A.2 Proof of Proposition 2

Let \({{\varvec{\Theta }}}\in \mathcal{M}_{k}(d_1, d_2)\), and let \({\varvec{\gamma }}: \mathcal{R}\rightarrow \mathcal{M}_{k}(d_1, d_2)\) be a smooth curve such that \({\varvec{\gamma }}(0) = {{\varvec{\Theta }}}\). There exist \(\mathbf{w}(t)\in \mathcal{R}^{d_2}\), \(\mathbf{U}(t)\in \mathcal{R}_*^{d_1\times k}\), and \(\mathbf{V}(t)\in \mathcal{R}_*^{d_2\times k}\) such that

$$\begin{aligned} {\varvec{\gamma }}(t) = \mathbf{1}_{d_1}\mathbf{w}(t)^\top + \mathbf{U}(t)\mathbf{V}(t)^\top \end{aligned}$$
(A6)

for t in some neighborhood of 0. Differentiating Eq. A6 with respect to t and evaluating at \(t = 0\) yield

$$\begin{aligned} \dot{{\varvec{\gamma }}}(0) = \mathbf{1}_{d_1}\dot{\mathbf{w}}(0)^\top + \dot{\mathbf{U}}(0)\mathbf{V}(0)^\top + \mathbf{U}(0)\dot{\mathbf{V}}(0)^\top . \end{aligned}$$
(A7)

Given a choice of orthonormal basis matrices \((\mathbf{1}_{d_1}/\sqrt{d_1}, \mathbf{U}_*^{})\) and \(\mathbf{V}_*^{}\) corresponding to \((\mathbf{1}_{d_1}, \mathbf{U}(0))\) and \(\mathbf{V}(0)\), there exist fixed \(\mathbf{m}\in \mathcal{R}^{k}\) and \(\mathbf{M}, \mathbf{N}\in \mathcal{R}_*^{k\times k}\) such that

$$\begin{aligned} \mathbf{U}(0) = \mathbf{1}_{d_1}^{}\mathbf{m}^\top + \mathbf{U}_*^{}\mathbf{M},\hbox { and } \mathbf{V}(0) = \mathbf{V}_*^{}\mathbf{N}. \end{aligned}$$
(A8)

Because \(\mathbf{1}_{d_1}\) is perpendicular to \(\mathbf{U}_*^{}\), it is possible to select \(\mathbf{U}_\perp ^{}= (\mathbf{1}_{d_1}/\sqrt{d_1}, \mathbf{U}_\dagger ^{})\) where \(\mathbf{U}_{\dagger }^\top \mathbf{1}_{d_1} = \mathbf{0}_{d_1 - k - 1}\), \(\mathbf{U}_{\dagger }^\top \mathbf{U}_*^{}= \mathbf{0}_{(d_1 - k - 1)\times k}\), and \(\mathbf{U}_{\dagger }^\top \mathbf{U}_{\dagger }^{} = \mathbf{I}_{d_1 - k - 1}\). The vector/matrix derivatives in Eq. A7 can then be expanded on the orthonormal bases:

$$\begin{aligned} \dot{\mathbf{w}}(0) = \mathbf{V}_*^{}\mathbf{a}_1 + \mathbf{V}_\perp ^{}\mathbf{a}_2,\ \dot{\mathbf{U}}(0) = \mathbf{U}_*^{}\mathbf{B}_1 + \mathbf{1}_{d_1}\mathbf{b}_2^\top + \mathbf{U}_{\dagger }^{}\mathbf{B}_3,\hbox { and } \dot{\mathbf{V}}(0) = \mathbf{V}_*^{}\mathbf{C}_1 + \mathbf{V}_\perp ^{}\mathbf{C}_2, \end{aligned}$$
(A9)

in which \(\mathbf{a}_1, \mathbf{b}_2\in \mathcal{R}^{k}\), \(\mathbf{a}_2\in \mathcal{R}^{d_2 - k}\), \(\mathbf{B}_1, \mathbf{C}_1\in \mathcal{R}^{k\times k}\), \(\mathbf{B}_3\in \mathcal{R}^{(d_1 - k - 1)\times k}\), and \(\mathbf{C}_2\in \mathcal{R}^{(d_2 - k)\times k}\). Plugging Eqs. A8 and A9 into Eq. A7 gives

$$\begin{aligned} \dot{\varvec{\gamma }}(0) = {}&\mathbf{1}_{d_1}^{}(\mathbf{a}_1^\top \mathbf{V}_*^\top + \mathbf{a}_2^\top \mathbf{V}_\perp ^\top ) + (\mathbf{U}_*^{}\mathbf{B}_1^{} + \mathbf{1}_{d_1}^{}\mathbf{b}_2^\top + \mathbf{U}_\dagger ^{}\mathbf{B}_3^{})\mathbf{N}^\top \mathbf{V}_*^\top \\&+ (\mathbf{1}_{d_1}^{}\mathbf{m}^\top + \mathbf{U}_*^{}\mathbf{M})(\mathbf{C}_1^\top \mathbf{V}_*^\top + \mathbf{C}_2^\top \mathbf{V}_\perp ^\top ) \end{aligned}$$
(A10)

which reduces to Eq. 13 upon identifying \(\mathbf{A}= \sqrt{d_1}(\mathbf{a}_2^{} + \mathbf{C}_2^{}\mathbf{m})\), \(\mathbf{B}= \mathbf{B}_1^{}\mathbf{N}^\top + \mathbf{M}\mathbf{C}_1^\top \), \(\mathbf{C}= (\mathbf{a}_1^\top + \mathbf{b}_2^\top \mathbf{N}^\top + \mathbf{m}^\top \mathbf{C}_1^\top , \mathbf{B}_3^{}\mathbf{N}^\top )\), and \(\mathbf{D}= \mathbf{C}_2^{}\mathbf{M}^\top \). Therefore, every tangent vector can be expressed as an element of \(\mathcal{T}_{{\varvec{\Theta }}}\mathcal{M}_{k}(d_1, d_2)\) as represented in Eq. 13. Conversely, let \({{\varvec{\Xi }}}\) be an element of the set in Eq. 13. Equation A14 in “Appendix A.4” implies that the curve \({\varvec{\gamma }}: t\mapsto R_{{\varvec{\Theta }}}(t{{\varvec{\Xi }}})\) passes through \({{\varvec{\Theta }}}\) at \(t = 0\) with \(\dot{{\varvec{\gamma }}}(0) = {{\varvec{\Xi }}}\), so the set in Eq. 13 is contained in the tangent space. In conclusion, Eq. 13 gives a representation of the tangent space of \(\mathcal{M}_{k}(d_1, d_2)\) at \({{\varvec{\Theta }}}\).
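
To illustrate the first direction of the argument numerically, the R sketch below (simulated data and illustrative dimensions; not part of the original proof) builds a curve of the form in Eq. A6 and verifies that its derivative at \(t = 0\) has no component in \(\mathrm {span}(\mathbf{U}_\dagger ^{})\otimes \mathrm {span}(\mathbf{V}_\perp ^{})\), the only block excluded from the representation in Eq. 13.

```r
## Sketch: the derivative at t = 0 of a curve gamma(t) = 1 w(t)' + U(t) V(t)'
## on M_k(d1, d2) has a zero U_dagger-by-V_perp block (illustrative sizes)
set.seed(2)
d1 <- 60; d2 <- 40; k <- 3

one_d1 <- rep(1, d1)
U0 <- scale(matrix(rnorm(d1 * k), d1, k), center = TRUE, scale = FALSE)  # column-centered, so 1 is perpendicular to U0
V0 <- matrix(rnorm(d2 * k), d2, k)

## Arbitrary velocities for w(t), U(t), V(t); by Eq. A7,
## gamma'(0) = 1 w_dot' + U_dot V0' + U0 V_dot'
w_dot <- rnorm(d2)
U_dot <- matrix(rnorm(d1 * k), d1, k)
V_dot <- matrix(rnorm(d2 * k), d2, k)
gamma_dot <- one_d1 %*% t(w_dot) + U_dot %*% t(V0) + U0 %*% t(V_dot)

## U_dagger: orthonormal complement of span(1, U0); V_perp: complement of span(V0)
U_dagger <- qr.Q(qr(cbind(one_d1 / sqrt(d1), U0)), complete = TRUE)[, -(1:(k + 1))]
V_perp   <- qr.Q(qr(V0), complete = TRUE)[, -(1:k)]

## The excluded block vanishes (up to rounding), as the proof asserts
max(abs(t(U_dagger) %*% gamma_dot %*% V_perp))   # ~ 1e-14
```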

A.3 Proof of Proposition 3

It suffices to verify that \(\langle \mathbf{G}- {{\varvec{\Xi }}}, {{\varvec{\Xi }}}\rangle = 0\). Note that \(\mathbf{I}_{d_1} = \mathbf{U}_*^{}\mathbf{U}_*^\top + (\mathbf{I}_{d_1} - \mathbf{U}_*^{}\mathbf{U}_*^\top ) = \mathbf{U}_*^{}\mathbf{U}_*^\top + \mathbf{U}_\perp ^{}\mathbf{U}_\perp ^\top \) and \(\mathbf{I}_{d_2} = \mathbf{V}_*^{}\mathbf{V}_*^\top + (\mathbf{I}_{d_2} - \mathbf{V}_*^{}\mathbf{V}_*^\top ) = \mathbf{V}_*^{}\mathbf{V}_*^\top + \mathbf{V}_\perp ^{}\mathbf{V}_\perp ^\top \), which yields the following decomposition of \(\mathbf{G}\in \mathcal{R}^{d_1\times d_2}\):

$$\begin{aligned} \mathbf{G}= \mathbf{U}_*^{}\mathbf{U}_*^\top \mathbf{G}\mathbf{V}_*^{}\mathbf{V}_*^\top + \mathbf{U}_*^{}\mathbf{U}_*^\top \mathbf{G}\mathbf{V}_\perp ^{}\mathbf{V}_\perp ^\top + \mathbf{U}_\perp ^{}\mathbf{U}_\perp ^\top \mathbf{G}\mathbf{V}_*^{}\mathbf{V}_*^\top + \mathbf{U}_\perp ^{}\mathbf{U}_\perp ^\top \mathbf{G}\mathbf{V}_\perp ^{}\mathbf{V}_\perp ^\top . \end{aligned}$$
(A11)

Partition \(\mathbf{U}_\perp ^{}= (\mathbf{1}_{d_1}/\sqrt{d_1}, \mathbf{U}_\dagger ^{})\) as in Sect. A.2. The orthogonal decomposition

$$\begin{aligned} \mathbf{U}_\perp ^{}\mathbf{U}_\perp ^\top \mathbf{G}\mathbf{V}_\perp ^{}\mathbf{V}_\perp ^\top = \frac{\mathbf{1}_{d_1}\mathbf{1}_{d_1}^\top \mathbf{G}\mathbf{V}_\perp ^{}\mathbf{V}_\perp ^\top }{d_1} + \mathbf{U}_\dagger ^{}\mathbf{U}_\dagger ^\top \mathbf{G}\mathbf{V}_\perp ^{}\mathbf{V}_\perp ^\top \end{aligned}$$
(A12)

implies that \(\mathbf{G}- {{\varvec{\Xi }}}= \mathbf{U}_\dagger ^{}\mathbf{U}_\dagger ^\top \mathbf{G}\mathbf{V}_\perp ^{}\mathbf{V}_\perp ^\top \). Because \(\mathbf{U}_\dagger ^{}\) is orthogonal to both \(\mathbf{1}_{d_1}\) and \(\mathbf{U}_*^{}\), and \(\mathbf{V}_\perp ^{}\) is orthogonal to \(\mathbf{V}_*^{}\), this difference is orthogonal to every term of \({{\varvec{\Xi }}}\) in Eq. 13; therefore, \(\langle \mathbf{G}- {{\varvec{\Xi }}}, {{\varvec{\Xi }}}\rangle = 0\).
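
A small numerical check may also be helpful here. The R sketch below (simulated bases and an arbitrary ambient matrix \(\mathbf{G}\); the function name proj_tangent is hypothetical) implements the projection implied by Eqs. A11 and A12, namely \({{\varvec{\Xi }}}= \mathbf{G}- \mathbf{U}_\dagger ^{}\mathbf{U}_\dagger ^\top \mathbf{G}\mathbf{V}_\perp ^{}\mathbf{V}_\perp ^\top \), and confirms the orthogonality together with idempotence.

```r
## Sketch: orthogonal projection onto the tangent space (cf. Eqs. A11-A12)
set.seed(3)
d1 <- 50; d2 <- 30; k <- 3

one_d1 <- rep(1, d1)
U0 <- scale(matrix(rnorm(d1 * k), d1, k), center = TRUE, scale = FALSE)
V0 <- matrix(rnorm(d2 * k), d2, k)
U_dagger <- qr.Q(qr(cbind(one_d1 / sqrt(d1), U0)), complete = TRUE)[, -(1:(k + 1))]
V_perp   <- qr.Q(qr(V0), complete = TRUE)[, -(1:k)]

## Project an ambient matrix G by removing its U_dagger-by-V_perp block
proj_tangent <- function(G) {
  G - U_dagger %*% (t(U_dagger) %*% G %*% V_perp) %*% t(V_perp)
}

G  <- matrix(rnorm(d1 * d2), d1, d2)
Xi <- proj_tangent(G)

sum((G - Xi) * Xi)                                        # <G - Xi, Xi> ~ 0
max(abs(proj_tangent(Xi) - Xi))                           # idempotence, ~ 0
norm(G - Xi, "F")^2 + norm(Xi, "F")^2 - norm(G, "F")^2    # Pythagoras, ~ 0
```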

A.4 Proof of Proposition 4

Expand and rearrange the right-hand side of Eq. 17:

$$\begin{aligned} R_{{{\varvec{\Theta }}}}({{\varvec{\Xi }}}) =&\left[ \frac{\mathbf{1}_{d_1}\mathbf{w}_{*}^\top }{d_1} + \mathbf{U}_*^{}\mathbf{R}^\top \mathbf{V}_*^\top \right] + \left[ \frac{\mathbf{1}_{d_1}\mathbf{a}^\top \mathbf{V}_\perp ^\top }{\sqrt{d_1}} + \mathbf{U}_*^{}\mathbf{B}\mathbf{V}_*^\top + \mathbf{U}_\perp ^{}\mathbf{C}\mathbf{V}_*^\top + \mathbf{U}_*^{}\mathbf{D}^\top \mathbf{V}_\perp ^\top \right] \nonumber \\&+ (\mathbf{U}_*^{}\mathbf{B}+ \mathbf{U}_\perp ^{}\mathbf{C})\mathbf{R}^{-\top }\mathbf{D}^\top \mathbf{V}_\perp ^\top \nonumber \\ =&{{\varvec{\Theta }}}+ {{\varvec{\Xi }}}+ (\mathbf{U}_*^{}\mathbf{B}+ \mathbf{U}_\perp ^{}\mathbf{C})\mathbf{R}^{-\top }\mathbf{D}^\top \mathbf{V}_\perp ^\top . \end{aligned}$$
(A13)

For s belonging to some neighborhood of 0, it follows that

$$\begin{aligned} R_{{{\varvec{\Theta }}}}(s{{\varvec{\Xi }}}) = {{\varvec{\Theta }}}+ s{{\varvec{\Xi }}}+ s^2(\mathbf{U}_*^{}\mathbf{B}+ \mathbf{U}_\perp ^{}\mathbf{C})\mathbf{R}^{-\top }\mathbf{D}^\top \mathbf{V}_\perp ^\top . \end{aligned}$$
(A14)

The centering condition (Eq. 15) follows by setting \(s = 0\) in Eq. A14. Equation A14 also shows that \(R_{{{\varvec{\Theta }}}}(s{{\varvec{\Xi }}})\) is quadratic in \(s\); differentiating with respect to \(s\) and evaluating at \(s = 0\) therefore yields the local rigidity condition (Eq. 16). It is then concluded that \(R_{{{\varvec{\Theta }}}}\) is a valid retraction.
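
As an illustration, the R sketch below implements the retraction via the expanded form in Eq. A13 for a simulated base point and tangent vector (all quantities are randomly generated and the variable names are illustrative), and checks the centering condition, the first-order agreement behind local rigidity, and that the retracted point, after column centering, has numerical rank \(k\), consistent with its staying on \(\mathcal{M}_{k}(d_1, d_2)\).

```r
## Sketch: the retraction written out in Eq. A13, on a simulated base point
set.seed(4)
d1 <- 40; d2 <- 25; k <- 3

one_d1 <- rep(1, d1)
## Orthonormal bases at the base point: (1/sqrt(d1), U_*, U_dagger) and (V_*, V_perp)
Qu <- qr.Q(qr(cbind(one_d1 / sqrt(d1), matrix(rnorm(d1 * k), d1, k))), complete = TRUE)
U_star <- Qu[, 2:(k + 1)]; U_perp <- Qu[, -(2:(k + 1))]  # U_perp = (1/sqrt(d1), U_dagger) up to sign
Qv <- qr.Q(qr(matrix(rnorm(d2 * k), d2, k)), complete = TRUE)
V_star <- Qv[, 1:k]; V_perp <- Qv[, -(1:k)]

w_star <- rnorm(d2)
R      <- matrix(rnorm(k * k), k, k)                     # invertible with probability one
Theta  <- one_d1 %*% t(w_star) / d1 + U_star %*% t(R) %*% t(V_star)

## Tangent-vector parts (a, B, C, D) and the tangent vector itself (cf. Eq. A13)
a <- rnorm(d2 - k); B <- matrix(rnorm(k * k), k, k)
C <- matrix(rnorm((d1 - k) * k), d1 - k, k); D <- matrix(rnorm((d2 - k) * k), d2 - k, k)
Xi <- one_d1 %*% t(a) %*% t(V_perp) / sqrt(d1) + U_star %*% B %*% t(V_star) +
  U_perp %*% C %*% t(V_star) + U_star %*% t(D) %*% t(V_perp)

## R_Theta(s * Xi): scaling Xi by s scales (a, B, C, D) by s, so the last term is quadratic in s
retract <- function(s) {
  Theta + s * Xi +
    s^2 * (U_star %*% B + U_perp %*% C) %*% t(solve(R)) %*% t(D) %*% t(V_perp)
}

max(abs(retract(0) - Theta))                              # centering (Eq. 15), ~ 0
h <- 1e-5
max(abs((retract(h) - retract(-h)) / (2 * h) - Xi))       # local rigidity (Eq. 16), ~ 0
## The retracted point stays on the manifold: its column-centered part has rank k
svd(scale(retract(0.5), center = TRUE, scale = FALSE))$d[1:(k + 2)]  # only the first k are nonzero
```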


About this article


Cite this article

Liu, Y. A Riemannian Optimization Algorithm for Joint Maximum Likelihood Estimation of High-Dimensional Exploratory Item Factor Analysis. Psychometrika 85, 439–468 (2020). https://doi.org/10.1007/s11336-020-09711-8

