A Note on Exploratory Item Factor Analysis by Singular Value Decomposition

We revisit a singular value decomposition (SVD) algorithm given in Chen et al. (Psychometrika 84:124–146, 2019b) for exploratory item factor analysis (IFA). This algorithm estimates a multidimensional IFA model by SVD and was used to obtain a starting point for joint maximum likelihood estimation in Chen et al. (2019b). Thanks to the analytic and computational properties of SVD, the algorithm guarantees a unique solution and has a computational advantage over other exploratory IFA methods. This advantage becomes significant when the numbers of respondents, items, and factors are all large. The algorithm can be viewed as a generalization of principal component analysis to binary data. In this note, we provide the statistical underpinning of the algorithm. In particular, we establish its statistical consistency under the same double asymptotic setting as in Chen et al. (2019b). We also demonstrate how the algorithm provides a scree plot for investigating the number of factors and give the corresponding asymptotic theory. Further extensions of the algorithm are discussed. Finally, simulation studies suggest that the algorithm has good finite-sample performance.
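To make the flavor of the procedure concrete, the following is a minimal Python sketch of an SVD-based IFA estimator in this spirit: denoise the binary response matrix by a truncated SVD, clip the fitted probabilities away from 0 and 1, map them through an inverse link, and read factor scores and loadings off a second SVD. The logistic link, the clipping level `eps`, and the scaling conventions here are illustrative assumptions, not necessarily the exact choices of Algorithm 2 in the paper.

```python
import numpy as np

def svd_ifa(Y, K, eps=0.01):
    """Sketch of SVD-based exploratory IFA for a binary N x J matrix Y.

    The logistic link and the scaling conventions are assumptions for
    illustration; they need not match Algorithm 2 in the paper exactly.
    """
    N, J = Y.shape
    # Step 1: denoise Y by a truncated SVD, keeping K + 1 components
    # (one for the intercept direction plus K factors).
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    X_hat = (U[:, :K + 1] * s[:K + 1]) @ Vt[:K + 1, :]
    # Step 2: clip fitted probabilities into [eps, 1 - eps] so the
    # inverse link below is finite.
    X_hat = np.clip(X_hat, eps, 1 - eps)
    # Step 3: map probabilities to the linear scale via the logit link.
    M_hat = np.log(X_hat / (1 - X_hat))
    # Step 4: column means play the role of item intercepts; a rank-K SVD
    # of the centered matrix yields factor scores and loadings.
    d_hat = M_hat.mean(axis=0)
    U2, s2, Vt2 = np.linalg.svd(M_hat - d_hat, full_matrices=False)
    Theta_hat = np.sqrt(N) * U2[:, :K]             # N x K factor scores
    A_hat = Vt2[:K, :].T * (s2[:K] / np.sqrt(N))   # J x K loadings
    return Theta_hat, A_hat, d_hat
```

Every step is a deterministic matrix computation, which is why such a method admits a unique solution (up to sign flips) and remains fast when $N$, $J$, and $K$ are all large.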

For a matrix $Z = (z_{ij})_{m \times n}$ and a function $f : \mathbb{R} \to \mathbb{R}$, let $f(Z) := (f(z_{ij}))_{m \times n}$. Let $\sigma_k(Z)$ denote the $k$-th largest singular value of $Z$, and let $\|Z\|$ and $\|Z\|_*$ denote the spectral norm and the nuclear norm of $Z$, which are the largest singular value and the sum of all singular values, respectively. If $Z$ is a square matrix, let $\lambda_k(Z)$ denote its $k$-th largest eigenvalue.
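As a quick numerical illustration of this notation (helper code assumed for illustration, not from the paper), these quantities can be computed with numpy as follows:

```python
import numpy as np

rng = np.random.default_rng(0)
Z = rng.standard_normal((5, 3))

sigma = np.linalg.svd(Z, compute_uv=False)  # singular values, in decreasing order
spectral_norm = sigma[0]                    # ||Z||:   largest singular value
nuclear_norm = sigma.sum()                  # ||Z||_*: sum of all singular values

# f(Z) := (f(z_ij)) is entrywise application, i.e. numpy's vectorized semantics.
fZ = np.tanh(Z)

# For a square matrix, lambda_k is the k-th largest eigenvalue.
S = Z @ Z.T                                  # square (and symmetric) example
lam = np.sort(np.linalg.eigvalsh(S))[::-1]   # eigenvalues, largest first
```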
We denote $X^* = (x^*_{ij})_{N \times J}$ as the true probability matrix and define $\tilde{X} = (\tilde{x}_{ij})_{N \times J}$ in terms of $\hat{x}_{ij}$, where $\hat{x}_{ij}$ is defined in step 5 of Algorithm 2.
Throughout the proofs, we use $c$ to denote a generic constant whose value may change from line to line or even within a line. We drop the subscripts in $\epsilon_{N,J}$ and write $\epsilon$ for notational simplicity.

B Proofs of Theorems
Proof of Theorem 1. Since Theorem 1 is a special case of Proposition 4 with $p = 1$ and $W = \mathbf{1}_N \mathbf{1}_J^\top$, we refer readers to the proof of Proposition 4.

C Proofs of Propositions
Proof of Proposition 1. According to the choice of $\epsilon$, we have $h(2\epsilon) \ge C C_0^2 + 1$. The stated bound then follows, and we complete the proof by Theorem 1.
Proof of Proposition 3. The proof of Proposition 3 is similar to that of Lemma 1; we only state the main steps and omit the repeated details, making use of Lemma 3 in Appendix D. One difference from the proof of Lemma 1 is that the rank of the matrix $f(M_\delta)$ is now upper bounded. For any $\Delta_{N,J} > 0$, Chebyshev's inequality controls the deviation term; thus, for any sequence $\Delta_{N,J}$ satisfying the required rate, we may restrict our analysis to the corresponding event in what follows. Following a procedure similar to the proof of Lemma 1, we can further bound $\|\hat{X} - X^*\|_F^2$. To summarize, we obtain the claimed bound, where the second equation is due to $N \ge J$.
Proof of Proposition 4. Rotate the true parameters so that $\Theta^*(A^*)^\top = \tilde{\Theta}\tilde{A}^\top$, where the $\tilde{\theta}_i$'s are independent and identically distributed from a distribution $\tilde{F}$ with mean $\mathbf{0}$ and covariance matrix $I_K$ (a reconstruction of this rotation is sketched below). Therefore, it suffices to show $L_{N,J}(A^*, \hat{A}) \overset{pr}{\to} 0$ when $\Sigma = I_K$. We prove this through the following two lemmas, whose proofs are given in Appendix D.
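The rotation behind this reduction is standard; the following is a plausible reconstruction, assuming $\Sigma^{1/2}$ denotes the symmetric square root of $\Sigma$:

```latex
% With \tilde\Theta := \Theta^* \Sigma^{-1/2} and \tilde A := A^* \Sigma^{1/2},
% the low-rank product is unchanged:
\[
  \tilde{\Theta}\tilde{A}^\top
    = \Theta^* \Sigma^{-1/2} \Sigma^{1/2} (A^*)^\top
    = \Theta^* (A^*)^\top ,
\]
% while the rotated factor scores become isotropic:
\[
  \operatorname{Cov}\bigl(\Sigma^{-1/2} \theta_i^*\bigr)
    = \Sigma^{-1/2} \Sigma \, \Sigma^{-1/2} = I_K .
\]
```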
Lemma 2. Suppose conditions A1, A2 and A4 are satisfied, and further suppose the rate condition stated there holds. Combining the two lemmas, we complete the proof.

D Proofs of Lemmas
Proof of Lemma 1. We first give a lemma regarding the error bound for recovering the probability matrix $X^*$.
Lemma 3. Given $X^*$, we have the bound in (D.1), where $C_\epsilon = h(2\epsilon)/C$ is a quantity depending on $\epsilon$. In what follows, we restrict the analysis to the event $A_{N,J}$. Let $G_1, G_2$ be two $\delta$-nets for the parameter sets of the $\theta^*_i$ and the $a^+_j$, respectively. For any $\theta^*_i$, let $p(\theta^*_i)$ be a point in $G_1$ approximating it within $\delta$. With a slight abuse of notation, we use $p(\theta^+_i)$ to denote $(1, p(\theta^*_i)^\top)^\top$. For any $a^+_j$, let $p(a^+_j)$ be a point in $G_2$ approximating it within $\delta$. It is not hard to see that such $G_1, G_2$ of the required cardinality exist (a standard covering bound is recalled below); this follows from the definition of $A_{N,J}$ and condition A1.
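For reference, the standard cardinality bound for $\delta$-nets (e.g., Lemma 5.2 of Vershynin, 2010; the radius $r$ below is a generic stand-in for the bounds implied by condition A1):

```latex
% A Euclidean ball of radius r in R^K admits a \delta-net G with
\[
  |G| \;\le\; \Bigl(1 + \frac{2r}{\delta}\Bigr)^{K},
\]
% so the net sizes |G_1|, |G_2| grow only polynomially in 1/\delta
% (with exponent K), while the approximation error scales with \delta.
```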
Now we provide an upper bound for $\|X^*\|_*$ on the right-hand side of (D.1). We decompose $X^*$ using the net points; the second term on the right-hand side of the resulting display is bounded directly, and for the first term we use the Lipschitz continuity in condition A3, so the first term in (D.2) is bounded from above as well. By Chebyshev's inequality, the remaining stochastic term is controlled with probability tending to one. Since $\hat{X}$ and $\tilde{X}$ are not far from each other by definition, we can bound $\|\hat{X} - X^*\|_F^2$ accordingly, where the last inequality holds because $\epsilon \le \frac{1}{4}$. According to condition A3 and the above inequality, we obtain (D.9); the first inequality there holds because $x^*_{ij}, \hat{x}_{ij} \in [\epsilon, 1 - \epsilon]$ on the event $B_{N,J}$.
We proceed to an upper bound of $\|\hat{M} - \Theta^*(A^*)^\top\|$. We first bound the trace term in the display above. By the definition of $\hat{d}_j$, we can bound $\mathrm{tr}\{H_1 H_2\}$. According to condition A2 and the law of large numbers, the corresponding average converges for any $\xi > 0$. Recalling how $\hat{\Theta}$ and $\hat{A}$ are obtained in Algorithm 2, we have a chain of inequalities in which the first inequality is due to $\mathrm{rank}(\hat{\Theta}\hat{A}^\top - \Theta^*(A^*)^\top) \le 2K$ (see the numerical check below), the second is due to (D.13), and the last is due to (D.12). Thus, on the event $C_{N,J,\xi}$, the error is controlled, where $\Delta_{N,J}$ can be any sequence satisfying $\Delta_{N,J} = o(1)$. By (3), (4) and condition A5, there exists $\Delta_{N,J} = o(1)$ such that $h(N,J)/(g(\epsilon))^2 = o(1)$. So, fixing any $\xi < 1$, for $N, J$ large enough we have $h(N,J)/(g(\epsilon))^2 \le \xi$. Then there is a constant $\kappa$ such that, for $N, J$ large enough, on $C_{N,J,\xi}$ with $\xi \in (0, 1)$, the stated bound holds. This, combined with $\Pr(C_{N,J,\xi}) \to 1$ for any sufficiently small $\xi$, completes the proof.
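The rank-based step above uses the elementary inequality $\|Z\|_F \le \sqrt{\operatorname{rank}(Z)}\,\|Z\|$ together with the fact that a difference of two rank-$K$ matrices has rank at most $2K$. A quick numerical check (illustrative dimensions only):

```python
import numpy as np

rng = np.random.default_rng(1)
N, J, K = 200, 100, 3

# A difference of two rank-K matrices has rank at most 2K.
D = (rng.standard_normal((N, K)) @ rng.standard_normal((K, J))
     - rng.standard_normal((N, K)) @ rng.standard_normal((K, J)))

r = np.linalg.matrix_rank(D)
fro = np.linalg.norm(D, 'fro')       # Frobenius norm
spec = np.linalg.norm(D, 2)          # spectral norm (largest singular value)

assert r <= 2 * K
assert fro <= np.sqrt(r) * spec * (1 + 1e-10)   # ||D||_F <= sqrt(rank D) * ||D||
```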
Proof of Lemma 2. For any $\alpha > 0$, define the event $D_{N,J,\alpha}$. Applying Theorem 5.39 of Vershynin (2010) to the matrix $\Theta^*$, we have $\lim_{N,J \to \infty} \Pr(D_{N,J,\alpha}) = 1$ for any $\alpha > 0$. We restrict our analysis to $D_{N,J,\alpha}$ in what follows. Then (D.17) follows, and we consider term (b) first.

Proof of Lemma 3. This lemma is essentially Theorem 1.1 of Chatterjee (2015), obtained by setting, in his notation, $\eta = 0.02$ and $\sigma^2 = 1/4$, up to two small differences. The first is that the probability $p$ may change with $N$ and $J$ in Chatterjee's setting, while $p$ is a constant in ours; we therefore absorb $p$ into the constants $c$ on the left-hand side of (D.1). The second is a modification in step 5 of Algorithm 2: we require $\hat{X}$ to include at least $K + 1$ singular values of $Z$. This does not change the result of Theorem 1.1.
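Returning to the appeal to Theorem 5.39 of Vershynin (2010) in the proof of Lemma 2: that theorem states that an $N \times K$ matrix with independent sub-gaussian isotropic rows has all singular values in $[\sqrt{N} - C\sqrt{K} - t,\ \sqrt{N} + C\sqrt{K} + t]$ with high probability, so $\sigma_k(\Theta^*)/\sqrt{N} \to 1$ for every $k$. A small simulation illustrating this concentration (standard normal rows, an assumption for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
K = 5
for N in (1_000, 10_000, 100_000):
    Theta = rng.standard_normal((N, K))           # independent isotropic rows
    s = np.linalg.svd(Theta, compute_uv=False)
    # All K singular values concentrate around sqrt(N), so s / sqrt(N) -> 1.
    print(N, np.round(s / np.sqrt(N), 3))
```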