Asymptotic Posterior Normality of Multivariate Latent Traits in an IRT Model

The asymptotic posterior normality (APN) of the latent variable vector in an item response theory (IRT) model is a crucial argument in IRT modeling approaches. In the case of a single latent trait and under general assumptions, Chang and Stout (Psychometrika, 58(1):37–52, 1993) proved the APN for a broad class of latent trait models for binary items. Under the same setup, they also showed the consistency of the latent trait's maximum likelihood estimator (MLE). Since then, several modeling approaches have been developed that consider multivariate latent traits and assume their APN, a conjecture which has not been proved so far. We fill this theoretical gap by extending the results of Chang and Stout to multivariate latent traits. Further, we discuss the existence and consistency of MLEs, maximum a posteriori (MAP) and expected a posteriori (EAP) estimators for the latent traits under the same broad class of latent trait models.

Supplementary Information: The online version contains supplementary material available at 10.1007/s11336-021-09838-2.

Proof. The proof is based on arguments of Chang and Stout (1991, Lemma 3.1).
Denote by ∇P_i the gradient of P_i. Then, for a point η̃ = (1 − c)η + cη', c ∈ [0, 1], on the line segment between η and η', we get from the multivariate mean value theorem and the Cauchy–Schwarz inequality that

|P_i(η) − P_i(η')| = |∇P_i(η̃)^T (η − η')| ≤ ‖∇P_i(η̃)‖ ‖η − η'‖.

Due to (CS2'), if restricted to K⁺, all ∂P_i/∂η_k are uniformly bounded for all i ∈ N, 1 ≤ k ≤ q. Then there is a finite number ζ(K⁺) such that sup_{(i,η) ∈ N×K⁺} ‖∇P_i(η)‖ = ζ(K⁺), and hence for all η, η' ∈ K⁺

|P_i(η) − P_i(η')| ≤ ζ(K⁺) ‖η − η'‖.   (W1)

In particular,

|P_i(η) − P_i(η')| < ε   (W2)

holds for all η, η' ∈ K⁺ with ‖η − η'‖ < δ = ε/ζ(K⁺), for every ε > 0 and all i ∈ N. Notice that (W1) and (W2) remain true if we take η, η' ∈ K, where K ⊂ K⁺. Hence, the family of maps {P_i|_K}_{i∈N} is equicontinuous for any compact set K for which a convex and compact set K⁺ ⊂ Θ with K ⊂ K⁺ exists.
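As a numerical illustration of a Lipschitz estimate of the type (W1), the following sketch uses multidimensional 2PL item response functions P_i(η) = σ(a_i^T η + d_i) as one concrete member of the model class; the item parameters a_i, d_i, the dimension q and the uniform gradient bound ζ = 0.25 · max_i ‖a_i‖ are illustrative assumptions, not taken from the source.

```python
import numpy as np

rng = np.random.default_rng(0)
q, n_items = 3, 50

# Hypothetical multidimensional 2PL items (one member of the model class):
# P_i(eta) = sigmoid(a_i' eta + d_i); the parameters a_i, d_i are illustrative.
A = rng.uniform(0.5, 2.0, size=(n_items, q))   # discrimination vectors a_i
d = rng.uniform(-1.0, 1.0, size=n_items)       # intercepts d_i

def P(eta):
    """Item response functions P_i(eta), shape (n_items,)."""
    return 1.0 / (1.0 + np.exp(-(A @ eta + d)))

# For the 2PL, grad P_i = p_i (1 - p_i) a_i and p(1 - p) <= 1/4, so
# zeta = 0.25 * max_i ||a_i|| uniformly bounds all gradient norms and is
# therefore a Lipschitz constant of the type used in (W1).
zeta = 0.25 * np.linalg.norm(A, axis=1).max()

# Empirical check of |P_i(eta) - P_i(eta')| <= zeta * ||eta - eta'||
for _ in range(1000):
    e1, e2 = rng.uniform(-2, 2, q), rng.uniform(-2, 2, q)
    assert np.abs(P(e1) - P(e2)).max() <= zeta * np.linalg.norm(e1 - e2) + 1e-12
print("uniform Lipschitz constant zeta:", round(zeta, 3))
```

The bound is uniform in i, which is exactly the ingredient that makes the family equicontinuous.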
Notice that sets K, as considered in Lemma W.1, are in fact all compact subsets of Θ due to the required convexity of Θ.
Kolmogorov's SLLN can be used to show that the average (1/d) Σ_{i=1}^d log Z_i(η, η_0) is asymptotically negative for any fixed η ≠ η_0, as shown in the next lemma. Using the equicontinuity of the item response functions, i.e. Lemma W.1, this can then be extended to the supremum over η ∈ Θ to obtain Lemma 1.
Proof. The proof is based on arguments of Chang and Stout (1991, Lemma 3.1).
This enables the application of Theorem W.1 to the sequence {log Z_i(η, η_0)}_{i∈N}. It then follows from the last step and (CS3') that there is a constant c(η) < 0 so that (W3) holds. Now we can prove Lemma 1.
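The role of the SLLN here can be illustrated by simulation: under the true trait η_0, the average of the item-wise log-likelihood ratios log Z_i(η, η_0) converges to an average of negative Kullback–Leibler divergences and is therefore asymptotically negative. The sketch below assumes hypothetical one-dimensional 2PL items; all parameters are illustrative, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 20000                        # number of items
# Hypothetical one-dimensional 2PL items (illustrative parameters)
a = rng.uniform(0.5, 2.0, d)     # discriminations
b = rng.uniform(-1.5, 1.5, d)    # difficulties
eta0, eta = 0.0, 1.0             # true trait and a fixed alternative

def p(eta_val):
    return 1.0 / (1.0 + np.exp(-a * (eta_val - b)))

# Responses generated under the true trait eta0
y = rng.random(d) < p(eta0)

# Item-wise log likelihood ratio log Z_i(eta, eta0)
p1, p0 = p(eta), p(eta0)
logZ = np.where(y, np.log(p1 / p0), np.log((1 - p1) / (1 - p0)))

# By Kolmogorov's SLLN the running average converges to the average of
# E[log Z_i] = -KL(Bernoulli(p0_i) || Bernoulli(p1_i)) < 0.
running = np.cumsum(logZ) / np.arange(1, d + 1)
print("average at d=100:  ", round(running[99], 4))
print("average at d=20000:", round(running[-1], 4))
assert running[-1] < 0
```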
Proof of Lemma 1. This proof is based on arguments by Chang and Stout (1991, Lemma 3.1).
Since the image of P_i does not include {0, 1}, H_i is continuous. Moreover, for any η' ∈ Θ and any δ > 0 such that B̄_δ(η') ⊂ Θ, the map is continuous on the compact set B̄_δ(η') and thus attains a maximum value, where B̄ denotes the closure of the set B. We denote this maximum value by Ĥ_i(δ, η'). In the following we assume that δ > 0 is sufficiently small so that B̄_δ(η') ⊂ Θ for the selected value η' ∈ Θ. Letting δ → 0 yields B̄_δ(η') → {η'} and therefore lim_{δ→0} Ĥ_i(δ, η') = 0 for each i ∈ N and η' ∈ Θ. Since Y_i ∈ {0, 1}, we get a corresponding bound from the triangle inequality for δ ≥ ‖η − η'‖. For any fixed η' ∈ Θ, applying the multivariate mean value theorem to log(·) yields a representation in which ξ(P_i(η), P_i(η')) is a point between P_i(η) and P_i(η'). Let ζ_0(K) and ζ_1(K) be given for each compact K ⊂ Θ as in (CS2'). Additionally, due to Lemma W.1, for each compact and convex set K ⊆ Θ there is a ζ_3(K) > 0 such that the corresponding estimate holds for all i ∈ N. Combining (W5), the analogously derived version for 1 − P_i(η) instead of P_i(η), (W6) and equation (20) in condition (CS2'), we obtain (W7) for all compact and convex K ⊇ B̄_δ(η'). Therefore, for each ε > 0 there is a δ > 0 so that the bound is smaller than ε. Next, we will show that for any η_i ≠ η_0 there are a sufficiently small δ_i > 0 and a constant c_i < 0 such that (W8) holds. Equation (W7) implies, for η = η_i and each ε > 0, that there is a sufficiently small δ > 0. Therefore, there are a sufficiently small δ > 0 and a negative number c_i, for example c_i = c(η_i)/2, so that (W8) holds.
Equation (W8) still holds if we replace B_δ(η_i) by an arbitrary subset of B_δ(η_i).
In particular, for all η_i ≠ η_0, there exist a sufficiently small δ_i and a constant c_i < 0 such that the corresponding bound holds for all compact sets.
The following lemma can be found in Witting and Müller-Funk (1995, "Hilfssatz" 6.7, part b, p. 173). This lemma is needed to show that there exist measurable solutions of the likelihood equations and that, in particular, the MLE is actually a random vector. Thus, equations that contain the MLE can be manipulated as for any random vector.
Lemma W.3. Consider a function g(·, ·) so that for each fixed η ∈ Θ ⊂ R^q the mapping g(·, η) is measurable and for each fixed x ∈ R^d the mapping g(x, ·) : Θ → R^d is continuous. Assume that Θ is compact or that there is a sequence of compact sets {U_i}_{i∈N} with Θ = ⋃_{i∈N} U_i. Further, assume that there is a mapping ϑ as in the cited source.

Next, we prove the consistency of the MLE and the MAP (Theorem 5 (i) and (ii)), which is required to prove Lemmas 2 and 3 and, consequently, Theorem 5 (iii) and Theorem 6.
Proof of Theorem 5 (i) and (ii). We start with part (i). The proof of the consistency of the MLE is based on the proof of Corollary 3.1 of Chang and Stout (1991). Similar to showing the existence of the MLE in classic maximum likelihood theory (cf. Lehmann and Casella, 1998), we define the set C_{d,δ}. So, for all y ∈ C_{d,δ}, ℓ^(d)(· | y) has at least one local maximum in B_δ(η_0). For all y ∈ C_{d,δ}, we collect these local maxima in a set M_y, and so for all y ∈ C_{d,δ} each point η* ∈ M_y satisfies the likelihood equations, i.e. ∇ℓ^(d)(η* | y) = 0.
Therefore, with probability tending to one as d → ∞, at least one of the local maxima of ℓ^(d)(· | y) in the interior of B_δ(η_0) has to be a global maximum of ℓ^(d)(· | y).
Next, we have to select a specific maximum point in such a way that it is a measurable mapping on ({0, 1}^d, Pow({0, 1}^d)), where Pow denotes the power set, in order to define a sequence of random variables which are statistics of the response variables.
We write Θ_δ := B̄_δ(η_0) ∩ Θ ⊂ Θ for a restricted compact parameter space.² Notice that ℓ^(d)(η | ·) is measurable for each fixed η ∈ Θ_δ. Then, by the remark of Witting and Müller-Funk (1995, page 173), the resulting selection is a measurable mapping. Further, since Θ_δ is compact, for each y ∈ {0, 1}^d the mapping ℓ^(d)(· | y) attains its maximum over Θ_δ and we can select an η* ∈ Θ_δ that realizes this maximum. We may now apply Lemma W.3 to ensure the existence of a measurable mapping η̂_d for each d ∈ N. By the previous part, the probability that η̂_d is a local maximum in Θ_δ and a global maximum of ℓ^(d)(· | Y^(d)) (and thus the maximum likelihood estimator (MLE)) tends to one as d → ∞.
2 If Θ is compact, we do not have to restrict Θ. Nevertheless, we do so for simplicity of the formulation.
In fact, by Witting and Müller-Funk (1995, page 173), the following argument is in principle true even if Θ is unbounded. But then the corresponding random variables can become infinite, which we want to avoid.
It remains to prove the consistency. We assume that η̂_d is the (restricted) MLE for d ∈ N. It is then sufficient to prove for all ε > 0 and δ > 0 that there is an N(ε, δ) ∈ N such that the consistency bound holds for all d > N(ε, δ). Suppose η̂_d is not consistent. Then there exist ε_0 > 0 and δ_0 > 0 such that for any N ∈ N there exists some d > N violating this bound, that is, for an infinite number of d ∈ N, and this implies a corresponding bound on the supremum for infinitely many d ∈ N. In particular, it holds for the event considered there. But, according to Lemma 1, there is a c = c(δ_0) < 0 such that the stated estimate holds for this event, where we can replace "≤" in the last step by "=". Notice that this yields a contradiction, which proves the consistency of the MLE.
For part (ii), we show that Lemma 1 is still valid if we replace ℓ with ℓ̃. The remaining steps are identical to the proof of part (i), simply replacing ℓ by ℓ̃ and its maximum η̂_d by η̃_d. For any d ∈ N and δ > 0, we get the analogous bound. Clearly, for any such δ this holds for all d > D. Thus, we can apply Lemma 1 and conclude the claim from (W18) and (W19).

Lemmas W.4 to W.6, which follow, are required for the proof of Lemma 2. We start with the equicontinuity of the sequence of Hessians of λ_i, i ∈ N, and the sequence of item information matrices.

Lemma W.4. Suppose that the conditions (CS2') and (CS4') hold. If restricted to any convex and compact set K ⊆ Θ, the two families of mappings {∇∇λ_i}_{i∈N} and {I_i}_{i∈N} are equicontinuous.
Proof. The proofs for {∇∇λ_i}_{i∈N} and {I_i}_{i∈N} can be formulated equivalently. Hence, let F(i) either be I_i or ∇∇λ_i for all i ∈ N to simplify the notation. Further, we denote by F(i)_{jk} the (j, k)-th component of F(i) for j, k = 1, . . . , q and all i ∈ N. Conditions (CS2') and (CS4') imply that ∇F(i)_{jk}(η) exists for all j, k = 1, . . . , q, i ∈ N and η ∈ K, and that there is a C < ∞ such that (W20) holds. Due to the multivariate mean value theorem and the convexity of K, for all η_1, η_2 ∈ K, j, k = 1, . . . , q and i ∈ N, there is a c ∈ [0, 1] so that the corresponding representation holds. Then, by the Cauchy–Schwarz inequality and (W20), we get (W21). Notice that the right-hand side of (W21) is independent of i, j, k. We therefore obtain an estimate for the maximum-norm of F(i)(η_1) − F(i)(η_2). The equivalence of all norms on R^{q×q} implies that there is a C > 0 so that the estimate holds for all i ∈ N and all η_1, η_2 ∈ K. The final step is done analogously to (W2) in the proof of the equicontinuity of the item response functions.
The next lemma is required for the approximation of the quadratic form of the Hessian of the log-likelihood by the quadratic form of the test information matrix, Q_d, in Lemma 2. In particular, it is used in the final step of the proof of Lemma 2, where the validity of the quadratic approximation of the log-likelihood ratio by the test information matrix is shown.
Lemma W.5. Let A ∈ R^{q×q} be symmetric and positive definite, x ∈ R^q and B ∈ R^{q×q} for q ∈ N. Then a bound on the quadratic form x^T B x in terms of x^T A x holds, where ‖B‖ denotes the spectral norm of the matrix B.
Proof. We first observe that

ν_min(A) x^T x ≤ x^T A x   (W22)

due to the Courant–Fischer theorem.
Since A is symmetric and positive definite, we can define a scalar product and a norm by

⟨x, y⟩_A := x^T A y,  ‖x‖_A := (x^T A x)^{1/2}.

Additionally, we define the matrix norm induced by ‖·‖_A. Since each symmetric and positive definite matrix has a unique symmetric and positive definite square root, these objects are well defined. The induced matrix norm is always compatible with the corresponding vector norm, and hence we get (W24) from the sub-multiplicativity. Applying (W22) and (W24) to (W23), we get an estimate for an upper bound. Finally, by the Cauchy–Schwarz inequality and the sub-multiplicativity of compatible matrix norms, we obtain a further estimate. This is equivalent to the claim, which completes the proof.
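The display of Lemma W.5 is not reproduced above. One bound of this type, consistent with the proof's use of the A-scalar product, the square root A^{1/2} and the spectral norm, is |x^T B x| ≤ ‖A^{-1/2} B A^{-1/2}‖ · x^T A x; the following sketch checks this hedged reading numerically on random instances (the specific matrices are illustrative).

```python
import numpy as np

rng = np.random.default_rng(2)
q = 4

def spd(q):
    """Random symmetric positive definite q x q matrix (illustrative)."""
    M = rng.standard_normal((q, q))
    return M @ M.T + q * np.eye(q)

for _ in range(200):
    Amat = spd(q)
    B = rng.standard_normal((q, q))   # arbitrary, not necessarily symmetric
    x = rng.standard_normal(q)

    # A^{-1/2} via the eigendecomposition of the SPD matrix A
    w, V = np.linalg.eigh(Amat)
    A_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T

    # Bound of Lemma W.5 type: |x'Bx| <= ||A^{-1/2} B A^{-1/2}||_2 * x'Ax,
    # which follows by substituting y = A^{1/2} x and |y'My| <= ||M||_2 ||y||^2.
    lhs = abs(x @ B @ x)
    rhs = np.linalg.norm(A_inv_sqrt @ B @ A_inv_sqrt, 2) * (x @ Amat @ x)
    assert lhs <= rhs * (1 + 1e-10)
print("bound verified on 200 random instances")
```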
Next, we prove that conditions (CS2') and (CS5') ensure the asymptotic regularity of (1/d) I_d(η). The corresponding eigenvalue estimate generalizes directly to finite sums. Now consider an arbitrary η ∈ Θ. Condition (CS5') implies that there are a c > 0 and a D ∈ N such that the required lower bound holds. Additionally, condition (CS2') implies that there is a c' > 0 bounding the eigenvalues from above. Thus, we get a lower bound on ν_min((1/d) I_d(η)).

We are now able to prove Lemmas 2 and 3, provided in the appendix of the paper.
Proof of Lemma 2. This proof is based on arguments by Chang and Stout (1991, Lemma 3.2).
We get the first part directly by using a second-order Taylor expansion of ℓ^(d)(· | Y^(d)) at η̂_d, since ∇ℓ^(d)(η̂_d | Y^(d)) = 0. Theorem 5 (i) implies that for each ε > 0 and δ > 0 there is an N(ε, δ) ∈ N as required. Therefore, we assume without loss of generality that ‖η̂_d − η_0‖ ≤ δ for the discussed d and δ. Further, the corresponding bound holds for all d > N. Notice that ‖A^{-1}‖ = 1/ν_min(A) holds for the selected matrix norm (i.e. the spectral norm) for any regular symmetric matrix A ∈ R^{q×q}, i.e. with det(A) ≠ 0. Therefore, we assume without restriction that (1/d) I_d(η̂_d) is regular and that there is a constant C_0 > 0 (which is independent of d) such that the corresponding bound holds for d > N.
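The norm identity used here can be checked numerically; the sketch below uses a random symmetric positive definite matrix (illustrative), for which all eigenvalues are positive and ‖A^{-1}‖ in the spectral norm equals 1/ν_min(A).

```python
import numpy as np

rng = np.random.default_rng(3)
q = 5

# Random symmetric positive definite matrix (illustrative)
M = rng.standard_normal((q, q))
Amat = M @ M.T + np.eye(q)

nu_min = np.linalg.eigvalsh(Amat).min()            # smallest eigenvalue
inv_norm = np.linalg.norm(np.linalg.inv(Amat), 2)  # spectral norm of A^{-1}

# Eigenvalues of A^{-1} are the reciprocals of those of A, so the largest
# eigenvalue of A^{-1} (its spectral norm) is 1 / nu_min(A).
print(round(inv_norm, 10), round(1.0 / nu_min, 10))
assert np.isclose(inv_norm, 1.0 / nu_min)
```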
Using the sub-multiplicative property of matrix norms and (W25), we obtain the desired estimate.
We now study a decomposition of (1/d)(I_d(η̂_d) + H_d(η*_d)) in order to prove (A4). Notice first that the decomposition holds for all η ∈ Θ. The triangle inequality next implies a corresponding bound. Since Y_i ∈ {0, 1} and P_i ∈ (0, 1), i ∈ N, and due to Lemma W.4, there are constants C_1, C_2 > 0 such that the respective estimates hold for i ∈ N. Further, due to (CS2'), the norm equivalence and Lemma W.1, there are constants such that the analogous estimates hold for all i ∈ N, with the maximum-norm ‖·‖_max and the q × q matrix of ones 1_{q×q}. Next, for each (j, k) ∈ {1, . . . , q}² and i ∈ N we get two further estimates, and by applying Theorem W.1 we obtain their limits. Combining (W26) to (W30), it follows that the claimed convergence holds. For any ε > 0 and d sufficiently large, we can choose δ accordingly to obtain the second part of Lemma 2.
Finally, we shall prove its third part. By assumption, I_d(η̂_d) is always symmetric and, for d > N, positive definite. Now let d > N be fixed. Therefore, it is sufficient to show that for each ε > 0 there exists a δ > 0 such that the corresponding bound holds. By assumption (CS5') and the consistency of the MLE η̂_d, there is a constant C_1 > 0 as required. We complete the proof by applying the second part with ε = ε_1/C_1 for an arbitrary ε_1 > 0.
Proof of Lemma 3. This proof is partially based on arguments by Chang and Stout (1991, Theorem 3.1).

Note that the representation below holds, with T_d as defined in (W32). Since η̂_d is a maximum of ℓ^(d)(· | Y^(d)), the relevant terms are bounded for all η in an arbitrary compact subset of Θ and all d ∈ N. Utilizing this fact and the consistency of η̂_d for d → ∞, we get a first bound. First, suppose that H is improper in such a way that the posterior is proper and that equation (28) of the paper's discussion is satisfied. Further suppose that there is a constant C_f > 0 such that |f(η)| < C_f for all η ∈ Θ, i.e. f is bounded by a constant in absolute value. Then we get a further bound from the definition of T_d in (W32) and from (28). Combining (W33) and (W34) directly implies the claim. Otherwise, one of the following cases holds by the conditions of Lemma 3 (1.): (i) f is bounded by a constant in absolute value and H is proper, (ii) f is H-integrable and H is proper, (iii) f is H-integrable and H is improper.
Each of these cases results in the fact that f is H-integrable and hence E(|f • η|) < ∞.
Lemma 1 implies that there is a c(δ) < 0 such that (W35) holds. Further, (W36) holds as well. Combining (W35) and (W36) yields a bound whose right-hand side vanishes: the facts that d ↦ exp(d·c(δ)) decreases faster than any polynomial and that det Σ_d grows at polynomial order, due to (W33), imply the convergence, which completes the proof of the first part.
Lemma 3 (2.) follows with a utilization of the continuous mapping theorem applied to R \ {0} → R, x ↦ 1/x.

Proof of Theorem 5 (iii). This proof is partially based on arguments by Chang and Stout (1991, Theorems 3.1 and 3.3).
We start with the proof of the convergence in P_{η_0} with restriction to bounded B. In the following step, this will then be extended to unbounded B and, finally, the convergence in P will be proved. Analogously to the proof of Lemma 2, we can assume regularity without loss of generality. To show Theorem 5 (iii) for bounded B we utilize the reformulation (W42). Using the convergence in P_{η_0} from (CS5') and the fact that ‖A^{-1}‖ = 1/ν_min(A) for any regular symmetric matrix A, we get a corresponding bound. This implies that the condition of Lemma 3 (2.) is satisfied, where G_d is defined in (W41). We can therefore apply Lemma 3 (2.) and obtain the convergence for d → ∞. Further, from the definition of G_d, we get (W44). Combining (W42), (W43) and (W44) with Corollary 1 results in Theorem 5 (iii) for bounded B and convergence in P_{η_0}, i.e. (26) for bounded B.
Next, we show the case of unbounded B ∈ B^q. In order to do so, we define the sequence of random probability measures {Ψ_d}_{d∈N} on (R^q, B^q) accordingly. Due to the transformation theorem, the corresponding identity holds for all d ∈ N and A ∈ B^q. Notice that Ψ_d is the posterior probability distribution of the affine transformation G_d^{-1}(η) of η given Y^(d). So, Ψ_d(A) is well-defined and finite for each Borel set A ∈ B^q and d ∈ N.
for any ε > 0. In particular, (W45) also holds for ε = 6ε'/(π²m²) with ε' > 0. Since the probability of each of these events tends to one, this is also true for the (countable) intersection of these events. Thus, with probability tending to one for d → ∞, the corresponding bound holds for any ε' > 0, and this completes the proof of (26).
For the convergence in P, let B ∈ B^q be chosen arbitrarily. We define H_{d,ε} for all d ∈ N, ε > 0 and η ∈ Θ, where φ_q is the pdf of N_q(0, I_q). Notice that 0 ≤ H_{d,ε} ≤ 1 holds for all d ∈ N and ε > 0. Hence, for each ε > 0, the constant function η ↦ 1 dominates the sequence {H_{d,ε}}_{d∈N}, and η ↦ 1 is always G-integrable for a proper G. An application of Lebesgue's dominated convergence theorem to {H_{d,ε}}_{d∈N} yields the claim, since lim_{d→∞} H_{d,ε}(η_0) = 0 for all ε > 0 and η_0 ∈ R^q, due to (26), and this finishes the proof.
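The dominated-convergence step can be mimicked with a toy sequence: the functions below are bounded by the integrable constant 1 and converge pointwise to 0 away from the origin, so their integrals must vanish in the limit. The concrete integrand is an illustrative stand-in for H_{d,ε}, not the one from the proof.

```python
import numpy as np

# Toy dominated sequence on [0, 1]: H_d(x) = 1 / (1 + d x^2) satisfies
# 0 <= H_d <= 1 (the dominating function) and H_d(x) -> 0 for x != 0,
# so by dominated convergence the integrals tend to 0 as d -> infinity.
x = np.linspace(0.0, 1.0, 100001)

def integral(d):
    """Trapezoidal approximation of the integral of H_d over [0, 1]."""
    y = 1.0 / (1.0 + d * x ** 2)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

vals = [integral(d) for d in (1, 10, 100, 10000)]
print([round(v, 5) for v in vals])
assert all(a > b for a, b in zip(vals, vals[1:]))   # integrals decrease
assert vals[-1] < 0.02                              # and approach zero
```

(The exact value is arctan(√d)/√d, which indeed tends to 0.)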
Proof of Theorem 6. We first prove part (i). Similar to (W42), we utilize the reformulation (W47) for an arbitrary B ∈ B^q with η_0 ∉ ∂B and each d ∈ N. If η_0 ∈ B, then condition (A7) of Lemma 3 (2.) is satisfied with G_d(B) := B for all d ∈ N. However, if η_0 ∉ B, then the condition of Lemma 3 (1.) is satisfied with f := 1_B. We therefore get the respective convergence for d → ∞. The claim follows from a combination of (W47), (W48) and Corollary 1.
Next, we prove part (ii). In a first step, the existence of E(f(η) | Y^(d)) will be proved for all functions f : Θ → R which are continuous and for which the integral ∫_Θ f(η)h(η) dη exists. In a second step, its consistency for f(η_0) will be discussed.
Notice that the last statement does not follow directly, because P^(d)(y^(d)) → 0 for any sequence {y_i}_{i∈N} and d → ∞.
Similar to (W42), we start with the representation for each d ∈ N. We decompose the integral over Θ for an arbitrary δ > 0 into the parts over B_δ(η_0) and Θ \ B_δ(η_0). Further, part (i) implies the convergence of the corresponding posterior probabilities. Finally, since f is continuous at η_0, for each ε > 0 there is a δ > 0 such that |f(η) − f(η_0)| < ε for all η ∈ B_δ(η_0). Therefore, for every ε > 0 we get the corresponding bound. Since δ > 0 was chosen arbitrarily in the decomposition of Θ, we get the claim for each ε > 0, which is what we had to show for the consistency of E(f(η) | Y^(d)). The consistency of E(η | Y^(d)) follows directly by considering the mappings η ↦ η_j, j ∈ {1, . . . , q}, in the first part, which are continuous and by assumption integrable.
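The consistency of the EAP can be illustrated by a one-dimensional simulation with a grid posterior. The 2PL items, the standard normal prior and all parameters below are illustrative assumptions, not the paper's setup; the point is only that E(η | Y^(d)) lands near the true η_0 once d is large.

```python
import numpy as np

rng = np.random.default_rng(4)
d, eta0 = 400, 0.7               # test length and true (one-dim.) trait
a = rng.uniform(0.8, 1.6, d)     # hypothetical 2PL discriminations
b = rng.uniform(-2.0, 2.0, d)    # hypothetical 2PL difficulties

def p(eta):
    return 1.0 / (1.0 + np.exp(-a * (eta - b)))

# Responses generated under the true trait eta0
y = (rng.random(d) < p(eta0)).astype(float)

# Grid posterior with standard normal prior h; EAP = E(eta | y)
grid = np.linspace(-4.0, 4.0, 801)
log_post = np.array([
    np.sum(y * np.log(p(e)) + (1 - y) * np.log(1 - p(e))) - 0.5 * e ** 2
    for e in grid
])
w = np.exp(log_post - log_post.max())   # unnormalized posterior weights
eap = np.sum(grid * w) / np.sum(w)

print("EAP:", round(eap, 3), " true eta0:", eta0)
assert abs(eap - eta0) < 0.5            # posterior concentrates near eta0
```

With a few hundred informative items the posterior standard deviation is already of order 0.1 here, so the EAP sits close to η_0, in line with the theorem.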