Generalized Fiducial Inference for Binary Logistic Item Response Models

Published in Psychometrika

Abstract

Generalized fiducial inference (GFI) has been proposed as an alternative to likelihood-based and Bayesian inference in mainstream statistics. Confidence intervals (CIs) can be constructed from a fiducial distribution on the parameter space in a fashion similar to that used with a Bayesian posterior distribution. However, no prior distribution needs to be specified, which renders GFI more suitable when no a priori information about model parameters is available. In the current paper, we apply GFI to a family of binary logistic item response theory models, which includes the two-parameter logistic (2PL), bifactor, and exploratory item factor models as special cases. Asymptotic properties of the resulting fiducial distribution are discussed. Random draws from the fiducial distribution can be obtained via the proposed Markov chain Monte Carlo sampling algorithm. We investigate the finite-sample performance of our fiducial percentile CI and two commonly used Wald-type CIs associated with maximum likelihood (ML) estimation via Monte Carlo simulation. The use of GFI in high-dimensional exploratory item factor analysis is illustrated by an analysis of Eysenck Personality Questionnaire data.


Notes

  1. In the sequel, lowercase letters are routinely used for realizations of random variables.

  2. For the set inverse function considered here, the closure amounts to the same polyhedra with all the boundaries attained.

  3. Both \(V_I\) and \(D_I\) depend on the observed data \(\mathbf y\); the dependency is omitted from the expressions for conciseness.

  4. In practice, slopes might be fixed at values other than zero. The theoretical properties discussed in the current work still apply after subtracting from the \(A_{ij}\)’s the inner product of those fixed slopes and the corresponding normal variates, and substituting the resulting distribution for the standard logistic density and distribution functions.

  5. For ease of notation, we use \(\Phi \) to denote the probability measure corresponding to a standard normal distribution of arbitrary dimensionality. By default, the dimensionality is determined by the quantity in parentheses that follows.

  6. \(i\in I\) means \(i\in I_j\) for some j, with a slight abuse of notation. As a general convention, we put an index set in the subscript to denote the corresponding collection of elements.

  7. For example, if a 2PL item is moderately difficult but highly discriminating, then observing either a correct response with a negative \(Z^\star _i\) or an incorrect response with a positive \(Z^\star _i\) is unlikely; therefore, the generated set inverse functions may not have an upper bound for the corresponding slope parameter.

  8. For example, if \(I = \{1, \ldots , \sum _{j=1}^mq_j\}\) consists of the first \(\sum _{j=1}^mq_j\) observations in the sample, then [sj] corresponds to observation \(i = \sum _{j=1}^mq_j + (s - 1)m + j\).
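
     For a concrete (hypothetical) instance of this indexing: with \(m = 3\) items and \(q_j = 2\) free parameters each, \(\sum _{j=1}^mq_j = 6\), so the array entry with \(s = 2\) and \(j = 3\) corresponds to observation \(i = 6 + (2 - 1)\cdot 3 + 3 = 12\).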

References

  • Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational and Behavioral Statistics, 17(3), 251–269.

  • Asparouhov, T., & Muthén, B. (2012). Comparison of computational methods for high dimensional item factor analysis. Unpublished manuscript retrieved from www.statmodel.com.

  • Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B: Methodological, 57(1), 289–300.

  • Benjamini, Y., & Yekutieli, D. (2005). False discovery rate-adjusted multiple confidence intervals for selected parameters. Journal of the American Statistical Association, 100(469), 71–81.

  • Bernaards, C. A., & Jennrich, R. I. (2005). Gradient projection algorithms and software for arbitrary rotation criteria in factor analysis. Educational and Psychological Measurement, 65, 676–696.

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison-Wesley.

  • Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.

  • Bock, R. D., & Lieberman, M. (1970). Fitting a response model for \(n\) dichotomously scored items. Psychometrika, 35(2), 179–197.

  • Browne, M. W. (2001). An overview of analytic rotation in exploratory factor analysis. Multivariate Behavioral Research, 36(1), 111–150.

  • Cai, L. (2008). SEM of another flavour: Two new applications of the supplemented EM algorithm. British Journal of Mathematical and Statistical Psychology, 61(2), 309–329.

  • Cai, L. (2010a). High-dimensional exploratory item factor analysis by a Metropolis-Hastings Robbins-Monro algorithm. Psychometrika, 75(1), 33–57.

  • Cai, L. (2010b). Metropolis-Hastings Robbins-Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35(3), 307–335.

  • Crawford, C. B., & Ferguson, G. A. (1970). A general rotation criterion and its use in orthogonal rotation. Psychometrika, 35(3), 321–332.

  • Dempster, A. P. (1968). A generalization of Bayesian inference. Journal of the Royal Statistical Society, Series B: Methodological, 30(2), 205–247.

  • Dempster, A. P. (2008). The Dempster-Shafer calculus for statisticians. International Journal of Approximate Reasoning, 48(2), 365–377.

  • Edwards, M. C. (2010). A Markov chain Monte Carlo approach to confirmatory item factor analysis. Psychometrika, 75(3), 474–497.

  • Eysenck, S. B., Eysenck, H. J., & Barrett, P. (1985). A revised version of the psychoticism scale. Personality and Individual Differences, 6(1), 21–29.

  • Fisher, R. A. (1930). Inverse probability. Proceedings of the Cambridge Philosophical Society, 26, 528–535.

  • Fisher, R. A. (1933). The concepts of inverse probability and fiducial probability referring to unknown parameters. Proceedings of the Royal Society of London, Series A, 139(838), 343–348.

  • Fisher, R. A. (1935). The fiducial argument in statistical inference. Annals of Eugenics, 6(4), 391–398.

  • Fraser, D. A. S. (1968). The structure of inference. New York: John Wiley & Sons.

  • Ghosh, J. K., & Ramamoorthi, R. (2003). Bayesian nonparametrics. New York: Springer.

  • Gunsjö, A. (1994). Faktoranalys av ordinala variabler [Factor analysis of ordinal variables]. Studia Statistica Upsaliensia. Stockholm, Sweden: Acta Universitatis Upsaliensis.

  • Haberman, S. J. (2006). Adaptive quadrature for item response models. ETS Research Report Series, 2006(2), 1–10.

  • Haberman, S. J. (2013). A general program for item-response analysis that employs the stabilized Newton-Raphson algorithm. ETS Research Report Series, 2013(2), 1–98.

  • Hannig, J. (2009). On generalized fiducial inference. Statistica Sinica, 19(2), 491–544.

  • Hannig, J. (2013). Generalized fiducial inference via discretization. Statistica Sinica, 23(2), 489–514.

  • Jennrich, R. I. (1973). Standard errors for obliquely rotated factor loadings. Psychometrika, 38(4), 593–604.

  • Le Cam, L., & Yang, G. L. (2000). Asymptotics in statistics: Some basic concepts. Springer Series in Statistics. New York: Springer-Verlag.

  • Liu, Y., & Maydeu-Olivares, A. (2013). Identifying the source of misfit in item response theory models. Multivariate Behavioral Research, in press.

  • Liu, Y., & Thissen, D. (2012). Identifying local dependence with a score test statistic based on the bifactor logistic model. Applied Psychological Measurement, 36(8), 670–688.

  • Meng, X.-L., & Schilling, S. (1996). Fitting full-information item factor models and an empirical investigation of bridge sampling. Journal of the American Statistical Association, 91(435), 1254–1267.

  • Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43(4), 551–560.

  • Muthén, L. K., & Muthén, B. O. (1998–2012). Mplus user’s guide. Los Angeles, CA: Muthén & Muthén.

  • Neale, M. C., & Miller, M. B. (1997). The use of likelihood-based confidence intervals in genetic models. Behavior Genetics, 27(2), 113–120.

  • Patz, R. J., & Junker, B. W. (1999). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24(4), 342–366.

  • Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. The Annals of Statistics, 12(4), 1151–1172.

  • Schilling, S., & Bock, R. D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70(3), 533–555.

  • Shafer, G. (1976). A mathematical theory of evidence. Princeton, NJ: Princeton University Press.

  • Van der Vaart, A. W. (2000). Asymptotic statistics. Cambridge: Cambridge University Press.

  • Weerahandi, S. (1993). Generalized confidence intervals. Journal of the American Statistical Association, 88(423), 899–905.

  • Wirth, R. J., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12(1), 58–79.

  • Yuan, K.-H., Cheng, Y., & Patton, J. (2014). Information matrices and standard errors for MLEs of item parameters in IRT. Psychometrika, 79(2), 232–254.

  • Zabell, S. L. (1992). R. A. Fisher and the fiducial argument. Statistical Science, 7(3), 369–387.


Acknowledgments

We are grateful to Dr. David Thissen from the Department of Psychology at the University of North Carolina at Chapel Hill and Dr. Alberto Maydeu-Olivares from the Department of Psychology at the University of Barcelona for their valuable advice and feedback on this paper. Jan Hannig’s research was supported in part by the National Science Foundation under Grant Nos. 1016441 and 1512945.

Author information

Correspondence to Yang Liu.

Appendices

Appendix 1: Proof of Lemma 1

Let \(\mathbf{V}\) be the random variable that equals each of the \(C_n\) potential extremal points with equal probability unconditionally, i.e., \(P\{\mathbf{V} = \mathbf{V}_I\} = C_n^{-1}\). It follows that

$$\begin{aligned} P\left\{ \mathbf{V}\le {\varvec{\theta }}, Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^\star )\ne \emptyset \right\} = C_n^{-1}\sum _IP\left\{ \mathbf{V}_I\le {\varvec{\theta }}, D_I\right\} . \end{aligned}$$
(31)

The remaining task is to derive each summand on the right-hand side (RHS) of Eq. 31 and then differentiate it with respect to \(\varvec{\theta }\).

Consider a single item j first. Recall that \(\mathbf{V}_{I_j}\) is the potential vertex determined by sub-sample \(I_j\). When \(\mathbf{V}_{I_j}= {\varvec{\theta }}_j'\) and \({\varvec{\theta }}_j'\) serves as an interior vertex of \(Q_j(\mathbf{y}_{(j)}, \mathbf{A}_{(j)}^\star , \mathbf{Z}^\star )\), we must have \(\tau _j({\varvec{\theta }}'_j, \mathbf{Z}_i^\star )=A_{ij}^\star \) for all \(i\in {I}_j\), and \({\varvec{\theta }}_j'\) must not conflict with the half-spaces of the other observations: i.e., for all \(i\in {I}_j^c\), \(A_{ij}^\star \le \tau _j({\varvec{\theta }}_j', \mathbf{Z}_i^\star )\) if \(y_{ij} = 1\), and \( A_{ij}^\star >\tau _j({\varvec{\theta }}_j', \mathbf{Z}_i^\star )\) if \(y_{ij} = 0\). Thus, conditional on \(\mathbf{Z}^\star = \mathbf{z}\), we have

$$\begin{aligned}&P\left\{ \mathbf{V}_{I_j}\le {\varvec{\theta }}_j, Q_j(\mathbf{y}_{(j)}, \mathbf{A}_{(j)}^\star , \mathbf{Z}^\star )\ne \emptyset \ |\ \mathbf{V} = \mathbf{V}_I, \mathbf{Z}^\star = \mathbf{z}\right\} \\&\quad =\int _{{\varvec{\theta }}'_j\le {\varvec{\theta }}_j}\left| \det (\partial \tau _j({\varvec{\theta }}'_j,\mathbf{z}_i)/\partial {\varvec{\theta }}'_j)_{i\in I_j}\right| \prod _{i\in {I}_j}\frac{e^{\tau _j({\varvec{\theta }}_j', \mathbf{z}_i)}}{[1 + e^{\tau _j({\varvec{\theta }}_j', \mathbf{z}_i)}]^2}\prod _{i\in {I}_j^c}f_j({\varvec{\theta }}_j', y_{ij}|\mathbf{z}_i)d{\varvec{\theta }}'_j, \end{aligned}$$
(32)

in which the determinant and the first product result from the change of variables from \((A_{ij}^\star )_{i\in I_j}\) to \(\mathbf{V}_{I_j}\) (recall that each \(A_{ij}^\star \) has the standard logistic density \(\psi (x) = e^x/(1 + e^x)^2\)), and the second product collects the logistic probabilities of the inequalities that the other observations must satisfy.
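
As a one-dimensional illustration of this change of variables (a sketch for the case \(q_j = 1\), so that \(I_j = \{i\}\), ignoring the non-conflict constraints contributed by the other observations): the potential vertex solves \(\tau _j(\theta '_j, \mathbf{z}_i) = A_{ij}^\star \), and since \(A_{ij}^\star \) has density \(\psi \),

$$\begin{aligned} P\{ V_{I_j}\in d\theta '_j\ |\ \mathbf{Z}^\star = \mathbf{z}\} = \psi (\tau _j(\theta '_j, \mathbf{z}_i))\left| \frac{\partial \tau _j(\theta '_j, \mathbf{z}_i)}{\partial \theta '_j}\right| d\theta '_j = \frac{e^{\tau _j(\theta '_j, \mathbf{z}_i)}}{[1 + e^{\tau _j(\theta '_j, \mathbf{z}_i)}]^2}\left| \frac{\partial \tau _j(\theta '_j, \mathbf{z}_i)}{\partial \theta '_j}\right| d\theta '_j, \end{aligned}$$

which is the \(|I_j| = 1\) instance of the determinant and the first product in Eq. 32.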

Due to the conditional independence assumption,

$$\begin{aligned}&P\left\{ \mathbf{V}_I\le {\varvec{\theta }}, Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^\star )\ne \emptyset \ |\ \mathbf{V} = \mathbf{V}_I\right\} \\&\quad =\int _{ {\mathbb R}^{nr}}\prod _{j=1}^mP\left\{ \mathbf{V}_{I_j}\le {\varvec{\theta }}_j, Q(\mathbf{y}_{(j)}, \mathbf{A}^\star _{(j)}, \mathbf{Z}^\star )\ne \emptyset \ |\ \mathbf{V} = \mathbf{V}_I, \mathbf{Z}^\star = \mathbf{z}\right\} d\Phi (\mathbf{z})\\&\quad =\int _{ {\mathbb R}^{nr}}\int _{{\varvec{\theta }}'\le {\varvec{\theta }}}d_I({\varvec{\theta }}', \mathbf{z}_I)\prod _{j=1}^m\left\{ \prod _{i\in {I}_j}\frac{e^{\tau _j({\varvec{\theta }}_j', \mathbf{z}_i)}}{[1 + e^{\tau _j({\varvec{\theta }}_j', \mathbf{z}_i)}]^2}\prod _{i\in {I}_j^c}f_j({\varvec{\theta }}_j',y_{ij}|\mathbf{z}_i)\right\} d{\varvec{\theta }}'d\Phi (\mathbf{z}). \end{aligned}$$
(33)

Equation 15 is established by substituting Eq. 33 back into the RHS of Eq. 31, switching the order of integration, and differentiating with respect to \(\varvec{\theta }\).

Appendix 2: Proof of Theorem 1

We start by re-expressing the density of \(\mathbf R(\mathbf y)\), i.e., Eq. 15. Note that the summands of Eq. 15 corresponding to I and \(I'\), \(I\ne I'\), are the same whenever \(\mathbf{y}_I = \mathbf{y}_{I'}\); hence, the (outer) sum over index sets I therein can be reduced to a finite sum over sub-sample response patterns \(\mathbf{y}_I\). Note that \(\bigcup _{j=1}^m I_{j}\) has at least \(\max _jq_j\) and at most \(\sum _{j=1}^mq_j\) elements. Let \(G_n = {n\atopwithdelims ()\sum _{j=1}^mq_j}\) be the total number of size-\(\sum _{j=1}^mq_j\) sub-samples. Also let \(p_n(\mathbf{y}_I) = G_n^{-1}\sum _{I}{\mathbb {I}}\{ \mathbf{Y}_{I} = \mathbf{y}_I\}\). By the standard theory of U-statistics, \(p_n(\mathbf{y}_I)\mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}\pi _0(\mathbf{y}_I)\), in which \(\pi _0(\mathbf{y}_I)\) is determined by the data-generating parameter values \({\varvec{\theta }}_0\), and \(\pi _0(\mathbf{y}_I) = 0\) if \(|I|<\sum _{j=1}^mq_j\). Then, the density can be written as

$$\begin{aligned} g_n({\varvec{\theta }} | \mathbf{y}) \propto b_n({\varvec{\theta }}, \mathbf{y})f_n({\varvec{\theta }}, \mathbf{y}). \end{aligned}$$
(34)

In Eq. 34, \(f_n({\varvec{\theta }}, \mathbf{y})\) is the sample likelihood, and

$$\begin{aligned} b_n({\varvec{\theta }}, \mathbf{y}) = G_n\sum _{\mathbf{y}_I}p_n(\mathbf{y}_I)b_{ \mathbf{y}_I}({\varvec{\theta }}), \end{aligned}$$
(35)

in which

$$\begin{aligned} b_{ {{\mathbf{y}}}_I}({{\varvec{\theta }}}) =&\int d_I({{\varvec{\theta }}}, \mathbf{z}_I)\prod _{i\in {I}}\left\{ \prod _{j\in J_i}\frac{e^{\tau _j({{\varvec{\theta }}}_j, \mathbf{z}_i)}}{[1 + e^{\tau _j({{\varvec{\theta }}}_j, {{\mathbf{z}}}_i)}]^2}\prod _{j\notin J_i}f_j({{\varvec{\theta }}}_j, y_{ij}| \mathbf{z}_i)\right\} \mathrm{d}\Phi ({{\mathbf{z}}}_{I})\nonumber \\&\biggr /\int \prod _{i\in I}\prod _{j=1}^mf_j({{\varvec{\theta }}}_j, y_{ij}| \mathbf{z}_i)\mathrm{d}\Phi ({{\mathbf{z}}}_{I}). \end{aligned}$$
(36)

Equation 35 is a repetition of Eq. 19. Also let

$$\begin{aligned} a_n({\varvec{\theta }}, \mathbf{y}) = f_n({\varvec{\theta }}, \mathbf{y})b_n({\varvec{\theta }},\mathbf{y}) \end{aligned}$$
(37)

be the RHS of Eq. 34.
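
As a quick check of the U-statistic step invoked above (a sketch, identifying \(\pi _0(\mathbf{y}_I)\) with the common sub-sample pattern probability): since the rows of \(\mathbf Y\) are i.i.d. under \(P_{ {\varvec{\theta }}_0}\),

$$\begin{aligned} E _{ {\varvec{\theta }}_0}\,p_n(\mathbf{y}_I) = G_n^{-1}\sum _{I} P_{ {\varvec{\theta }}_0}\{ \mathbf{Y}_{I} = \mathbf{y}_I\} = P_{ {\varvec{\theta }}_0}\{ \mathbf{Y}_{I} = \mathbf{y}_I\} = \pi _0(\mathbf{y}_I), \end{aligned}$$

and the U-statistic variance vanishes as \(n\rightarrow \infty \), which yields \(p_n(\mathbf{y}_I)\mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}\pi _0(\mathbf{y}_I)\).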

Next, we consider the local parameter \(\mathbf{h}=\sqrt{n}({\varvec{\theta }} - {\varvec{\theta }}_0)\). Some short-hand notation is introduced for conciseness: Let \(b_{n, \mathbf{h}} = b_n({\varvec{\theta }}_0+\mathbf{h}/\sqrt{n},\mathbf{y})/G_n\), \(a_{n, \mathbf h} = a_n({\varvec{\theta }}_0 + \mathbf{h}/\sqrt{n}, \mathbf{y})\), and \(f_{n, \mathbf h} = f_n({\varvec{\theta }}_0 + \mathbf{h}/\sqrt{n}, \mathbf{y})/\sqrt{n}\); also let \(b_0=\sum _{\mathbf{y}_{I}} \pi _0(\mathbf{y}_{I})b_{ \mathbf{y}_I}({\varvec{\theta }}_0)\), \(a_{n, 0} = a_n({\varvec{\theta }}_0, \mathbf{y})\), and \(f_{n, 0}=f_n({\varvec{\theta }}_0,\mathbf{y})\). Using this new notation, the corresponding density of the local parameter can be written as

$$\begin{aligned} \bar{g}_n(\mathbf{h} | \mathbf{y})\propto a_{n, \mathbf h} = b_{n, \mathbf h}f_{n, \mathbf h}. \end{aligned}$$
(38)

For each \(\mathbf{y}_{I}\), \(b_{ \mathbf{y}_I}({\varvec{\theta }})\) is continuous in \(\varvec{\theta }\) (it is in fact differentiable). In addition, we know that \(p_n(\mathbf{y}_{I})\rightarrow \pi _0(\mathbf{y}_{I})\) in \(P_{ {\varvec{\theta }}_0}\)-probability. Consequently, \(b_{n,\mathbf h}\rightarrow b_0\) in \(P_{ {\varvec{\theta }}_0}\)-probability.

We also consider the Taylor series expansion of \(\log f_{n, \mathbf h}\) at the true parameter \({\varvec{\theta }}_0\):

$$\begin{aligned} \log \frac{f_{n, \mathbf{h}}}{f_{n, 0}} = \mathbf{h}{}^\top \mathbf{S}_n + \frac{1}{2n}\sum _{i=1}^n\mathbf{h}{}^\top \mathbf{H}({\varvec{\theta }}_0,\mathbf{y}_i)\mathbf{h} + R_{n, \mathbf h}. \end{aligned}$$
(39)

Here, some comments are made on each term of Eq. 39. (a) The sequence \(\{ \mathbf{S}_n\}\) is tight by the convergence result given by Eq. 13; hence, for each \(\varepsilon >0\), there exists a compact set \(K_\varepsilon \subset {\mathbb R}^q\) such that \( P ( \mathbf{S}_n\in K_\varepsilon )>1-\varepsilon \) for all n. If we restrict consideration to \(K_\varepsilon \), then the first term of Eq. 39 is bounded for each \(\mathbf h\). (b) By the (Uniform) Law of Large Numbers, the second term converges to \(-\frac{1}{2}\mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}\) in probability (the convergence is uniform for \(\mathbf h\) in compact sets). (c) The remainder term has the following form:

$$\begin{aligned} R_{n,\mathbf h} = \sum _{i=1}^n\sum _{|\mathbf{t}|=3}\frac{f^{(\mathbf t)}(\bar{\varvec{\theta }},\mathbf{Y}_i)}{ \mathbf{t}!}\left( \frac{ \mathbf{h}}{\sqrt{n}} \right) ^\mathbf{t}. \end{aligned}$$
(40)

In Eq. 40, \(\mathbf{t}=(t_1,\ldots ,t_q)\) is a q-tuple of non-negative integers serving as a multi-index: \(|\mathbf{t}|=\sum _{s=1}^qt_s\), \(\mathbf{h}^\mathbf{t}=h_1^{t_1}\cdots h_q^{t_q}\), \(\mathbf{t}!=t_1!\cdots t_q!\), and \(f^{(\mathbf{t})}=\frac{\partial ^{|\mathbf t|}f}{\partial \theta _1^{t_1}\cdots \partial \theta _q^{t_q}}\), where \(h_1,\ldots , h_q\) and \(\theta _1,\ldots ,\theta _q\) are the coordinates of \(\mathbf h\) and \(\varvec{\theta }\), respectively. The point \(\bar{\varvec{\theta }}\) lies between \({\varvec{\theta }}_0\) and \({\varvec{\theta }}_0+\mathbf{h}/\sqrt{n}\).
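
For concreteness, with \(q = 2\) and \(\mathbf{t} = (2, 1)\):

$$\begin{aligned} |\mathbf{t}| = 3,\quad \mathbf{t}! = 2!\,1! = 2,\quad \mathbf{h}^{ \mathbf{t}} = h_1^2h_2,\quad f^{( \mathbf{t})} = \frac{\partial ^3f}{\partial \theta _1^2\,\partial \theta _2}, \end{aligned}$$

so the corresponding summand of Eq. 40 is \(f^{(\mathbf t)}(\bar{\varvec{\theta }},\mathbf{Y}_i)h_1^2h_2/(2n^{3/2})\).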

Now we proceed to the proof of Theorem 1, i.e., Eq. 20. By an argument similar to Ghosh and Ramamoorthi (2003), it suffices to show for each \(\varepsilon >0\) that

$$\begin{aligned} \int _{H_n}\left| \frac{a_{n, \mathbf h}}{f_{n, 0}} - b_0e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}\right| d\mathbf{h}\mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}0 \end{aligned}$$
(41)

To see this, let \(D_n=\int _{H_n} a_{n, \mathbf{h}}d\mathbf{h}/f_{n, 0}\). The left-hand side (LHS) of Eq. 20 can be bounded by

$$\begin{aligned} D_n^{-1}\int _{H_n}\left| \frac{a_{n, \mathbf h}}{f_{n,0}} - b_0e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}\right| d\mathbf{h} + \int _{H_n} \left| D_n^{-1}b_0e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}} - \phi _{\varvec{\mathcal I}_0^{-1}{} \mathbf{S}_n,\varvec{\mathcal I}_0^{-1}}(\mathbf{h})\right| d\mathbf{h}. \end{aligned}$$
(42)

Notice that Eq. 41 implies \(|D_n - b_0\int _{H_n}e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h}|\mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}0\). We also know that

$$\begin{aligned} De^{\frac{1}{2}{} \mathbf{S}_n{}^\top \varvec{\mathcal I}_0^{-1}{} \mathbf{S}_n} \le \int _{H_n}e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}\mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h} \le D'e^{\frac{1}{2}{} \mathbf{S}_n{}^\top \varvec{\mathcal I}_0^{-1}{} \mathbf{S}_n}, \end{aligned}$$
(43)

for some suitable constants D and \(D'\), because the local parameter space \(H_n\) satisfies \(\Theta -{\varvec{\theta }}_0\subset H_n \subset {\mathbb R}^q\). It follows that \(D_n^{-1}\) is \(O_p(1)\), and that the first integral in Eq. 42 converges to zero in probability. Further let \(T_{1, n}\) be the integral in Eq. 43, and \(T_{2, n}=|D_n^{-1}b_0-T_{1, n}^{-1}|\); then, the second integral of Eq. 42 can be written as \(T_{1, n}T_{2, n}\). The sequence \(\{T_{1, n}\}\) is tight by Eq. 43, so for each \(\eta > 0\), there exists an \(L_{\eta }\) such that \( P (T_{1, n} \le L_{\eta })>1-\eta \) for all n. Moreover, \(T_{2, n}\mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}0\) by Eq. 41. Fixing \(\varepsilon ,\eta >0\), we have

$$\begin{aligned} P (T_{1, n}T_{2, n} >\varepsilon ) \le P (T_{1, n}T_{2, n} >\varepsilon , T_{1, n}\le L_{\eta }) + P (T_{1, n} > L_{\eta }) \le P (T_{2, n} >\varepsilon /L_{\eta }) + \eta , \end{aligned}$$
(44)

which can be made less than \(2\eta \) for large enough n. Therefore, \(T_{1, n}T_{2, n}\mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}0\). Because both integrals in Eq. 42 converge to 0 in probability, Eq. 20 is established.
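
The bounds in Eq. 43 come from completing the square in the exponent. Over all of \({\mathbb R}^q\), for instance,

$$\begin{aligned} \int _{ {\mathbb R}^q}e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h} = e^{\frac{1}{2}{} \mathbf{S}_n{}^\top \varvec{\mathcal I}_0^{-1}{} \mathbf{S}_n}\int _{ {\mathbb R}^q}e^{-\frac{1}{2}( \mathbf{h} - \varvec{\mathcal I}_0^{-1}{} \mathbf{S}_n){}^\top \varvec{\mathcal I}_0( \mathbf{h} - \varvec{\mathcal I}_0^{-1}{} \mathbf{S}_n)}d\mathbf{h} = \frac{(2\pi )^{q/2}}{\sqrt{\det \varvec{\mathcal I}_0}}\,e^{\frac{1}{2}{} \mathbf{S}_n{}^\top \varvec{\mathcal I}_0^{-1}{} \mathbf{S}_n}, \end{aligned}$$

so \(D'\) may be taken as \((2\pi )^{q/2}/\sqrt{\det \varvec{\mathcal I}_0}\), and a matching computation restricted to \(H_n\) supplies the lower-bound constant D.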

For the remaining part of the proof, we partition the domain of integration of Eq. 20 into four regions (for n large enough), and establish the desired convergence on each part. The four regions are as follows:

$$\begin{aligned}&A_{1,n}=\{ \mathbf{h}: \Vert \mathbf{h}\Vert < B\log n\}\cap H_n ,\ \hbox {for some large number }B>0;\\&A_{2,n}=\{ \mathbf{h}: B\log n \le \Vert \mathbf{h}\Vert < \delta \sqrt{n}\}\cap H_n ,\ \hbox {for some small number }\delta >0;\\&A_{3,n}=\{ \mathbf{h}: \delta \sqrt{n} \le \Vert \mathbf{h}\Vert \le B'\sqrt{n}\}\cap H_n ,\ \hbox {for another large number }B'>0;\\&A_{4,n}=\{ \mathbf{h}: \Vert \mathbf{h}\Vert > B'\sqrt{n}\}\cap H_n . \end{aligned}$$

In terms of the constants, we first choose \(\delta \) and B to ensure the convergence on \(A_{2,n}\). The convergence on \(A_{1,n}\) holds for any \(B>0\), so it also holds for the particular B that we select. Then we consider region \(A_{4,n}\) and select \(B'\). Finally, we show the convergence of the integral for \(\mathbf{h}/\sqrt{n}\) in any compact set bounded away from \(\mathbf 0\), from which the convergence on \(A_{3,n}\) follows.

Region \(A_{2,n}\)   Because the likelihood function is three times continuously differentiable with respect to \(\varvec{\theta }\), and also because there are finitely many (i.e., \(\prod _{j=1}^mK_j\)) individual patterns of \(\mathbf{y}_i\), the remainder term (Eq. 40) of the Taylor expansion (Eq. 39) has the following bound for each \(\delta >0\) and \(\Vert \mathbf{h}\Vert \le \delta \sqrt{n}\):

$$\begin{aligned} |R_{n, \mathbf h}| \le M(\delta ) \frac{\Vert \mathbf{h}\Vert ^3}{n^{3/2}} \le M(\delta )\delta ^3, \end{aligned}$$
(45)

as a result of the multinomial theorem and the Cauchy–Schwarz inequality, in which \(M(\delta )\) is a constant multiple of \(|\max _{|\mathbf{t}|=3, \mathbf{y}_i}\sup _{\Vert {\varvec{\theta }-\theta }_0\Vert \le \delta }f^{(\mathbf{t})}({\varvec{\theta }},\mathbf{y}_i)|\). Since \(M(\delta )\downarrow \) as \(\delta \downarrow 0\), Eq. 45 allows us to choose \(\delta \) small enough such that \(|R_{n,\mathbf{h}}| < \frac{1}{4}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}\) for all \(\mathbf{h}\in A_{2,n}\). Then for such \(\delta \),

$$\begin{aligned}&\int _{A_{2,n}}\left| \frac{a_{n, \mathbf h}}{f_{n, 0}} - b_0e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}\right| d\mathbf{h}\\&\quad \le \int _{A_{2,n}}\frac{a_{n, \mathbf h}}{f_{n, 0}}d\mathbf{h} + \int _{A_{2,n}}b_0e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h}\\&\quad \le \sup _{ \mathbf{h}\in A_{2,n}}b_{n, \mathbf h}\int _{A_{2,n}}\frac{f_{n, \mathbf h}}{f_{n, 0}}d\mathbf{h} + b_0\int _{A_{2,n}}e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h}\\&\quad \le \left( \sup _{ \mathbf{h}\in A_{2,n}}b_{n, \mathbf h} + b_0\right) \int _{A_{2,n}}e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{4}\mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h} + o_p(1). \end{aligned}$$
(46)

In the last line of Eq. 46, the parenthesized term is bounded due to the continuity of the function \(b_{\mathbf{y}_I}({\varvec{\theta }})\), the boundedness of the set \(A_{2, n}\), and our selection of \(\delta \). The \(o_p(1)\) term comes from the uniform convergence of the second term in the Taylor expansion (Eq. 39). Also notice that

$$\begin{aligned} \int _{A_{2,n}}e^{-\frac{1}{4}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h} \le Ce^{-C'B\log n}(\delta \sqrt{n}-B\log n)^q \le C''n^{q/2-C'B}, \end{aligned}$$
(47)

where C, \(C'\), and \(C''\) are constants. Selecting \(B > q/(2C')\) makes the exponent negative, so Eq. 47 implies \(\int _{A_{2,n}}e^{-\frac{1}{4}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h}\rightarrow 0\). Finally, an argument using tightness similar to Eqs. 43 and 44 shows that the RHS of Eq. 46 converges to 0 in probability.
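
For completeness, the elementary bound behind Eq. 45: for any multi-index with \(|\mathbf t| = 3\),

$$\begin{aligned} |\mathbf{h}^{ \mathbf{t}}| = |h_1|^{t_1}\cdots |h_q|^{t_q} \le \Vert \mathbf{h}\Vert ^{t_1}\cdots \Vert \mathbf{h}\Vert ^{t_q} = \Vert \mathbf{h}\Vert ^3, \end{aligned}$$

so each summand of Eq. 40 is bounded in absolute value by \(\sup _{\Vert {\varvec{\theta }} - {\varvec{\theta }}_0\Vert \le \delta }|f^{(\mathbf t)}({\varvec{\theta }},\mathbf{y}_i)|\,\Vert \mathbf{h}\Vert ^3/(\mathbf{t}!\,n^{3/2})\).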

Region \(A_{1,n}\)   The convergence on \(A_{1,n}\) can be established similarly. Fix an arbitrary \(B>0\). For the particular \(\delta \) we have selected,

$$\begin{aligned} \sup _{ \mathbf{h}\in A_{1, n}}|R_{n, \mathbf h}| \le M(\delta ) \sup _{ \mathbf{h}\in A_{1, n}}\frac{\Vert \mathbf{h}\Vert ^3}{n^{3/2}} \le M(\delta )B^3 \frac{\log ^3n}{n^{3/2}} = o(1), \end{aligned}$$
(48)

in which \(M(\delta )\) is the same as in Eq. 45. Then

$$\begin{aligned} \int _{A_{1,n}}\left| \frac{a_{n, \mathbf h}}{f_{n, 0}} - b_0e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}\right| d\mathbf{h} \le&\int _{A_{1,n}}b_{n,\mathbf h}\left| e^{R_{n, \mathbf h}} - 1\right| e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}\mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h} \\&+ \int _{A_{1,n}}|b_{n, \mathbf h} - b_0|e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h}+ o_p(1).\\ \end{aligned}$$
(49)

In Eq. 49, the \(o_p(1)\) term is again due to the uniform convergence of the second term in Eq. 39. Equation 48 implies that \(\sup _{ \mathbf{h}\in A_{1, n}}|e^{R_{n, \mathbf h}} - 1| \rightarrow 0\); together with \(b_{n, \mathbf h} \mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}b_0\) and the boundedness of \(A_{1, n}\), both integrals on the RHS of Eq. 49 converge to 0 in probability (the tightness argument needs to be used again).

Region \(A_{4,n}\)   Assume for a moment that there exists a large number \(B'\) such that

$$\begin{aligned} \sup _{ \Vert {\varvec{\theta }-\varvec{\theta }}_0\Vert >B'}\min _{ \mathbf{y}_i}f({\varvec{\theta }}, \mathbf{y}_i) < f({\varvec{\theta }}_0, \mathbf{y}_i^\circ )^{2/f({\varvec{\theta }}_0, \mathbf{y}_i^\circ )}, \end{aligned}$$
(50)

in which \(\mathbf{y}_i^\circ \) is the least plausible individual response pattern under \(P_{{\varvec{\theta }}_0}\). Also let \(p_n(\mathbf{y}_i^\circ )\) denote the observed proportion of \(\mathbf{y}_i^\circ \); note that \(p_n(\mathbf{y}_i^\circ )\mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}f({\varvec{\theta }}_0, \mathbf{y}_i^\circ )\). Then, on region \(A_{4, n}\) defined by such a \(B'\),

$$\begin{aligned} P\left\{ \frac{\min _{y_i}f({\varvec{\theta }}, \mathbf{y}_i)^{p_n(\mathbf{y}_i^\circ )} }{f({\varvec{\theta }}_0, \mathbf{y}_i^\circ )} < 1 \right\} \ge P\left\{ p_n(\mathbf{y}_i^\circ ) > \frac{f({\varvec{\theta }}_0, \mathbf{y}_i^\circ )}{2}\right\} \rightarrow 1. \end{aligned}$$
(51)
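
To see why the event in Eq. 51 contains the event \(\{p_n(\mathbf{y}_i^\circ ) > f({\varvec{\theta }}_0, \mathbf{y}_i^\circ )/2\}\), abbreviate \(f_\circ = f({\varvec{\theta }}_0, \mathbf{y}_i^\circ )\) (shorthand used only here) and combine with Eq. 50:

$$\begin{aligned} \min _{ \mathbf{y}_i}f({\varvec{\theta }}, \mathbf{y}_i)^{p_n(\mathbf{y}_i^\circ )} < \left( f_\circ ^{2/f_\circ }\right) ^{p_n(\mathbf{y}_i^\circ )} = f_\circ ^{2p_n(\mathbf{y}_i^\circ )/f_\circ } < f_\circ , \end{aligned}$$

where the last inequality holds because \(0< f_\circ < 1\) and \(2p_n(\mathbf{y}_i^\circ )/f_\circ > 1\) on that event.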

Therefore, we have

$$\begin{aligned} \frac{f_{n, \mathbf h}}{f_{n, 0}}\le \left[ \frac{\min _{y_i}f({\varvec{\theta }}, \mathbf{y}_i)^{p_n(\mathbf{y}_i^\circ )} }{f({\varvec{\theta }}_0, \mathbf{y}_i^\circ )} \right] ^n \le \rho ^n + o_p(1) \end{aligned}$$
(52)

for some \(0<\rho <1\). Also note that this likelihood ratio bound is not affected if finitely many observations are removed from \(f_{n, \mathbf h}\), which is the case after dividing by the denominator of each summand of \(b_{n, \mathbf h}\). As a result,

$$\begin{aligned} \int _{A_{4,n}}\left| \frac{a_{n, \mathbf h}}{f_{n, 0}} - b_0e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}\right| d\mathbf{h} \le&\int _{A_{4,n}}\frac{b_{n, \mathbf h}f_{n, \mathbf h}}{f_{n, 0}}d\mathbf{h} + b_0\int _{A_{4,n}}e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h}\\\le&\ K\rho ^n + b_0\int _{A_{4,n}}e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h} + o_p(1), \end{aligned}$$
(53)

in which K is a constant. Equation 53 results from two facts: (a) the numerator of Eq. 36 is integrable with respect to the Lebesgue measure on the parameter space, which contributes to the constant K; (b) \(p_n(\mathbf{y}_I)\mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}\pi _0(\mathbf{y}_I)\), so the latter also contributes to K, while the difference of the two is merged into the \(o_p(1)\) term. The second term on the RHS of Eq. 53 converges to zero by a similar tightness argument and the tail estimate of a multivariate normal distribution. Altogether, these show that the LHS of Eq. 53 converges to zero in probability.

Now we prove the result stated by Eq. 50; we denote the RHS of Eq. 50 by \(\eta \).

First, consider the parameter subspace of \(\alpha _j\) and \({\varvec{\beta }}_j\) for each j. Let \(L_j = \Vert (\alpha _j\ {\varvec{\beta }}_j{}^\top ){}^\top \Vert \), and \(\mathbf{d}_j = (\alpha _j\ {\varvec{\beta }}_j{}^\top ){}^\top / L_j\in {\mathbb R}^{r + 1}\) be a unit directional vector, in which the coordinates corresponding to fixed slopes are set to 0. Also introduce the partition \(\mathbf{d}_j=(x_j\ \mathbf{e}_j{}^\top ){}^\top \) separating the direction of the intercept parameter, i.e., the first coordinate \(x_j\), from those of the slopes. Then, we write

$$\begin{aligned} \tau _j({\varvec{\theta }}_j, \mathbf{Z}_i^\star ) = \alpha _j+{\varvec{\beta }}_j{}^\top \mathbf{Z}_i^\star = L_j(x_j + \mathbf{e}_j{}^\top \mathbf{Z}_i^\star ), \end{aligned}$$
(54)

in which \(x_j + \mathbf{e}_j{}^\top \mathbf{Z}_i^\star \sim \mathcal{N}(x_j, 1-x_j^2)\). For fixed \(\mathbf{d}_j\), define \(H_{ \mathbf{d}_j}^\varepsilon (y) = \{ \mathbf{z}_i\in {\mathbb R}^r: (-1)^y(x_j + \mathbf{e}_j{}^\top \mathbf{z}_i) \ge \varepsilon \}\) for \(\varepsilon \ge 0\).
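The distributional claim above is a direct computation: \(\Vert \mathbf{d}_j\Vert = 1\) implies \(\Vert \mathbf{e}_j\Vert ^2 = 1 - x_j^2\), and since \(\mathbf{Z}_i^\star \) is standard normal,

$$\begin{aligned} E (x_j + \mathbf{e}_j{}^\top \mathbf{Z}_i^\star ) = x_j,\qquad \hbox {Var}(x_j + \mathbf{e}_j{}^\top \mathbf{Z}_i^\star ) = \mathbf{e}_j{}^\top \mathbf{e}_j = 1 - x_j^2. \end{aligned}$$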

Now pool across multiple items. A direct consequence of Lemma 2, presented below, is that \({\mathbb R}^r\subset \bigcup _{j=1}^{r+1}H_{ \mathbf{d}_j}^0(y_{ij})\) for properly selected \((y_{ij})_{j=1}^{r + 1}\) (recall that we assume \(m>r\), so there are sufficiently many items). Then, for any \(\varepsilon > 0\), the following bound can be established for the likelihood of an individual response pattern in which the first \(r + 1\) items have the selected pattern \((y_{ij})_{j=1}^{r + 1}\):

$$\begin{aligned} f({\varvec{\theta }}, \mathbf{y}_i) =&\int _{ {\mathbb R}^r}\prod _{j=1}^mf_j({\varvec{\theta }}_j, y_{ij}|\mathbf{z}_i)d\Phi (\mathbf{z}_i) \\\le&\sum _{j=1}^{r+1}\int _{H_{ \mathbf{d}_j}^{0}(y_{ij})}f_j({\varvec{\theta }}_j, y_{ij}|\mathbf{z}_i)d\Phi (\mathbf{z}_i) \\\le&\sum _{j=1}^{r+1}\int _{H_{ \mathbf{d}_j}^{\varepsilon }(y_{ij})}f_j({\varvec{\theta }}_j, y_{ij}|\mathbf{z}_i)d\Phi (\mathbf{z}_i)+\sum _{j=1}^{r+1}\Phi \{H_{ \mathbf{d}_j}^{0}(y_{ij})\backslash H_{ \mathbf{d}_j}^{\varepsilon }(y_{ij})\}\\\le&\sum _{j=1}^{r+1}\frac{1}{1 + e^{\varepsilon L_j}}+\sum _{j=1}^{r+1} \Phi \{H_{ \mathbf{d}_j}^{0}(y_{ij})\backslash H_{ \mathbf{d}_j}^{\varepsilon }(y_{ij})\} \end{aligned}$$
(55)

In the last line of Eq. 55, each summand of the second term can be made smaller than \(\frac{\eta }{2(r + 1)}\) by choosing a proper \(\varepsilon \); this result can be strengthened to hold uniformly for all directions \(\mathbf{d}_j\) on \({\mathbb R}^{r + 1}\), as a consequence of Lemma 3. In addition, since there are only finitely many intercept parameters, we can choose a large enough \(B'\) (i.e., \({\varvec{\theta }}\) is sufficiently distant from \({\varvec{\theta }}_0\)) such that \(\frac{1}{1 + e^{\varepsilon L_j}} < \frac{\eta }{2(r + 1)}\) for all j. Consequently, for each \(\varvec{\theta }\) satisfying \(\Vert {\varvec{\theta }} - {\varvec{\theta }}_0\Vert > B'\), we are able to find an individual response pattern \(\mathbf{y}_i\) such that the corresponding value of Eq. 55 can be bounded by the desired number \(\eta \), which establishes the result stated by Eq. 50. The two lemmas required in the foregoing proof are presented next.

Lemma 2

Consider a sequence of affine hyperplanes \(\{\mathbf{z}\in {\mathbb R}^r: \mathbf{a}_i{}^\top \mathbf{z}=b_i\}_{i=1}^k\). Let half-space \(H_i\) be either \(\mathbf{a}_i{}^\top \mathbf{z}\ge b_i\) or \(\mathbf{a}_i{}^\top \mathbf{z}\le b_i\). There exists some choice of \(\{H_i\}_{i=1}^k\) such that \({\mathbb R}^r \subset \bigcup _{i=1}^k H_i\), if and only if \(\mathbf{a}_i\)’s are linearly dependent.

Proof

\((\Leftarrow )\) Suppose \(\mathbf{a}_i\)’s are linearly dependent. There exists an \(\mathbf{a}_i\) that can be written as a non-trivial linear combination of the others. Without loss of generality, let \(\mathbf{a}_1\) be such a vector:

$$\begin{aligned} \mathbf{a}_1 = \sum _{i=2}^kc_i\mathbf{a}_i, \end{aligned}$$
(56)

in which at least one \(c_i\) is non-zero. If \(\sum _{i=2}^kc_ib_i\ge b_1\), then for \(i=2,\ldots ,k\) set \(H_i=\{ \mathbf{z}: \mathbf{a}_i{}^\top \mathbf{z} \ge b_i\}\) when \(c_i \le 0\) and \(H_i=\{ \mathbf{z}: \mathbf{a}_i{}^\top \mathbf{z} \le b_i\}\) when \(c_i > 0\). It follows that

$$\begin{aligned} \bigcap _{i=2}^k H_i^c \subset \left\{ \mathbf{z}: \sum _{i=2}^kc_i\mathbf{a}_i{}^\top \mathbf{z} > \sum _{i=2}^kc_ib_i\right\} \subset \{ \mathbf{z}: \mathbf{a}_1{}^\top \mathbf{z} \ge b_1 \}. \end{aligned}$$
(57)

By letting \(H_1\) be the RHS of Eq. 57, we have \({\mathbb R}^r\subset \bigcup _{i=1}^k H_i\). A similar argument can be used to establish the statement when \(\sum _{i=2}^kc_ib_i < b_1\).

\((\Rightarrow )\) Suppose the \(\mathbf{a}_i\)’s are linearly independent, which implies that the system of equations \(\{\mathbf{a}_i{}^\top \mathbf{z}=b_i\}_{i=1}^k\) has at least one solution, denoted \(\mathbf{z}'\). Consider the k-dimensional subspace spanned by the coordinate system \(\{ \mathbf{a}_i\}_{i=1}^k\) with origin at \(\mathbf{z}'\). For each i, the half-space \(H_i\) corresponds to either the positive or the negative side of vector \(\mathbf{a}_i\), depending on the direction of the inequality. No matter how we choose the \(H_i\)’s, one of the \(2^k\) “orthants”, namely the one corresponding to \(\bigcap _{i=1}^kH_i^c\), is left uncovered, which proves the “only if” part. \(\square \)
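
A minimal (hypothetical) illustration in \({\mathbb R}^1\): take \(k = 2\), \(a_1 = 1\), and \(a_2 = -1\), which are linearly dependent, with arbitrary \(b_1\) and \(b_2\). If \(-b_2\le b_1\), choose \(H_1 = \{z: z\le b_1\}\) and \(H_2 = \{z: -z\le b_2\} = \{z: z\ge -b_2\}\); otherwise choose the opposite pair of inequalities. Either way \({\mathbb R}^1 = H_1\cup H_2\). By contrast, a single half-space (\(k = 1\), \(a_1\ne 0\), trivially independent) can never cover \({\mathbb R}^1\).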

Lemma 3

Let \(Z_x\sim \mathcal{N}(x, 1-x^2)\) be a one-parameter family of normal random variables with \(x\in [-1, 1]\). Given any \(\eta \in (0, 1/2)\), there exists an \(\varepsilon >0\) such that \(\sup _{x\in [-1, 1]}P(|Z_x| \le \varepsilon ) < \eta \).

Proof

By symmetry, \(\sup _{x\in [0, 1]}P(|Z_x| \le \varepsilon )=\sup _{x\in [-1, 1]}P(|Z_x| \le \varepsilon )\), so we only need to consider non-negative x’s in the proof. Note that for all \(\varepsilon \in [0, 1)\) and \(x>\varepsilon \),

$$\begin{aligned} P (Z_x \le \varepsilon ) = \Phi \left( \frac{\varepsilon -x}{\sqrt{1-x^2}}\right) \downarrow 0, \end{aligned}$$
(58)

as \(x\uparrow 1\), due to the monotonicity of the functions involved. Now fix an \(\eta \in (0, 1/2)\). Equation 58 implies there exists an \(x'\in (1/2, 1)\) such that \( P (Z_{x'} \le 1/2 ) < \eta \). Then for all \(x\in (x', 1]\) and \(\varepsilon \in (0, 1/2]\), we have

$$\begin{aligned} P (|Z_x| \le \varepsilon ) \le P (Z_x\le \varepsilon ) \le P (Z_{x'} \le \varepsilon ) < \eta . \end{aligned}$$
(59)

For \(x\in [0, x']\), the variance of \(Z_x\) is bounded from below by \(1-x'{}^2\). We select \(\varepsilon '\) such that \(P(|Z_{x'} - x'|\le \varepsilon ') < \eta \). Then by Anderson’s inequality,

$$\begin{aligned} P (|Z_x| \le \varepsilon ') \le P (|Z_x-x| \le \varepsilon ') \le P (|Z_{x'} - x'| \le \varepsilon ') < \eta . \end{aligned}$$
(60)

The statement follows by setting \(\varepsilon =\min \{1/2, \varepsilon '\}\).

Region \(A_{3,n}\)   Let \(K_{-0}\) be any compact subset of \(\Theta \) which is bounded away from \({\varvec{\theta }}_0\). By a well-known application of Jensen’s inequality:

$$\begin{aligned} E _{ {\varvec{\theta }}_0}\log \frac{f({\varvec{\theta }}, \mathbf{Y}_i)}{f({\varvec{\theta }}_0, \mathbf{Y}_i)} \le \log E _{ {\varvec{\theta }}_0}\frac{f({\varvec{\theta }}, \mathbf{Y}_i)}{f({\varvec{\theta }}_0, \mathbf{Y}_i)}=0. \end{aligned}$$
(61)

In fact, the inequality in Eq. 61 is strict by the model identification assumption (ii) of Theorem 1. Because \( K_{-0}\) is compact, there exists a positive number \(\kappa \) such that

$$\begin{aligned} \sup _{{\varvec{\theta }}\in K_{-0}} E _{ {\varvec{\theta }}_0}\log \frac{f({\varvec{\theta }}, \mathbf{Y}_i)}{f({\varvec{\theta }}_0, \mathbf{Y}_i)} < -\kappa , \end{aligned}$$
(62)

by the continuity of the LHS function. Moreover, by the Uniform Law of Large Numbers,

$$\begin{aligned} \sup _{{\varvec{\theta }}\in K_{-0}}\left| \frac{1}{n}\sum _{i=1}^n\log \frac{f({\varvec{\theta }}, \mathbf{Y}_i)}{f({\varvec{\theta }}_0, \mathbf{Y}_i)} - E _{ {\varvec{\theta }}_0}\log \frac{f({\varvec{\theta }}, \mathbf{Y}_i)}{f({\varvec{\theta }}_0, \mathbf{Y}_i)}\right| \mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}0. \end{aligned}$$
(63)

Therefore, \(\sup _{{\varvec{\theta }}\in K_{-0}}\prod _{i=1}^nf({\varvec{\theta }}, \mathbf{Y}_i)/\prod _{i=1}^nf({\varvec{\theta }}_0, \mathbf{Y}_i) \mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}0\), which implies

$$\begin{aligned} \sup _{ \mathbf{h}\in A_{3,n}}\frac{f_{n, \mathbf h}}{f_{n,0}}\mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}0, \end{aligned}$$
(64)

because \(\mathbf{h}\in A_{3,n}\) implies \(\Vert {\varvec{\theta }} - {\varvec{\theta }}_0\Vert \in [\delta , B']\). It follows that

$$\begin{aligned}&\int _{A_{3,n}}\left| \frac{b_{n, \mathbf h}f_{n, \mathbf h}}{f_{n, 0}} - b_0e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}\right| d\mathbf{h}\\&\quad \le \int _{A_{3,n}}\left| \frac{b_{n, \mathbf h}f_{n, \mathbf h}}{f_{n, 0}}\right| d\mathbf{h} + b_0\int _{A_{3,n}}e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h}\\&\quad \le \sup _{ \mathbf{h}\in A_{3,n}}\left| \frac{f_{n, \mathbf h}}{f_{n, 0}}\right| \int _{A_{3,n}}b_{n, \mathbf h}d\mathbf{h} + b_0\int _{A_{3,n}}e^{ \mathbf{h}{}^\top \mathbf{S}_n - \frac{1}{2}{} \mathbf{h}{}^\top \varvec{\mathcal I}_0\mathbf{h}}d\mathbf{h}.\\ \end{aligned}$$
(65)

The RHS of Eq. 65 converges to 0 in probability due to the integrability of \(b_{n, \mathbf h}\), the tail estimates of a multivariate normal distribution, and the tightness of \(\mathbf{S}_n\). The proof is now complete.

Appendix 3: Proof of Theorem 2

Recall that \(\mathbf{V}\) has density \(g_n({\varvec{\theta }}|\mathbf{y})\) conditional on \(D(\mathbf{y})\). Take \(\delta > 0\). For each fixed \(\mathbf y\), \(\rho _K(\mathbf{y})\), defined by Eq. 22, can be bounded by

$$\begin{aligned} \rho _K(\mathbf{y})=\,&P\{\hbox {diam}Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) > K/n\ |\ D(\mathbf{y})\}\\=\,&P\{\hbox {diam}Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) > K/n, \Vert \mathbf{V}-{\varvec{\theta }}_0\Vert \le \delta \ |\ D(\mathbf{y})\}\\&+ P\{\hbox {diam}Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) > K/n, \Vert \mathbf{V}-{\varvec{\theta }}_0\Vert >\delta \ |\ D(\mathbf{y})\}\\\le \,&P\{\hbox {diam}Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) > K/n\ |\ \Vert \mathbf{V}-{\varvec{\theta }}_0\Vert \le \delta , D(\mathbf{y})\}\\&+ P\{\Vert \mathbf{V}-{\varvec{\theta }}_0\Vert >\delta \ |\ D(\mathbf{y})\}. \end{aligned}$$
(66)
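
The inequality step in Eq. 66 is an instance of the elementary identity, with \(A = \{\hbox {diam}Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) > K/n\}\) and \(B = \{\Vert \mathbf{V}-{\varvec{\theta }}_0\Vert \le \delta \}\):

$$\begin{aligned} P\{A, B\ |\ D(\mathbf{y})\} = P\{A\ |\ B, D(\mathbf{y})\}\,P\{B\ |\ D(\mathbf{y})\} \le P\{A\ |\ B, D(\mathbf{y})\}. \end{aligned}$$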

Theorem 1 implies that for \(\mathbf Y\) generated from \(P_{ {\varvec{\theta }}_0}\), \(P\{\Vert \mathbf{V}-{\varvec{\theta }}_0\Vert >\delta \ |\ D(\mathbf{Y})\}\), as a measurable function of \(\mathbf Y\), converges to 0 in \(P_{ {\varvec{\theta }}_0}\)-probability: i.e.,

$$\begin{aligned} P\{\Vert \mathbf{V}-{\varvec{\theta }}_0\Vert >\delta \ |\ D(\mathbf{Y})\} = \int _{\Vert {\varvec{\theta }}-{\varvec{\theta }}_0\Vert >\delta } g_n({\varvec{\theta }}|\mathbf{Y})d{\varvec{\theta }}\mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}0. \end{aligned}$$
(67)

Hence, we focus on the first term in Eq. 66. This term can be further bounded by

$$\begin{aligned}&P\{\hbox {diam}Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) > K/n\ |\ \Vert \mathbf{V}-{\varvec{\theta }}_0\Vert \le \delta ,D(\mathbf{y})\}\\&\quad = \sum _IP\{\hbox {diam}Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) > K/n\ |\ \Vert \mathbf{V}-{\varvec{\theta }}_0\Vert \le \delta , \mathbf{V}= \mathbf{V}_I, D(\mathbf{y})\}\\&\qquad \cdot P\{\mathbf{V}= \mathbf{V}_I \ |\ \Vert \mathbf{V}-{\varvec{\theta }}_0\Vert \le \delta , D(\mathbf{y})\}\\&\quad =\sum _{ \mathbf{y}_I}P\{\hbox {diam}Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) > K/n\ |\ \Vert \mathbf{V}-{\varvec{\theta }}_0\Vert \le \delta , \mathbf{V}= \mathbf{V}_I, D(\mathbf{y})\}\\&\qquad \cdot \left( \sum _{I':\mathbf{y}_{I'} = \mathbf{y}_I}P\{\mathbf{V}= \mathbf{V}_{I'}\ |\ \Vert \mathbf{V}-{\varvec{\theta }}_0\Vert \le \delta , D(\mathbf{y})\}\right) \\&\quad \le \sum _{ \mathbf{y}_I}P\{\hbox {diam}Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) > K/n\ |\ \Vert \mathbf{V}_I-{\varvec{\theta }}_0\Vert \le \delta , \mathbf{V}= \mathbf{V}_I, D(\mathbf{y})\}\\&\quad =\sum _{ \mathbf{y}_I}\int P\{\hbox {diam}Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) > K/n\ |\ \Vert \mathbf{V}_I-{\varvec{\theta }}_0\Vert \le \delta , \mathbf{V} = \mathbf{V}_I, D(\mathbf{y}), \mathbf{Z}_I^\star = \mathbf{z}_I\}d\Phi (\mathbf{z}_I).\\ \end{aligned}$$
(68)

The first sum over index sets I in the second line of Eq. 68 can be collapsed into a finite sum over all patterns of \(\mathbf{y}_I\) in the third line, because sub-samples I and \(I'\) having the same response pattern \(\mathbf{y}_I = \mathbf{y}_{I'}\) are exchangeable. Note that the event being conditioned on in the integrand of the last line of Eq. 68 happens with positive probability almost surely under the probability measure of \(\mathbf{Z}^\star \); to simplify notation, write \(E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I) = \{\Vert \mathbf{V}_I-{\varvec{\theta }}_0\Vert \le \delta , \mathbf{V} = \mathbf{V}_I, D(\mathbf{y}), \mathbf{Z}_I^\star = \mathbf{z}_I\}\) for that event. Because there are only finitely many patterns of \(\mathbf{y}_I\), it suffices to prove that for each \(\varepsilon > 0\) and some \(\delta > 0\),

$$\begin{aligned} P_{ {\varvec{\theta }}_0}\left\{ \exists K, N > 0:\ \int P\{\hbox {diam}Q(\mathbf{Y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) > K/n\ |\ E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\}d\Phi (\mathbf{z}_I) < \varepsilon ,\ \forall n>N\right\} \rightarrow 1. \end{aligned}$$
(69)

So fix \(\mathbf{y}_I\) and \(\delta \) for the rest of the proof. Also note that conditional on \(E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\), the remaining observations \(i\notin I\) are independent.

To proceed, we sequentially project the set inverse \(Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star })\) onto m subspaces, each of which is spanned by the \(q_j\) free parameters for item j. For each projection, we find a bounding random variable for its diameter; the sum of the constructed bounds across all projections then serves as an upper bound, up to a constant multiplier depending on the dimension of the parameter space, for the diameter of the set inverse. We prove the result stated in Eq. 69 with the diameter of \(Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^\star )\) replaced by the constructed bound. In order to establish the desired property for the bounding variables, we allocate the remaining observations (i.e., those not in I) to the projections, and subsequently use the standard theory for order statistics of i.i.d. random variables. In particular, we rearrange those observations to fill a growing two-dimensional array indexed by a pair of indices s and j: The second dimension of the array, \(j = 1,\ldots ,m\), is filled first, then the first; consequently, the first dimension, indexed by \(s = \lfloor i / m\rfloor \), \(i=1,\ldots ,n\), grows as the sample size increases. Notationally, elements corresponding to an observation in the array are denoted by a subscript [sj] (see footnote 8).

Fix \(\mathbf{V} = \mathbf{V}_I = {\varvec{\theta }}\) for now. For each item j, let \(\tilde{\varvec{\beta }}_j\) be the collection of the \(r_j\) free slopes. Intersecting the half-space of a new observation [sj] in the two-dimensional array with those of observations \(I_j\), the resulting intersection on the subspace of \({\varvec{\theta }}_j\) can be either bounded (i.e., a simplex) or unbounded. The following lemma provides necessary and sufficient conditions for the (un)bounded case:

Lemma 4

Consider \(p+1\) half-spaces: \(H_i = \{\mathbf{x}\in {\mathbb R}^p: \mathbf{n}_i{}^\top \mathbf{x}\le b_i\}\), \(i = 1,\ldots , p + 1\), in which \(\mathbf{n}_i\)’s are considered fixed. Then, the following statements are equivalent:

(i) \(\bigcap _{i=1}^{p + 1}H_i\) is bounded for all choices of \(b_i\)’s, \(i = 1,\ldots , p + 1\), such that the intersection is not empty;

(ii) \(\bigcap _{i=1}^{p + 1}H_i\) is a bounded simplex for some choices of \(b_i\)’s, \(i = 1,\ldots , p + 1\);

(iii) For all \(\mathbf{c}\in {\mathbb R}^p\), there exists \(i\in \{1,\ldots ,p+1\}\) such that \(\mathbf{n}_i{}^\top \mathbf{c} > 0\);

(iv) There exists \(i\in \{1,\ldots ,p+1\}\) such that the \(\mathbf{n}_j\)’s, \(j\ne i\), are linearly independent, and that \(\mathbf{n}_i = -\sum _{j\ne i}\gamma _j\mathbf{n}_j\) with \(\gamma _j>0\) for all \(j\ne i\).

Proof

(i) \(\Rightarrow \) (ii). We can always make the intersection non-empty by choosing \(b_i > 0\) for all \(i = 1,\ldots ,p + 1\). In this case, \(\bigcap _{i=1}^{p+1}H_i\) must contain some neighborhood of \(\mathbf 0\). So (i) \(\Rightarrow \) (ii) is trivial.

(ii) \(\Rightarrow \) (iii). Fix \(b_i\)’s, \(i = 1,\ldots ,p + 1\), such that \(\bigcap _{i=1}^{p + 1}H_i\) is a bounded simplex. Take \(\mathbf{x}_0\in \bigcap _{i=1}^{p + 1}H_i\); i.e., \(\mathbf{n}_i{}^\top \mathbf{x}_0\le b_i\) for all \(i = 1,\ldots ,p+1\). If there exists \(\mathbf{c}\in {\mathbb R}^p\) such that \(\mathbf{c}{}^\top \mathbf{n}_i\le 0\) for all i, then \(\mathbf{n}_i{}^\top \mathbf{x}_0 + \lambda \mathbf{n}_i{}^\top \mathbf{c}\le b_i\) for all i and all \(\lambda > 0\). This implies \(\mathbf{x}_0 + \lambda \mathbf{c}\in \bigcap _{i=1}^{p + 1}H_i\) for all \(\lambda > 0\), which contradicts the boundedness.

(iii) \(\Rightarrow \) (i). For each direction \(\mathbf{c}\), choose i such that \(\mathbf{n}_i{}^\top \mathbf{c} > 0\). For every possible value of the corresponding \(b_i\), there exists some \(\lambda _0 > 0\) such that for all \(\lambda > \lambda _0\), \(\mathbf{n}_i{}^\top (\lambda \mathbf{c}) > b_i\), i.e., \(\lambda \mathbf{c}\notin H_i\). So \(\bigcap _{i=1}^{p + 1}H_i\) is always bounded.

(iii) \(\Rightarrow \) (iv). Let \(C_i\) be the convex cone generated by all but the ith normal vectors. (iii) implies \(-\mathbf{n}_i{}^\top \mathbf{c} < 0\) for all \(\mathbf{c}\in C_i^N=\{\mathbf{c}: \mathbf{n}_j{}^\top \mathbf{c}\le 0\hbox { for all }j\ne i\}\), i.e., the normal cone (denoted by a superscript N) of \(C_i\). Hence, \(-\mathbf{n}_i\in (C_i^N)^N = C_i\).

(iv) \(\Rightarrow \) (iii). For \(\mathbf{c}\in C_i^N\), (iv) implies \(\mathbf{n}_i{}^\top \mathbf{c} > 0\). For \(\mathbf{c}\notin C_i^N\), there exists some \(j\ne i\) such that \(\mathbf{n}_j{}^\top \mathbf{c} > 0\). \(\square \)
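
A one-dimensional (hypothetical) instance of Lemma 4: with \(p = 1\), \(\mathbf{n}_1 = 1\), and \(\mathbf{n}_2 = -1\), condition (iv) holds with \(i = 2\) and \(\gamma _1 = 1\), and indeed

$$\begin{aligned} H_1\cap H_2 = \{x: x\le b_1\}\cap \{x: -x\le b_2\} = [-b_2, b_1], \end{aligned}$$

which is bounded whenever it is non-empty (i.e., whenever \(-b_2\le b_1\)), in agreement with (i) and (ii).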

Let \(\tilde{\mathbf{z}}_{ij}\) be the elements of \(\mathbf{z}_i\) associated with \(\tilde{\varvec{\beta }}_j\). For each \(i\in I_j\), write \(\mathbf{n}_{ij} = \omega _{ij}(1\ \ \tilde{\mathbf{z}}_{ij}{}^\top ){}^\top \) as the normal vector of the corresponding \((r_j+1)\)-dimensional half-space, in which \(\omega _{ij}=\pm 1\) is determined by the item response \(y_{ij}\). Similar notation is defined for observations in the array: Let \(\tilde{\mathbf{Z}}_{[sj]}^\star \) be the elements of \(\mathbf{Z}_{[sj]}^\star \) associated with \(\tilde{\varvec{\beta }}_j\), and \(\mathbf{N}_{[sj]}^\star = \omega _{[sj]}(1\ \ \tilde{\mathbf{Z}}_{[sj]}^\star {}^\top ){}^\top \) be the corresponding (random) normal vector; the random variable \(\omega _{[sj]}=\pm 1\) depends on this observation’s response to item j, which is denoted \(y_{[sj]}\) for simplicity. For each j, Lemma 4 implies that observation [sj] produces a bounded intersection if there exist positive real numbers \(\gamma _i\), \(i\in I_j\), such that

$$\begin{aligned} \omega _{[sj]}\tilde{\mathbf{Z}}_{[sj]}^\star = -\sum _{i\in I_j}\gamma _i\omega _{ij}\tilde{\mathbf{z}}_{ij}, \end{aligned}$$
(70)

and

$$\begin{aligned} \omega _{[sj]} = -\sum _{i\in I_j}\gamma _i\omega _{ij}. \end{aligned}$$
(71)

Conditioning on \(\mathbf{V}_j= \mathbf{V}_{I_j} = {\varvec{\theta }}_j\), the intersection cannot be empty, which introduces a truncation to \(A_{[sj]}^\star \), i.e., the associated logistic variate for observation [sj] and item j:

$$\begin{aligned} (-1)^{y_{[sj]}}(A_{[sj]}^\star - \alpha _j - \tilde{\varvec{\beta }}_j{}^\top \tilde{\mathbf{Z}}_{[sj]}^\star ) \ge 0. \end{aligned}$$
(72)

Fix j. When Eqs. 70 and 71 hold, let \({\varvec{\theta }}_{[sj]}^{i} = (\alpha _{j}^i\ {\varvec{\beta }}_{j}^i{}^\top ){}^\top \), \(i\in I_{j}\), be the vertex on the subspace of \({\varvec{\theta }}_{j}\) determined by observations \(I_{j}\setminus \{i\}\) together with the new observation [sj], which is random due to its dependency on \(A_{[sj]}^\star \) and \(\mathbf{Z}_{[sj]}^\star \). Also let \(I_{j}^i = I_{j}\setminus \{i\}\) for \(i\in I_{j}\), and treat \(\tilde{\mathbf{z}}_{ I_{j}^i}=(\tilde{\mathbf{z}}_{i'j})_{i'\in I^i_{j}}\) as an \(r_j\times r_j\) matrix throughout this part of the derivation. A geometric illustration of this notation for \(r = 1\) is shown in Figure 5.

Fig. 5 Illustration of the notation used in the proof of Theorem 2. Here, \(r = 1\), and j is fixed. \(I_j = \{1, 2\}\), which determines the fixed vertex \({\varvec{\theta }}_j\) (shown as a dot). The line corresponding to the new observation [sj] intersects those of observations 1 and 2, respectively, and produces two new vertices \({\varvec{\theta }}_{[sj]}^2\) and \({\varvec{\theta }}_{[sj]}^1\) (shown as circles). The sum of \(\Vert {\varvec{\theta }}_{[sj]}^1 - {\varvec{\theta }}_{j}\Vert \) and \(\Vert {\varvec{\theta }}_{[sj]}^2 - {\varvec{\theta }}_{j}\Vert \) gives an upper bound on the diameter of the plotted triangle.

Applying the formula for inverting a partitioned matrix, we have

(73)

It follows that the elements of \({\varvec{\theta }}_{[sj]}^i-{\varvec{\theta }}_{j}\) can be expressed as follows:

$$\begin{aligned} \tilde{\varvec{\beta }}_{[sj]}^i - \tilde{\varvec{\beta }}_j = \frac{\tilde{\mathbf{z}}_{I_{j}^i}^{-1}{} \mathbf{1}(A_{[sj]}^\star -\tilde{\varvec{\beta }}_j{}^\top \tilde{\mathbf{Z}}_{[sj]}^\star - \alpha _{j})}{\tilde{\mathbf{Z}}_{[sj]}^\star {}^\top {\tilde{\mathbf{z}}}_{I_{j}^i}^{-1}{} \mathbf{1} - 1}, \end{aligned}$$
(74)

and

$$\begin{aligned} \alpha _{[sj]}^i - \alpha _{j} = -{\tilde{\mathbf{z}}}_{i'j}{}^\top (\tilde{\varvec{\beta }}_{[sj]}^i - \tilde{\varvec{\beta }}_j),\hbox { for all } i'\in I_{j}^i. \end{aligned}$$
(75)

Define

$$\begin{aligned} \bar{U}^\star _{[sj]} =\left[ \sum _{i\in I_{j}}\frac{\left\| \tilde{\mathbf{z}}_{I_{j}^i}^{-1}{} \mathbf{1}\right\| \left( 1 + \sum _{i'\in I_{j}^i}\Vert \tilde{\mathbf{z}}_{i'j}\Vert ^2\right) }{\left| \tilde{\mathbf{Z}}_{[sj]}^\star {}^\top {\tilde{\mathbf{z}}}_{I_{j}^i}^{-1}{} \mathbf{1} - 1\right| }\right] |A_{[sj]}^\star -\tilde{\varvec{\beta }}_j{}^\top \tilde{\mathbf{Z}}_{[sj]}^\star - \alpha _{j}| \end{aligned}$$
(76)

If both Eqs. 70 and 71 are satisfied, the random variable defined by Eq. 76 gives an upper bound for \(\Vert {\varvec{\theta }}_{[sj]}^i - {\varvec{\theta }}_j\Vert \). Also define

(77)

which is a random variable that is defined on the extended real line.

Pooling across all observations in the array, we have

$$\begin{aligned} \hbox {diam}Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) \le C\sum _{j=1}^m\min _{t\le s}U^\star _{[tj]}, \end{aligned}$$
(78)

in which C is a constant determined by the dimension of the parameter space. It follows that

$$\begin{aligned}&\int P\{\hbox {diam}Q(\mathbf{y}, \mathbf{A}^\star , \mathbf{Z}^{\star }) > K/n\ |\ E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\}d\Phi (\mathbf{z}_I)\\&\quad \le \int P\left\{ \sum _{j=1}^m\min _{t\le s}U^\star _{[tj]}> \frac{K}{Cn}\ \bigg |\ E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\right\} d\Phi (\mathbf{z}_I)\\&\quad \le \sum _{j=1}^m\int P\left\{ \min _{t\le s}U^\star _{[tj]}> K'/n\ \bigg |\ E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\right\} d\Phi (\mathbf{z}_{I_{j}}), \end{aligned}$$
(79)

in which \(K' = \frac{K}{Cm}\). Now fix \(\varepsilon ,\delta > 0\). It suffices to prove, for each summand of Eq. 79, that

$$\begin{aligned} P_{ {\varvec{\theta }}_0}\bigg \{\exists K', N > 0:\displaystyle \int P\left\{ \min _{t\le s}U^\star _{[tj]} > K'/n\ \bigg |\ E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\right\} d\Phi (\mathbf{z}_I)< \varepsilon ,\ \forall n>N\bigg \}\rightarrow 1. \end{aligned}$$
(80)

For each item j, define \(T_{sj}^k = \{t: t\le s, y_{[tj]} = k\}\) for \(k = 0\) and 1, respectively. We intend to prove that the sub-collections satisfy

$$\begin{aligned} |T_{sj}^k|/n\mathop {\rightarrow }\limits ^{P_{ {\varvec{\theta }}_0}}\rho _k,\hbox { as }n\rightarrow \infty \hbox { for some }0<\rho _k<1. \end{aligned}$$
(81)

In this case, we write \(\varrho = \min \{\rho _0, \rho _1\}\). Within each sub-collection, the \(U^\star _{[tj]}\), \(t\in T_{sj}^k\), are i.i.d. conditional on \(E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\); let \(\varphi _{j}(u, {\varvec{\theta }}_{j}, \mathbf{y}_{I_{j}},\tilde{\mathbf{z}}_{I_{j}}, k)\) be their common (conditional) density. We also intend to find a set \(B_{j}\subset {\mathbb R}^{r_j^2}\) such that \(P\{ \tilde{\mathbf{Z}}_{I_{j}}^\star \notin B_{j}\} < \varepsilon /2\), and a \(\kappa > 0\) such that for every \(\tilde{\mathbf{z}}_{I_{j}}\in B_{j}\), there exists a particular \(y_{[sj]} = k\) for which

$$\begin{aligned} \inf \left\{ \varphi _{j}(u, {\varvec{\theta }}_{j}, \mathbf{y}_{I_{j}},\tilde{\mathbf{z}}_{I_{j}}, k):0\le u\le \eta , \Vert {\varvec{\theta }} - {\varvec{\theta }}_0\Vert \le \delta \right\} \ge \kappa \end{aligned}$$
(82)

for some \(\eta > 0\). Assume for a moment that Eqs. 81 and 82 hold. Then we can construct a sequence of i.i.d. non-negative random variables \(\{X_n\}\), whose density function is constantly equal to \(\kappa \) within \([0, \eta ]\). By the Delta method and the standard result for i.i.d. uniform order statistics, \(n\min _{i\le n}X_i\mathop {\rightarrow }\limits ^{d}W/\kappa \), in which \(W\sim \hbox {Exp}(1)\). Fix \(K'\) such that \(P\left( W/\kappa >K'\right) < \varepsilon /8\). By the Portmanteau Lemma, there exists an \(n_1\) such that for all \(n > n_1\), \(P\{n\min _{i\le \lfloor \varrho n/2\rfloor }X_i > K'\}\le P\{W/\kappa > K'\} + \varepsilon /8\le \varepsilon /4\). Also take \(n_2\) such that \(K'/n_2 < \eta \), and \(n_3\) such that \(P\{\min _{k=0,1}|T_{sj}^k|/n<\varrho /2\} \le \varepsilon /4\). Thus, for every \(\mathbf{z}_{I_{j}}\in B_{j}\), there exists \(k = 0\) or 1 such that along the corresponding subsequence \(T_{sj}^k\):

$$\begin{aligned} P\left\{ \min _{t\le s}U^\star _{[tj]} > K'/n\ \bigg |\ E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\right\}&\le P\left\{ \min _{t\in T_{sj}^k}U^\star _{[tj]} > K'/n\ \bigg |\ E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\right\} \\&\le P\left\{ \min _{i\le \lfloor \varrho n/2\rfloor } X_i > K'/n\right\} +\varepsilon /4\le \varepsilon /2 \end{aligned}$$
(83)

for all \(n > \max \{n_1, n_2, n_3\}\). It follows that for all these large n’s,

$$\begin{aligned}&\int P\left\{ \min _{t\le s}U^\star _{[tj]}> K'/n\ \bigg |\ E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\right\} d\Phi (\mathbf{z}_{I_{j}})\\&\quad \le \int _{B_{j}} P\left\{ \min _{t\le s}U^\star _{[tj]}> K'/n\ \bigg |\ E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\right\} d\Phi (\mathbf{z}_{I_{j}}) + \varepsilon /2\le \varepsilon . \end{aligned}$$
(84)

This implies the intended result (Eq. 80).
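
As a numerical sanity check of the order-statistic limit invoked above (a sketch under stated assumptions, not part of the proof): if i.i.d. non-negative \(X_i\) have density constantly equal to \(\kappa \) on \([0, \eta ]\), then \(n\min _{i\le n}X_i\) is approximately \(\hbox {Exp}(\kappa )\) for large n. The mixture below is one arbitrary way to construct such \(X_i\), placing the remaining mass above \(\eta \).

```python
import numpy as np

rng = np.random.default_rng(0)
kappa, eta = 0.8, 0.5          # density level kappa on [0, eta]; needs kappa * eta <= 1
n, reps, K = 5000, 4000, 1.0

def sample_X(size):
    # With probability kappa*eta draw Uniform(0, eta); otherwise place the mass
    # above eta, so that the density equals kappa everywhere on [0, eta].
    low = rng.random(size) < kappa * eta
    return np.where(low, rng.uniform(0.0, eta, size), eta + rng.exponential(1.0, size))

mins = np.array([n * sample_X(n).min() for _ in range(reps)])
print((mins > K).mean())       # empirical P(n * min X_i > K)
print(np.exp(-kappa * K))      # limiting Exp(kappa) tail: exp(-kappa * K) ~ 0.449
```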

When \(\mathbf{Y}\) is considered random, the probability that Eq. 81 holds for both \(k = 0\) and 1 in fact tends to 1, because the data-generating parameter values \({\varvec{\theta }}_0\) are assumed to lie in the interior of the parameter space (and thus \(\varrho > 0\) is determined solely by \({\varvec{\theta }}_0\)).
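
For intuition about Eq. 81, the following sketch (illustrative 2PL intercept and slope with a standard normal latent trait; not the paper's data) shows the observed proportion of correct responses settling at a limit strictly inside (0, 1) when the item parameters are interior.

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(1)
alpha, beta = -0.5, 1.2        # illustrative interior 2PL intercept and slope
n = 200_000

z = rng.standard_normal(n)                    # latent trait
y = rng.random(n) < expit(beta * z + alpha)   # binary responses
print(y.mean())  # ~ rho_1 = E[expit(beta * Z + alpha)], strictly inside (0, 1)
```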

Let \(\bar{\varphi }_{j}(u, {\varvec{\theta }}_{j},\tilde{\mathbf{z}}_{I_{j}})\) be the density of \(\bar{U}^\star _{[sj]}\) conditional on \(E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\), and define the event:

$$\begin{aligned} C_{j}(\mathbf{y}_{I_{j}},\tilde{\mathbf{z}}_{I_{j}}, y_{[sj]}) = \{&\omega _{[sj]}\tilde{\mathbf{Z}}_{[sj]}^\star = -\sum _{i\in I_{j}}\gamma _i\omega _{ij}\tilde{\mathbf{z}}_{ij},\\&\omega _{[sj]} = -\sum _{i\in I_{j}}\gamma _i\omega _{ij},\\&\gamma _i > 0\hbox { for all }i\in I_{j}\}. \end{aligned}$$
(85)

Then, \(\varphi _{j}(u, {\varvec{\theta }}_{j}, \mathbf{y}_{I_{j}},\tilde{\mathbf{z}}_{I_{j}}, y_{[sj]}) = \bar{\varphi }_{j}(u, {\varvec{\theta }}_{j},\tilde{\mathbf{z}}_{I_{j}})P\{C_{j}(\mathbf{y}_{I_{j}},\tilde{\mathbf{z}}_{I_{j}}, y_{[sj]})|E^\delta _{ \mathbf{y}_I }(\mathbf{z}_I)\}\). Next, we derive lower bounds for these two factors on the right-hand side, which together establish Eq. 82.

First, fix a \(y_{[sj]}\) ensuring Eqs. 70 and 71. For easy reference, let

$$\begin{aligned} \sigma _{j}(\tilde{\mathbf{z}}_{ I_{j}}, \tilde{\mathbf{Z}}_{[sj]}^\star ) = \sum _{i\in I_{j}}\frac{\left\| \tilde{\mathbf{z}}_{I_{j}^i}^{-1}\mathbf{1}\right\| \left( 1 + \sum _{i'\in I_{j}^i}\Vert \tilde{\mathbf{z}}_{i'j}\Vert ^2\right) }{\left| \tilde{\mathbf{Z}}_{[sj]}^\star {}^\top {\tilde{\mathbf{z}}}_{I_{j}^i}^{-1}{} \mathbf{1} - 1\right| } \end{aligned}$$
(86)

and

$$\begin{aligned} \mu _{j}({\varvec{\theta }}_{j}, \tilde{\mathbf{Z}}_{[sj]}^\star ) = \tilde{\varvec{\beta }}_j{}^\top \tilde{\mathbf{Z}}_{[sj]}^\star + \alpha _{j}. \end{aligned}$$
(87)

Then we rewrite Eq. 76 as \(\bar{U}^\star _{[sj]} = \sigma _{j}(\tilde{\mathbf{z}}_{ I_{j}}, \tilde{\mathbf{Z}}_{[sj]}^\star )|A_{[sj]}^\star - \mu _{j}({\varvec{\theta }}_{j}, \tilde{\mathbf{Z}}_{[sj]}^\star )|\), whose density function is

$$\begin{aligned} \bar{\varphi }_{j}(u, {\varvec{\theta }}_{j}, \tilde{\mathbf{z}}_{I_{j}}) =&\int \frac{\bar{\psi }(\mu _{j}({\varvec{\theta }}_{j}, \tilde{\mathbf{z}}_{[sj]}) + u/\sigma _{j}(\tilde{\mathbf{z}}_{ I_{j}}, \tilde{\mathbf{z}}_{[sj]}))}{\sigma _{j}(\tilde{\mathbf{z}}_{ I_{j}}, \tilde{\mathbf{z}}_{[sj]})}d\Phi (\tilde{\mathbf{z}}_{[sj]})\\&+\int \frac{\bar{\psi }(\mu _{j}({\varvec{\theta }}_{j}, \tilde{\mathbf{z}}_{[sj]}) - u/\sigma _{j}(\tilde{\mathbf{z}}_{ I_{j}}, \tilde{\mathbf{z}}_{[sj]}))}{\sigma _{j}(\tilde{\mathbf{z}}_{ I_{j}}, \tilde{\mathbf{z}}_{[sj]})}d\Phi (\tilde{\mathbf{z}}_{[sj]}) \end{aligned}$$
(88)

in which \(\bar{\psi }(\cdot )\) is the standard logistic density conditional on Eq. 72. By the theory of multivariate normal random variables, we can find

$$\begin{aligned} B_{j}^1 = \{\tilde{\mathbf{z}}_{I_{j}}\in {\mathbb R}^{r_j^2}:&\ \lambda \le \Vert \tilde{\mathbf{z}}_{ij}\Vert \le L,\hbox { for all }i\in I_{j}\} \end{aligned}$$
(89)

with properly defined \(\lambda \) and L such that \(P\{\tilde{\mathbf{Z}}_{I_{j}}^\star \in B_{j}^1\} > 1 - \varepsilon /4\). Also, for fixed \(D' > \delta ' > 0\) and \(D > 0\), define

$$\begin{aligned} G_{j}(\tilde{\mathbf{z}}_{I_{j}}) = \{\tilde{\mathbf{z}}_{[sj]}\in {\mathbb R}^{r_j}:&\ \delta '\le \left| \tilde{\mathbf{z}}_{[sj]}{}^\top {\tilde{\mathbf{z}}}_{I_{j}^i}^{-1}{} \mathbf{1} - 1\right| \le D'\hbox { for all }i\in I_{j},\ \Vert \tilde{\mathbf{z}}_{[sj]}\Vert \le D\}. \end{aligned}$$
(90)

Note that \(\tilde{\mathbf{Z}}_{[sj]}^\star {}^\top {\tilde{\mathbf{z}}}_{I_{j}^i}^{-1}{} \mathbf{1} - 1\sim \mathcal{N}(- 1, \mathbf{1}{}^\top {\tilde{\mathbf{z}}}_{I_{j}^i}^{-\top }{\tilde{\mathbf{z}}}_{I_{j}^i}^{-1}{} \mathbf{1})\), in which the variance is uniformly bounded from above and below for all \(\tilde{\mathbf{z}}_{I_{j}}\in B_{j}^1\). It follows that

$$\begin{aligned} \inf _{\tilde{\mathbf{z}}_{I_{j}}\in B_{j}^1}P\left\{ \tilde{\mathbf{Z}}_{[sj]}^\star \in G_{j}(\tilde{\mathbf{z}}_{I_{j}})\right\} > 0. \end{aligned}$$
(91)

Thus, by restricting the integrals on the RHS of Eq. 88 to \(G_{j}(\tilde{\mathbf{z}}_{I_{j}})\), we obtain a uniform lower bound of \(\bar{\varphi }_{j}(u, {\varvec{\theta }}_{j},\tilde{\mathbf{z}}_{I_{j}})\) for all \(\tilde{\mathbf{z}}_{I_{j}}\in B_{j}^1\).
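
For intuition about Eq. 88, the sketch below fixes \(\tilde{\mathbf{z}}_{[sj]}\) (so \(\mu _j\) and \(\sigma _j\) are constants, chosen arbitrarily here) and ignores the conditioning on Eq. 72, checking by simulation that \(\sigma |A - \mu |\) with A standard logistic has density \([\bar{\psi }(\mu + u/\sigma ) + \bar{\psi }(\mu - u/\sigma )]/\sigma \).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma = 0.7, 2.0                 # arbitrary fixed values of mu_j and sigma_j
a = rng.logistic(size=500_000)       # A ~ standard logistic (conditioning ignored)
u = sigma * np.abs(a - mu)           # the folded variable of Eq. 76

grid = np.array([0.5, 1.0, 2.0, 4.0])
psi = stats.logistic.pdf
formula = (psi(mu + grid / sigma) + psi(mu - grid / sigma)) / sigma

h = 0.05                             # crude local density estimate at each grid point
empirical = np.array([((u > g - h) & (u < g + h)).mean() / (2 * h) for g in grid])
print(np.round(formula, 4))
print(np.round(empirical, 4))
```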

Our final task is to find \(B_{j}^2\subset {\mathbb R}^{r_j^2}\) such that \(P\{ \tilde{\mathbf{Z}}_{I_{j}}^\star \in B_{j}^2\} > 1-\varepsilon /4\), and that \(P\{C_{j}(\mathbf{y}_{I_{j}},\tilde{\mathbf{z}}_{I_{j}}, k)\ |\ E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\}\) has a uniform lower bound for all \(\tilde{\mathbf{z}}_{I_{j}}\in B_{j}^2\). Here, we only prove the statement for \(r = 1\), and we conjecture that an extended argument can be established for \(r > 1\).

When \(r = 1\), \(|I_{j}| = 2\); without loss of generality, let \(I_{j}\) consist of the first two observations. We fix j, and for simplicity denote the two normal vectors corresponding to these observations by \(\mathbf{n}_1 = \omega _1(1\ z_1){}^\top \) and \(\mathbf{n}_2 = \omega _2(1\ z_2){}^\top \), in which \(\omega _1,\omega _2=\pm 1\). We now discuss two cases with different combinations of \(\omega _1\) and \(\omega _2\); in either case, the joint probability of \(\omega _{[sj]}Z_{[sj]}^\star = -\gamma _1\omega _1z_1 - \gamma _2\omega _2z_2\) and \(\omega _{[sj]} = -\gamma _1\omega _1 - \gamma _2\omega _2\), \(\gamma _1,\gamma _2>0\), is uniformly bounded from below for \((z_1\ z_2){}^\top \in B_{j}^2 = \{(z_1\ z_2){}^\top : |z_1 - z_2| \ge \eta , |z_1|\le H, |z_2|\le H\}\) with properly selected \(\eta , H > 0\) such that \(P\{ \tilde{\mathbf{Z}}_{I_{j}}^\star \in B_{j}^2\} > 1-\varepsilon /4\).

Case 1 \(\omega _1 = 1\) and \(\omega _2 = 1\). Choose \(\omega _{[sj]} = -1\), which happens with positive probability provided the data-generating parameter values are in the interior of the parameter space. Then, \(\mathbf{N}_{[sj]}^\star = -\gamma _1\mathbf{n}_1-\gamma _2\mathbf{n}_2\) implies \(\gamma _1 + \gamma _2 = 1\) and \(Z^\star _{[sj]} = -\gamma _1z_1 - \gamma _2z_2\), i.e., \(Z^\star _{[sj]}\) falls in the line segment between \(-z_1\) and \(-z_2\). For all \((z_1\ z_2){}^\top \in B_{j}^2\), \(P\{\min \{-z_1,-z_2\}\le Z_{[sj]}^\star \le \max \{-z_1,-z_2\}\} \ge \Phi (H) - \Phi (H - \eta )\).
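
The closing bound of Case 1 can be checked numerically (a sketch with arbitrary H and \(\eta \)): among all intervals of length \(\eta \) contained in \([-H, H]\), the interval pushed against the boundary has the smallest standard normal probability, namely \(\Phi (H) - \Phi (H - \eta )\).

```python
import numpy as np
from scipy.stats import norm

H, eta = 3.0, 0.4
starts = np.linspace(-H, H - eta, 2001)   # all intervals [c, c + eta] inside [-H, H]
probs = norm.cdf(starts + eta) - norm.cdf(starts)
print(probs.min())                        # minimized at the edge interval [H - eta, H]
print(norm.cdf(H) - norm.cdf(H - eta))    # the uniform lower bound used in Case 1
```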

Case 2 \(\omega _1 = 1\) and \(\omega _2 = -1\). In this case, the constraints are \(\omega _{[sj]} = -\gamma _1 + \gamma _2\) and \(\omega _{[sj]}Z^\star _{[sj]} = -\gamma _1z_1 + \gamma _2z_2\). If we choose \(\omega _{[sj]} = 1\), then \(\gamma _2 = 1 + \gamma _1\). It follows that \(Z^\star _{[sj]} = \gamma _1(z_2 - z_1) + z_2\), which is greater than \(z_2\) when \(z_2 > z_1\) and less than \(z_2\) when \(z_2 < z_1\). Then, both \(P\{Z_{[sj]}^\star < z_2\}\) and \(P\{Z_{[sj]}^\star > z_2\}\) are uniformly greater than \(1 - \Phi (H)\) for all \((z_1\ z_2){}^\top \in B_{j}^2\). A similar argument applies when \(\omega _{[sj]} = -1\) is chosen.

The remaining combinations of \(\omega _1\) and \(\omega _2\) are reflections of the two cases just discussed. Altogether, we have shown that \(P\{C_{j}(\mathbf{y}_{I_{j}},\tilde{\mathbf{z}}_{I_{j}}, k)\ |\ E^\delta _{ \mathbf{y}_I}(\mathbf{z}_I)\}\) is uniformly bounded from below for \(\tilde{\mathbf{z}}_{I_{j}}\in B_{j}^2\). Take \(B_j = B_j^1\cap B_j^2\); then \(P\{ \tilde{\mathbf{Z}}_{I_{j}}^\star \in B_j\} > 1-\varepsilon /2\). The proof for \(r = 1\) is now complete.
