Optimal and Most Exact Confidence Intervals for Person Parameters in Item Response Theory Models

Psychometrika

Abstract

The common way to calculate confidence intervals for item response theory models is to assume that the standardized maximum likelihood estimator for the person parameter θ is normally distributed. However, this approximation is often inadequate for short and medium test lengths. As a result, the coverage probabilities fall below the nominal confidence level in many cases, and the corresponding intervals are therefore no longer confidence intervals in the strict sense of the definition. In the present work, confidence intervals are defined more precisely by utilizing the relationship between confidence intervals and hypothesis testing. Two approaches to confidence interval construction are explored that are optimal with respect to criteria of smallness and consistency with the standard approach.
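To make the standard approach concrete, here is a minimal Python sketch of the normal-approximation (Wald) interval \(\hat{\theta}\pm z_{1-\alpha/2}/\sqrt{I(\hat{\theta})}\) together with its exact coverage probability at a fixed true θ. It assumes the Rasch model with known item difficulties, so that the total score is sufficient and coverage can be computed by enumerating scores; it also treats the scores 0 and N, for which no finite maximum likelihood estimate exists, as non-covering. All function names and the difficulty values are hypothetical illustrations, not taken from the article.

```python
import math
from statistics import NormalDist


def rasch_prob(theta, beta):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def mle_theta(score, betas, lo=-20.0, hi=20.0, iters=100):
    """ML estimate of theta for a total score 0 < score < N.

    Solves the score equation sum_i p_i(theta) = score by bisection;
    the left-hand side is strictly increasing in theta.
    """
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if sum(rasch_prob(mid, b) for b in betas) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def fisher_info(theta, betas):
    """Test information I(theta) = sum_i p_i (1 - p_i) for the Rasch model."""
    return sum(rasch_prob(theta, b) * (1 - rasch_prob(theta, b)) for b in betas)


def wald_interval(score, betas, alpha=0.05):
    """Normal-approximation interval theta_hat +/- z_{1-alpha/2} / sqrt(I(theta_hat))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    theta_hat = mle_theta(score, betas)
    se = 1.0 / math.sqrt(fisher_info(theta_hat, betas))
    return theta_hat - z * se, theta_hat + z * se


def score_distribution(theta, betas):
    """P_theta(R = r) for r = 0..N, built up one item at a time."""
    dist = [1.0]
    for b in betas:
        p = rasch_prob(theta, b)
        new = [0.0] * (len(dist) + 1)
        for r, q in enumerate(dist):
            new[r] += q * (1 - p)  # item answered incorrectly
            new[r + 1] += q * p    # item answered correctly
        dist = new
    return dist


def wald_coverage(theta, betas, alpha=0.05):
    """Exact coverage of the Wald interval at the true value theta.

    Scores 0 and N (no finite MLE) count as non-covering, an assumption of this sketch.
    """
    dist = score_distribution(theta, betas)
    coverage = 0.0
    for r in range(1, len(betas)):
        lo, hi = wald_interval(r, betas, alpha)
        if lo <= theta <= hi:
            coverage += dist[r]
    return coverage


betas = [-1.5, -0.5, 0.0, 0.5, 1.5]  # illustrative item difficulties (assumption)
for theta in (-1.0, 0.0, 1.0):
    print(theta, round(wald_coverage(theta, betas), 4))
```

Evaluating `wald_coverage` on a grid of θ values for a short test shows how the actual coverage of this interval compares with the nominal level 1−α.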


Figure 1.
Figure 2.
Figure 3.



Author information

Correspondence to Anna Doebler.

Additional information

This work was supported by a Grant of the Studienstiftung des Deutschen Volkes.

Appendix: Mathematical Proofs

The following proves Proposition 3.1.

Proof

Note that the prior g is obtained from f by a simple application of integration by substitution. To show that \(\tau^{-1}I\) is WAMA, we first observe that for any confidence interval J and any \(\tilde{\rho}\) with \(\tilde{\theta}= \tau(\tilde{\rho})\),

$$\int_{\varTheta}\mathrm{P}_\theta(\tilde{\theta}\in J) f( \theta) \mathrm{d}\theta= \int_{U}\mathrm{P}_\rho \bigl(\tilde{\rho}\in\tau^{-1}J\bigr) g(\rho) \mathrm{d}\rho. $$

If J is now any confidence interval on U, then

$$\int_{U}\mathrm{P}_\rho\bigl(\tilde{\rho}\in \tau^{-1}I\bigr) g(\rho) \mathrm{d}\rho\leq\int_{U} \mathrm{P}_\rho(\tilde{\rho}\in J) g(\rho) \mathrm{d}\rho; $$

otherwise J would induce, with obvious notation, an interval τJ on ℝ with

$$\int_{\varTheta}\mathrm{P}_\theta(\tilde{\theta}\in I) f( \theta) \mathrm{d}\theta> \int_{\varTheta}\mathrm{P}_\theta( \tilde{\theta}\in\tau J) f(\theta) \mathrm{d}\theta, $$

in contradiction to the fact that I is WAMA on Θ. □
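The substitution invoked at the start of this proof can be written out explicitly. The following is a sketch under assumptions not stated in this appendix: τ: U → Θ is a differentiable, strictly monotone bijection, θ = τ(ρ), and \(\tilde{\theta}=\tau(\tilde{\rho})\), so that the events \(\{\tilde{\theta}\in J\}\) and \(\{\tilde{\rho}\in\tau^{-1}J\}\) coincide. Under these assumptions the prior on U is \(g(\rho)=f(\tau(\rho))\,\lvert\tau'(\rho)\rvert\), and the first identity of the proof reads

$$\begin{aligned}
\int_{\varTheta}\mathrm{P}_\theta(\tilde{\theta}\in J)\, f(\theta)\,\mathrm{d}\theta
&= \int_{U}\mathrm{P}_{\tau(\rho)}(\tilde{\theta}\in J)\, f(\tau(\rho))\,\lvert\tau'(\rho)\rvert\,\mathrm{d}\rho \\
&= \int_{U}\mathrm{P}_\rho\bigl(\tilde{\rho}\in\tau^{-1}J\bigr)\, g(\rho)\,\mathrm{d}\rho .
\end{aligned}$$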

The following proves Proposition 4.4.

Proof

Let k, k′ be test results with the same total score R, and let \(\beta_1,\ldots,\beta_N\) be an enumeration of all item difficulty parameters of the test. Without loss of generality, let the enumeration be such that \(\beta_1,\ldots,\beta_R\) (with \(R\leq N\)) are the difficulty parameters of the items that must be solved correctly to obtain k. Let \(\beta_{1}', \ldots,\beta_{R}'\) denote the parameters of the items that must be solved to obtain k′. For \(r=1,\ldots,R\) let \(x_{r}:=\beta_{r}-\beta_{r}'\). Furthermore, let \(Q(\theta):=\prod_{i=1}^{N}(1+\exp(\theta-\beta_{i}))\). Then

 □
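The quantities introduced in this proof can be checked numerically. A minimal sketch, assuming the Rasch model (whose likelihood has exactly the normalizing factor Q(θ) defined above): a response pattern k with total score R has probability \(\mathrm{P}_\theta(k)=\exp(R\theta-\sum_{r=1}^{R}\beta_r)/Q(\theta)\), so that \(\mathrm{P}_\theta(k)/\mathrm{P}_\theta(k')=\exp(-\sum_{r=1}^{R}x_r)\) for every θ. The helper `pattern_prob`, its index-list encoding of patterns, and the difficulty values are hypothetical; the check is an illustration and not a reconstruction of the proof.

```python
import math


def pattern_prob(theta, betas, solved):
    """Rasch-model probability of a full response pattern.

    betas:  difficulties beta_1..beta_N of all items.
    solved: indices of the correctly solved items (hypothetical encoding).
    Uses P_theta(k) = exp(R*theta - sum of solved betas) / Q(theta),
    with Q(theta) = prod_i (1 + exp(theta - beta_i)).
    """
    q = math.prod(1 + math.exp(theta - b) for b in betas)
    r = len(solved)
    return math.exp(r * theta - sum(betas[i] for i in solved)) / q


betas = [-1.0, -0.3, 0.4, 1.2]  # illustrative difficulties (assumption)
k = [0, 1]        # pattern k: items 1 and 2 solved (total score R = 2)
k_prime = [2, 3]  # pattern k': items 3 and 4 solved (same total score)
x = [betas[a] - betas[b] for a, b in zip(k, k_prime)]  # x_r = beta_r - beta'_r

for theta in (-2.0, 0.0, 1.5):
    ratio = pattern_prob(theta, betas, k) / pattern_prob(theta, betas, k_prime)
    # Under the Rasch model the ratio equals exp(-sum_r x_r) for every theta.
    print(theta, ratio, math.exp(-sum(x)))
```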

The following proves Proposition 4.3.

Proof

It suffices to show that for every possible test result k

$$ I_e(k)=I_n(k) \tag{A.1} $$

holds. For a given k and arbitrary θ we have

here the second equivalence holds because of the normality assumption. This proves (A.1). □
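The intervals \(I_e\) and \(I_n\) are defined in the main text. As a hedged illustration of the kind of equivalence used here, suppose (a simplifying assumption of this sketch) that an estimate t comes from an estimator that is exactly \(N(\theta,\sigma^2)\) with known σ. Inverting the equal-tailed level-α test then reproduces the normal interval \(t\pm z_{1-\alpha/2}\sigma\). The sketch finds the test-inversion endpoints numerically by bisection and compares them with the closed form; the function names are hypothetical.

```python
from statistics import NormalDist


def invert_normal_test(t, sigma, alpha=0.05, lo=-50.0, hi=50.0, iters=200):
    """Confidence interval by test inversion, assuming T ~ N(theta, sigma^2) exactly.

    theta is kept if neither one-sided p-value falls below alpha/2.
    Each endpoint is found by bisection on a monotone tail probability.
    """
    std = NormalDist()

    def upper_tail(theta):  # P_theta(T >= t), increasing in theta
        return 1 - std.cdf((t - theta) / sigma)

    def lower_tail(theta):  # P_theta(T <= t), decreasing in theta
        return std.cdf((t - theta) / sigma)

    # Lower endpoint: smallest theta with upper_tail(theta) >= alpha/2.
    a, b = lo, hi
    for _ in range(iters):
        m = 0.5 * (a + b)
        if upper_tail(m) < alpha / 2:
            a = m
        else:
            b = m
    lower = 0.5 * (a + b)

    # Upper endpoint: largest theta with lower_tail(theta) >= alpha/2.
    a, b = lo, hi
    for _ in range(iters):
        m = 0.5 * (a + b)
        if lower_tail(m) >= alpha / 2:
            a = m
        else:
            b = m
    upper = 0.5 * (a + b)
    return lower, upper


def normal_interval(t, sigma, alpha=0.05):
    """Closed-form normal interval t +/- z_{1-alpha/2} * sigma."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return t - z * sigma, t + z * sigma


print(invert_normal_test(0.8, 0.5))
print(normal_interval(0.8, 0.5))
```

The two printed intervals agree up to floating-point error, which is the point of the illustration: once the estimator is exactly normal, inverting the test and using the normal formula are the same operation.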

The following proves Theorem 4.1.

Proof

Let \(\tilde{\theta}\in\varTheta\) be arbitrary. We need to show

$$\int_\theta\mathrm{P}_\theta(\tilde{\theta}\in I_h) f(\theta)\, \mathrm{d}\theta= \min_{ \{I:\ I\ \mathrm{is\ a}\ (1- \alpha)\hbox{-}\mathrm{confidence\ interval} \}}\int_\theta\mathrm{P}_\theta(\tilde{\theta}\in I) f(\theta)\, \mathrm{d}\theta. $$

Writing the coverage probability as a sum over test results and interchanging summation and integration yields

$$\int_\theta\mathrm{P}_\theta(\tilde{\theta}\in I_h) f(\theta) \mathrm{d}\theta= \int_\theta\sum _{k \in A_{I_h,\tilde{\theta}}} \mathrm{P}_\theta(k) f(\theta) \mathrm{d}\theta= \sum_{k \in A_{I_h,\tilde{\theta}}}\int_\theta \mathrm{P}_\theta(k) f(\theta) \mathrm{d}\theta. $$

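Let I be an alternative confidence interval for level 1−α with corresponding sets \(A_{I,\theta}:=\{k:\theta\in I(k)\}\) for all θ. Define

$$ B:=A_{I,\tilde{\theta}}\setminus A_{I_h,\tilde{\theta}}, \qquad C:=A_{I_h,\tilde{\theta}}\setminus A_{I,\tilde{\theta}}. $$

Since the test results in \(A_{I,\tilde{\theta}}\cap A_{I_h,\tilde{\theta}}\) contribute equally to both sides, we have to show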

$$ \sum_{k\in C} \int_\theta \mathrm{P}_\theta(k)f(\theta)\,\mathrm{d}\theta\leq\sum_{k \in B} \int_\theta\mathrm{P}_\theta (k)f(\theta)\,\mathrm{d}\theta. \tag{A.2} $$

By the definition of \(I_h\),

$$ \forall k \in B\colon\quad \mathrm{c}_{\tilde{\theta}}\mathrm{P}_{\tilde{\theta}}(k)\leq\int _{\theta}\mathrm{P}_{\theta}(k)f(\theta) \mathrm {d}\theta $$

and

$$ \forall k \in C\colon\quad \int_{\theta}\mathrm{P}_{\theta}(k)f( \theta)\,\mathrm{d}\theta\leq\mathrm{c}_{\tilde{\theta}}\mathrm{P}_{\tilde{\theta}}(k) $$

hold. Summing these inequalities over B and C, respectively, we obtain

$$ \mathrm{c}_{\tilde{\theta}} \sum_{k \in B} \mathrm{P}_{\tilde{\theta}}(k)\leq\sum_{k\in B}\int _{\theta}\mathrm{P}_{\theta}(k)f(\theta)\,\mathrm{d}\theta \tag{A.3} $$

and

$$ \sum_{k \in C}\int_{\theta} \mathrm{P}_{\theta}(k)f(\theta)\,\mathrm{d}\theta\leq\mathrm{c}_{\tilde{\theta}}\sum _{k \in C} \mathrm{P}_{\tilde{\theta}}(k). \tag{A.4} $$

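Since I is a confidence interval for level 1−α and \(I_h\) attains this level exactly, we have \(\sum_{k\in A_{I,\tilde{\theta}}}\mathrm{P}_{\tilde{\theta}}(k)\geq 1-\alpha=\sum_{k\in A_{I_h,\tilde{\theta}}}\mathrm{P}_{\tilde{\theta}}(k)\); removing the terms common to both sums gives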

$$\sum_{k\in C}\mathrm{P}_{\tilde{\theta}}(k)\leq\sum _{k\in B}\mathrm {P}_{\tilde{\theta}}(k). $$

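Multiplying the last inequality by \(\mathrm{c}_{\tilde{\theta}}\geq 0\) and combining it with (A.4) and (A.3) gives

$$ \sum_{k\in C}\int_\theta\mathrm{P}_\theta(k)f(\theta)\,\mathrm{d}\theta\leq\mathrm{c}_{\tilde{\theta}}\sum_{k\in C}\mathrm{P}_{\tilde{\theta}}(k)\leq\mathrm{c}_{\tilde{\theta}}\sum_{k\in B}\mathrm{P}_{\tilde{\theta}}(k)\leq\sum_{k\in B}\int_\theta\mathrm{P}_\theta(k)f(\theta)\,\mathrm{d}\theta, $$

which is (A.2). □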
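The inequalities used in this proof indicate a Neyman-Pearson-type structure: for a fixed \(\tilde{\theta}\), the acceptance region of \(I_h\) collects the test results k with the smallest ratio of the marginal probability \(\int_\theta\mathrm{P}_\theta(k)f(\theta)\,\mathrm{d}\theta\) to \(\mathrm{P}_{\tilde{\theta}}(k)\), with \(\mathrm{c}_{\tilde{\theta}}\) acting as the cutoff. A minimal sketch of that construction follows, under assumptions of our own: Rasch model, standard normal prior f, four items with illustrative difficulties, the marginal approximated by a Riemann sum on a grid, and no randomization, so the region reaches at least 1−α rather than exactly 1−α as assumed for \(I_h\) in the proof. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def rasch_prob(theta, beta):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))


def pattern_prob(theta, betas, pattern):
    """P_theta(k) for a 0/1 response pattern under the Rasch model."""
    prob = 1.0
    for b, x in zip(betas, pattern):
        p = rasch_prob(theta, b)
        prob *= p if x else 1 - p
    return prob


def all_patterns(n):
    """All 2**n response patterns as tuples of 0/1."""
    return [tuple((m >> i) & 1 for i in range(n)) for m in range(2 ** n)]


def marginal_prob(pattern, betas, grid_step=0.01, bound=8.0):
    """Riemann-sum approximation of the integral of P_theta(k) f(theta), f = N(0, 1) (assumed prior)."""
    f = NormalDist()
    n_steps = int(2 * bound / grid_step)
    total = 0.0
    for j in range(n_steps + 1):
        theta = -bound + j * grid_step
        total += pattern_prob(theta, betas, pattern) * f.pdf(theta) * grid_step
    return total


def acceptance_region(theta_tilde, betas, alpha=0.05):
    """Non-randomized region for fixed theta_tilde: add patterns in increasing
    order of marginal / P_theta_tilde until the coverage reaches 1 - alpha."""
    pats = all_patterns(len(betas))
    ratio = {k: marginal_prob(k, betas) / pattern_prob(theta_tilde, betas, k) for k in pats}
    region, coverage = [], 0.0
    for k in sorted(pats, key=ratio.get):
        if coverage >= 1 - alpha:
            break
        region.append(k)
        coverage += pattern_prob(theta_tilde, betas, k)
    return region, coverage


betas = [-1.0, -0.3, 0.4, 1.2]  # illustrative difficulties (assumption)
region, cov = acceptance_region(0.5, betas)
print(len(region), cov)
```

Under the Rasch model, patterns with the same total score have mathematically identical ratios, so ties within a score group are broken here by the order of `all_patterns` (Python's sort is stable). Repeating the construction over a grid of \(\tilde{\theta}\) values and collecting, for an observed k, all \(\tilde{\theta}\) whose region contains k yields the corresponding interval.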


About this article

Cite this article

Doebler, A., Doebler, P. & Holling, H. Optimal and Most Exact Confidence Intervals for Person Parameters in Item Response Theory Models. Psychometrika 78, 98–115 (2013). https://doi.org/10.1007/s11336-012-9290-4
