Optimal and Most Exact Confidence Intervals for Person Parameters in Item Response Theory Models

Doebler, Anna; Doebler, Philipp; Holling, Heinz

doi:10.1007/s11336-012-9290-4


Published: 02 October 2012

Psychometrika, Volume 78, pages 98–115 (2013)


Abstract

The common way to calculate confidence intervals for item response theory models is to assume that the standardized maximum likelihood estimator for the person parameter θ is normally distributed. However, this approximation is often inadequate for short and medium test lengths. As a result, the coverage probabilities fall below the nominal confidence level in many cases, and the corresponding intervals are therefore no longer confidence intervals in the strict sense of the definition. In the present work, confidence intervals are defined more precisely by utilizing the relationship between confidence intervals and hypothesis testing. Two approaches to confidence interval construction are explored that are optimal with respect to criteria of smallness and consistency with the standard approach.
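
To make the coverage problem concrete, the following is a minimal sketch and not the authors' procedure. It assumes a Rasch model with known item difficulties (the hypothetical list `betas`), uses the total score as the test result, and treats the extreme scores 0 and N, for which no finite maximum likelihood estimate exists, as yielding the whole real line. All function names are illustrative. The exact coverage probability of the normal-approximation interval at a given θ is obtained by summing the score probabilities over all scores whose interval contains θ.

```python
import math

def score_distribution(theta, betas):
    """P(R = r | theta) for r = 0..N under the (assumed) Rasch model."""
    dist = [1.0]
    for b in betas:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        nxt = [0.0] * (len(dist) + 1)
        for r, mass in enumerate(dist):
            nxt[r] += mass * (1.0 - p)      # item answered incorrectly
            nxt[r + 1] += mass * p          # item answered correctly
        dist = nxt
    return dist

def expected_score(theta, betas):
    return sum(1.0 / (1.0 + math.exp(-(theta - b))) for b in betas)

def information(theta, betas):
    """Fisher information of the Rasch model at theta."""
    total = 0.0
    for b in betas:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        total += p * (1.0 - p)
    return total

def mle(r, betas, lo=-30.0, hi=30.0):
    """Solve expected_score(theta) = r by bisection; requires 0 < r < N."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if expected_score(mid, betas) < r:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def wald_interval(r, betas, z=1.959964):
    """Normal-approximation interval; whole real line for extreme scores (assumption)."""
    if r == 0 or r == len(betas):
        return (-math.inf, math.inf)
    est = mle(r, betas)
    half = z / math.sqrt(information(est, betas))
    return (est - half, est + half)

def coverage(theta, betas, z=1.959964):
    """Exact probability that the interval for the observed score contains theta."""
    total = 0.0
    for r, mass in enumerate(score_distribution(theta, betas)):
        lo, hi = wald_interval(r, betas, z)
        if lo <= theta <= hi:
            total += mass
    return total

if __name__ == "__main__":
    demo_betas = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]   # hypothetical short test
    for theta in (-1.5, 0.0, 0.7, 1.5):
        print(theta, round(coverage(theta, demo_betas), 4))
```

Evaluating `coverage` on a fine grid of θ values shows where, for a given test, the actual coverage departs from the nominal 95%. Treating the extreme scores as the whole real line is the most generous convention for coverage; other conventions would be needed in practice.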



Author information

Authors and Affiliations

Fachbereich Psychologie und Sportwissenschaft (FB 7), Institut für Psychologie, Westfälische Wilhelms-Universität, Fliednerstr. 21, 48149, Münster, Germany
Anna Doebler, Philipp Doebler & Heinz Holling


Corresponding author

Correspondence to Anna Doebler.

Additional information

This work was supported by a grant from the Studienstiftung des Deutschen Volkes.

Appendix: Mathematical Proofs

The following proves Proposition 3.1.

Proof

Note that the prior g is derived by a simple application of integration by substitution. To show that τ⁻¹I is WAMA, we first observe that for any confidence interval J and any $\tilde{\rho}$ with $\tilde{\theta}= \tau(\tilde{\rho})$,

$$\int_{\varTheta}\mathrm{P}_\theta(\tilde{\theta}\in J) f( \theta) \mathrm{d}\theta= \int_{U}\mathrm{P}_\rho \bigl(\tilde{\rho}\in\tau^{-1}J\bigr) g(\rho) \mathrm{d}\rho. $$

If J is now any confidence interval on U, then

$$\int_{U}\mathrm{P}_\rho\bigl(\tilde{\rho}\in \tau^{-1}I\bigr) g(\rho) \mathrm{d}\rho\leq\int_{U} \mathrm{P}_\rho(\tilde{\rho}\in J) g(\rho) \mathrm{d}\rho; $$

otherwise J would induce, with obvious notation, an interval τJ on ℝ with

$$\int_{\varTheta}\mathrm{P}_\theta(\tilde{\theta}\in I) f( \theta) \mathrm{d}\theta> \int_{\varTheta}\mathrm{P}_\theta( \tilde{\theta}\in\tau J) f(\theta) \mathrm{d}\theta, $$

in contradiction to the fact that I is WAMA on Θ. □
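
As a sketch of the substitution step mentioned at the start of this proof, assume (this is not spelled out above) that τ maps U onto Θ as a continuously differentiable bijection. Under that assumption the weight function on U would be

$$ g(\rho)=f\bigl(\tau(\rho)\bigr)\,\bigl|\tau'(\rho)\bigr|, $$

so that for any integrable function h on Θ

$$ \int_{\varTheta}h(\theta)f(\theta)\,\mathrm{d}\theta=\int_{U}h\bigl(\tau(\rho)\bigr)g(\rho)\,\mathrm{d}\rho. $$

Taking $h(\theta)=\mathrm{P}_\theta(\tilde{\theta}\in J)$, writing $\mathrm{P}_\rho$ for $\mathrm{P}_{\tau(\rho)}$, and noting that $\tilde{\theta}=\tau(\tilde{\rho})\in J$ holds exactly when $\tilde{\rho}\in\tau^{-1}J$ then gives the first displayed identity of the proof.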

The following proves Proposition 4.4.

Proof

Let k,k′ be test results with the same total score R, and let $\beta_{1},\ldots,\beta_{N}$ be an enumeration of all item difficulty parameters of the test and, without loss of generality, let R≤N be such that $\beta_{1},\ldots,\beta_{R}$ are the item difficulty parameters of the items which have to be correctly solved in order to obtain k. Let $\beta_{1}', \ldots,\beta_{R}'$ denote the parameters of the items which have to be solved to obtain k′. For r=1,…,R let $x_{r}:=\beta_{r}-\beta_{r}'$. Furthermore, let $Q(\theta):=\prod_{i=1}^{N}(1+\exp(\theta-\beta_{i}))$. Then

□

The following proves Proposition 4.3.

Proof

It suffices to show that for every possible test result k

$$ I_e(k)=I_n(k) $$

(A.1)

holds. For a given k and arbitrary θ we have

here the second equivalence holds because of the normal assumption. Therefore (A.1) is shown. □

The following proves Theorem 4.1.

Proof

Let $\tilde{\theta}\in\varTheta$ be arbitrary. We need to show

$$\int_\theta\mathrm{P}_\theta(\tilde{\theta}\in I_h) f(\theta) \mathrm{d}\theta= \min_{ \{I: I \mathrm{is\ an}\ (1- \alpha)\hbox{-}\mathrm{confidence}\ \mathrm{interval} \}}\int _\theta\mathrm {P}_\theta(\tilde{\theta}\in I) f(\theta) \mathrm{d}\theta. $$

Rearranging terms yields

$$\int_\theta\mathrm{P}_\theta(\tilde{\theta}\in I_h) f(\theta) \mathrm{d}\theta= \int_\theta\sum _{k \in A_{I_h,\tilde{\theta}}} \mathrm{P}_\theta(k) f(\theta) \mathrm{d}\theta= \sum_{k \in A_{I_h,\tilde{\theta}}}\int_\theta \mathrm{P}_\theta(k) f(\theta) \mathrm{d}\theta. $$

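Let I be an alternative confidence interval for level 1−α with corresponding sets $A_{I,\theta}:=\{k:\theta\in I(k)\}$ for all θ. Define

$$ B:=A_{I,\tilde{\theta}}\setminus A_{I_h,\tilde{\theta}} \quad\mbox{and}\quad C:=A_{I_h,\tilde{\theta}}\setminus A_{I,\tilde{\theta}}. $$

Since the test results common to both sets contribute equally to both sides, we have to show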

$$ \sum_{k\in C} \int_\theta \mathrm{P}_\theta(k)f(\theta)\mathrm {d}\theta\leq\sum _{k \in B} \int_\theta\mathrm{P}_\theta (k)f(\theta)\mathrm{d}\theta. $$

(A.2)

Because of the definition of $I_h$,

$$ \forall k \in B\colon\quad \mathrm{c}_{\tilde{\theta}}\mathrm{P}_{\tilde{\theta}}(k)\leq\int _{\theta}\mathrm{P}_{\theta}(k)f(\theta) \mathrm {d}\theta $$

and

$$ \forall k \in C\colon\quad \int_{\theta}\mathrm{P}_{\theta}(k)f( \theta) \mathrm{d}\theta\leq\mathrm{c}_{\tilde{\theta}}\mathrm{P}_{\tilde{\theta}}(k) $$

hold. These inequalities can be summed over B and C, respectively. Therefore we obtain

$$ \mathrm{c}_{\tilde{\theta}} \sum_{k \in B} \mathrm{P}_{\tilde{\theta}}(k)\leq\sum_{k\in B}\int _{\theta}\mathrm{P}_{\theta}(k)f(\theta ) \mathrm{d}\theta $$

(A.3)

and

$$ \sum_{k \in C}\int_{\theta} \mathrm{P}_{\theta}(k)f(\theta) \mathrm{d}\theta\leq\mathrm{c}_{\tilde{\theta}}\sum _{k \in C} \mathrm{P}_{\tilde{\theta}}(k). $$

(A.4)

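Since I is a confidence interval for the level 1−α and $I_h$ meets the level exactly, it is known that

$$\sum_{k\in C}\mathrm{P}_{\tilde{\theta}}(k)\leq\sum _{k\in B}\mathrm {P}_{\tilde{\theta}}(k). $$

Multiplying this inequality by $\mathrm{c}_{\tilde{\theta}}\geq0$ and chaining it with (A.4) and (A.3) gives

$$ \sum_{k\in C}\int_{\theta}\mathrm{P}_{\theta}(k)f(\theta)\mathrm{d}\theta\leq\mathrm{c}_{\tilde{\theta}}\sum_{k\in C}\mathrm{P}_{\tilde{\theta}}(k)\leq\mathrm{c}_{\tilde{\theta}}\sum_{k\in B}\mathrm{P}_{\tilde{\theta}}(k)\leq\sum_{k\in B}\int_{\theta}\mathrm{P}_{\theta}(k)f(\theta)\mathrm{d}\theta, $$

which is (A.2). □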
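
The inequalities for B and C suggest that, for a fixed $\tilde{\theta}$, the test results k for which $\tilde{\theta}\in I_h(k)$ are those with a small ratio of $\int_\theta\mathrm{P}_\theta(k)f(\theta)\mathrm{d}\theta$ to $\mathrm{P}_{\tilde{\theta}}(k)$. The definition of $I_h$ is not reproduced in this appendix, so the following is only a sketch under that reading. It further assumes a Rasch model with hypothetical item difficulties `betas`, total scores as test results, a standard normal weight f approximated on an equally spaced grid, and no randomization, so the level 1−α is attained at least, not exactly. All names are illustrative.

```python
import math

def score_probs(theta, betas):
    """P_theta(k) for total scores k = 0..N under a Rasch model (assumed here)."""
    dist = [1.0]
    for b in betas:
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        dist = nxt
    return dist

def weighted_average_probs(betas, lo=-6.0, hi=6.0, steps=241):
    """Grid approximation of m(k) = integral of P_theta(k) f(theta) d(theta),
    with f the standard normal density (an assumed weight function)."""
    width = (hi - lo) / (steps - 1)
    m = [0.0] * (len(betas) + 1)
    total_weight = 0.0
    for j in range(steps):
        theta = lo + j * width
        w = math.exp(-0.5 * theta * theta)
        total_weight += w
        for k, p in enumerate(score_probs(theta, betas)):
            m[k] += w * p
    return [value / total_weight for value in m]   # normalized grid weights

def acceptance_set(theta0, betas, m, alpha=0.05):
    """Keep the scores with the smallest ratio m(k) / P_theta0(k) until their
    probability under theta0 reaches 1 - alpha (non-randomized sketch)."""
    p0 = score_probs(theta0, betas)
    ranked = sorted(range(len(p0)), key=lambda k: m[k] / p0[k])
    kept, mass = [], 0.0
    for k in ranked:
        if mass >= 1.0 - alpha:
            break
        kept.append(k)
        mass += p0[k]
    return sorted(kept), mass

if __name__ == "__main__":
    demo_betas = [-1.0, -0.5, 0.0, 0.5, 1.0]        # hypothetical five-item test
    m = weighted_average_probs(demo_betas)
    print(acceptance_set(0.0, demo_betas, m))
```

In this sketch the interval for an observed score k would consist of all values `theta0` whose acceptance set contains k. Ranking by the ratio mirrors the role of the constant $\mathrm{c}_{\tilde{\theta}}$ in the proof: every kept score has a ratio no larger than every discarded one.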

About this article

Cite this article

Doebler, A., Doebler, P. & Holling, H. Optimal and Most Exact Confidence Intervals for Person Parameters in Item Response Theory Models. Psychometrika 78, 98–115 (2013). https://doi.org/10.1007/s11336-012-9290-4


Received: 26 September 2010
Revised: 07 March 2012
Published: 02 October 2012
Issue Date: January 2013
DOI: https://doi.org/10.1007/s11336-012-9290-4
