Abstract
Confidence intervals for the person parameter θ in item response theory models are commonly calculated by assuming that the standardized maximum likelihood estimator of θ is normally distributed. This approximation is often inadequate for short and medium test lengths. As a result, the coverage probabilities fall below the nominal confidence level in many cases, so the resulting intervals are no longer confidence intervals in the strict sense of the definition. In the present work, confidence intervals are defined more precisely by utilizing the relationship between confidence intervals and hypothesis testing. Two approaches to confidence interval construction are explored that are optimal with respect to criteria of smallness and consistency with the standard approach.
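The coverage issue described in the abstract can be checked numerically. The following is a minimal sketch, not the authors' procedure: it assumes a Rasch model with known item difficulties, and all function names are hypothetical. It computes the exact coverage probability of the standard normal-approximation (Wald) interval at a given θ by enumerating the distribution of the total score. Extreme scores (0 or N) have no finite maximum likelihood estimate; how they are counted is an explicit assumption controlled by `extreme_covers`.

```python
import math
from statistics import NormalDist


def rasch_probs(theta, betas):
    """Probability of a correct response to each item under the Rasch model."""
    return [1.0 / (1.0 + math.exp(-(theta - b))) for b in betas]


def score_distribution(theta, betas):
    """Exact distribution of the total score R (a Poisson-binomial distribution)."""
    dist = [1.0]
    for p in rasch_probs(theta, betas):
        new = [0.0] * (len(dist) + 1)
        for r, w in enumerate(dist):
            new[r] += w * (1.0 - p)
            new[r + 1] += w * p
        dist = new
    return dist


def mle(score, betas, lo=-30.0, hi=30.0):
    """Solve sum_i p_i(theta) = score by bisection; valid for 0 < score < N.

    The search bounds are an assumption suited to moderate item difficulties.
    """
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if sum(rasch_probs(mid, betas)) < score:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wald_interval(score, betas, alpha=0.05):
    """Normal-approximation interval: theta_hat +/- z / sqrt(test information)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    theta_hat = mle(score, betas)
    info = sum(p * (1.0 - p) for p in rasch_probs(theta_hat, betas))
    half = z / math.sqrt(info)
    return theta_hat - half, theta_hat + half


def wald_coverage(theta, betas, alpha=0.05, extreme_covers=True):
    """Exact probability that the Wald interval contains the true theta.

    Assumption: for scores 0 and N the interval is taken to be the whole
    real line if extreme_covers is True, and is counted as a miss otherwise.
    """
    dist = score_distribution(theta, betas)
    n = len(betas)
    cover = (dist[0] + dist[n]) if extreme_covers else 0.0
    for r in range(1, n):
        lo, hi = wald_interval(r, betas, alpha)
        if lo <= theta <= hi:
            cover += dist[r]
    return cover
```

Evaluating `wald_coverage(theta, betas)` over a grid of θ values for a short test, for example `betas = [0.0] * 10`, and comparing each result with 1 − α shows where the approximation holds the nominal level and where it does not. The result depends noticeably on how extreme scores are treated.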
Additional information
This work was supported by a grant from the Studienstiftung des Deutschen Volkes.
Appendix: Mathematical Proofs
The following proves Proposition 3.1.
Proof
Note that the prior g is derived by a simple application of integration by substitution. To show that \(\tau^{-1}I\) is WAMA, we first observe that for any confidence interval J and any \(\tilde{\rho}\) with \(\tilde{\theta}= \tau(\tilde{\rho})\),
If J is now any confidence interval on U, then
otherwise J would induce, with obvious notation, an interval τJ on ℝ with
in contradiction to the fact that I is WAMA on Θ. □
The following proves Proposition 4.4.
Proof
Let \(k, k'\) be test results with the same total score R, and let \(\beta_{1},\ldots,\beta_{N}\) be an enumeration of all item difficulty parameters of the test. Without loss of generality, let \(R\le N\) be such that \(\beta_{1},\ldots,\beta_{R}\) are the item difficulty parameters of the items which have to be correctly solved in order to obtain \(k\). Let \(\beta_{1}', \ldots,\beta_{R}'\) denote the parameters of the items which have to be solved to obtain \(k'\). For \(r=1,\ldots,R\) let \(x_{r}:=\beta_{r}-\beta_{r}'\). Furthermore, let \(Q(\theta):=\prod_{i=1}^{N}(1+\exp(\theta-\beta_{i}))\). Then
□
The following proves Proposition 4.3.
Proof
It suffices to show that for every possible test result k
holds. For a given k and arbitrary θ we have
Here the second equivalence holds because of the normality assumption. Therefore (A.1) is shown. □
The following proves Theorem 4.1.
Proof
Let \(\tilde{\theta}\in\varTheta\) be arbitrary. We need to show
Rearranging terms yields
Let I be an alternative confidence interval for level 1−α with corresponding sets \(A_{I,\theta}:=\{k:\theta\in I(k)\}\) for all θ. Define
We have to show
Because of the definition of \(I_{h}\),
and
hold. These equations can be summed up over B and C, respectively. Therefore we obtain
and
Since I is a confidence interval for the level 1−α and \(I_{h}\) meets the level exactly, it is known that
Cite this article
Doebler, A., Doebler, P. & Holling, H. Optimal and Most Exact Confidence Intervals for Person Parameters in Item Response Theory Models. Psychometrika 78, 98–115 (2013). https://doi.org/10.1007/s11336-012-9290-4
DOI: https://doi.org/10.1007/s11336-012-9290-4