Volume 78, Issue 1, pp 98–115

Optimal and Most Exact Confidence Intervals for Person Parameters in Item Response Theory Models



The common way to calculate confidence intervals for item response theory models is to assume that the standardized maximum likelihood estimator for the person parameter θ is normally distributed. However, this approximation is often inadequate for short and medium test lengths. As a result, the coverage probabilities fall below the nominal confidence level in many cases, and the corresponding intervals are therefore no longer confidence intervals in the strict sense of the definition. In the present work, confidence intervals are defined more precisely by utilizing the relationship between confidence intervals and hypothesis testing. Two approaches to confidence interval construction are explored; they are optimal with respect to criteria of smallness and of consistency with the standard approach.
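The coverage shortfall described in the abstract can be illustrated with a small numerical sketch. It is not the construction proposed in the article. It assumes, purely for illustration, a Rasch model in which all items have difficulty 0, so that the raw score is binomial and the maximum likelihood estimate of θ is the logit of the proportion correct. The function names (`wald_interval`, `inverted_test_interval`, `coverage`) are hypothetical. For the extreme scores 0 and n, where no finite estimate exists, the normal-approximation interval is taken to be the whole real line, a convention that can only help its coverage. The second interval inverts two one-sided tests on the raw score (the standard conservative equal-tailed construction, not the optimal intervals the article studies) to show the link between intervals and tests.

```python
import math
from statistics import NormalDist


def score_pmf(score, theta, n_items):
    """P(raw score = score | theta), assuming all Rasch items have difficulty 0."""
    p = 1 / (1 + math.exp(-theta))
    return math.comb(n_items, score) * p**score * (1 - p) ** (n_items - score)


def wald_interval(score, n_items, level=0.95):
    """Normal-approximation interval: theta_hat +/- z / sqrt(information)."""
    if score in (0, n_items):
        # No finite MLE; assumed convention: the whole real line.
        return (-math.inf, math.inf)
    p_hat = score / n_items
    theta_hat = math.log(p_hat / (1 - p_hat))
    se = 1 / math.sqrt(n_items * p_hat * (1 - p_hat))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return (theta_hat - z * se, theta_hat + z * se)


def _root_of_increasing(g, lo=-30.0, hi=30.0, iters=100):
    """Bisection for the zero of an increasing function g on [lo, hi]."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def inverted_test_interval(score, n_items, level=0.95):
    """Interval obtained by inverting two one-sided level-alpha/2 tests on the score."""
    half_alpha = (1 - level) / 2

    def upper_tail(theta):  # P(score or more correct), increasing in theta
        return sum(score_pmf(s, theta, n_items) for s in range(score, n_items + 1))

    def lower_tail(theta):  # P(score or fewer correct), decreasing in theta
        return sum(score_pmf(s, theta, n_items) for s in range(0, score + 1))

    lower = -math.inf
    if score > 0:
        lower = _root_of_increasing(lambda t: upper_tail(t) - half_alpha)
    upper = math.inf
    if score < n_items:
        upper = _root_of_increasing(lambda t: half_alpha - lower_tail(t))
    return (lower, upper)


def coverage(theta, n_items, interval_fn, level=0.95):
    """Exact coverage: total probability of the scores whose interval contains theta."""
    total = 0.0
    for score in range(n_items + 1):
        lo, hi = interval_fn(score, n_items, level)
        if lo <= theta <= hi:
            total += score_pmf(score, theta, n_items)
    return total


if __name__ == "__main__":
    for name, fn in [("wald", wald_interval), ("inverted test", inverted_test_interval)]:
        print(name, coverage(2.25, 10, fn))
```

Under these assumptions, with 10 items and θ = 2.25, the normal-approximation interval covers θ with probability below 0.95, while the test-inversion interval stays at or above 0.95 at the price of being wider. Other conventions for the extreme scores, other item difficulties, or other models would change the numbers.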

Key words

confidence intervals, optimality, item response theory, monotone likelihood ratio, adaptive testing



Copyright information

© The Psychometric Society 2012

Authors and Affiliations

  1. Fachbereich Psychologie und Sportwissenschaft (FB 7), Institut für Psychologie, Westfälische Wilhelms-Universität, Münster, Germany
