Abstract
In educational and psychological measurement, when short test forms are used, the asymptotic normality of the maximum likelihood estimator of the person parameter of item response models does not hold. As a result, hypothesis tests and confidence intervals for the person parameter based on the normal distribution are likely to be problematic. Inferences based on the exact distribution, on the other hand, do not suffer from this limitation. However, the computation required by the exact distribution approach is often prohibitively expensive. In this paper, we propose a general framework for constructing hypothesis tests and confidence intervals for item response theory (IRT) models within the exponential family based on the exact distribution. In addition, an efficient branch and bound algorithm for calculating the exact p value is introduced. The type-I error rate and statistical power of the proposed exact test, as well as the coverage rate and length of the associated confidence interval, are examined through a simulation. We also demonstrate its practical use by analyzing three real data sets.
Appendices
Appendix A
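Under the 2PL model, the probability of a correct response to the jth item from a subject is

$$P(X_j = 1 \mid \theta ) = \frac{\exp [a_j(\theta - b_j)]}{1 + \exp [a_j(\theta - b_j)]},$$

where \(a_j\) is the item discrimination parameter, \(b_j\) is the item difficulty parameter, and \(\theta \) is the ability parameter for the subject. It follows that the likelihood of \(\theta \) given a response pattern \(\varvec{X} = \varvec{x}\) is

$$L(\theta | \varvec{x}) = \prod _{j} \frac{\exp [a_j(\theta - b_j)x_j]}{1 + \exp [a_j(\theta - b_j)]} = \exp \Big [\theta \sum _{j} a_j x_j\Big ] \exp \Big [-\sum _{j} a_j b_j x_j\Big ] \prod _{j} \frac{1}{1 + \exp [a_j(\theta - b_j)]}. \tag{20}$$

Equation (20) is in the exponential form \(L(\theta | \varvec{x}) = \exp [\eta (\theta )T(\varvec{x})]h(\varvec{x})g(\theta )\), where

$$\eta (\theta ) = \theta , \qquad T(\varvec{x}) = \sum _{j} a_j x_j,$$

and

$$h(\varvec{x}) = \exp \Big [-\sum _{j} a_j b_j x_j\Big ], \qquad g(\theta ) = \prod _{j} \frac{1}{1 + \exp [a_j(\theta - b_j)]}.$$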
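The exponential-family form can be checked numerically. The sketch below is an illustration only: it assumes the common 2PL parameterization in which the log-odds of a correct response is \(a_j(\theta - b_j)\), so that \(\eta (\theta ) = \theta \) and \(T(\varvec{x}) = \sum _j a_j x_j\), and it uses made-up item parameters (`a`, `b`) that do not come from the paper. It compares the likelihood computed directly from the item response probabilities with the factorized form over every response pattern.

```python
import itertools
import math

# Hypothetical item parameters for illustration only (not from the paper).
a = [0.8, 1.0, 1.3, 1.7]    # discrimination parameters a_j
b = [-1.0, -0.2, 0.4, 1.1]  # difficulty parameters b_j


def p_correct(theta, a_j, b_j):
    """2PL probability of a correct response (assumed parameterization)."""
    return 1.0 / (1.0 + math.exp(-a_j * (theta - b_j)))


def likelihood(theta, x):
    """Likelihood of theta for pattern x, computed item by item."""
    out = 1.0
    for a_j, b_j, x_j in zip(a, b, x):
        p = p_correct(theta, a_j, b_j)
        out *= p if x_j == 1 else 1.0 - p
    return out


def T(x):
    """Sufficient statistic: discrimination-weighted sum score."""
    return sum(a_j * x_j for a_j, x_j in zip(a, x))


def h(x):
    """Factor depending on the response pattern only."""
    return math.exp(-sum(a_j * b_j * x_j for a_j, b_j, x_j in zip(a, b, x)))


def g(theta):
    """Factor depending on theta only."""
    out = 1.0
    for a_j, b_j in zip(a, b):
        out /= 1.0 + math.exp(a_j * (theta - b_j))
    return out


def factorized(theta, x):
    """exp[eta(theta) * T(x)] * h(x) * g(theta) with eta(theta) = theta."""
    return math.exp(theta * T(x)) * h(x) * g(theta)


patterns = list(itertools.product((0, 1), repeat=len(a)))
max_gap = max(
    abs(likelihood(theta, x) - factorized(theta, x))
    for theta in (-2.0, 0.0, 1.5)
    for x in patterns
)
print("largest discrepancy between the two forms:", max_gap)
```

Because the likelihood depends on the data only through `T(x)` and the \(\theta \)-free factor `h(x)`, any two patterns with the same weighted sum score carry the same information about \(\theta \).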
Appendix B
In the food security data example, we are interested in testing the one-sided hypothesis \(H_0{:}\; \theta \le 1.93\) against \(H_1{:}\; \theta > 1.93\). Using the exact test approach, \(H_0\) is rejected at the \(\alpha =0.05\) level for the following response patterns:
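To illustrate how a rejection list of this kind can be generated, the following brute-force sketch enumerates every response pattern and computes a one-sided exact p value at the boundary value \(\theta _0 = 1.93\). It is not the branch and bound algorithm of the paper, and it does not reproduce the food security results: the item parameters `a` and `b` are hypothetical, the 2PL parameterization with log-odds \(a_j(\theta - b_j)\) is assumed, and the test is the plain non-randomized version, which can be conservative because the distribution is discrete.

```python
import itertools
import math

# Hypothetical 2PL item parameters (NOT the food security estimates).
a = [0.9, 1.1, 1.4, 1.8, 2.0]  # discrimination parameters a_j
b = [0.5, 1.0, 1.5, 2.0, 2.5]  # difficulty parameters b_j
theta0 = 1.93                  # boundary of H0: theta <= theta0
alpha = 0.05                   # nominal significance level


def pattern_prob(theta, x):
    """Probability of response pattern x at ability theta under the 2PL."""
    prob = 1.0
    for a_j, b_j, x_j in zip(a, b, x):
        p = 1.0 / (1.0 + math.exp(-a_j * (theta - b_j)))
        prob *= p if x_j else 1.0 - p
    return prob


def suff_stat(x):
    """Sufficient statistic T(x) = sum_j a_j * x_j."""
    return sum(a_j * x_j for a_j, x_j in zip(a, x))


# Full enumeration: 2^J patterns, feasible only for short forms.
patterns = list(itertools.product((0, 1), repeat=len(a)))


def exact_p_value(x, tol=1e-12):
    """P(T(X) >= T(x)) at theta0; tol guards against floating-point ties."""
    t_obs = suff_stat(x)
    return sum(
        pattern_prob(theta0, y)
        for y in patterns
        if suff_stat(y) >= t_obs - tol
    )


# Patterns for which H0 is rejected, and the attained size of the test.
rejected = [x for x in patterns if exact_p_value(x) <= alpha]
size = sum(pattern_prob(theta0, x) for x in rejected)

for x in rejected:
    print(x, round(exact_p_value(x), 4))
print("attained size:", round(size, 4))
```

Large values of `T(x)` count as evidence against \(H_0\) because the likelihood ratio is monotone in the sufficient statistic. Enumeration costs grow as \(2^J\) with the number of items, which is the expense that motivates a more efficient search.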
Liu, X., Han, Z. & Johnson, M.S. The UMP Exact Test and the Confidence Interval for Person Parameters in IRT Models. Psychometrika 83, 182–202 (2018). https://doi.org/10.1007/s11336-017-9580-y