The UMP Exact Test and the Confidence Interval for Person Parameters in IRT Models

Psychometrika

Abstract

When short test forms are used in educational and psychological measurement, the asymptotic normality of the maximum likelihood estimator of the person parameter in item response models does not hold. As a result, hypothesis tests and confidence intervals for the person parameter that rely on the normal distribution are likely to be problematic. Inferences based on the exact distribution do not suffer from this limitation, but the computation they require is often prohibitively expensive. In this paper, we propose a general framework for constructing hypothesis tests and confidence intervals, based on the exact distribution, for IRT models within the exponential family. In addition, we introduce an efficient branch and bound algorithm for calculating the exact p value. The type-I error rate and statistical power of the proposed exact test, as well as the coverage rate and length of the associated confidence interval, are examined through a simulation. We also demonstrate the practical use of the method by analyzing three real data sets.

References

  • Agresti, A. (2003). Dealing with discreteness: Making ‘exact’ confidence intervals for proportions, differences of proportions, and odds ratios more exact. Statistical Methods in Medical Research, 12(1), 3–21.

  • Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). Boca Raton: CRC Press.

  • Biehler, M., Holling, H., & Doebler, P. (2014). Saddlepoint approximations of the distribution of the person parameter in the two parameter logistic model. Psychometrika, 80(3), 665–688. doi:10.1007/s11336-014-9405-1.

  • Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35(2), 179–197. doi:10.1007/BF02291262.

  • Brent, R. P. (1973). Algorithms for minimization without derivatives. Englewood Cliffs, NJ: Prentice-Hall.

  • Casella, G., & Berger, R. (2001). Statistical inference. Pacific Grove, CA: Duxbury.

  • Doebler, A., Doebler, P., & Holling, H. (2012). Optimal and most exact confidence intervals for person parameters in item response theory models. Psychometrika, 78(1), 98–115. doi:10.1007/s11336-012-9290-4.

  • Fisher, R. A. (1935). The design of experiments. Edinburgh: Oliver and Boyd.

  • Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). Boca Raton: CRC.

  • Hagell, P., & Westergren, A. (2011). Measurement properties of the SF-12 health survey in Parkinson’s disease. Journal of Parkinson’s Disease, 1, 185–196. doi:10.3233/JPD-2011-11026.

  • Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff Publishing. doi:10.1007/978-94-017-1988-9.

  • Johnson, M. S. (2004). Item response models and their use in measuring food insecurity and hunger. Paper presented at the workshop on the measurement of food insecurity and hunger, National Academy of Sciences panel to review USDA’s measurement of food insecurity and hunger.

  • Klauer, K. C. (1991). Exact and best confidence intervals for the ability parameter of the Rasch model. Psychometrika, 56(3), 535–547. doi:10.1007/BF02294489.

  • Land, A. H., & Doig, A. G. (1960). An automatic method of solving discrete programming problems. Econometrica, 28(3), 497–520. doi:10.2307/1910129.

  • Cormen, T. H., Leiserson, C. E., Rivest, R. L., & Stein, C. (2009). Introduction to algorithms (3rd ed.). Cambridge: The MIT Press.

  • Liou, M., & Chang, C.-H. (1992). Constructing the exact significance level for a person fit statistic. Psychometrika, 57(2), 169–181. doi:10.1007/BF02294503.

  • Little, J. D. C., Murty, K. G., Sweeney, D. W., & Karel, C. (1963). An algorithm for the traveling salesman problem. Operations Research, 11(6), 972–989.

  • Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum Associates.

  • Lord, F. M. (1983). Unbiased estimators of ability parameters, of their variance, and of their parallel-forms reliability. Psychometrika, 48(2), 233–245. doi:10.1007/BF02294018.

  • Lugannani, R., & Rice, S. (1980). Saddle point approximation for the distribution of the sum of independent random variables. Advances in Applied Probability, 12(2), 475–490. doi:10.2307/1426607.

  • Mair, P., & Hatzinger, R. (2007). CML based estimation of extended Rasch models with the eRm package in R. Psychology Science, 49(1), 26–43.

  • Molenaar, I. W., & Hoijtink, H. (1990). The many null distributions of person fit indices. Psychometrika, 55(1), 75–106. doi:10.1007/BF02294745.

  • Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25.

  • Thissen, D. (2016). Bad questions: An essay involving item response theory. Journal of Educational and Behavioral Statistics, 41(1), 81–89.

  • Ware, J., Kosinski, M., & Keller, S. D. (1996). A 12-item short-form health survey: Construction of scales and preliminary tests of reliability and validity. Medical Care, 34(3), 220–233.

  • Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450. doi:10.1007/BF02294627.

  • Wasserman, L. (2004). All of statistics. New York, NY: Springer. doi:10.1007/978-0-387-21736-9.

Author information

Corresponding author

Correspondence to Xiang Liu.

Appendices

Appendix A

Under the 2PL model, the probability that a subject responds correctly to the \(j\)th item is

$$\begin{aligned} P_j(X_j=1 | a_j,b_j,\theta ) = \frac{\exp [a_j(\theta -b_j)]}{1+\exp [a_j(\theta -b_j)]}, \end{aligned}$$
(13)

where \(a_j\) is the item discrimination parameter, \(b_j\) is the item difficulty parameter, and \(\theta \) is the ability parameter for the subject. It follows that the likelihood of \(\theta \) given a response pattern \(\varvec{X} = \varvec{x}\) is

$$\begin{aligned} L\left( \theta |\varvec{x},\varvec{a},\varvec{b}\right)&= \prod _{j=1}^{J} P_j\left( X_j=1 | a_j,b_j,\theta \right) ^{x_j} P_j\left( X_j=0 | a_j,b_j,\theta \right) ^{1-x_j} \end{aligned}$$
(14)
$$\begin{aligned}&= \prod _{j=1}^{J} \left\{ \frac{\exp \left[ a_j\left( \theta -b_j\right) \right] }{1+\exp \left[ a_j\left( \theta -b_j\right) \right] }\right\} ^{x_j} \left\{ \frac{1}{1+\exp \left[ a_j\left( \theta -b_j\right) \right] }\right\} ^{1-x_j} \end{aligned}$$
(15)
$$\begin{aligned}&= \prod _{j=1}^{J} \frac{\left\{ \exp \left[ a_j\left( \theta -b_j\right) \right] \right\} ^{x_j}}{1+\exp \left[ a_j\left( \theta -b_j\right) \right] } \left\{ \frac{1}{1+\exp \left[ a_j\left( \theta -b_j\right) \right] }\right\} ^{x_j-x_j} \end{aligned}$$
(16)
$$\begin{aligned}&= \prod _{j=1}^{J} \frac{\left\{ \exp \left[ a_j\left( \theta -b_j\right) \right] \right\} ^{x_j}}{1+\exp \left[ a_j\left( \theta -b_j\right) \right] } \end{aligned}$$
(17)
$$\begin{aligned}&= \frac{\exp \left[ \sum _{j=1}^{J}x_j a_j\left( \theta -b_j\right) \right] }{\prod _{j=1}^{J} \left\{ 1+\exp \left[ a_j\left( \theta -b_j\right) \right] \right\} } \end{aligned}$$
(18)
$$\begin{aligned}&= \frac{\exp \left[ \theta \sum _{j=1}^{J}a_j x_j\right] }{\exp \left[ \sum _{j=1}^{J}a_j x_j b_j\right] \prod _{j=1}^{J} \left\{ 1+\exp \left[ a_j\left( \theta -b_j\right) \right] \right\} } \end{aligned}$$
(19)
$$\begin{aligned}&= \exp \left[ \theta \sum _{j=1}^{J}a_j x_j\right] \left\{ \exp \left[ \sum _{j=1}^{J}a_j x_j b_j\right] \right\} ^{-1} \left\{ \prod _{j=1}^{J} \left\{ 1+\exp \left[ a_j\left( \theta -b_j\right) \right] \right\} \right\} ^{-1}. \end{aligned}$$
(20)

Equation (20) has the exponential family form \(L(\theta | \varvec{x}) = \exp [\eta (\theta )T(\varvec{x})]h(\varvec{x})g(\theta )\), with natural parameter \(\eta (\theta ) = \theta \) and sufficient statistic \(T(\varvec{x}) = \sum _{j=1}^{J}a_j x_j\), where

$$\begin{aligned} \exp \left[ \eta \left( \theta \right) T\left( \varvec{x}\right) \right]&= \exp \left( \theta \sum _{j=1}^{J}a_j x_j\right) , \end{aligned}$$
(21)
$$\begin{aligned} h\left( \varvec{x}\right)&= \left\{ \exp \left( \sum _{j=1}^{J}a_j x_j b_j\right) \right\} ^{-1}, \end{aligned}$$
(22)

and

$$\begin{aligned} g(\theta ) = \prod _{j=1}^{J}\{1+\exp [a_j(\theta -b_j)]\}^{-1}. \end{aligned}$$
(23)
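
As a quick numerical check of the derivation in this appendix, the following Python sketch compares the product form of the likelihood in Eq. (14) with the factored form in Eq. (20) over every response pattern of a short test. The item parameters and the ability value are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
import itertools
import math


def likelihood_direct(theta, x, a, b):
    """Product form of the 2PL likelihood, Eq. (14)."""
    value = 1.0
    for x_j, a_j, b_j in zip(x, a, b):
        p_j = 1.0 / (1.0 + math.exp(-a_j * (theta - b_j)))  # Eq. (13)
        value *= p_j if x_j == 1 else 1.0 - p_j
    return value


def likelihood_factored(theta, x, a, b):
    """Factored form exp[theta * T(x)] * h(x) * g(theta), Eq. (20)."""
    t_x = sum(a_j * x_j for a_j, x_j in zip(a, x))  # T(x)
    h_x = math.exp(-sum(a_j * x_j * b_j for a_j, x_j, b_j in zip(a, x, b)))
    g_theta = 1.0
    for a_j, b_j in zip(a, b):
        g_theta /= 1.0 + math.exp(a_j * (theta - b_j))
    return math.exp(theta * t_x) * h_x * g_theta


# Hypothetical item parameters and ability, for illustration only.
a = [0.8, 1.2, 1.5, 1.0]
b = [-1.0, 0.0, 0.5, 1.2]
theta = 0.7

patterns = list(itertools.product((0, 1), repeat=len(a)))

# Largest absolute difference between the two forms over all 2^J patterns.
max_gap = max(
    abs(likelihood_direct(theta, x, a, b) - likelihood_factored(theta, x, a, b))
    for x in patterns
)
# The pattern probabilities should sum to one.
total = sum(likelihood_factored(theta, x, a, b) for x in patterns)
print(max_gap, total)
```

The two forms should agree up to floating-point error, and the probabilities of all \(2^J\) patterns should sum to one. The data enter the \(\theta \)-dependent part of the likelihood only through the weighted score \(\sum _{j}a_j x_j\).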

Appendix B

In the food security data example, we are interested in testing the one-sided hypothesis \(H_0{:}\; \theta \le 1.93\) against \(H_1{:}\; \theta > 1.93\). Using the exact test approach, the null hypothesis is rejected at the \(\alpha =0.05\) level for the following response patterns:

Table 2. The 18 patterns that are rejected under the exact test.
Table 3. The 10 patterns that are rejected under the asymptotic approach.
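
To illustrate how a rejection decision of this kind can be obtained for a single response pattern, the sketch below computes a one-sided exact p value by plain enumeration of all \(2^J\) patterns. It assumes a 2PL model with known item parameters and uses the weighted score \(\sum _{j}a_j x_j\) as the test statistic, evaluated at the boundary value \(\theta _0 = 1.93\). The item parameters are hypothetical, not the food security estimates. The test is non-randomized, and the enumeration is a brute-force stand-in for the branch and bound algorithm, so it is practical only for short tests.

```python
import itertools
import math


def pattern_prob(theta, x, a, b):
    """Probability of response pattern x under the 2PL model."""
    prob = 1.0
    for x_j, a_j, b_j in zip(x, a, b):
        p_j = 1.0 / (1.0 + math.exp(-a_j * (theta - b_j)))
        prob *= p_j if x_j == 1 else 1.0 - p_j
    return prob


def exact_p_value(x_obs, theta0, a, b, tol=1e-12):
    """P(T(X) >= T(x_obs)) at theta0, with T(x) = sum_j a_j * x_j.

    Brute-force enumeration over all 2^J patterns (an assumption of this
    sketch, not the branch and bound algorithm).
    """
    t_obs = sum(a_j * x_j for a_j, x_j in zip(a, x_obs))
    p_value = 0.0
    for x in itertools.product((0, 1), repeat=len(a)):
        t_x = sum(a_j * x_j for a_j, x_j in zip(a, x))
        if t_x >= t_obs - tol:  # tolerance guards against rounding in ties
            p_value += pattern_prob(theta0, x, a, b)
    return p_value


# Hypothetical item parameters, for illustration only.
a = [0.9, 1.1, 1.4, 1.0, 1.3]
b = [0.5, 1.0, 1.5, 2.0, 2.5]
theta0 = 1.93  # boundary of the null hypothesis in this appendix
alpha = 0.05

p_all = exact_p_value((1, 1, 1, 1, 1), theta0, a, b)
p_mid = exact_p_value((1, 1, 1, 0, 1), theta0, a, b)
p_none = exact_p_value((0, 0, 0, 0, 0), theta0, a, b)

for label, p in [("all correct", p_all), ("one miss", p_mid), ("none correct", p_none)]:
    print(label, round(p, 4), "reject" if p <= alpha else "retain")
```

A pattern is rejected when its p value is at most \(\alpha \). Because every pattern has a weighted score at least as large as that of the all-zero pattern, the p value for the all-zero pattern equals one.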

Cite this article

Liu, X., Han, Z. & Johnson, M.S. The UMP Exact Test and the Confidence Interval for Person Parameters in IRT Models. Psychometrika 83, 182–202 (2018). https://doi.org/10.1007/s11336-017-9580-y

  • DOI: https://doi.org/10.1007/s11336-017-9580-y