Second-Order Probability Matching Priors for the Person Parameter in Unidimensional IRT Models

Abstract

In applications of item response theory (IRT), it is often of interest to compute confidence intervals (CIs) for person parameters with prescribed frequentist coverage. The ubiquitous use of short tests in social science research and practice calls for a refinement of standard interval estimation procedures based on asymptotic normality, such as the Wald and Bayesian CIs, which maintain the desired coverage only when the test is sufficiently long. In the current paper, we propose a simple construction of second-order probability matching priors (PMPs) for the person parameter in unidimensional IRT models, which in turn yields CIs with accurate coverage even when the test consists of only a few items. The probability matching property is established based on an expansion of the posterior distribution function and a shrinkage argument. CIs based on the proposed prior can be computed efficiently for a variety of unidimensional IRT models. A real data example with a mixed-format test and a simulation study are presented to compare the proposed method against several existing asymptotic CIs.
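The Wald CI used as a baseline above can be made concrete for the two-parameter logistic (2PL) model with known item parameters. The following is a minimal sketch, not the proposed PMP-based interval: the item parameters are hypothetical, and the helper names (`p_correct`, `information`, `score`, `ml_estimate`, `wald_ci`) are illustrative. The ML estimate is found by bisection on the score function, a choice made here for robustness because the 2PL score is strictly decreasing in θ. The sketch also shows one way short tests are delicate: with positive discriminations, an all-correct or all-incorrect pattern has no finite ML estimate, so no Wald interval exists for it.

```python
import math
from statistics import NormalDist


def p_correct(theta, a, b):
    """2PL item response function P(Y = 1 | theta), computed in a numerically stable way."""
    z = a * (theta - b)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def information(theta, items):
    """Test (Fisher) information at theta: sum_i a_i^2 P_i (1 - P_i)."""
    return sum(a * a * p_correct(theta, a, b) * (1.0 - p_correct(theta, a, b)) for a, b in items)


def score(theta, responses, items):
    """Derivative of the 2PL log-likelihood, sum_i a_i (y_i - P_i); strictly decreasing in theta."""
    return sum(a * (y - p_correct(theta, a, b)) for y, (a, b) in zip(responses, items))


def ml_estimate(responses, items, tol=1e-10):
    """ML estimate of theta by bisection on the score; None when the MLE is not finite."""
    # With all a_i > 0, all-correct and all-incorrect patterns have no finite MLE.
    if all(y == 1 for y in responses) or all(y == 0 for y in responses):
        return None
    # For a mixed pattern the score is positive as theta -> -inf and negative as
    # theta -> +inf, so both bracket expansions below terminate.
    lo, hi = -1.0, 1.0
    while score(lo, responses, items) < 0:  # root lies below lo: expand downward
        lo *= 2.0
    while score(hi, responses, items) > 0:  # root lies above hi: expand upward
        hi *= 2.0
    while hi - lo > tol:  # invariant: score(lo) >= 0 >= score(hi)
        mid = 0.5 * (lo + hi)
        if score(mid, responses, items) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def wald_ci(responses, items, level=0.95):
    """Two-sided Wald CI, theta_hat +/- z / sqrt(I(theta_hat)); None if the MLE is not finite."""
    theta_hat = ml_estimate(responses, items)
    if theta_hat is None:
        return None
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = 1.0 / math.sqrt(information(theta_hat, items))
    return theta_hat - z * se, theta_hat + z * se


# Hypothetical (a_i, b_i) pairs for a five-item test.
items = [(1.2, -1.0), (0.8, -0.5), (1.5, 0.0), (1.0, 0.5), (1.3, 1.0)]
print(wald_ci([1, 1, 0, 1, 0], items))
print(wald_ci([1, 1, 1, 1, 1], items))  # None: all-correct pattern
```

The interval is symmetric about the ML estimate by construction; any asymmetry in the sampling distribution of the estimator on a short test is ignored, which is the kind of first-order approximation the proposed method is designed to refine.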

Figs. 1–6 (images not included)

Notes

  1.

    Unless otherwise specified, asymptotic CIs are referred to as CIs in the sequel. As pointed out by a referee, Bayesian CIs are typically referred to as credible intervals, although they are usually asymptotic CIs as well. To highlight our focus on the frequentist coverage, the term credible interval is not further used.

  2.

    The dependence of \(f_i(y;\theta )\) on item parameters is omitted from the notation, as item parameters are treated as known in the current setup.

  3.

    Our parameterization of the nominal model differs slightly from that of Bock (1972); it generalizes Equation 10 of Muraki (1992).

  4.

    Our discussion can be straightforwardly extended to two-sided CIs.

  5.

    The remainder term in Eq. 8 is typically of order \(O(n^{-(r+1)/2})\); thus, an \(r\)th-order matching CI is also referred to in the literature as \((r+1)\)th-order accurate. We do not use the latter terminology, however, to avoid confusion with the order of PMPs.

  6.

    In Eqs. 18 and 19, we write \(g^{(a, b)}(x_1, x_2) = \partial ^{a+b}g/(\partial x_1^a\partial x_2^b)\) for a differentiable bivariate function \(g(x_1, x_2)\) and \(a, b\in {\mathbb {N}}\).

  7.

    As remarked by a referee, the posterior quantiles can also be approximated by Markov chain Monte Carlo sampling; however, we prefer the current implementation for its computational efficiency.

  8.

    At each \(\theta _0\) level, we selected the most frequently occurring response pattern that has a finite ML estimate and had not been plotted at any lower \(\theta _0\) level.
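Note 7 contrasts Markov chain Monte Carlo with a direct numerical computation of posterior quantiles, and Notes 4 and 5 concern the frequentist coverage of one-sided limits. The sketch below illustrates both ideas for a short dichotomous test, under stated assumptions: a 2PL model with hypothetical item parameters, and the Jeffreys (1946) prior as a stand-in prior. It is not the authors' implementation and does not construct the proposed second-order PMP. The posterior quantile is obtained from a trapezoid-rule CDF on a fixed grid over [-10, 10] (posterior mass outside the grid is ignored) with linear interpolation, and the exact coverage of the one-sided upper limit is computed by summing over all \(2^n\) response patterns, which is feasible only for short tests. All function names are hypothetical.

```python
import math
from itertools import product


def p_correct(theta, a, b):
    """2PL item response function P(Y = 1 | theta), numerically stable."""
    z = a * (theta - b)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_lik(theta, responses, items):
    """2PL log-likelihood of one response pattern; item parameters treated as known."""
    total = 0.0
    for y, (a, b) in zip(responses, items):
        p = p_correct(theta, a, b)
        total += math.log(p) if y == 1 else math.log1p(-p)
    return total


def log_jeffreys(theta, items):
    """Log Jeffreys prior, 0.5 * log I(theta), up to an additive constant (stand-in prior)."""
    info = sum(a * a * p_correct(theta, a, b) * (1.0 - p_correct(theta, a, b)) for a, b in items)
    return 0.5 * math.log(info)


def posterior_quantile(responses, items, prob, lo=-10.0, hi=10.0, n_grid=2001):
    """prob-quantile of the posterior of theta on a uniform grid over [lo, hi]."""
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + k * step for k in range(n_grid)]
    log_post = [log_lik(t, responses, items) + log_jeffreys(t, items) for t in grid]
    top = max(log_post)
    dens = [math.exp(v - top) for v in log_post]  # unnormalized; shifted to avoid underflow
    cdf = [0.0]  # cumulative trapezoid rule
    for k in range(1, n_grid):
        cdf.append(cdf[-1] + 0.5 * step * (dens[k - 1] + dens[k]))
    target = prob * cdf[-1]
    for k in range(1, n_grid):
        if cdf[k] >= target:  # linear interpolation of the CDF inside [grid[k-1], grid[k]]
            frac = (target - cdf[k - 1]) / (cdf[k] - cdf[k - 1])
            return grid[k - 1] + frac * step
    return hi


def exact_coverage(theta0, items, prob=0.95):
    """P_theta0(theta0 <= posterior prob-quantile), summed over all 2^n response patterns."""
    covered = 0.0
    for pattern in product((0, 1), repeat=len(items)):
        if theta0 <= posterior_quantile(pattern, items, prob):
            covered += math.exp(log_lik(theta0, pattern, items))
    return covered


items = [(1.2, -1.0), (0.8, -0.5), (1.5, 0.0), (1.0, 0.5), (1.3, 1.0)]  # hypothetical
print(posterior_quantile((1, 1, 0, 1, 0), items, 0.95))
print(exact_coverage(0.0, items, 0.95) - 0.95)  # one-sided coverage error at theta0 = 0
```

Unlike the Wald interval, the posterior quantile exists for every response pattern, including all-correct and all-incorrect ones. Because the response vector is discrete, the exact coverage is piecewise smooth in \(\theta _0\), with jumps wherever \(\theta _0\) crosses one of the \(2^n\) upper limits; evaluating `exact_coverage` over a grid of \(\theta _0\) values makes the coverage error discussed in Note 5 visible directly.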

References

  1. Barndorff-Nielsen, O., & Cox, D. R. (1979). Edgeworth and saddle-point approximations with statistical applications. Journal of the Royal Statistical Society. Series B (Methodological), 41, 279–312.

  2. Bhattacharya, R. N., & Ghosh, J. K. (1978). On the validity of the formal Edgeworth expansion. The Annals of Statistics, 6(2), 434–451.

  3. Bickel, P. J., & Doksum, K. A. (2015). Mathematical statistics: Basic ideas and selected topics (2nd ed., Vol. I). Boca Raton: CRC Press.

  4. Biehler, M., Holling, H., & Doebler, P. (2015). Saddlepoint approximations of the distribution of the person parameter in the two parameter logistic model. Psychometrika, 80(3), 665–688.

  5. Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison-Wesley.

  6. Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51.

  7. Brazzale, A. R., & Davison, A. C. (2008). Accurate parametric inference for small samples. Statistical Science, 23(4), 465–484.

  8. Brent, R. P. (1973). Some efficient algorithms for solving systems of nonlinear equations. SIAM Journal on Numerical Analysis, 10(2), 327–344.

  9. Briggs, D. C., & Weeks, J. P. (2009). The impact of vertical scaling decisions on growth interpretations. Educational Measurement: Issues and Practice, 28(4), 3–14.

  10. Brown, L. D., Cai, T. T., & DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical Science, 16, 101–117.

  11. Brown, L. D., Cai, T. T., & DasGupta, A. (2002). Confidence intervals for a binomial proportion and asymptotic expansions. The Annals of Statistics, 30(1), 160–201.

  12. Cai, T. T. (2005). One-sided confidence intervals in discrete distributions. Journal of Statistical Planning and Inference, 131(1), 63–88.

  13. Chang, H.-H. (1996). The asymptotic posterior normality of the latent trait for polytomous IRT models. Psychometrika, 61(3), 445–463. https://doi.org/10.1007/BF02294549.

  14. Chang, H.-H., & Stout, W. (1993). The asymptotic posterior normality of the latent trait in an IRT model. Psychometrika, 58(1), 37–52.

  15. Cheng, Y., & Yuan, K.-H. (2010). The impact of fallible item parameter estimates on latent trait recovery. Psychometrika, 75(2), 280–291.

  16. Daniels, H. E. (1954). Saddlepoint approximations in statistics. The Annals of Mathematical Statistics, 25, 631–650.

  17. Datta, G., & Mukerjee, R. (2004). Probability matching priors: Higher order asymptotics. New York: Springer.

  18. de la Torre, J., & Deng, W. (2008). Improving person-fit assessment by correcting the ability estimate and its reference distribution. Journal of Educational Measurement, 45(2), 159–177.

  19. Deutskens, E., De Ruyter, K., Wetzels, M., & Oosterveld, P. (2004). Response rate and response quality of internet-based surveys: An experimental study. Marketing Letters, 15(1), 21–36.

  20. Doebler, A., Doebler, P., & Holling, H. (2013). Optimal and most exact confidence intervals for person parameters in item response theory models. Psychometrika, 78(1), 98–115.

  21. Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27–38.

  22. Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 8(4), 370–378.

  23. Fritsch, F. N., & Carlson, R. E. (1980). Monotone piecewise cubic interpolation. SIAM Journal on Numerical Analysis, 17(2), 238–246.

  24. Ghosh, J. K., & Mukerjee, R. (1993). On priors that match posterior and frequentist distribution functions. Canadian Journal of Statistics, 21(1), 89–96.

  25. Ghosh, J. K., & Ramamoorthi, R. V. (2006). Bayesian nonparametrics. New York: Springer.

  26. Ghosh, M. (2011). Objective priors: An introduction for frequentists. Statistical Science, 26, 187–202.

  27. Glas, C. A., & Meijer, R. R. (2003). A Bayesian approach to person fit analysis in item response theory models. Applied Psychological Measurement, 27(3), 217–233.

  28. Ibragimov, I., & Has’minskii, R. (1981). Statistical estimation: Asymptotic theory. New York: Springer.

  29. Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 186, 453–461.

  30. Klauer, K. C. (1991). An exact and optimal standardized person test for assessing consistency with the Rasch model. Psychometrika, 56(2), 213–228.

  31. Liu, X., Han, Z., & Johnson, M. S. (2017). The UMP exact test and the confidence interval for person parameters in IRT models. Psychometrika, 83, 182–202.

  32. Liu, Y., & Yang, J. S. (2017a). Bootstrap-calibrated interval estimates for latent variable scores in item response theory. Psychometrika. https://doi.org/10.1007/s11336-017-9582-9.

  33. Liu, Y., & Yang, J. S. (2017b). Interval estimation of latent variable scores in item response theory. Journal of Educational and Behavioral Statistics. https://doi.org/10.3102/1076998617732764.

  34. Lord, F. M. (1952). A theory of test scores. New York: Psychometric Society.

  35. Lugannani, R., & Rice, S. (1980). Saddle point approximation for the distribution of the sum of independent random variables. Advances in Applied Probability, 12(2), 475–490.

  36. Magis, D. (2015a). A note on the equivalence between observed and expected information functions with polytomous IRT models. Journal of Educational and Behavioral Statistics, 40(1), 96–105.

  37. Magis, D. (2015b). A note on weighted likelihood and Jeffreys modal estimation of proficiency levels in polytomous item response models. Psychometrika, 80(1), 200–204.

  38. Magis, D., & Raîche, G. (2012). On the relationships between Jeffreys modal and weighted likelihood estimation of ability under logistic IRT models. Psychometrika, 77(1), 163–169.

  39. McDonald, R. P. (1981). The dimensionality of tests and items. British Journal of Mathematical and Statistical Psychology, 34(1), 100–117.

  40. Mukerjee, R. (2008). Data-dependent probability matching priors for empirical and related likelihoods. In B. Clarke and S. Ghosal (Eds.), Pushing the limits of contemporary statistics: Contributions in honor of Jayanta K. Ghosh (pp. 60–70). Beachwood, Ohio: Institute of Mathematical Statistics.

  41. Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176.

  42. Ogasawara, H. (2012). Asymptotic expansions for the ability estimator in item response theory. Computational Statistics, 27(4), 661–683.

  43. Ong, S., & Mukerjee, R. (2010). Data-dependent probability matching priors of the second order. Statistics, 44(3), 291–302.

  44. R Core Team. (2018). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/.

  45. Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 4, pp. 321–333).

  46. Reid, N. (1988). Saddlepoint methods and statistical inference. Statistical Science, 3, 213–227.

  47. Rupp, A. A. (2013). A systematic review of the methodology for person fit research in item response theory: Lessons about generalizability of inferences from the design of simulation studies. Psychological Test and Assessment Modeling, 55(1), 3–38.

  48. Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph No. 17. Richmond, VA: Psychometric Society.

  49. Sinharay, S. (2015). Assessment of person fit for mixed-format tests. Journal of Educational and Behavioral Statistics, 40(4), 343–365.

  50. Thissen, D., & Steinberg, L. (2009). Item response theory. In R. Millsap & A. Maydeu-Olivares (Eds.), The SAGE handbook of quantitative methods in psychology (pp. 148–177). London: Sage Publications.

  51. Thissen, D., & Wainer, H. (2001). Test scoring. New York: Taylor & Francis.

  52. van der Linden, W. (2016). Handbook of item response theory, volume one: Models. Boca Raton: CRC Press.

  53. van der Linden, W., & Glas, C. (2007). Computerized adaptive testing: Theory and practice. Dordrecht: Springer.

  54. Wang, X., Liu, Y., & Hambleton, R. K. (2017). Detecting item preknowledge using a predictive checking method. Applied Psychological Measurement, 41(4), 243–263.

  55. Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450.

  56. Wasserman, L. (2000). Asymptotic inference for mixture models by using data-dependent priors. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(1), 159–180.

  57. Weeks, J. P. (2010). plink: An R package for linking mixed-format tests using IRT-based methods. Journal of Statistical Software, 35(12), 1–33.

  58. Welch, B., & Peers, H. (1963). On formulae for confidence points based on integrals of weighted likelihoods. Journal of the Royal Statistical Society. Series B (Methodological), 25(2), 318–329.

  59. Yang, J. S., Hansen, M., & Cai, L. (2012). Characterizing sources of uncertainty in item response theory scale scores. Educational and Psychological Measurement, 72(2), 264–290.

Acknowledgements

Jan Hannig’s research was supported in part by the National Science Foundation under Grant Nos. 1512945 and 1633074.

Author information

Corresponding author

Correspondence to Yang Liu.

About this article

Cite this article

Liu, Y., Hannig, J. & Pal Majumder, A. Second-Order Probability Matching Priors for the Person Parameter in Unidimensional IRT Models. Psychometrika 84, 701–718 (2019). https://doi.org/10.1007/s11336-019-09675-4

Keywords

  • item response theory
  • test scoring
  • person parameter
  • objective Bayes
  • probability matching prior
  • data-dependent prior
  • higher-order asymptotics
  • Edgeworth expansion
  • confidence interval