Second-Order Probability Matching Priors for the Person Parameter in Unidimensional IRT Models

Psychometrika

Abstract

In applications of item response theory (IRT), it is often of interest to compute confidence intervals (CIs) for person parameters with prescribed frequentist coverage. The ubiquitous use of short tests in social science research and practice calls for a refinement of standard interval estimation procedures based on asymptotic normality, such as the Wald and Bayesian CIs, which maintain desirable coverage only when the test is sufficiently long. In the current paper, we propose a simple construction of second-order probability matching priors for the person parameter in unidimensional IRT models, which in turn yields CIs with accurate coverage even when the test is composed of only a few items. The probability matching property is established based on an expansion of the posterior distribution function and a shrinkage argument. CIs based on the proposed prior can be efficiently computed for a variety of unidimensional IRT models. A real data example with a mixed-format test and a simulation study are presented to compare the proposed method against several existing asymptotic CIs.
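
As a rough illustration of how a prior-based one-sided limit might be computed for a short test, the sketch below inverts a numerically integrated posterior distribution function. It assumes a two-parameter logistic (2PL) model with known item parameters and uses the Jeffreys prior, which is first-order probability matching in regular one-parameter models (Welch & Peers, 1963). It is not the second-order prior proposed in the paper, and the item parameters, response pattern, grid, and function names are all hypothetical.

```python
import numpy as np


def two_pl_prob(theta, a, b):
    """Correct-response probabilities under the 2PL model.

    Returns an array of shape (len(theta), len(a)).
    """
    theta = np.asarray(theta, dtype=float)[:, None]
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))


def jeffreys_posterior_quantile(y, a, b, level, grid=None):
    """Posterior quantile of theta under the Jeffreys prior (illustrative).

    y: 0/1 responses; a, b: known (here hypothetical) item parameters;
    level: probability strictly between 0 and 1.
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if grid is None:
        # Assumed integration range; adequate for moderate a and b.
        grid = np.linspace(-8.0, 8.0, 4001)

    p = two_pl_prob(grid, a, b)
    log_lik = (y * np.log(p) + (1.0 - y) * np.log1p(-p)).sum(axis=1)
    # Test information; the Jeffreys prior is proportional to its square root.
    info = (a**2 * p * (1.0 - p)).sum(axis=1)
    log_post = log_lik + 0.5 * np.log(info)
    post = np.exp(log_post - log_post.max())

    # Trapezoidal cumulative integral, normalized to a distribution function.
    steps = 0.5 * (post[1:] + post[:-1]) * np.diff(grid)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    cdf /= cdf[-1]

    # Invert the distribution function by linear interpolation.
    idx = int(np.searchsorted(cdf, level))
    t = (level - cdf[idx - 1]) / (cdf[idx] - cdf[idx - 1])
    return float(grid[idx - 1] + t * (grid[idx] - grid[idx - 1]))


# Hypothetical five-item test and response pattern.
a = [1.2, 0.8, 1.5, 1.0, 0.9]
b = [-1.0, -0.5, 0.0, 0.5, 1.0]
y = [1, 1, 0, 1, 0]
upper = jeffreys_posterior_quantile(y, a, b, level=0.95)
print(f"One-sided 95% upper limit for theta: {upper:.3f}")
```

Because the prior decays in both tails, the limit stays finite even for all-correct or all-incorrect patterns, where the ML estimate does not exist. Grid integration is used here only for transparency; it says nothing about the implementation used in the article.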

Notes

  1. Unless otherwise specified, asymptotic CIs are referred to as CIs in the sequel. As pointed out by a referee, Bayesian CIs are typically referred to as credible intervals, although they are usually asymptotic CIs as well. To highlight our focus on the frequentist coverage, the term credible interval is not used further.

  2. The dependence of \(f_i(y;\theta )\) on item parameters is omitted from the notation, as item parameters are treated as known in the current setup.

  3. Our parameterization of the nominal model slightly differs from Bock (1972), which generalizes Equation 10 of Muraki (1992).

  4. Our discussion can be straightforwardly extended to two-sided CIs.

  5. The order of the remainder term in Eq. 8 is typically \(O(n^{-(r+1)/2})\), and thus, an rth-order matching CI is also referred to as \((r+1)\)th-order accurate in the literature. Nevertheless, we do not use the latter terminology so as to avoid confusion with the order of PMPs.

  6. In Eqs. 18 and 19, write \(g^{(a, b)}(x_1, x_2) = \partial ^{a+b}g/(\partial x_1^a\partial x_2^b)\) for a differentiable bivariate function \(g(x_1, x_2)\) and \(a, b\in {\mathbb {N}}\).

  7. As remarked by a referee, it is also possible to approximate the posterior quantiles by Markov chain Monte Carlo sampling; however, we prefer the current implementation due to computational efficiency.

  8. At each \(\theta _0\) level, we selected the most frequently occurring response pattern that has a finite ML estimate and has not been plotted at any lower \(\theta _0\) level.
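
To make the conventions in notes 5 and 6 concrete, two instances are worked out below. The choices \(r = 2\), \(a = 1\), and \(b = 2\) are picked only for illustration and are not taken from the article's equations.

```latex
% Note 5 with r = 2: a second-order matching CI typically has remainder
O\bigl(n^{-(r+1)/2}\bigr) = O\bigl(n^{-3/2}\bigr),
% which is what the literature would call third-order accurate.

% Note 6 with a = 1, b = 2: one derivative in x_1, two in x_2
g^{(1,2)}(x_1, x_2) = \frac{\partial^{3} g}{\partial x_1 \, \partial x_2^{2}}
```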

References

  • Barndorff-Nielsen, O., & Cox, D. R. (1979). Edgeworth and saddle-point approximations with statistical applications. Journal of the Royal Statistical Society. Series B (Methodological), 41, 279–312.

  • Bhattacharya, R. N., & Ghosh, J. K. (1978). On the validity of the formal Edgeworth expansion. The Annals of Statistics, 6(2), 434–451.

  • Bickel, P. J., & Doksum, K. A. (2015). Mathematical statistics: Basic ideas and selected topics (2nd ed., Vol. I). Boca Raton: CRC Press.

  • Biehler, M., Holling, H., & Doebler, P. (2015). Saddlepoint approximations of the distribution of the person parameter in the two parameter logistic model. Psychometrika, 80(3), 665–688.

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison-Wesley.

  • Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51.

  • Brazzale, A. R., & Davison, A. C. (2008). Accurate parametric inference for small samples. Statistical Science, 23(4), 465–484.

  • Brent, R. P. (1973). Some efficient algorithms for solving systems of nonlinear equations. SIAM Journal on Numerical Analysis, 10(2), 327–344.

  • Briggs, D. C., & Weeks, J. P. (2009). The impact of vertical scaling decisions on growth interpretations. Educational Measurement: Issues and Practice, 28(4), 3–14.

  • Brown, L. D., Cai, T. T., & DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical Science, 16, 101–117.

  • Brown, L. D., Cai, T. T., & DasGupta, A. (2002). Confidence intervals for a binomial proportion and asymptotic expansions. The Annals of Statistics, 30(1), 160–201.

  • Cai, T. T. (2005). One-sided confidence intervals in discrete distributions. Journal of Statistical Planning and Inference, 131(1), 63–88.

  • Chang, H.-H. (1996). The asymptotic posterior normality of the latent trait for polytomous IRT models. Psychometrika, 61(3), 445–463. https://doi.org/10.1007/BF02294549.

  • Chang, H.-H., & Stout, W. (1993). The asymptotic posterior normality of the latent trait in an IRT model. Psychometrika, 58(1), 37–52.

  • Cheng, Y., & Yuan, K.-H. (2010). The impact of fallible item parameter estimates on latent trait recovery. Psychometrika, 75(2), 280–291.

  • Daniels, H. E. (1954). Saddlepoint approximations in statistics. The Annals of Mathematical Statistics, 25, 631–650.

  • Datta, G., & Mukerjee, R. (2004). Probability matching priors: Higher order asymptotics. New York: Springer.

  • de la Torre, J., & Deng, W. (2008). Improving person-fit assessment by correcting the ability estimate and its reference distribution. Journal of Educational Measurement, 45(2), 159–177.

  • Deutskens, E., De Ruyter, K., Wetzels, M., & Oosterveld, P. (2004). Response rate and response quality of internet-based surveys: An experimental study. Marketing Letters, 15(1), 21–36.

  • Doebler, A., Doebler, P., & Holling, H. (2013). Optimal and most exact confidence intervals for person parameters in item response theory models. Psychometrika, 78(1), 98–115.

  • Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80(1), 27–38.

  • Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 8(4), 370–378.

  • Fritsch, F. N., & Carlson, R. E. (1980). Monotone piecewise cubic interpolation. SIAM Journal on Numerical Analysis, 17(2), 238–246.

  • Ghosh, J. K., & Mukerjee, R. (1993). On priors that match posterior and frequentist distribution functions. Canadian Journal of Statistics, 21(1), 89–96.

  • Ghosh, J. K., & Ramamoorthi, R. V. (2006). Bayesian nonparametrics. New York: Springer.

  • Ghosh, M. (2011). Objective priors: An introduction for frequentists. Statistical Science, 26, 187–202.

  • Glas, C. A., & Meijer, R. R. (2003). A Bayesian approach to person fit analysis in item response theory models. Applied Psychological Measurement, 27(3), 217–233.

  • Ibragimov, I., & Has’minskii, R. (1981). Statistical estimation: Asymptotic theory. New York: Springer.

  • Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, 186, 453–461.

  • Klauer, K. C. (1991). An exact and optimal standardized person test for assessing consistency with the Rasch model. Psychometrika, 56(2), 213–228.

  • Liu, X., Han, Z., & Johnson, M. S. (2017). The UMP exact test and the confidence interval for person parameters in IRT models. Psychometrika, 83, 182–202.

  • Liu, Y., & Yang, J. S. (2017a). Bootstrap-calibrated interval estimates for latent variable scores in item response theory. Psychometrika. https://doi.org/10.1007/s11336-017-9582-9.

  • Liu, Y., & Yang, J. S. (2017b). Interval estimation of latent variable scores in item response theory. Journal of Educational and Behavioral Statistics. https://doi.org/10.3102/1076998617732764.

  • Lord, F. M. (1952). A theory of test scores. New York: Psychometric Society.

  • Lugannani, R., & Rice, S. (1980). Saddle point approximation for the distribution of the sum of independent random variables. Advances in Applied Probability, 12(2), 475–490.

  • Magis, D. (2015a). A note on the equivalence between observed and expected information functions with polytomous IRT models. Journal of Educational and Behavioral Statistics, 40(1), 96–105.

  • Magis, D. (2015b). A note on weighted likelihood and Jeffreys modal estimation of proficiency levels in polytomous item response models. Psychometrika, 80(1), 200–204.

  • Magis, D., & Raîche, G. (2012). On the relationships between Jeffreys modal and weighted likelihood estimation of ability under logistic IRT models. Psychometrika, 77(1), 163–169.

  • McDonald, R. P. (1981). The dimensionality of tests and items. British Journal of Mathematical and Statistical Psychology, 34(1), 100–117.

  • Mukerjee, R. (2008). Data-dependent probability matching priors for empirical and related likelihoods. In B. Clarke and S. Ghosal (Eds.), Pushing the limits of contemporary statistics: Contributions in honor of Jayanta K. Ghosh (pp. 60–70). Beachwood, Ohio: Institute of Mathematical Statistics.

  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159–176.

  • Ogasawara, H. (2012). Asymptotic expansions for the ability estimator in item response theory. Computational Statistics, 27(4), 661–683.

  • Ong, S., & Mukerjee, R. (2010). Data-dependent probability matching priors of the second order. Statistics, 44(3), 291–302.

  • R Core Team. (2018). R: A language and environment for statistical computing [Computer software manual]. Vienna, Austria. Retrieved from https://www.R-project.org/.

  • Rasch, G. (1961). On general laws and the meaning of measurement in psychology. In Proceedings of the fourth Berkeley symposium on mathematical statistics and probability (Vol. 4, pp. 321–333).

  • Reid, N. (1988). Saddlepoint methods and statistical inference. Statistical Science, 3, 213–227.

  • Rupp, A. A. (2013). A systematic review of the methodology for person fit research in item response theory: Lessons about generalizability of inferences from the design of simulation studies. Psychological Test and Assessment Modeling, 55(1), 3–38.

  • Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph No. 17. Richmond, VA: Psychometric Society.

  • Sinharay, S. (2015). Assessment of person fit for mixed-format tests. Journal of Educational and Behavioral Statistics, 40(4), 343–365.

  • Thissen, D., & Steinberg, L. (2009). Item response theory. In R. Millsap & A. Maydeu-Olivares (Eds.), The SAGE handbook of quantitative methods in psychology (pp. 148–177). London: Sage Publications.

  • Thissen, D., & Wainer, H. (2001). Test scoring. New York: Taylor & Francis.

  • van der Linden, W. (2016). Handbook of item response theory, volume one: Models. Boca Raton: CRC Press.

  • van der Linden, W., & Glas, C. (2007). Computerized adaptive testing: Theory and practice. Dordrecht: Springer.

  • Wang, X., Liu, Y., & Hambleton, R. K. (2017). Detecting item preknowledge using a predictive checking method. Applied Psychological Measurement, 41(4), 243–263.

  • Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427–450.

  • Wasserman, L. (2000). Asymptotic inference for mixture models by using data-dependent priors. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 62(1), 159–180.

  • Weeks, J. P. (2010). plink: An R package for linking mixed-format tests using IRT-based methods. Journal of Statistical Software, 35(12), 1–33.

  • Welch, B., & Peers, H. (1963). On formulae for confidence points based on integrals of weighted likelihoods. Journal of the Royal Statistical Society. Series B (Methodological), 25(2), 318–329.

  • Yang, J. S., Hansen, M., & Cai, L. (2012). Characterizing sources of uncertainty in item response theory scale scores. Educational and Psychological Measurement, 72(2), 264–290.

Acknowledgements

Jan Hannig’s research was supported in part by the National Science Foundation under Grant Nos. 1512945 and 1633074.

Author information

Corresponding author

Correspondence to Yang Liu.

Cite this article

Liu, Y., Hannig, J. & Pal Majumder, A. Second-Order Probability Matching Priors for the Person Parameter in Unidimensional IRT Models. Psychometrika 84, 701–718 (2019). https://doi.org/10.1007/s11336-019-09675-4

  • DOI: https://doi.org/10.1007/s11336-019-09675-4
