## Abstract

In applications of item response theory (IRT), it is often of interest to compute confidence intervals (CIs) for person parameters with prescribed frequentist coverage. The ubiquitous use of short tests in social science research and practice calls for a refinement of standard interval estimation procedures based on asymptotic normality, such as the Wald and Bayesian CIs, which maintain desirable coverage only when the test is sufficiently long. In the current paper, we propose a simple construction of second-order probability matching priors for the person parameter in unidimensional IRT models, which in turn yields CIs with accurate coverage even when the test is composed of only a few items. The probability matching property is established based on an expansion of the posterior distribution function and a shrinkage argument. CIs based on the proposed prior can be efficiently computed for a variety of unidimensional IRT models. A real data example with a mixed-format test and a simulation study are presented to compare the proposed method against several existing asymptotic CIs.

## Notes

- 1.
Unless otherwise specified, asymptotic CIs are referred to as CIs in the sequel. As pointed out by a referee, Bayesian CIs are typically referred to as credible intervals, although they are usually asymptotic CIs as well. To highlight our focus on frequentist coverage, the term credible interval is not used hereafter.

- 2.
The dependence of \(f_i(y;\theta )\) on item parameters is omitted from the notation, as item parameters are treated as known in the current setup.

- 3.
- 4.
Our discussion can be straightforwardly extended to two-sided CIs.

- 5.
The order of the remainder term in Eq. 8 is typically \(O(n^{-(r+1)/2})\), and thus an *r*th-order matching CI is also referred to as \((r+1)\)*th-order accurate* in the literature. Nevertheless, we do not use the latter terminology so as to avoid confusion with the order of PMPs.

- 6.
- 7.
As remarked by a referee, it is also possible to approximate the posterior quantiles by Markov chain Monte Carlo sampling; however, we prefer the current implementation due to computational efficiency.

- 8.
At each \(\theta _0\) level, we selected the most frequently occurring response pattern that has a finite ML estimate and has not been plotted at any lower \(\theta _0\) level.

## Acknowledgements

Jan Hannig’s research was supported in part by the National Science Foundation under Grant Nos. 1512945 and 1633074.

## About this article

### Cite this article

Liu, Y., Hannig, J. & Pal Majumder, A. Second-Order Probability Matching Priors for the Person Parameter in Unidimensional IRT Models. *Psychometrika* **84**, 701–718 (2019). https://doi.org/10.1007/s11336-019-09675-4

### Keywords

- item response theory
- test scoring
- person parameter
- objective Bayes
- probability matching prior
- data-dependent prior
- higher-order asymptotics
- Edgeworth expansion
- confidence interval