Generalized Fiducial Inference for Logistic Graded Response Models

Abstract

Samejima’s graded response model (GRM) has gained popularity in the analysis of ordinal response data in psychological, educational, and health-related assessment. Obtaining high-quality point and interval estimates for GRM parameters has attracted a great deal of attention in the literature. In the current work, we derive generalized fiducial inference (GFI) for a family of multidimensional graded response models, implement a Gibbs sampler to perform fiducial estimation, and compare its finite-sample performance with several commonly used likelihood-based and Bayesian approaches via three simulation studies. The proposed method is found to yield reliable inference even with small sample sizes and extreme data-generating parameter values, outperforming the other candidate methods under investigation. The use of GFI as a convenient tool for quantifying sampling variability in various inferential procedures is illustrated by an empirical analysis of patient-reported emotional distress data.
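Once a sampler (such as a Gibbs sampler) has produced Monte Carlo draws of the parameters, a common way to turn them into an interval estimate is to take equal-tailed sample quantiles. An interval for a derived quantity then follows by transforming each draw before taking quantiles. The Python sketch below shows only this generic post-processing step. It is not the authors' code, and the function names, the 95% default level, and the simulated stand-in draws are illustrative assumptions.

```python
import random


def quantile(sorted_vals, p):
    """Sample quantile of an already sorted list, using linear interpolation
    between order statistics (the same rule as NumPy's default method)."""
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def equal_tailed_interval(draws, func=lambda x: x, level=0.95):
    """Equal-tailed interval for func(parameter) from Monte Carlo draws.

    Each draw is transformed first, so the interval for a derived quantity
    reflects the variability carried by the whole set of draws.
    """
    values = sorted(func(d) for d in draws)
    tail = (1.0 - level) / 2.0
    return quantile(values, tail), quantile(values, 1.0 - tail)


if __name__ == "__main__":
    # Stand-in draws for a single slope parameter (NOT output from the paper).
    rng = random.Random(2017)
    slope_draws = [rng.gauss(1.5, 0.2) for _ in range(5000)]
    print(equal_tailed_interval(slope_draws))
    # Derived quantity: slope divided by the conventional 1.702 constant
    # that relates logistic and normal-ogive slopes.
    print(equal_tailed_interval(slope_draws, func=lambda a: a / 1.702))
```

Because quantiles are taken after the transformation, the same function handles nonlinear derived quantities without a delta-method approximation.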

Figs. 1–10 (images not reproduced in this text)

Notes

  1. As a notational convention, we use the corresponding lowercase letter for a fixed value of the random variable.

  2. An added asterisk distinguishes a random variable from its data-generating counterpart; this notational convention is adopted in the sequel.

  3. All appendices are included as online supplemental materials.

  4. Note that a different order-statistic approach applied to the difficulty parameters (see Eq. 17) was recommended by both Curtis (2010) and Kieftenbeld and Natesan (2012).

  5. Eq. S.1 implies \(P\{Y_{ij} \ge k \mid \mathbf{Z}_i = \mathbf{z}_i\} = \Psi(\alpha_{jk} + \boldsymbol{\beta}_j^\top \mathbf{z}_i)\).
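Note 5 gives the cumulative form of the model. Assume, for illustration, that the K categories of item j are coded 0, ..., K − 1 and that Ψ is the standard logistic distribution function. Under those assumptions, each category probability follows by differencing adjacent cumulative probabilities. This is valid only when the intercepts are strictly decreasing in k. The Python sketch below makes that computation explicit. The function names are hypothetical and the code is not taken from the paper.

```python
import math


def logistic(x):
    """Standard logistic distribution function Psi(x) = 1 / (1 + exp(-x))."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)  # rewritten form avoids overflow for large negative x
    return e / (1.0 + e)


def grm_category_probs(alphas, beta, z):
    """P(Y_ij = k | Z_i = z) for k = 0, ..., K-1 under a logistic GRM.

    alphas: the K-1 intercepts alpha_{j1}, ..., alpha_{j,K-1} of item j,
            which must be strictly decreasing.
    beta:   the slope vector beta_j of item j.
    z:      the latent variable value z_i (same length as beta).
    """
    if len(beta) != len(z):
        raise ValueError("beta and z must have the same length")
    if any(a <= b for a, b in zip(alphas, alphas[1:])):
        raise ValueError("intercepts must be strictly decreasing")
    eta = sum(b * zi for b, zi in zip(beta, z))  # beta_j^T z_i
    # Cumulative probabilities P(Y >= k | z) for k = 0, ..., K, using the
    # boundary conventions P(Y >= 0) = 1 and P(Y >= K) = 0.
    cum = [1.0] + [logistic(a + eta) for a in alphas] + [0.0]
    # Differencing adjacent cumulative probabilities gives each category.
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]


# Three-category item, one latent dimension, evaluated at z = 0.
print(grm_category_probs([1.0, -1.0], [1.2], [0.0]))
```

For this example the three probabilities are roughly 0.27, 0.46, and 0.27. They are symmetric because z = 0 and the two intercepts are ±1.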

References

  1. Agresti, A. (2002). Categorical data analysis. Hoboken, NJ: Wiley.

  2. Bickel, P. J., & Doksum, K. A. (2015). Mathematical statistics: Basic ideas and selected topics (2nd ed., Vol. I). Boca Raton, FL: CRC Press.

  3. Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 395–479). Reading, MA: Addison-Wesley.

  4. Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37(1), 29–51.

  5. Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46(4), 443–459.

  6. Bock, R. D., & Lieberman, M. (1970). Fitting a response model for \(n\) dichotomously scored items. Psychometrika, 35(2), 179–197.

  7. Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64(2), 153–168.

  8. Cai, L. (2008). SEM of another flavour: Two new applications of the supplemented EM algorithm. British Journal of Mathematical and Statistical Psychology, 61(2), 309–329.

  9. Cai, L. (2010a). High-dimensional exploratory item factor analysis by a Metropolis-Hastings Robbins-Monro algorithm. Psychometrika, 75(1), 33–57.

  10. Cai, L. (2010b). Metropolis-Hastings Robbins-Monro algorithm for confirmatory item factor analysis. Journal of Educational and Behavioral Statistics, 35(3), 307–335.

  11. Cai, L. (2010c). A two-tier full-information item factor analysis model with applications. Psychometrika, 75(4), 581–612.

  12. Cai, L., Thissen, D., & du Toit, S. H. C. (2011). IRTPRO for Windows [Computer software manual]. Lincolnwood, IL: Scientific Software International.

  13. Carpenter, B., Gelman, A., Hoffman, M., Lee, D., Goodrich, B., Betancourt, M., et al. (2016). Stan: A probabilistic programming language. Journal of Statistical Software, 76(1), 1–32.

  14. Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. http://www.jstatsoft.org/v48/i06/.

  15. Cisewski, J., & Hannig, J. (2012). Generalized fiducial inference for normal linear mixed models. The Annals of Statistics, 40(4), 2102–2127.

  16. Curtis, S. M. (2010). BUGS code for item response theory. Journal of Statistical Software, 36(1), 1–34.

  17. Datta, G. S., & Mukerjee, R. (2004). Probability matching priors: Higher order asymptotics. New York: Springer.

  18. Doucet, A., De Freitas, N., & Gordon, N. (2001). An introduction to sequential Monte Carlo methods. New York: Springer.

  19. Duong, T. (2014). ks: Kernel smoothing [Computer software manual]. R package version 1.9.3. http://CRAN.R-project.org/package=ks.

  20. Edwards, M. C. (2010). A Markov chain Monte Carlo approach to confirmatory item factor analysis. Psychometrika, 75(3), 474–497.

  21. Efron, B. (1998). R. A. Fisher in the 21st century. Statistical Science, 13(2), 95–114.

  22. Efron, B., & Tibshirani, R. (1994). An introduction to the bootstrap. Boca Raton, FL: CRC Press. Retrieved from https://books.google.com/books?id=gLlpIUxRntoC.

  23. Fisher, R. A. (1930). Inverse probability. Proceedings of the Cambridge Philosophical Society, 26, 528–535.

  24. Fisher, R. A. (1933). The concepts of inverse probability and fiducial probability referring to unknown parameters. Proceedings of the Royal Society of London Series A, 139(838), 343–348.

  25. Fisher, R. A. (1935). The fiducial argument in statistical inference. Annals of Eugenics, 6(4), 391–398.

  26. Forero, C. G., Maydeu-Olivares, A., & Gallardo-Pujol, D. (2009). Factor analysis with ordinal indicators: A Monte Carlo study comparing DWLS and ULS estimation. Structural Equation Modeling, 16(4), 625–641.

  27. Gelman, A., Jakulin, A., Pittau, M. G., & Su, Y.-S. (2008). A weakly informative default prior distribution for logistic and other regression models. The Annals of Applied Statistics, 2(4), 1360–1383.

  28. Ghosh, J., & Bickel, P. J. (1990). A decomposition for the likelihood ratio statistic and the Bartlett correction: A Bayesian argument. The Annals of Statistics, 18(3), 1070–1090.

  29. Haberman, S. J. (2006). Adaptive quadrature for item response models. ETS Research Report Series, 2006(2), 1–10.

  30. Haberman, S. J. (2013). A general program for item-response analysis that employs the stabilized Newton-Raphson algorithm. ETS Research Report Series, 2013(2). doi:10.1002/j.2333-8504.2013.tb02339.x.

  31. Hamilton, M. (1960). A rating scale for depression. Journal of Neurology, Neurosurgery, and Psychiatry, 23(1), 56–62.

  32. Hannig, J. (2009). On generalized fiducial inference. Statistica Sinica, 19(2), 491.

  33. Hannig, J. (2013). Generalized fiducial inference via discretization. Statistica Sinica, 23(2), 489–514.

  34. Hannig, J., Iyer, H., Lai, R. C. S., & Lee, T. C. M. (2015). Generalized fiducial inference: A review (Unpublished manuscript).

  35. Hill, C. D. (2004). Precision of parameter estimates for the graded item response model (Unpublished master’s thesis). The University of North Carolina at Chapel Hill.

  36. Houts, C. R., & Cai, L. (2013). flexMIRT user’s manual version 2: Flexible multilevel multidimensional item analysis and test scoring [Computer software manual]. Chapel Hill, NC: Vector Psychometric Group.

  37. Irwin, D. E., Stucky, B., Langer, M. M., Thissen, D., DeWitt, E. M., Lai, J. S., et al. (2010). An item response analysis of the pediatric PROMIS anxiety and depressive symptoms scales. Quality of Life Research, 19(4), 595–607.

  38. Kieftenbeld, V., & Natesan, P. (2012). Recovery of graded response model parameters: A comparison of marginal maximum likelihood and Markov chain Monte Carlo estimation. Applied Psychological Measurement, 36(5), 399–419.

  39. Lehmann, E. (1999). Elements of large-sample theory. New York, NY: Springer. Retrieved from https://books.google.com/books?id=geIoxvgTXlEC.

  40. Liu, Y. (2015). Generalized fiducial inference for graded response models (Doctoral dissertation). Retrieved from ProQuest Dissertations and Theses (Accession No. UNC15157).

  41. Liu, Y., & Hannig, J. (2016). Generalized fiducial inference for binary logistic item response models. Psychometrika, 81(2), 290–324.

  42. Liu, Y., & Thissen, D. (2014). Comparing score tests and other local dependence diagnostics for the graded response model. British Journal of Mathematical and Statistical Psychology, 67(3), 496–513.

  43. Meng, X. L., & Schilling, S. (1996). Fitting full-information item factor models and an empirical investigation of bridge sampling. Journal of the American Statistical Association, 91(435), 1254–1267.

  44. Muthén, L. K., & Muthén, B. O. (2012). Mplus user’s guide [Computer software manual]. Los Angeles, CA: Muthén & Muthén.

  45. Pal Majumder, A., & Hannig, J. (2016). Higher order asymptotics of generalized fiducial distribution (Unpublished manuscript).

  46. Plummer, M. (2013a). JAGS version 3.4.0 user manual [Computer software manual]. http://sourceforge.net/mcmc-jags/files/Manuals/3.x/.

  47. Plummer, M. (2013b). rjags: Bayesian graphical models using MCMC [Computer software manual]. R package version 3-10. http://CRAN.R-project.org/package=rjags.

  48. Reckase, M. (2009). Multidimensional item response theory. New York: Springer.

  49. Rupp, A. A., Templin, J., & Henson, R. A. (2010). Diagnostic assessment: Theory, methods, and applications. New York: Guilford.

  50. Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika monograph (Vol. 17). Richmond, VA: Psychometric Society.

  51. Schilling, S., & Bock, R. D. (2005). High-dimensional maximum marginal likelihood item factor analysis by adaptive quadrature. Psychometrika, 70(3), 533–555.

  52. Schweder, T., & Hjort, N. L. (2002). Confidence and likelihood. Scandinavian Journal of Statistics, 29(2), 309–332.

  53. Spiegelhalter, D., Thomas, A., & Best, N. D. L. (2010). OpenBUGS version 3.1.1 user manual. http://www.openbugs.info/.

  54. Thissen, D., & Hill, C. D. (2004). Infinite slope estimates in item response theory. Presentation at the annual meeting of the Psychometric Society, Monterey, CA, June 14–17.

  55. Thissen, D., & Steinberg, L. (1988). Data analysis using item response theory. Psychological Bulletin, 104(3), 385–395.

  56. Thissen, D., & Steinberg, L. (2010). Using item response theory to disentangle constructs at different levels of generality. In S. Embretson (Ed.), Measuring psychological constructs: Advances in model-based approaches (pp. 123–144). Washington, DC: American Psychological Association.

  57. van der Vaart, A. W. (2000). Asymptotic statistics. New York: Cambridge University Press.

  58. Wand, M. P., & Jones, M. C. (1994). Kernel smoothing. London: Chapman and Hall.

  59. Wirth, R., & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12(1), 58.

  60. Xie, M., & Singh, K. (2013). Confidence distribution, the frequentist distribution estimator of a parameter: A review. International Statistical Review, 81(1), 3–39.

  61. Yang, J. S., Hansen, M., & Cai, L. (2012). Characterizing sources of uncertainty in item response theory scale scores. Educational and Psychological Measurement, 72(2), 264–290.

  62. Yuan, K. H., Cheng, Y., & Patton, J. (2014). Information matrices and standard errors for MLEs of item parameters in IRT. Psychometrika, 79(2), 232–254.

  63. Zabell, S. L. (1992). R. A. Fisher and the fiducial argument. Statistical Science, 7(3), 369–387.

Acknowledgements

We are grateful to Drs. David Thissen, Daniel Bauer, Patrick Curran, and Andrea Hussong from the Department of Psychology at the University of North Carolina at Chapel Hill, and Drs. Shelby Haberman and Yi-Hsuan Lee at Educational Testing Service (ETS) for their valuable advice and feedback on this paper. The work is sponsored by the Harold Gulliksen Psychometric Research Fellowship generously offered by ETS. Jan Hannig’s research was supported in part by the National Science Foundation under Grant Nos. 1512945 and 1633074.

Author information

Corresponding author

Correspondence to Yang Liu.

Electronic supplementary material

Supplementary material 1 (pdf 615 KB)

About this article

Cite this article

Liu, Y., Hannig, J. Generalized Fiducial Inference for Logistic Graded Response Models. Psychometrika 82, 1097–1125 (2017). https://doi.org/10.1007/s11336-017-9554-0

Keywords

  • generalized fiducial inference
  • confidence interval
  • Markov chain Monte Carlo
  • Bernstein–von Mises theorem
  • item response theory
  • graded response model
  • bifactor model