Volume 66, Issue 4, pp 541–561

MCMC estimation and some model-fit analysis of multidimensional IRT models

  • A. A. Béguin
  • C. A. W. Glas


A Bayesian procedure to estimate the three-parameter normal ogive model and a generalization of the procedure to a model with multidimensional ability parameters are presented. The procedure is a generalization of a procedure by Albert (1992) for estimating the two-parameter normal ogive model. The procedure supports analyzing data from multiple populations and incomplete designs. It is shown that restrictions can be imposed on the factor matrix for testing specific hypotheses about the ability structure. The technique is illustrated using simulated and real data.
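As a rough illustration of the model class named in the abstract, the sketch below evaluates a three-parameter normal ogive response probability with a multidimensional ability vector, and performs one latent-variable draw of the kind used in Albert's (1992) data-augmentation Gibbs sampler for the two-parameter model. The parameterization (discriminations `alpha`, difficulty `beta`, guessing `gamma`, linear predictor `sum(alpha * theta) - beta`) and the function names `prob_correct` and `draw_latent_z` are assumptions made for this example; the abstract does not fix them, and this is not the authors' full procedure.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def prob_correct(theta, alpha, beta, gamma=0.0):
    """Normal ogive probability of a correct response (assumed parameterization).

    theta and alpha are equal-length sequences, one entry per ability
    dimension; gamma=0 reduces this to the two-parameter model.
    """
    eta = sum(a * t for a, t in zip(alpha, theta)) - beta
    return gamma + (1.0 - gamma) * STD_NORMAL.cdf(eta)


def draw_latent_z(y, eta, rng=random):
    """One data-augmentation draw for the two-parameter model.

    Samples Z ~ N(eta, 1) truncated to Z > 0 when y == 1 and to Z <= 0
    when y == 0, by inverting the normal CDF.
    """
    p_neg = STD_NORMAL.cdf(-eta)  # P(Z <= 0) before truncation
    u = rng.random()
    if y == 1:
        q = p_neg + u * (1.0 - p_neg)  # uniform on (p_neg, 1)
    else:
        q = u * p_neg                  # uniform on (0, p_neg)
    # Numerical guard so inv_cdf stays defined; for extreme eta a dedicated
    # truncated-normal sampler would be needed instead of this clamp.
    q = min(max(q, 1e-12), 1.0 - 1e-12)
    return eta + STD_NORMAL.inv_cdf(q)


if __name__ == "__main__":
    # Two ability dimensions, hypothetical item parameters.
    print(prob_correct([0.5, -0.2], [1.0, 0.8], 0.1, gamma=0.2))
    print(draw_latent_z(1, 0.3, random.Random(1)))
```

In a full Gibbs sampler, draws like `draw_latent_z` would alternate with draws of the ability and item parameters given the latent values; those steps are omitted here.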

Key words

Bayes estimates, full-information factor analysis, Gibbs sampler, item response theory, Markov chain Monte Carlo, multidimensional item response theory, normal ogive model


References


  1. Ackerman, T.A. (1996a). Developments in multidimensional item response theory. Applied Psychological Measurement, 20, 309–310.
  2. Ackerman, T.A. (1996b). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20, 311–329.
  3. ACT. (1997). ACT Assessment Technical Manual. Iowa City, IA: Author.
  4. Albert, J.H. (1992). Bayesian estimation of normal ogive item response functions using Gibbs sampling. Journal of Educational Statistics, 17, 251–269.
  5. Andersen, E.B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.
  6. Baker, F.B. (1998). An investigation of item parameter recovery characteristics of a Gibbs sampling procedure. Applied Psychological Measurement, 22, 153–169.
  7. Bock, R.D., Gibbons, R.D., & Muraki, E. (1988). Full-information factor analysis. Applied Psychological Measurement, 12, 261–280.
  8. Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: An application of an EM-algorithm. Psychometrika, 46, 443–459.
  9. Bock, R.D., & Schilling, S.G. (1997). High dimensional full-information item factor analysis. In M. Berkane (Ed.), Latent variable modeling and applications of causality (pp. 163–176). New York, NY: Springer.
  10. Bock, R.D., & Zimowski, M.F. (1997). Multiple group IRT. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 433–448). New York, NY: Springer.
  11. Box, G., & Tiao, G. (1973). Bayesian inference in statistical analysis. Reading, MA: Addison-Wesley.
  12. Bradlow, E.T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153–168.
  13. Cressie, N., & Holland, P.W. (1983). Characterizing the manifest probabilities of latent trait models. Psychometrika, 48, 129–141.
  14. Fischer, G.H. (1995). Derivations of the Rasch model. In G.H. Fischer & I.W. Molenaar (Eds.), Rasch models: Foundations, recent developments and applications (pp. 15–38). New York, NY: Springer.
  15. Fox, J.P., & Glas, C.A.W. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66, 271–288.
  16. Fraser, C. (1988). NOHARM: A computer program for fitting both unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, Australia: University of New England.
  17. Gelfand, A.E., & Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398–409.
  18. Gelman, A., Carlin, J.B., Stern, H.S., & Rubin, D.B. (1995). Bayesian data analysis. London: Chapman and Hall.
  19. Glas, C.A.W. (1988). The derivation of some tests for the Rasch model from the multinomial distribution. Psychometrika, 53, 525–546.
  20. Glas, C.A.W. (1998). Detection of differential item functioning using Lagrange multiplier tests. Statistica Sinica, 8(1), 647–667.
  21. Glas, C.A.W. (1999). Modification indices for the 2-pl and the nominal response model. Psychometrika, 64, 273–294.
  22. Glas, C.A.W., & Ellis, J.L. (1993). RSP, Rasch scaling program, computer program and user's manual. Groningen: ProGAMMA.
  23. Glas, C.A.W., & Verhelst, N.D. (1989). Extensions of the partial credit model. Psychometrika, 54, 635–659.
  24. Glas, C.A.W., & Verhelst, N.D. (1995). Tests of fit for polytomous Rasch models. In G.H. Fischer & I.W. Molenaar (Eds.), Rasch models: Foundations, recent developments and applications (pp. 325–352). New York, NY: Springer.
  25. Glas, C.A.W., Wainer, H., & Bradlow, E.T. (2000). MML and EAP estimates for the testlet response model. In W.J. van der Linden & C.A.W. Glas (Eds.), Computer adaptive testing: Theory and practice (pp. 271–287). Boston, MA: Kluwer-Nijhoff Publishing.
  26. Hoijtink, H., & Molenaar, I.W. (1997). A multidimensional item response model: Constrained latent class analysis using the Gibbs sampler and posterior predictive checks. Psychometrika, 62, 171–189.
  27. Holland, P.W., & Rosenbaum, P.R. (1986). Conditional association and uni-dimensionality in monotone latent variable models. Annals of Statistics, 14, 1523–1543.
  28. Junker, B. (1991). Essential independence and likelihood-based ability estimation for polytomous items. Psychometrika, 56, 255–278.
  29. Kelderman, H. (1984). Loglinear RM tests. Psychometrika, 49, 223–245.
  30. Kelderman, H. (1989). Item bias detection using loglinear IRT. Psychometrika, 54, 681–697.
  31. Lawley, D.N. (1943). On problems connected with item selection and test construction. Proceedings of the Royal Society of Edinburgh, 61, 273–287.
  32. Lawley, D.N. (1944). The factorial analysis of multiple test items. Proceedings of the Royal Society of Edinburgh, Series A, 62, 74–82.
  33. Lord, F.M. (1952). A theory of test scores. Psychometric Monograph No. 7.
  34. Lord, F.M. (1953a). An application of confidence intervals and of maximum likelihood to the estimation of an examinee's ability. Psychometrika, 18, 57–75.
  35. Lord, F.M. (1953b). The relation of test score to the trait underlying the test. Educational and Psychological Measurement, 13, 517–548.
  36. Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
  37. Lord, F.M., & Wingersky, M.S. (1984). Comparison of IRT true-score and equipercentile observed-score “equatings”. Applied Psychological Measurement, 8, 453–461.
  38. Martin-Löf, P. (1973). Statistiska modeller [Statistical models] (Anteckningar från seminarier läsåret 1969–1970, utarbetade av Rolf Sundberg. Obetydligt ändrat nytryck, oktober 1973). Stockholm: Institutet för Försäkringsmatematik och Matematisk Statistik vid Stockholms Universitet.
  39. Martin-Löf, P. (1974). The notion of redundancy and its use as a quantitative measure of the discrepancy between a statistical hypothesis and a set of observational data. Scandinavian Journal of Statistics, 1, 3–18.
  40. McDonald, R.P. (1967). Nonlinear factor analysis. Psychometric Monograph No. 15.
  41. McDonald, R.P. (1982). Linear versus nonlinear models in item response theory. Applied Psychological Measurement, 6, 379–396.
  42. McDonald, R.P. (1997). Normal-ogive multidimensional model. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 257–269). New York, NY: Springer.
  43. Mellenbergh, G.J. (1994). Generalized linear item response theory. Psychological Bulletin, 115, 300–307.
  44. Meng, X.L., & Schilling, S.G. (1996). Fitting full-information item factor models and an empirical investigation of bridge sampling. Journal of the American Statistical Association, 91, 1254–1267.
  45. Mislevy, R.J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177–195.
  46. Mislevy, R.J., & Bock, R.D. (1990). PC-BILOG. Item analysis and test scoring with binary logistic models. Chicago, IL: Scientific Software International.
  47. Mislevy, R.J., & Wu, P.K. (1996). Missing responses and IRT ability estimation: Omits, choice, time limits and adaptive testing (ETS Research Reports RR-96-30-ONR). Princeton, NJ: Educational Testing Service.
  48. Molenaar, I.W. (1995). Estimation of item parameters. In G.H. Fischer & I.W. Molenaar (Eds.), Rasch models: Foundations, recent developments and applications (pp. 39–51). New York, NY: Springer.
  49. Patz, R.J., & Junker, B.W. (1999a). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24, 146–178.
  50. Patz, R.J., & Junker, B.W. (1999b). Applications and extensions of MCMC in IRT: Multiple item types, missing data, and rated responses. Journal of Educational and Behavioral Statistics, 24, 342–366.
  51. Reckase, M.D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401–412.
  52. Reckase, M.D. (1997). A linear logistic multidimensional model for dichotomous item response data. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of modern item response theory (pp. 271–286). New York, NY: Springer.
  53. Rasch, G. (1977). On specific objectivity: An attempt at formalizing the request for generality and validity of scientific statements. In M. Blegvad (Ed.), The Danish yearbook of philosophy (pp. 58–94). Copenhagen: Munksgaard.
  54. Reiser, M. (1996). Analysis of residuals for the multinomial item response model. Psychometrika, 61, 509–528.
  55. Rosenbaum, P.R. (1984). Testing the conditional independence and monotonicity assumptions of item response theory. Psychometrika, 49, 425–436.
  56. Rubin, D.B. (1976). Inference and missing data. Biometrika, 63, 581–592.
  57. Shi, J.Q., & Lee, S.Y. (1998). Bayesian sampling based approach for factor analysis models with continuous and polytomous data. British Journal of Mathematical and Statistical Psychology, 51, 233–252.
  58. Sijtsma, K. (1998). Methodology review: Nonparametric IRT approaches to the analysis of dichotomous item scores. Applied Psychological Measurement, 22, 3–32.
  59. Stout, W.F. (1987). A nonparametric approach for assessing latent trait dimensionality. Psychometrika, 52, 589–617.
  60. Stout, W.F. (1990). A new item response theory modeling approach with applications to unidimensional assessment and ability estimation. Psychometrika, 55, 293–326.
  61. Thurstone, L.L. (1947). Multiple factor analysis. Chicago, IL: University of Chicago Press.
  62. Wainer, H., Bradlow, E.T., & Du, Z. (2000). Testlet response theory: An analog for the 3pl model useful in testlet-based adaptive testing. In W.J. van der Linden & C.A.W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245–269). Boston, MA: Kluwer Academic Publishers.
  63. Wilson, D.T., Wood, R., & Gibbons, R. (1991). TESTFACT: Test scoring, item statistics, and item factor analysis [Computer program]. Chicago, IL: Scientific Software International.
  64. Yen, W.M. (1981). Using simulation results to choose a latent trait model. Applied Psychological Measurement, 5, 245–262.
  65. Yen, W.M. (1984). Effects of local item dependence on the fit and equating performance of the three-parameter logistic model. Applied Psychological Measurement, 8, 125–145.
  66. Zimowski, M.F., Muraki, E., Mislevy, R.J., & Bock, R.D. (1996). Bilog MG: Multiple-group IRT analysis and test maintenance for binary items. Chicago, IL: Scientific Software International.

Copyright information

© The Psychometric Society 2001

Authors and Affiliations

  • A. A. Béguin (1)
  • C. A. W. Glas (1)
  1. University of Twente, The Netherlands
