Psychometrika, Volume 64, Issue 3, pp 273–294

Modification indices for the 2-PL and the nominal response model

  • Cees A. W. Glas

Abstract

In this paper, it is shown that various violations of the 2-PL model and the nominal response model can be evaluated using the Lagrange multiplier test or the equivalent efficient score test. The tests presented here focus on violations of local stochastic independence and on inadequate modeling of the form of the item characteristic curves. Primarily, the tests are item-oriented diagnostic tools, but taken together they also serve to evaluate global model fit. A useful feature of Lagrange multiplier statistics is that they are evaluated using maximum likelihood estimates of the null model only, that is, the parameters of alternative models need not be estimated. As numerical examples, an application to real data and several power studies are presented.
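The claim that an LM (efficient score) statistic needs only the null-model estimates can be illustrated with a small sketch. It is a deliberately simplified, hypothetical setup and not the procedure of the paper: abilities θ are treated as known (the paper works with marginal maximum likelihood, where θ is integrated out), and local dependence between item i and another item j is modelled, as an illustrative assumption, by adding δ·x_j to the logit of item i. Under H0: δ = 0 only the 2-PL parameters (c, a) of item i are estimated; the statistic is LM = U_δ² / (I_δδ − I_δβ I_ββ⁻¹ I_βδ), with U_δ the score for δ and I the information matrix, both evaluated at the null estimates, and LM is referred to a χ² distribution with one degree of freedom. All function names are hypothetical.

```python
import math
import random


def logistic(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_null_2pl_item(theta, x, iters=50, tol=1e-10):
    """Newton-Raphson ML fit of logit P(x=1) = c + a*theta, theta treated as known.

    Returns (c, a); in 2-PL notation a is the discrimination and b = -c / a.
    """
    c, a = 0.0, 1.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, xn in zip(theta, x):
            p = logistic(c + a * t)
            w = p * (1.0 - p)
            g0 += xn - p          # score for c
            g1 += (xn - p) * t    # score for a
            h00 += w
            h01 += w * t
            h11 += w * t * t
        det = h00 * h11 - h01 * h01
        # Newton step: solve I * step = score with the 2x2 information matrix I.
        dc = (h11 * g0 - h01 * g1) / det
        da = (h00 * g1 - h01 * g0) / det
        c, a = c + dc, a + da
        if abs(dc) + abs(da) < tol:
            break
    return c, a


def lm_local_dependence(theta, x, z):
    """Score (LM) test of H0: delta = 0 in logit P = c + a*theta + delta*z.

    Only the null model is fitted; the alternative is never estimated.
    """
    c, a = fit_null_2pl_item(theta, x)
    u = 0.0  # score for delta at the null estimates
    i00 = i01 = i11 = i0z = i1z = izz = 0.0
    for t, xn, zn in zip(theta, x, z):
        p = logistic(c + a * t)
        w = p * (1.0 - p)
        u += (xn - p) * zn
        i00 += w
        i01 += w * t
        i11 += w * t * t
        i0z += w * zn
        i1z += w * t * zn
        izz += w * zn * zn
    det = i00 * i11 - i01 * i01
    # I_db I_bb^{-1} I_bd for the 2x2 nuisance block (c, a).
    proj = (i11 * i0z * i0z - 2.0 * i01 * i0z * i1z + i00 * i1z * i1z) / det
    lm = u * u / (izz - proj)
    p_value = math.erfc(math.sqrt(lm / 2.0))  # upper tail of chi-square(1)
    return lm, p_value


def simulate(n, delta, seed=0):
    """Hypothetical data: item j is a plain 2-PL item, item i depends on x_j via delta."""
    rng = random.Random(seed)
    theta, x, z = [], [], []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        xj = 1 if rng.random() < logistic(1.2 * t) else 0
        xi = 1 if rng.random() < logistic(1.0 * t + 0.5 + delta * xj) else 0
        theta.append(t)
        x.append(xi)
        z.append(xj)
    return theta, x, z


if __name__ == "__main__":
    for delta in (0.0, 1.5):
        lm, p = lm_local_dependence(*simulate(2000, delta, seed=1))
        print(f"delta={delta}: LM={lm:.2f}, p={p:.4g}")
```

Because the alternative model is never fitted, the same null estimates can be reused to test many alternatives (other item pairs, other functional forms), which is what makes such statistics convenient as item-level modification indices.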

Key words

efficient score test; generalized partial credit model; item response theory; model fit; modification indices; 2-parameter logistic model; nominal response model; Lagrange multiplier test



Copyright information

© The Psychometric Society 1999

Authors and Affiliations

  • Cees A. W. Glas, Department of Educational Measurement and Data Analysis, University of Twente, The Netherlands
