Abstract
In this paper, it is shown that various violations of the 2-PL model and the nominal response model can be evaluated using the Lagrange multiplier test or the equivalent efficient score test. The tests presented here focus on violations of local stochastic independence and on item characteristic curves whose form is not adequately captured by the model. The tests are primarily item-oriented diagnostic tools, but taken together they also serve to evaluate global model fit. A useful feature of Lagrange multiplier statistics is that they are evaluated using maximum likelihood estimates of the null model only; that is, the parameters of the alternative models need not be estimated. As numerical examples, an application to real data and some power studies are presented.
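For orientation, the generic textbook form of a Lagrange multiplier statistic can be sketched as follows. The notation ($\eta$, $h$, $H$, $W$, $c$) is assumed here for illustration and is not taken from the abstract; the item-level statistics developed in the article are specific instances whose exact form is not reproduced here.

```latex
% Assumed notation: the parameters are partitioned as \eta = (\eta_1, \eta_2),
% and the null model fixes \eta_2 = c while \eta_1 is estimated freely.
\[
  h(\eta) = \frac{\partial \log L(\eta)}{\partial \eta},
  \qquad
  H(\eta) = -\frac{\partial^{2} \log L(\eta)}{\partial \eta \, \partial \eta^{\top}}
\]
% Both h and H are evaluated at the restricted estimate
% \hat\eta_0 = (\hat\eta_1, c), where h_1(\hat\eta_0) = 0.
\[
  \mathrm{LM} = h_2(\hat\eta_0)^{\top} \, W^{-1} \, h_2(\hat\eta_0),
  \qquad
  W = H_{22} - H_{21} H_{11}^{-1} H_{12}
\]
```

Under the null model, and under the usual regularity conditions, this statistic is asymptotically chi-square distributed with degrees of freedom equal to the number of restricted parameters in $\eta_2$. Only $\hat\eta_1$ has to be estimated, which is why the alternative model never needs to be fitted.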
Cite this article
Glas, C.A.W. Modification indices for the 2-PL and the nominal response model. Psychometrika 64, 273–294 (1999). https://doi.org/10.1007/BF02294296
DOI: https://doi.org/10.1007/BF02294296