Abstract
We investigate the performance of three statistics, R₁, R₂ (Glas in Psychometrika 53:525–546, 1988), and M₂ (Maydeu-Olivares & Joe in J. Am. Stat. Assoc. 100:1009–1020, 2005; Psychometrika 71:713–732, 2006), for assessing the overall fit of a one-parameter logistic model (1PL) estimated by (marginal) maximum likelihood (ML). R₁ and R₂ were designed to target specific assumptions of Rasch models, whereas M₂ is a general-purpose test statistic. We report asymptotic power rates under three violations of model assumptions (unequal item discrimination, presence of guessing, and multidimensionality), as well as empirical rejection rates for correctly specified models and some misspecified models. All three statistics were found to be more powerful than Pearson’s X² against two- and three-parameter logistic alternatives (2PL and 3PL), and against multidimensional 1PL models. The results suggest that there is no clear advantage in using goodness-of-fit statistics specifically designed for Rasch-type models to test these models when marginal ML estimation is used.
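To make the first two violations concrete, the following minimal sketch writes the standard 3PL item response function in Python and recovers the 2PL and a 1PL-type curve as special cases. The function name `irf_3pl`, the parameter names, and the choice to fix the common discrimination at 1 are illustrative assumptions, not the authors' implementation.

```python
import math

def irf_3pl(theta, b, a=1.0, c=0.0):
    """Probability of a correct response given ability theta.

    b: item difficulty, a: item discrimination, c: guessing (lower asymptote).
    With a=1 and c=0 this is a 1PL-type curve (common discrimination fixed
    at 1 here purely for illustration); c=0 with item-specific a is the 2PL.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# At theta == b the logistic part equals 0.5.
p_1pl = irf_3pl(0.0, 0.0)            # 1PL-type item
p_3pl = irf_3pl(0.0, 0.0, c=0.2)     # guessing raises the lower asymptote
# A larger discrimination makes the curve steeper around b.
p_2pl_steep = irf_3pl(1.0, 0.0, a=2.0)
p_1pl_same_theta = irf_3pl(1.0, 0.0)
```

With guessing at 0.2, the probability at theta equal to b rises from 0.5 to 0.6, and with discrimination 2 the curve lies above the 1PL-type curve for abilities above b. These are the kinds of departures from the 1PL that the 2PL and 3PL alternatives represent.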
References
Agresti, A., & Yang, M. (1987). An empirical investigation of some effects of sparseness in contingency tables. Computational Statistics & Data Analysis, 5, 9–21.
Andersen, E.B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123–140.
Bartholomew, D.J., & Leung, S.O. (2002). A goodness of fit test for sparse 2^p contingency tables. British Journal of Mathematical & Statistical Psychology, 55, 1–15.
Bartholomew, D., & Tzamourani, P. (1999). The goodness of fit of latent trait models in attitude measurement. Sociological Methods & Research, 27, 525–546.
Bock, R.D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika, 46, 443–459.
Bock, R.D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored items. Psychometrika, 35, 179–197.
Cai, L., Maydeu-Olivares, A., Coffman, D.L., & Thissen, D. (2006). Limited information goodness of fit testing of item response theory models for sparse 2^p tables. British Journal of Mathematical & Statistical Psychology, 59, 173–194.
Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5–32.
De Leeuw, J., & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch models. Journal of Educational and Behavioral Statistics, 11, 183–196.
Fischer, G.H., & Molenaar, I.W. (Eds.) (1995). Rasch models: foundations, recent developments and applications. New York: Springer.
Glas, C.A.W. (1988). The derivation of some tests for the Rasch model from the multinomial distribution. Psychometrika, 53, 525–546.
Glas, C.A.W., & Verhelst, N.D. (1989). Extensions of the partial credit model. Psychometrika, 54, 635–659.
Glas, C.A.W., & Verhelst, N.D. (1995). Testing the Rasch model. In G.H. Fischer & I.W. Molenaar (Eds.), Rasch models: foundations, recent developments and applications (pp. 69–96). New York: Springer.
Glas, C.A.W. (2009). Personal communication.
Irtel, H. (1995). An extension of the concept of specific objectivity. Psychometrika, 60, 115–118.
Joe, H., & Maydeu-Olivares, A. (2010). A general family of limited information goodness-of-fit statistics for multinomial data. Psychometrika, 75, 393–419.
Jöreskog, K.G. (1994). On the estimation of polychoric correlations and their asymptotic covariance matrix. Psychometrika, 59, 381–389.
Jöreskog, K.G., & Moustaki, I. (2001). Factor analysis of ordinal variables: a comparison of three approaches. Multivariate Behavioral Research, 36, 347–387.
Koehler, K., & Larntz, K. (1980). An empirical investigation of goodness of fit statistics for sparse multidimensional tables. Journal of the American Statistical Association, 75, 336–344.
Kullback, S., & Leibler, R.A. (1951). On information and sufficiency. Annals of Mathematical Statistics, 22, 79–86.
Lord, F.M., & Novick, M.R. (1968). Statistical theories of mental test scores. Reading: Addison-Wesley.
Mathai, A.M., & Provost, S.B. (1992). Quadratic forms in random variables: theory and applications. New York: Marcel Dekker.
Maydeu-Olivares, A., & Joe, H. (2005). Limited and full information estimation and goodness-of-fit testing in 2^n tables: a unified approach. Journal of the American Statistical Association, 100, 1009–1020.
Maydeu-Olivares, A., & Joe, H. (2006). Limited information goodness-of-fit in multidimensional contingency tables. Psychometrika, 71, 713–732.
Maydeu-Olivares, A., & Joe, H. (2008). An overview of limited information goodness-of-fit testing in multidimensional contingency tables. In K. Shigemasu, A. Okada, T. Imaizumi, & T. Hoshino (Eds.), New trends in psychometrics (pp. 253–262). Tokyo: Universal Academy Press.
Maydeu-Olivares, A., & Liu, Y. (2012). Item diagnostics in multivariate discrete data. Manuscript under review.
Mavridis, D., Moustaki, I., & Knott, M. (2007). Goodness-of-fit measures for latent variable models for binary data. In S.-Y. Lee (Ed.), Handbook of latent variables and related models (pp. 135–162). Amsterdam: Elsevier.
McDonald, R.P. (1999). Test theory: a unified treatment. Mahwah: Lawrence Erlbaum.
Montaño, R. (2009). Una comparación de las estadísticas de bondad de ajuste R₁ y M₂ para modelos de la Teoría de Respuesta al Ítem [Comparing the R₁ and M₂ statistics for goodness of fit assessment in IRT models]. Unpublished Ph.D. dissertation, University of Barcelona.
Pfanzagl, J. (1993). A case of asymptotic equivalence between conditional and marginal maximum likelihood estimators. Journal of Statistical Planning and Inference, 35, 301–307.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Paedagogiske Institut.
Reiser, M. (1996). Analysis of residuals for the multinomial item response model. Psychometrika, 61, 509–528.
Reiser, M. (2008). Goodness-of-fit testing using components based on marginal frequencies of multinomial data. British Journal of Mathematical & Statistical Psychology, 61, 331–360.
Satorra, A., & Saris, W.E. (1985). Power of the likelihood ratio test in covariance structure analysis. Psychometrika, 50, 83–90.
Suárez-Falcón, J.C., & Glas, C.A.W. (2003). Evaluation of global testing procedures for item fit to the Rasch model. British Journal of Mathematical & Statistical Psychology, 56, 127–143.
Swaminathan, H., Hambleton, R.K., & Rogers, H.J. (2007). Assessing the fit of item response models. In C.R. Rao & S. Sinharay (Eds.), Handbook of statistics: Vol. 26. Psychometrics (pp. 683–718). Amsterdam: Elsevier.
Teugels, J.L. (1990). Some representations of the multivariate Bernoulli and binomial distributions. Journal of Multivariate Analysis, 32, 256–268.
Thissen, D. (1982). Marginal maximum likelihood estimation for the one-parameter logistic model. Psychometrika, 47, 175–186.
van den Wollenberg, A.L. (1982). Two new test statistics for the Rasch model. Psychometrika, 47, 123–139.
Additional information
This research was supported by an ICREA-Academia Award and Grant SGR 2009 74 from the Catalan Government, and by Grants PSI2009-07726 and PR2010-0252 from the Spanish Ministry of Education awarded to the first author, and by a Dissertation Research Award of the Society of Multivariate Experimental Psychology awarded to the second author. The authors are indebted to the reviewers and to David Thissen for comments that improved the manuscript.
Appendix
Relationship π_{R2} = T_{R2} π for n = 4 items
Cite this article
Maydeu-Olivares, A., Montaño, R. How Should We Assess the Fit of Rasch-Type Models? Approximating the Power of Goodness-of-Fit Statistics in Categorical Data Analysis. Psychometrika 78, 116–133 (2013). https://doi.org/10.1007/s11336-012-9293-1
DOI: https://doi.org/10.1007/s11336-012-9293-1