Score-Based Tests of Differential Item Functioning via Pairwise Maximum Likelihood Estimation
Measurement invariance is a fundamental assumption in item response theory models, where the relationship between a latent construct (ability) and observed item responses is of interest. Violation of this assumption can lead to misinterpretation of the scale or to systematic bias against certain groups of persons. A number of methods have been proposed to detect measurement invariance violations. However, they typically require the problematic item parameters and the grouping of respondents to be specified in advance, and this information is usually unknown in practice. As an alternative, this paper focuses on a family of recently proposed tests based on stochastic processes of casewise derivatives of the likelihood function (i.e., scores). These score-based tests require estimation of only the null model (in which measurement invariance is assumed to hold). They have previously been applied to factor-analytic models for continuous data as well as to models of the Rasch family. In this paper, we aim to extend these tests to two-parameter item response models, with a strong emphasis on pairwise maximum likelihood estimation. The tests' theoretical background and implementation are detailed, and the tests' ability to identify problematic item parameters is studied via simulation. An empirical example illustrating the tests' use in practice is also provided.
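To make the idea of a score-based test concrete, the following is a minimal Python sketch of the generic procedure: evaluate each case's score at the null-model estimates, order the cases by an auxiliary variable, form the decorrelated cumulative score process, and summarize it with a double-maximum statistic whose per-parameter maxima point to the unstable parameter. It rests on loud assumptions. The model is a toy one (independent Bernoulli items with logit-scale item parameters and no latent trait), chosen only because its maximum likelihood estimates and casewise scores have closed forms. It is not the paper's two-parameter model or its pairwise maximum likelihood estimator. All function names, the auxiliary variable `age`, and the simulated data are hypothetical.

```python
import math

import numpy as np


def casewise_scores(y):
    """Casewise scores for a hypothetical toy model (NOT the paper's 2PL):
    independent Bernoulli items with logit-scale item parameters b_j.

    The MLE is p_hat_j = mean of column j, and the derivative of case i's
    log-likelihood with respect to b_j, evaluated at the MLE, is y_ij - p_hat_j.
    """
    y = np.asarray(y, dtype=float)
    return y - y.mean(axis=0)  # each column sums to zero at the MLE


def score_process(scores, order_by):
    """Decorrelated cumulative score process; row i corresponds to t = (i + 1) / n."""
    n = scores.shape[0]
    s = scores[np.argsort(order_by, kind="stable")]  # order cases by the auxiliary variable
    info = s.T @ s / n  # outer-product-of-gradients estimate of the information matrix
    vals, vecs = np.linalg.eigh(info)  # info is symmetric, so eigh applies
    info_inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
    return (np.cumsum(s, axis=0) / math.sqrt(n)) @ info_inv_sqrt


def kolmogorov_cdf(x):
    """P(sup_t |B(t)| <= x) for a standard Brownian bridge B."""
    if x <= 0:
        return 0.0
    if x < 1:  # this series converges quickly for small x
        return math.sqrt(2 * math.pi) / x * sum(
            math.exp(-((2 * m - 1) ** 2) * math.pi ** 2 / (8 * x * x)) for m in range(1, 50)
        )
    # alternating series, converges quickly for x >= 1
    return 1.0 - 2.0 * sum((-1) ** (m - 1) * math.exp(-2 * m * m * x * x) for m in range(1, 50))


def double_max_test(scores, order_by):
    """Double-maximum statistic: max over parameters of max over t of |B_j(t)|."""
    proc = score_process(scores, order_by)
    per_param = np.abs(proc).max(axis=0)  # peak fluctuation of each parameter
    stat = float(per_param.max())
    # Under H0 the k decorrelated components behave like independent Brownian bridges.
    p_value = 1.0 - kolmogorov_cdf(stat) ** proc.shape[1]
    return stat, p_value, per_param


# Hypothetical usage: 1000 simulated respondents, 4 items, DIF in item index 2 only.
rng = np.random.default_rng(1)
n = 1000
age = np.arange(n)  # hypothetical auxiliary variable used to order the cases
p = np.full((n, 4), 0.5)
p[: n // 2, 2] = 0.2  # item 2 is harder for the younger half...
p[n // 2 :, 2] = 0.8  # ...and easier for the older half
y = (rng.random((n, 4)) < p).astype(int)

stat, p_value, per_param = double_max_test(casewise_scores(y), age)
print(f"dmax = {stat:.2f}, p = {p_value:.2g}, most unstable item = {per_param.argmax()}")
```

Only the null model is fitted here, matching the property highlighted in the abstract. The per-parameter maxima in `per_param` indicate which item parameter drives the violation, without the groups or the affected item being specified beforehand. In the paper's setting, the closed-form scores above would be replaced by casewise scores of the two-parameter model under pairwise maximum likelihood.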
Keywords: pairwise maximum likelihood; score-based test; item response theory; differential item functioning