Differential Item Functioning in Diagnostic Classification Models

  • Xue-Lan Qiu
  • Xiaomin Li
  • Wen-Chung Wang
Part of the Methodology of Educational Measurement and Assessment book series (MEMA)


Assessment of differential item functioning (DIF) in diagnostic classification models (DCMs) has begun to attract research attention. Previous studies found that DIF detection in DCMs appeared to be very powerful, even when most or all of the items on the studied test had DIF, and that no scale purification was necessary. This surprisingly good result rested on the unrealistic assumption that the model and the Q-matrix were equal across groups. The present study clarifies these weaknesses in previous studies, identifies various types of DIF, and proposes new DIF detection methods that are powerful in detecting DIF in DCMs. An illustrative simulation study was conducted to demonstrate the feasibility and advantages of the new methods. Finally, conclusions and suggestions for future studies are provided.


Differential item functioning; Scale purification; DIF-free-then-DIF strategy; Mantel-Haenszel method



This study was sponsored by the General Research Fund, Research Grants Council (No. 18604515).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. The University of Hong Kong, Pokfulam, Hong Kong