Quality & Quantity, 42:739

Influence of equal or unequal comparison group sample sizes on the detection of differential item functioning using the Mantel–Haenszel and logistic regression techniques

  • Aura-Nidia Herrera
  • Juana Gómez


In recent decades several methods have been developed for detecting differential item functioning (DIF), and many studies have aimed to identify both the conditions under which these methods may or may not be adequate and the factors which affect their power and Type I error. This paper describes a Monte Carlo experiment carried out to analyse the effect of reference group sample size, focal group sample size and their interaction on the power and Type I error of the Mantel–Haenszel (MH) and logistic regression (LR) procedures. The data were generated using a three-parameter logistic model; the design was a fully crossed factorial with 12 experimental conditions arising from the crossing of the two main factors; and the dependent variables were power and the rate of false positives, each calculated across 100 replications. The results enabled the significant factors to be identified and the two statistics to be compared. Practical recommendations are made regarding use of the procedures by psychologists interested in the development and analysis of psychological tests.
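As a rough illustration of the kind of experiment described above, the following Python sketch generates dichotomous responses under a three-parameter logistic model for a reference and a focal group of unequal size, then computes the Mantel–Haenszel chi-square for one studied item. It is not the authors' code. The item parameters, the sample sizes (500 and 200), the DIF magnitude (0.6 on the difficulty parameter), the scaling constant D = 1.7, the standard normal ability distribution and the use of the total score as matching variable are all assumptions chosen for illustration. The logistic regression procedure is not shown.

```python
import numpy as np

def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the three-parameter logistic model."""
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def simulate_responses(rng, n, a, b, c):
    """Draw an (n x items) matrix of 0/1 responses; ability is assumed N(0, 1)."""
    theta = rng.standard_normal(n)
    p = p_3pl(theta[:, None], a[None, :], b[None, :], c[None, :])
    return (rng.random(p.shape) < p).astype(int)

def mantel_haenszel_chi2(item, score, is_focal):
    """MH chi-square with continuity correction for one item, matching on score."""
    sum_a = sum_e = sum_v = 0.0
    for k in np.unique(score):
        in_k = score == k
        ref = in_k & ~is_focal
        n_r = ref.sum()
        n_f = (in_k & is_focal).sum()
        if n_r == 0 or n_f == 0:
            continue  # a stratum with only one group carries no information
        t = n_r + n_f
        m1 = item[in_k].sum()          # correct responses in the stratum
        m0 = t - m1                    # incorrect responses in the stratum
        sum_a += item[ref].sum()       # observed correct in the reference group
        sum_e += n_r * m1 / t          # expected value under no DIF
        sum_v += n_r * n_f * m1 * m0 / (t ** 2 * (t - 1))
    if sum_v == 0:
        return 0.0
    return (abs(sum_a - sum_e) - 0.5) ** 2 / sum_v

# Hypothetical test of 20 items; none of these values come from the paper.
rng = np.random.default_rng(0)
n_items = 20
a = rng.uniform(0.5, 1.5, n_items)
b = rng.standard_normal(n_items)
c = np.full(n_items, 0.2)
b_focal = b.copy()
b_focal[0] += 0.6                      # uniform DIF on item 0 (illustrative size)

ref_data = simulate_responses(rng, 500, a, b, c)
foc_data = simulate_responses(rng, 200, a, b_focal, c)
data = np.vstack([ref_data, foc_data])
is_focal = np.arange(len(data)) >= len(ref_data)

chi2 = mantel_haenszel_chi2(data[:, 0], data.sum(axis=1), is_focal)
print(f"MH chi-square for the studied item: {chi2:.2f}")
```

In a full Monte Carlo study this single run would be repeated many times per condition (the abstract reports 100 replications), with power estimated as the proportion of replications in which the DIF item is flagged and the Type I error rate as the proportion in which a DIF-free item is flagged.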


Differential item functioning, DIF, Mantel–Haenszel, Logistic regression, Sample size



Copyright information

© Springer Science + Business Media B.V. 2007

Authors and Affiliations

  1. Laboratorio de Psicometría, Universidad Nacional de Colombia, Ciudad Universitaria, Oficina, Colombia
  2. Departmento de Metodologia de les Ciences del Comportamento, Facultad de Psicologia, Universitat de Barcelona, Barcelona, Spain
