Volume 83, Issue 2, pp 376–386

Improving the Crossing-SIBTEST Statistic for Detecting Non-uniform DIF

  • R. Philip Chalmers


This paper demonstrates that, after applying a simple modification to Li and Stout’s (Psychometrika 61(4):647–677, 1996) CSIBTEST statistic, an improved variant of the statistic could be realized. It is shown that this modified version of CSIBTEST has a more direct association with the SIBTEST statistic presented by Shealy and Stout (Psychometrika 58(2):159–194, 1993). In particular, the asymptotic sampling distributions and general interpretation of the effect size estimates are the same for SIBTEST and the new CSIBTEST. Given the more natural connection to SIBTEST, it is shown that Li and Stout’s hypothesis testing approach is insufficient for CSIBTEST; thus, an improved hypothesis testing procedure is required. Based on the presented arguments, a new chi-squared-based hypothesis testing approach is proposed for the modified CSIBTEST statistic. Positive results from a modest Monte Carlo simulation study strongly suggest the original CSIBTEST procedure and randomization hypothesis testing approach should be replaced by the modified statistic and hypothesis testing method.


DIF, non-uniform DIF, bidirectional bias, SIBTEST, Crossing-SIBTEST


  1. Chalmers, R. P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. doi: 10.18637/jss.v048.i06.
  2. Chalmers, R. P. (2016). SimDesign: Structure for organizing Monte Carlo simulation designs. R package version 1.0.
  3. Chalmers, R. P., Counsell, A., & Flora, D. B. (2016). It might not make a big DIF: Improved differential test functioning statistics that account for sampling variability. Educational and Psychological Measurement, 76(1), 114–140. doi: 10.1177/0013164415584576.
  4. Chang, H.-H., Mazzeo, J., & Roussos, L. (1996). DIF for polytomously scored items: An adaptation of the SIBTEST procedure. Journal of Educational Measurement, 33(3), 333–353.
  5. Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23(4), 355–368.
  6. Edgington, E. S. (1987). Randomization tests. New York, NY: Marcel Dekker.
  7. Guttman, L. (1945). A basis for analyzing test–retest reliability. Psychometrika, 10, 255–282.
  8. Kuder, G. F., & Richardson, M. W. (1937). The theory of the estimation of test reliability. Psychometrika, 2, 151–160.
  9. Li, H.-H., & Stout, W. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61(4), 647–677.
  10. Lord, F. M., & Novick, M. R. (1968). Statistical theory of mental test scores. Reading, MA: Addison-Wesley.
  11. Shealy, R., & Stout, W. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58(2), 159–194.
  12. Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136–156. doi: 10.1080/10691898.2016.1246953.
  13. Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Lawrence Erlbaum.

Copyright information

© The Psychometric Society 2017

Authors and Affiliations

  1. Department of Educational Psychology, The University of Georgia, Athens, USA
