Psychometrika, Volume 58, Issue 2, pp 159–194

A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF

  • Robin Shealy
  • William Stout


A model-based modification of the standardization index (SIBTEST), grounded in a multidimensional IRT model of bias, is presented that detects and estimates DIF or item bias simultaneously for several items. A distinction between DIF and bias is proposed. SIBTEST detects bias/DIF without the Type I error inflation usually caused by group differences in target ability. In simulations, SIBTEST performs comparably to Mantel-Haenszel in the one-item case. SIBTEST also investigates bias/DIF for several items at the test score level (multiple-item DIF, called differential test functioning: DTF), thereby allowing the study of test bias/DIF, in particular bias/DIF amplification or cancellation and the cognitive bases for bias/DIF.
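As a rough illustration only (this is not the authors' implementation, and it omits SIBTEST's key regression correction for group target-ability differences), the conditional comparison underlying standardization-style indices can be sketched as: examinees are grouped by their score on a matching (valid) subtest, and mean performance on the studied subtest is compared between reference and focal groups within each score level, then pooled. All data and names below are hypothetical.

```python
import numpy as np

def sib_beta(matching, studied, group):
    """Weighted between-group difference in mean studied-subtest score,
    conditioned on matching-subtest score. This is the basic
    standardization-style index; SIBTEST refines it with a regression
    correction not shown here."""
    matching = np.asarray(matching)
    studied = np.asarray(studied, dtype=float)
    group = np.asarray(group)  # True = reference group, False = focal group
    beta, total_weight = 0.0, 0
    for k in np.unique(matching):
        ref = studied[(matching == k) & group]
        foc = studied[(matching == k) & ~group]
        if len(ref) == 0 or len(foc) == 0:
            continue  # score level unrepresented in one group; skip it
        w = len(ref) + len(foc)  # weight by combined count at score level k
        beta += w * (ref.mean() - foc.mean())
        total_weight += w
    return beta / total_weight

# Hypothetical toy data: matching-subtest scores, 0/1 studied-item scores,
# and group membership flags.
matching = [1, 1, 1, 1, 2, 2, 2, 2]
studied  = [1, 1, 0, 1, 1, 0, 0, 0]
group    = [True, True, False, False, True, True, False, False]
print(sib_beta(matching, studied, group))  # → 0.5
```

A positive value indicates the reference group outperforms the matched focal group on the studied subtest; without the regression correction, this simple index can still show Type I error inflation when the groups differ in target ability.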

Key words

item bias, test bias, DIF, differential test functioning, DTF, SIB, SIBTEST, standardization, simultaneous item bias, valid subtest, bias/DIF, Mantel-Haenszel


References


  1. Ackerman, T. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional IRT perspective. Journal of Educational Measurement, 29, 67–91.
  2. Ackerman, T. (1992, April). Assessing construct validity using multidimensional item response theory. Paper presented at the 1992 AERA/NCME joint meeting, San Francisco, CA.
  3. Ansley, T. N., & Forsyth, R. A. (1985). An examination of the characteristics of unidimensional IRT parameter estimates derived from two-dimensional data. Applied Psychological Measurement, 9, 37–48.
  4. Dorans, N. J. (1992, November). Implications in choice of metric for DIF effect size on decisions about DIF. Paper presented at the 1991 International Symposium on Modern Theories in Measurement, Montebello, Quebec.
  5. Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization approach to assessing unexpected differential item performance on the Scholastic Aptitude Test. Journal of Educational Measurement, 23, 355–368.
  6. Drasgow, F. (1987). A study of measurement bias of two standard psychological tests. Journal of Applied Psychology, 72, 19–30.
  7. Fraser, C. (1983). NOHARM II: A Fortran program for fitting unidimensional and multidimensional normal ogive models of latent trait theory (Technical Report). University of New England, Australia.
  8. Hambleton, R. K., & Rogers, H. J. (1989). Detecting potentially biased test items: Comparison of IRT area and Mantel-Haenszel methods. Applied Measurement in Education, 2, 313–334.
  9. Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff Publishing.
  10. Holland, P. W., & Thayer, D. T. (1988). Differential item functioning and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129–145). Hillsdale, NJ: Lawrence Erlbaum.
  11. Kok, F. (1988). Item bias and test multidimensionality. In R. Langeheine & J. Rost (Eds.), Latent trait and latent class models (pp. 263–275). New York: Plenum Press.
  12. Lautenschlager, G., & Park, D. (1988). IRT item bias detection procedures: Issues of model misspecification, robustness, and parameter linking. Applied Psychological Measurement, 12, 365–376.
  13. Linn, R., Levine, M., Hastings, C., & Wardrop, J. (1981). Item bias on a test of reading comprehension. Applied Psychological Measurement, 5, 159–173.
  14. Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
  15. Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
  16. Mellenbergh, G. J. (1982). Contingency table methods for assessing item bias. Journal of Educational Statistics, 7, 105–118.
  17. Meredith, W., & Millsap, R. E. (1992). On the misuse of manifest variables in the detection of measurement bias. Psychometrika, 57, 289–311.
  18. Millsap, R. E., & Meredith, W. (1989, July). The detection of DIF: Why there is no free lunch. Paper presented at the Annual Meeting of the Psychometric Society, University of California at Los Angeles.
  19. Mislevy, R. J., & Bock, R. D. (1984). Item operating characteristics of the Armed Services Vocational Aptitude Battery (ASVAB), Form 8A (Tech. Rep. N00014-83-C-0283). Washington, DC: Office of Naval Research.
  20. Nandakumar, R. (in press). Simultaneous DIF amplification and cancellation: Shealy-Stout's test for DIF. Journal of Educational Measurement.
  21. Raju, N. S., van der Linden, W. J., & Fleer, P. J. (1992, April). An IRT-based internal measure of test bias with applications for differential item functioning. Paper presented at the 1992 AERA meeting, San Francisco, CA.
  22. Reckase, M. D. (1992, April). Mathematics test item formats versus the skill being assessed: A brief review. Paper presented at the 1992 NCME meeting, San Francisco, CA.
  23. Roussos, L. (1993). Simulation studies of effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel Type I error performance (Technical Report). Champaign, IL: University of Illinois.
  24. Shealy, R. T. (1989). An item response theory-based statistical procedure for detecting concurrent internal bias in ability tests. Unpublished doctoral dissertation, Department of Statistics, University of Illinois, Urbana-Champaign.
  25. Shealy, R. T., & Stout, W. F. (1991a). An item response theory model for test bias (Tech. Rep. 1991-#2). Washington, DC: Office of Naval Research.
  26. Shealy, R. T., & Stout, W. F. (1991b). A procedure to detect test bias present simultaneously in several items (Tech. Rep. 1991-#3). Washington, DC: Office of Naval Research.
  27. Shealy, R. T., & Stout, W. F. (1993). An item response theory model for test bias and differential test functioning. In P. Holland & H. Wainer (Eds.), Differential item functioning (pp. 197–240). Hillsdale, NJ: Erlbaum.
  28. Stout, W. F. (1987). A nonparametric approach for assessing latent trait unidimensionality. Psychometrika, 52, 589–617.
  29. Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361–370.
  30. Wainer, H. (1993). Model-based standardized measurement of an item's differential impact. In P. Holland & H. Wainer (Eds.), Differential item functioning: Theory and practice (pp. 123–136). Hillsdale, NJ: Erlbaum.
  31. Zwick, R. (1990). When do item response function and Mantel-Haenszel definitions of differential item functioning coincide? Journal of Educational Statistics, 15, 185–197.

Copyright information

© The Psychometric Society 1993

Authors and Affiliations

  • Robin Shealy (1)
  • William Stout (1)
  1. Department of Statistics, University of Illinois at Urbana-Champaign, USA
