A Comparison of Differential Item Functioning (DIF) Detection for Dichotomously Scored Items Using IRTPRO, BILOG-MG, and IRTLRDIF

  • Conference paper
Quantitative Psychology Research

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 140)

Abstract

This study provides an empirical comparison of three IRT calibration programs, IRTPRO, BILOG-MG, and IRTLRDIF, all of which can be used to detect differential item functioning (DIF). The programs were compared under each of three dichotomous IRT models: the one-parameter, two-parameter, and three-parameter logistic models. Each program was applied to data from a test designed to predict high school graduation test results in a large Southeastern US state. Results suggested that the three programs did not detect DIF in the same way.
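As a reading aid, the three dichotomous models named in the abstract can be sketched with a single item response function. This is a minimal illustration and not the estimation procedure of IRTPRO, BILOG-MG, or IRTLRDIF. The function name `irf_3pl`, the parameter values, and the choice of the logistic metric without the 1.7 scaling constant are all assumptions made for the example. An item shows DIF when examinees of equal ability in two groups have different probabilities of a correct response, which is shown here as a shift in the difficulty parameter.

```python
import math

def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the three-parameter logistic model.

    a: discrimination, b: difficulty, c: lower asymptote (guessing).
    Setting c=0 gives the two-parameter model; setting c=0 and holding a
    equal across items gives the one-parameter model.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical parameters for one item, estimated separately in two groups.
# The item is harder (larger b) for the focal group: uniform DIF.
reference = {"a": 1.2, "b": 0.0, "c": 0.2}
focal = {"a": 1.2, "b": 0.5, "c": 0.2}

for theta in (-1.0, 0.0, 1.0):
    p_ref = irf_3pl(theta, **reference)
    p_foc = irf_3pl(theta, **focal)
    print(f"theta={theta:+.1f}  reference={p_ref:.3f}  focal={p_foc:.3f}  gap={p_ref - p_foc:.3f}")
```

A positive gap at every ability level means the item favors the reference group throughout the ability range. DIF procedures test whether such parameter differences between groups are statistically significant.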

Author information

Corresponding author

Correspondence to Mei Ling Ong.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ong, M.L., Kim, SH., Cohen, A., Cramer, S. (2015). A Comparison of Differential Item Functioning (DIF) Detection for Dichotomously Scored Items Using IRTPRO, BILOG-MG, and IRTLRDIF. In: van der Ark, L., Bolt, D., Wang, WC., Douglas, J., Chow, SM. (eds) Quantitative Psychology Research. Springer Proceedings in Mathematics & Statistics, vol 140. Springer, Cham. https://doi.org/10.1007/978-3-319-19977-1_10
