Abstract
This study provides an empirical comparison of three IRT calibration programs (IRTPRO, BILOG-MG, and IRTLRDIF), all of which can be used to detect differential item functioning (DIF). The programs were compared under each of three dichotomous IRT models: the one-parameter, two-parameter, and three-parameter logistic models. Results from each program were examined using data from a test designed to predict high school graduation test results in a large Southeastern US state. The results suggested that the three programs detected DIF differently from one another.
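For readers less familiar with the three dichotomous models named above, the following is a minimal sketch of their item response functions, not the estimation procedure of any of the three programs. It uses the standard textbook logistic form without the 1.7 scaling constant; the function names and all parameter values are hypothetical and chosen only for illustration. The last lines show what DIF looks like at the item level: examinees of equal ability in two groups have different probabilities of a correct response.

```python
import math

def irf_3pl(theta, a=1.0, b=0.0, c=0.0):
    """Probability of a correct response under the three-parameter logistic model.

    theta: examinee ability; a: discrimination; b: difficulty; c: lower asymptote.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def irf_2pl(theta, a, b):
    """Two-parameter logistic model: the 3PL with the lower asymptote fixed at 0."""
    return irf_3pl(theta, a=a, b=b, c=0.0)

def irf_1pl(theta, b):
    """One-parameter logistic model: the 2PL with discrimination fixed at 1."""
    return irf_3pl(theta, a=1.0, b=b, c=0.0)

# Hypothetical illustration of uniform DIF: the same item is harder
# (larger b) for the focal group than for the reference group.
theta = 0.0
p_reference = irf_2pl(theta, a=1.2, b=0.0)
p_focal = irf_2pl(theta, a=1.2, b=0.5)
dif_gap = p_reference - p_focal  # positive: the item favors the reference group
```

Each program compared in the study tests whether such between-group differences in item parameters are statistically distinguishable from zero; the sketch only illustrates the quantities being compared.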
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ong, M.L., Kim, SH., Cohen, A., Cramer, S. (2015). A Comparison of Differential Item Functioning (DIF) Detection for Dichotomously Scored Items Using IRTPRO, BILOG-MG, and IRTLRDIF. In: van der Ark, L., Bolt, D., Wang, WC., Douglas, J., Chow, SM. (eds) Quantitative Psychology Research. Springer Proceedings in Mathematics & Statistics, vol 140. Springer, Cham. https://doi.org/10.1007/978-3-319-19977-1_10
DOI: https://doi.org/10.1007/978-3-319-19977-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19976-4
Online ISBN: 978-3-319-19977-1
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)