Abstract
This study compared the performance of three methods for detecting differential item functioning (DIF) in simulated hierarchical data: the hierarchical generalized linear model (HGLM), the multiple-indicators multiple-causes (MIMIC) method, and the IRT likelihood ratio (IRT-LR) test. The simulation conditions varied the number of clusters, the cluster size, and the intraclass correlation coefficient (ICC). The methods were compared in terms of Type I error rates, which should be close to 0.05 when the level of significance is set at 0.05. Results show that the HGLM maintained marginal Type I error control. The MIMIC model controlled the Type I error rate better than the other two methods when cluster sizes were small. When cluster size and the intraclass correlation ρ increased, however, the Type I error rates increased as well. The IRT-LR test maintained marginal Type I error control for small cluster sizes but failed to do so for larger cluster sizes.
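As a rough illustration of the evaluation criterion described above, the following sketch computes an empirical Type I error rate from the p-values obtained for a DIF-free item across simulation replications, together with its Monte Carlo standard error. The function names and the p-values are hypothetical and are not taken from the study; the sketch does not implement the HGLM, MIMIC, or IRT-LR procedures themselves.

```python
import math


def empirical_type1_error(p_values, alpha=0.05):
    """Proportion of replications in which a DIF-free item is flagged."""
    if not p_values:
        raise ValueError("p_values must not be empty")
    rejections = sum(1 for p in p_values if p < alpha)
    return rejections / len(p_values)


def monte_carlo_se(rate, n_replications):
    """Binomial standard error of an estimated rejection rate."""
    return math.sqrt(rate * (1.0 - rate) / n_replications)


# Hypothetical p-values from 10 replications (illustrative only;
# a real simulation study would use many more replications).
p_values = [0.51, 0.03, 0.77, 0.24, 0.92, 0.08, 0.66, 0.35, 0.049, 0.18]

rate = empirical_type1_error(p_values)
se = monte_carlo_se(rate, len(p_values))
print(f"empirical Type I error rate = {rate:.2f} (SE = {se:.3f})")
```

An empirical rate that lies well above the nominal 0.05, relative to its standard error, would indicate inflated Type I error for the method and condition in question.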
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Ong, M.L., Lu, L., Lee, S., Cohen, A. (2015). A Comparison of the Hierarchical Generalized Linear Model, Multiple-Indicators Multiple-Causes, and the Item Response Theory-Likelihood Ratio Test for Detecting Differential Item Functioning. In: Millsap, R., Bolt, D., van der Ark, L., Wang, WC. (eds) Quantitative Psychology Research. Springer Proceedings in Mathematics & Statistics, vol 89. Springer, Cham. https://doi.org/10.1007/978-3-319-07503-7_22
DOI: https://doi.org/10.1007/978-3-319-07503-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07502-0
Online ISBN: 978-3-319-07503-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)