A Comparison of the Hierarchical Generalized Linear Model, Multiple-Indicators Multiple-Causes, and the Item Response Theory-Likelihood Ratio Test for Detecting Differential Item Functioning

Ong, Mei Ling; Lu, Laura; Lee, Sunbok; Cohen, Allan

doi:10.1007/978-3-319-07503-7_22

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 89)

Abstract

The purpose of this study was to compare the differential item functioning (DIF) detection performance of the hierarchical generalized linear model (HGLM), the multiple-indicators multiple-causes (MIMIC) method, and the item response theory likelihood ratio (IRT-LR) test in simulated hierarchical data. Conditions in the simulation study included the number of clusters, cluster size, and the intraclass correlation coefficient (ICC). The methods were compared in terms of Type I error rates, which should be close to 0.05 when the level of significance is set at 0.05. Results showed that the HGLM maintained a marginal Type I error rate. The MIMIC model controlled the Type I error rate better than the other two methods when cluster sizes were small. When cluster size and the intraclass correlation ρ increased, however, the Type I error rates increased as well. The IRT-LR test maintained marginal Type I error control for small cluster sizes but failed to do so for larger cluster sizes.
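The comparison criterion described in the abstract, an empirical Type I error rate judged against the nominal 0.05 level, can be illustrated with a minimal sketch. It is not the authors' code: the function names, the p-values, and the normal-approximation band used to judge "close to 0.05" are all assumptions made for illustration.

```python
import math


def empirical_type1_rate(p_values, alpha=0.05):
    """Proportion of replications in which a DIF-free item was flagged."""
    if not p_values:
        raise ValueError("need at least one replication")
    rejections = sum(1 for p in p_values if p < alpha)
    return rejections / len(p_values)


def sampling_band(alpha=0.05, n_reps=100, z=1.96):
    """Approximate band around alpha expected from n_reps replications.

    Uses the binomial standard error sqrt(alpha * (1 - alpha) / n_reps);
    this criterion is an assumption, not one stated in the abstract.
    """
    se = math.sqrt(alpha * (1 - alpha) / n_reps)
    return (max(0.0, alpha - z * se), min(1.0, alpha + z * se))


# Hypothetical p-values for one DIF-free item across 8 replications.
p_values = [0.62, 0.03, 0.48, 0.91, 0.20, 0.07, 0.33, 0.75]
rate = empirical_type1_rate(p_values)  # 1 rejection out of 8
low, high = sampling_band(n_reps=len(p_values))
print(rate, low <= rate <= high)
```

In an actual simulation study the list of p-values would come from fitting each DIF method to every replicated data set within a condition (number of clusters, cluster size, ICC), and the band narrows as the number of replications grows.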

Author information

Authors and Affiliations

Quantitative Methods, Department of Education Psychology, University of Georgia, 126H Aderhold Hall, Athens, GA, 30602, USA
Mei Ling Ong
Department of Education Psychology, University of Georgia, 325V Aderhold Hall, Athens, GA, 30602, USA
Laura Lu
Center for Family Research, 1095 College Station Rd., Athens, GA, 30602, USA
Sunbok Lee
Department of Education Psychology, University of Georgia, 125 Aderhold Hall, Athens, GA, 30602, USA
Allan Cohen

Corresponding author

Correspondence to Mei Ling Ong.

Editor information

Editors and Affiliations

Department of Psychology, Arizona State University, Tempe, Arizona, USA
Roger E. Millsap
Dept. of Educational Psychology, University of Wisconsin, Madison, USA
Daniel M. Bolt
University of Amsterdam, Amsterdam, The Netherlands
L. Andries van der Ark
Department of Psychological Studies, The Hong Kong Institute of Education, Hong Kong, Hong Kong SAR
Wen-Chung Wang

About this paper

Cite this paper

Ong, M.L., Lu, L., Lee, S., Cohen, A. (2015). A Comparison of the Hierarchical Generalized Linear Model, Multiple-Indicators Multiple-Causes, and the Item Response Theory-Likelihood Ratio Test for Detecting Differential Item Functioning. In: Millsap, R., Bolt, D., van der Ark, L., Wang, WC. (eds) Quantitative Psychology Research. Springer Proceedings in Mathematics & Statistics, vol 89. Springer, Cham. https://doi.org/10.1007/978-3-319-07503-7_22

DOI: https://doi.org/10.1007/978-3-319-07503-7_22
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-07502-0
Online ISBN: 978-3-319-07503-7
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)