A Comparison of the Hierarchical Generalized Linear Model, Multiple-Indicators Multiple-Causes, and the Item Response Theory-Likelihood Ratio Test for Detecting Differential Item Functioning

  • Conference paper
Quantitative Psychology Research

Part of the book series: Springer Proceedings in Mathematics & Statistics (PROMS, volume 89)

Abstract

The purpose of this study was to compare the differential item functioning (DIF) detection performance of the hierarchical generalized linear model (HGLM), the multiple-indicators multiple-causes (MIMIC) method, and the IRT likelihood ratio (IRT-LR) test in simulated hierarchical data. Conditions in the simulation study included the number of clusters, cluster size, and the intraclass correlation coefficient (ICC). The methods were compared in terms of Type I error rates, which should be close to 0.05 when the level of significance is set at 0.05. Results show that the HGLM maintained the marginal Type I error rate. The MIMIC model controlled the Type I error rate better than the other two methods when cluster sizes were small. When cluster size and the intraclass correlation ρ increased, however, the Type I error rates increased as well. The IRT-LR test maintained marginal Type I error control for small cluster sizes but failed to do so for larger cluster sizes.
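The evaluation criterion named in the abstract, an empirical Type I error rate compared against the nominal 0.05 level, can be illustrated with a minimal Python sketch. Everything below is an assumption for illustration rather than the authors' procedure: the function names, the number of replications, the normal-approximation band around the nominal rate, and the uniformly distributed p-values that stand in for a well-calibrated test applied to DIF-free items.

```python
import math
import random


def empirical_type1_rate(p_values, alpha=0.05):
    """Proportion of DIF-free replications that a test flags as significant."""
    if not p_values:
        raise ValueError("p_values must not be empty")
    return sum(p < alpha for p in p_values) / len(p_values)


def nominal_band(alpha, n_reps, z=1.96):
    """Approximate 95% sampling band around the nominal rate.

    Uses the normal approximation to the binomial; this band is an
    illustrative assumption, not a criterion taken from the paper.
    """
    half_width = z * math.sqrt(alpha * (1.0 - alpha) / n_reps)
    return max(0.0, alpha - half_width), min(1.0, alpha + half_width)


# Hypothetical stand-in data: under a true null hypothesis, a well-calibrated
# test yields p-values that are uniform on [0, 1].
rng = random.Random(0)
n_reps = 1000
p_values = [rng.random() for _ in range(n_reps)]

rate = empirical_type1_rate(p_values, alpha=0.05)
low, high = nominal_band(0.05, n_reps)
print(f"empirical rate = {rate:.3f}, band = [{low:.3f}, {high:.3f}]")
```

In an actual study the p-values would come from fitting each DIF method to data simulated without DIF, once per replication and condition. A rate well above the band would indicate inflated Type I error of the kind the abstract reports for larger cluster sizes.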


Corresponding author

Correspondence to Mei Ling Ong.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ong, M.L., Lu, L., Lee, S., Cohen, A. (2015). A Comparison of the Hierarchical Generalized Linear Model, Multiple-Indicators Multiple-Causes, and the Item Response Theory-Likelihood Ratio Test for Detecting Differential Item Functioning. In: Millsap, R., Bolt, D., van der Ark, L., Wang, WC. (eds) Quantitative Psychology Research. Springer Proceedings in Mathematics & Statistics, vol 89. Springer, Cham. https://doi.org/10.1007/978-3-319-07503-7_22
