Abstract
Purpose of Review
In this review, I trace the origins, applications, limitations, and future prospects of research on measurement item bias, or differential item functioning (DIF), in health research. DIF arises when multiple-item or multiple-symptom health instruments are used to rate the level of a particular condition. It describes the situation in which persons at the same level of the underlying condition, but belonging to different groups, do not all have the same probability of endorsing one or more symptoms. The presence of DIF can bias the assessment of group differences and confound risk factor and outcomes research.
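To make the definition concrete, the sketch below assumes a two-parameter logistic (2PL) item response model. This is one common way, though not the only one, to link a latent condition level (theta) to the probability of endorsing each symptom. All item parameters and the size of the DIF shift are hypothetical. At the same theta, the DIF-free items have identical endorsement probabilities in both groups, while the item given an assumed difficulty shift does not. As a result, the groups' expected symptom counts differ even though their underlying condition level is the same.

```python
import math

# Hypothetical 5-symptom scale: (discrimination a, difficulty b) for each item.
ITEMS = [(1.2, -1.0), (1.0, -0.5), (1.5, 0.0), (0.8, 0.5), (1.1, 1.0)]
# Assumed uniform DIF: item 3 (index 2) is harder to endorse in the focal group.
FOCAL_SHIFT = {2: 0.8}


def p_endorse(theta, a, b):
    """2PL item response function: P(endorse | theta) = 1 / (1 + exp(-a(theta - b)))."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def endorsement_probs(theta, focal):
    """Endorsement probability for every item at latent level theta."""
    probs = []
    for j, (a, b) in enumerate(ITEMS):
        shift = FOCAL_SHIFT.get(j, 0.0) if focal else 0.0
        probs.append(p_endorse(theta, a, b + shift))
    return probs


theta = 0.0  # identical level of the underlying condition in both groups
ref = endorsement_probs(theta, focal=False)
foc = endorsement_probs(theta, focal=True)
for j, (p_ref, p_foc) in enumerate(zip(ref, foc), start=1):
    print(f"item {j}: reference={p_ref:.3f}  focal={p_foc:.3f}")

# Expected symptom count (sum score) at the same theta: the gap comes only from DIF.
gap = sum(ref) - sum(foc)
print(f"expected sum-score gap at theta={theta}: {gap:.3f}")
```

Only item 3 behaves differently, yet an analyst comparing raw symptom counts would see the focal group scoring lower at the same condition level. This is the sense in which DIF can bias group comparisons.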
Recent Findings
The epidemiologic literature includes many applied, review, and methodological articles on DIF. Most of this literature concerns health-related quality of life, physical functioning, cognition, and mental health outcomes.
Summary
Epidemiologists and other health researchers often rely on multiple-item rating scales or questionnaires to assess the presence or level of health conditions or states that are not otherwise directly observable. When population subgroups respond differently to a subset of the items, even at the same level of the underlying condition, this is referred to as differential item functioning (DIF) and might be a source of bias.
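As an illustration of how such subgroup differences can be screened for, the sketch below implements the Mantel–Haenszel common odds ratio. This is a long-established item-level DIF statistic that compares groups within strata of the total score. It is one of several detection approaches and is not presented as the method this review recommends. The data are simulated. The six item difficulties, the Rasch-type response model, the sample size, and the size of the DIF effect are all hypothetical. The names simulate and mantel_haenszel_or are illustrative helpers, not functions from any package.

```python
import math
import random


def simulate(n_per_group, dif_shift, seed):
    """Simulate binary responses to a hypothetical 6-item Rasch-type scale.

    Both groups share a standard normal latent distribution (no true group
    difference). Item 1 (index 0) is made harder by `dif_shift` logits for the
    focal group. Returns (group, responses) pairs; group 0 = reference, 1 = focal.
    """
    rng = random.Random(seed)
    difficulties = [-1.0, -0.5, 0.0, 0.0, 0.5, 1.0]
    data = []
    for group in (0, 1):
        for _ in range(n_per_group):
            theta = rng.gauss(0.0, 1.0)
            responses = []
            for j, b in enumerate(difficulties):
                if group == 1 and j == 0:
                    b += dif_shift  # assumed uniform DIF on the studied item
                p = 1.0 / (1.0 + math.exp(-(theta - b)))
                responses.append(1 if rng.random() < p else 0)
            data.append((group, responses))
    return data


def mantel_haenszel_or(data, item):
    """Mantel-Haenszel common odds ratio for one item, matching on total score.

    In each total-score stratum: A/B = reference endorsers/non-endorsers,
    C/D = focal endorsers/non-endorsers, N = A+B+C+D.
    OR_MH = sum(A*D/N) / sum(B*C/N). Values near 1 suggest no uniform DIF;
    values above 1 mean the reference group endorses the item more often than
    focal-group members with the same total score.
    """
    strata = {}
    for group, responses in data:
        cell = strata.setdefault(sum(responses), [0, 0, 0, 0])
        endorsed = responses[item] == 1
        if group == 0:
            cell[0 if endorsed else 1] += 1
        else:
            cell[2 if endorsed else 3] += 1
    numerator = denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


with_dif = simulate(2000, dif_shift=1.0, seed=1)
without_dif = simulate(2000, dif_shift=0.0, seed=2)
or_dif = mantel_haenszel_or(with_dif, item=0)
or_null = mantel_haenszel_or(without_dif, item=0)
print(f"MH odds ratio, item 1 with simulated DIF:    {or_dif:.2f}")
print(f"MH odds ratio, item 1 without simulated DIF: {or_null:.2f}")
```

Under this Rasch-type setup, the statistic for the DIF item should be close to exp(1.0), roughly 2.7, and the DIF-free replication should be close to 1. Applied analyses typically add significance tests and effect-size criteria, which this sketch omits.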
Notes
Interested readers can type “net from http://s3.amazonaws.com/mplusmimicbucket” at the Stata command line and install our Stata module mplusmimic, which automates the Mplus-based MIMIC and multiple-group confirmatory factor analysis DIF detection algorithm.
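The internals of mplusmimic are not described here. As general background only, a MIMIC (multiple indicators, multiple causes) model for DIF is often written in the form below. In this form, y*_ij is the continuous latent response underlying binary item j for person i, η_i is the latent condition level, g_i is a group indicator, and τ_j is the item threshold. This is a common textbook parameterization and an assumption on our part; the exact model fitted by the module may differ.

```latex
\begin{aligned}
y^{*}_{ij} &= \lambda_j \eta_i + \beta_j g_i + \varepsilon_{ij}, \\
\eta_i &= \gamma g_i + \zeta_i, \\
y_{ij} &= \begin{cases} 1 & \text{if } y^{*}_{ij} > \tau_j, \\ 0 & \text{otherwise.} \end{cases}
\end{aligned}
```

Here γ captures the true group difference in the underlying condition. A nonzero direct effect β_j indicates uniform DIF in item j, because group members with the same η_i then have different probabilities of endorsing that item. Detecting nonuniform DIF requires more than this basic form, for example an η_i × g_i interaction term or a multiple-group model in which λ_j is allowed to differ across groups.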
Ethics declarations
Conflict of Interest
R.N.J. declares no potential conflict of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the Topical Collection on Epidemiologic Methods
Cite this article
Jones, R.N. Differential Item Functioning and its Relevance to Epidemiology. Curr Epidemiol Rep 6, 174–183 (2019). https://doi.org/10.1007/s40471-019-00194-5
DOI: https://doi.org/10.1007/s40471-019-00194-5