Differential Item Functioning and its Relevance to Epidemiology

  • Epidemiologic Methods (P Howards, Section Editor)
Current Epidemiology Reports

Abstract

Purpose of Review

In this review, I trace the origins, applications, limitations, and future prospects of research on measurement item bias, or differential item functioning (DIF), in health research. DIF arises when multiple-item or multiple-symptom health instruments are used to rate the level of a particular condition, and describes the situation in which persons at the same level of the underlying condition do not all have the same probability of endorsing one or more symptoms. The presence of DIF can bias the assessment of group differences and confound risk factor and outcomes research.

Recent Findings

The epidemiologic literature includes a great many applied, review, and methodological articles focusing on DIF. The preponderance of the literature appears in the areas of health-related quality of life, physical functioning, cognition, and mental health outcomes.

Summary

Epidemiologists and other researchers in the health sciences often rely on multiple-item rating scales or questionnaires to assess the presence or level of health conditions or states that are not otherwise directly observable. When population subgroups respond differently to a subset of the items, this is referred to as differential item functioning (DIF) and may be a source of bias.
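The definition of DIF given above can be illustrated with a small numerical sketch. The example below is not taken from the article: it assumes a two-parameter logistic item response function, a hypothetical five-item symptom scale, and an invented difficulty shift (`DIF_SHIFT`) on one item for a focal group. All item parameters and names are illustrative assumptions. It shows that two people at the same latent level can differ in their probability of endorsing the affected symptom, and that this carries through to the expected scale score.

```python
import math

def endorse_prob(theta, discrimination, difficulty):
    """Two-parameter logistic item response function (an assumed model)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

# Hypothetical scale: (discrimination, difficulty) for each of five items.
ITEMS = [(1.2, -1.0), (1.0, -0.5), (1.5, 0.0), (0.8, 0.5), (1.1, 1.0)]
DIF_ITEM = 2     # index of the one item assumed to show DIF
DIF_SHIFT = 0.7  # assumed extra difficulty of that item in the focal group

def expected_score(theta, focal):
    """Expected number of endorsed symptoms at latent level theta."""
    total = 0.0
    for index, (a, b) in enumerate(ITEMS):
        # Only the DIF item behaves differently in the focal group.
        shift = DIF_SHIFT if (focal and index == DIF_ITEM) else 0.0
        total += endorse_prob(theta, a, b + shift)
    return total

# Two people at the SAME level of the underlying condition.
theta = 0.0
a, b = ITEMS[DIF_ITEM]
p_reference = endorse_prob(theta, a, b)
p_focal = endorse_prob(theta, a, b + DIF_SHIFT)

# Gap in expected scale score that is due to DIF alone, not to the condition.
score_gap = expected_score(theta, focal=False) - expected_score(theta, focal=True)

print(f"P(endorse | reference) = {p_reference:.3f}")
print(f"P(endorse | focal)     = {p_focal:.3f}")
print(f"Expected score gap     = {score_gap:.3f}")
```

Because both people share the same latent level, any gap in the expected score here is an artifact of the item, which is the sense in which DIF can distort comparisons between groups. The shift used is a uniform (difficulty-only) form of DIF; a difference in discrimination would be a non-uniform form.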

Notes

  1. Interested readers can type “net from http://s3.amazonaws.com/mplusmimicbucket” in Stata and install our module mplusmimic, which automates the Mplus/MIMIC and multiple-group confirmatory factor analysis DIF detection algorithms.

Author information

Corresponding author

Correspondence to Richard N. Jones.

Ethics declarations

Conflict of Interest

R.N.J. declares no potential conflict of interest.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Epidemiologic Methods

About this article

Cite this article

Jones, R.N. Differential Item Functioning and its Relevance to Epidemiology. Curr Epidemiol Rep 6, 174–183 (2019). https://doi.org/10.1007/s40471-019-00194-5

  • DOI: https://doi.org/10.1007/s40471-019-00194-5
