Differential item functioning and health assessment

Teresi, Jeanne A.; Fleishman, John A.

doi:10.1007/s11136-007-9184-6

Original Paper
Published: 19 April 2007

Volume 16, pages 33–42, (2007)
Quality of Life Research

Jeanne A. Teresi¹,³ & John A. Fleishman²

Abstract

Establishing measurement equivalence is important because inaccurate assessment may lead to incorrect estimates of effects in research, and to suboptimal decisions at the individual, clinical level. Examination of differential item functioning (DIF) is a method for studying measurement equivalence. An item (i.e., one question in a longer scale) exhibits DIF if the item response differs across groups (e.g., gender, race), controlling for an estimate of the construct being measured. Applications in health differ from those in other settings, such as educational and aptitude testing, in that there are many health-related constructs and multiple measures of each, few of which have received much critical evaluation. This article discusses several methods for detecting DIF, including non-parametric methods, parametric methods such as logistic regression, and methods based on item response theory. Basic definitions and criteria for DIF detection are provided, as are steps in performing the analyses. Recommendations are presented and future directions discussed.
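The definition of DIF given in the abstract can be illustrated with a small sketch. The abstract does not say which non-parametric method the article favors; the example below assumes the Mantel-Haenszel common odds ratio, one widely used non-parametric statistic, and assumes the total scale score as the estimate of the construct. The function name, the group labels, and the toy data are all hypothetical and are not taken from the article.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(responses, groups, item):
    """Mantel-Haenszel common odds ratio for one binary item.

    responses: list of per-person lists of 0/1 item scores.
    groups:    list of "reference" / "focal" labels, one per person.
    item:      index of the studied item.
    """
    # One 2x2 table per level of the matching variable (here the total score).
    # Cells: [ref endorsed, ref not endorsed, focal endorsed, focal not endorsed]
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for person, group in zip(responses, groups):
        offset = 0 if group == "reference" else 2
        cell = offset + (0 if person[item] == 1 else 1)
        strata[sum(person)][cell] += 1

    numerator = denominator = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio undefined: no informative strata")
    return numerator / denominator


# Toy data (made up for illustration): two items, eight people per group.
reference = [[1, 0]] * 3 + [[0, 1]] * 1 + [[1, 1]] * 2 + [[0, 0]] * 2
focal = [[1, 0]] * 1 + [[0, 1]] * 3 + [[1, 1]] * 2 + [[0, 0]] * 2
responses = reference + focal
groups = ["reference"] * len(reference) + ["focal"] * len(focal)

print(mantel_haenszel_odds_ratio(responses, groups, item=0))  # 9.0
```

Under this statistic, a value near 1 means that people matched on total score endorse the item at similar rates in both groups, which is the "controlling for an estimate of the construct" condition in the definition. A value well above or below 1 flags the item for possible DIF. In the toy data only the stratum with a total score of 1 is informative, and there the reference group endorses item 0 far more often than the focal group.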

Acknowledgements

The authors thank Douglas Holmes for his review of several versions of this manuscript. The authors also thank Paul Crane and two anonymous reviewers and the editor for their helpful comments related to an earlier version of this manuscript. These analyses were conducted on behalf of the Statistical Coordinating Center to the Patient Reported Outcomes Information System (PROMIS) (AR052177). Funding for analyses was provided in part by the National Institute on Aging, Resource Center for Minority Aging Research at Columbia University (AG15294), and by the National Cancer Institute through the Veteran’s Administration Measurement Excellence and Training Resource Information Center (METRIC). An earlier version of this paper was presented at the National Institutes of Health Conference on Patient Reported Outcomes, Bethesda, June 2004.

Author information

Authors and Affiliations

Research Division, Hebrew Home for the Aged at Riverdale, 5901 Palisade Avenue, Riverdale, NY, 10471, USA
Jeanne A. Teresi
Center for Financing, Access and Cost Trends, Agency for Healthcare Research and Quality, Rockville, MD, USA
John A. Fleishman
Columbia University Stroud Center and Faculty of Medicine, New York State Psychiatric Institute, New York, NY, USA
Jeanne A. Teresi

Corresponding author

Correspondence to Jeanne A. Teresi.

Additional information

The opinions expressed in this article are those of the authors. No official endorsement by AHRQ or the Department of Health and Human Services is intended or should be inferred.

About this article

Cite this article

Teresi, J.A., Fleishman, J.A. Differential item functioning and health assessment. Qual Life Res 16 (Suppl 1), 33–42 (2007). https://doi.org/10.1007/s11136-007-9184-6

Received: 25 August 2006
Accepted: 30 January 2007
Published: 19 April 2007
Issue Date: August 2007
DOI: https://doi.org/10.1007/s11136-007-9184-6
