A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression

Crane, Paul K.; Gibbons, Laura E.; Ocepek-Welikson, Katja; Cook, Karon; Cella, David; Narasimhalu, Kaavya; Hays, Ron D.; Teresi, Jeanne A.

doi:10.1007/s11136-007-9185-5

Original Paper
Published: 07 June 2007

Quality of Life Research, Volume 16, pages 69–84 (2007)

Paul K. Crane¹,
Laura E. Gibbons¹,
Katja Ocepek-Welikson²,
Karon Cook^3,4,
David Cella^5,6,
Kaavya Narasimhalu¹,
Ron D. Hays^7,8 &
Jeanne A. Teresi^9,10,11

Abstract

Background

Several techniques have been developed to detect differential item functioning (DIF), including ordinal logistic regression (OLR). This study compared different criteria for determining whether items have DIF using OLR.

Objectives

To compare and contrast findings from three different sets of criteria for detecting DIF using OLR. General distress and physical functioning items were evaluated for DIF related to five covariates: age, marital status, gender, race, and Hispanic origin.

Research design

Cross-sectional study.

Subjects

1,714 patients with cancer or HIV/AIDS.

Measures

A total of 23 items addressing physical functioning and 15 items addressing general distress were selected from a pool of 154 items from four different health-related quality of life questionnaires.

Results

The three sets of criteria produced qualitatively and quantitatively different results. Criteria based on statistical significance alone detected DIF in almost all the items, while alternative criteria based on magnitude detected DIF in far fewer items. Accounting for DIF by using demographic group-specific item parameters had negligible effects on individual scores, except for race.

Conclusions

The specific criteria chosen to determine whether items have DIF affect the findings. Criteria based entirely on statistical significance may detect small differences that are clinically negligible.
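
The contrast reported in the Results can be illustrated with a minimal sketch. It is not the authors' analysis: it uses one simulated binary item (the two-category special case of OLR), the simulated trait itself as the matching variable, and two hypothetical decision rules chosen only for illustration (a likelihood ratio test at p < 0.05, and a 10% proportional change in the trait coefficient when the group term is added). All names, effect sizes, and thresholds below are assumptions.

```python
import math
import random


def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Newton-Raphson fit; returns (coefficients, log-likelihood)."""
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for row, yi in zip(X, y):
            eta = sum(b * x for b, x in zip(beta, row))
            p = 1.0 / (1.0 + math.exp(-eta))
            w = p * (1.0 - p)
            for a in range(k):
                grad[a] += (yi - p) * row[a]
                for c in range(k):
                    hess[a][c] += w * row[a] * row[c]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, row))
        p = 1.0 / (1.0 + math.exp(-eta))
        loglik += math.log(p) if yi == 1 else math.log(1.0 - p)
    return beta, loglik


# Simulated data (all values are illustrative assumptions):
# a large sample with a small uniform DIF effect of 0.2 logits.
rng = random.Random(0)
n = 20000
group = [i % 2 for i in range(n)]
theta = [rng.gauss(0.3 * g, 1.0) for g in group]
y = [
    1 if rng.random() < 1.0 / (1.0 + math.exp(-(1.5 * t + 0.2 * g))) else 0
    for t, g in zip(theta, group)
]

# Nested models: item ~ theta, then item ~ theta + group.
b1, ll1 = fit_logistic([[1.0, t] for t in theta], y)
b2, ll2 = fit_logistic([[1.0, t, float(g)] for t, g in zip(theta, group)], y)

# Rule A (significance): likelihood ratio test with 1 degree of freedom.
lr_stat = max(2.0 * (ll2 - ll1), 0.0)
p_value = math.erfc(math.sqrt(lr_stat / 2.0))  # chi-square(1) upper tail
flag_significance = p_value < 0.05

# Rule B (magnitude): proportional change in the trait coefficient.
rel_change = abs(b2[1] - b1[1]) / abs(b1[1])
flag_magnitude = rel_change > 0.10

print("LR p-value:", p_value, "flagged:", flag_significance)
print("relative change:", rel_change, "flagged:", flag_magnitude)
```

With a sample this large, the significance rule is expected to flag the item even though the simulated group effect is small, while the magnitude rule is not. This mirrors the qualitative pattern described in the abstract without reproducing the study's actual criteria or data.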

Acknowledgments

Some of these analyses were presented at the conference Advances in Health Outcomes Measurement: Exploring the Current State and the Future Applications of Item Response Theory, Item Banks, and Computerized-Adaptive Testing, June 24–25, 2004, in Bethesda, Maryland.

Author information

Authors and Affiliations

Department of Internal Medicine, Harborview Medical Center, University of Washington, 325 Ninth Avenue, Box 359780, Seattle, WA, 98104, USA
Paul K. Crane, Laura E. Gibbons & Kaavya Narasimhalu
Research Division, Hebrew Home for the Aged at Riverdale, 5901 Palisade Ave., Riverdale, NY, 10471, USA
Katja Ocepek-Welikson
Department of Rehabilitation Medicine, University of Washington, Seattle, WA, USA
Karon Cook
801 Cortlandt St., Houston, TX, 77007, USA
Karon Cook
Psychiatry and Behavioral Science, Institute for Healthcare Studies, Northwestern University, Evanston, IL, USA
David Cella
Center on Outcomes, Research and Education, Evanston Northwestern Healthcare, 1001 University Place, Suite 100, Evanston, IL, 60201, USA
David Cella
Health Services and Medicine, UCLA, 911 Broxton Avenue, Los Angeles, CA, 90095-1736, USA
Ron D. Hays
RAND, Santa Monica, CA, USA
Ron D. Hays
Columbia University Stroud Center and Faculty of Medicine, New York State Psychiatric Institute, New York, USA
Jeanne A. Teresi
Research Division, Hebrew Home for the Aged at Riverdale, Bronx, NY, USA
Jeanne A. Teresi
Stroud Center for the Quality of Life, 100 Haven Ave. Tower III 30-F, New York, NY, 10032, USA
Jeanne A. Teresi

Corresponding author

Correspondence to Paul K. Crane.

Additional information

Sources of support: Data collection was supported by grant R01 CA 60068 (Cella). Support for these analyses for Drs. Crane and Gibbons and Ms. Narasimhalu was provided by grant K08 AG 022232 from the National Institute on Aging (Crane). Dr. Gibbons was also supported by grant P50 AG 05136 from the National Institute on Aging (Murray Raskind). Dr. Teresi was supported by the Columbia University Resource Center for Minority Aging Research (AG 15294) and the Statistical Coordinating Center for the Patient Reported Outcomes Measurement Information System (PROMIS), NIH Roadmap Project (AR 052177). Dr. Hays was also supported by the UCLA/DREW Project EXPORT, National Institutes of Health, National Center on Health & Health Disparities (P20-MD00148-01), and the UCLA Center for Health Improvement in Minority Elders/Resource Centers for Minority Aging Research, National Institutes of Health, National Institute on Aging (AG-02-004).

About this article

Cite this article

Crane, P.K., Gibbons, L.E., Ocepek-Welikson, K. et al. A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression. Qual Life Res 16 (Suppl 1), 69–84 (2007). https://doi.org/10.1007/s11136-007-9185-5

Received: 26 August 2006
Accepted: 29 January 2007
Published: 07 June 2007
Issue Date: August 2007
DOI: https://doi.org/10.1007/s11136-007-9185-5
