Quality of Life Research, 16:69

A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression

  • Paul K. Crane
  • Laura E. Gibbons
  • Katja Ocepek-Welikson
  • Karon Cook
  • David Cella
  • Kaavya Narasimhalu
  • Ron D. Hays
  • Jeanne A. Teresi
Original Paper

Abstract

Background

Several techniques have been developed to detect differential item functioning (DIF), including ordinal logistic regression (OLR). This study compared different criteria for determining whether items have DIF using OLR.
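As a rough sketch of how OLR is commonly set up for DIF detection (the general nested-model framework from the logistic regression DIF literature, not necessarily the exact specification used in this study), the ordinal item response is modeled with a cumulative logit. The notation here is assumed for illustration: Y is the item response, k a response category, θ the trait estimate, g a demographic group indicator, and α_k, β_1, β_2, β_3 the model parameters.

```latex
\begin{align}
\text{Model 1:}\quad & \operatorname{logit} P(Y \le k) = \alpha_k + \beta_1 \theta \\
\text{Model 2:}\quad & \operatorname{logit} P(Y \le k) = \alpha_k + \beta_1 \theta + \beta_2 g \\
\text{Model 3:}\quad & \operatorname{logit} P(Y \le k) = \alpha_k + \beta_1 \theta + \beta_2 g + \beta_3 \theta g
\end{align}
```

Under this setup, comparing Models 1 and 2 addresses uniform DIF (a group effect that is constant across trait levels), and comparing Models 2 and 3 addresses non-uniform DIF (a group effect that varies with the trait level).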

Objectives

To compare and contrast findings from three different sets of criteria for detecting DIF using OLR. General distress and physical functioning items were evaluated for DIF related to five covariates: age, marital status, gender, race, and Hispanic origin.

Research design

Cross-sectional study.

Subjects

1,714 patients with cancer or HIV/AIDS.

Measures

A total of 23 items addressing physical functioning and 15 items addressing general distress were selected from a pool of 154 items from four different health-related quality of life questionnaires.

Results

The three sets of criteria produced qualitatively and quantitatively different results. Criteria based on statistical significance alone detected DIF in almost all the items, while alternative criteria based on magnitude detected DIF in far fewer items. Accounting for DIF by using demographic-group-specific item parameters had negligible effects on individual scores, except for race.

Conclusions

The specific criteria chosen to determine whether items have DIF affect the findings. Criteria based entirely on statistical significance may detect small differences that are clinically negligible.
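The contrast between significance-based and magnitude-based criteria can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the 0.05 significance level, the 10% change-in-coefficient threshold, and the input numbers are hypothetical and are not taken from the study. The sketch applies a 1-degree-of-freedom likelihood-ratio test to two nested models and, separately, checks how much the trait coefficient changes when a group term is added.

```python
import math


def lr_test_p_value(loglik_reduced, loglik_full):
    """P-value of a 1-df likelihood-ratio test between two nested models."""
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    # For 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2.0))


def relative_change(beta_without_group, beta_with_group):
    """Relative change in the trait coefficient after adding the group term."""
    return abs(beta_with_group - beta_without_group) / abs(beta_without_group)


def flag_dif(loglik_reduced, loglik_full, beta_without_group, beta_with_group,
             alpha=0.05, change_threshold=0.10):
    """Apply a significance criterion and a magnitude criterion to one item.

    alpha and change_threshold are illustrative defaults, not values
    reported in the source.
    """
    p_value = lr_test_p_value(loglik_reduced, loglik_full)
    change = relative_change(beta_without_group, beta_with_group)
    return {
        "p_value": p_value,
        "relative_change": change,
        "significance_flag": p_value < alpha,
        "magnitude_flag": change > change_threshold,
    }


# Made-up inputs for a single item in a large sample: the group term
# improves the fit noticeably but barely moves the trait coefficient.
result = flag_dif(
    loglik_reduced=-2050.0,
    loglik_full=-2046.0,
    beta_without_group=1.50,
    beta_with_group=1.53,
)
print(result)
```

With these made-up inputs the item is flagged by the significance criterion but not by the magnitude criterion, which is the kind of disagreement the conclusions describe: in a large sample, a statistically detectable group effect can correspond to a very small change in the model.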

Keywords

Differential item functioning; Ordinal logistic regression; Test bias; Item response theory; Psychometrics

Copyright information

© Springer Science+Business Media B.V. 2007

Authors and Affiliations

  • Paul K. Crane (1)
  • Laura E. Gibbons (1)
  • Katja Ocepek-Welikson (2)
  • Karon Cook (3, 4)
  • David Cella (5, 6)
  • Kaavya Narasimhalu (1)
  • Ron D. Hays (7, 8)
  • Jeanne A. Teresi (9, 10, 11)

  1. Department of Internal Medicine, Harborview Medical Center, University of Washington, Seattle, USA
  2. Research Division, Hebrew Home for the Aged at Riverdale, Riverdale, USA
  3. Department of Rehabilitation Medicine, University of Washington, Seattle, USA
  4. Houston, USA
  5. Psychiatry and Behavioral Science, Institute for Healthcare Studies, Northwestern University, Evanston, USA
  6. Center on Outcomes, Research and Education, Evanston Northwestern Healthcare, Evanston, USA
  7. Health Services and Medicine, UCLA, Los Angeles, USA
  8. RAND, Santa Monica, USA
  9. Columbia University Stroud Center and Faculty of Medicine, New York State Psychiatric Institute, New York, USA
  10. Research Division, Hebrew Home for the Aged at Riverdale, Bronx, USA
  11. Stroud Center for the Quality of Life, New York, USA
