A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression
Several techniques have been developed to detect differential item functioning (DIF), including ordinal logistic regression (OLR). This study compared different criteria for determining whether items have DIF using OLR.
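For orientation, OLR-based DIF detection is commonly framed as a comparison of nested cumulative-logit models. The formulation below is a generic sketch of that framework, not necessarily the exact specification used in this study. The symbols are assumptions introduced here: $Y_i$ is the ordinal item response, $\theta_i$ the trait estimate, $G_i$ a demographic group indicator, and $\alpha_k$ the category thresholds.

```latex
\begin{align}
\text{Model 1:}\quad \operatorname{logit} P(Y_i \le k) &= \alpha_k + \beta_1 \theta_i \\
\text{Model 2:}\quad \operatorname{logit} P(Y_i \le k) &= \alpha_k + \beta_1 \theta_i + \beta_2 G_i \\
\text{Model 3:}\quad \operatorname{logit} P(Y_i \le k) &= \alpha_k + \beta_1 \theta_i + \beta_2 G_i + \beta_3 \theta_i G_i
\end{align}
```

In this framing, a nonzero $\beta_2$ corresponds to uniform DIF and a nonzero $\beta_3$ to non-uniform DIF. Criteria differ in whether they judge these terms by statistical significance, by the magnitude of their effect, or by both.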
The objective was to compare and contrast findings from three different sets of criteria for detecting DIF using OLR. General distress and physical functioning items were evaluated for DIF related to five covariates: age, marital status, gender, race, and Hispanic origin.
Participants were 1,714 patients with cancer or HIV/AIDS.
A total of 23 items addressing physical functioning and 15 items addressing general distress were selected from a pool of 154 items from four different health-related quality of life questionnaires.
The three sets of criteria produced qualitatively and quantitatively different results. Criteria based on statistical significance alone detected DIF in almost all the items, while alternative criteria based on magnitude detected DIF in far fewer items. Accounting for DIF by using demographic-group-specific item parameters had negligible effects on individual scores, except for race.
Specific criteria chosen to determine whether items have DIF have an impact on the findings. Criteria based entirely on statistical significance may detect small differences that are clinically negligible.
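The contrast between significance-based and magnitude-based criteria can be illustrated with a minimal sketch. It assumes two nested OLR models have already been fitted for one item, one without and one with a group term. The function name, the 0.05 significance level, the 10% threshold on the relative change in the trait coefficient, and all numeric inputs are hypothetical illustrations, not the criteria or results of this study.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-square variate with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def uniform_dif_flags(ll_base, ll_group, beta_base, beta_group,
                      alpha=0.05, max_rel_change=0.10):
    """Apply two illustrative DIF criteria to one item.

    ll_base, ll_group     : log-likelihoods without / with the group term
    beta_base, beta_group : trait coefficient without / with the group term
    alpha, max_rel_change : assumed thresholds, chosen only for illustration
    """
    # Likelihood-ratio test for adding a single group term (1 df).
    lr_stat = 2.0 * (ll_group - ll_base)
    p_value = chi2_sf_1df(max(lr_stat, 0.0))
    # Magnitude: how much the trait coefficient shifts once group is included.
    rel_change = abs(beta_group - beta_base) / abs(beta_base)
    return {
        "lr_stat": lr_stat,
        "p_value": p_value,
        "rel_change": rel_change,
        "dif_by_significance": p_value < alpha,
        "dif_by_magnitude": rel_change > max_rel_change,
    }


# Hypothetical fit summaries for a single item in a large sample.
example = uniform_dif_flags(ll_base=-2100.0, ll_group=-2096.5,
                            beta_base=1.50, beta_group=1.53)
print(example)
```

With these made-up inputs, the likelihood-ratio test flags the item while the coefficient barely moves, which is the pattern the conclusion describes: in a large sample, a significance-only rule can flag differences too small to matter in practice.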
Keywords: Differential item functioning; Ordinal logistic regression; Test bias; Item response theory; Psychometrics