Abstract
Introduction
Differential item functioning (DIF) analyses are commonly used to evaluate health-related quality of life (HRQoL) instruments. There is, however, a lack of consensus as to how to assess the practical impact of statistically significant DIF results.
Methods
Using our previously published ordinal logistic regression DIF results for the Fatigue scale of an HRQoL instrument as an example, the practical impact on a particular Norwegian clinical trial was investigated. The results were used to determine the difference in mean Fatigue scores that would be expected had the same trial been conducted in the UK. The results were then compared with published information on what would be considered a clinically important change in scores.
Results
The item with the largest DIF effect resulted in differences between the mean English and Norwegian Fatigue scores that, although small, could be considered clinically important. Sensitivity analyses showed that larger differences were found for shorter scales, and when the proportions in each response category were equal.
Discussion
Our scenarios suggest that translation differences in an item can result in small, but clinically important, differences at the scale score level. This is more likely to be problematic for observational studies than for clinical trials, where randomised groups are stratified by centre.
Additional information
On behalf of the EORTC Quality of Life Group and the Quality of Life Cross-Cultural Meta-Analysis Group.
Appendix
Coding the four response categories 0, 1, 2 and 3, respectively, the following three equations apply when conducting ordinal logistic regression using the proportional odds model:
where LANG is the language/translation (0 for English, 1 for Norwegian), FA is the overall Fatigue scale score and adjustment is also made for age (AGE) and other covariates.
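For reference, a proportional odds model consistent with these definitions can be sketched as below. This is a reconstruction under assumptions, not necessarily the exact form used by the authors: the threshold intercepts α_j and the coefficient names β2 and β3 are assumed, and the cumulative logit is taken for scoring in category j or above, a convention chosen because it agrees with the direction of the results reported later (a negative β1 corresponding to lower scores for Norwegian speakers).

```latex
\operatorname{logit} P(Y \ge j)
  = \ln\frac{P(Y \ge j)}{P(Y < j)}
  = \alpha_j + \beta_1\,\mathrm{LANG} + \beta_2\,\mathrm{FA} + \beta_3\,\mathrm{AGE} + \dots,
  \qquad j = 1, 2, 3
```

Under this form the language effect β1 is the same for all three thresholds, which is the proportional odds assumption.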
These equations may then be written out separately for Norwegian (NO) and English (EN) speakers:
Combining the two equations and rearranging gives
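A sketch of this step, assuming the cumulative logit is written for scoring in category j or above, with threshold intercepts α_j and covariate coefficients β2, β3 (all assumed notation): setting LANG = 1 for Norwegian and LANG = 0 for English and subtracting removes every term except β1.

```latex
\begin{aligned}
\operatorname{logit} P_{\mathrm{NO}}(Y \ge j) &= \alpha_j + \beta_1 + \beta_2\,\mathrm{FA} + \beta_3\,\mathrm{AGE} + \dots \\
\operatorname{logit} P_{\mathrm{EN}}(Y \ge j) &= \alpha_j + \beta_2\,\mathrm{FA} + \beta_3\,\mathrm{AGE} + \dots \\[4pt]
\operatorname{logit} P_{\mathrm{EN}}(Y \ge j) &= \operatorname{logit} P_{\mathrm{NO}}(Y \ge j) - \beta_1 \\[4pt]
P_{\mathrm{EN}}(Y \ge j) &= \frac{1}{1 + \exp\!\bigl[-\bigl(\operatorname{logit} P_{\mathrm{NO}}(Y \ge j) - \beta_1\bigr)\bigr]},
  \qquad j = 1, 2, 3
\end{aligned}
```

Equivalently, the English cumulative odds are the Norwegian cumulative odds multiplied by exp(−β1), so a negative β1 implies higher odds of scoring in the upper categories for English speakers.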
From the results of our DIF analyses for Q18, the estimate of β1 was found to be −1.089. From Table 2, for this particular study:
Substituting these values into formulae [3] gives:
Hence the proportions choosing each category can be deduced, assuming that the same study was conducted using English-speaking patients:
By comparison with the proportions in Table 2, this would mean that English speakers would be more likely to score highly on this item than Norwegian speakers.
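The conversion from Norwegian to English category proportions can be sketched in Python as follows. This is a minimal illustration that assumes the cumulative logit is defined for scoring in category j or above; the function name `english_proportions` is hypothetical, and the Norwegian proportions used are made up for demonstration and are not the Table 2 values.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def expit(x):
    """Inverse of the logit function."""
    return 1.0 / (1.0 + math.exp(-x))

def english_proportions(norwegian_props, beta1):
    """Category proportions expected for English speakers.

    norwegian_props: proportions in categories 0..3 (should sum to 1).
    beta1: language coefficient, with LANG = 1 for Norwegian.
    """
    n = len(norwegian_props)
    # Cumulative probabilities P(Y >= j) for j = 1, ..., n-1.
    cum_no = [sum(norwegian_props[j:]) for j in range(1, n)]
    # Remove the language effect on the logit scale.
    cum_en = [expit(logit(p) - beta1) for p in cum_no]
    # Successive differences of the cumulative probabilities give categories.
    bounds = [1.0] + cum_en + [0.0]
    return [bounds[j] - bounds[j + 1] for j in range(n)]

# Hypothetical Norwegian proportions (NOT the Table 2 values).
props_no = [0.10, 0.40, 0.35, 0.15]
props_en = english_proportions(props_no, beta1=-1.089)
print([round(p, 3) for p in props_en])
```

With a negative β1 the English proportions shift towards the higher categories, which is the direction described in the text.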
Assuming that Q18 is the only item with true DIF, how would this affect the mean Fatigue scale scores of the patients in this study? Using scores of 0, 33.33, 66.67 and 100 for the four categories of Q18, the average scores for this item for Norwegian and English speakers would be:
Therefore, Norwegian speakers would be expected to score on average 68.63 − 50.62 = 18.01 points lower on this item.
The Fatigue scale score is made up of three items of equal weighting, so for the overall scale score Norwegian speakers would be expected to score 18.01/3 = 6.00 points lower than English speakers on average.
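The arithmetic of this step can be checked with a short Python sketch. The two mean item scores are the values quoted in the text; the helper names `mean_item_score` and `scale_difference` are hypothetical and introduced only for illustration.

```python
# Category scores used for Q18 in the text (0-100 scaling).
CATEGORY_SCORES = [0.0, 33.33, 66.67, 100.0]

def mean_item_score(proportions, scores=CATEGORY_SCORES):
    """Expected item score given the proportion in each category."""
    return sum(p * s for p, s in zip(proportions, scores))

def scale_difference(item_difference, n_items):
    """Scale-level difference when one of n equally weighted items shows DIF."""
    return item_difference / n_items

# Mean Q18 scores quoted in the text for English and Norwegian speakers.
mean_en, mean_no = 68.63, 50.62
item_diff = mean_en - mean_no
fatigue_diff = scale_difference(item_diff, n_items=3)
print(round(item_diff, 2), round(fatigue_diff, 2))
```

Because the item difference is divided by the number of items, the same item-level DIF has a larger effect on a shorter scale; for example, `scale_difference(item_diff, n_items=2)` gives roughly 9 points.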
Table 1 shows that the 95% confidence limits for the language effect for Q18 (β1) were −1.271 and −0.908. Applying the same method to each confidence limit as for the estimate of β1 itself (working not shown) implies that the difference between English and Norwegian speakers on the Fatigue scale would be 6.00 points (95% CI: 5.03–6.95).
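The propagation of the confidence limits can be sketched by repeating the calculation at each limit of β1. The sketch below assumes the cumulative logit is defined for scoring in category j or above and uses hypothetical Norwegian proportions, not the Table 2 values, so it will not reproduce the interval quoted above; the function name `scale_difference` is likewise illustrative.

```python
import math

SCORES = [0.0, 33.33, 66.67, 100.0]  # category scores used for Q18

def scale_difference(props_no, beta1, n_items=3):
    """English-minus-Norwegian difference in the scale score implied by beta1."""
    # Score gained on crossing each category threshold.
    steps = [SCORES[j] - SCORES[j - 1] for j in range(1, len(SCORES))]
    # Norwegian cumulative probabilities P(Y >= j), j = 1..3.
    cum_no = [sum(props_no[j:]) for j in range(1, len(SCORES))]
    # English cumulative probabilities: subtract beta1 on the logit scale.
    cum_en = [1.0 / (1.0 + math.exp(-(math.log(p / (1.0 - p)) - beta1)))
              for p in cum_no]
    # The mean item score equals the sum over thresholds of step * P(Y >= j).
    item_diff = sum(s * (e - n) for s, e, n in zip(steps, cum_en, cum_no))
    return item_diff / n_items

props_no = [0.10, 0.40, 0.35, 0.15]  # hypothetical, NOT the Table 2 values
for label, beta1 in [("estimate", -1.089),
                     ("lower limit", -1.271),
                     ("upper limit", -0.908)]:
    print(label, round(scale_difference(props_no, beta1), 2))
```

The transformation is monotone in β1, so the more negative limit of β1 corresponds to the upper limit of the score difference, and the less negative limit to the lower one.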
Cite this article
Scott, N.W., Fayers, P.M., Aaronson, N.K. et al. The practical impact of differential item functioning analyses in a health-related quality of life instrument. Qual Life Res 18, 1125–1130 (2009). https://doi.org/10.1007/s11136-009-9521-z