The practical impact of differential item functioning analyses in a health-related quality of life instrument

Quality of Life Research

Abstract

Introduction

Differential item functioning (DIF) analyses are commonly used to evaluate health-related quality of life (HRQoL) instruments. There is, however, a lack of consensus as to how to assess the practical impact of statistically significant DIF results.

Methods

Using our previously published ordinal logistic regression DIF results for the Fatigue scale of a HRQoL instrument as an example, the practical impact on a particular Norwegian clinical trial was investigated. The results were used to determine the difference in mean Fatigue scores assuming that the same trial was conducted in the UK. The results were then compared with published information on what would be considered a clinically important change in scores.

Results

The item with the largest DIF effect resulted in differences between the mean English and Norwegian Fatigue scores that, although small, could be considered clinically important. Sensitivity analyses showed that larger differences were found for shorter scales, and when the proportions in each response category were equal.

Discussion

Our scenarios suggest that translation differences in an item can result in small, but clinically important, differences at the scale score level. This is more likely to be problematic for observational studies than for clinical trials, where randomised groups are stratified by centre.

Author information

Corresponding author

Correspondence to Peter M. Fayers.

Additional information

On behalf of the EORTC Quality of Life Group and the Quality of Life Cross-Cultural Meta-Analysis Group.

Appendix

With the four response categories (not at all, a little, quite a bit, very much) coded 0, 1, 2 and 3, respectively, the following three equations (one for each threshold j) apply when conducting ordinal logistic regression using the proportional odds model:

$$ \text{logit}(\Pr(Y \ge j)) = \beta_{0_{j}} + \beta_{1}\,\text{LANG} + \beta_{2}\,\text{FA} + \beta_{3}\,\text{AGE} + \cdots ,\quad j = 1,2,3 $$
(1)

where LANG is the language/translation (0 for English, 1 for Norwegian), FA is the overall Fatigue scale score and adjustment is also made for age (AGE) and other covariates.

These equations may then be written out separately for Norwegian (NO) and English (EN) speakers:

$$ \begin{aligned} \text{logit}(\Pr(Y_{\text{NO}} \ge j)) &= \beta_{0_{j}} + \beta_{1} + \beta_{2}\,\text{FA} + \beta_{3}\,\text{AGE} + \cdots ,\quad j = 1,2,3 \\ \text{logit}(\Pr(Y_{\text{EN}} \ge j)) &= \beta_{0_{j}} + \beta_{2}\,\text{FA} + \beta_{3}\,\text{AGE} + \cdots ,\quad j = 1,2,3 \end{aligned} $$
(2)

Combining the two equations and rearranging gives

$$ \Pr(Y_{\text{EN}} \ge j) = \frac{1}{\left( \frac{1 - \Pr(Y_{\text{NO}} \ge j)}{\Pr(Y_{\text{NO}} \ge j)} \right) e^{\beta_{1}} + 1},\quad j = 1,2,3 $$
(3)
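
The rearrangement can be sketched as follows, writing p_NO and p_EN as shorthand (introduced here, not part of the original notation) for Pr(Y_NO ≥ j) and Pr(Y_EN ≥ j) at a fixed threshold j and fixed covariate values. Subtracting the English equation from the Norwegian one in (2) cancels every term except β1:

$$ \begin{aligned} \text{logit}(p_{\text{NO}}) - \text{logit}(p_{\text{EN}}) &= \beta_{1} \\ \frac{p_{\text{NO}}}{1 - p_{\text{NO}}} \cdot \frac{1 - p_{\text{EN}}}{p_{\text{EN}}} &= e^{\beta_{1}} \\ \frac{1 - p_{\text{EN}}}{p_{\text{EN}}} &= \left( \frac{1 - p_{\text{NO}}}{p_{\text{NO}}} \right) e^{\beta_{1}} \\ p_{\text{EN}} &= \frac{1}{\left( \frac{1 - p_{\text{NO}}}{p_{\text{NO}}} \right) e^{\beta_{1}} + 1} \end{aligned} $$

A negative β1 therefore gives e^{β1} < 1, so p_EN exceeds p_NO at every threshold.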

From the results of our DIF analyses for Q18, the estimate of β1 was found to be −1.089. From Table 2, for this particular study:

$$ \begin{aligned} &{\Pr (Y_{\text{NO}} \ge 1) = 450/513 = 0.877}\\ &{\Pr (Y_{\text{NO}} \ge 2) = 242/513 = 0.472} \\ &{\Pr (Y_{\text{NO}} \ge 3) = 87/513\;\;= 0.170} \end{aligned} $$
(4)

Substituting these values into Eq. (3) gives:

$$ \begin{aligned} &\Pr(Y_{\text{EN}} \ge 1) = 0.955 \\ &\Pr(Y_{\text{EN}} \ge 2) = 0.726 \\ &\Pr(Y_{\text{EN}} \ge 3) = 0.378 \end{aligned} $$
(5)
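
The substitution can be checked numerically. The following is a minimal sketch, not the authors' code: the function name `shift_cumulative` is hypothetical, while the value of β1 and the counts are those quoted in the text and in Eq. (4).

```python
import math

def shift_cumulative(p_no, beta):
    """Apply Eq. (3): map Pr(Y_NO >= j) to Pr(Y_EN >= j) for language effect beta."""
    odds_against = (1.0 - p_no) / p_no
    return 1.0 / (odds_against * math.exp(beta) + 1.0)

beta1 = -1.089                             # estimate for Q18 quoted in the text
p_no = [450 / 513, 242 / 513, 87 / 513]    # Pr(Y_NO >= j), j = 1, 2, 3, from Eq. (4)

p_en = [shift_cumulative(p, beta1) for p in p_no]

for j, p in enumerate(p_en, start=1):
    print(f"Pr(Y_EN >= {j}) = {p:.3f}")
# roughly 0.955, 0.726 and 0.378, matching Eq. (5)
```

Unrounded fractions are used for the Norwegian probabilities here, so later decimals may differ slightly from a calculation started from the three-decimal values.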

Hence the proportions choosing each category can be deduced, assuming that the same study was conducted using English-speaking patients:

$$ \begin{aligned} \Pr(\text{not at all}_{\text{EN}}) &= 1 - 0.955 = 0.045 \\ \Pr(\text{a little}_{\text{EN}}) &= 0.955 - 0.726 = 0.229 \\ \Pr(\text{quite a bit}_{\text{EN}}) &= 0.726 - 0.378 = 0.348 \\ \Pr(\text{very much}_{\text{EN}}) &= 0.378 \end{aligned} $$
(6)

By comparison with the proportions in Table 2, this would mean that English speakers would be more likely to score highly on this item than Norwegian speakers.

Assuming that Q18 is the only item with true DIF, how would this affect the mean Fatigue scale scores of the patients in this study? Using scores of 0, 33.33, 66.67 and 100 for the four categories of Q18, the average scores for this item for Norwegian and English speakers would be:

$$ \begin{aligned} \text{Norwegian: } & 0 \times 0.123 + 33.33 \times 0.406 + 66.67 \times 0.302 + 100 \times 0.170 = 50.62 \\ \text{English: } & 0 \times 0.045 + 33.33 \times 0.229 + 66.67 \times 0.348 + 100 \times 0.378 = 68.63 \end{aligned} $$
(7)

Therefore, Norwegian speakers would be expected to score on average 68.63 − 50.62 = 18.01 points lower on this item.

The Fatigue scale score is made up of three items of equal weighting, so for the overall scale score Norwegian speakers would be expected to score on average 18.01/3 = 6.00 points lower than English speakers.
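
The chain from cumulative probabilities to the scale-level difference can be reproduced with a short script. This is a sketch under stated assumptions: the helper names `to_categories` and `mean_score` are hypothetical, the counts are those shown in Eq. (4), and exact thirds (100/3, 200/3) replace the rounded scores 33.33 and 66.67, so the last decimal may differ slightly from the figures above.

```python
import math

BETA1 = -1.089                      # language effect for Q18 quoted in the text
N = 513                             # denominator shown in Eq. (4)
CUM_COUNTS_NO = [450, 242, 87]      # Norwegian patients with Y >= 1, 2, 3 (Eq. 4)
ITEM_SCORES = [0.0, 100 / 3, 200 / 3, 100.0]
N_ITEMS = 3                         # the Fatigue scale has three equally weighted items

def to_categories(cum):
    """Difference cumulative Pr(Y >= j) into per-category proportions."""
    bounds = [1.0] + list(cum) + [0.0]
    return [bounds[k] - bounds[k + 1] for k in range(len(bounds) - 1)]

def mean_score(props):
    """Expected item score given the proportion in each response category."""
    return sum(s * p for s, p in zip(ITEM_SCORES, props))

cum_no = [c / N for c in CUM_COUNTS_NO]
# Eq. (3) applied at each threshold
cum_en = [1.0 / ((1.0 - p) / p * math.exp(BETA1) + 1.0) for p in cum_no]

props_no = to_categories(cum_no)
props_en = to_categories(cum_en)

item_diff = mean_score(props_en) - mean_score(props_no)
scale_diff = item_diff / N_ITEMS

print(f"Norwegian item mean: {mean_score(props_no):.2f}")
print(f"English item mean:   {mean_score(props_en):.2f}")
print(f"Item difference:     {item_diff:.2f}")
print(f"Scale difference:    {scale_diff:.2f}")
```

Dividing by the number of items reflects the equal weighting of the three Fatigue items and assumes, as in the text, that Q18 is the only item with true DIF.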

Table 1 shows that the 95% confidence limits for the language effect for Q18 (β1) were −1.271 and −0.908. Following the same methods as for β1 itself (working not shown), this implies that the difference between English and Norwegian speakers on the Fatigue subscale would be 6.00 (95% CI: 5.03–6.95).
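
Since the working for the interval is not shown, the following sketch indicates one way it could be reproduced; it is an illustration, not the authors' calculation. It relies on the fact that, for an item scored 0, 100/3, 200/3 and 100, the mean score equals (100/3) times the sum of the cumulative probabilities Pr(Y ≥ j). The function name `scale_difference` is hypothetical, and the counts are those from Eq. (4).

```python
import math

CUM_NO = [450 / 513, 242 / 513, 87 / 513]   # Pr(Y_NO >= j), j = 1, 2, 3, from Eq. (4)

def scale_difference(beta, cum_no=CUM_NO, n_items=3):
    """English minus Norwegian mean scale score implied by a language effect beta.

    Assumes only this one item shows DIF and that items are equally weighted.
    """
    # Eq. (3) at each threshold
    cum_en = [1.0 / ((1.0 - p) / p * math.exp(beta) + 1.0) for p in cum_no]
    # Mean item score = (100/3) * sum of Pr(Y >= j) for scores 0, 100/3, 200/3, 100
    item_diff = (100 / 3) * (sum(cum_en) - sum(cum_no))
    return item_diff / n_items

point = scale_difference(-1.089)   # point estimate of beta1
upper = scale_difference(-1.271)   # lower confidence limit of beta1 gives the larger difference
lower = scale_difference(-0.908)   # upper confidence limit of beta1 gives the smaller difference

print(f"Scale difference: {point:.2f} (95% CI: {lower:.2f} to {upper:.2f})")
```

Because the difference shrinks as β1 moves towards zero, the limits of β1 map to the opposite ends of the interval for the scale difference. The result should land close to the interval quoted above, with small discrepancies possible from rounding.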

Cite this article

Scott, N.W., Fayers, P.M., Aaronson, N.K. et al. The practical impact of differential item functioning analyses in a health-related quality of life instrument. Qual Life Res 18, 1125–1130 (2009). https://doi.org/10.1007/s11136-009-9521-z
