Abstract
In cross-national comparisons based on questionnaires, accurate translations are necessary to obtain valid results. Differential item functioning (DIF) analysis can be used to test whether translations of items in multi-item scales are equivalent to the original. Using data from 10,815 respondents representing 10 European languages, we tested for DIF in the nine translations of the EORTC QLQ-C30 emotional function scale, comparing each with the original English version. We tested for DIF using two methods in parallel: a contingency table method and logistic regression. The two methods gave similar DIF results. We found indications of DIF in seven of the nine translations. At least two of the DIF findings seem to reflect linguistic problems in the translation. ‘Imperfect’ translations can affect conclusions drawn from cross-national comparisons. Given that translations can never be identical to the original, we discuss how findings of DIF can be interpreted and how linguistic DIF differs from DIF caused by confounding, cross-cultural differences, or DIF in other items in the scale. We conclude that testing for DIF is a useful way to validate questionnaire translations.
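As a rough illustration of the logistic regression approach mentioned above, the following sketch tests one item for uniform DIF by comparing a model that predicts the item response from the scale score alone with a model that also includes a translation indicator. It is not the authors' analysis: the dichotomized item, the simulated data, the effect sizes, and the function names `fit_logistic` and `uniform_dif_test` are all assumptions made for this example.

```python
import math
import numpy as np


def fit_logistic(X, y, n_iter=50):
    """Fit a logistic regression by Newton-Raphson.

    Returns the coefficient vector and the maximized log-likelihood.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        grad = X.T @ (y - p)
        hess = X.T @ (X * w[:, None])
        step = np.linalg.solve(hess, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    p = np.clip(1.0 / (1.0 + np.exp(-X @ beta)), 1e-12, 1 - 1e-12)
    loglik = float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))
    return beta, loglik


def uniform_dif_test(item, score, group):
    """Likelihood-ratio test for uniform DIF in a dichotomous item.

    item  : 0/1 responses to the studied item
    score : matching variable (e.g. the sum of the other items in the scale)
    group : 0 = reference version, 1 = translated version
    """
    ones = np.ones(len(score))
    X0 = np.column_stack([ones, score])          # score only
    X1 = np.column_stack([ones, score, group])   # score + group (DIF term)
    _, ll0 = fit_logistic(X0, item)
    beta1, ll1 = fit_logistic(X1, item)
    lr = max(2.0 * (ll1 - ll0), 0.0)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return beta1[2], lr, p_value


# Simulated data (hypothetical): one item with DIF and one without.
rng = np.random.default_rng(0)
n = 4000
group = rng.integers(0, 2, n)
score = rng.normal(size=n)


def simulate_item(dif_effect):
    logit = 1.0 * score + dif_effect * group
    prob = 1.0 / (1.0 + np.exp(-logit))
    return (rng.random(n) < prob).astype(float)


dif_coef, dif_lr, dif_p = uniform_dif_test(simulate_item(0.8), score, group)
null_coef, null_lr, null_p = uniform_dif_test(simulate_item(0.0), score, group)
print(f"item with DIF:    coef={dif_coef:.2f}, LR={dif_lr:.1f}, p={dif_p:.3g}")
print(f"item without DIF: coef={null_coef:.2f}, LR={null_lr:.1f}, p={null_p:.3g}")
```

A significant group term means that respondents with the same scale score answer the item differently depending on the language version. As the abstract notes, such a finding still needs interpretation, since it may reflect confounding, cross-cultural differences, or DIF in other items rather than a linguistic problem.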
Cite this article
Petersen, M.A., Groenvold, M., Bjorner, J.B. et al. Use of differential item functioning analysis to assess the equivalence of translations of a questionnaire. Qual Life Res 12, 373–385 (2003). https://doi.org/10.1023/A:1023488915557
DOI: https://doi.org/10.1023/A:1023488915557