Abstract
Purpose
It is important for clinical practice and research that measurement scales of well-being and quality of life exhibit only minimal differential item functioning (DIF). DIF occurs where different groups of people endorse items in a scale to different extents after being matched by the intended scale attribute. We investigate the equivalence or otherwise of common methods of assessing DIF.
Method
Three methods of measuring age- and sex-related DIF (ordinal logistic regression, Rasch analysis and the Mantel χ² procedure) were applied to Hospital Anxiety and Depression Scale (HADS) data from a sample of 1,068 patients consulting primary care practitioners.
Results
Three items were flagged by all three approaches as having either age- or sex-related DIF with a consistent direction of effect; a further three items were identified but did not meet the stricter criteria for important DIF under at least one method. When strict criteria for significant DIF were applied, ordinal logistic regression was slightly less sensitive.
Conclusions
Ordinal logistic regression, Rasch analysis and contingency table methods yielded consistent results when identifying DIF in the HADS depression and HADS anxiety scales. Regardless of methods applied, investigators should use a combination of statistical significance, magnitude of the DIF effect and investigator judgement when interpreting the results.
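As an illustration of the contingency table approach named in the Method, the following is a minimal sketch of a Mantel χ² test (one degree of freedom) for a single ordinal item. It is not the authors' analysis: the function name `mantel_chi_square`, the record layout and the toy data are hypothetical, and respondents are assumed to be matched on a total scale score, with groups coded 0 (reference) and 1 (focal).

```python
import math
from collections import defaultdict


def mantel_chi_square(records):
    """Mantel chi-square (1 df) for DIF in one ordinal item.

    records: iterable of (matching_score, group, item_score), where group
    is 0 (reference) or 1 (focal). The record layout is an assumption made
    for this sketch.
    """
    # Stratify respondents by the matching score (e.g. scale total).
    strata = defaultdict(list)
    for matching_score, group, item_score in records:
        strata[matching_score].append((group, item_score))

    observed = expected = variance = 0.0
    for rows in strata.values():
        n = len(rows)
        n_focal = sum(g for g, _ in rows)
        n_ref = n - n_focal
        if n < 2 or n_focal == 0 or n_ref == 0:
            continue  # stratum says nothing about group differences

        sum_y = sum(y for _, y in rows)
        sum_y2 = sum(y * y for _, y in rows)

        # Observed and expected sum of item scores in the focal group.
        observed += sum(y for g, y in rows if g == 1)
        expected += n_focal * sum_y / n
        # Hypergeometric variance of the focal-group score sum.
        variance += n_ref * n_focal * (n * sum_y2 - sum_y ** 2) / (n * n * (n - 1))

    if variance == 0:
        raise ValueError("no informative strata")

    chi2 = (observed - expected) ** 2 / variance
    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Toy data (hypothetical): one stratum in which the focal group scores
# higher on the item despite an identical matching score.
toy = [(5, 0, 0), (5, 0, 0), (5, 0, 1), (5, 1, 2), (5, 1, 2), (5, 1, 3)]
chi2, p_value = mantel_chi_square(toy)
print(f"Mantel chi-square = {chi2:.3f}, p = {p_value:.4f}")
```

In keeping with the Conclusions, a small p-value from such a test would be only one input: the magnitude of the effect and investigator judgement also matter when deciding whether an item shows important DIF.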
Acknowledgments
We thank the primary care participants and general practices who kindly took part in the original study from which the data were collected. That study was funded by the Centre for Change and Innovation of the then Scottish Executive and by Support for Science funding from Grampian NHS Research and Development. The present methodological investigations were conducted without additional funding.
Ethical standards
The anonymised data analysed in this study were originally collected for research conducted with the approval of the North of Scotland Research Ethics Committee (06/S0802/27).
Conflict of interest
IMC and NWS have nothing to declare. MA has received fees for speaking from Otsuka, AstraZeneca and Servier and served as consultant for Otsuka. ICR has received fees for speaking from AstraZeneca UK and received travel and meeting registration assistance from Lundbeck.
Cite this article
Cameron, I.M., Scott, N.W., Adler, M. et al. A comparison of three methods of assessing differential item functioning (DIF) in the Hospital Anxiety Depression Scale: ordinal logistic regression, Rasch analysis and the Mantel chi-square procedure. Qual Life Res 23, 2883–2888 (2014). https://doi.org/10.1007/s11136-014-0719-3
DOI: https://doi.org/10.1007/s11136-014-0719-3