Examining gender-related differential item functioning of the Veterans Rand 12-item Health Survey
Previous research suggests that gender differences in patient-reported outcome measures (PROMs) may reflect measurement bias rather than true differences in underlying health status. The aim of this study was to examine whether the Veterans Rand 12-item Health Survey (VR-12) allows unbiased comparisons of physical and mental health scores across gender. The VR-12 is a generic PROM of 12 items, each with 3–6 response options, that measures physical and mental health.
Study data came from the 2015 Medicare Health Outcomes Survey of Medicare beneficiaries. The 277,518 participants comprised 116,817 (42.1%) males and 160,701 (57.9%) females. Scale-level and item-level differential functioning were examined using multiple-group confirmatory factor analysis and ordinal logistic regression, respectively.
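Item-level DIF with ordinal logistic regression is commonly run as a sequence of nested models for each item, following Zumbo's framework: the item response is regressed on a matching variable (M1), gender is then added (M2, uniform DIF), and finally the gender × matching interaction is added (M3, non-uniform DIF). The Python sketch below covers only the test and effect-size arithmetic. It assumes the log-likelihoods of a thresholds-only null model and of M1 to M3 have already been obtained from some proportional-odds fitting routine. The log-likelihood values, the sample size, and the Jodoin and Gierl cutoffs (ΔR² < 0.035 negligible, < 0.070 moderate) are illustrative assumptions, not values or criteria taken from this study.

```python
import math


def chi2_sf(stat: float, df: int) -> float:
    """Upper-tail chi-square p-value via closed forms for df = 1 or 2."""
    stat = max(stat, 0.0)
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("only df = 1 or 2 are needed for this procedure")


def nagelkerke_r2(ll_null: float, ll_model: float, n: int) -> float:
    """Nagelkerke pseudo-R^2 of a fitted model relative to the null model."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    cox_snell_max = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / cox_snell_max


def classify_delta_r2(delta: float) -> str:
    """Jodoin and Gierl effect-size bands (assumed criterion, for illustration)."""
    if delta < 0.035:
        return "negligible"
    if delta < 0.070:
        return "moderate"
    return "large"


def ordinal_lr_dif(ll_null, ll_m1, ll_m2, ll_m3, n):
    """Nested proportional-odds models for one item:
    M1: item ~ matching score
    M2: M1 + gender                (uniform DIF)
    M3: M2 + gender x matching     (non-uniform DIF)
    """
    r2_m1 = nagelkerke_r2(ll_null, ll_m1, n)
    r2_m3 = nagelkerke_r2(ll_null, ll_m3, n)
    stat_total = 2.0 * (ll_m3 - ll_m1)    # 2-df omnibus DIF test (M3 vs M1)
    stat_uniform = 2.0 * (ll_m2 - ll_m1)  # 1-df uniform DIF test (M2 vs M1)
    delta = r2_m3 - r2_m1                 # DIF effect size
    return {
        "chi2_total": stat_total,
        "p_total": chi2_sf(stat_total, 2),
        "chi2_uniform": stat_uniform,
        "p_uniform": chi2_sf(stat_uniform, 1),
        "delta_r2": delta,
        "effect": classify_delta_r2(delta),
    }


# Hypothetical log-likelihoods for one item, n = 1000 (not study data).
result = ordinal_lr_dif(ll_null=-1500.0, ll_m1=-1100.0,
                        ll_m2=-1096.0, ll_m3=-1095.0, n=1000)
print(result)
```

With these made-up inputs the 2-df test is significant (χ² = 10, p ≈ 0.007) while ΔR² stays below 0.005, which is the pattern of statistically significant but negligible DIF that very large samples tend to produce.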
Scale-level analyses supported strict invariance across gender (RMSEA = 0.045; CFI = 0.995). Although several items showed statistically significant differential item functioning (DIF), its magnitude was negligible (maximum ΔR² = 0.007).
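The RMSEA and CFI values used to judge invariance are functions of each model's chi-square statistic. The following is a minimal Python sketch of the usual point-estimate formulas. It assumes the chi-square values (possibly robust, e.g. WLSMV-adjusted) are taken as given from the SEM software, applies Steiger's √G multiple-group adjustment to RMSEA, and uses N − 1 in the denominator, although some programs use N. The ΔCFI ≥ −0.01 rule of thumb for retaining a more constrained model (Cheung and Rensvold) and all numeric inputs are illustrative assumptions, not the criteria or statistics of this study.

```python
import math


def rmsea(chi2: float, df: int, n: int, groups: int = 1) -> float:
    """RMSEA point estimate; sqrt(groups) is Steiger's multiple-group
    adjustment (assumed here). Uses n - 1; some software uses n."""
    return math.sqrt(groups) * math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """Comparative fit index relative to the independence (baseline) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0.0 else 1.0 - d_model / d_base


def invariance_holds(cfi_less_constrained: float, cfi_more_constrained: float,
                     cutoff: float = -0.01) -> bool:
    """Assumed heuristic: retain the constrained model if CFI drops by no
    more than 0.01. Under WLSMV, formal nested tests would instead use a
    scaled chi-square difference test (e.g. Mplus DIFFTEST)."""
    return (cfi_more_constrained - cfi_less_constrained) >= cutoff


# Hypothetical two-group fit statistics (not the study's values).
n_total, n_groups = 20000, 2
chi2_base, df_base = 60000.0, 132
configural = {"chi2": 600.0, "df": 86}
strict = {"chi2": 800.0, "df": 120}

for name, m in (("configural", configural), ("strict", strict)):
    m["rmsea"] = rmsea(m["chi2"], m["df"], n_total, n_groups)
    m["cfi"] = cfi(m["chi2"], m["df"], chi2_base, df_base)
    print(f"{name}: RMSEA = {m['rmsea']:.3f}, CFI = {m['cfi']:.3f}")

print("strict invariance retained:",
      invariance_holds(configural["cfi"], strict["cfi"]))
```

Strict invariance sits at the end of the usual configural, metric, scalar, strict sequence, so in practice each step is compared with the one before it rather than only with the configural model.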
The VR-12 physical and mental health status scores are unbiased with respect to gender.
Keywords: Gender; Measurement equivalence; Differential item functioning; Patient-reported outcome measure; Health; VR-12
This research was undertaken, in part, thanks to funding from the Canada Research Chairs program. Dr. Sawatzky holds a Canada Research Chair in Patient-Reported Outcomes.
Compliance with ethical standards
Conflict of interest
All authors declare that they have no conflicts of interest.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Because this was a retrospective study of publicly available data held by a legally designated custodian, the research ethics board granted an exemption from formal approval.
For this type of study, formal consent is not required.