Previous research suggests that gender differences in patient-reported outcome measures (PROMs) may reflect measurement bias rather than true differences in underlying health status. The aim of this study was to examine whether the Veterans RAND 12-Item Health Survey (VR-12) allows for unbiased comparisons of physical and mental health scores across gender. The VR-12 is a generic PROM that measures physical and mental health with 12 items, each with 3–6 response options.
Study data were from the 2015 Health Outcomes Survey of Medicare beneficiaries. The 277,518 participants included 116,817 (42.1%) males and 160,701 (57.9%) females. Differential functioning was examined at the scale level using multiple-group confirmatory factor analysis and at the item level using ordinal logistic regression.
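The item-level step can be illustrated with a minimal sketch. It assumes the common three-model formulation of ordinal logistic regression for differential item functioning, in which a baseline model (scale score only) is compared with a full model that adds gender and the gender-by-score interaction, using a 2-df likelihood-ratio test and the change in Nagelkerke R². The abstract does not spell out these details, so the formulation, the function names, and the log-likelihood values below are all assumptions made for illustration, not the authors' code or data.

```python
import math


def lr_test_2df(ll_base, ll_full):
    """Likelihood-ratio chi-square and p-value for a 2-df model comparison."""
    chi2 = 2.0 * (ll_full - ll_base)
    # The chi-square survival function with 2 df has the closed form exp(-x/2).
    p_value = math.exp(-chi2 / 2.0)
    return chi2, p_value


def nagelkerke_r2(ll_model, ll_null, n):
    """Nagelkerke pseudo-R^2 from model and null (thresholds-only) log-likelihoods."""
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_r2 = 1.0 - math.exp(2.0 * ll_null / n)
    return cox_snell / max_r2


def dif_summary(ll_null, ll_base, ll_full, n):
    """Combine the significance test and the effect size for one item."""
    chi2, p_value = lr_test_2df(ll_base, ll_full)
    delta_r2 = nagelkerke_r2(ll_full, ll_null, n) - nagelkerke_r2(ll_base, ll_null, n)
    return {"chi2": chi2, "p": p_value, "delta_r2": delta_r2}


# Hypothetical log-likelihoods for a single item (not taken from the study).
result = dif_summary(ll_null=-1500.0, ll_base=-1200.0, ll_full=-1195.0, n=1000)
print(result)
```

In this made-up example the test is significant at the 0.01 level while the change in R² stays below 0.01, which is the pattern to expect in very large samples: statistical significance alone says little about the size of the effect.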
The scale-level analysis supported strict invariance across gender (RMSEA = 0.045; CFI = 0.995). Although several items showed statistically significant differential item functioning, its magnitude was negligible (maximum ΔR² = 0.007).
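To put the reported maximum ΔR² in context, the sketch below classifies it against two sets of effect-size cutoffs that are widely used in the DIF literature (Zumbo and Thomas: 0.13 and 0.26; Jodoin and Gierl: 0.035 and 0.070). The abstract does not say which guideline the authors applied, so the choice of cutoffs here is an assumption, and the function name is hypothetical.

```python
def classify_delta_r2(delta_r2, cutoffs):
    """Label a DIF effect size given (negligible_max, moderate_max) cutoffs."""
    negligible_max, moderate_max = cutoffs
    if delta_r2 < negligible_max:
        return "negligible"
    if delta_r2 <= moderate_max:
        return "moderate"
    return "large"


# Published guideline cutoffs; which one the study used is not stated in the abstract.
ZUMBO_THOMAS = (0.13, 0.26)
JODOIN_GIERL = (0.035, 0.070)

reported_max_delta_r2 = 0.007  # maximum value reported in the abstract

labels = {
    "zumbo_thomas": classify_delta_r2(reported_max_delta_r2, ZUMBO_THOMAS),
    "jodoin_gierl": classify_delta_r2(reported_max_delta_r2, JODOIN_GIERL),
}
print(labels)
```

Under either guideline, 0.007 falls in the negligible band.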
The VR-12 physical and mental health status scores are unbiased with respect to gender.
This research was undertaken, in part, thanks to funding from the Canada Research Chairs program. Dr. Sawatzky holds a Canada Research Chair in Patient-Reported Outcomes.
Conflict of interest
All authors declare that they have no conflicts of interest.
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Since this was a retrospective study using publicly available data with a legally designated custodian, the research ethics board provided exemption from seeking formal approval.
For this type of study, formal consent is not required.
About this article
Cite this article
Kwon, J.Y., Sawatzky, R. Examining gender-related differential item functioning of the Veterans Rand 12-item Health Survey. Qual Life Res 26, 2877–2883 (2017). https://doi.org/10.1007/s11136-017-1638-x
Keywords
- Measurement equivalence
- Differential item functioning
- Patient-reported outcome measure