Quality of Life Research, Volume 26, Issue 10, pp 2877–2883

Examining gender-related differential item functioning of the Veterans Rand 12-item Health Survey

  • Jae Yung Kwon
  • Richard Sawatzky
Brief Communication



Previous research suggests that gender differences in patient-reported outcome measures (PROMs) may reflect measurement bias rather than true differences in underlying health status. This study examined whether the Veterans Rand 12-item Health Survey (VR-12) allows unbiased comparisons of physical and mental health scores across gender. The VR-12 is a generic 12-item PROM, with three to six response options per item, that measures physical and mental health.


Study data were drawn from the 2015 Health Outcomes Survey of Medicare beneficiaries. The sample of 277,518 participants comprised 116,817 (42.1%) males and 160,701 (57.9%) females. Scale-level differential functioning was examined using multiple-group confirmatory factor analysis, and item-level differential functioning was examined using ordinal logistic regression.
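For orientation, item-level analyses of this kind are commonly set up as three nested ordinal logistic regression models. The formulation below is a sketch of that standard setup, not the authors' own specification: the symbols Y (ordinal item response), j (response category), T (matching scale score) and G (gender indicator) are assumed notation, and the choice of matching variable and pseudo-R² is not stated in this abstract.

```latex
% Assumed notation: Y = ordinal item response, j = response category,
% T = matching scale score, G = gender indicator.
\begin{align}
\text{Model 1:}\quad & \operatorname{logit} P(Y \le j) = \alpha_j + \beta_1 T \\
\text{Model 2:}\quad & \operatorname{logit} P(Y \le j) = \alpha_j + \beta_1 T + \beta_2 G \\
\text{Model 3:}\quad & \operatorname{logit} P(Y \le j) = \alpha_j + \beta_1 T + \beta_2 G + \beta_3 (T \times G)
\end{align}
% Effect size for total DIF: change in pseudo-R^2 from Model 1 to Model 3.
\[
\Delta R^2 = R^2_{\text{Model 3}} - R^2_{\text{Model 1}}
\]
```

In this setup, a non-zero β₂ indicates uniform differential item functioning and a non-zero β₃ indicates non-uniform differential item functioning.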


The scale-level analysis supported strict invariance across gender (RMSEA = 0.045; CFI = 0.995). Although differential item functioning was statistically significant for several items, its magnitude was negligible (maximum ΔR² = 0.007).
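To show how a ΔR² value such as 0.007 might be judged against effect-size guidelines, the following minimal sketch classifies a value under two commonly cited sets of cut-offs (Zumbo and Thomas: 0.13 and 0.26; Jodoin and Gierl: 0.035 and 0.070). The function name `classify_dif`, the scheme labels and the choice of cut-offs are assumptions for illustration; the abstract does not state which criterion the authors applied.

```python
# Hypothetical helper for illustration only. The cut-offs are commonly cited
# guidelines for the Delta R^2 effect size; they are NOT taken from this article.
THRESHOLDS = {
    "zumbo_thomas": (0.13, 0.26),    # (moderate from, large from)
    "jodoin_gierl": (0.035, 0.070),
}


def classify_dif(delta_r2, scheme="jodoin_gierl"):
    """Label a Delta R^2 value as negligible, moderate, or large DIF."""
    if delta_r2 < 0:
        raise ValueError("delta_r2 must be non-negative")
    moderate_from, large_from = THRESHOLDS[scheme]
    if delta_r2 < moderate_from:
        return "negligible"
    if delta_r2 < large_from:
        return "moderate"
    return "large"


if __name__ == "__main__":
    # Maximum Delta R^2 reported in the abstract.
    for scheme in THRESHOLDS:
        print(scheme, classify_dif(0.007, scheme))
```

Under either set of cut-offs, the reported maximum falls well inside the negligible range, which is consistent with the conclusion stated in the abstract.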


The VR-12 physical and mental health status scores are unbiased with respect to gender.


Gender, Measurement equivalence, Differential item functioning, Patient-reported outcome measure, Health, VR-12



This research was undertaken, in part, thanks to funding from the Canada Research Chairs program. Dr. Sawatzky holds a Canada Research Chair in Patient-Reported Outcomes.

Compliance with ethical standards

Conflict of interest

All authors declare that they have no conflicts of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Since this was a retrospective study using publicly available data with a legally designated custodian, the research ethics board provided exemption from seeking formal approval.

Informed consent

For this type of study, formal consent is not required.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. School of Nursing, University of British Columbia, Vancouver, Canada
  2. School of Nursing, Trinity Western University, Langley, Canada
  3. Centre for Health Evaluation and Outcome Sciences, Providence Health Care Research Institute, Vancouver, Canada
