Examining gender-related differential item functioning of the Veterans Rand 12-item Health Survey



Previous research suggests that gender differences in patient-reported outcome measures (PROMs) may reflect measurement bias rather than true differences in underlying health status. The aim of this study is to examine whether the Veterans Rand 12-item Health Survey (VR-12) allows for unbiased comparisons of physical and mental health scores across gender. The VR-12 is a generic PROM consisting of 12 items with 3–6 response options for the measurement of mental and physical health.


Study data came from the 2015 Health Outcomes Survey of Medicare beneficiaries. The 277,518 participants comprised 116,817 (42.1%) males and 160,701 (57.9%) females. Scale-level and item-level differential functioning were examined using multiple-group confirmatory factor analysis and ordinal logistic regression, respectively.
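For readers unfamiliar with the item-level method, the ordinal logistic regression approach to differential item functioning is commonly set up as three nested cumulative-logit models. The sketch below follows that general framework; the symbols (Y for the item response, θ for the matching variable such as a scale score, G for the gender indicator) are notation assumed here, and the abstract does not state which matching variable or model sequence the authors used.

```latex
% Nested cumulative-logit models for one ordinal item Y with categories k
% (assumed notation: \theta = matching variable, G = gender indicator)
\begin{align*}
\text{Model 1:}\quad & \operatorname{logit} P(Y \le k) = \alpha_k + \beta_1 \theta \\
\text{Model 2:}\quad & \operatorname{logit} P(Y \le k) = \alpha_k + \beta_1 \theta + \beta_2 G \\
\text{Model 3:}\quad & \operatorname{logit} P(Y \le k) = \alpha_k + \beta_1 \theta + \beta_2 G + \beta_3 (\theta \times G)
\end{align*}
% Overall DIF test: likelihood-ratio chi-square comparing Model 3 with Model 1 (2 df)
% Effect size: difference in pseudo-R^2 between the two models
\[
\Delta R^2 = R^2_{\text{Model 3}} - R^2_{\text{Model 1}}
\]
```

Under this setup, a nonzero β₂ indicates uniform differential item functioning and a nonzero β₃ indicates nonuniform differential item functioning.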


The scale-level analysis supported strict invariance across gender (RMSEA = 0.045; CFI = 0.995). Although differential item functioning was statistically significant for several items, its magnitude was negligible (maximum ΔR² = 0.007).
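To put the reported maximum ΔR² in context, the following minimal sketch classifies a ΔR² value against two widely cited rules of thumb. The cutoffs (Zumbo 1999: 0.13 and 0.26; Jodoin and Gierl 2001: 0.035 and 0.070) are assumptions brought in for illustration, since the abstract does not say which criterion the authors applied, and the function name is hypothetical.

```python
# Hypothetical helper for illustration only. The cutoffs below are published
# rules of thumb, not values stated in the abstract.
ZUMBO_1999 = (0.13, 0.26)            # (moderate, large) thresholds
JODOIN_GIERL_2001 = (0.035, 0.070)   # more conservative thresholds


def classify_delta_r2(delta_r2, cutoffs=JODOIN_GIERL_2001):
    """Label a DIF effect size as negligible, moderate, or large."""
    if delta_r2 < 0:
        raise ValueError("delta R-squared must be non-negative")
    moderate, large = cutoffs
    if delta_r2 < moderate:
        return "negligible"
    if delta_r2 < large:
        return "moderate"
    # Values at or above the upper cutoff are treated as large here.
    return "large"


# Maximum effect size reported in the abstract
max_delta_r2 = 0.007
print(classify_delta_r2(max_delta_r2))              # stricter criterion
print(classify_delta_r2(max_delta_r2, ZUMBO_1999))  # more lenient criterion
```

The reported value of 0.007 falls below the lower cutoff of both criteria, which is consistent with the description of the effect as negligible.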


The VR-12 physical and mental health status scores are unbiased with respect to gender.





This research was undertaken, in part, thanks to funding from the Canada Research Chairs program. Dr. Sawatzky holds a Canada Research Chair in Patient-Reported Outcomes.

Author information



Corresponding author

Correspondence to Jae Yung Kwon.

Ethics declarations

Conflict of interest

All authors declare that they have no conflicts of interest.

Ethical approval

All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Since this was a retrospective study using publicly available data with a legally designated custodian, the research ethics board provided exemption from seeking formal approval.

Informed consent

For this type of study, formal consent is not required.


About this article


Cite this article

Kwon, J.Y., Sawatzky, R. Examining gender-related differential item functioning of the Veterans Rand 12-item Health Survey. Qual Life Res 26, 2877–2883 (2017). https://doi.org/10.1007/s11136-017-1638-x



  • Gender
  • Measurement equivalence
  • Differential item functioning
  • Patient-reported outcome measure
  • Health
  • VR-12