Abstract
Purpose
Previous research suggests that gender differences in patient-reported outcome measures (PROMs) may reflect measurement bias rather than true differences in underlying health status. This study examines whether the Veterans RAND 12-Item Health Survey (VR-12) allows unbiased comparisons of physical and mental health scores across gender. The VR-12 is a generic PROM that measures physical and mental health with 12 items, each offering 3–6 response options.
Methods
Data were drawn from the 2015 Health Outcomes Survey of Medicare beneficiaries. The 277,518 participants comprised 116,817 (42.1%) males and 160,701 (57.9%) females. Scale-level and item-level differential functioning were examined using multiple-group confirmatory factor analysis and ordinal logistic regression, respectively.
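As an illustration of the item-level step, ordinal logistic regression DIF analysis is commonly framed as a comparison of three nested models for each item: the item regressed on the scale score alone, then with gender added (uniform DIF), then with the gender-by-score interaction added (non-uniform DIF). The sketch below is not the authors' code. It assumes the three models have already been fitted elsewhere and only shows how their log-likelihoods could be compared with likelihood-ratio tests. The function names and the log-likelihood values in the usage note are hypothetical.

```python
import math


def lr_test(ll_reduced, ll_full, df):
    """Likelihood-ratio test between two nested models.

    Returns (chi-square statistic, p-value). Only df = 1 and df = 2 are
    handled, because those have closed-form chi-square tail probabilities.
    """
    stat = max(2.0 * (ll_full - ll_reduced), 0.0)
    if df == 1:
        # For df = 1: P(X > x) = erfc(sqrt(x / 2))
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # For df = 2: P(X > x) = exp(-x / 2)
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch supports df = 1 or df = 2 only")
    return stat, p_value


def dif_tests(ll_score_only, ll_plus_gender, ll_plus_interaction):
    """Compare the three nested models for one item (hypothetical helper)."""
    return {
        # Overall DIF: interaction model vs. score-only model (2 df)
        "overall": lr_test(ll_score_only, ll_plus_interaction, 2),
        # Uniform DIF: adding the gender main effect (1 df)
        "uniform": lr_test(ll_score_only, ll_plus_gender, 1),
        # Non-uniform DIF: adding the gender-by-score interaction (1 df)
        "nonuniform": lr_test(ll_plus_gender, ll_plus_interaction, 1),
    }
```

For example, `dif_tests(-1250.0, -1247.5, -1247.0)` uses made-up log-likelihoods and returns the three test statistics with their p-values. With a sample as large as the one described here, even trivial differences can reach statistical significance, which is why an effect-size measure is usually reported alongside these tests.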
Results
The scale-level analysis supported strict invariance across gender (RMSEA = 0.045; CFI = 0.995). Although several items showed statistically significant differential item functioning, its magnitude was negligible (maximum ΔR² = 0.007).
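To make the effect-size judgement concrete, the sketch below classifies a ΔR² value against rule-of-thumb cutoffs. The abstract does not state which criterion was applied, so the cutoffs are assumptions: the default pair (0.035, 0.070) follows a scheme often attributed to Jodoin and Gierl, and the alternative pair (0.13, 0.26) follows one often attributed to Zumbo and Thomas. The function name is hypothetical.

```python
def classify_delta_r2(delta_r2, cutoffs=(0.035, 0.070)):
    """Label a DIF effect size as negligible, moderate, or large.

    `cutoffs` holds (upper bound of negligible, upper bound of moderate).
    Both default values are assumed rules of thumb, not taken from the study.
    """
    if delta_r2 < 0:
        raise ValueError("delta_r2 must be non-negative")
    negligible_below, moderate_up_to = cutoffs
    if delta_r2 < negligible_below:
        return "negligible"
    if delta_r2 <= moderate_up_to:
        return "moderate"
    return "large"


# The maximum value reported above falls in the lowest band under either scheme.
print(classify_delta_r2(0.007))
print(classify_delta_r2(0.007, cutoffs=(0.13, 0.26)))
```

Under both sets of assumed cutoffs, 0.007 is labelled negligible, which is consistent with the conclusion reported here.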
Conclusion
The VR-12 physical and mental health status scores are unbiased with respect to gender.
Acknowledgements
This research was undertaken, in part, thanks to funding from the Canada Research Chairs program. Dr. Sawatzky holds a Canada Research Chair in Patient-Reported Outcomes.
Ethics declarations
Conflict of interest
All authors declare that they have no conflicts of interest.
Ethical approval
All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Since this was a retrospective study using publicly available data with a legally designated custodian, the research ethics board provided exemption from seeking formal approval.
Informed consent
For this type of study, formal consent is not required.
About this article
Cite this article
Kwon, J.Y., Sawatzky, R. Examining gender-related differential item functioning of the Veterans Rand 12-item Health Survey. Qual Life Res 26, 2877–2883 (2017). https://doi.org/10.1007/s11136-017-1638-x
DOI: https://doi.org/10.1007/s11136-017-1638-x