Introduction

The Veterans Affairs (VA) health care system is the largest integrated health care delivery system in the United States (US), providing care to over 6 million eligible veterans each year within 1200 geographically dispersed health care facilities [1,2,3]. With the size and scope of the VA health care delivery system comes a vast quantity of data on diagnoses, medication history, and sociodemographics, as well as provider and facility characteristics, all of which are stored in the VA electronic health records (EHR). Integration of data from several outside sources into the VA EHR (e.g., data for patients receiving care through the VA’s Community Care Program, Medicare data for patients aged ≥65) provides a nearly complete record of health care at both the patient and organizational levels and an ideal data source for studying clinically important questions in veterans’ health. However, a major concern for the generalizability of EHR-based studies is selection bias, a systematic error in effect estimates introduced if the association between exposure and disease differs between those who contribute data to an EHR and those who do not [4,5,6]. Risk of selection bias is particularly high when a subset of a population with a specific risk profile is strongly underrepresented in the study sample. As such, the VA beneficiary characteristics derived from the EHR raise concerns about selection bias and the generalizability of VA EHR-based research to the larger US veteran population. Nevertheless, the vast quantity of health-related data captured in the VA’s EHR represents an important resource for conducting health services research when properly interpreted.

The US Department of Veterans Affairs is a cabinet-level department tasked with providing services and benefits to US military veterans. The VA EHR data offer a critical tool for achieving these aims: they can be used to inform the development of prevention strategies tailored to the unique characteristics and needs of US veterans and to evaluate the efficacy of clinical and public health interventions. For example, veterans are recognized as a population at elevated risk of suicide [7, 8], and suicide prevention is an explicit focus of care improvement in the VA health system [9,10,11]. Recent research has used VA EHR data and machine learning technology to predict suicidal behavior in VA patients [12,13,14]. Using the information generated from this research, the Veterans Health Administration (VHA) began national implementation of the Recovery Engagement and Coordination for Health-Veterans Enhanced Treatment (REACH VET) program, which applied the resulting prediction algorithm to identify patients at the highest suicide risk [12]. Using VA EHR data, subsequent studies evaluated the impact of the REACH VET program, finding that it was associated with greater treatment engagement and fewer mental health admissions, emergency department visits, and suicide attempts [15].

In addition to providing useful knowledge for improving the public health and medical care of US veterans, VA EHR data can be used to study the efficacy of clinical interventions, which can then be used to improve clinical care in the general US population. For example, although randomized controlled trials are generally considered the gold standard to determine the efficacy of medications and health care interventions, they are expensive, time-prohibitive, difficult to implement, and ill-suited for the study of high-risk interventions, rare outcomes, and the consequences of harmful exposures (e.g., exposure to potentially traumatic events) [16, 17]. Electronic health record data can provide information about the efficacy of health interventions in instances when randomized controlled trials are infeasible or undesirable. For example, EHR data have been used to provide rapid results during the COVID-19 pandemic about the potential protective effect of some antihistamines on risk of SARS-CoV-2 infection [18]. However, demonstrating the efficacy of a treatment in one study sample (e.g., VA patients) does not necessarily provide evidence of its efficacy in other populations, whether that is US veterans in general or the total US adult population. In particular, caution is warranted when interpreting results from EHR-based studies because of selection bias. One type of selection bias of particular concern in EHR-based studies is collider bias, which arises from the exposure of interest being associated with the likelihood of being observed and can result in spurious associations when none exist [19]. As such, no matter how rigorous or carefully executed an EHR-based research study, the results depend on the setting in which they were derived (e.g., VA patient population), and often depend on factors that might be constant within the studied population but different elsewhere.
Because any given association between an exposure and an outcome will vary across settings and populations as a function of how different the study sample (e.g., VA patient population) and the target population (e.g., non-VA veteran population) are from one another, information on the distribution of covariates in the VA patient population and non-VA veteran population must be considered to use the knowledge generated from research conducted in VA EHR data to inform policy for populations outside the VA patient population.

Users of VA healthcare represent a population with greater physical, mental, and social challenges than the general US adult population [20,21,22] as well as the overall US veteran population [23,24,25,26,27,28]. The higher burden of health and social challenges present in the VA versus non-VA veteran population may be a consequence of the VA healthcare benefits eligibility criteria, which are based on each veteran’s military service history, disability rating, income level, and other benefits received (e.g., VA pension benefits).

Although prior research has provided insight into the sociodemographic and health characteristics that may vary between veterans who use the VA for their healthcare and non-VA veterans, at least two important gaps in the literature remain. First, most studies have focused on VA enrollees, a population that differs from veterans who use the VA for their healthcare [23,24,25,26,27,28], which is the patient population captured in the VA EHR. Veterans who use VA healthcare services live closer to VA facilities [29,30,31,32], are more likely to have a psychiatric or substance use disorder diagnosis [29, 30], and have greater healthcare needs [29, 31] than VA-enrolled veterans who do not. Because not all VA-enrolled veterans utilize VA health care services each year, prior research documenting sociodemographic and health differences in veterans by VA enrollment status may not adequately capture important differences between veterans overall and the VA patient population captured in the VA EHR. Second, the demographic profile of veterans, generally, and of the VA patient population in particular is changing: over the last two decades the age distribution has become younger, and the share of women veterans and racial/ethnic minorities has increased [33]; however, only two published studies have analyzed data that were collected within the past 10 years [26, 34], one of which limited its analysis to veterans with service-connected conditions [34, 35] and the other of which focused on examining sociodemographic and health differences in veterans with versus without health coverage [26]. Therefore, the results of previous studies that have examined the differences between VA enrollees and non-VA veterans do not reflect the changing demographic profile of veterans, and the representativeness of the VA patient population as contained in the VA EHR data remains unknown.

To address these limitations, we leveraged data from the 2019 National Health Interview Survey (NHIS) to characterize differences in the distribution of sociodemographic characteristics, physical and mental health, and health behaviors in US military veterans who did and did not use VA healthcare services during the past year. For this analysis, we selected variables if they (a) have been previously shown to vary between VA and non-VA veterans or (b) are factors measured and available for study in the VA EHR data. The 2019 NHIS data are particularly well suited for this analysis because of their large sample and ability to differentiate between veterans receiving VA healthcare services and veterans not receiving any past-year VA care. As such, this study provides the most current description of 1 year of VA use and non-use among non-institutionalized veterans.

Methods

Study population

We analyzed data on US veterans from the 2019 NHIS, a nationally representative household survey of the civilian noninstitutionalized US population. The investigation was carried out in accordance with the latest version of the Declaration of Helsinki and informed consent was obtained from all survey participants. The 2019 NHIS Sample Adult component included 31,997 adults, aged ≥18 years, of whom 3061 (9.6%) were veterans, defined as adults who had ever served on active duty in the US Armed Forces, military Reserves, or National Guard and were not currently on active duty [36]. After excluding 10 respondents with missing age information, the analytic sample included 3051 veterans. These publicly available data are exempt from IRB review.

Measures

Past year VA healthcare use

Our primary predictor variable was past-year use of VA healthcare services. This variable captures all participants whose data would be included in the VA EHR. Past year VA healthcare use was assessed using the question “During the past 12 months, did you receive any care at a Veteran’s Health Administration facility or receive any other healthcare paid for by the VA?” A dichotomous variable assessed whether veterans did or did not endorse past year use of VA healthcare services, regardless of whether they also utilized a different type of healthcare coverage (labeled hereafter as “VA patients” and “non-VA veterans”, respectively).

Sociodemographic characteristics

Sociodemographic variables for this study included age (18–34, 35–44, 45–54, 55–64, 65+), gender (male, female), ethnicity and race (Hispanic, non-Hispanic: White, Black, Asian or Pacific Islander, other [Native American, Alaska Native, Other Race]), sexual orientation (heterosexual, sexual minority), education level (<high school, high school or equivalent, some college or more), and family income relative to the federal poverty line (FPL; < 100% FPL, 100–199% FPL, 200–399% FPL, or ≥ 400% FPL).

Chronic health conditions

Participants reported whether a doctor or other healthcare professional had ever diagnosed them with high blood pressure, heart disease, diabetes, cancer (excluding non-melanoma skin cancer), arthritis, asthma, or chronic lung disease (i.e., chronic obstructive pulmonary disease, emphysema, or chronic bronchitis). In addition to considering the 7 chronic health conditions individually, we also created a composite variable, coded yes if a participant reported having ever been diagnosed with 1 or more of the 7 selected chronic conditions. Self-reported physician-diagnosed medical conditions have been found to have high validity [37].

Pain

Pain frequency, severity, and specific pain conditions were assessed using questions developed by the Washington Group on Disability Statistics [38]. Respondents were first asked “In the past 3 months, how often did you have pain? Would you say never, some days, most days, or every day?” For those who had pain at least some days, a follow-up question assessing bothersomeness was asked: “Thinking about the last time you had pain, how much pain did you have—a little, between a little and a lot, or a lot?” Participants who reported pain at least some days in the past 3 months were considered to have any pain. Participants who reported pain on “most days” or “every day” during the past 3 months were considered to have frequent pain. Participants who reported pain on “most days” or “every day” in the past 3 months and that the pain bothered them “a lot” were considered to have severe pain. Finally, participants were asked separate questions about pain in specific areas of the body (back; hands, arms, or shoulder; hips, knees, or feet; abdominal, pelvic, or genitals; migraines or headaches; and tooth or jaw) in the past 3 months, and whether they had symptoms of arthritis-related joint pain in the past 30 days. All pain measures have been extensively validated in the US and internationally [38].
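
The three pain indicators defined above can be illustrated with a minimal sketch. The function and field names are hypothetical and are not NHIS variable names; the response labels are taken from the question wording quoted above, and any other response (including a missing one) is assumed to count as no pain.

```python
# Response options for the pain-frequency question, as quoted in the text.
PAIN_DAYS = ("some days", "most days", "every day")
FREQUENT_DAYS = ("most days", "every day")


def classify_pain(frequency, amount=None):
    """Derive the three pain indicators from the two survey responses.

    frequency: "never", "some days", "most days", or "every day"
    amount:    "a little", "between a little and a lot", or "a lot"
               (only asked when pain was reported at least some days)
    Assumption: any unrecognized or missing response counts as no pain.
    """
    any_pain = frequency in PAIN_DAYS
    frequent_pain = frequency in FREQUENT_DAYS
    # Severe pain requires frequent pain AND the highest amount category.
    severe_pain = frequent_pain and amount == "a lot"
    return {
        "any_pain": any_pain,
        "frequent_pain": frequent_pain,
        "severe_pain": severe_pain,
    }
```

Under this coding the indicators are nested: every respondent with severe pain also has frequent pain, and every respondent with frequent pain also has any pain.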

Mental health status

Depressive symptom severity was assessed using the Patient Health Questionnaire—version 8 (PHQ-8), with a value of ≥10 used to identify adults experiencing depression [39]. Generalized Anxiety Disorder scale—version 7 (GAD-7) was used to assess anxiety, with moderate/severe anxiety symptoms indicated by GAD-7 scores ≥10 [40].
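
As an illustration of the cut-offs above, the following sketch sums item responses and applies the ≥10 threshold. It assumes the standard scoring of both instruments, in which each item is scored 0 to 3 (PHQ-8 totals range from 0 to 24 and GAD-7 totals from 0 to 21); the function names are hypothetical, and the handling of incomplete responses (rejected here) is an assumption rather than the survey's documented rule.

```python
def scale_total(item_scores, n_items):
    """Sum item scores after checking the number of items and the 0-3 range."""
    if len(item_scores) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(item_scores)}")
    if any(score not in (0, 1, 2, 3) for score in item_scores):
        raise ValueError("each item must be scored 0, 1, 2, or 3")
    return sum(item_scores)


def meets_cutoff(item_scores, n_items, cutoff=10):
    """Return True when the summed score is at or above the cut-off."""
    return scale_total(item_scores, n_items) >= cutoff


def has_depression(phq8_items):
    """PHQ-8: 8 items, total score >= 10."""
    return meets_cutoff(phq8_items, n_items=8)


def has_moderate_severe_anxiety(gad7_items):
    """GAD-7: 7 items, total score >= 10."""
    return meets_cutoff(gad7_items, n_items=7)
```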

Combustible and electronic cigarette use

Participants were categorized into three mutually exclusive groups based on whether they had smoked ≥100 cigarettes in their lifetime and smoked at least some days in the past 30 days: current smokers (≥100 lifetime cigarettes and past 30-day use), former smokers (≥100 lifetime cigarettes and no past 30-day use), and never smokers (smoked < 100 cigarettes in their lifetime). Current electronic cigarette use or “vaping” was based on respondents endorsing they now use electronic cigarettes either every day or some days.
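
The three-level smoking classification can be sketched as follows. The argument names are hypothetical stand-ins for the two survey items described above, not actual NHIS variable names.

```python
def smoking_status(smoked_100_lifetime, smoked_past_30_days):
    """Assign one of three mutually exclusive smoking categories.

    smoked_100_lifetime: True if the respondent smoked >= 100 cigarettes ever
    smoked_past_30_days: True if the respondent smoked at least some days
                         in the past 30 days
    """
    if not smoked_100_lifetime:
        # Fewer than 100 lifetime cigarettes: never smoker, whatever
        # the recent use.
        return "never"
    return "current" if smoked_past_30_days else "former"
```

Because the lifetime threshold is checked first, a respondent below 100 lifetime cigarettes is always classified as a never smoker, which keeps the three groups mutually exclusive.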

Self-reported health status, disability, and obesity

An indicator variable for fair or poor self-reported health was constructed based on responses to the question “Would you say your health in general is excellent, very good, good, fair, or poor?”; coded as 1 if a participant endorsed fair or poor health and coded as 0 if they endorsed excellent, very good, or good. This dichotomous measure is a reliable and valid measure of general physical well-being and highly correlated with objective measures of functional impairment, morbidity, and mortality [41, 42]. Disability was assessed using the Washington Group Composite Disability indicator. Participants who reported having serious difficulty in either seeing, hearing, mobility, communication, cognition, or self-care were classified as having a disability [38]. Obesity was defined as current body mass index ≥30 kg/m2 [43].
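
A minimal sketch of the health status and obesity indicators follows. Body mass index is computed with the standard formula (weight in kilograms divided by height in meters squared); the function names are hypothetical, and the input units are an assumption because the text does not state how height and weight were recorded.

```python
def fair_or_poor_health(response):
    """Code self-rated health as 1 for fair/poor and 0 otherwise."""
    return 1 if response in ("fair", "poor") else 0


def body_mass_index(weight_kg, height_m):
    """Standard BMI: weight (kg) divided by height (m) squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def is_obese(weight_kg, height_m, threshold=30.0):
    """Obesity as defined in the text: BMI >= 30 kg/m^2."""
    return body_mass_index(weight_kg, height_m) >= threshold
```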

Statistical analysis

Veterans were stratified by past-year VA healthcare use, and Pearson’s χ2 tests were used to evaluate differences between VA patients and non-VA veterans on sociodemographic characteristics, chronic health conditions, pain, mental health status, combustible cigarette use and vaping, and self-reported health. The χ2 test assumes that the data were obtained through random selection, that the data are frequencies or counts with mutually exclusive levels of the variable, that the study groups are independent, and that the expected cell count is 5 or more in at least 80% of the cells, with no cell having an expected count of less than one [44, 45]. In accordance with American Statistical Association guidance, we reported the actual P values, rather than expressing a statement of inequality (P < .05), to avoid the potential problem of incorrectly interpreting a P value as significant or not based on a pre-determined threshold value [46, 47]. All percentages and standard errors were calculated with SAS-callable SUDAAN 11.0.1, and NHIS sample weights were used to account for the complex survey design and survey nonresponse to produce estimates nationally representative of the non-institutionalized population of veterans residing in the US. Multivariable logistic regression models (SAS-callable SUDAAN 11.0.1) using sample weights generated weighted predicted marginal prevalence estimates (back-transformed from marginal log-odds) of sociodemographic and medical profiles in each US veteran group (VA patients and non-VA veterans), both unadjusted and adjusted for sociodemographic factors related to VA healthcare use, including age, gender, race and ethnicity, education, and family income. Predicted marginal prevalences were then used to calculate risk differences (RD) and adjusted risk differences (aRD) with 95% confidence intervals (CIs), which estimate group differences in absolute risk between VA patients and non-VA veterans.
We estimated unadjusted and adjusted odds ratios (presented in the online appendix), which we transformed into Cohen’s d by \(d = L_{OR}\,\frac{\sqrt{3}}{\pi}\), where π ≈ 3.14159 and \(L_{OR}\) is the natural logarithm of the odds ratio, to provide information on the magnitude of the effects [48], with effect sizes of d = 0.2, 0.5, and 0.8 indicating “small”, “medium”, and “large” effects, respectively [49].
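
To make the arithmetic concrete, the sketch below computes a weighted prevalence, a risk difference in percentage points, and the odds-ratio-to-d conversion given above. It is a simplified illustration only: the analysis itself used SAS-callable SUDAAN with design-based standard errors and model-based predicted marginals, none of which is reproduced here, and all function names are hypothetical.

```python
import math


def weighted_prevalence(outcomes, weights):
    """Weighted proportion (0-1) of records whose outcome is 1."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w for y, w in zip(outcomes, weights) if y) / total


def risk_difference(p_va, p_non_va):
    """Absolute difference between two proportions, in percentage points."""
    return 100.0 * (p_va - p_non_va)


def odds_ratio(p_va, p_non_va):
    """Odds ratio comparing two proportions strictly between 0 and 1."""
    return (p_va / (1.0 - p_va)) / (p_non_va / (1.0 - p_non_va))


def cohens_d_from_or(or_value):
    """Cohen's d = ln(OR) * sqrt(3) / pi, the conversion given in the text."""
    return math.log(or_value) * math.sqrt(3.0) / math.pi
```

For example, applying these functions to the reported prevalences of fair or poor health (27.9% of VA patients and 18.0% of non-VA veterans) gives a crude difference of 9.9 percentage points and a d of about 0.31. These crude figures need not match the model-based estimates reported in the paper.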

Results

Most veterans were male (89.2%), non-Hispanic White (79.0%), and heterosexual (97.4%), and had completed some college or more (65.4%); 47.9% were aged 65 and above, and 45.1% reported a family income ≥ 400% of the federal poverty line (Table 1). Approximately 32% of veterans reported receiving past-year VA healthcare services. Non-VA veterans were more likely to have higher incomes and to be non-Hispanic White than VA patients.

Table 1 Sociodemographic characteristics of US military veterans overall and by past-year use of Veterans Administration (VA) healthcare: NHIS 2019

VA patients had a higher burden of any chronic health condition (aRD = 11.94; 95%CI = 8.08–15.80), high blood pressure (aRD = 12.46; 95%CI = 8.25–16.67), diabetes (aRD = 7.73; 95%CI = 4.31–11.15), arthritis (aRD = 15.17; 95%CI = 10.90–19.45), and chronic lung disease (aRD = 6.65; 95%CI = 3.84–9.45) than non-VA veterans (Table 2). Pain was more prevalent among VA patients than non-VA veterans. The largest differences in pain prevalence were for any pain (aRD = 13.02; 95%CI = 9.01–17.03), frequent pain (aRD = 19.89; 95%CI = 15.64–24.14), and severe pain (aRD = 7.78; 95%CI = 4.24–10.71). Both depressive symptoms (aRD = 5.70; 95%CI = 2.91–8.49) and anxiety symptoms (aRD = 6.72; 95%CI = 4.28–9.16) were also more prevalent in VA patients than non-VA veterans. Although group differences in current cigarette use and current electronic cigarette use were negligible, VA patients were more likely to be former smokers (aRD = 8.73; 95%CI = 4.55–12.92) than non-VA veterans. Moderate differences in overall measures of general health (aRD = 12.87; 95%CI = 9.18–16.55) and disability (aRD = 10.58; 95%CI = 7.21–13.94) were observed between the two groups, with 27.9% of VA patients reporting fair or poor health and 20.2% reporting disability, compared with 18.0% and 11% of non-VA veterans, respectively.

Table 2 Health conditions and behaviors in US military veterans by past-year use of VA health care: NHIS, 2019

Figure 1 shows the magnitude of the sociodemographic and health differences between VA patients and non-VA veterans. The difference between VA patients and non-VA veterans in the prevalence of non-Hispanic Black veterans was moderate (d ≥ 0.50), although differences between VA patients and non-VA veterans were small for all other race and ethnicity groups and for income. For health conditions, we observed the largest group differences between VA patients and non-VA veterans for frequent pain (d = 0.49), severe pain (d = 0.40), depressive symptoms (d = 0.67), and anxiety symptoms (d = 0.47).

Fig. 1

Love plot displaying unadjusted and adjusted effect sizes of 31 characteristics between US military veterans with past-year VA care and veterans without past-year VA care in the National Health Interview Survey (NHIS), 2019. Logistic models were used to generate unadjusted predicted marginal prevalences and adjusted predicted marginal prevalences, standardized to the distribution of sociodemographic characteristics of the sample. Regressions adjusted for age category, gender, race/ethnicity, education, and poverty status based on Federal Poverty Level. The unstandardized regression coefficients and pooled variance from the unadjusted and adjusted regression models were then used to calculate the Cohen’s d.

Discussion

Using data from the nationally representative 2019 National Health Interview Survey, we documented important differences in the distribution of socioeconomic and health characteristics of veterans who use and who do not use VA services. There were several important findings from this study. First, consistent with prior work, the sociodemographic composition of VA patients in 2019 differed from non-VA veterans, the primary population of interest for many VA EHR-based studies [23]. Our finding that members of disadvantaged racial and ethnic minority groups and low-income veterans were overrepresented in the VA patient population is consistent with the prior research that defined VA healthcare use by VA enrollment status rather than VA healthcare use [23,24,25,26,27]. However, we observed relatively minimal differences in the age and gender distribution of VA patients and non-VA veterans, in contrast to these previously published studies that found women and younger veterans overrepresented in VA enrollees relative to non-enrolled veterans [23,24,25,26,27]. Although we cannot explain why VA enrollees, but not VA patients, are more likely to be younger and female than other veterans, women and younger VA enrollees may prefer to receive their care outside of the VA system; these groups may have greater access to non-VA healthcare (e.g., as part of their employment benefits) or have better health and lower healthcare needs than their peers.

Second, VA patients were disproportionately burdened by physical and psychological morbidity and disability, including higher prevalences of high blood pressure, diabetes, arthritis, chronic lung disease, frequent and severe pain, depression and anxiety symptoms, and fair/poor self-reported health. Although the over-representation of high-risk health conditions may be expected in a patient population accessing outpatient medical and hospital services [24, 27], the over-representation of physical and psychological morbidity and disability in the VA patient population may be exacerbated by the eligibility criteria for VA services, which prioritize veterans with severe income limitations and service-connected disability [34, 35]. Veterans with service-connected conditions, particularly those with psychiatric disorders such as depression and PTSD, depend heavily upon the VA for health care. For example, Maynard et al. [34] found that veterans with service-connected psychiatric disorders accounted for most hospitalizations in the VA system, and almost half of VA enrollees with PTSD and/or major depression had one or more mental health visits in 2016. As such, we would expect VA patients to have greater physical and psychological morbidity and disability than non-VA veterans.

The over-representation of high-risk sociodemographic and health conditions in the VA patient population indicates that VA EHR-based studies may yield estimates that are not generalizable to the overall veteran population. However, statistical methods have been proposed to improve the generalizability of EHR results to populations of clinical and policy interest [50, 51]. For example, the substantial body of literature on suicide and its potential causes among veterans has relied heavily on data from the VA’s EHR databases [13, 52, 53], which will result in gaps in knowledge about those who do not receive care within the VA. Using information on the differences in the distribution of socioeconomic and health characteristics of veterans who use and who do not use VA services, future VA EHR-based studies could apply selection probabilities with model-based standardizations to estimate the results in the total US veteran population. The same approach can be applied to estimate the treatment effect in a population distinct from the study sample. For example, given that the factors that determine whether a person receives healthcare through the VA versus Medicare are well documented [54,55,56], similar methods could be applied to generalize results from the VA EHR data to the Medicare population to estimate the expected effect, for example, of implementing a VA program in the Medicare population. Although the specific set of characteristics that must be included in a selection model will depend on the research question being investigated, our study provides critical information on the variables that differentiate VA patients from non-VA veterans that future studies require to accurately estimate the conditional probability of being selected.
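
The idea behind standardization can be illustrated with a deliberately simplified sketch that reweights stratum-specific estimates from a study sample to the stratum distribution of a target population. This is plain direct standardization over discrete strata, substituted here for the model-based approaches cited above, which would instead estimate selection probabilities from a fitted model. The function names are hypothetical, and any numbers used with them would be illustrative rather than study results.

```python
def selection_weights(sample_shares, target_shares):
    """Per-stratum weight = target population share / study sample share.

    A weight above 1 marks a stratum that is underrepresented in the
    sample relative to the target population.
    """
    return {
        stratum: target_shares[stratum] / sample_shares[stratum]
        for stratum in sample_shares
    }


def standardized_estimate(stratum_estimates, target_shares):
    """Average stratum-specific estimates using target-population shares."""
    total = sum(target_shares.values())
    return sum(
        stratum_estimates[stratum] * share / total
        for stratum, share in target_shares.items()
    )
```

For instance, if a hypothetical outcome has prevalence 0.10 in one stratum and 0.30 in another, and the target population is split evenly between the two strata, the standardized estimate is their simple average regardless of how the study sample itself was split. This sketch assumes that the stratum-specific estimates carry over from the sample to the target population, which is the key untestable assumption of any such reweighting.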

Study limitations are noted. First, the NHIS sample does not include homeless individuals or those in institutional settings. Although homeless veterans are disproportionately affected by physical and mental illness, they comprise a very small fraction of the VA patient population (~ 37,000 veterans were homeless in 2020 [57]), and thus their effect on the overall findings would be limited. Second, self-report measures of chronic medical conditions, mental distress, and anxiety in the NHIS are not confirmed with medical diagnosis or collateral information. Social desirability could lead to underreporting of stigmatized conditions, although there is no reason to believe this would vary by past-year VA healthcare use. Third, given the documented changes in the underlying VA patient population over time, our findings may not generalize to earlier years. Fourth, neither VA patients nor non-VA veterans were engaged as stakeholder partners in the planning, conduct, or dissemination phases of this study. However, our research team comprised a diverse set of experts, including VA research scientists and VA clinician/researchers, who are considered patient partners and stakeholder partners by the PCORI Engagement Rubric [58].

Our study provides valuable results on the representativeness of the VA patient population relative to the overall US veteran population. These findings will be useful both for hypothesizing about how inferences derived from VA EHR data will generalize to the overall US veteran population and for minimizing the effect of bias in the context of differential patient population selection that affects both exposures and outcomes. Differences between the VA patient population and the overall US veteran population should be continuously monitored to identify potentially influential changes in their sociodemographic and clinical profiles over time. Future research should investigate how to best use VA EHR data to better understand and meet the needs of all US veterans, including VA enrollees who might leave the VA for other public insurance options (e.g., Medicaid, Medicare) or those who choose community providers.