The disparities across racial and ethnic groups in the impact of COVID-19 in the United States (US) are well-established. Severe COVID-19, including death, has affected Asian American, Black and African American (BA/AA), Hispanic and Latina/Latino (HI/LA), and Indigenous American and Alaskan Native (IA/AN) communities at disproportionately high rates [1,2,3,4,5]. Articles have pointed to the cause of this increased risk: structural racism [1,2,3,6,7,8], which has broad implications for individuals’ health and the quality of health care received [8,9,10], where and how individuals live [11], and the amount and type of their employment [2, 8]. Racial discrimination within healthcare systems and its impact on care delivery have been shown to contribute to differences in health outcomes [12,13,14]. Racial and ethnic minority populations are more likely to have jobs classified as essential while having less access to jobs with health insurance, flexible work from home schedules, and paid leave while sick or awaiting COVID-19 test results [8]. Housing disparities have led to unequal access to COVID-19 testing and differences in the ability to quarantine (e.g., apartment building versus single-family home) [7, 9, 11, 15, 16].

Our goal was to investigate if racial and ethnic disparities in severe COVID-19 were observed within a defined population of US adults with health insurance, and whether these disparities were primarily explained by increased risk of acquiring COVID-19 infection or progressing once infected (or both). In this cohort, information from severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) laboratory tests, hospitalization diagnosis codes, and death records allowed for assessment of primary COVID-19 infection and progression to severe disease, defined as invasive mechanical ventilation or death. If the risk of severe COVID-19 was different across racial and ethnic groups in the full population, but the same in the subset of individuals with COVID-19 infection, it would imply that observed disparities in severe COVID-19 were driven by increased infection rates among racial and ethnic minority groups.


Setting, Population, and Data Sources

We studied adult (18 years and older) individuals insured by and receiving care in the Colorado (Colorado State), Northwest (Oregon State, Southwest Washington State), and Washington (Washington State) regions of Kaiser Permanente (KP), a US health system that provides health care and health insurance. As of the cohort index date, February 29, 2020, individuals had to be continuously enrolled in their health system since September 1, 2019, or earlier (i.e., at least 6 months prior enrollment in their health system). Individuals with a COVID-19 infection prior to March 1, 2020, were excluded. Data were gathered from electronic health records (EHRs), insurance billing and other health system administrative sources, and state mortality certificates. Appropriate Institutional Review Boards approved this study.
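
The eligibility rules above can be sketched as a simple filter. This is a minimal illustration, not the study's actual extraction code: the function name, its arguments, and the choice to compute age at the index date are assumptions, and `enrolled_since` is assumed to be the start of the person's current continuous enrollment span.

```python
from datetime import date

INDEX_DATE = date(2020, 2, 29)       # cohort index date stated in the text
ENROLLED_BY = date(2019, 9, 1)       # latest allowed start of continuous enrollment
FOLLOW_UP_START = date(2020, 3, 1)   # infections before this date lead to exclusion


def is_eligible(birth_date, enrolled_since, first_infection=None):
    """Return True if a (hypothetical) person record meets the cohort criteria."""
    # Age in completed years at the index date (assumed reference date for age).
    had_birthday = (INDEX_DATE.month, INDEX_DATE.day) >= (birth_date.month, birth_date.day)
    age = INDEX_DATE.year - birth_date.year - (0 if had_birthday else 1)
    if age < 18:
        return False
    # Continuous enrollment since September 1, 2019, or earlier.
    if enrolled_since > ENROLLED_BY:
        return False
    # Exclude anyone with a COVID-19 infection before March 1, 2020.
    if first_infection is not None and first_infection < FOLLOW_UP_START:
        return False
    return True
```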


COVID-19 infections were identified between March 1 and September 30, 2020. Infection was defined as present based on (1) EHR-recorded positive polymerase chain reaction (PCR) or nucleic acid amplification (NAA) tests; (2) hospital admissions with a COVID-19 International Classification of Diseases, 10th Revision (ICD-10) diagnosis code (“B342,” “B9721,” “B9729,” “U071,” “U072”); or (3) state death certificate with cause of death (COD) listed as COVID-19 or SARS infection due to coronavirus (“B342,” “B9721,” “B9729,” “U071,” “U072”). Severe COVID-19 was defined as hospitalization with invasive mechanical ventilation (see Appendix for codes used) or COVID-19-related death. COVID-19-related death was defined as a COVID-19 hospitalization with discharge status of expired, state death certificate COD listed as COVID-19 or SARS infection, or death within 28 days of COVID-19 infection. Interim (i.e., not finalized) state mortality certificates with COD information were available at KP Colorado and Washington through September 30, 2020; COD information was not available for KP Northwest. A sample of medical records from all three healthcare systems was reviewed to validate COVID-19 infection and invasive mechanical ventilation identified from diagnosis codes; positive predictive value for confirmed COVID-19 infection among hospitalizations with COVID-19 diagnosis codes was 96% (73/76; 95% confidence interval [CI] 89–99%) and for invasive mechanical ventilation identified through diagnosis and procedure codes was 100% (25/25; 95% CI 86–100%).
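
As a worked check of the validation figures above, the sketch below computes a positive predictive value with an exact (Clopper-Pearson) binomial confidence interval. The interval method is an assumption, since the text does not state which method was used; the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided CI for a proportion k/n (assumed method, found by bisection)."""

    def solve(f, target):
        # f is decreasing in p; find the p at which f(p) crosses target.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if f(mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if k == 0 else solve(lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    upper = 1.0 if k == n else solve(lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper


# Counts taken from the validation sample reported in the text.
ppv_infection = 73 / 76
ci_infection = clopper_pearson(73, 76)
ci_ventilation = clopper_pearson(25, 25)
```

Under this assumed method, the bounds can be compared with the reported 89–99% and 86–100% intervals after rounding to whole percentages.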

Race and Ethnicity

Self-reported race and ethnicity were gathered from EHRs. Individuals provide race and ethnicity information at clinic visits; in some clinics, HI/LA ethnicity was gathered through a separate ethnicity question, while in other clinics, ethnicity was included as a race category. We grouped individuals using both race and ethnicity information; non-HI/LA White individuals, the most privileged group in the US, were the reference group in all analyses. People who self-identified as HI/LA were classified as such, while all other individuals were grouped using all race information provided. In this study, individuals were considered multiracial if they did not identify as HI/LA and identified with at least two races other than White; individuals who selected two race categories, one of which was White, were categorized into the selected racial group that was not White.
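
The grouping rules above can be written as a short decision function. This is a sketch under stated assumptions: the function name, the category labels, the handling of "HI/LA" appearing as a race category, and the "Unknown" fallback for records with no race information are illustrative and are not taken from the study's code.

```python
def classify_race_ethnicity(hispanic, races):
    """Assign one analytic group from an ethnicity flag and a collection of races.

    hispanic: True if the person self-identified as HI/LA on an ethnicity question.
    races:    iterable of self-reported race labels (may include "HI/LA" in
              clinics where ethnicity was offered as a race category).
    """
    races = set(races)
    # Anyone identifying as HI/LA, by either route, is classified as HI/LA.
    if hispanic or "HI/LA" in races:
        return "HI/LA"
    non_white = races - {"White"}
    # At least two races other than White: multiracial.
    if len(non_white) >= 2:
        return "Multiracial"
    # One non-White race, alone or together with White: that non-White group.
    if len(non_white) == 1:
        return next(iter(non_white))
    if "White" in races:
        return "White"
    return "Unknown"  # assumption: no race information recorded
```

For example, `classify_race_ethnicity(False, ["White", "Asian American"])` returns `"Asian American"`, while `classify_race_ethnicity(False, ["BA/AA", "Asian American"])` returns `"Multiracial"`.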


Covariates were calculated as of February 29, 2020 (cohort entry). Age, gender, body mass index (BMI; < 25, 25–30, ≥ 30 kg/m²), and smoking status (current, former, never) were extracted from the EHR; individuals with no smoking information recorded were classified as never smokers. Hypertension, asthma, atherosclerotic cardiovascular disease (cerebrovascular disease, myocardial infarction, peripheral vascular disease), chronic obstructive pulmonary disease (COPD), and heart failure (HF) were identified through ICD-10 codes in the year prior. Diabetes mellitus (DM) and chronic kidney disease (CKD) were identified using both ICD-10 codes and lab results, and DM was also identified through use of DM medication (Appendix). To measure CKD severity, we created an indicator for whether most recent estimated glomerular filtration rate (eGFR) in the prior 12 months was less than 30 mL/min per 1.73 m²; eGFR was calculated using EHR creatinine lab measures and the CKD-EPI formula [17], excluding the BA/AA race component following recent recommendations [18]. The Charlson score [19], excluding asthma, atherosclerotic cardiovascular disease, CKD, COPD, DM, and HF, was used to summarize additional medical comorbidity burden. Angiotensin-converting enzyme inhibitor (ACEI) and angiotensin II receptor blocker (ARB), oral steroid, and statin use at cohort entry were determined through pharmacy fill data. Neighborhood characteristics were defined using last known address as of March 1, 2020, and information from the 2016 and 2018 US American Community Survey conducted by the US Census Bureau. Neighborhood deprivation index (NDI) [20] was calculated at the census tract using the 2016 American Community Survey and categorized in quartiles. Neighborhood information on the proportion of housing units with more than 1 person per room was calculated at the census tract using the 2018 American Community Survey.
See Supplementary Material for detailed variable definitions, including lists of ICD-10 codes used.
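
To make the CKD severity indicator concrete, the sketch below computes eGFR and the below-30 flag. It assumes that reference [17] is the 2009 CKD-EPI creatinine equation and that "excluding the race component" means omitting that equation's 1.159 multiplier; the function names and the sex argument are illustrative, and serum creatinine is assumed to be in mg/dL.

```python
def egfr_ckd_epi_2009_no_race(scr_mg_dl, age_years, female):
    """eGFR (mL/min per 1.73 m^2) from the 2009 CKD-EPI creatinine equation,
    with the race coefficient omitted (assumed reading of the text)."""
    kappa = 0.7 if female else 0.9
    alpha = -0.329 if female else -0.411
    ratio = scr_mg_dl / kappa
    egfr = 141 * min(ratio, 1.0) ** alpha * max(ratio, 1.0) ** -1.209 * 0.993**age_years
    if female:
        egfr *= 1.018
    return egfr


def severe_ckd_indicator(scr_mg_dl, age_years, female):
    """True if the most recent eGFR is below 30 mL/min per 1.73 m^2."""
    return egfr_ckd_epi_2009_no_race(scr_mg_dl, age_years, female) < 30
```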

Statistical Analyses

We described the full cohort, those who had a COVID-19 infection, and those who had a COVID-19 PCR/NAA test (positive or negative), overall and by racial and ethnic groups. We estimated the relative risk (RR) of severe COVID-19 across racial and ethnic groups in the full cohort with 5 models, which progressively adjusted for more covariates to understand how each potential confounder set changed RR estimates. Model 1 adjusted for health system only (essentially a proxy for geographic region) and was designed to estimate the relative risk of COVID-19 infection or severe infection, accounting for geographical differences. Model 2 added age modeled with a linear spline (knots at 35, 45, 55, 65, and 75 years) and sex; this next adjustment set was selected because the age distribution is quite different across racial and ethnic groups in the US [21] and because age is a well-established risk factor for more severe COVID-19 disease. Model 3 added BMI and smoking status; these covariates were considered next as both BMI and smoking are known risk factors for COVID-19 infection and progression to more severe symptoms and their distribution differs between racial/ethnic subgroups. Model 4 added comorbid conditions and use of specific medications (see above) to evaluate if differences in chronic disease burden across racial and ethnic groups explained remaining disparities. Model 5 added NDI and percent of housing units in census tract with more than 1 person per room; these covariates were added to the adjustment set in the last model to evaluate whether after adjusting for all other factors socioeconomic characteristics and household density [21] further explained any differences observed. See Supplementary Material (Table M2) for a summary of the covariate set used in each model.
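
The linear spline for age used from Model 2 onward can be illustrated with a truncated-line basis. This is one common parameterization and is an assumption here: the statistical software may construct the spline terms differently while spanning the same piecewise-linear functions. The function name is illustrative.

```python
AGE_KNOTS = (35, 45, 55, 65, 75)  # knot locations stated in the text


def linear_spline_basis(age, knots=AGE_KNOTS):
    """Return [age, (age-35)+, (age-45)+, ...], where (z)+ = max(z, 0).

    The coefficient on each truncated term is the change in slope at that knot,
    so the fitted age effect is continuous and piecewise linear.
    """
    return [float(age)] + [max(0.0, float(age - k)) for k in knots]
```

For example, `linear_spline_basis(50)` gives `[50.0, 15.0, 5.0, 0.0, 0.0, 0.0]`: the terms for knots above age 50 contribute nothing.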

We fit the same 5 models to estimate RRs for severe COVID-19 among individuals who had a COVID-19 infection, to see if disparities remained. If higher rates of severe COVID-19 among racial and ethnic minority groups, compared to non-HI/LA White individuals, were no longer evident when analyses were restricted to individuals with any COVID-19 infection, this would indicate that disparities were driven by higher infection rates. We fit the same 5 models in the full cohort to estimate RRs for any COVID-19 infection to investigate differences in infection rates across racial and ethnic groups. Lastly, we calculated rates of positive PCR/NAA tests among those people tested, overall and by racial and ethnic groups. If asymptomatic individuals or individuals with mild symptoms from racial or ethnic minority groups were tested more often compared to non-HI/LA White individuals, RRs of severe COVID-19 would be biased downwards in the sample of individuals with COVID-19 infection and positivity rates would also likely be lower. In all models limited to individuals who had been tested for, or diagnosed with, COVID-19, we additionally adjusted for month of infection or COVID-19 test, because availability of testing varied over our follow-up period.
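
The reasoning behind comparing the full-cohort and infected-only analyses rests on a simple identity for crude risks: the risk of severe disease equals the risk of infection times the risk of progression given infection. The sketch below uses entirely hypothetical counts (not study data) to show a group whose doubled risk of severe disease is explained entirely by a doubled infection risk. Adjusted RRs from separately fitted models need not multiply exactly in this way.

```python
def crude_risks(n, infected, severe):
    """Crude risks for one group; severe cases are a subset of infected people."""
    return {
        "infection": infected / n,
        "severe_given_infection": severe / infected,
        "severe_overall": severe / n,
    }


# Hypothetical counts chosen only to illustrate the decomposition.
reference = crude_risks(n=100_000, infected=500, severe=30)
comparison = crude_risks(n=100_000, infected=1_000, severe=60)

rr_infection = comparison["infection"] / reference["infection"]
rr_progression = comparison["severe_given_infection"] / reference["severe_given_infection"]
rr_overall = comparison["severe_overall"] / reference["severe_overall"]
```

Here `rr_overall` equals `rr_infection * rr_progression`, and a progression RR of 1 alongside an elevated overall RR is the pattern that would point to infection risk as the driver.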

We modeled all binary outcomes using modified Poisson regression [22] and used inverse probability weights to address missing data [23]. Missing follow-up occurred when individuals disenrolled from their health system before September 30, 2020; BMI and NDI also had missing values. Therefore, up to 3 analytic weights were multiplied together to account for missing data; see Supplementary Material (Table M2) for covariates included in missing data models. When the sample was limited to individuals with a PCR/NAA test, not enough people were missing NDI data to estimate weights; thus, missing NDI weights were not used in this sample. We calculated standard errors using the robust sandwich estimator to account for the mis-specified variance when using a Poisson model for a binary outcome and the analytic missing data weights [22, 23]. In addition to estimating RRs and 95% CIs, we marginalized over covariates to estimate absolute risks of outcomes in each racial and ethnic group. To assess impact of shorter prior enrollment times leading to potential covariate measurement error, we conducted sensitivity analyses including only individuals continuously enrolled in their health system since March 1, 2019 (or earlier). Descriptive tables were constructed using R version 3.6.3 (Vienna, Austria); all other analyses were performed using Stata version 15.1 (College Station, TX).
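
A minimal sketch of the modified Poisson approach with analytic weights follows. It is not the authors' Stata code: it handles only an intercept and one binary exposure, uses hypothetical data and weights, treats the weights as fixed, and uses a normal-approximation 95% CI. It fits a log-link Poisson model by Newton-Raphson and computes a robust sandwich variance for the log RR.

```python
import math


def modified_poisson_rr(y, x, w, max_iter=50, tol=1e-10):
    """Weighted log-link Poisson fit of binary y on binary x; returns (RR, lo, hi)."""
    b0 = math.log(sum(wi * yi for wi, yi in zip(w, y)) / sum(w))  # start at overall risk
    b1 = 0.0
    for _ in range(max_iter):
        u0 = u1 = a00 = a01 = a11 = 0.0
        for yi, xi, wi in zip(y, x, w):
            mu = math.exp(b0 + b1 * xi)
            u0 += wi * (yi - mu)          # weighted score, intercept
            u1 += wi * (yi - mu) * xi     # weighted score, exposure
            a00 += wi * mu
            a01 += wi * mu * xi
            a11 += wi * mu * xi * xi
        det = a00 * a11 - a01 * a01
        step0 = (a11 * u0 - a01 * u1) / det   # Newton step: A^{-1} U
        step1 = (a00 * u1 - a01 * u0) / det
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < tol:
            break
    # Sandwich variance A^{-1} B A^{-1}, evaluated at the final estimates.
    a00 = a01 = a11 = m00 = m01 = m11 = 0.0
    for yi, xi, wi in zip(y, x, w):
        mu = math.exp(b0 + b1 * xi)
        a00 += wi * mu
        a01 += wi * mu * xi
        a11 += wi * mu * xi * xi
        s2 = (wi * (yi - mu)) ** 2
        m00 += s2
        m01 += s2 * xi
        m11 += s2 * xi * xi
    det = a00 * a11 - a01 * a01
    i01, i11 = -a01 / det, a00 / det          # second row of A^{-1}
    var_b1 = i01 * i01 * m00 + 2 * i01 * i11 * m01 + i11 * i11 * m11
    se = math.sqrt(var_b1)
    return math.exp(b1), math.exp(b1 - 1.96 * se), math.exp(b1 + 1.96 * se)


# Hypothetical data: 1,000 unexposed (20 events) and 500 exposed (25 events).
x = [0] * 1000 + [1] * 500
y = [1] * 20 + [0] * 980 + [1] * 25 + [0] * 475
w = [1.0] * 1000 + [1.0, 1.5] * 250   # hypothetical missing-data weights
rr, ci_low, ci_high = modified_poisson_rr(y, x, w)
```

With a single binary exposure the fitted RR equals the ratio of weighted risks, which gives a quick check on the fit; with more covariates the same estimating equations apply but need general matrix algebra.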

Role of the Funding Sources

The funders had no role in selecting health systems, study design, data collection, data analysis, data interpretation, or writing up of the results.


Our cohort included 1,053,118 individuals (53% female); of these, 31.0% were from KP Colorado, 39.5% KP Northwest, and 29.5% KP Washington. The 344 individuals with unknown or non-binary gender were excluded from all analyses; there were no observed COVID-19 infections in this population. The racial and ethnic make-up of our analytic cohort (N = 1,052,774) is shown in Table 1. While the population was 68.7% White, substantial numbers of individuals from racial and ethnic minority groups were included (e.g., 68,887 Asian American, 41,243 BA/AA, and 93,580 HI/LA individuals). The Supplementary Material (Tables R5 and R6) contains detailed information about individuals who identified as HI/LA and individuals who were classified as multiracial. The full cohort, overall and by racial and ethnic groups, is described in Table 1. Differences were seen across racial and ethnic groups in demographics (e.g., White individuals were older; mean age 53.2 versus 43.8–51.4 years for all other racial and ethnic groups) and medical conditions (e.g., more BA/AA, IA/AN, and Native Hawaiian/Pacific Islander [NH/PI] individuals had DM; BA/AA: 19.8%, IA/AN: 19.2%, and NH/PI: 20.8% compared to 13.1–17.6% for other groups).

Table 1 Characteristics of study population, overall and by race and ethnicity (column percentages)

A total of 961,917 (91.4%) individuals remained enrolled in their health system through the entire follow-up period, and 7,399 people had a COVID-19 infection (see Supplementary Material for description of individuals with COVID-19 PCR/NAA test and a COVID-19 infection). Among the individuals with COVID-19, 1,205 (16.3%) were hospitalized and 442 people had severe COVID-19 (invasive mechanical ventilation or death). Details of COVID-19 outcomes across racial and ethnic groups are in Supplementary Material (Tables R6 and R7). Striking differences were seen in RRs of severe COVID-19 across racial and ethnic groups; these patterns persisted even after adjusting for demographics, medical comorbidities, and neighborhood information (Table 2). After adjusting for all covariates, Asian American, BA/AA, and HI/LA individuals were twice as likely as non-HI/LA White individuals to experience severe COVID-19 (Asian American RR: 2.09 [95% CI: 1.36, 3.21]; BA/AA: 2.02 [1.39, 2.93]; HI/LA: 2.09 [1.57, 2.78]). Examining the impact of adjusting for nested confounder sets, RR estimates changed most after adjusting for age and sex, indicating increased risk compared to White individuals. Further adjusting for BMI, smoking, medical comorbidities, and medication use had less impact on RRs and adjusting for neighborhood characteristics had little impact on RRs.

Table 2 Estimated relative risk (95% confidence intervals) for severe COVID-19

Restricting to individuals with confirmed COVID-19 (Table 2, bottom half) resulted in RRs for severe COVID-19 closer to 1. After adjusting for all measured covariates, BA/AA people had an estimated RR of 0.95 (95% CI 0.66, 1.37) and for HI/LA individuals the RR was 0.97 (0.72, 1.31). The RR for severe COVID-19 remained elevated for Asian American individuals in this COVID-19-positive cohort (1.82 [1.23, 2.68]).

Risk of COVID-19 infection was increased among Asian American, BA/AA, HI/LA, and NH/PI individuals compared to non-HI/LA White individuals after adjusting for all measured covariates; estimated RRs compared to non-HI/LA White individuals were 1.20 (1.08, 1.33), 1.89 (1.72, 2.09), 2.67 (2.50, 2.84), and 2.49 (2.04, 3.03), respectively (Table 3). Table 4 presents estimated absolute rates of severe COVID-19 across racial and ethnic groups and Table 5 reports estimated absolute rates of COVID-19 infection. Among people tested for COVID-19, accounting for all measured covariates, the positivity rate for Asian American individuals was 574 per 10,000 individuals (95% CI: 519, 629), BA/AA: 686 (627, 745), HI/LA: 926 (880, 972), and NH/PI: 884 (723, 1046), which were all higher than for IA/AN (376 [264, 487]) and non-HI/LA White (394 [381, 407]) individuals. Adjusting for covariates did not have large impacts on absolute risk estimates of experiencing a COVID-19 infection or of testing positive for COVID-19.

Table 3 Estimated relative risk (95% confidence intervals) for any COVID-19 infection
Table 4 Estimated absolute rates of severe COVID-19 outcome in each racial and ethnic group, marginalizing out over potential confounders. Reported as number out of 10,000 (95% confidence interval) people
Table 5 Estimated absolute rates of positive COVID-19 PCR/NAA tests in each racial and ethnic group, marginalizing out over potential confounders. Reported as number out of 10,000 (95% confidence interval) tests performed

Sensitivity analyses requiring continuous enrollment for 12 months prior to cohort entry produced similar results (Appendix Tables S1–S4).


In this large cohort from three US health systems, large differences in risk of severe COVID-19 by self-reported race/ethnicity were no longer seen in analyses limited to people with documented COVID-19 infection. This finding suggests that disparities in severe COVID-19 in BA/AA and HI/LA individuals were explained by increased rates of COVID-19 infection. Increased risk of severe COVID-19 for Asian American individuals was partially, but not fully, explained by increased COVID-19 infection rates. Among individuals who experienced COVID-19, risk of severe COVID-19 outcome was still increased by approximately 80% in Asian American individuals compared to that in non-HI/LA White individuals. Less representation of people identifying as IA/AN, NH/PI, or multiracial in this cohort led to wide CIs. After adjusting for age and gender, the RR of severe COVID-19 in most racial and ethnic groups noticeably increased relative to non-HI/LA White individuals; this is likely because in our cohort, as in the general US population, the age distribution was shifted toward younger ages in Asian American, BA/AA, and HI/LA individuals compared to non-HI/LA White individuals, and younger individuals are less likely to experience severe COVID-19 infection. The proportion of individuals who tested positive for COVID-19 was higher among Asian American, BA/AA, HI/LA, and NH/PI individuals compared to IA/AN and non-HI/LA White individuals. If “over-testing” (i.e., testing a larger proportion of asymptomatic people or people with mild symptoms) was prominent, we would expect the proportion of individuals who tested positive for COVID-19 among all those tested to be low. In our cohort, positivity rates indicate that “over-testing” did not lead to capture of more of the less-severe COVID-19 cases in racial and ethnic minority groups.

Our findings align with literature examining disparities in COVID-19 infections and outcomes in the US. Mackey et al. performed a systematic review of research evaluating racial and ethnic disparities in COVID-19 infection and outcomes [24]. They synthesized the results of 37 articles and reported that BA/AA and HI/LA populations had higher COVID-19 infection risk, hospitalization, and COVID-19-related mortality compared to non-HI/LA White populations, but that case-fatality rates, estimated mostly from in-hospital mortality, were similar across these populations. They reported that Asian populations experienced similar outcomes to non-HI/LA White populations, but the strength of that evidence was low, and there was insufficient evidence to identify disparities in other populations.

Several articles have appeared since the Mackey et al. review. Ogedegbe et al. reported on a retrospective cohort study of 9,722 patients tested for COVID-19 in the New York University Langone Health System [25]. They reported that the odds of having a positive test were higher among BA/AA and HI/LA patients compared to White patients. The odds of hospitalization, among those patients who tested positive for COVID-19, were similar for BA/AA, HI/LA, and White patients, but were slightly higher among Asian American patients (odds ratio: 1.6 [1.1, 2.3]), similar to our findings. Escobar et al. found that Asian American, BA/AA, and HI/LA individuals (in Kaiser Permanente Northern California) had higher odds of COVID-19 infection compared to non-HI/LA White individuals, but that there were no differences across racial and ethnic groups in mortality among hospitalized patients [26]. A retrospective study conducted in the US Department of Veterans Affairs by Razjouyan et al. found higher testing positivity rates and increased risk of hospitalization due to COVID-19 among BA/AA and HI/LA veterans, but also found that among those hospitalized there were no differences across race and ethnic groups in intensive care utilization or death [27]. Khanna et al. found that, in the population of patients served by the University of Maryland family medicine and immediate care, BA/AA patients and HI/LA patients were more likely to experience COVID-19 infection than non-HI/LA White patients [28]. Khanna et al. also found that patients living in neighborhoods with predominantly Black residents and with a higher deprivation index score had a higher risk of COVID-19 infection compared to patients living in less socioeconomically deprived areas that were predominantly White [28]. Similarly, Khanijahani and Tomassoni found that counties classified as more socioeconomically and racially segregated had higher rates of COVID-19 deaths [29].
In our analyses, accounting for NDI and percent of households with more than one person per room did not impact RR estimates. It is possible that the reason these factors did not change risk estimates is that the individuals in our population were insured and 70% of the people in our sample were insured through their employer.

Our study has limitations. First, ethnicity was not collected uniformly within or between healthcare systems. In some clinics, HI/LA ethnicity information was collected as a separate question and in other clinics, HI/LA ethnicity was included as a potential race category. Due to this difference, we categorized all individuals who did not identify as HI/LA ethnicity as non-HI/LA. Second, interim state death certificates with COD information were not available for KP Northwest. Third, our follow-up period ended in September 2020, and thus did not include the period of significantly higher infections in late 2020–early 2021. Fourth, approximately 70% of the people in our study were insured through employer-sponsored insurance plans (see Table R1); thus, our results may not generalize to an uninsured or underinsured population. Fifth, most individuals these health systems serve live in urban or suburban areas, so we were not able to examine urban–rural differences, and our findings may not be generalizable to rural populations. Sixth, we lacked information about individuals’ occupation, an important predictor of potential exposure to COVID-19. This is particularly relevant to our study because people of color may be more likely to work as essential workers and thus to face disproportionately higher risks than more privileged individuals who have been able to work remotely.

Our study had notable strengths. This was a population-based cohort study of insured individuals enrolled in 3 health systems with comprehensive data capture on covariates and outcomes. These data included laboratory test results for most COVID-19 cases, hospitalization (including procedures performed during hospitalization), and mortality information from hospitalization discharge status and interim state mortality records. Our study used diagnosis codes from hospitalizations that were validated to have high accuracy in these health systems. Our study follow-up period included the beginning of the pandemic when testing was limited as well as the summer of 2020 when testing was more widely available. Access to rich data on all individuals in our cohort meant we could control for confounding by demographic and medical conditions to help address the impact restricted testing early in the pandemic may have had on our results.

Our work continues to highlight the importance of interventions to prevent COVID-19 infection, specifically in racial and ethnic minority communities. A continued focus on equity in vaccine distribution, education, and delivery is essential to achieving health equity for both initial COVID vaccination and subsequent booster shots. Early in 2021, COVID-19 vaccines were difficult to obtain in the US, and required a good internet connection and fluency in English and technology. According to the Centers for Disease Control and Prevention, COVID-19 vaccination rates still vary across racial and ethnic groups [30], but there is evidence that the differences in vaccination rates have narrowed over time [31].

Difficulty in access has been a major cause of unequal COVID-19 vaccination rates; literature points to racial and ethnic minority groups being willing to receive vaccines if offered [32, 33]. Recent US surveys on COVID-19 vaccination have shown an increase over time in the proportion of adults who have had the vaccine or who are willing to take the vaccine when offered [34]. While differences remain in the proportion of Americans willing to receive the COVID-19 vaccine across racial and ethnic groups, respondents identified concerns about COVID-19 vaccine efficacy and safety as reasons [34, 35]. Half of BA/AA and 25% of HI/LA respondents indicated they were not confident the COVID-19 vaccines have been adequately tested among people of their own racial or ethnic group [34, 35]. These concerns emphasize the need for researchers, health systems, and funders to ensure representation from all racial and ethnic groups in clinical trials, to make vaccines easily available to all racial and ethnic groups, and to provide accurate information on vaccine safety and efficacy available to all people, including information about the racial and ethnic make-up of trials.

One unexplained finding, which has limited evidence in prior literature [25], is our finding that Asian Americans may be at a higher risk of severe COVID-19 after infection compared to other racial and ethnic groups. A limitation of our study is that individuals from a variety of backgrounds were grouped together as Asian American. The finding that Asian Americans could be at increased risk of disease progression should be explored in future work, including understanding the impact of anti-Asian racism during the COVID-19 pandemic, identifying potential unknown risk factors that may have disproportionally affected Asian individuals in our cohort, and estimating risk among Asian American subgroups.

Our study adds to understanding of US health inequities of the COVID-19 pandemic, showing that Asian American, BA/AA, NH/PI, and HI/LA individuals were at higher risk of severe COVID-19. We demonstrated that, for most racial and ethnic groups, this increased risk was driven by increased COVID-19 infection risk rather than other factors. It is important to note that during the period of this study there were no effective treatments for mild to moderate COVID-19 available that could prevent progression to severe disease. Thus, progression of COVID-19 symptoms to invasive mechanical ventilation or death likely represents the natural progression of disease without intervention. If there had been treatments available that could slow or stop progression to severe disease, we might expect to see disparities in receipt of those treatments by race and ethnicity due to bias and discrimination in healthcare access and treatment [12,13,14]. If that were the case, then after restricting analyses to individuals infected with COVID-19, we might see continued disparities for severe disease given infection, even if no natural variation in likelihood of disease progression existed. It will be important to monitor for such disparities as new treatments for mild to moderate disease become available, for instance the new oral agents developed by Merck (molnupiravir) and Pfizer (PF-07321332/ritonavir), which are currently under consideration by the FDA for Emergency Use Authorization. Vaccine distribution is an effective way to prevent COVID-19 infection and spread, highlighting the importance of equity in vaccine roll-out, both for initial vaccination and for any recommended booster vaccines. Our study underscores that the social impact of racial and ethnic inequality, as opposed to biological differences, influenced COVID-19 severity.