Introduction

By June 2020, the COVID-19 pandemic in the USA led to over 2.3 million confirmed infections, over 121,000 fatalities, and almost 31,000 hospitalizations (CDC, U. S. Centers for Disease Control and Prevention 2020, 2020c). Like many other issues pertaining to health and economic disparities, the burden of the COVID-19 pandemic falls disproportionately on Black and Hispanic communities. Through June 13, 2020, the rate of hospitalization for Blacks and Hispanics was more than four times as high as for Whites (CDC, U. S. 2020b). The pandemic is taking a substantial toll on physical, mental, and economic health across the USA, but disparities in who is impacted by the virus are an additional cause for alarm.

Part of the barrier to carefully studying disparities from the pandemic—both their magnitudes and potential explanations—is that current published data on COVID-19 outcomes are coarse (Killeen et al. 2020). Most studies focus on COVID-19 outcomes at highly aggregated levels such as the state (Friedson et al. 2020) or county (Courtemanche et al. 2020). Within a state or county—especially in areas with larger populations—residential segregation by race and ethnicity can be quite stark, meaning that analyzing disparities at such levels misses an important part of the variation. Partway through the pandemic, a number of local and state governments began producing COVID-19 statistics in a more disaggregated fashion—most commonly reporting confirmed cases by ZIP code. This paper provides one of the first attempts to systematically investigate racial/ethnic disparities in COVID-19 using these newly available ZIP code level data. Specifically, we utilize data on confirmed COVID-19 cases from six cities—New York, Chicago, Atlanta, Baltimore, San Diego, and St. Louis—as well as data on fatalities from New York and Chicago.

Our analysis links these COVID-19 outcomes to six separate data sources to control for ZIP code level demographics, housing, socioeconomic status, occupational choices, transportation modes, health care access, long-run opportunity (income mobility and incarceration rates), human mobility, and population health disparities. This rich set of covariates allows us to investigate the extent to which mechanisms that have received popular attention—such as income; education; living in densely populated communities; reliance on public transportation; representation in forward-facing, essential jobs; mobility during lockdowns; pre-pandemic health; and access to health care—contribute to racial and ethnic disparities (Harrison 2020; Hubler et al. 2020; Oppel et al. 2020; CDC, U. S. 2020b). We find statistically significant and economically meaningful disparities for both Blacks and Hispanics at the ZIP code level in confirmed cases, and most of the disparity remains unexplained even after including extensive controls. Without additional covariates, a 10 percentage point increase in a ZIP code’s share of Black residents is associated with 9.2 additional confirmed COVID-19 cases per 10,000 residents, while a similar change in the Hispanic share is associated with 20.6 additional cases. Both are sizable changes relative to the average confirmed case rate of 153 per 10,000 population. Using decompositions that are insensitive to the ordering of the covariates (Gelbach 2016), we find that at least part of these disparities can be explained by differences in long-run opportunity (income mobility and incarceration rates), human mobility as measured by cell phone activity, and demographics. However, even with an extensive set of controls, more than half of the disparity in COVID-19 cases remains unexplained.
For the two cities where we are also able to examine COVID-19 fatalities, we find that differences in confirmed COVID-19 cases strongly predict the observed disparities, in fact eliminating the entire association for proportion Hispanic and the majority of the association for proportion Black.

The remainder of the paper is arranged as follows. We first examine the rapidly evolving COVID-19 literature with respect to disparities due to race and ethnicity. Next, we discuss our data collection effort. We then provide an empirical model and findings, and then conclude.

Literature Review

Despite growing recognition about racial and ethnic disparities in COVID-19, our study is one of the first to systematically investigate their size and possible explanations at a geographic level narrower than the county.Footnote 1 Below, we summarize the literature on racial disparities pertaining to COVID-19 infections, testing, and deaths, with the caveat that the literature is rapidly evolving. We then discuss, based on previous work, limitations on county-level analyses and underlying mechanisms through which disparities could occur.

Nationwide, County-Level Analyses

Several recent studies utilize data on COVID-19 outcomes from US counties and analyze racial and ethnic disparities. McLaren (2020) collects county-level data on COVID-19 mortality from the entire USA and links to county characteristics from the American Community Survey (ACS). By May 19, 2020, the unadjusted disparity shows a 10 percentage point increase in the Black share corresponding to an increase of 37.6 additional fatalities per million, with a 10 percentage point increase in the Hispanic share associated with 9.6 additional fatalities. With additional controls—especially for public transit—the disparities decrease and in some cases become insignificant (suggesting potential mechanisms for the observed disparity). The study concludes that the disparity for Blacks is very robust to the inclusion of additional covariates, although the one for Hispanics is more fragile.

Knittel and Ozaltun (2020) also examine COVID-19 death rates at the county level and find mixed evidence of disparities. As of May 27, 2020, in a model with detailed controls but excluding state fixed effects, a 10 percentage point increase in the proportion Black is associated with a large and statistically significant increase of 126.2 deaths per million residents. With state fixed effects included, the estimated disparity shrinks to a still sizeable 46.8 deaths per million but becomes statistically insignificant. The disparity for proportion Hispanic is not statistically significant in either model and the magnitudes are much smaller: 18.8 and 9.6 deaths per million in regressions without and with state fixed effects, respectively.

Desmet and Wacziarg (2020) examine both confirmed COVID-19 cases and fatalities. Outcomes are measured as logarithm of cases or fatalities (plus one) for May 26, 2020, where logarithm of population is included as a control, implying the other estimates can be interpreted as the determinants of cases and deaths in per capita terms. They find highly significant and virtually identical disparities in cases and fatalities for proportion Black and somewhat smaller standardized coefficients for proportion Hispanic.

Racial Segregation Within US Counties

There is somewhat limited racial and ethnic variation across counties in the USA, although the share of minority-majority counties has been increasing since 2000 (Krogstad 2019). Fewer than 5% of all US counties (151 out of 3143) have either Black, Hispanic, or indigenous people as the majority. As of 2018, there were 72 majority-Black counties, primarily located in the southeastern U.S.; there were 69 counties where Hispanics are the majority, predominantly in the southwestern U.S. (Schaeffer 2019). Especially in large US counties, there can be significant residential segregation, which may hide racial disparities in COVID-19 that can be more precisely measured at a more localized level. This limitation is well-understood; for example, McLaren (2020) notes “much of the relevant variation exists at the zip code level.” On a national level, there are 658 counties that contain 10 or more residential ZIP codes entirely within the county.Footnote 2 More than one-third of these counties have large dispersion—25 percentage points or more—in the share Black or Hispanic between the ZIP codes with the highest and lowest shares.

Neither the southeast nor southwest regions were the initial hotspots for COVID-19 spread, raising the question of whether observed correlations between county racial composition and COVID-19 death rates are confounded by the staggered timing in which the disease reached different parts of the country. The sensitivity of the results of Knittel and Ozaltun (2020) to the inclusion of state fixed effects is suggestive of such confounding. Yet including state fixed effects in county-level disparities regressions could itself be problematic, as it controls away much of the identifying variation, reducing the precision of the estimates. Accordingly, their coefficient for proportion Black becomes statistically insignificant after including state fixed effects despite remaining sizeable at 46.8 deaths per million. Estimating racial disparities with both credibility and precision therefore appears to require data with finer-grained geographic detail.

In our analysis of six cities, nearly 31% of the ZIP codes we analyze are minority-majority “neighborhoods” (to borrow the terminology of Almagro and Orane-Hutchinson (2020), who refer to ZIP codes within New York City as neighborhoods). While analyses at even a more finely grained level than ZIP code (e.g., Census Tract) would be desirable, ZIP codes offer much more heterogeneity with respect to race and ethnicity than counties, while also having satisfactory indicators related to long-run economic opportunity, human mobility, and other key demographic and health-related information.

Potential Explanations and Related Evidence

Several factors may explain racial and ethnic disparities in the spread of COVID-19 and subsequent COVID-19 outcomes. We next explain these theories and associated empirical evidence. Many of the relevant studies examine New York City in isolation, since it was initially the hardest hit in the USA, and because of its relatively early posting of ZIP code level data. Of course, New York City differs from the rest of the USA in many respects, even relative to other large cities, so the extent to which these findings are generalizable is unclear.

One possibility may be the nature of jobs. Many features about occupations, commuting, and the workplace could contribute to the spread of COVID-19, and may disproportionately affect people of color. Almagro and Orane-Hutchinson (2020) argue that occupation is a key explanatory variable for understanding the early transmission of COVID-19 in New York City, and since minority workers are more likely than others to be front-line employees or unable to work from home, they are more likely to be working outside the home during those hours. Accordingly, Coven and Gupta (2020) find that Black and Hispanic neighborhoods (measured at the ZIP code level) in New York City exhibited more daytime work activity than other neighborhoods during the pandemic using mobile location data sourced from VenPath. Selden and Berdahl (2020) examine the Medical Expenditure Panel Survey and find that Black adults in every age group were more likely than White adults to have health risks associated with severe COVID-19 illness, and that Blacks at high risk for severe illness were 1.6 times as likely as Whites to live in households containing health-sector workers.

Another phenomenon observed in New York City during the early stages of the pandemic was temporary relocation outside the city. Ability to relocate depends on whether one’s job can be done remotely and also whether one has the financial resources to do so. Perhaps for these reasons, Coven and Gupta (2020) find that the propensity to leave the city was strongly negatively associated with the proportion Black in the Census Tract.

Health care access may also be an important factor, and the barriers to receiving COVID-19 testing at the onset of the pandemic are well known. However, Schmitt-Grohé et al. (2020) found that testing services were evenly shared across the income distribution in New York City’s 177 ZIP codes. Borjas (2020) finds access to testing was roughly uniform across the share of the ZIP code that was minority, but the conditional probability of a positive test result was far greater in neighborhoods with larger Black populations.

It is also possible that residential segregation as measured at the ZIP code level, and the consequences from it, led to long-lasting effects on health that made minority communities particularly vulnerable to COVID-19. Link and Phelan (1995) argue that social factors are a fundamental cause of disease that, because they embody access to important resources, affect multiple disease outcomes through multiple mechanisms, and consequently maintain an association with disease even when intervening mechanisms change. Phelan and Link (2015) argue that racial inequalities in health endure primarily because racism is a fundamental cause of racial differences in socioeconomic status, and in turn socioeconomic status is a fundamental cause of health inequalities. Chetty et al. (2019) find Black Americans have much lower rates of upward mobility and higher rates of downward mobility than Whites, leading to persistent income disparities across generations. The Black-White gap persists even among boys who grow up in the same neighborhood. Logan and Parman (2018) demonstrate that premature mortality among Blacks is rooted in historical segregation. Using person-level data, the authors apply a comprehensive measure of segregation extending the analysis of structural factors in racial health disparities. Wiemers et al. (2020) highlight disparities in potential COVID-19 complications by constructing a vulnerability index from the Panel Study of Income Dynamics, finding that Blacks are drastically more vulnerable than other groups among people aged 45 and older. National estimates find that COVID-19 related hospitalizations among Blacks and Hispanics are more than four times that of Whites (CDC, U. S. 2020b). Price-Haywood et al. (2020) found that more than 70% of patients who were hospitalized or died of COVID-19 were Black, compared to an overall population representation of 31%.

Data

To analyze racial and ethnic disparities, we combine ZIP code level data on COVID-19 outcomes from state and local government websites with data from (1) the 2018 ACS 5-year sample, (2) the 2010 Census, (3) the Opportunity Atlas, (4) SafeGraph mobility data, (5) health professional shortage areas published by the Health Resources & Services Administration, and (6) conditional life expectancy published by the Centers for Disease Control and Prevention.

COVID-19 Data

From state and local websites, we gathered COVID-19 data at the ZIP code level for six metropolitan areas: New York City, Chicago, Atlanta, San Diego, St. Louis, and Baltimore.Footnote 3 The ZIP codes include both the city proper, and in some cases the surrounding county. Cross-sectional data was gathered from June 6, 2020, to June 9, 2020, for these localities.Footnote 4 Once merged with all other sources, the full analysis uses 436 ZIP codes, with 177 in New York City, 58 in Chicago, 49 in Atlanta, 95 in San Diego, 21 in St. Louis, and 36 in Baltimore. Overall, there are approximately 17.7 million people living in these ZIP codes, with nearly half residing in New York City.

Our primary outcome variable is confirmed COVID-19 cases per 10,000 population. Although serological surveys provide strong evidence that confirmed cases are an undercount of total infections, confirmed case numbers still have clear clinical and economic significance. Nationally, the fatality and hospitalization rates for confirmed cases were roughly 5% and 10%, respectively, by June 2020.Footnote 5 Even after discharge from a hospital, persistent symptoms may remain (Carfì et al. 2020). In addition, confirmed infections (which tend to be more severe than those that remain undetected) undoubtedly lead to lost earnings, family strain, psychological distress, and potentially harmful long-term consequences (Eisenberg et al. Forthcoming). Kniesner and Sullivan (Forthcoming) estimate economic losses from COVID-19 at $46,000 per non-fatal case, by applying value per statistical life and relative severity/injury estimates from the Department of Transportation.

All six localities provided counts of COVID-19 cases, which are scaled into counts per 10,000 population. When weighted by population, Table 1 shows the median ZIP code had 143 cumulative cases per 10,000 population by early June, translating into a cumulative measured infection rate of about 1.4%. Measured infection rates varied substantially, with 12 cases per 10,000 (0.1%) in the lowest decile and 315 cases per 10,000 (3.2%) in the highest decile. In the aggregate, New York City had the highest rate of confirmed cases (2.3%), followed by Chicago (1.7%) and Baltimore (0.9%). The other three localities had confirmed case rates varying from 0.26 to 0.48%. In the aggregate, there were more than 271,000 confirmed COVID-19 cases in these 6 cities, approximately 14% of all cases nationally by that point.Footnote 6
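The scaling and population weighting described above can be illustrated with a small sketch. The ZIP code figures below are hypothetical, and the weighted quantile rule (the smallest rate whose cumulative population share reaches the target) is an assumption, since the text does not state which definition was used.

```python
def cases_per_10k(cases, population):
    """Scale a raw case count to cases per 10,000 residents."""
    return 10_000 * cases / population


def weighted_quantile(values, weights, q):
    """Smallest value whose cumulative weight share reaches q (0 < q <= 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(weights)
    running = 0.0
    for value, weight in pairs:
        running += weight
        if running >= q * total:
            return value
    return pairs[-1][0]


# Hypothetical ZIP codes: (confirmed cases, population). Not real data.
zips = [(30, 20_000), (150, 10_000), (900, 30_000), (400, 40_000)]
rates = [cases_per_10k(c, p) for c, p in zips]
pops = [p for _, p in zips]

# Population-weighted median case rate across the hypothetical ZIP codes.
median_rate = weighted_quantile(rates, pops, 0.5)
print(rates, median_rate)
```

Dividing a rate per 10,000 by 100 gives the percentage figure used in the text, so 143 per 10,000 corresponds to about 1.4%.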

Table 1 Summary statistics

We also conduct auxiliary analyses using a subsample of only Chicago and New York City, cities that provide additional data that allow us to investigate two important issues. First, do observed disparities in confirmed COVID-19 cases accurately reflect disparities in illnesses, or are they confounded by geographic variation in availability of tests and criteria for obtaining them? Note that we will include city fixed effects in all our models, which alleviates this concern to some extent. Nonetheless, since Chicago and New York City report tests run by ZIP code, they enable us to control for testing more directly. Second, are racial and ethnic disparities in COVID-19 fatalities the result of a higher likelihood of catching the virus, a greater risk of dying conditional on catching it, or some combination of both? Answering this question requires data on COVID-19 fatalities—not just cases—and Chicago and New York City are the only cities in our sample that report deaths by ZIP code. The bottom panel of Table 1 shows data for those two cities (and 235 ZIP codes). In these cities, the cumulative fatality rate from COVID-19 was 0.17% by early June.

Census Bureau Data

We merged this information with the 2018 ACS 5-year sample, as well as with the 2010 Decennial Census using Social Explorer (which provides summary statistics at the ZIP code level).Footnote 7 The ACS contains a rich set of variables on demographics, economic outcomes, and housing characteristics. Demographic variables include percent male, percent foreign born, and percent aged 18–44, 45–64, 65–74, and 75+ (children under 18 are omitted). Housing variables include density, percent renters, percent vacant units, percent overcrowded (1.5 or more persons per bedroom), and percent of units with 0 or 1 bedrooms. The 2010 Census—although dated—provides information about group quarters, specifically percent of population in nursing homes, correctional facilities, college dormitories, and military barracks. Returning to the ACS 5-year sample, our socioeconomic variables include percent in education bins (dropout, high school, some college, bachelor’s degree; the group beyond a bachelor’s degree is omitted as the reference category), income inequality as measured by the Gini coefficient, and percent in poverty bins (0–49% FPL, 50–74%, 75–99%, 100–149%, 150–199%; 200%+ is omitted). Occupation variables include percent of workers in service occupations, sales, farming, construction, production, or transport (managerial occupations omitted). Transportation variables include percent of workers who use a car, percent who use public transportation, and percent with long commuting times (60+ min). Finally, one of our measures of health access—percent without health insurance—comes from the ACS.

Opportunity Atlas Data

The Opportunity Atlas is a collaboration of the Census Bureau, Harvard University, and Brown University that uses anonymous data following 20 million Americans from childhood to their mid-30s, with many outcomes measured at the Census Tract level (which we aggregate up to ZIP code).Footnote 8 As noted by Chetty et al. (2018), neighborhoods matter at a very granular level, where neighborhoods even one mile away have very little predictive power for child outcomes. We focus on two key variables, which represent long-run opportunity. The first is average annual household income ranking in 2014–2015 for children (in their mid-30s) who grew up in the area, based on having a low-income parent (25th percentile). The second is fraction of male children who grew up in the area who were in prison or jail on April 1, 2010. We aggregate Census Tracts to the ZIP code level using a crosswalk provided by the Missouri Census Data Center.Footnote 9 We follow the spirit of Courtemanche et al. (2017) by assigning each Census Tract to the ZIP code where the plurality of its residents live. In practice, approximately 53% of Tracts nationally map into one ZIP code only, and roughly 75% of Tracts have at least 80% of their population in one ZIP code.
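A minimal sketch of the plurality assignment follows. The crosswalk rows, tract identifiers, and outcome values are hypothetical, and the population-weighted mean used to combine tracts within a ZIP code is an assumption, since the text does not specify how tract values are combined.

```python
from collections import defaultdict

# Hypothetical crosswalk rows: (tract_id, zip_code, tract residents living in that ZIP).
crosswalk = [
    ("T1", "10001", 3000),
    ("T2", "10001", 1200),
    ("T2", "10002", 1800),
    ("T3", "10002", 2500),
]
# Hypothetical tract-level outcome, e.g. an income-rank measure.
tract_value = {"T1": 0.40, "T2": 0.50, "T3": 0.30}


def plurality_zip(rows):
    """Map each tract to the ZIP code holding the largest share of its residents."""
    best = {}
    for tract, zip_code, pop in rows:
        if tract not in best or pop > best[tract][1]:
            best[tract] = (zip_code, pop)
    return {tract: zip_code for tract, (zip_code, _) in best.items()}


def aggregate_to_zip(rows, values):
    """Population-weighted mean of tract values within each assigned ZIP code."""
    assigned = plurality_zip(rows)
    tract_pop = defaultdict(int)
    for tract, _, pop in rows:
        tract_pop[tract] += pop  # total tract population across all its ZIP pieces
    num, den = defaultdict(float), defaultdict(float)
    for tract, zip_code in assigned.items():
        num[zip_code] += values[tract] * tract_pop[tract]
        den[zip_code] += tract_pop[tract]
    return {z: num[z] / den[z] for z in num}


zip_value = aggregate_to_zip(crosswalk, tract_value)
print(plurality_zip(crosswalk), zip_value)
```

Here tract T2 is split 1,200 to 1,800 between two ZIP codes and is assigned in full to the second, which is the kind of split the 53% and 75% figures above suggest is the exception rather than the rule.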

SafeGraph Data

Many recent COVID-19 studies examine mobility using data from SafeGraph, which provides access to their data through free, non-commercial agreements.Footnote 10 Following Gupta et al. (2020), we compute the fraction of cell phone devices that were detected to be entirely at home during the day, aggregating from the Census Block Group level to the ZIP code level. We aggregate Census Block Groups to ZIP codes using a crosswalk provided by the Missouri Census Data Center. We computed daily averages for each ZIP code, and then averaged across all days for the months of March 2020, April 2020, and May 2020. Again, following Courtemanche et al. (2017), we assign Census Block Groups to the ZIP code where the plurality of residents live. In practice, approximately 73% of Census Block Groups nationally map into one ZIP code only, and roughly 85% of Tracts have at least 80% of their population in one ZIP code.
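The averaging step can be sketched as follows. The record layout and field order are hypothetical rather than the actual SafeGraph schema, block groups are assumed to be already mapped to ZIP codes, and pooling devices within each ZIP code and day before taking the share is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical daily records: (date, zip_code, devices seen, devices home all day).
records = [
    ("2020-04-01", "60601", 200, 80),
    ("2020-04-01", "60601", 100, 40),  # a second block group in the same ZIP code
    ("2020-04-02", "60601", 300, 150),
    ("2020-04-01", "60602", 50, 10),
]


def mean_daily_home_share(rows):
    """Average, across days, of each ZIP code's daily share of devices at home."""
    devices = defaultdict(int)
    at_home = defaultdict(int)
    for date, zip_code, n_devices, n_home in rows:
        devices[(zip_code, date)] += n_devices
        at_home[(zip_code, date)] += n_home
    daily = defaultdict(list)
    for (zip_code, date), n in devices.items():
        daily[zip_code].append(at_home[(zip_code, date)] / n)
    return {z: mean(shares) for z, shares in daily.items()}


home_share = mean_daily_home_share(records)
print(home_share)
```

Filtering the records to a single calendar month before calling the function would give the separate March, April, and May measures.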

Health Professional Shortage Area Data

We incorporate information on each ZIP code’s status as being designated as a health professional shortage area (HPSA) for federal fiscal year 2020.Footnote 11 HPSAs are designated by the Health Resources and Services Administration (HRSA) to signify areas as medically underserved. The Centers for Medicare and Medicaid Services (CMS) provide HPSA designation status at the ZIP code level to signal to eligible health care professionals (e.g., physicians) if the location where they practice medical care is eligible for enhanced Medicare reimbursements per the 2005 Medicare Modernization Act (CMS, U.S. Centers for Medicare, and Medicaid Services 2020). This feature of the program creates a financial incentive for delivering care in medically underserved settings with historically higher uninsured rates and limited access to care. We use this as a proxy measure to capture differences in access to primary care and mental health services. HPSAs can be entire counties, but are most commonly smaller portions of a county—this is particularly the case in larger cities.

Centers for Disease Control and Prevention Data

Our population health variables are conditional life expectancies obtained from the U.S. Small-area Life Expectancy Estimates Project (USALEEP).Footnote 12 The files contain conditional life expectancies for different age bins at the Census Tract level; in our model, we include conditional life expectancies for ages 65–74, 75–84, and 85 plus, and aggregate from the Tract level to ZIP code. Many commentators attribute disparities in COVID-19 cases and deaths to underlying health conditions such as elevated rates of chronic illnesses among Blacks and Latinos (Artiga et al. 2020). We use variation in life expectancy—and focus on the elderly who are most vulnerable to COVID-19—to control for variation in underlying health status as well as the risk factors leading to differences in preventable mortality.

Table 2 shows, along some margins, large differences in neighborhood characteristics depending on racial and ethnic composition. Out of the 436 ZIP codes, 188 are majority White, 84 are majority Black, 49 are majority Hispanic, and 115 are none of these. With respect to demographics, Hispanic neighborhoods have much higher representation of foreign-born individuals. With respect to housing, there are more renters in majority-Black and majority-Hispanic neighborhoods. Lower educational attainment and higher poverty levels are also common attributes of these neighborhoods, as are lower levels of income mobility—an indicator of long-run opportunity. At least some types of workers whose jobs do not easily transfer online—those in service occupations—are more prevalent in predominantly Black and Hispanic neighborhoods. Also common in predominantly Black and Hispanic neighborhoods are larger dependence on public transit as a key mode of transportation (McLaren 2020) and longer commuting times. Cell phone mobility measures are relatively similar, on average, across neighborhoods. Health care access is worse for Black and Hispanic neighborhoods according to both percent uninsured and mental health HPSA, while population health—proxied by conditional life expectancy—is fairly similar across neighborhood types, especially from age 75 onward. Finally, racial composition and segregation vary by city. None of the ZIP codes in San Diego are majority Black, while Atlanta, Baltimore, and St. Louis have no ZIP codes that are majority Hispanic.

Table 2 Summary statistics by neighborhood type

Empirical Model and Findings

Model

We estimate linear models of the following form:

$$ \mathrm{covid}_{z,j} = \beta_0 + \beta_1 \mathrm{PCT\_BLACK}_{z,j} + \beta_2 \mathrm{PCT\_HISPANIC}_{z,j} + \beta_3 \mathrm{OTHER}_{z,j} + \beta_4 X_{z,j} + \delta_j + \varepsilon_{z,j} \tag{1} $$

where $\mathrm{covid}_{z,j}$ represents either confirmed COVID-19 cases per 10,000 population or COVID-19 fatalities per 1,000,000 population in ZIP code $z$ in city $j$. The key explanatory variables include the percentage Black and percentage Hispanic in each ZIP code (we also include percentage other race, with percentage White omitted).Footnote 13 We successively include additional neighborhood characteristics in $X_{z,j}$. City fixed effects are given by $\delta_j$ and $\varepsilon_{z,j}$ is the error term. All observations are weighted by population in the ZIP code. Standard errors are heteroscedasticity-robust.Footnote 14
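For reference, the population-weighted estimator and a robust variance of this kind can be written in matrix form. The notation below is introduced here for illustration and is not the paper's: $D$ stacks the constant, the race/ethnicity shares, the covariates, and the city indicators (one omitted), $d_z$ is the row of $D$ for ZIP code $z$, $W$ is the diagonal matrix of ZIP code populations $w_z$, and $\theta$ collects all coefficients. The variance shown is the basic sandwich (HC0) form, which is an assumption, since the text does not say which robust variant was used.

$$ \hat{\theta} = \left( D^{\top} W D \right)^{-1} D^{\top} W y, \qquad \widehat{\mathrm{Var}}(\hat{\theta}) = \left( D^{\top} W D \right)^{-1} \left( \sum_{z} w_z^{2}\, \hat{\varepsilon}_{z}^{2}\, d_z d_z^{\top} \right) \left( D^{\top} W D \right)^{-1} $$

Here $\hat{\varepsilon}_{z} = y_z - d_z^{\top} \hat{\theta}$ is the residual for ZIP code $z$.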

Any observational analysis like this identifies correlations, not causation (Knittel and Ozaltun 2020). With many observable neighborhood differences—including some characteristics that intuitively might represent greater likelihood of COVID-19 transmission such as density, occupations, modes of transport, or health care access—it may be tempting to attach causal stories. However, these variables could influence each other or be influenced by unobservable characteristics of the ZIP code, preventing causal inference. Our principal goal, therefore, is simply to assess the extent to which such measured characteristics can collectively explain the racial and ethnic disparities in COVID-19 burden, and the extent to which these disparities remain unexplained by conventional measures.

Our model’s city fixed effects account for some potential confounders, such as the arrival of the virus, weather patterns, and lockdown policies, but not unobserved heterogeneity at the sub-city level. A number of recent studies control for unobserved heterogeneity at granular levels (Schuetz et al. 2008; Price 2013), recognizing that micro-geographies are an important determinant of individual health outcomes (Arcaya et al. 2016). It is not possible to include ZIP code fixed effects (or fixed effects at a narrower level such as Census Tract) in our models because they would be perfectly collinear with the ZIP code-level covariates such as race and ethnicity.Footnote 15

Size of Disparities

Our primary results are presented in Table 3, which examines confirmed COVID-19 infections per 10,000 population. In the top panel, we examine 436 ZIP codes across the 6 cities, and successively include covariates for demographics, housing, socioeconomic status, opportunity, occupation, transportation, human mobility, health access, and population health. The base model in column (1)—which only includes the race/ethnicity variables, city fixed effects, and a constant term—shows large, highly significant health disparities for both Blacks and Hispanics. A 10 percentage point increase in a ZIP code’s Black share is associated with 9.2 additional confirmed COVID-19 cases per 10,000 population, while a similar change in the Hispanic share is associated with 20.6 additional COVID-19 cases. Both are very sizable changes relative to the average confirmed case rate of 153 per 10,000 population.

Table 3 COVID-19 cases per 10k population

As the analysis moves from one column to the next, sets of covariates are successively included. For example, including “demographics” in column (2) adds six additional variables for percent male, percent foreign born, and percent in each of four age bins to the specification. Across the remaining columns, the measured disparity remains sizable and significant, regardless of the set of controls that is included. The full model in column (10) shows disparities that are roughly 60% as large as in column (1). Thus, a key insight is that even with an extensive set of controls for factors that should plausibly affect the transmission of the virus, more than half of the overall disparity in confirmed COVID-19 cases remains unexplained.
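The arithmetic behind these magnitudes can be checked with a short sketch. It uses only figures quoted in the text: the mean case rate of 153 per 10,000, and the percent Black coefficients of 0.92 (column 1) and 0.59 (column 10) reported in the decomposition discussion. The Hispanic coefficient of 2.06 is inferred from the stated 20.6 cases per 10 percentage points, and the function names are illustrative.

```python
# Figures quoted in the text for the six-city sample (Table 3, top panel).
MEAN_CASE_RATE = 153.0  # average confirmed cases per 10,000 population
BASE_BLACK = 0.92       # percent Black coefficient, column (1)
FULL_BLACK = 0.59       # percent Black coefficient, column (10)
BASE_HISPANIC = 2.06    # inferred from 20.6 additional cases per 10 points


def added_cases(coef_per_point, points=10.0):
    """Additional cases per 10,000 for a given change in the share (in points)."""
    return coef_per_point * points


def share_unexplained(base_coef, full_coef):
    """Fraction of the base-model disparity that survives the full set of controls."""
    return full_coef / base_coef


black_effect = added_cases(BASE_BLACK)
hispanic_effect = added_cases(BASE_HISPANIC)
print(f"Black: {black_effect:.1f} cases, {black_effect / MEAN_CASE_RATE:.1%} of the mean")
print(f"Hispanic: {hispanic_effect:.1f} cases, {hispanic_effect / MEAN_CASE_RATE:.1%} of the mean")
print(f"Unexplained share (Black): {share_unexplained(BASE_BLACK, FULL_BLACK):.0%}")
```

On these figures, a 10 percentage point change corresponds to about 6% of the mean case rate for the Black share and about 13% for the Hispanic share, and about 64% of the Black disparity remains after all controls.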

In the bottom panel, we focus on the 235 ZIP codes in Chicago and New York, since those localities provide additional data on COVID-19. The initial overall disparities in column (1) are larger, and a 10 percentage point increase in a ZIP code’s Black share is associated with 12.4 additional confirmed COVID-19 cases per 10,000 population, while a similar change in the Hispanic share is associated with 24.8 additional COVID-19 cases. Again, both are sizable changes relative to the average confirmed case rate of 219 per 10,000 population for these two cities. As neighborhood characteristics are added in the remaining columns, in some instances, the disparity falls and in others it rises. With the full set of controls in column (10), there remain sizable disparities for both Blacks and Hispanics, and again more than half of the overall disparity remains unexplained. It should be noted that the “health access” variables in column (9) now include COVID-19 tests per capita at the ZIP code level.

Next, we turn to Table 4, where we examine COVID-19 fatalities at the ZIP code level for Chicago and New York, an outcome that was a key focus in McLaren (2020) and Knittel and Ozaltun (2020). To maintain comparability, we scale COVID-19 fatalities per million population. In column (1), we estimate models with race/ethnicity controls, city fixed effects, and a constant term, finding large and statistically significant disparities. A 10 percentage point increase in the Black (Hispanic) share is associated with 143 (149) additional fatalities per million, from a baseline of 1727 per million in these two cities. In column (2), we include as an additional control COVID-19 cases (per 10,000) in each ZIP code. Thus, we ask the extent to which variation in fatalities is simply explained by greater numbers of confirmed COVID-19 cases, versus the extent to which factors beyond additional cases matter. The results show that the coefficients fall by 50–100%, suggesting that racial and ethnic disparities in the spread of infection are an extremely important determinant for resultant fatalities. For proportion Hispanic, the coefficient estimate is very close to zero, while for proportion Black, the coefficient estimate is about half as large. In columns (3) and onward, additional covariates similar to the ones in the bottom panel of Table 3 are added. With sufficient controls, the original racial and ethnic disparities become insignificant.

Table 4 COVID-19 fatalities per 1m population

Mechanisms

In a highly cited paper, Gelbach (2016) shows that the sequence in which covariates are entered into the model can lead to very different conclusions about their relative importance. In our context, the extent to which different sets of neighborhood characteristics “explain” the observed racial and ethnic disparities may be sensitive to the sequence in which they are entered into the model. For example, the changes in racial and ethnic disparities from adding demographic covariates may have been different from those shown in Tables 3 and 4 if we had added housing variables first, since demographic and housing variables are correlated. In order to create a path-independent explanation of the influence of each set of neighborhood characteristics, Gelbach prescribes omitted variable bias equations. Essentially, we estimate the omitted variable bias on the coefficient of interest from excluding each set of neighborhood controls, one at a time, from the full model (in Table 3, column 10). The influence of each set of neighborhood characteristics therefore becomes a function of the correlation between the covariate and race/ethnicity in addition to the covariate’s coefficient in the full specification.
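The omitted variable bias logic can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the estimation code behind Table 5: the function names, the two covariate groups, and the synthetic data are hypothetical, and standard errors are omitted. For each group of controls, the contribution is the coefficient from regressing that group on the base regressors, multiplied by the group's coefficient in the full model.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on X (y may have several columns)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def gelbach_decomposition(X1, groups, y):
    """Split (base - full) coefficients on X1 into one term per group.

    X1     : (n, p1) base regressors, including the constant
    groups : dict mapping a group name to an (n, k) array of added controls
    """
    names = list(groups)
    X2 = np.hstack([groups[name] for name in names])
    p1 = X1.shape[1]
    b_base = ols(X1, y)                       # base specification
    b_all = ols(np.hstack([X1, X2]), y)       # full specification
    b1_full, b2_full = b_all[:p1], b_all[p1:]
    contributions, start = {}, 0
    for name in names:
        k = groups[name].shape[1]
        gamma = ols(X1, groups[name])         # auxiliary regressions, (p1, k)
        contributions[name] = gamma @ b2_full[start:start + k]
        start += k
    return b_base, b1_full, contributions

# Hypothetical synthetic data: one share variable and two groups of controls.
rng = np.random.default_rng(0)
n = 400
share = rng.uniform(0, 1, n)
X1 = np.column_stack([np.ones(n), share])
housing = np.column_stack([0.5 * share + rng.normal(0, 0.2, n)])
mobility = np.column_stack([-0.3 * share + rng.normal(0, 0.2, n),
                            rng.normal(0, 1, n)])
y = 60 * share + 40 * housing[:, 0] - 25 * mobility[:, 0] + rng.normal(0, 5, n)

b_base, b1_full, contributions = gelbach_decomposition(
    X1, {"housing": housing, "mobility": mobility}, y)
explained = b_base - b1_full   # equals the sum of the group contributions
```

Because the group contributions add up exactly to the change in the coefficient between the base and full models, the result does not depend on the order in which the groups are listed.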

Table 5 shows the results of the Gelbach decomposition for our full set of 6 cities on COVID-19 cases (Table 3, top panel), as well as for the 2 cities on cases and fatalities (Table 3, bottom panel; Table 4).Footnote 16 The baseline coefficient comes from the model that only includes racial/ethnic composition, city fixed effects, and a constant term. The “explained difference” is the reduction that occurs from moving from the first column to the last column (e.g., from 0.92 to 0.59 for “% Black” in the top panel of Table 3), and the remaining rows show the contribution of each set of neighborhood characteristics to the explained reduction (as well as their statistical significance). In the model of COVID-19 cases with all 6 cities in column (1), much of the explained difference in the disparity for Blacks can be attributed to the long-run opportunity variables (income mobility and male incarceration rates for those born between 1978 and 1983). Although other sets of neighborhood characteristics occasionally have large magnitudes (e.g., the occupation controls), they are not statistically significant. For the Hispanic disparity, the opportunity variables again contribute to part of the explained difference; however, demographics (which includes the fraction foreign born) and human mobility (from SafeGraph) are larger factors.
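As a worked illustration of the “explained difference” for “% Black” in the top panel of Table 3, using only the two coefficients quoted above (and assuming they are measured in cases per 10,000 population per percentage point of Black share):

```latex
\[
\underbrace{0.92}_{\text{base}} - \underbrace{0.59}_{\text{full}} = 0.33,
\qquad
\frac{0.33}{0.92} \approx 0.36,
\qquad
\frac{0.59}{0.92} \approx 0.64 .
\]
```

On these numbers, roughly 36% of the baseline disparity is explained by the covariates and roughly 64% is not, in line with the statement that more than half of the disparity in cases remains unexplained.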

Table 5 Accounting for change in coefficients

The findings, when restricted to Chicago and New York, are somewhat different for cases and fatalities (columns 2 and 3 of Table 5, respectively). For confirmed COVID-19 cases, the combined addition of all the covariates does little to change the estimate for proportion Black; however, several factors offset each other to produce essentially zero net change. Controlling for health care access and human mobility is associated with statistically significant reductions in the disparity, but this is offset by statistically significant increases in the disparity from housing and socioeconomic characteristics and by insignificant but similarly sized increases from occupation and transportation. For proportion Hispanic, the key driver explaining much of the reduction in confirmed cases appears to be health care access, which in the context of the two cities includes COVID-19 tests per capita.Footnote 17

Finally, for COVID-19 fatalities (column 3 of Table 5), adding covariates from the base model to the full model entirely explains the disparities for proportion Black and Hispanic. The most important factor in explaining COVID-19 fatalities is simply the rate of infection. Differences in confirmed cases explain approximately two-thirds of the disparity in fatalities for Blacks, and the entire disparity for Hispanics. Other factors appear to have large and somewhat offsetting effects for both groups (e.g., socioeconomic controls and occupational controls), but a key implication is that understanding the core causes of COVID-19 cases can potentially explain the alarming subsequent differences in mortality.

Robustness Checks

We explore several robustness checks in Tables 6 and 7. As illustrated in Table 2, one key advantage of analyzing ZIP codes rather than counties is that residential segregation is much starker. Overall, 327 of the 436 ZIP codes are highly segregated, and in the top panel of Table 6 we re-estimate the models of COVID-19 cases restricted to neighborhoods in which a single race or ethnicity forms the majority. Overall, the findings, from both the base model and the full model, are very similar to those in Table 3. Next, we test the sensitivity of the results to excluding one city at a time in Table 7, presenting findings from the base specification and full specification. In all cases, there are sizable racial and ethnic disparities in the base specification. The findings on Hispanic disparities are robust across all specifications; there are large disparities that are partially explained by neighborhood characteristics. For proportion Black, the baseline disparity was smaller than that for Hispanics, and the explained part was modest as well in Table 3.

Table 6 COVID-19 cases per 10k population robustness checks
Table 7 COVID-19 cases per 10k population (leave one city out)
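The leave-one-city-out exercise can be sketched as a simple loop. This is a hypothetical illustration on synthetic data, not the code behind Table 7: the helper `share_coefficient`, the data-generating numbers, and the random assignment of ZIP codes to cities are assumptions, and only the base specification (share, city fixed effects, constant) is shown.

```python
import numpy as np

CITIES = ["New York", "Chicago", "Atlanta", "Baltimore", "San Diego", "St. Louis"]

# Hypothetical synthetic data: 436 ZIP codes randomly assigned to six cities.
rng = np.random.default_rng(1)
n = 436
city = rng.integers(0, len(CITIES), size=n)
share_hisp = rng.uniform(0, 1, n)
cases = 100 + 20 * city + 200 * share_hisp + rng.normal(0, 25, n)

def share_coefficient(city_ids, share, outcome):
    """Coefficient on the share from OLS with city fixed effects."""
    kept = np.unique(city_ids)
    # One dummy per remaining city, omitting the first as the reference.
    dummies = (city_ids[:, None] == kept[None, 1:]).astype(float)
    X = np.column_stack([np.ones(len(share)), share, dummies])
    beta = np.linalg.lstsq(X, outcome, rcond=None)[0]
    return beta[1]

# Re-estimate the base specification, dropping one city at a time.
leave_one_out = {}
for idx, name in enumerate(CITIES):
    keep = city != idx
    leave_one_out[name] = share_coefficient(
        city[keep], share_hisp[keep], cases[keep])
```

Each entry of `leave_one_out` is the share coefficient estimated without the named city; similar values across entries indicate that no single city drives the result.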

Conclusion

In this study, we use ZIP code level data to understand the factors contributing to racial and ethnic disparities in COVID-19 burden. We find strong evidence that predominantly Black and Hispanic neighborhoods were disproportionately at risk of COVID-19 infections and mortality, although the disparities were larger among Hispanic neighborhoods. Even though our study was limited to six cities, these cities include the first (New York) and third (Chicago) largest in the USA, and case counts from these cities accounted for 14% of all confirmed US cases through June 9, 2020. The ZIP code level data allow us to examine a much wider range of variation in racial and ethnic composition than other studies using county-level data, and we also contribute to the literature by exploring numerous possible explanations for the disparities using decomposition methods. Differences in human mobility, demographics, and long-run opportunity arose as important contributors to COVID-related disparities. However, a significant share of the disparity in cases for Blacks and Hispanics remains unexplained despite the inclusion of an exhaustive list of covariates. For fatalities, the majority of the disparities appear to be driven by the differences in cases, as opposed to differential case fatality rates. This is an important result, as it implies that interventions aimed at reducing the spread of the virus in minority communities might be a more effective use of scarce resources than those targeting health care utilization once infected.

Our inability to explain most of the disparities in COVID-19 spread is perhaps surprising since we control for the risk factors largely associated with social deprivation. Recent studies have used area deprivation indexes (ADI) to explain disparities in avoidable hospitalizations and readmissions (Kind and Buckingham 2018; Jencks et al. 2019; Hu et al. 2018) and difficulty managing chronic illnesses (Kurani et al. 2020; Zhang et al. 2020; Camacho et al. 2017; Durfey et al. 2019). ADIs are composite measures generally made up of weighted combinations of education, labor market composition, income, income inequality, and housing market information such as home ownership (Singh 2003). Rather than use an ADI, we allow for each of our key covariates to enter into the regression separately, which is theoretically more flexible and should allow for greater explanatory power.

With that said, if our lengthy list of explanatory variables does not explain the disparities in COVID-19 cases, then what does? Are we not more successful in explaining the disparities because we are considering the wrong theories, or because available data are inadequate to fully test the existing theories? Structural racism may influence multiple mechanisms that are difficult to quantify (Poteat et al. 2020; Braveman and Gottlieb 2014). For example, racial biases could influence clinical decision making, which would not be captured by crude access measures such as uninsured rates (Jones 2001). Obermeyer et al. (2019) found that several clinical decision-support algorithms were less likely to refer Black patients for advanced care and screening than White patients presenting with identical symptoms. With respect to the COVID pandemic, there are reports of Blacks being denied screenings after seeking care for COVID-19 symptoms at the early stages of this pandemic (Gathright 2020; Samuels 2020; Shamus 2020; Patton 2020; Mitropoulos and Moseley 2020). Additionally, perhaps conditional life expectancies are inadequate measures of underlying health risks. The premature deaths linked to COVID-19 complications are associated with poorer underlying health status and whether the person had a pre-existing condition such as diabetes, obesity, or hypertension, and we are unable to measure those directly.Footnote 18 As another example, available socioeconomic information may not truly capture the source of disadvantage; percent with a high school or college degree says little about the quality of the schools attended, for instance.

To call the COVID-19 pandemic “the Great Equalizer” is a misnomer (Mein 2020; Kim et al. 2020), and the pandemic’s key risk factors are unevenly distributed across communities. Even though we are able to explain some of the racial and ethnic disparities as attributable to different concentrations of socioeconomic risk factors, the fact that most of the case disparities remain unexplained demonstrates the difficulty of addressing deeply embedded racial and ethnic inequalities in health outcomes. Although more study is needed, our results suggest that policies enacted to curb COVID-19’s spread should consider how they would overcome the structural barriers to improvement across different groups. More superficial interventions such as economic stimulus or expanding health insurance coverage are unlikely to be fully adequate.