Baseline results
Results are reported in Tables 2 (deaths), 3 (discharged patients) and 4 (infections) for the period May 13–September 3, 2020. In all regression specifications, the North-East and Ile-de-France regions have higher numbers of deaths, discharged patients and infections per 100,000 people, confirming the graphical results above.
The next columns turn to our main results, which estimate the impact of income and income inequality on COVID-19 outcomes. The Gini coefficient is positive and significant at the 1 or 5% level across all specifications: departments with higher inequality tend to face more deaths, more discharged patients and a higher incidence of the disease. Column 2 in each table estimates the unconditional slope, and the result persists after controlling for a series of covariates in columns 3 and 6. On average across departments, a 1% increase in the Gini coefficient corresponds to a 0.08% increase in the number of deaths, a 0.09% increase in the number of discharged patients and a 0.03% increase in the number of infected people per 100,000. Comparing the three outcomes, inequality appears proportionally more important for a serious course of the disease (hospitalization or death) than for incidence.
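As a reading aid, here is a minimal sketch of an unconditional slope of the kind reported in column 2. It assumes that both the outcome and the Gini coefficient enter in logs, so that the slope is an elasticity and "a 1% increase in the Gini coefficient" maps to a percentage change in the outcome. The data are synthetic, the true elasticity of 0.5 and the sample of 96 departments are arbitrary, and the helpers `ols` and `pct_effect` are hypothetical names; nothing here reproduces the paper's estimates.

```python
import numpy as np

def ols(y, X):
    """OLS coefficients via least squares; X must already contain an intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

def pct_effect(elasticity, pct_change=1.0):
    """Exact % change in Y implied by a pct_change % increase in the regressor (log-log model)."""
    return 100.0 * ((1.0 + pct_change / 100.0) ** elasticity - 1.0)

rng = np.random.default_rng(0)
n = 96  # hypothetical number of departments
log_gini = np.log(rng.uniform(0.25, 0.40, n))          # synthetic Gini coefficients, in logs
log_y = 1.0 + 0.5 * log_gini + rng.normal(0, 0.05, n)  # synthetic log outcome per 100,000

X = np.column_stack([np.ones(n), log_gini])  # intercept + log Gini (unconditional slope)
b = ols(log_y, X)
print(f"estimated elasticity: {b[1]:.3f}")
print(f"1% higher Gini -> {pct_effect(b[1]):.3f}% change in the outcome")
```

For small elasticities the exact reading and the usual approximation (a 1% change times the coefficient) are nearly identical: an elasticity of 0.5 implies a change of about 0.499%.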
Next, we show that it is the dispersion of incomes (measured by the Gini coefficient), and not their level (measured by median disposable income), that drives these results. In columns 4–6, we also include log median income. In isolation (column 4), higher median income appears to correlate with more severe COVID-19 outcomes, a counter-intuitive result that arises when socio-demographic factors are not controlled for. Once these are controlled for in column 5, the effect is attenuated and becomes insignificant in two of the three specifications. Finally, column 6 combines the level and dispersion of income together with our controls. Here, the inequality effect remains significant and stable, while the coefficient on median income becomes insignificant (and takes the intuitive negative sign) in all specifications. Moreover, this specification, although parsimonious and based only on aggregate data, explains between 50 and 87% of the total variance in the data, as measured by the (adjusted) \(R^{2}\).
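The adjusted \(R^{2}\) penalizes the raw \(R^{2}\) for the number of regressors \(k\) (excluding the intercept) given \(n\) observations: \(\bar{R}^{2} = 1 - (1 - R^{2})(n - 1)/(n - k - 1)\). The sketch below computes both from an OLS fit. The synthetic data, the sample of 96 departments and the three covariates are illustrative assumptions, and `r2_and_adjusted` is a hypothetical helper, not part of the paper's code.

```python
import numpy as np

def r2_and_adjusted(y, X):
    """Return (R^2, adjusted R^2) for an OLS fit of y on X.

    X must include an intercept column; k counts the remaining regressors.
    """
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    n, p = X.shape
    k = p - 1  # regressors other than the intercept
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj

rng = np.random.default_rng(1)
n = 96  # hypothetical number of departments
X = np.column_stack([np.ones(n), rng.normal(size=(n, 3))])  # intercept + 3 synthetic covariates
y = X @ np.array([1.0, 0.5, -0.2, 0.0]) + rng.normal(0, 1, n)

r2, adj = r2_and_adjusted(y, X)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj:.3f}")
```

Because \((n - 1)/(n - k - 1) > 1\) whenever \(k \geq 1\), the adjusted value is always below the raw \(R^{2}\) unless the fit is perfect.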
Several papers have studied the impact of income on COVID-19 outcomes. Borjas [1], Desmet and Wacziarg [3] and Verwimp [12] find, for the US and Belgium, that poorer areas are significantly more affected by the pandemic, with death tolls twice as high as in more affluent neighborhoods. The Guardian [8] reports similar patterns for England and Wales. Without individual-level data, several explanations for this finding are possible. Poorer individuals are more likely to have pre-existing conditions that are known co-morbidities or aggravating factors for the course of COVID-19, such as diabetes, obesity and cardiovascular disease. Poorer individuals are also more likely to hold jobs that cannot be done remotely and therefore cannot be safely distanced, such as in manufacturing, transportation and distribution, or retail. These factors expose the most vulnerable populations relatively more to the virus and its consequences. However, these studies focus on average or median regional income rather than on income inequality. When accounting for both the level of income and income inequality, we find that inequality kills: it is the dispersion in incomes, not the level of median income, that drives the results.
We end with some notes on the covariates. We control for variables that capture access to testing and critical care and that are available at the department level: the number of general practitioners (GPs) per 100,000 inhabitants, the rurality of a department, and the number of tests administered per 100,000 inhabitants. First, the number of GPs is negatively associated with the number of deaths and severely ill patients, as expected. A higher density of GPs helps to contain the outbreak, since GPs are the first line of defense and guide patients in case of infection. Moreover, good and early treatment can reduce the severity of the disease, which a comparison of the coefficients across specifications tentatively confirms: a higher density of GPs correlates with fewer gravely ill patients and deaths, but not necessarily with a lower incidence, as the coefficient is insignificant in Table 4.
Second, we control for the intensity of testing in all our specifications. Although the coefficient is very small, the number of tests per 100,000 people is positively and significantly related to each of the outcome variables. This suggests that access to care, as proxied by the availability of GPs and tests, plays an important role. By contrast, the rurality of a department and housing conditions do not appear to matter in France as much as they may in other countries. Even once these factors are taken into account, inequality remains a significant predictor of COVID-19 outcomes, suggesting that factors other than access to care are also at work.
Finally, in terms of demographics, a higher share of people aged 60 or over in the population correlates with a lower incidence of the disease, and has a negative but insignificant association with deaths and discharged patients. While surprising prima facie, this relationship is also reported in, e.g., [3].Footnote 10 Ideally, we would use COVID-19 outcomes by age group, but these are not at our disposal.Footnote 11 We therefore scrutinize this finding further by looking separately at the geographical distribution of people aged 60+ in France and at infection rates by age group nationally. Fig. 2 shows that the 60+ are mostly located in central and rural departments. By contrast, Paris and its surrounding departments are among those with the lowest share of individuals in this age group. The share of 60+ correlates negatively with the number of deaths, discharged and infected. Figure 3 reports information on tests across the age distribution at the national level. The 20–29 age group is tested most intensely and also shows the highest positivity ratio. Conversely, the 60+ group has a lower positivity ratio than other age groups, consistent with the negative coefficient in our regression tables.
Table 2 Cumulative deaths per 100,000 people (May 13–September 3, 2020)
Table 3 Cumulative discharged patients per 100,000 people (May 13–September 3, 2020)
Table 4 Cumulative confirmed cases per 100,000 people (May 13–September 3, 2020)
Analysis of covariance
The estimated parameters in Tables 2 and 3 appear similar, even though the former concern deaths and the latter patients discharged from hospital. To check whether they differ significantly, we use an analysis of covariance, which implicitly assumes that the distribution of errors is the same in both subsamples (deaths and discharged). The model is now:
$$\begin{aligned} \log Y_{i0}=R_{0}\alpha _{0}+X_{0}\beta _{0}+\delta R_{0}\alpha +\delta X_{0}\beta +\epsilon _{i0}. \end{aligned}$$
In this formulation, \(\log Y_{i0}\) is a vector constructed by stacking the log of each department's cumulative deaths, followed by the log of each department's cumulative discharged patients, for the period May 13–September 3, 2020. \(R_{0}\) is constructed as a block matrix by stacking the two matrices \(R_{i}\), and \(X_{0}\) is constructed in the same way from the matrices \(X_{i}\). Finally, \(\delta\) is a dummy variable equal to 1 for observations related to \(\log Y_{i1}\), that is, discharged patients, and 0 for deaths. The coefficients on the interaction terms \(\delta R_{0}\) and \(\delta X_{0}\) tell us whether the effect of the covariates differs between deaths and discharged patients.
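The stacking described above can be sketched as follows, under the assumption that \(R\) and \(X\) are placed side by side in a single regressor matrix `W` that also carries the intercept column, so that \(\delta W_{0}\) includes the dummy's own level shift. The function `stack_ancova` and the toy numbers are hypothetical and only illustrate the layout of the design matrix.

```python
import numpy as np

def stack_ancova(y_deaths, W_deaths, y_disch, W_disch):
    """Build the stacked outcome y0 and the fully interacted design Z = [W0, delta * W0].

    Rows for deaths come first, then rows for discharged patients.
    W_* holds the regressors (R and X side by side, plus an intercept column).
    delta = 0 on death rows and 1 on discharged rows.
    """
    y0 = np.concatenate([y_deaths, y_disch])
    W0 = np.vstack([W_deaths, W_disch])
    delta = np.concatenate([np.zeros(len(y_deaths)), np.ones(len(y_disch))])
    Z = np.hstack([W0, delta[:, None] * W0])  # interaction block is zero on death rows
    return y0, Z

# Toy example: 3 departments, intercept + 1 covariate, same regressors for both outcomes
W = np.array([[1.0, 0.2], [1.0, 0.5], [1.0, 0.9]])
y0, Z = stack_ancova(np.array([1.0, 2.0, 3.0]), W, np.array([4.0, 5.0, 6.0]), W)
print(Z)
```

Since the department-level covariates are the same for both outcomes, the two blocks of `W0` are identical; only the outcome and the dummy distinguish the two halves of the sample.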
Results are reported in Table 5. The coefficients on each variable alone, as well as the intercept, are exactly the same as those in Table 2. This is because the dummy \(\delta\) is equal to zero for deaths, so these coefficients pick up the effect of the covariates on the cumulative number of deaths. Coefficients that were significantly different from zero remain so, and those that were not remain insignificant. Standard errors are also the same.
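The equality of point estimates can be checked numerically. In a fully interacted stacked regression, the baseline block reproduces the deaths-only OLS fit and the interaction block equals the difference between the discharged-only and deaths-only fits. The sketch below uses synthetic data (96 hypothetical departments, two made-up covariates) and assumes the intercept is among the interacted regressors.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 96  # hypothetical number of departments
W = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # intercept + 2 synthetic covariates
y_deaths = W @ np.array([1.0, 0.3, -0.2]) + rng.normal(0, 0.1, n)
y_disch = W @ np.array([2.0, 0.3, -0.1]) + rng.normal(0, 0.1, n)

# Stack deaths first, then discharged; delta = 1 on discharged rows
y0 = np.concatenate([y_deaths, y_disch])
W0 = np.vstack([W, W])
delta = np.repeat([0.0, 1.0], n)
Z = np.hstack([W0, delta[:, None] * W0])

coef_pooled, *_ = np.linalg.lstsq(Z, y0, rcond=None)
coef_deaths, *_ = np.linalg.lstsq(W, y_deaths, rcond=None)
coef_disch, *_ = np.linalg.lstsq(W, y_disch, rcond=None)

p = W.shape[1]
print(np.allclose(coef_pooled[:p], coef_deaths))               # True: baseline block = deaths-only fit
print(np.allclose(coef_pooled[p:], coef_disch - coef_deaths))  # True: interaction block = difference
```

The equality is exact because \([W_{0}, \delta W_{0}]\) spans the same column space as separate regressions on each subsample. With a single pooled error variance, as assumed above, the standard errors of the baseline block match the deaths-only ones when the two residual variances coincide.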
The coefficients \(\alpha\) and \(\beta\) on the interaction terms give the difference in the effect of each covariate across the two groups (deaths and discharged): if a coefficient is not statistically different from zero, we conclude that the difference is not significant at the indicated level. With one exception, the estimated interaction coefficients are not significantly different from zero, which implies that the effect of each right-hand-side variable is the same across the two groups. The exception is the interaction between the dummy \(\delta\) and the cumulative number of tests in the population, which is significant only at the 10% level. We therefore conclude that a joint model for deaths and discharged patients can be used in this setting.
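The significance test on each interaction coefficient can be sketched as a classical OLS t-test with one pooled error variance, in line with the equal-error-distribution assumption of the analysis of covariance. The data are synthetic (equal slopes, different intercepts), the labels `delta*x1` and `delta*x2` are hypothetical, and the critical values are the two-sided large-sample normal ones rather than exact t quantiles.

```python
import numpy as np

def ols_with_se(y, Z):
    """OLS coefficients and classical (homoskedastic) standard errors."""
    n, k = Z.shape
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    sigma2 = float(resid @ resid) / (n - k)  # one error variance shared by both subsamples
    cov = sigma2 * np.linalg.inv(Z.T @ Z)
    return coef, np.sqrt(np.diag(cov))

rng = np.random.default_rng(3)
n = 96  # hypothetical number of departments
W = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
# Synthetic outcomes: same slopes, different intercepts
y_deaths = W @ np.array([1.0, 0.3, -0.2]) + rng.normal(0, 0.1, n)
y_disch = W @ np.array([2.0, 0.3, -0.2]) + rng.normal(0, 0.1, n)

y0 = np.concatenate([y_deaths, y_disch])
W0 = np.vstack([W, W])
delta = np.repeat([0.0, 1.0], n)
Z = np.hstack([W0, delta[:, None] * W0])

coef, se = ols_with_se(y0, Z)
p = W.shape[1]
t_inter = coef[p:] / se[p:]  # t-statistics of the interaction block

# Two-sided normal critical values at the 10%, 5% and 1% levels
for name, t in zip(["delta", "delta*x1", "delta*x2"], t_inter):
    if abs(t) > 2.576:
        level = "1%"
    elif abs(t) > 1.960:
        level = "5%"
    elif abs(t) > 1.645:
        level = "10%"
    else:
        level = "n.s."
    print(f"{name}: t = {t:.2f} ({level})")
```

Here the intercept shift is strongly significant, while the slope interactions, which are zero by construction, will usually be reported as not significant.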
Table 5 Analysis of covariance (May 13–September 3, 2020)
Robustness checks
Finally, we use two alternative approaches as robustness checks for our main results. First, we repeat the analysis above for three different time windows: the cumulative number of deaths and discharged patients between March 1 and April 20, between March 1 and May 12, and between March 1 and September 3, 2020. This allows us to control for potential misreporting of data at the beginning of the pandemic, for potential discrepancies in lockdown policies and for different timings in the onset of the pandemic across departments. Results are reported in Tables 6, 7, 8, 9, 10 and 11 in the Appendix. Note that here we cannot control for the number of tests administered in the population, as these data are only available from May 13 onwards. Results are very similar to the baseline findings: the Gini coefficient is positive and significant across all specifications, and when both the Gini coefficient and the level of income are included, the latter turns insignificant.
Second, to further account for differences in the onset of the pandemic across departments, we use an approach similar to the one proposed by Desmet and Wacziarg [3]. We define the onset of the epidemic as the day on which the cumulative number of deaths per 100,000 inhabitants in a given department first reaches at least 10. We then consider, for each department, the cumulative number of deaths and discharged patients per 100,000 inhabitants 30 days after the onset. A few departments never reached the threshold during the period under study, so these regressions use 69 observations. The results are displayed in Tables 12 and 13. Again, our findings are not sensitive to the differential onset across departments in France.
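The onset-based outcome can be sketched as follows, assuming each department has daily cumulative series per 100,000 inhabitants aligned on the same dates. The helper `value_after_onset` and the department names are hypothetical. Departments that never reach the threshold are dropped, as in the text; dropping departments whose 30-day window runs past the end of the series is our own assumption.

```python
import numpy as np

def value_after_onset(cum_deaths_per_100k, cum_outcome_per_100k, threshold=10.0, horizon=30):
    """Cumulative outcome `horizon` days after onset, or None if unavailable.

    Onset = first day on which cumulative deaths per 100,000 reach `threshold`.
    Returns None if the threshold is never reached or onset + horizon is past the series end.
    """
    deaths = np.asarray(cum_deaths_per_100k, dtype=float)
    reached = np.flatnonzero(deaths >= threshold)
    if reached.size == 0:
        return None  # department never reaches the onset threshold
    day = reached[0] + horizon
    outcome = np.asarray(cum_outcome_per_100k, dtype=float)
    if day >= outcome.size:
        return None  # window extends beyond the observed period (assumption)
    return float(outcome[day])

# Hypothetical departments: (cumulative deaths, cumulative discharged) per 100,000, daily
series = {
    "dept_a": (np.arange(60) * 0.5, np.arange(60) * 2.0),
    "dept_b": (np.zeros(60), np.zeros(60)),
}
sample = {d: value_after_onset(*s) for d, s in series.items()}
sample = {d: v for d, v in sample.items() if v is not None}
print(sample)  # {'dept_a': 100.0}
```

Applied to every department, the filtering step yields the reduced estimation sample described above.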