Introduction

There have been almost 450 million SARS-CoV-2 infections and 6 million deaths worldwide (as of March 2022) since the novel coronavirus disease 2019 (COVID-19) pandemic emerged [1]. Several studies have demonstrated ethnic inequalities in the risk of infection and adverse outcomes, which has generated substantial concern [2,3,4,5]. In the United Kingdom (UK), compared to White individuals, men from all ethnic backgrounds other than Chinese, and women from any ethnic group other than Bangladeshi, Chinese, or mixed ethnicity, had an increased risk of COVID-19 mortality after accounting for differences in demographics, socioeconomic status, and comorbidities [6]. Notably, Black African men and women in the UK were over twice as likely to die from COVID-19 as those of White ethnicity [6]. Other large-scale epidemiological analyses from the UK demonstrated that people from South Asian, Black, and ‘Mixed’ ethnic groups had increased rates of COVID-19 death compared to the White group [2]. However, the evidence from other health systems, such as the United States, is conflicting [7], and data on COVID-19 cases and mortality in Canada by ethnicity are more limited [8, 9].

The mechanisms driving the inequalities are unclear but have been posited to relate to a complex and interrelated patterning of multiple factors, including medical factors (such as comorbidities and medication use) as well as social determinants, including cultural, behavioural, and occupational factors, and structural inequalities [10]. The presence of comorbidities has been associated with both a higher risk of SARS-CoV-2 infection and worse outcomes in individuals with COVID-19 [11, 12], while medications (e.g., certain glucose-lowering [13] or immune-modifying drugs [14]) have been linked to either an increased or a reduced risk of COVID-19 outcomes. Among the social factors, poor living and working conditions, low income, health literacy, poverty, and exposure to air pollution have all been associated with COVID-19 infectivity and mortality [15,16,17]. Robustly ascertaining the comparative contributions of these factors has been difficult, and one study [18] has sought to quantify potential mediators rather than reporting ‘overall’ effects [2, 4, 19,20,21].

Establishing a nuanced understanding of ethnic inequalities in COVID-19-related outcomes is needed to reduce the burden of COVID-19 and may permit rapid public health interventions should modifiable factors be identified. Here, we carried out two observational studies, in the UK (England) and Ontario (Canada), to quantify the associations between ethnicity and COVID-19 severity and to explore potential modifiable and non-modifiable explanatory factors. The study designs were harmonised to reduce the bias due to heterogeneous definitions of exposures, outcomes, and confounders. We then sought to synthesise the cohort-level estimates using a meta-analysis.

Methods

Data sources and study population

QResearch

The QResearch database (version 45) comprises individuals registered across 1321 general practices covering 18% of the English population, with linkages of primary care data to hospitalisation, intensive care unit (ICU) admission, and mortality data. For this study, we included 9,828,099 adults aged 18 to 99 years contributing to the QResearch database with at least 12 months of continuous prior registration. The study period ran from the date of the first confirmed SARS-CoV-2 infection in England (24th January 2020, start of follow-up) until 31st October 2020 (end of follow-up), the occurrence of the outcome, or death, whichever occurred first.
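The follow-up rule described above (censoring at the earliest of the end of the study period, the outcome, or death) can be illustrated with a minimal Python sketch. The function name `follow_up` and its arguments are hypothetical, and the treatment of boundary days (time measured in whole days from the start date, with all dates assumed to fall on or after it) is an assumption made for illustration rather than something stated in the text.

```python
from datetime import date
from typing import Optional, Tuple

# Study window for the QResearch cohort, as given in the text.
STUDY_START = date(2020, 1, 24)
STUDY_END = date(2020, 10, 31)


def follow_up(
    outcome_date: Optional[date],
    death_date: Optional[date],
    start: date = STUDY_START,
    end: date = STUDY_END,
) -> Tuple[int, bool]:
    """Return (days at risk, outcome observed) for one person.

    Follow-up stops at the earliest of: end of study, outcome, or death.
    Assumption: all supplied dates fall on or after `start`.
    """
    candidates = [end]
    if outcome_date is not None:
        candidates.append(outcome_date)
    if death_date is not None:
        candidates.append(death_date)
    exit_date = min(candidates)
    # The outcome counts as an event only if it is what ended follow-up.
    event = outcome_date is not None and outcome_date == exit_date
    return (exit_date - start).days, event


# Hypothetical person hospitalised on 1st April 2020.
days, event = follow_up(outcome_date=date(2020, 4, 1), death_date=None)
print(days, event)
```

A person with no outcome and no death before 31st October 2020 is censored at the end of the study period and contributes the full window of follow-up.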

Ontario

The second data source is the population-level healthcare administrative data in Ontario, Canada’s most populous and most ethnically diverse province. These data include the entire population of Ontario (currently 14.5 million, representing nearly 40% of the Canadian population) and are linked to sociodemographic information and to hospital and ICU admissions; in this investigation, 10,273,496 people aged over 18 years were included. The study period ran from 25th January 2020 (start of follow-up) to 30th September 2020 (end of follow-up), the occurrence of the outcome, or death, whichever occurred first.

Ethnicity and COVID-19 outcome: pooled analysis

In the first analysis, we explored the association between self-reported ethnicity and COVID-19-related death, hospitalisation, and ICU admission. The outcomes differed slightly between the QResearch and Ontario cohorts because they were based on country-specific definitions. For QResearch, outcomes included: (a) COVID-19 death, defined as either confirmed or suspected COVID-19 on the death certificate, or a death from any cause with a confirmed positive SARS-CoV-2 test in the immediately preceding 28 days; (b) hospitalisation due to COVID-19, defined as an admission with confirmed or suspected COVID-19 (as per ICD-10 codes U07.1 and U07.2), or a new hospitalisation with a positive SARS-CoV-2 test in the immediately preceding 14 days; (c) ICU admission due to COVID-19, defined as admission to ICU with confirmed or suspected SARS-CoV-2 test in the preceding 28 days. In the Ontario database, outcomes were defined as: (a) COVID-19 death, defined as any death with a confirmed positive SARS-CoV-2 test in the immediately preceding 28 days; (b) hospitalisation due to COVID-19, defined as an admission with confirmed or suspected COVID-19 (as per ICD-10 codes U07.1 and U07.2), or with a positive SARS-CoV-2 test between 28 days prior to and 14 days after the admission date; (c) ICU admission due to COVID-19, defined as a hospital admission that included an ICU stay with confirmed or suspected COVID-19 (as per ICD-10 codes), or with a positive SARS-CoV-2 test between 28 days prior to and 14 days after the admission date.
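To make the window-based definitions concrete, the following Python sketch encodes the QResearch definition of COVID-19 death given above. It is illustrative only: the function and argument names are hypothetical, and treating the 28-day window as inclusive of both the day of death and the 28th day before it is an assumption, since the text does not specify the boundary handling.

```python
from datetime import date, timedelta
from typing import Iterable


def is_covid_death_qresearch(
    death_date: date,
    covid_on_certificate: bool,
    positive_test_dates: Iterable[date],
    window_days: int = 28,
) -> bool:
    """Classify a death as a COVID-19 death under the QResearch definition.

    A death qualifies if COVID-19 (confirmed or suspected) appears on the
    death certificate, or if a positive SARS-CoV-2 test was recorded in the
    preceding `window_days` days (boundaries inclusive: an assumption).
    """
    if covid_on_certificate:
        return True
    earliest = death_date - timedelta(days=window_days)
    return any(earliest <= t <= death_date for t in positive_test_dates)


# Hypothetical death on 1st May 2020 with a positive test 21 days earlier.
print(is_covid_death_qresearch(date(2020, 5, 1), False, [date(2020, 4, 10)]))
```

The Ontario definitions for hospitalisation and ICU admission use a different, asymmetric window (28 days before to 14 days after admission), which would require a second bound on the test date rather than a single look-back period.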

We utilised a 3-level ethnicity classification comprising South Asian (Indian, Bangladeshi, Pakistani), Chinese, and ‘General Population’ (all other ethnic groups, 87.5% White in this cohort), based on the UK Office for National Statistics Census ethnic classification. In the Ontario linked healthcare administrative database, ethnicity was ascertained based on surnames, using lists that have been previously validated in this population to identify the two largest ethnic groups in Canada: South Asian and Chinese [22]. The positive predictive values for this approach to identifying ethnicity, when compared to self-reported ethnicity, are high: 89.3% for South Asians and 91.9% for Chinese; specificity was 99.7% for both. People whose surnames were not on either list were labelled as ‘General Population’ (all other ethnic groups, approximately 80% White). ‘General Population’ was used as the reference category for analyses.

The analyses were adjusted for demographic, clinical, and lifestyle factors (Supplementary Material; Tables S1 and S2); estimates of the associations between ethnicity and each of the three outcomes obtained in the QResearch and Ontario cohorts were combined in a two-stage random-effects meta-analysis. Further details on the definitions of the population and confounders are reported in the Supplementary Material.

Percentage of excess risk mediated by risk factors

The contribution of possible ‘risk factor’ classes to the increased relative risks in different ethnic groups was quantified in the QResearch data as the ‘percentage of excess risk mediated’ (PERM) [23]. By evaluating the change in the magnitude of the exposure-outcome association in models with different confounders, this analysis helps clarify the extent to which a confounder (or a set of confounders) accounts for the association between ethnicity and COVID-19 outcome. For the PERM analyses, we used a 5-level ethnicity classification (White, Mixed ethnicity, South Asian, Black, and ‘Other’ ethnic groups); hazard ratios (HRs), relative to White, were estimated separately for each of the three outcomes and for each of the following sets of confounders: ‘minimally adjusted’ model (age, sex, and region); household and social factors; comorbidities; lifestyle factors (including BMI); and ‘maximally adjusted’ model.

Statistical analyses

Country-specific baseline socio-demographic and clinical characteristics were summarised using descriptive statistics by COVID-19-related hospitalisation, ICU admission, and mortality. In QResearch, survival analyses to evaluate the adjusted association of 3-level ethnicity with COVID-19 outcomes, accounting for clustering within practices (robust standard errors), were performed with the Royston-Parmar model [24, 25]. We performed multiple imputation to replace missing values for ethnicity (20.2% missing), deprivation (0.6%), BMI (16.7%), and smoking status (5.2%) using chained equations under the missing at random assumption. These variables were modelled using a multinomial logistic model for ethnicity, an ordinal logistic model for smoking and alcohol, and truncated regression for BMI. Results from five imputations were pooled using Rubin’s rules [26]. Complete-case analyses and analyses of time-varying associations were performed as sensitivity analyses. Cox survival regressions were conducted to evaluate the adjusted associations of ethnicity with COVID-19 outcomes in the Ontario database; as the only missing data were a small number of people missing deprivation data (0.2%), we used complete-case regressions. The proportional hazards assumption was checked in both survival analyses using log-log plots. HRs from maximally adjusted models were used as the common measure of association across the QResearch and Ontario cohorts and combined with the DerSimonian-Laird random-effects method in a two-stage meta-analysis; heterogeneity was assessed with I².
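The second stage of the meta-analysis can be sketched in Python as follows. This is a minimal illustration of the standard DerSimonian-Laird estimator, not the authors' own code (their analyses were run in Stata and SAS). The function names are hypothetical, and the standard errors are back-calculated from published 95% confidence intervals assuming symmetry on the log scale, which is an approximation because the published limits are rounded.

```python
import math
from typing import Dict, List, Tuple


def log_hr_and_se(hr: float, lower: float, upper: float, z: float = 1.96) -> Tuple[float, float]:
    """Convert an HR and its 95% CI to a log-HR and an approximate standard error."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)


def dersimonian_laird(estimates: List[Tuple[float, float, float]], z: float = 1.96) -> Dict[str, float]:
    """Pool (hr, lower, upper) tuples with the DerSimonian-Laird random-effects method."""
    pairs = [log_hr_and_se(hr, lo, hi, z) for hr, lo, hi in estimates]
    ys = [y for y, _ in pairs]
    variances = [se ** 2 for _, se in pairs]

    # Fixed-effect (inverse-variance) weights and Cochran's Q.
    w = [1 / v for v in variances]
    y_fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1

    # Method-of-moments estimate of the between-study variance tau^2.
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Random-effects weights add tau^2 to each within-study variance.
    w_re = [1 / (v + tau2) for v in variances]
    y_re = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    se_re = math.sqrt(1 / sum(w_re))
    return {
        "hr": math.exp(y_re),
        "lower": math.exp(y_re - z * se_re),
        "upper": math.exp(y_re + z * se_re),
        "tau2": tau2,
        "i2": i2,
    }


# South Asian vs general population, COVID-19 mortality (values from the Results):
# QResearch 1.35 (1.20, 1.51) and Ontario 2.04 (1.56, 2.68).
pooled = dersimonian_laird([(1.35, 1.20, 1.51), (2.04, 1.56, 2.68)])
print({k: round(v, 2) for k, v in pooled.items()})
```

Applied to the two South Asian mortality estimates, the sketch gives a pooled HR and confidence interval that agree with the reported 1.63 (1.09, 2.44) to two decimal places, with I² of roughly 87%. With only two cohorts, the between-study variance is estimated very imprecisely, so the heterogeneity statistics should be read with caution.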

In QResearch, we applied the following formula to estimate the PERM, using the HRs across imputed datasets:

$$\textrm{PERM}=100\times\frac{\textrm{HR}\left(\textrm{age},\ \textrm{sex},\ \textrm{region}\right)-\textrm{HR}\left(\textrm{age},\ \textrm{sex},\ \textrm{region}+\textrm{risk factor group}\right)}{\textrm{HR}\left(\textrm{age},\ \textrm{sex},\ \textrm{region}\right)-1}$$

The PERM was also calculated for ‘maximal adjustment’ in each of the non-White ethnic groups to assess the extent to which inequalities were potentially attributable to the large set of measured adjustment factors.
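As a worked illustration of the PERM formula, the Python sketch below applies it to hypothetical hazard ratios; the input values are invented for the example and are not results from this study. The function name `perm` is also hypothetical, and the check that the minimally adjusted HR exceeds 1 is an assumption added here, since the formula is undefined when that HR equals 1 and has no ‘excess risk’ interpretation when it is below 1.

```python
def perm(hr_minimal: float, hr_adjusted: float) -> float:
    """Percentage of excess risk mediated (PERM).

    hr_minimal:  HR adjusted for age, sex, and region only.
    hr_adjusted: HR additionally adjusted for a risk factor group.
    """
    if hr_minimal <= 1:
        # Assumption: PERM is only computed when there is an excess risk to explain.
        raise ValueError("PERM requires a minimally adjusted HR greater than 1")
    return 100.0 * (hr_minimal - hr_adjusted) / (hr_minimal - 1.0)


# Hypothetical example: the HR falls from 2.0 to 1.6 after further adjustment,
# so 0.4 of the excess risk of 1.0 is accounted for by the added factors.
print(round(perm(2.0, 1.6), 1))
```

Under this definition, a PERM of 0 means that the additional adjustment left the HR unchanged, and a PERM of 100 means that the adjusted HR dropped to 1. A negative PERM arises when the HR increases after adjustment.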

All p-values are two-sided, and nominal statistical significance was considered at p < 0.05. We used Stata v.17 for the QResearch statistical analyses and SAS v.9.4 for the Ontario analyses. We followed current guidance for conducting and reporting observational studies using routinely collected health data (RECORD checklist in the Supplementary Material).

Patient and public involvement reporting

Two public representatives advised on the interest and appropriateness of the research questions, were involved in writing the protocol for the wider study, and provided input on lay summaries describing the planned study.

Results

Study populations

In the QResearch cohort, there were 9,828,099 individuals; during follow-up, 11,597 COVID-19 deaths, 21,917 hospitalisations, and 2932 ICU admissions occurred. In the Ontario cohort, corresponding figures were 10,273,496 individuals, 951 COVID-19 deaths, 5132 hospitalisations, and 1191 ICU admissions (Table 1). Ethnicity data and their classifications are summarised in Table 1, and characteristics stratified by ethnicity are provided in Tables S1-S2.

Table 1 Baseline sociodemographic, clinical characteristics and outcomes in the QResearch and Ontario cohorts

Cohort studies and meta-analyses

In QResearch, South Asian ethnicity was associated with increased rates of COVID-19 mortality (HR: 1.35; 95% CI: 1.20, 1.51; Fig. 1 and S1), hospitalisation (1.63; 1.51, 1.75; Fig. 1 and S2), and ICU admission (1.93; 1.67, 2.25; Fig. 1 and S3) compared to the general population group; corresponding estimates in Ontario were 2.04 (1.56, 2.68) for mortality, 1.41 (1.24, 1.59) for hospitalisation, and 1.41 (1.10, 1.79) for ICU admission. In the same maximally adjusted models, in QResearch there was no evidence of increased rates of COVID-19 mortality (HR: 1.12; 0.75, 1.66), hospitalisation (0.86; 0.67, 1.11), or ICU admission (1.20; 0.68, 2.11) in the Chinese ethnic group compared to the general population group, whilst in Ontario the HRs were 0.92 (0.67, 1.25) for mortality, 0.79 (0.69, 0.91) for hospitalisation, and 1.29 (1.02, 1.63) for ICU admission. For all three outcomes, the direction of associations was similar for most of the confounders available in both the QResearch and Ontario cohorts, indicating an increased risk associated with the presence of medical conditions and a progressively higher risk in older people and larger households (Fig. S1-S3). In the QResearch cohort, complete-case estimates were largely similar to those of the main analyses using multiple imputation (Fig. S4); time-varying associations by ethnic group are presented in Fig. S5.

Fig. 1 Cohort-level meta-analysis of individual participant data from QResearch and Ontario. Estimates and numbers of events and participants are shown following multiple imputation in the QResearch cohort and for complete-case analysis in the Ontario cohort. The reference ethnic group is “general population”, including: (1) people who were neither South Asian nor Chinese in Ontario (approximately 80% White); (2) White, Other Asian, Black African, Black Caribbean, and Other in QResearch

Combining estimates for South Asian ethnicity across the QResearch and Ontario cohorts resulted in a random-effects HR of 1.63 (1.09, 2.44) for COVID-19-related mortality, with considerable heterogeneity between the two estimates (I² 86.9%; Fig. 1). Corresponding estimates for hospitalisation and ICU admission were 1.53 (1.32, 1.76) and 1.67 (1.23, 2.28), with considerable heterogeneity: I² 75.4% and I² 74.9%, respectively. The pooled random-effects HRs comparing Chinese ethnicity to the general population were 0.99 (0.77, 1.26) for mortality, 0.81 (0.72, 0.91) for hospitalisation, and 1.28 (1.03, 1.58) for ICU admission; there was no evidence of heterogeneity for any of the three outcomes (I² 0%; Fig. 1). There was no clear trend in the mortality, hospitalisation, or ICU admission HRs comparing ethnic groups across levels of deprivation (Fig. S6).

Percentage of excess risk mediated by risk factor classes (QResearch)

The percentage of excess risk mediated by separate groups of potential attributable factors across the entirety of follow-up in QResearch is reported in Table 2. We estimated that approximately 20-30% of the excess risk of COVID-19-related hospitalisation in non-White ethnic groups may be mediated by household size/status and deprivation, and that differences in comorbidity prevalence may mediate up to approximately 20% of the excess risk (in the South Asian group). For COVID-19-related ICU admission, adjustment for comorbidities accounted for up to approximately 30% of the excess risk, whereas maximal adjustment accounted for up to approximately 40% of the excess risk (in the Black ethnic group). Differences in smoking habits and BMI did not appear to mediate any degree of excess risk of COVID-19-related death in any non-White ethnic group. Maximal adjustment accounted for 42.9% (South Asian) and 39.4% (Black) of the excess risks of death. Therefore, the majority of excess risk in non-White groups may not be accounted for by the range of sociodemographic, lifestyle, and comorbidity factors considered in this analysis.

Table 2 Percentage of excess risk mediated in COVID-19-related outcomes in ethnic minority groups (QResearch cohort)

Discussion

In this international study of population-level healthcare databases covering over 20 million individuals, we showed that adults of South Asian background had a 63% increased risk of COVID-19 mortality, a 53% increased risk of COVID-19-related hospital admission, and a 67% increased risk of ICU admission overall compared to the general population in England and Ontario. By comparison, adults of Chinese background had a 28% increased risk of ICU admission, with no evidence of increased mortality or hospitalisation risks. In England, sociodemographic, lifestyle, and clinical factors accounted for approximately 40% of excess risks of COVID-19 death.

Our results are consistent with other UK population-level analyses derived from data using combinations of different IT systems, which also reported similar estimates of risk in non-White ethnic groups [2]. In this respect, it is important to note that the risks of COVID-19 outcomes estimated in QResearch across ethnic groups, and combined with the results from Ontario, should be considered in view of some variations in the magnitude of associations between ethnicity and COVID-19 outcomes both between waves and within the same wave; more importantly, the public health implications of these variations are primarily determined by the country- and region-specific change in the absolute risk of each outcome over time [27].

Whilst it is increasingly established in the literature that non-White ethnicity is associated with increased risk of severe COVID-19 outcomes, the degree to which modifiable and other factors may contribute to this risk in different ethnic groups is poorly understood. Some ethnic communities may be disadvantaged by living in poorer socioeconomic environments where the risk of infection and worse outcomes is higher, including overcrowded multigenerational houses or occupations with a high degree of public contact [2, 18]; at the same time, biological factors have been suggested to play a role as well, such as an unfavourable metabolic-inflammatory milieu (e.g., obesity, multimorbidity) [11, 20, 28]. Rather than only reporting summary effect estimates after full or serial adjustment, our approach in QResearch also included an assessment of the relative contribution of potential attributable factors, and it suggests that there may be heterogeneity in the mechanistic underpinnings of the increased risks in different ethnic groups. Our study found that the sociodemographic, lifestyle, and clinical factors considered in this investigation accounted for approximately 40% of excess risks of COVID-19 death. Hence, further research should investigate whether other factors, not captured in our data, may explain the remaining proportion of excess risks in some ethnic groups and possible causal pathways in the COVID-19 syndemic [29]. It is possible, in fact, that ethnic differences are at least in part the epiphenomenon of a complex network of other risk factors associated with a higher risk of COVID-19 outcomes, including overcrowding and occupation [30].

Our study analysed in greater detail the differential effects of deprivation within ethnic groups, as well as the relative contributions of different factors to the increased risks in non-White groups, given the suggested interplay between ethnicity and deprivation on the risk of COVID-19 outcomes [31]. Our results also expand and clarify the evidence base regarding ethnic inequalities in COVID-19 outcomes in several ways. First, in contrast to evidence generated using data only from those attending hospitals or registered with providers within fragmented healthcare systems investigating the role of sociodemographic and clinical characteristics on the risk of outcomes across ethnic groups [27, 32], our population-level approach examined the relevant risk trajectories and avoided conditioning on positive tests or other intermediates [33]. Second, much of the available evidence about ethnicity and COVID-19-related outcomes is highly heterogeneous in terms of study designs, populations, definitions of outcomes/exposures, confounders adjusted for (if any), and settings (geographical and healthcare system). This has not only hampered the interpretation of individual studies but also limited the cohesive synthesis of evidence via meta-analytical approaches, due to significant within- and between-study heterogeneity [4, 34]. We explicitly sought to harmonise analytical approaches to facilitate pooling of robust estimates from multiple geographical units, namely different nations (England and Canada). Other key strengths of our study include the use of two large, population-level, and representative healthcare databases without selection bias, which possess individual-level linkages across the healthcare network enabling accurate ascertainment of exposures, confounders, and outcomes.

Our flexible harmonisation of definitions and analytical approaches facilitated cohort-level meta-analysis of results from both main study databases; we also used the Royston-Parmar survival model, which allowed us to explore whether the association between ethnicity and COVID-19-related outcomes changed across the first and second waves. Lastly, we investigated the possible mediating role of some factors in explaining the increased risk observed across ethnic groups in the UK. In this respect, it should be noted that different methods exist to investigate mediation (including the possibility to account for intermediate confounding) [35]; furthermore, while the difference between a confounder and a mediator is well known, the same factors may be considered mediators in some contexts and confounders in others, or even in the same context by different investigators [36], further highlighting the complex interactions among multiple factors in determining health status. Moreover, some potential mediators have not been included in our analyses (e.g., education, employment status, income). As such, our PERM results should be considered exploratory, and no definitive causal inference can be derived from them: it is plausible that the comparative causal role of these factors would be different in heterogeneous healthcare systems and societies.

Our study also has some limitations, including the inability to further disaggregate ethnicity into more granular groups in Ontario; the lack of other recorded information that may be relevant to disease risks (such as occupation, which is relevant to SARS-CoV-2 exposure, and detailed household composition) [37]; the risk of residual confounding, which affects every observational analysis and hampers a conclusive causal interpretation; missing data, which were addressed assuming a missing at random mechanism, although previous evidence would indicate that ethnicity could be missing not at random [38] (however, the complete-case analysis for the latter scenario [39] resulted in estimates virtually identical to those obtained using multiple imputation); and the potential variations in the ascertainment of COVID-19 infections over time, between countries, and among ethnic groups [40]. Furthermore, the contribution of potential attributable factors was explored only in the QResearch cohort, as several of these factors were not available in the Ontario administrative data.

Large-scale cohort studies in England and Canada, together with meta-analyses, provide robust evidence of ethnic inequalities in COVID-19 outcomes. Not only do these inequalities persist despite accounting for potential sociodemographic and clinical confounders, but the risks in individual ethnic groups have also varied during the pandemic. The currently unexplained proportion of excess risks in non-White groups requires careful consideration of economic, healthcare system, and other factors to guide public health strategy to protect everyone as the pandemic progresses globally.