Introduction

The world is currently experiencing a pandemic of coronavirus disease (COVID-19) caused by the Severe Acute Respiratory Syndrome coronavirus-2 (SARS-CoV-2) [1]. The risk of morbidity and mortality from the virus is strongly stratified, with poor clinical outcomes considered more likely in certain vulnerable groups. For example, studies from different countries have established that older age groups are at increased risk of death [2, 3].

The ability to identify the population groups most at risk from the virus has manifold public health purposes. Using such data, stratified vaccination policies for governmental delivery can be designed, similar to those for influenza [4]. It may also be possible to prioritise more active monitoring of groups more at risk of clinical deterioration, and facilitate access to healthcare facilities by early identification of the individuals most likely to progress to severe disease who would thus be in need of intensive care and ventilation. Official advice can be issued to vulnerable groups to let them know that they are more at risk from the SARS-CoV-2 virus, to promote behaviour modification [5, 6]. Such population groups can also be the target of more formalised ‘segment and shield’ approaches: having divided the population into groups that present with similar health care concerns and needs (segmenting), it is possible to determine which groups require extra protection by reducing interaction with other groups (shielding), whilst relaxing restrictions for the rest of the population [7]. Potential public health policies along this route have been critiqued, however, on an inclusivity basis, particularly due to the unintended harmful consequences to already marginalised groups [8].

In the UK, vulnerable people were stratified into two tiers early on, on 30th March 2020 (Table 1): those at risk of severe illness, who were advised to be particularly stringent with social distancing measures, and those within that group at further risk – described as ‘shielded’ individuals – who were advised to self-isolate and were provided with additional advice [9, 10, 11, 12]. The former categorisation was based on the groups targeted for National Health Service programmes on influenza vaccination and the latter on clinical consensus. These strata were deliberately broad, to maximise the number of individuals protected. As the evidence evolves – e.g. regarding whether the development of lesions in the cardiovascular system contributes meaningfully to disease pathogenesis in patients with and without pre-existing cardiovascular conditions [13] – there is the opportunity for the categorisation of risk of COVID-19 and serious outcomes from COVID-19 to become more evidence-based.

Table 1 UK risk groupings for COVID-19 disease on 30th March 2020

During epidemics and pandemics of emerging infectious diseases, it is critical to rapidly and accurately identify the populations most at risk. In the case of COVID-19, we undertook a systematic review and quality assessment of the rapidly-evolving global literature in this area, looking at three key outcomes: COVID-19 disease, disease severity, and mortality from the condition. Any potential risk factors, populations, and study designs were included. Arising from our findings, we highlight key knowledge gaps in the current literature and the need for unified global approaches moving forwards, particularly for the next pandemic.

Materials and methods

Literature search

We systematically searched Medline, Embase, and Global Health (all via the Ovid platform), in addition to the Web of Science, for literature published between 1st November 2019 and 26th March 2020; we subsequently updated this search to cover the period up to 29th April 2020. In order to avoid missing publications on risk factors, only terms specific to the virus and the disease were used, combined with ‘or’:

  • ‘coronavirus’

  • ‘covid-19’

  • ‘severe acute respiratory syndrome coronavirus 2’

  • ‘2019-nCoV-2’

  • ‘SARS-CoV-2’

  • ‘acute respiratory syndrome’

No limits or filters were applied to the search. The same search terms were used across all databases.
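As an illustration of this strategy, the six terms can be joined into a single Boolean query. The sketch below is a minimal, hypothetical helper: the function name and the quoting style are assumptions, and it does not reproduce the exact Ovid or Web of Science syntax used for the review.

```python
# Terms listed in the search strategy; joined with OR as described in the text.
SEARCH_TERMS = [
    "coronavirus",
    "covid-19",
    "severe acute respiratory syndrome coronavirus 2",
    "2019-nCoV-2",
    "SARS-CoV-2",
    "acute respiratory syndrome",
]


def build_query(terms):
    """Return one Boolean query string with each term quoted (hypothetical format)."""
    return " OR ".join(f'"{term}"' for term in terms)


query = build_query(SEARCH_TERMS)
print(query)
```

Because no topic terms for risk factors are added, the query is deliberately broad, and relevance is decided at the screening stage rather than by the search itself.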

Reference lists of included papers and review articles were also searched, as was the grey literature of public health reports for the 26 countries with the highest numbers of reported patients with COVID-19 at the end of April 2020; for other countries, it was assumed there would be insufficient numbers of cases to yield relevant data.

Eligibility criteria and study selection

The following inclusion and exclusion criteria were applied to the search results.

Inclusion criteria:

  • Studies had to provide comparative data on risk factors of any kind for disease (versus no disease), severe disease (versus milder disease) or mortality (versus survival),

  • Studies were eligible if they presented data on patients with polymerase chain reaction (PCR)-confirmed SARS-CoV-2 infections. There was considerable variation in case definitions between studies, but PCR testing was the gold standard test for active disease at the start of the pandemic [14], and other testing methods such as Loop-Mediated Isothermal Amplification or serological tests were not included,

  • Any study design,

  • Any population group,

  • Any language of publication.

Exclusion criteria:

  • No comparator group included in the study,

  • Publication concerned other viruses and diseases,

  • Work conducted in animals or in vitro,

  • Study population was less than 20 individuals.

Two reviewers independently screened all titles, abstracts and full texts for both literature searches. Discrepancies were resolved by consensus. In all cases where studies were published in any language other than English, with no translations available, these were screened by at least one additional reviewer, with further quality control by another member of the reviewing team.
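The criteria above can be expressed as a simple screening rule. The following is a sketch only, assuming each candidate record has already been coded by a reviewer; the class and field names are hypothetical and were not part of the review's tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateStudy:
    # Hypothetical fields mirroring the inclusion and exclusion criteria in the text.
    has_comparator_group: bool
    concerns_sars_cov_2: bool
    pcr_confirmed: bool
    human_participants: bool
    population_size: int


def exclusion_reasons(study: CandidateStudy) -> List[str]:
    """List every criterion the study fails; an empty list means it is eligible."""
    reasons = []
    if not study.has_comparator_group:
        reasons.append("no comparator group")
    if not study.concerns_sars_cov_2:
        reasons.append("other virus or disease")
    if not study.pcr_confirmed:
        reasons.append("infection not PCR-confirmed")
    if not study.human_participants:
        reasons.append("animal or in vitro work")
    if study.population_size < 20:
        reasons.append("fewer than 20 individuals")
    return reasons


def is_eligible(study: CandidateStudy) -> bool:
    return not exclusion_reasons(study)
```

Returning the full list of reasons, rather than a single yes or no, makes it straightforward to tabulate the main reasons for exclusion in a flow chart.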

Data extraction

Three reviewers independently double-extracted the studies into a pre-designed spreadsheet that collected:

  • First author,

  • Paper title,

  • Journal,

  • Type of study,

  • Country,

  • Study population,

  • Overall number in study,

  • Number with PCR confirmed SARS-CoV-2,

  • Median age of participants/age range,

  • Sex ratio,

  • Analytical method used,

  • Factors adjusted for during the analysis,

  • Whether disease, disease severity, or death (or a combination of these) was the outcome of interest,

  • The definition of disease severity used, if applicable,

  • The risk factors analysed and the direction of effect.

Results were compared and discrepancies resolved by discussion. Data from studies published in languages other than English, at this stage only the Chinese language, were extracted by two additional reviewers, with further quality control by another member of the reviewing team.
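The comparison step in double extraction can be sketched as a field-by-field check between two reviewers' records. The function name, the field names, and the values below are invented purely for illustration and do not come from any included study.

```python
def find_discrepancies(extraction_a: dict, extraction_b: dict) -> dict:
    """Return {field: (value_a, value_b)} for every field where two extractions differ."""
    fields = sorted(set(extraction_a) | set(extraction_b))
    return {
        field: (extraction_a.get(field), extraction_b.get(field))
        for field in fields
        if extraction_a.get(field) != extraction_b.get(field)
    }


# Invented example records (not real data): the reviewers disagree on one field.
reviewer_1 = {"country": "Country X", "overall_number": 191, "outcome": "mortality"}
reviewer_2 = {"country": "Country X", "overall_number": 119, "outcome": "mortality"}

for field, (a, b) in find_discrepancies(reviewer_1, reviewer_2).items():
    print(f"{field}: reviewer 1 = {a!r}, reviewer 2 = {b!r}")
```

Fields flagged in this way would then be resolved by discussion, as described above.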

Quality assessment

Two reviewers independently assessed the quality of included studies. Studies published in languages other than English were quality assessed by two additional reviewers, with further quality control by another member of the reviewing team. Assessments were undertaken from the perspective of the objectives of this review, which were not necessarily identical to the objectives of the underlying studies. The quality of included studies was assessed using a checklist adapted from Downs and Black [15], as per the guidance issued by Deeks et al. [16]. When assessing the power of studies, the minimum sample size required to detect a relative increase in risk of 10% from a statistically conservative baseline of 50% among the unexposed was calculated at different powers using the Kelsey method within Epi Info, software made available by the United States Centers for Disease Control and Prevention [17]. This 10% value was based on governmental discussions taking place in the UK at the time the review took place. An alpha of 5% was set as the standard. Pragmatically, we assumed only two strata and a ratio of 1:1 between exposure strata. Different thresholds were used for case-control studies and for cohort or cross-sectional studies. These criteria were scored from 0 (< 70% power) to 5 (> 99% power). We considered results sufficiently adjusted for confounding if they adjusted for at least the minimal variable set of age, sex, ethnicity and any measure of comorbidities. For ethnically homogeneous populations, the need for adjustment for ethnicity was discounted. If two analyses were presented within a single paper with different quality scores, the most conservative score was retained. Studies were not excluded on the basis of the quality assessment.
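To make the power calculation concrete, the sketch below implements the Kelsey formula as it is commonly stated for cohort and cross-sectional designs, with a two-sided alpha. It is an illustration under stated assumptions rather than a reproduction of Epi Info: the function name is hypothetical, the 10% relative increase from a 50% baseline is read as 55% risk among the exposed, the list of powers is chosen for illustration, and results may differ slightly from the software's output through rounding. The separate calculation for case-control studies is not covered.

```python
from math import ceil
from statistics import NormalDist


def kelsey_sample_size(p_unexposed, relative_increase, power, alpha=0.05, ratio=1.0):
    """Exposed-group size under the Kelsey formula (cohort or cross-sectional form).

    ratio is the number of unexposed per exposed individual (1.0 for a 1:1 design).
    """
    p_exposed = p_unexposed * (1 + relative_increase)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided alpha
    z_beta = NormalDist().inv_cdf(power)
    # Risk pooled across the two exposure strata, weighted by group size.
    p_bar = (p_exposed + ratio * p_unexposed) / (1 + ratio)
    numerator = (z_alpha + z_beta) ** 2 * p_bar * (1 - p_bar) * (ratio + 1)
    denominator = ratio * (p_exposed - p_unexposed) ** 2
    return ceil(numerator / denominator)


# Illustrative powers only; assumed here, not taken from the review's scoring table.
for power in (0.70, 0.80, 0.90, 0.95, 0.99):
    n = kelsey_sample_size(p_unexposed=0.50, relative_increase=0.10, power=power)
    print(f"power {power:.0%}: {n} per exposure group")
```

Under these assumptions, the required size runs to well over a thousand individuals per exposure stratum even at 80% power, which gives a sense of why small early clinical series score poorly on this criterion.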

Analysis and synthesis

Studies were grouped on the basis of the outcome examined (disease, disease severity, mortality) and then the risk factors examined. Results were classified on the basis of whether they presented evidence as to the exposure under study being a risk factor, taking into account the number of individuals exposed. Where studies focussed on a single risk factor of interest with adjustment for confounding, we extracted all data on potential risks in order to maximise the value of our dataset (whilst accepting that such mutually adjusted estimates for covariates may remain confounded even if that for the primary exposure does not) [18]. As there was substantial heterogeneity in study design, reporting, and the risk factors examined, we present a detailed descriptive summary and narrative synthesis of our findings, rather than a meta-analysis.
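The two-level grouping described here, first by outcome and then by risk factor, can be sketched as a nested mapping. The function name, record keys, and study labels below are placeholders for illustration, not extracted results.

```python
from collections import defaultdict


def group_findings(findings):
    """Nest findings as outcome -> risk factor -> list of (study, direction)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for finding in findings:
        grouped[finding["outcome"]][finding["risk_factor"]].append(
            (finding["study"], finding["direction"])
        )
    return grouped


# Placeholder records (not real data) showing the expected input shape.
example = [
    {"study": "Study A", "outcome": "mortality", "risk_factor": "age", "direction": "increased risk"},
    {"study": "Study B", "outcome": "mortality", "risk_factor": "age", "direction": "increased risk"},
    {"study": "Study A", "outcome": "disease severity", "risk_factor": "diabetes", "direction": "inconclusive"},
]

grouped = group_findings(example)
for outcome, factors in grouped.items():
    for factor, entries in factors.items():
        print(outcome, "|", factor, "|", len(entries), "study finding(s)")
```

A structure of this kind supports a narrative synthesis by showing, for each outcome, how many studies examined each factor and in which direction they pointed, without pooling effect sizes.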

Registration and reporting

This review was registered on PROSPERO as CRD42020177714 and is reported according to the PRISMA guidelines.

Results

Two thousand eight hundred and sixty-eight hits were obtained by the searches across the two dates (Fig. 1). After de-duplication across the different databases, this was reduced to 1238. Thirty studies were included at the extraction stage; the main reasons for exclusion were small numbers of participants and studies not having a comparator population. From the grey literature an additional report was included and two studies were identified from reference lists.
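The counts in this paragraph can be checked with simple arithmetic. The short sketch below uses only the numbers reported above; the variable names are illustrative.

```python
# Counts as reported in the text.
hits = 2868
after_deduplication = 1238
included_from_searches = 30
from_grey_literature = 1
from_reference_lists = 2

duplicates_removed = hits - after_deduplication
excluded_after_deduplication = after_deduplication - included_from_searches
total_included = included_from_searches + from_grey_literature + from_reference_lists

print("duplicates removed:", duplicates_removed)
print("excluded between de-duplication and extraction:", excluded_after_deduplication)
print("total included studies:", total_included)
```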

Fig. 1 PRISMA flow chart of selection

Included studies are presented in Table 2. Twenty-nine of the 33 studies were conducted in China, with one each from France, Italy, Singapore and a combined study from England, Wales and Northern Ireland. Six were studies with COVID-19 disease as the outcome, 20 of disease severity and ten of mortality. One additional study looked at a combined outcome of disease severity and mortality.

Table 2 Included studies

Quality assessment

Included studies were generally too small to detect a 10% increase in risk of disease, disease severity, or mortality (Table 3). One study among the 33 was assessed to have 95% power and two others 99%; all were large, national, investigations. As 26 studies were purely descriptive or presented univariable analysis only, there was no adjustment for confounding. Remaining studies with a regression component did not adjust for our minimal confounder set. Only nine studies provided estimates of the random variability of effect estimates. The majority of studies ascertained exposure information from clinical records, which would have collected data prospectively and thus with limited recall bias. Blinding of outcome and exposure recording by investigators was not documented. In the case of certain disease severity outcomes, such as admittance to intensive care units (ICU), variability in thresholds for reaching these outcomes is likely to exist between settings and clinicians.
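The adjustment criterion applied in this assessment can be written as a small rule. This is a sketch under the assumption that each study's adjustment set has been coded with the hypothetical labels shown; "comorbidities" stands for any measure of comorbidity.

```python
# Minimal confounder set from the quality assessment methods; labels are hypothetical codings.
MINIMAL_SET = {"age", "sex", "ethnicity", "comorbidities"}


def sufficiently_adjusted(adjusted_for, ethnically_homogeneous=False):
    """True if the adjustment set covers the minimal set, waiving ethnicity where appropriate."""
    required = set(MINIMAL_SET)
    if ethnically_homogeneous:
        # Adjustment for ethnicity is discounted for ethnically homogeneous populations.
        required.discard("ethnicity")
    return required.issubset(set(adjusted_for))


print(sufficiently_adjusted({"age", "sex", "comorbidities"}, ethnically_homogeneous=True))
print(sufficiently_adjusted({"age", "sex"}, ethnically_homogeneous=True))
```

A purely descriptive or univariable study has an empty adjustment set and therefore fails the rule regardless of population.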

Table 3 Quality assessment

Risk factors for disease

Six studies compared the likelihood of having COVID-19 to other infectious conditions (Table 4). Of note, as testing strategies were largely focussed on hospitalised individuals, i.e. those displaying noticeable symptoms, studies were of the likelihood of COVID-19 disease, rather than more broadly of SARS-CoV-2 infection (and particularly of severe disease, although patients with mild and symptomatic infection were also reported to be hospitalised in some studies for the purposes of isolation or observation). Age and sex were key foci as potential risk factors, comparing patients with COVID-19 to either: a) SARS-CoV or Middle East Respiratory Syndrome (MERS), or b) other forms of pneumonia. Generally, sex ratios were skewed such that men were over-represented among those with disease. In England, Northern Ireland, and Wales, Asian and Black individuals were found to be at increased risk of COVID-19 in descriptive analyses, with 15.4 and 10.7% of patients falling into these groupings, respectively, versus 5.8 and 2.8% of individuals with other viral pneumonia [30]. Higher body mass index (BMI) was also suggested to be a risk factor in two descriptive analyses; for example, in the Intensive Care National Audit and Research Centre (ICNARC) report, 31.2% of COVID-19 patients had a BMI of 30 to < 40, versus 23.5% of people with other viral pneumonia [30, 37]. Given the large, national, scope of the ICNARC dataset, results from it are particularly likely to be reliable.
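The descriptive percentages quoted above can be turned into crude ratios of how common each characteristic was among COVID-19 patients relative to patients with other viral pneumonia. The sketch below is illustrative only: the function name is hypothetical, and the ratios are unadjusted, so they are not estimates of independent effect.

```python
def prevalence_ratio(pct_covid, pct_comparator):
    """Crude ratio of two percentages; no adjustment for confounding."""
    return pct_covid / pct_comparator


# Percentages as reported in the text: (COVID-19 patients, other viral pneumonia).
reported = {
    "Asian ethnicity": (15.4, 5.8),
    "Black ethnicity": (10.7, 2.8),
    "BMI 30 to <40": (31.2, 23.5),
}

for characteristic, (covid, other) in reported.items():
    print(f"{characteristic}: crude ratio {prevalence_ratio(covid, other):.2f}")
```

Such ratios compare the composition of two patient groups and may reflect differences in age, geography, or deprivation between them as much as any direct effect of the characteristic itself.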

Table 4 Potential risk factors for disease

Risk factors for severe disease

Among the 20 studies of risk factors for severe versus milder disease and one of a mixed outcome (severe disease and death), a wide array of definitions of severity were used, such as ICU admission, the need for mechanical ventilation, and various measures of respiration and oxygenation (Table 2). Many risk factors were examined (Table 5). As well as potential demographic risks (age, sex, ethnicity), behavioural traits (smoking) and broad clinical factors (BMI, infectious diseases) were analysed. Large numbers of papers sought to explore the implications of different comorbidities on the risk of severe COVID-19, particularly respiratory and cardiovascular conditions.

Table 5 Potential risk factors for disease severity

The least equivocal evidence was presented for age as a risk factor, including four studies where it was an independent risk in a multivariable regression model [19, 20, 31, 36]. The clearest analysis to present age data (i.e. which used different comparison groups) was a univariable regression model where individuals 65 years and over had 3.26 times the hazard rate of ARDS than those under 65 [44]. Eight studies suggested that diabetes could be a risk factor [19, 31, 36, 39, 41, 43, 44, 50], six hypertension [31, 36, 41, 43, 44, 50], and four the presence of unspecified comorbidities [39, 41, 48, 50], but the balance of evidence for these co-morbidities being risk factors was generally inconclusive. Many other factors were examined by one study, often with small numbers of individuals with the condition. None of the included studies for disease severity were assessed to have been powered to detect a 10% increase in effect size.

Risk factors for mortality

Ten studies examined risk factors for mortality, often by nesting case-control studies within prospective or retrospective cohorts (Table 6). Among these studies, many included statistical testing, but none presented an adjusted regression model for the risk factors considered.

Table 6 Potential risk factors for mortality

Eight studies examined age and all provided evidence for it being a risk factor for mortality [21, 25, 27, 35, 44, 45, 46, 47], although none adjusted for other factors, such as comorbidities. Age groups from 50 upwards were considered particularly at risk. In the single regression analysis, the hazard rate for death in those 65 years or over was estimated to be six times that of individuals under 65 [44]. The evidence was similarly consistent for general comorbidities (albeit all the studies were descriptive); among individuals who died, comorbidities were 1.5 to 2.8 times more common than among those who survived [21, 35, 46, 47, 51]. Specific comorbidities were discussed in several studies, generally under overarching classifications such as ‘cardiovascular disease’ or ‘diabetes’, with more specific definitions not provided. Evidence was more equivocal, but still favoured hypertension [3, 21, 25, 27, 47, 51], cardiovascular disease [21, 25, 35, 45, 47, 51], diabetes [21, 25, 45, 46, 47, 51], and chronic respiratory/lung diseases being risk factors (references presented for studies in support only) [21, 45, 51]. Of these studies, data from two well-powered, national-level studies from China supported cardiovascular disease and diabetes as risk factors for mortality from COVID-19 [25, 45].

Discussion

In this systematic review of risk factors for COVID-19 disease, disease severity and mortality, we document 33 comparative studies examining sociodemographic, behavioural and clinical exposures. Age and sex were very commonly examined; a wide array of comorbidities have also been considered.

Within the synthesised evidence, risk factors for mortality were the clearest, plausibly partly because this outcome is easy to define. Increasing age (different studies presented different thresholds, but being over 50 years of age was common) was an uncontested risk factor. Five studies also presented evidence for the presence of any comorbidities being a risk factor [21, 35, 46, 47, 51], with none demonstrating evidence against. Given the increasing prevalence of comorbidities with age, the lack of adjustment for confounding in these studies likely over-emphasises the effect size of each risk factor. We note that work subsequent to our literature search documents an effect of age on COVID-19 mortality that is independent of overall comorbidities, as measured by the Charlson Comorbidity Index Score, but not vice-versa [52]. Another study published outside of the time range of our search found both age and an array of comorbidities, each analysed separately (chronic cardiac disease, chronic pulmonary disease, chronic kidney disease, chronic neurological disease, dementia, malignancy, moderate/severe liver disease, and obesity), to be independent risk factors (as well as sex) [53].

Risk factors for severe disease were more complex to synthesise, likely due to the mixed array of outcome measures that can also be prone to observer bias. The impact of age was very commonly assessed, generally showing evidence in favour of this being a risk factor (with a similar age spectrum to the mortality data). Ethnicity was studied in two publications internationally [26, 30], with mixed results. We note that such findings are likely to be highly context-specific, given that ethnicity acts as a proxy for a series of sociodemographic factors that are highly relevant to the spread of an infectious condition (as well as, perhaps, some biological traits).

Studies of risk factors for COVID-19 disease have been complicated by testing strategies globally, which have largely been concentrated on severe disease. As our knowledge of the full symptom spectrum of the disease moves forward, it will be possible to have a broader case definition that does not solely focus on viral testing, and thus the ability for more generalised complementary studies. Additionally, serological surveys assessing the history of infection with SARS-CoV-2 in different population groups will allow the identification of risk factors for infection, whether symptomatic or not. Both ethnicity (Black and Asian individuals at higher risk; from a single study in England, Northern Ireland and Wales) [30] and higher BMI were found to be associated with disease severity within the included literature [30, 37], again from descriptive studies only. While these studies were not eligible for our review, we note a series of reports from non-comparative studies documenting the potential influence of ethnicity on the likelihood of getting COVID-19, e.g. the work of Price-Haywood from the US [52]. Male sex was reasonably consistently shown to be a risk factor for presence of COVID-19 but not for severity of disease or mortality [24, 30, 40]. As with ethnicity, socioeconomic and behavioural factors make this association likely to vary between settings.

In considering the role of comorbidities in COVID-19, it is important to consider the underlying pathology of the virus. Respiratory coronaviruses associated with the common cold in immunocompetent people generally affect only cells in the upper respiratory tract (URT), whereas the previously discovered highly pathogenic coronaviruses SARS-CoV and MERS-CoV affect cells in the URT and lower respiratory tract (LRT). SARS-CoV-2 has been shown to do the same [54], and one of the host cell receptors it targets is Angiotensin-Converting Enzyme 2 (ACE2), with a second major receptor being Transmembrane Serine Protease 2 (TMPRSS2) [55]. SARS-CoV-2 can infect all the major cell types in the respiratory tract – type I and type II pneumocytes, alveolar macrophages and endothelial cells [56, 57]. This infection leads to cell death, with significant leaking of fluid into the alveolar spaces (pulmonary oedema), which compromises gas exchange [58], eventually leading to ARDS. The inflammatory response adds aggregation of repair proteins such as fibrin, which can lead to creation of hyaline membranes which further reduces the surface available for gas exchange [58]. Subsequently, inflammatory cells are activated, recruited by release or exposure of cytokines such as the interleukins (IL) 1β and 6, monocyte chemoattractant protein-1 [56], and proteins of the extracellular matrix, as well as upregulation of the complement system. Inflammatory cells release cytokines which have systemic effects, eventually leading to disseminated intravascular coagulation (DIC), hypotensive shock and metabolic disturbances if not checked [58].

This pathogenesis therefore offers several points where co-morbidities may exacerbate the process. The target receptor TMPRSS2 is modulated in response to air pollution and in autoimmune conditions such as asthma [55], which may affect the number of receptors available for SARS-CoV-2 to target, and ACE2 is involved in the renin-angiotensin system (RAS) which controls blood pressure. Viral interference causes dysfunction, which leads to a pro-inflammatory state and increased vascular permeability in response to changes in vascular contraction and sodium homeostasis – exacerbating the effect from the physical damage to the affected cells [58]. Conditions causing hypertension – both primary and secondary to renal disease, endocrine dysfunctions such as hypothyroidism, cardiovascular dysfunction such as arteriosclerosis, or neurological dysfunctions such as acute stress – also affect the RAS [58], meaning that these conditions might be expected to exacerbate pathology caused by SARS-CoV-2. Any condition creating a pro-inflammatory state, such as type II diabetes or pre-existing infection, or involving autoimmunity, such as type I diabetes, might also be expected to contribute to increased pathology. There is also the direct effect of cell damage – if the target tissues are already damaged this reduces ‘spare’ capacity and therefore the leeway for adaptation to allow the host to continue to maintain homeostasis whilst still being able to eliminate the pathogen and repair the damage. The need for inflammatory cells to clear the infection is also a potential area of interface with comorbidities e.g. conditions such as unsuppressed HIV infection, or congenital deficiencies, or cancer malignancies; or the administration of immunosuppressant drugs such as chemotherapy for cancer or steroids.

The effect of ageing was particularly strong within our review, both in terms of the magnitude of effect estimates and the number of studies presenting evidence. As well as the above impact of comorbidities, we note that the host’s age may influence pathogenesis, both in terms of the likelihood of having various comorbidities, and also due to its effect on the immune system. Indeed, the immune system becomes less effective over time (immunosenescence), which affects the quality and number of immune system cells generated [59]. Given the scale of the impact of age documented within this review, it seems unlikely that its effect can be explained by a single or a small number of comorbidities which are yet to be detected. This opens up the need to explore biological markers, for example ACE2 [60], and markers of immunosenescence.

The strengths of our review include its systematic approach and broad use of search terms to avoid missing studies. We additionally present a quality assessment to aid the interpretation of the strength of the evidence. In some instances, included publications may have focussed on one specific outcome, whereas our quality assessment took the perspective of the outcomes extracted for this review. We were unable to detect instances where two publications used the same patient populations for their analyses, potentially over-emphasising certain findings. Given the global nature of the pandemic, our review includes studies from around the world, albeit with a large preponderance from China, including studies conducted early after the emergence of SARS-CoV-2 when the at-risk population was predominantly those who had contact with Huanan seafood market and their contacts, and not necessarily representative of the general population. We note a particular lack of studies from the African continent and the Americas, which may have implications for generalisability. Given the rapidly evolving literature on COVID-19, we also note our exclusion of studies published online after April 2020 (and the time period in which the surrounding text was written), for example the Dai report on cancer as a risk factor [61] and our exclusion of preprints (which was undertaken to ensure that all included studies had undergone an external quality assessment prior to inclusion).

Across the included publications, variability in study design, exposure and outcome measurement, and analyses made exact syntheses of effect sizes across different risk factors very difficult. Measures of disease severity varied, e.g. admission to ICUs or clinical parameters such as percentage oxygen saturation of the blood. Even measures such as admission to ICU can be subjective and may be time-, clinician-, and health systems-dependent. If severity is recorded at admission, risk factors may reflect issues associated with delayed access to healthcare, which may differ between settings and healthcare systems. It is also important to note that, in some studies of disease severity, mild disease included both people who were hospitalised with symptoms and asymptomatic individuals identified through contact tracing. Generally, analyses were descriptive or univariable and thus did not control for confounding. As documented above, this may be particularly problematic when it comes to separating the impact of age and the presence of comorbidities, as well as for identifying which comorbidities truly increase risk, given that many patients may have multi-morbidity.

The implications of our findings are two-fold for COVID-19, firstly for current public health practice and secondly for the design of future studies. We flag a number of factors of interest that should be considered by governments and public health agencies when designing shielding strategies and the targeting of future vaccines, as well as in mathematical modelling projecting the likely impact of the pandemic over time. We note, however, the need for sensitive handling of population groups deemed to be at higher risk, and how such labelling does not devolve responsibility from public bodies to these individuals for their own welfare [8]. Some public health agencies are now including reporting of potential risk factors in their routine outputs, including ICNARC (included in this review) [30] and the newer European Centre for Disease Prevention and Control reports, which were released after this review was conducted [62].

Our review demonstrates both the volume of literature that can be published within only a few months since the appearance of an emerging infectious disease, and the need for co-ordinated approaches to such pathogens. Global efforts using national datasets are hugely valuable in systematically determining the aetiology of a disease, particularly to detect smaller effect sizes. Determination of the exact threshold of important risk depends on public perceptions of the disease [63], as well as policy needs. Data collection should be standardised where possible, e.g. by using consistent definitions of outcomes and the treatment of exposures (for example for hypertension, given that blood pressure is continuous). (For COVID-19 we note both the valuable World Health Organization interim guidelines on its management in providing consistent approaches for testing and the definition of ARDS [14], and that platforms such as the International Severe Acute Respiratory and Emerging Infections Consortium (ISARIC) have aimed to facilitate such standardisation [64].) The choice of comparison groups should also be given careful consideration; comparison to other forms of the same condition (e.g. SARS and MERS for COVID-19), although interesting, provides little information about risk groups to be currently acted upon. Where key potential risk factors of interest, such as deprivation, are linked to both the disease of interest and the comparator condition, this limits the inferences possible. Saying this, studies of COVID-19 with the comparator group of other forms of viral pneumonia are a useful complement to studies using a general population comparator, as they show whether people with particular risk factors are at risk over and above what they might experience from ‘normal’ respiratory viruses, which might inform the level of additional precautions they could consider taking.

Finally, appropriately adjusted multivariable analyses should be prioritised, in order to separate the implications of different risk factors and to infer true causal relationships, for example exploring specific markers of comorbidity severity and control, such as the use of specific medications. We can then make the recommendations for shielding criteria more targeted, meaning that the public can be made more aware of the risk factors that are likely to have clinical significance and adapt their behaviour accordingly. Early clinical studies during pandemics are critically important and published rapidly under extremely difficult circumstances, but we would argue that high-quality epidemiological studies should also be seen as a priority, and that emergency response plans should include provision of appropriate epidemiological and statistical expertise.

Conclusions

The volume of literature generated in the short time since the appearance of SARS-CoV-2 has been considerable. Many studies have sought to document the risk factors for COVID-19 disease, disease severity and mortality. Age was the only risk factor based on robust studies and with a consistent body of evidence. Mechanistic studies are required to understand why age is such an important risk factor. At the start of pandemics, large, standardised, studies using multivariable analyses – e.g. using national surveillance data – are urgently needed in order to inform stratified approaches to rapidly protecting the population groups most at risk.