Background

The burden of mortality from COVID-19 rises steeply with age [1]. This is due to a combination of age itself, and the prevalence of underlying health conditions. Both age and underlying health conditions are independently associated with severe COVID-19 outcomes, including hospitalisation and mortality [2,3,4,5]. Many of the relevant underlying health conditions are also more common at older ages, and people with underlying health conditions account for the majority of COVID-19-related hospital and intensive care admissions [6]. This is reflected in United Kingdom (UK) national guidance describing individuals at moderate or high risk of severe COVID-19 (Table 1), which is based on age and underlying health conditions [7].

Table 1 COVID-19 moderate- and high-risk groups in national guidance [7] compared to the at-risk population study definition

Characterising the size and distribution of the population at risk of severe COVID-19 is vital for effective policy and planning in response to the COVID-19 pandemic [9]. Age- and region-specific prevalence of at-risk groups are key to predicting mortality and managing pressure on hospital inpatient and intensive care services across the country. Numbers of school-aged children and working-aged adults at risk are important for re-opening local schools and workplaces. Current international and UK national guidance advise that individuals at high risk of death from COVID-19 due to age or underlying health conditions should be a priority for COVID-19 vaccination [10, 11]. Vaccination planning requires at-risk population size for vaccine numbers, and age and regional distribution for modelling impact on regional transmission, since vaccine response typically decreases with older age [12].

Worldwide, modelling based on the Global Burden of Disease (GBD) study suggests that approximately one in five individuals have a health condition that increases risk of COVID-19 [13]. National prevalence studies of COVID-19 at-risk groups are rare. Large household surveys suggest that a third of adults in the United States, and between a third and a half of adults in Brazil, have at least one risk factor for COVID-19 (based on age ≥ 65 years, or underlying health conditions for younger adults) [14, 15]. A previous study estimated that at least 8.4 million individuals in the UK were at risk, but included only a subset of relevant health conditions [16]. Universal healthcare with an electronic health records system offers an opportunity for precise and representative estimation of at-risk prevalence in the UK, which may both support UK policy-making and offer a comparison with national GBD-based modelling estimates to aid in interpretation of GBD-based estimates internationally.

This study aimed to quantify the size, composition, and distribution of the population at risk of severe COVID-19 across the UK in March 2019, using electronic health records to define at-risk status based on all underlying conditions in UK national guidance.

Methods

Data sources

We conducted a point prevalence study among the UK general population using the Clinical Practice Research Datalink (CPRD) GOLD dataset, an anonymised sample of electronic health records from primary care practices across the UK [17]. The dataset includes diagnoses recorded using Read codes, primary care prescribing, and results of tests ordered in primary care. Data validity has been shown to be high [18]. The UK has universal healthcare, and the sample of the population who are in CPRD GOLD was found to be nationally representative by age and sex in March 2011: we re-assessed representativeness in 2019 in a sensitivity analysis [17].

Secondary care (hospital) data linkage is available for approximately 75% of CPRD GOLD-registered individuals in England, based on practice-level consent. For patients admitted to hospital, the Hospital Episode Statistics Admitted Patient Care dataset records diagnoses using International Classification of Diseases ICD-10 codes, and procedures such as chemotherapy using Classification of Interventions and Procedures OPCS-4 codes [19].

The CPRD Pregnancy Register uses validated algorithms, combining information across the primary care record such as antenatal scans, expected delivery dates, and deliveries, terminations and miscarriage records, to date and characterise pregnancies in CPRD GOLD [20].

Index dates

Our primary analysis index date was 5 March 2019 for up-to-date national prevalence estimates. CPRD GOLD coverage peaked in 2014, when it included approximately 7% of the UK population: by 2019 the dataset was smaller and did not cover all regions in England. Since the dataset in 2014 therefore offered greater power than 2019, and full regional representation across England, we repeated point prevalence estimates for 5 March 2014 as a sensitivity analysis.

Pregnancy was described for the index date of 5 March 2014 only, not 5 March 2019, since the latest Pregnancy Register update was in February 2018.

Study population

The study population comprised individuals aged 2–100 years with a current registration and a record meeting CPRD quality criteria (acceptable patient record and practice up to standard) in CPRD GOLD, with at least 1 year’s prior registration to allow recording of underlying conditions [21]. Eligibility started on the latest of: 1 January 2019, second birthday, a year after registration, or practice meeting CPRD quality standards. Eligibility ended at the earliest of: 5 March 2019, hundredth birthday, death, leaving the practice, or last data collection from the practice. Individuals with any time eligible between 1 January and 5 March were included in the main analysis of point prevalence on 5 March to increase study power, with a sensitivity analysis limited to individuals active in the dataset on 5 March 2019.
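The eligibility rule above (latest of the start dates, earliest of the end dates) can be illustrated with a minimal Python sketch. This is not the study code, which was written in Stata; the `Registration` record, its field names, and the inclusive treatment of boundary dates are assumptions made for illustration only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

WINDOW_START = date(2019, 1, 1)  # start of the inclusion window
INDEX_DATE = date(2019, 3, 5)    # primary analysis index date


@dataclass
class Registration:
    """Hypothetical per-person record; field names are illustrative only."""
    birth_date: date
    registration_date: date
    practice_up_to_standard: date  # date the practice met quality standards
    last_collection_date: date     # last data collection from the practice
    death_date: Optional[date] = None
    transfer_out_date: Optional[date] = None


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def eligibility_window(r: Registration) -> Optional[Tuple[date, date]]:
    """Return (start, end) of eligibility, or None if never eligible."""
    # Eligibility starts on the latest of the four start conditions.
    start = max(
        WINDOW_START,
        add_years(r.birth_date, 2),         # second birthday
        add_years(r.registration_date, 1),  # one year after registration
        r.practice_up_to_standard,
    )
    # Eligibility ends on the earliest of the end conditions that apply.
    ends = [INDEX_DATE, add_years(r.birth_date, 100), r.last_collection_date]
    ends += [d for d in (r.death_date, r.transfer_out_date) if d is not None]
    end = min(ends)
    # Assumption: boundary dates are treated as inclusive.
    return (start, end) if start <= end else None


def active_on_index_date(r: Registration) -> bool:
    """True if the person is still eligible on the index date itself."""
    window = eligibility_window(r)
    return window is not None and window[1] == INDEX_DATE
```

Under these assumptions, the main analysis corresponds to keeping every record for which `eligibility_window` is not `None`, and the sensitivity analysis to keeping only records for which `active_on_index_date` is true.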

For pregnancy, the study population comprised women aged 11–49 years. As pregnancy is transient, women were required to be registered in the dataset on 5 March 2014, rather than any time between 1 January and 5 March 2014.

Definition of at-risk population

In national guidance, all individuals aged ≥70 years are considered at moderate risk (Table 1) [7]. Since age-specific population estimates are readily available, the primary analysis for this study defined at-risk status based on underlying health conditions alone, rather than age. An additional analysis estimated the size of the at-risk population including all individuals aged ≥70 years.

We defined the COVID-19 at-risk population as individuals with at least one underlying health condition conferring moderate or high risk of severe COVID-19 according to national guidance (Table 1). Namely: any history of chronic respiratory disease (excluding asthma), heart disease, kidney disease, neurological conditions such as multiple sclerosis, diabetes mellitus; or current asthma, severe obesity, or immunosuppression; assessed on the index date [7].

Underlying conditions were defined using diagnoses, height and weight measurements, test results, and prescriptions recorded in primary care for the main analysis. Pregnancy status was ascertained from the CPRD Pregnancy Register (Supplementary Table 1, Additional File 1). Individuals with no recorded body mass index were included in the analysis, categorised as having no evidence of severe obesity. For analysis using linked secondary care data, diagnoses and procedures recorded in secondary care were additionally ascertained from ICD-10 and OPCS-4 codes respectively.

Multimorbidity was defined as more than one condition among the following domains: asthma or other chronic respiratory disease; chronic heart disease; chronic kidney disease; chronic liver disease; chronic neurological disease; diabetes; or immunosuppression (including individuals with dysplenia and organ transplant recipients).
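As a hedged illustration of these definitions, the sketch below derives at-risk status and multimorbidity from a set of per-person condition flags. The flag names are hypothetical, and treating the at-risk conditions as the union of the multimorbidity domains plus severe obesity is an assumption made for the example, not a statement of the study's code lists.

```python
from typing import Dict, FrozenSet, Iterable

# Domains used to count multimorbidity. Asthma and other chronic
# respiratory disease share one domain, so having both counts once.
# Flag names are hypothetical and chosen for readability.
MULTIMORBIDITY_DOMAINS: Dict[str, FrozenSet[str]] = {
    "respiratory": frozenset({"current_asthma", "other_chronic_respiratory_disease"}),
    "heart": frozenset({"chronic_heart_disease"}),
    "kidney": frozenset({"chronic_kidney_disease"}),
    "liver": frozenset({"chronic_liver_disease"}),
    "neurological": frozenset({"chronic_neurological_disease"}),
    "diabetes": frozenset({"diabetes_mellitus"}),
    "immunosuppression": frozenset({"immunosuppression"}),
}

# Assumption: at-risk flags are every domain flag plus severe obesity,
# which contributes to at-risk status but is not a multimorbidity domain.
AT_RISK_FLAGS: FrozenSet[str] = (
    frozenset().union(*MULTIMORBIDITY_DOMAINS.values()) | {"severe_obesity"}
)


def is_at_risk(flags: Iterable[str]) -> bool:
    """At risk if at least one qualifying condition is present."""
    return any(flag in AT_RISK_FLAGS for flag in flags)


def domain_count(flags: Iterable[str]) -> int:
    """Number of distinct multimorbidity domains with a condition present."""
    present = set(flags)
    return sum(1 for members in MULTIMORBIDITY_DOMAINS.values() if present & members)


def has_multimorbidity(flags: Iterable[str]) -> bool:
    """Multimorbidity means conditions in more than one domain."""
    return domain_count(flags) > 1
```

Counting domains rather than individual flags prevents a person with both asthma and another chronic respiratory disease from being classed as multimorbid on respiratory disease alone.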

Cancer survivors have an increased risk of COVID-19 mortality, but non-haematological cancer survivors are only included in current COVID-19 guidance if receiving immunosuppressing treatment (Table 1) [2]. We therefore described, separately from the study at-risk definition, the prevalence of any new cancer diagnosis in the past one and five years.

Statistical analysis

Point prevalence estimates of the at-risk population and each underlying condition on 5 March 2019 were calculated per 100,000 with binomial exact 95% confidence intervals, for each nation in the UK. The at-risk population prevalence was stratified by sex and age, categorised in 5-year bands except 2–9 years and 90–99 years. Prevalence estimates for the at-risk population and each condition were stratified by age and region, separately and in combination. Prevalence values with fewer than five individuals were suppressed to preserve confidentiality.
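For readers unfamiliar with binomial exact (Clopper-Pearson) intervals, the following is a minimal standard-library Python sketch of a prevalence per 100,000 with an exact 95% confidence interval and small-count suppression. It is not the study code (the analysis used Stata); the function names are hypothetical, and suppressing every count below five, including zero, is an assumption. The direct summation used here is practical only for modest denominators; a statistical library routine would be preferable at the scale of this study.

```python
import math
from typing import Callable, Optional, Tuple


def _binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k < 0:
        return 0.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return min(1.0, total)


def _root(increasing: Callable[[float], float]) -> float:
    """Bisection on [0, 1] for a function that rises from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if increasing(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(k: int, n: int, alpha: float = 0.05) -> Tuple[float, float]:
    """Clopper-Pearson interval for k events among n individuals."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else _root(lambda p: (1 - _binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else _root(lambda p: alpha / 2 - _binom_cdf(k, n, p))
    return lower, upper


def prevalence_per_100k(
    cases: int, population: int, min_count: int = 5
) -> Optional[Tuple[float, float, float]]:
    """Return (estimate, lower, upper) per 100,000, or None if suppressed."""
    if cases < min_count:  # assumption: all counts below five are suppressed
        return None
    lower, upper = exact_ci(cases, population)
    scale = 100_000
    return cases / population * scale, lower * scale, upper * scale
```

With zero events the upper bound has the closed form 1 − (α/2)^(1/n), which gives a convenient check on the bisection.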

For additional analysis estimating the size of the at-risk population including all individuals aged ≥70 years, the at-risk prevalence among individuals aged 2–69 years was age-standardised in 5-year bands, and added to the population aged ≥70 years, using mid-2019 national population estimates [22]. Comparison of prevalence in 2014 to 2019 was stratified by region to account for the change in regional representation of the dataset over time. The point prevalence of pregnancy and underlying health conditions was estimated among women aged 11–49 years on 5 March 2014. Prevalence estimates with and without linked secondary care records were compared among individuals at practices in England which had consented to data linkage.
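The calculation of at-risk population size described above can be sketched as follows: band-specific prevalence among those aged under 70 years is applied to national population counts for the same bands, and the whole population aged ≥70 years is then added. The function name and the input figures in the usage note are hypothetical; no study data are reproduced here.

```python
from typing import Dict, Tuple


def estimate_at_risk_size(
    prevalence_by_band: Dict[str, float],
    population_by_band: Dict[str, int],
    population_70_plus: int,
) -> Tuple[float, float]:
    """Return (total at-risk size, at-risk size among those aged under 70).

    prevalence_by_band: at-risk proportion (0 to 1) per age band under 70.
    population_by_band: national population count for the same age bands.
    population_70_plus: national population aged 70 years or over, all of
        whom are counted as at risk under the guidance.
    """
    if set(prevalence_by_band) != set(population_by_band):
        raise ValueError("age bands must match between the two inputs")
    # Apply each band's prevalence to the national count for that band.
    under_70 = sum(
        prevalence_by_band[band] * population_by_band[band]
        for band in population_by_band
    )
    return under_70 + population_70_plus, under_70
```

For example, with made-up inputs `{"2-9": 0.05, "10-14": 0.10}` for prevalence, `{"2-9": 1000, "10-14": 2000}` for population, and 500 people aged ≥70 years, the function returns approximately 750 in total, of whom 250 are aged under 70 years.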

Sensitivity analyses

CPRD GOLD was nationally representative by age and sex in March 2011 [17]. To update this assessment, the 2019 study population was compared to mid-2019 national population estimates, and 2019 at-risk prevalence estimates directly age-standardised in five-year bands using mid-2019 population estimates for each nation [22].

The main analysis included individuals eligible for any period of time between 1 January and 5 March 2019. Individuals who left CPRD between 1 January and 5 March would not subsequently have had new diagnoses recorded, which could underestimate point prevalence on 5 March. As a sensitivity analysis, at-risk prevalence was estimated with the study population restricted to individuals who were still registered in CPRD on 5 March 2019.

All analyses were conducted using Stata 16 MP.

Results

Characteristics of the study population

The 2019 study population included 2,706,053 individuals: 990,939 (36.6%) in England, 801,352 (29.6%) in Scotland, 708,670 (26.2%) in Wales and 205,092 (7.6%) in Northern Ireland (Table 2). Approximately half (50.2%) were female. The study included 359,412 individuals (13.3%) aged ≥70 years. There was some over-representation of 40–59-year-olds compared to mid-2019 national population estimates for all four nations (Fig. 1).

Table 2 Point prevalence of the COVID-19 at-risk population in the UK on 5 March 2019, N = 2,706,053
Fig. 1 Age and sex distribution of 2019 study population (N = 2,706,053) compared to mid-2019 national population estimates [22]

In 2014, the dataset included 4,730,254 individuals: 2,980,402 (63.0%) in England, 810,169 (17.1%) in Scotland, 730,563 (15.4%) in Wales and 209,120 (4.4%) in Northern Ireland (Supplementary Table 2, Additional File 1). Age and sex distributions were similar to 2019, with 50.3% female and 12.7% aged ≥70 years.

COVID-19 at-risk population

On 5 March 2019, 24.4% (95% CI 24.4–24.5) of the study population were at risk of severe COVID-19 due to underlying health conditions. National at-risk prevalence ranged from 22.6% in England to 26.5% in Wales (Table 2).

In a secondary analysis, the number of at-risk individuals based on current guidance including all individuals aged ≥70 years was estimated at 18.5 million across the UK, of whom 9.53 million (95% CI 9.52–9.53) were aged < 70 years (Table 3).

Table 3 Estimated size of the 2019 UK at-risk population according to national guidance (either aged ≥70 years, or younger with an underlying health condition)

Composition by underlying health conditions

The commonest conditions across the UK were chronic kidney disease (7.2%), diabetes mellitus (7.1%), asthma (6.5%) and chronic heart disease (4.5%). Prevalence of each condition varied nationally, with chronic liver disease notably commoner in Scotland (Table 2). Multimorbidity was common, ranging from 6.2% in England to 7.9% in Northern Ireland: 7.1% across the UK.

Variation by age

The proportion of at-risk individuals increased gradually with age from 5.1% of children aged 2–9 years to a peak at 79.4% of those aged 85–89 years in England before declining at older ages (Fig. 2). Similar age distributions were seen in each nation, and for each condition except current asthma, which peaked at age 10–14 years (Fig. 2).

Fig. 2 Age distributions of the at-risk population and underlying health conditions on 5 March 2019, N = 2,706,053

The at-risk population comprised 18.1% of individuals aged <70 years (including 8.3% of school-aged children and 19.6% of working-aged adults) and 66.2% of individuals aged ≥70 years across the UK (Table 2).

Variation by sex

Overall, a higher proportion of women than men were at risk (Table 2), but the association varied with age, and men were more likely than women to be at risk from age 55 years upwards (Fig. 3; Supplementary Table 3, Additional File 1).

Fig. 3 2019 point prevalence of the at-risk population by age and sex across the UK, N = 2,706,053

Variation by region

No individuals from the North East or East Midlands regions of England were included in 2019, whereas all regions were represented in 2014. London had the lowest proportion of the population considered at risk in both 2014 and 2019 (Fig. 4). The East of England, South Central and South East also had lower prevalence of at-risk individuals than Midlands or Northern regions in both 2014 and 2019. Regional patterns varied between underlying conditions (Supplementary Fig. 1, Additional File 1).

Fig. 4 Point prevalence of the England at-risk population by region comparing 2014 (N = 4,730,254) and 2019 (N = 2,706,053). The 2019 study population did not include any individuals in the North East or East Midlands regions; x-axis scale starts at 20,000/100,000

Prevalence estimates of the at-risk population and each condition stratified by age and region (separately and combined) on 5 March 2014 and 2019 are available at https://doi.org/10.17037/DATA.00001833

Differences between 2014 and 2019 prevalence estimates

Compared to 2014, at-risk prevalence estimates in 2019 were 0.8% higher in Northern Ireland but lower in Scotland (−0.5%), Wales (−0.8%), and England (−1.3%). When stratified by region within England (Fig. 4), at-risk prevalence increased from 2014 to 2019 for Yorkshire and the Humber and the West Midlands, and decreased in all other regions (excluding the North East and East Midlands, which were unavailable in the 2019 dataset), but no change exceeded 1.9%.

For underlying conditions, absolute changes in UK prevalence estimates from 2014 to 2019 ranged from a 0.9% decrease in chronic kidney disease to a 0.7% increase in diabetes mellitus (Supplementary Fig. 1, Additional File 1). The largest relative increases were for chronic liver disease (+32.3% from 2014 to 2019), diabetes (+11.5%) and chronic respiratory disease other than asthma (+11.4%). The largest relative falls were for chronic kidney disease (−10.6%) and current asthma (−6.0%).

Cancer survivors

On 5 March 2019, 0.4% of the UK study population had incident cancer recorded within the previous year and 1.6% within the previous 5 years (Table 2).

Pregnancy

Among women aged 11–49 years on 5 March 2014, 2.1% were pregnant, of whom 12.9% had a recorded health condition, compared to 14.5% of non-pregnant women (Table 4).

Table 4 2014 prevalence of pregnancy and underlying health conditions among women aged 11–49 years, N = 1,181,840

Linked secondary care records

At-risk prevalence based on standalone primary care records was similar among individuals with and without eligibility for data linkage. Linked secondary care records increased the estimated prevalence of the at-risk population in England by 1.8% in both 2014 and 2019. The increase was greater among individuals < 70 years than those ≥70 years.

For underlying conditions, the greatest absolute changes in prevalence estimates were for multimorbidity, which increased from 6.5 to 7.6% in 2019, and chronic heart disease, which increased from 4.0 to 5.3%. The greatest relative increase was for chronic liver disease, which nearly doubled from 0.27 to 0.53% (Table 5).
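The distinction between absolute and relative change drawn here can be made concrete with a short Python sketch using the 2019 figures quoted above. The helper name is hypothetical, and because the inputs are the rounded percentages from the text, the results are approximate.

```python
from typing import Dict, Tuple


def prevalence_change(before: float, after: float) -> Tuple[float, float]:
    """Return (absolute change in percentage points, relative change in %)."""
    absolute = after - before
    relative = absolute / before * 100
    return absolute, relative


# Prevalence (%) in England in 2019, without and with linked secondary
# care records, as quoted in the text (rounded values).
estimates: Dict[str, Tuple[float, float]] = {
    "multimorbidity": (6.5, 7.6),
    "chronic heart disease": (4.0, 5.3),
    "chronic liver disease": (0.27, 0.53),
}

changes = {name: prevalence_change(*pair) for name, pair in estimates.items()}

for name, (absolute, relative) in changes.items():
    print(f"{name}: {absolute:+.2f} points, {relative:+.1f}% relative")
```

Chronic liver disease shows the smallest absolute change of the three (about 0.26 percentage points) but by far the largest relative change (about 96%), because its baseline prevalence is low.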

Table 5 Prevalence estimates for England with and without linked secondary care data (2014 N = 1,802,468; 2019 N = 744,496)

Sensitivity analyses

Age-standardisation did not alter 2019 at-risk prevalence estimates (not presented). When the study population was restricted to individuals active on 5 March 2019, at-risk prevalence estimates fell by less than 1% (Supplementary Table 4, Additional File 1).

Discussion

This paper describes the size and distribution of the population at risk of severe COVID-19 based on clinical records from a large, nationally representative cohort across the UK.

On 5 March 2019, 24.4% of the UK population were at higher risk than others of the same age due to underlying health conditions, including 8.3% of school-aged children, 19.6% of working-aged adults, and 66.2% of individuals aged ≥70 years. The commonest conditions were chronic kidney disease, diabetes and asthma. Multimorbidity was common at 7.1%. The size and regional distribution of the at-risk population was similar in 2014 and 2019, with lower prevalence in London and the South of England than Midlands or Northern regions. Separately, the 1.6% of the study population with a new diagnosis of cancer within the previous 5 years may also be at increased risk of severe COVID-19 [2].

Including all individuals aged ≥70 years, 18.5 million individuals in the UK would be considered moderate or high risk under current national guidance [7]. This is higher than a previous estimate of 8.4 million comprising 7.2% of men and 7.5% of women aged 30–69 years, and 33% of men and 29% of women aged ≥70 years [16]. Our estimates include additional conditions to cover the full national guidance [7]. There were also differences in ascertainment: for example, our diabetes prevalence estimate for ages 30–69 in England was 7.0%, compared to 2.2% in the previous study [16] and 6.9% in the national Quality and Outcomes Framework (QOF) [23]. This may be due to increases in diagnoses and recording of diabetes over time in our more recent study period of March 2019 (rather than 1997–2017) [16].

Our at-risk prevalence estimates were slightly lower than GBD-based estimates that 29.1% of the UK population had at least one underlying health condition increasing COVID-19 risk, or 28.1% when restricted to the same set of conditions by excluding cancers causing indirect immunosuppression and tuberculosis from GBD-based estimates [13]. This did not appear to be due to under-estimation of clustering due to multimorbidity in the GBD-based study, as the 9.2% multimorbidity prevalence modelled was higher than we observed even when using linked secondary care records. The difference was greatest among older age groups, and our finding that 19.6% of working-aged adults (19–65 years) were at risk is broadly comparable to the GBD-based estimate (for the same conditions) of 22.8% among those aged 15–64 years [13].

Our prevalence estimates are in line with national QOF diabetes and cancer monitoring, slightly higher than the more narrowly defined QOF chronic heart disease estimates [23], and consistent with previous UK studies of chronic kidney disease and asthma [24, 25]. The five-year trends of increasing diabetes and decreasing asthma prevalence are consistent with directions of change in a previous study of asthma [24], and QOF, although 2014 QOF diabetes prevalence was slightly higher at 6.2% in 2013/14 [23].

Linked secondary care records in England increased the estimated size of the at-risk population only modestly, but the estimated prevalence of chronic liver disease in 2019 nearly doubled from 0.27 to 0.53%, and multimorbidity and chronic heart disease prevalence also increased. Our chronic liver disease prevalence estimate in England of 529/100,000 when supplemented with secondary care data is more consistent with previous national estimates of approximately 600/100,000 for the UK than our lower estimate using primary care data alone [26]. Several studies of the associations between underlying health conditions and COVID-19 outcomes in England have used standalone primary care records to characterise underlying health conditions [3, 27]. Such studies may under-ascertain chronic liver disease, heart disease and multimorbidity, and thus underestimate associations of these conditions with COVID-19 outcomes. If the risk of severe COVID-19 differs between underlying health conditions, then their differential under-ascertainment in primary health records may bias estimates of associations of underlying health conditions with COVID-19 outcomes.

Among women who were pregnant on 5 March 2014, 12.9% were at risk due to an underlying health condition, compared to a third of the pregnant women admitted to hospital with COVID-19 [28]. While the pregnancy register has high sensitivity for livebirths, pregnancy losses may be under-recorded [20]. The 2.1% point prevalence estimate of pregnancy is perhaps low compared to a survey in which 591/5686 (10%) of women aged 16–44 years in Britain reported a pregnancy ending in the previous year, although these are not easily comparable [29]. Caution is required in applying historical pregnancy estimates, as COVID-19 may affect family planning.

Strengths and limitations

To our knowledge, these are the first prevalence estimates of the full population at risk of severe COVID-19 across the UK according to national guidelines. Strengths include the large, nationally representative cohort, risk group definitions with detailed ascertainment tailored to risk of COVID-19, and quantification of the value of linked secondary care records.

Our definition of the at-risk population was based on UK national guidance [7]. Large national studies have found that the health conditions in national COVID-19 guidance are indeed associated with increased risk of COVID-19-related death, although the size of associations varies [2,3,4]. Older individuals have also been found to be at higher risk of severe COVID-19 than younger, independent of underlying health conditions [2,3,4]. However, understanding of the risk factors for severe COVID-19 is still evolving. To support policy and planning to adapt flexibly to future evidence of the associations of different underlying conditions with COVID-19 outcomes, we provide age- and region-stratified prevalence for each underlying condition separately, including separating asthma from other respiratory conditions [27, 30].

A key limitation is that UK-wide estimates rely on primary care records, which may miss undiagnosed conditions and under-ascertain conditions diagnosed in secondary care. Our analysis including linked secondary care records in England suggests that estimates of the overall size of the at-risk population are robust, but that the prevalence of multimorbidity, chronic heart disease and liver disease may be underestimated from primary care records. There is likely under-ascertainment of immunosuppressing cancer treatments even using secondary care records, which could be on a scale similar to the 0.4% of the population newly diagnosed with cancer within the previous year. Second, the 2019 estimates did not include all regions in England. Although the dataset remained nationally representative in terms of age and sex in 2019, and prevalence estimates of individual conditions were consistent with expectations, suggesting that national 2019 estimates are representative, regionally-stratified estimates in 2019 are incomplete. Prevalence estimates from 2014 include all regions but are less up-to-date, and differences from 2019 may reflect changes in prevalence and recording of conditions, and the CPRD GOLD population over time. Third, inclusion of individuals active in the dataset at any point between 1 January and 5 March could have resulted in some under-estimation of point prevalence on 5 March: sensitivity analysis suggested this was minimal. Finally, we were able to describe pregnancy in 2014 only, and pregnancy prevalence may be under-estimated.

Conclusions

We estimate that current national guidance on COVID-19 risk groups encompasses 18.5 million individuals across the UK, a larger population than previously estimated. These national estimates broadly support the use of Global Burden of Disease modelled estimates and age-targeted vaccination strategies in other countries.

We found that 66.2% of individuals aged ≥70 years had at least one recorded underlying condition, suggesting that an age-based approach to COVID-19 vaccination could efficiently target individuals at highest risk. Implementation of public health measures such as influenza vaccination generally achieves higher uptake when targeted on the basis of age rather than health conditions [31]. Age-based vaccination strategies may also be more feasible to implement in low-resource settings.

Our finding that 8.3% of school-aged children and 19.6% of working-aged adults are in the at-risk population (as currently defined) emphasises the need to consider younger at-risk individuals in shielding guidance and when reopening schools and workplaces. The large number of children and younger adults with underlying conditions, who may nevertheless be at low absolute risk of severe COVID-19, supports vaccine strategies based on age- and condition-specific estimates of risk of severe COVID-19, rather than including individuals of any age with underlying conditions. We provide age-stratified prevalence for each condition to support effective vaccine resource allocation based on age and health conditions.