
1 Introduction: Overview of COG-UK

I want to thank the organizers of this meeting for their kind invitation. It is a great honour for me to be able to speak to you today.

Figure 1 shows the landing page of the COG-UK website. Please do have a look at the website if you are working in the field of SARS-CoV-2 genomics. Our website contains news, reports, blogs and a variety of other information. There are also links to numerous analytical tools and methods. The landing page shows that we have so far sequenced around 430,000 SARS-CoV-2 genomes in the UK. This is the single largest effort in the UK to date to sequence a single pathogen.

Fig. 1

Source Peacock (2021)

COVID-19 Genomics UK Consortium website.

I want to talk today about variants past, present, and future. Before I do that, I want to talk briefly about COG-UK (Fig. 2). The consortium began on April 1, 2020, making us just over a year old, and it brought together the many pathogen genomics experts in the UK. It was only a matter of time before SARS-CoV-2 evolved in a way that would challenge us in terms of the efficacy of therapeutics and vaccines, and we needed to be prepared.

Fig. 2

Source Peacock (2021)

COVID-19 GENOMICS UK (COG-UK) consortium.

With government funding and the support of the UK Government Chief Scientific Adviser, Sir Patrick Vallance, we set up a project in April 2020 built around a network of regional sequencing hubs, largely academic sites across the country as shown on the map (Fig. 2), together with the Wellcome Sanger Institute. These are linked with the four public health agencies in England, Wales, Scotland, and Northern Ireland.

COG-UK has four main objectives (Fig. 3). First, we aim to generate data for use by public health agencies and to inform public health interventions. Second, we develop tools and methods for sequencing and analysis, which are open access. We also make our genome data available in open access databases. Third, we support international efforts: we want to support international partners through collaboration and provision of expertise. Fourth, we aim to integrate our data within the wider UK science ecosystem.

Fig. 3

Source Peacock (2021)

COG-UK objectives.

Sequencing is supported by our sampling strategy (Fig. 4). Our maximum potential sequencing capacity is around 30,000 genomes a week. Half of our sequencing capability is used to sequence unbiased, random samples. We sequence samples from across the UK without selection criteria so that we can detect changes in the virus in all regions of the country. The remaining capacity is used for targeted public health sampling, such as outbreaks and surge testing when, for example, we know there is a cluster of cases infected with a variant of concern such as B.1.351, which was first detected in South Africa. We also provide sequencing to support the UK core national studies (for example, longitudinal surveillance studies) (Fig. 5).

Fig. 4

Source Peacock (2021)

COG-UK objectives and sampling.

Fig. 5

Source Peacock (2021)

COG-UK objectives and sampling.

This map of the UK (Fig. 6) shows the origin of SARS-CoV-2 positive samples that are sequenced by the regional sequencing hubs. Each NHS diagnostic testing laboratory that undertakes PCR testing for COVID-19 is connected to a specific regional hub. Each dot represents a testing laboratory and is color-coded by the sequencing hub to which its positive samples are transferred.

Fig. 6

Source Peacock (2021)

Sequencing network: hospital laboratories and patients (“pillar 1”).

The UK has an extensive SARS-CoV-2 community testing capability, which is delivered through a network of lighthouse laboratories (Fig. 7). PCR-positive samples from these labs are sent to the Wellcome Sanger Institute for sequencing.

Fig. 7

Source Peacock (2021)

Sequencing network: community testing (“pillar 2”).

Coverage (the number of samples sequenced as a proportion of the number testing positive) is around 40% for the most recent week for which we have complete data, and around 10% cumulatively over the entire pandemic (Fig. 8).
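As a minimal sketch of how a coverage figure of this kind can be computed, the snippet below divides genomes sequenced by positive tests, both week by week and cumulatively. The function name and the weekly counts are hypothetical placeholders chosen for illustration only; they are not COG-UK data.

```python
def coverage(sequenced: int, positives: int) -> float:
    """Return the fraction of positive samples that were sequenced."""
    if positives <= 0:
        raise ValueError("positives must be greater than zero")
    return sequenced / positives


# Hypothetical (sequenced, positive) counts per week; NOT real COG-UK figures.
weekly_counts = [(2_000, 40_000), (5_000, 30_000), (8_000, 20_000)]

# Coverage for each individual week.
weekly = [coverage(s, p) for s, p in weekly_counts]

# Cumulative coverage pools all weeks before dividing.
cumulative = coverage(
    sum(s for s, _ in weekly_counts),
    sum(p for _, p in weekly_counts),
)

print([f"{c:.1%}" for c in weekly], f"{cumulative:.1%}")
```

With these made-up numbers, coverage in the latest week is well above the cumulative value, which is the pattern expected when sequencing capacity grows while case numbers fall.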

Fig. 8

Source Peacock (2021)

Country coverage.

Targeted sequencing is important to understand ongoing public health challenges. We sequence positive samples associated with people who are in quarantine, and from surge testing and outbreak investigations (Fig. 9).

Fig. 9

Source Peacock (2021)

COG-UK objectives and sampling.

2 Viral Variants: Context and Drivers

I would now like to talk about viral variants in terms of context and drivers of their emergence.

At the top of Fig. 10, you will see a schematic of the SARS-CoV-2 genome (Andersen et al. 2020), which has around 30,000 bases. The current focus of attention is predominantly the gene encoding the spike protein. The SARS-CoV-2 genome encodes four structural proteins, but the spike protein is the major focus because this is the part of the virus against which antibodies (and vaccines) are largely targeted. The spike protein is involved in binding to the human ACE2 receptor and cell entry, and so is a critically important part of the virus for the infective process.

Fig. 10

Source Peacock (2021)

SARS-CoV-2 genome and mutations.

Mutations in the SARS-CoV-2 genome occur at a rate of around one or two per month, and these are largely random across the genome. Many of these mutations have no effect on the biology or phenotype of the virus. Some mutations will be disadvantageous to the virus and may not be observed at all, or the virus carrying them may become extinct after a period. The mutations that we are most concerned about are those that are associated with a fitness advantage to the virus, which may lead to lineage expansion.
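To put the quoted rate on a per-site scale, the following sketch converts "one or two mutations per month" into substitutions per site per year, using the genome length of around 30,000 bases mentioned earlier. It assumes a constant rate spread evenly across the genome, which is a simplification, and the function name is illustrative.

```python
GENOME_LENGTH = 30_000  # approximate number of bases, as quoted in the talk


def per_site_rate_per_year(mutations_per_month: float,
                           genome_length: int = GENOME_LENGTH) -> float:
    """Convert a whole-genome monthly mutation count into a per-site yearly rate.

    Assumes a constant rate and a uniform distribution across the genome.
    """
    return mutations_per_month * 12 / genome_length


low = per_site_rate_per_year(1)   # one mutation per month
high = per_site_rate_per_year(2)  # two mutations per month

print(f"{low:.1e} to {high:.1e} substitutions per site per year")
```

Under these assumptions the range works out at roughly 4 to 8 substitutions per 10,000 sites per year.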

Here, I list the characteristics that could arise from mutations that are important for human health (Fig. 11). These fall into four categories. First, the virus may become more transmissible. Second, changes may lead to immune escape. That means that our vaccines or immunity acquired after COVID-19 infection may become less effective. Third, disease severity may increase (or decrease). Fourth, some genetic changes could impact diagnostic tests. A variant may have one or a combination of these characteristics, depending on the changes that occur in the genome.

Fig. 11

Source Peacock (2021)

SARS-CoV-2 variants: implications for human health.

Turning to those environments that are permissive for the emergence and selection of mutations and variants of concern, I have listed four categories (Fig. 12). The first is sustained transmission in settings with high disease rates. If there is no disease, there is very little opportunity for the virus to mutate. With more than 120 million cases known to have occurred worldwide, there is plenty of opportunity for the virus to mutate. This also means that even though the mutation rate of the virus is considered to be relatively low, the total number of mutations arising will not necessarily be low. The second permissive environment is when there is transmission in populations with natural or induced immunity, including partial immunity. Through selection of the fittest, variants may be selected that can find chinks in the armour of our immunity and cause infection (escape variants). In an immune population, these are likely to be selected from circulating viruses. The third category is the selection of mutations and variants during prolonged infection in immunocompromised patients. Fourth is the concern that passage of the virus between humans and animals can lead to evolutionary changes that could, in turn, lead to important changes in biology for human infection and disease control.

Fig. 12

Source Peacock (2021)

Permissive environments for the selection of mutations and variants of concern.

Figure 13 shows a graph of SARS-CoV-2 viral evolution in a single patient during treatment of chronic infection (Kemp et al. 2021). The patient was a man who had a history of B-cell lymphoma and had been treated with chemotherapy in 2012. He presented to hospital with SARS-CoV-2 infection and had persistent infection over a prolonged period. The graph shows the results of ultra-deep sequencing of SARS-CoV-2 from the patient on 23 occasions over a period of 101 days. The frequency of specific mutations is shown on the left axis and the CT value from the diagnostic PCR test is shown on the right axis. The peaks in the graph show the most common mutations that emerged. In the first 65 days, there was very little change in the virus, including during two courses of remdesivir. The patient then went on to receive convalescent plasma. Following the first two doses, a range of mutations emerged, with some variants becoming dominant at specific time points. Mutation D796H and the deletion Δ69/70 together led to a modest reduction in sensitivity to convalescent plasma antibodies. A chronic infection of this kind is like a training ground for viral evolution. It has been hypothesized that some of the variants of concern may have arisen in patients who are chronically infected and who cannot clear the virus.

Fig. 13

Source Peacock (2021)

SARS-CoV-2 evolution during treatment of chronic infection.

I have already referred to the importance of human-to-animal passage of the virus. Figure 14 shows work by Robert F. Garry published in January 2021 (Garry et al. 2021). The figure shows a compilation of mutations arising in the spike protein on sustained human-to-human transmission and human-to-animal passage. Mutations are grouped into four classes. The first is in the receptor binding domain, shown at the top of the figure. Some mutations, for example, the N501Y substitution present in the B.1.1.7 lineage, first detected in the UK and associated with greater transmission, have emerged during transmission between humans, but have also emerged in animals. The second is mutations in the N-terminal domain of the spike protein, particularly in the portion most exposed on the surface of the virus. Third is mutations in or near the furin cleavage site, and fourth is mutations around the region of the D614G substitution, which emerged early in the pandemic and was associated with a range of biological changes in the virus.

Fig. 14

Source Peacock (2021)

Compilation of mutations arising in spike on sustained transmission.

3 SARS-CoV-2 Variants: Past

Next, I am going to talk about some mutations and variants observed in the past, starting with the D614G substitution. This was not present in the virus that first emerged but appeared around March 2020.

The graph at top left of Fig. 15 shows the emergence of D614G and its replacement of D614 over time on a global scale. This substitution enhances infectivity, competitive fitness, and transmission in primary human cells and animal models, and results in a modest reduction in SARS-CoV-2 neutralization, but does not alter pathogenicity in mice (Hou 2020; Korber 2020; Plante 2020; Zhou 2020). The latter finding is replicated in humans (Volz et al. 2021). The graph at top right shows outcomes from infection in people admitted to hospital with SARS-CoV-2 who were infected with either the D or the G version, shown by age group. There was no difference between D and G for patients who required oxygen or ventilation, or who died. But the mutation was associated with a higher viral load and a younger age of infection when the variant first emerged. What this shows us is that a single change in the virus can lead to a fundamental change in the biology of the virus. However, it has taken many months to undertake the experimental work. While we wish to understand the impact of viral genetic changes very rapidly, it does take time to perform the necessary experiments.

Fig. 15

Source Peacock (2021)

D614G substitution: an early lesson in evolving biology.

4 SARS-CoV-2 Variants: Present

I am now going to talk about the variants of the present.

Shown here (Fig. 16) is the latest chart from Public Health England on the gov.uk website, listing the variants being tracked in the UK. One further variant, first detected in India, was added in the last day or so and is not shown here. I will focus on four variants of concern in the UK, three of which are variants of concern listed by the World Health Organization. The first is B.1.1.7, initially detected in England. The second is B.1.351, first detected in South Africa. The third is P.1, which was first detected in Japan and appears to have emerged in Brazil. The fourth is B.1.1.7 with the addition of E484K.

Fig. 16

Source Peacock (2021)

Variants as defined by PHE April 7, 2021.

Figure 17 shows a national overview of lineages in the UK based on sequence data generated by the Wellcome Sanger Institute (community surveillance) between September 2020 and March 2021. The top image shows the incidence of major lineages over time, and the bottom image shows their proportions. Our prevalent lineage early in this period (B.1.177) was first detected in Spain and appeared to spread across Europe as a result of travel, rather than any biological characteristic of the virus. This was replaced over time by B.1.1.7. Around 98% of all infections in our country are currently caused by B.1.1.7.

Fig. 17

Source Peacock (2021)

National overview, England.

B.1.1.7 was first detected in the south of England, and it has since been reported in at least 114 countries (Fig. 18) [for selected studies on aspects of B.1.1.7, see Collier (2021), Garcia-Beltran (2021), Lumley (2021), NERVTAG (2021), Nunez (2021), Planas (2021), Wang (2021)]. This variant was striking when it first emerged because of the number of mutations it carried (n = 23). The most notable substitution is N501Y, which sits in the receptor binding motif of the spike and increases binding affinity to the human ACE2 receptor. This variant is more transmissible (estimates range from 43% to 90%). There are a large number of studies on the virulence of this variant, which appears to cause more severe disease and an increased risk of hospitalization, but people admitted to hospital do not have an increased risk of death. This variant also affects specific diagnostic tests, causing S-gene target failure. This does not lead to failure of the test as a whole, and S-gene target failure has been useful as a surrogate marker for B.1.1.7, one that is accurate when the prevalence of B.1.1.7 is high. Antibody neutralization is slightly reduced, but the impact on vaccines is not significant. A recent study suggests that this variant may interfere with the innate immune response, with evidence for increased resistance to interferon (Guo et al. 2021).
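The reason a surrogate marker such as S-gene target failure is only accurate at high prevalence can be sketched with the standard positive predictive value calculation. The sensitivity and specificity values below are hypothetical, chosen only to illustrate the effect; they are not measured properties of any real assay.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Probability that a sample showing the marker truly carries the variant."""
    if not 0 < prevalence <= 1:
        raise ValueError("prevalence must be in the interval (0, 1]")
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical marker performance; NOT values from any published study.
SENSITIVITY = 0.99
SPECIFICITY = 0.95

ppv_rare = positive_predictive_value(0.01, SENSITIVITY, SPECIFICITY)
ppv_dominant = positive_predictive_value(0.98, SENSITIVITY, SPECIFICITY)

print(f"PPV at 1% prevalence:  {ppv_rare:.1%}")
print(f"PPV at 98% prevalence: {ppv_dominant:.1%}")
```

With these assumed values, most marker-positive samples are false positives when the variant is rare, whereas almost all are true positives once the variant dominates.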

Fig. 18

Source Peacock (2021)

Lineage B.1.1.7 (UK).

Figure 19 shows the antigenic variation that is occurring in B.1.1.7 in the UK over time. It demonstrates that there is increasing genetic change in B.1.1.7, including numerous independent occasions when E484K has emerged. You can find this image (which is regularly updated) in the open access tool, COG-UK Mutation Explorer.

Fig. 19

Source Peacock (2021)

B.1.1.7 antigenic variation over time.

B.1.351 was first detected in South Africa (Fig. 20). The earliest genome was detected on October 8, 2020, and at least 69 countries have detected at least one B.1.351 sequence [for selected studies on aspects of B.1.351, see Cele (2021), Dejnirattisai (2021), Garcia-Beltran (2021), Moyo-Gwete (2021), Madhi (2021), Liu (2021), Wu (2021)]. There are spike gene substitutions with likely functional significance (K417N, E484K and N501Y). The variant is thought to be more transmissible. There was a reported 20% increased risk of in-hospital mortality during the second wave in South Africa, noting that a high burden of cases can place pressure on hospitals. An important concern is that neutralization is substantially decreased in experimental work using sera taken from people after infection. Furthermore, there appears to be an effect on vaccine efficacy. In a study of the AZ vaccine, a two-dose regimen did not show protection against mild to moderate COVID-19 due to the B.1.351 variant. But this was a relatively small study in a young population, and we cannot exclude that this vaccine would protect people from severe disease and death. The results are, therefore, inconclusive.

Fig. 20

Source Peacock (2021)

Lineage B.1.351 (South Africa).

The P.1 lineage was first detected in Brazil and was also detected in Japan in people who were entering the country (Fig. 21). The earliest genome dates back to December 4, 2020, and 35 countries have detected at least one P.1 sequence to date [for selected studies on P.1, see Coutinho (2021), Dejnirattisai (2021), Faria (2021), Garcia-Beltran (2021), Liu (2021), Wu (2021)]. This variant has spike protein substitutions with likely functional significance (K417T, E484K and N501Y). You will observe the similarity in spike mutations with B.1.351, and the phenotype of the variant is proving to be similar, including increased transmissibility. Whether P.1 is associated with altered disease severity is currently unclear. A moderate reduction in neutralization is observed, including some reduction in neutralization using sera from people following vaccination with one of several vaccines.

Fig. 21

Source Peacock (2021)

Lineage P.1 (Brazil).

A study by Garcia-Beltran et al. (2021) compared neutralization of a range of variants using serum samples from 99 individuals who had received either one or two doses of the Pfizer or Moderna vaccine (Fig. 22). Testing all the variants in a single study overcomes the problems of comparing neutralization results between different labs. Across the ten variants tested, neutralization was strongest for wild type and weakest for B.1.351, for which it was equivalent to neutralization of other coronaviruses, including SARS-CoV.

Fig. 22

Source Peacock (2021)

Comparative neutralization of variants.

From the same work (Garcia-Beltran et al. 2021), Fig. 23 shows the actual fold decrease in neutralization. There is a marked decrease in pseudovirus neutralization relative to the wild type for three variants of B.1.351, for both Pfizer and Moderna vaccine sera.
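A fold decrease of this kind is simply the ratio of the wild-type neutralization titer to the variant titer. The sketch below illustrates the calculation; the titers and variant labels are hypothetical placeholders, not values from Garcia-Beltran et al. (2021).

```python
def fold_decrease(wild_type_titer: float, variant_titer: float) -> float:
    """Return how many times lower the variant titer is than the wild-type titer."""
    if wild_type_titer <= 0 or variant_titer <= 0:
        raise ValueError("titers must be greater than zero")
    return wild_type_titer / variant_titer


# Hypothetical neutralization titers; NOT data from any published study.
WILD_TYPE_TITER = 1_000.0
variant_titers = {"variant A": 500.0, "variant B": 25.0}

# Fold decrease for each variant relative to wild type.
fold_changes = {
    name: fold_decrease(WILD_TYPE_TITER, titer)
    for name, titer in variant_titers.items()
}

for name, fold in fold_changes.items():
    print(f"{name}: {fold:.0f}-fold decrease relative to wild type")
```

Because every variant is compared against the same wild-type baseline measured in the same assay, the ratios are comparable across variants in a way that titers from different laboratories are not.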

Fig. 23

Source Peacock (2021)

Comparative neutralization of variants.

5 SARS-CoV-2 Variants: Future

So, what does the future hold?

Many changes that have occurred in the SARS-CoV-2 genome in different lineages around the world are in comparable positions in the genome (Fig. 24). This is consistent with convergent evolution, where variants are selected based on phenotypes that allow the virus to persist and spread in populations that are becoming increasingly immune. In this figure, you can see the root of the viral phylogeny at the start of the pandemic, after which there was a division into the two major lineages, A and B. Lineages have arisen over time, mostly from B.1, with several variants having the E484K mutation and/or the N501Y mutation.

Fig. 24

Source Peacock (2021)

Convergent evolution.

There are ongoing discussions in COG-UK and elsewhere about the importance of shifting our thinking from single variants to constellations of mutations (Fig. 25). The table shows the key genetic changes in a number of variants. This reflects the early beginnings of cataloging amino acid substitutions that are particularly important in terms of their biology. We are observing a shifting and dynamic fitness landscape over time, as SARS-CoV-2 evolves in human populations that are developing increasing immunity.

Fig. 25

Source Peacock (2021)

Convergence of mutations: “constellations”.

Can we predict the future in terms of variants, and what can we expect? I compare this to looking into a crystal ball because it is very difficult to accurately predict what might happen to the genome in the future (Fig. 26). I think it likely that there will be further emergence of variants of concern, which will be driven by high transmission in populations with partial immunity, and may be associated with chronically infected people and human-to-animal passage. I would expect there to be increasing complexity in describing, tracking, and understanding variants over time. On an optimistic note, by sequencing the viruses as they change, we can provide this information to vaccine developers. Even now, vaccine developers are developing vaccines against the first variant of concern (B.1.351), and studies are looking at the effect of a third booster dose on immunity. The genetic changes we have observed to date are not associated with a virus that is highly resistant to current vaccines. In the future we will be using sequencing data to modify vaccine strategies over time, as this proves necessary.

Fig. 26

Source Peacock (2021)

Looking to the future: what we could expect.

What actions will be needed in the future? Cases of COVID-19 caused by variants of concern can still be prevented by the usual precautions: hands (washing), face (masks), and space (social distancing and being outdoors) (Fig. 27). These will be effective against variants and are a key plank in our control of the virus. We also need to focus on global vaccine rollout. It is not sufficient to vaccinate in particular countries and then hope for the best. We need to vaccinate the world. Places where disease rates are high are also likely to be where further variants of concern will emerge. We need to protect everybody by vaccinating everybody.

Fig. 27

Source Peacock (2021)

Looking to the future: what actions will be needed.

A sure way to protect populations is through travel restrictions, although different countries have reached different decisions about travel restriction policy. But quarantine, testing and tracing are key elements of public health control, together with ongoing vaccination and, if required, vaccine modification. National surveillance is highly effective in numerous countries, but a global network of sequencing capabilities and data-sharing platforms is not currently in place. We need this to understand what variants are circulating where, to ensure that vaccines in use will be effective. We need a network of sequencing capabilities with global information sharing, together with effective pipelines for genotype to phenotype evaluation, immunology, and other experimental systems.

This (Fig. 28) is the most important slide of my talk. The COG-UK consortium contains around 500 people, all of whom have contributed to the work that I have talked about today, and I am deeply indebted and grateful to all of them.

Fig. 28

Source Peacock (2021)

Representation of the COG-UK (COVID-19 Genomics UK) Consortium, ‘SARS-CoV-2 variants: past, present and future’ presentation.

I would like to thank you for listening. Thank you.