Evidence of natural selection and dominance of SARS-CoV-2 variant Lambda (C.37) over variants of concern in Cusco, Peru

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) lineage C.37 (Lambda) has spread rapidly in Peru and other Latin American countries. However, most studies in Peru have focused on Lima, the capital city, without knowing the dynamics of the spread of the variant in other departments. Cusco, Peru, is one of the most popular departments in the country for tourists, so the introduction of new variants of SARS-CoV-2 might occur despite closure of the borders. Therefore, in this work, we analyzed the variants circulating in Cusco. The aim of this work was to better understand the distribution of SARS-CoV-2 lineages circulating in Cusco and to characterize the genomes of these strains. To this end, 46 SARS-CoV-2 genomes from vaccinated and unvaccinated patients were sequenced in the first half of 2021. The genomes were analyzed using phylogenetic and natural selection methods. Phylogenetic trees from Cusco showed dominance of the Lambda lineage over the variants of concern (VOCs), and there was no clustering of variants by district. Natural selection analysis revealed mutations, mainly in the spike protein, at positions 75, 246, 247, 707, 769, and 1020. In addition, we found that unvaccinated patients accumulated more new mutations than did vaccinated patients, and these included the F101Y mutation in ORF7a, E419A in NSP3, a deletion in S (21,618-22,501), and a deletion in ORF3a (25,437-26,122). Supplementary Information The online version contains supplementary material available at 10.1007/s00705-022-05645-x.

The virus quickly spread worldwide, and variants then began to appear. To date, there are 1479 variants according to the Pangolin classification [4], which are classified into variants of concern (VOC) and variants of interest (VOI) according to the World Health Organization (WHO) (https:// www. who. int/ es/ activ ities/ track ing-SARS-CoV-2-varia nts).
Mortality and infection rates are high, and wholegenome sequencing is needed to follow the evolution and epidemiology of the virus in Peru. Currently, 4,079 virus 1 3 88 Page 2 of 13 sequences from Peru have been deposited with the Global Initiative on Sharing Avian Influenza Data (GISAID). By sequencing the complete genome, it is possible to track the dynamics and epidemiology of the virus. For example, in April last year, the predominant lineages were B.1 and B.1.1 [5,6]. However, recently, the situation changed, as it has been shown that the Lambda variant (C.37) is generally predominant in Peru [5]. In 2021, 74,152 cases were reported in Cusco in early October, with a lethality of 3.94%, and Lambda was the predominant variant, followed by Gamma (P.1) and Delta (B.I.617.2) (https:// web. ins. gob. pe/ es/ covid 19/ secue nciam iento-sars-cov2).
Despite sequencing efforts, very few sequences have been deposited in the GISAID database. Therefore, in this work, we performed genomic surveillance of SARS-CoV-2 in the Cusco region in the first half of 2021, as this region has a high level of tourist activity, making it more likely for VOC introductions may occur. However, the Alpha and Gamma variants did not predominate over the Lambda variant in the studied patients in the Cusco region.
In this work, the genome sequences of 46 SARS-CoV-2 isolates from vaccinated and unvaccinated patients were determined in the first half of 2021. The genomes were analyzed using phylogenetic methods, and natural selection was evaluated. A high prevalence of the Lambda lineage was observed in the Cusco region, whereas VOC variants were not prevalent. Moreover, unvaccinated patients were more likely to accumulate new mutations than vaccinated patients.

Ethical approval
This study was reviewed by the Institutional Ethics Committee of the Universidad San Antonio Abad del Cusco (UNSAAC) and approved under the number CBI-UNSAAC2021-01.

Sample collection
Nasopharyngeal swabs were collected from 47 patients in a volume of 2 ml of viral transport medium (VTM), RNA extraction was performed using a Maxwell RSC Viral RNA Extraction Kit (Promega), and the diagnosis of COVID-19 was made based on RT-qPCR using a Novel Coronavirus (2019-nCoV) Nucleic Acid Diagnostic Kit (Sandure Biotech). Samples with a C t value of 25 or below were used for sequencing.

Identification of mutations
Synonymous and non-synonymous substitutions unique to the Cusco sequences were identified using CoV-GLUE [16]. A heat map of mutation frequency was generated using ggplot2 in R.

Natural selection inference analysis
First, ORFs were identified using the RCoV19 Resource for Coronavirus 2019 NGDC online software [17]. To identify sites subject to pervasive diversifying or purifying selection, we used FEL (fixed-effects likelihood) [18], SLAC (singlelikelihood ancestor count) [18], and FUBAR (fast unbiased Bayesian approximation) [19]. To evaluate pervasive and episodic diversifying selection, the mixed-effects model of evolution (MEME) [20] was used.

Protein structure prediction
The viral proteins listed in Table 1 were modeled using the I-TASSER server [21]. The generated models were visualized in the UCSF program Chimera-alpha [22] by comparing MatchMaker structures using the Needleman-Wunsch alignment algorithm and the BLOSUM-62 matrix to perform sequence alignment from subsequent structural matches and to identify the selected sites.

Samples
We selected 46 nasopharyngeal swab samples from COVID-19 patients in the Cusco region with a cycle threshold (C t ) value of 15 for detection of SARS-CoV-2 by qRT-PCR. In total, there were 30 women and 14 men, with a mean age of 37.4 years, 21 of whom were vaccinated and 25 of whom were unvaccinated. All patients had symptoms such as cough, fever, body aches, headache, and anosmia, and none of them died (Supplementary Table S1).

Genomic diversity of SARS-CoV-2 in Cusco, Peru
In Peru, a major effort has been made to sequence the genomes of SARS-CoV-2 strains circulating in the country. As part of the Genomic Surveillance Network, the Universidad Nacional de San Antonio Abad del Cusco (UNSAAC) and the Universidad Peruana Cayetano Heredia (UPCH) sequenced 47 SARS-CoV-2 genomes from the first half of 2021 to analyze the distribution of circulating lineages in the department of Cusco.
In addition, 3,545 SARS-CoV-2 genomes with higher coverage from Peru were analyzed during the same period to compare the dynamics of SARS-CoV-2 at the national level and in the department of Cusco.
The mutations in the spike protein are mainly related to greater transmissibility and resistance to antibodies, and this could be a reason for the rapid spread of this variant in the Peruvian population. For example, the T76I mutation increases infectivity, L452Q increases the affinity of the spike protein for ACE2 and contributes to transmissibility, Interestingly, lineages considered to be VOCs were found in a small proportion, such as Alpha (0.37%), Mu (0.37%), and Delta (0.47%), in addition to the Delta sublineages AY.26, AY.43, and AY.46.6 ( Fig. 1). The Gamma lineage P.1 was the second most common, followed by the Gamma sublineage P.1.12. These lineages have mutations in the receptor-binding domain (RDB) of the spike protein that are associated with increased transmissibility. P.1 began to increase in Peru around week 31, but interestingly, the Gamma variant did not replace the Lambda variant (Fig. 1B).
Analysis of the diversity of variants in the Cusco region gave a picture similar to that of the country as a whole, with the Lambda variant predominating. The predominant SARS-CoV-2 variants circulating in Cusco were C.37 (n = 227, 85.66%), P.1 (n = 24, 9.05%), and B.1.1.348 (n = 7, 2.64%), whereas the variants P. VOCs such as Alpha, Gamma, and Delta were found in a low proportion compared to Lambda. VOCs did not predominate in the countryside or in the Cusco region during the peak of Lambda. A similar situation occurred in Mexico, where the Alpha variant was not predominant over the B.1.1.519 variant [24].
The prevalence of Lambda in Peru is probably due to the "founder effect" that occurs when a limited number of individual viruses form a new population during transmission. The viral ancestor C.37 first dominated in the population, after which natural selection and random events increased the frequency of variants, which then became fixed [25].
A large proportion of COVID-19 cases in Cusco occurred in the north of the department in the districts of Echarati, Kimbiri, Quellouno, Santa Ana, and Pichari, with more than 100 cases per 1000 inhabitants (Fig. 2). The entire department was affected by an intermediate and high prevalence of COVID-19, with most cases caused by the Lambda lineage.

Phylogenetic tree of SARS-CoV-2 genome sequences from Peru
Phylogenetic reconstruction was performed using 3,545 SARS-CoV-2 genome sequences from Peru that were downloaded from GISIAD, using the sequence from Wuhan, China, to root the tree. Near the root of the tree are the A.1, A.2, and B.2 lineages from China and many global exports to Southeast Asia, Japan, South Korea, Australia, the United States, and Europe (Fig. 3). Thus, these sequences probably represent introductions from other countries.
Each of the dominant lineages in Peru formed a defined clade in the tree. The dominant lineage was C.37, and the Alpha, Mu, Delta, and Gamma VOCs occurred at lower frequencies.
The Lambda C.37 lineage was found to be the most common; the predominance of the Lambda variant from January to April may indicate higher transmissibility according to Horizon analysis [5].
Lambda C.37 is a deeply branched sublineage of B.1.1.1 [6], and it was observed with high prevalence in the phylogeny, which may indicate a possible local adaptation. However, Lambda is known to have undergone several different independent introductions, probably from Europe and Asia between mid-February and early March [30] (Fig. 3).
The phylogeny based on complete genome sequences from Cusco is shown in Figure 4, with the lineages and the districts from which the isolates were obtained shown at the tips of the tree. A total of 132 complete genome sequences from Cusco were used, 46 of which correspond to the genomes sequenced in this study (Supplementary Table S1). The isolates from Cusco were divided into clades corresponding to lineages C.37 and P.1, and with lineage B.1, which corresponds to imported cases, at the base of the tree.
The Cusco phylogenetic tree did not show any obvious grouping of lineages by district, but there was a grouping by lineage that formed monophyletic clades for each lineage.  (Fig. 4).
The next clade in the phylogenetic tree of SARS-CoV-2 in Cusco corresponds to the P.1 (Gamma) lineage (Fig. 4). The earliest sequence of the P.1 lineage sampled in Peru was found on December 17, 2020. In Cusco, the earliest sequence was from March 5, 2021. Few sequences are seen in the phylogenetic tree, and in contrast to other regions of the world [31], the Gamma lineage was not predominant in Cusco.
The Cusco isolates belong mainly to the Lambda lineage (C.37) and are distributed in different districts of Cusco; there is no clustering of lineages by province. There was also no obvious clustering of isolates from vaccinated or unvaccinated patients (Fig. 4).

Mutations and natural selection in SARS-CoV-2 in samples from Cusco
Many of the mutations in the SARS-CoV-2 lineages are characteristic of that lineage and are fixed by natural selection. Natural selection has shaped the evolution of viruses and led to adaptation of viruses to different environments. Because the Lambda variant was dominant in Peru, we investigated the presence of natural selection for specific mutations favoring adaptation in the Cusco region. We use a combination of site-level selection analyses: a mixed-effects evolutionary model (MEME), fixed-effects likelihood (FEL), single-likelihood ancestor counting (SLAC), and fast unbiased Bayesian approximation (FUBAR).
The genes encoding the nonstructural protein NSP3 and the structural proteins S and N, had the highest number of mutations in the samples from Cusco. Regarding the type of selection, some sites were found using the MEME and FUBAR methods to be subjected to episodic selection, especially in the nonstructural proteins (NSPs) ( Table 1). No sites were found using SLAC, as this method is the least robust due to its reliance on a relatively naive counting approach [18].
Mutations in the NSP2 gene were not identical in samples from vaccinated and unvaccinated patients (Fig. 5). Ten sites in NSP2 were found to be under episodic selection, and the T223I mutation was found in a vaccinated patient. Most of the mutations in NSP2 were found in unvaccinated patients. The NSP2 protein is dispensable for viral replication and can interact with the host proteins prohibitin 1 (PHB1) and prohibitin 2 (PHB2), altering intracellular host signaling [32].
In the NSP3 protein, the most frequent mutations were T248I, F1469S, and F1569V (Fig. 5). These changes are characteristic of the C.37 lineage and were found in both vaccinated and unvaccinated patients. Analysis of the mode of selection using MEME software revealed that the T1365I mutation is subject to episodic selection and was present only in vaccinated patients (Table 1) (Fig. 6). NSP3 is a transmembrane multidomain protein [33] that is involved in the replication/ transcription complex (RTC) and plays a role in polyprotein processing [34]. Interestingly, this protein, which is important for replication, had a greater number of mutations than did non-structural proteins such as NSP1, which is responsible for inhibiting the interferon response at different levels.
The spike gene also accumulated many nonsynonymous mutations. Mutations characteristic of the C.37 lineage (G75V, T76I, L452Q, F490S, D614G, and T859N) were present in both vaccinated and unvaccinated patients. The G75V (under selection) and T76I mutations significantly increase viral infectivity [35]. The mutations L452Q and F490S are responsible for increased transmission [35]. On the other hand, the D614G mutation has been observed in SARS-CoV-2 variants with enhanced viral replication and transmission efficiency. This mutation could be a response to positive selection pressure [36]. The T859N mutation had no effect on vaccine-induced neutralization [35].
The sites under natural selection in the spike protein were analyzed. Several sites were found, with the mutations L18F and G769R being the only ones that exhibited episodic selection in vaccinated patients. The L18F mutation occurred in the South African variant B.1.351 (Beta or GH501Y variant. V2). Later, it was found in the P.1 variant from Brazil (Gamma or GR /501Y variant. V3) and the Zeta variant (P.2). Studies have shown that the L18F substitution can affect neutralizing antibody binding [37].
Some mutations in ORF3a were found to be under episodic selection. ORF3a is known to induce apoptosis, suggesting that ORF3a mutations in SARS-CoV-2 might reduce apoptosis in infected cells, allowing the virus to spread  [38]. Some sites in the ORF3a protein were shown previously to be at under positive natural selection by the FUBAR and DEPS assays, and these might may be under selection from the immune system [3].
Another of the proteins with the highest number of mutations was the structural nucleocapsid phosphoprotein N. The N protein had the characteristic mutations of the C.37 lineage: P37L, R203K, G204R, and G214C. In addition, there was evidence of sites under positive natural selection by the FUBAR method, which had been identified previously by Velazquez-Salinas et al. [3].
In addition, a few nonsynonymous substitutions were observed in ORF8. This protein has been linked to viral pathogenesis by regulating the first innate response to SARS-CoV infection [39]. Previously, it was reported that ORF8-specific sites were under positive selection [3].
Some novel mutations were observed that have not been reported previously. These were present in samples from unvaccinated patients, such as sample CUS-UPCH-0813, with a F101Y mutation in ORF7a (Fig. 5). Sample CUS-UPCH-081 was found to have an E419A mutation in NSP3 and a deletion in S (nt 21,618-22,501). A deletion in ORF3a (nt 25,437-26,122) was also detected in samples CUS-UNSACC-3, 8, 12, and 15 in unvaccinated patients from Cusco. Previous reports have indicated that unvaccinated patients had a higher viral load than vaccinated patients [39], increasing the likelihood of errors in each round of replication. Lower viral replication rates of the Delta and Omicron variants have been observed in vaccinated individuals when compared to unvaccinated individuals [40,41]. Although vaccination does not confer total immunity to SARS-CoV-2 infection, it does reduce the viral load and thereby the frequency of appearance of new mutations that can become fixed by natural selection.
In general, the NSP3, N, and S proteins in the Cusco samples accumulated a greater number of amino acid substitutions, but this does not mean that they are all under natural selection, because many can be eliminated by purifying selection. The proteins with some sites under positive natural selection were ORF3a, ORF8, and S. Our evolutionary analysis supports the hypothesis that early divergence events during the SARS-CoV-2 pandemic may be associated with positive selection of specific sites in ORF3a and ORF8.

Conclusions
Our results describe the distribution of SARS-CoV-2 lineages in the department of Cusco and evolutionary events such as natural selection. The Lambda lineage (C37) is predominant in the Cusco region and Peru, followed by the Gamma lineage (P1). The NSP3, S, and N genes accumulated a greater number of nonsynonymous mutations, many of which were not found to be under positive natural selection because they were fixed previously in the C.37 lineage. Isolates from unvaccinated patients had novel mutations not found in those from vaccinated patients.
Sites under directional and episodic selection in viruses from vaccinated and unvaccinated patients were predominantly found in nonstructural proteins, mainly NSP3. However, in the absence of experimental work showing phenotypic differences in SARS-CoV-2 isolates with these mutations, we cannot evaluate the importance of the nonsynonymous changes in viruses from patients from Cusco. Sites under natural selection may have public health implications. We suggest further studies to investigate how these changes affect evolution. included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.