INTRODUCTION

In late 2019, several cases of pneumonia were reported in Wuhan, China, and the causative agent was identified as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) [1–3]. Studies have revealed that the SARS-CoV-2 genome shares approximately 96% identity with the bat SARS-like coronavirus strain Bat-CoV RaTG13 [4]. SARS-CoV-2 infection leads to coronavirus disease 2019 (COVID-19), whose symptoms range from mild illness to severe respiratory distress. SARS-CoV-2 spreads rapidly from infected to healthy individuals by direct contact or via respiratory droplets. The virus has infected human populations worldwide and has caused tremendous loss of life and economic damage. As of 11 November 2021, approximately 252 million cases of COVID-19 had been reported worldwide and the death toll had reached 5.08 million. To overcome the COVID-19 pandemic, treatment strategies, better diagnostic methods and vaccines are being developed through the collaborative efforts of industry and academia worldwide.

The single-stranded RNA genome of SARS-CoV-2 is approximately 29 kb in length [5]. The genome encodes 29 proteins, categorized into structural proteins (S, M, E and N), non-structural proteins (Nsp1-16) and accessory proteins (Orf3a, 3b, 6, 7a, 7b, 8, 9b, 9c and 10) [6]. SARS-CoV-2 enters host cells when the receptor binding domain (RBD) of the Spike protein interacts with the receptor angiotensin-converting enzyme 2 (ACE2) on the host cell surface [7]. After entry, the SARS-CoV-2 genomic RNA is translated by the host translational machinery to produce viral proteins. As a result, the viral proteins are exposed to the host immune system, which triggers an immune response. B cell epitopes on the viral proteins bind to host B cell antigen receptors and induce a cascade of reactions that generates protective antibodies capable of neutralizing the virus [8]. Therefore, identifying the epitopes of SARS-CoV-2 plays a central role in vaccine development and in understanding SARS-CoV-2 pathogenesis. More importantly, SARS-CoV-2 is mutating rapidly and several new variants are emerging; continued analysis of these variants is therefore warranted to understand viral evolution and its implications for the host immune response.

In this study, we used a bioinformatic approach to characterize the N-protein epitopes of SARS-CoV-2. Our data revealed that a few N-protein epitopes might lose their antigenicity as a result of mutations observed among Indian SARS-CoV-2 isolates.

METHODS

Sequence Retrieval for Analysis

We retrieved the SARS-CoV-2 protein sequences used in this study from the publicly accessible NCBI Virus database. The NCBI Virus database has annotated 26 proteins of SARS-CoV-2 from the reference genome (NC_045512.2). We performed a detailed analysis of the variations present in the N-protein among Indian isolates of SARS-CoV-2. For this analysis, 831 N-protein sequences reported from India (until July 2021) were also downloaded from the NCBI Virus database. The accession ID of the N-protein reference sequence used in this study was YP_009724397. The accession IDs of all N-protein sequences used in this study are listed in Supplementary Table S1.

The Prediction of B Cell Epitopes

Linear B cell epitopes are short, continuous peptides. To predict them, we used the IEDB (Immune Epitope Database) webserver, which implements the “Bepipred linear prediction method” [9]. The analysis was run with Bepipred 2.0 at the default threshold value of 0.350. The antigenicity and allergenicity of the epitopes were predicted with the Vaxijen 2.0 [10] and AllergenFP v.1.0 [11] webservers, respectively. The reference sequence of the N-protein (accession ID: YP_009724397) was used for this prediction.
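
The thresholding step can be illustrated with a minimal sketch. It assumes that per-residue scores have already been exported from the webserver as a list of numbers, and that a residue is counted as part of an epitope when its score is at or above the threshold. The function name `epitope_segments`, the `min_len` parameter and the toy scores are hypothetical and are not taken from the IEDB output of this study.

```python
from typing import List, Tuple

def epitope_segments(scores: List[float], threshold: float = 0.350,
                     min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) runs of residues scoring >= threshold.

    Assumption: a run of consecutive above-threshold residues is one
    candidate linear epitope; runs shorter than min_len are discarded.
    """
    segments = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score >= threshold:
            if start is None:
                start = i  # open a new run
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))  # close the run
            start = None
    # Close a run that extends to the last residue
    if start is not None and len(scores) - start + 1 >= min_len:
        segments.append((start, len(scores)))
    return segments

# Toy per-residue scores (hypothetical, for illustration only)
toy_scores = [0.10, 0.40, 0.50, 0.36, 0.20, 0.45, 0.90, 0.10]
print(epitope_segments(toy_scores))
```

Raising `min_len` filters out very short runs, which may be useful when only peptides of a practical length are of interest.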

Multiple Sequence Alignments (MSAs)

MSAs were performed to identify variations present among the N-proteins. The Clustal Omega tool [12] was used to conduct the MSAs as described earlier [13]. In this analysis, the first reported N-protein sequence from Wuhan, China (protein accession number: YP_009724397) was used as the reference sequence. The 831 N-protein sequences reported from India until 1st June 2021 were compared with the reference sequence to identify variations present among Indian isolates.
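
The comparison against the reference can be sketched as follows. This is a minimal illustration that assumes two rows of an alignment (reference and isolate) are available as equal-length strings with "-" for gaps; it is not the procedure implemented by Clustal Omega itself. The function name `substitutions` and the toy sequences are hypothetical.

```python
from typing import List

def substitutions(ref_aln: str, query_aln: str) -> List[str]:
    """List substitutions such as 'N4K', numbered by reference position.

    Assumptions: both strings are rows of the same alignment; gap columns
    in the reference (insertions) are skipped; gaps and ambiguous 'X'
    residues in the query are not reported as substitutions.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    found = []
    pos = 0  # position in the ungapped reference
    for ref_res, query_res in zip(ref_aln, query_aln):
        if ref_res == "-":
            continue  # insertion relative to the reference
        pos += 1
        if query_res not in ("-", "X") and query_res != ref_res:
            found.append(f"{ref_res}{pos}{query_res}")
    return found

# Toy aligned rows (hypothetical, for illustration only)
print(substitutions("MSDNGPQ", "MSDKGPQ"))
```

Numbering by the ungapped reference keeps mutation labels comparable across isolates even when an isolate carries an insertion.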

Secondary Structure and Protein Disorder Prediction

The secondary structure of the polypeptide sequences was predicted using the CFSSP webserver [14] as described earlier [15]. The per-residue disorder was predicted using the PONDR webserver [16]. A PONDR-VSL2 score above 0.5 indicates disorder, while a score below 0.5 indicates order in the polypeptide structure.
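
The 0.5 cutoff can be applied to exported per-residue scores as in the sketch below. It assumes the scores are available as a plain list of numbers; the function name `disorder_summary` and the toy values are hypothetical and do not come from the PONDR output of this study.

```python
from typing import Dict, List

def disorder_summary(scores: List[float], cutoff: float = 0.5) -> Dict[str, float]:
    """Summarize per-residue disorder scores for one peptide.

    A residue is called disordered when its score is above the cutoff.
    A score exactly equal to the cutoff is counted as ordered here,
    which is an assumption of this sketch.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    disordered = [score > cutoff for score in scores]
    return {
        "mean_score": sum(scores) / len(scores),
        "fraction_disordered": sum(disordered) / len(disordered),
    }

# Toy scores for a wild-type and a mutant peptide (hypothetical values)
wild_type = [0.8, 0.7, 0.6, 0.4]
mutant = [0.6, 0.45, 0.4, 0.3]
print(disorder_summary(wild_type)["fraction_disordered"])
print(disorder_summary(mutant)["fraction_disordered"])
```

Comparing the two summaries gives a quick indication of whether a mutant peptide is predicted to be more ordered than its wild-type counterpart.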

Analysis of Effect of Mutation on Protein Function and Stability

The PROVEAN (Protein Variation Effect Analyzer) score indicates the probable effect of a mutation on protein function [17]. For this prediction, the default threshold score of –2.5 was used. A PROVEAN score of ≤–2.5 represents a “deleterious” mutation, while a score above –2.5 indicates a “neutral” mutation. We predicted protein stability with the I-Mutant Suite webserver [18], based on the difference in free energy (ΔΔG), as described earlier [19]. A positive ΔΔG indicates that the mutation makes the protein more stable, while a negative ΔΔG indicates decreased stability.
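
The two decision rules stated above can be written down as a short sketch. The function names are hypothetical, and the handling of a ΔΔG of exactly zero ("no change") is an assumption, since that case is not specified in the text.

```python
def classify_provean(score: float, cutoff: float = -2.5) -> str:
    """Label a mutation from its PROVEAN score (score <= cutoff is deleterious)."""
    return "deleterious" if score <= cutoff else "neutral"

def classify_ddg(ddg: float) -> str:
    """Label a mutation from the sign of the predicted free energy change."""
    if ddg > 0:
        return "increased stability"
    if ddg < 0:
        return "decreased stability"
    return "no change"  # assumption: the zero case is not defined in the text

# Toy values (hypothetical, for illustration only)
print(classify_provean(-3.1), classify_ddg(-0.42))
print(classify_provean(-0.8), classify_ddg(0.15))
```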

RESULTS

Identification and Characterization of Linear B Cell Epitopes of N-Protein

The linear B cell epitopes were predicted with the IEDB webserver, which is based on the “Bepipred linear prediction method”, using 0.350 as the threshold value. The analysis revealed that the N-protein contains 13 potential linear B cell epitope peptides (Fig. 1, yellow shaded area). Among those 13 peptides, seven fulfilled the criteria of being antigenic, non-allergenic and non-toxic (Table 1). The complete list of 13 peptides is shown in Supplementary Table S2. Peptide 6 (KLDDKDPNFK) shows the highest antigenicity (Vaxijen score of 2.129), followed by peptide 4 (AFGRRGPEQTQGNFG) with a score of 1.172.

Fig. 1. Analysis of linear B cell epitopes contributed by the N-protein. Data were obtained using the IEDB webserver. The B cell epitopes were predicted at the default threshold value of 0.350. The Y-axis represents the “Bepipred score,” while the X-axis represents the N-protein residue number. The yellow shaded regions indicate the residues that reside in the putative B cell epitopes, while the green shaded regions represent non-antigenic residues of the N-protein.

Table 1. Linear continuous B cell epitopes of N-protein predicted by IEDB webserver

Analysis of Variations in N-Protein of SARS-CoV-2 among Indian Isolates

To understand the N-protein variations in India, we compared the SARS-CoV-2 N-protein sequences reported from India with the first sequence reported from Wuhan, China (YP_009724397). The MSA analysis revealed 81 mutations in the N-protein among Indian isolates (Table 2). We characterized these mutants by analyzing the change in polarity and charge at each mutated residue (Table 2). Subsequently, we performed a stability prediction using I-Mutant Suite to understand the effect of these mutations on the N-protein. For the stability prediction, we used the change in free energy (ΔΔG) as a measure of the effect of each mutation on protein stability. Our data revealed that 17 mutations increased stability (positive ΔΔG), while 64 mutations decreased the stability of the N-protein (Table 2). Further, we calculated the PROVEAN score of each mutant, which predicts the impact of the mutation on protein function. Our data show that most mutations are predicted to have no effect on protein function (neutral), while 12 mutations are deleterious (Table 2). Altogether, our data suggest that the variations in the N-protein alter its properties and function.
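
The polarity and charge comparison can be sketched as a lookup on the wild-type and mutant residues. The residue classes below are one common textbook grouping and are an assumption of this sketch (classifications of His, Cys, Tyr and Gly vary between sources); the function names and the example labels are hypothetical and are not taken from Table 2.

```python
import re
from typing import Dict, Union

# Assumed residue classes; other groupings exist in the literature.
POSITIVE = set("KR")
NEGATIVE = set("DE")
NONPOLAR = set("AVLIMFWPG")

def charge(residue: str) -> str:
    if residue in POSITIVE:
        return "positive"
    if residue in NEGATIVE:
        return "negative"
    return "neutral"

def polarity(residue: str) -> str:
    return "nonpolar" if residue in NONPOLAR else "polar"

def describe_mutation(mutation: str) -> Dict[str, Union[int, str]]:
    """Parse a label such as 'G10R' and report polarity and charge changes."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {mutation!r}")
    wild, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    return {
        "position": position,
        "polarity": f"{polarity(wild)} -> {polarity(mutant)}",
        "charge": f"{charge(wild)} -> {charge(mutant)}",
    }

# Hypothetical mutation label, for illustration only
print(describe_mutation("G10R"))
```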

Table 2. List of SARS-CoV-2 N-protein mutations identified from Indian isolates

Mutations in B Cell Epitope Cause Alteration in Their Antigenicity

Subsequently, we mapped the identified N-protein mutations onto the seven B cell epitopes described above. Our data revealed that 21 mutations reside within the B cell epitopes (Fig. 2a). Peptides 2, 5, and 6 possess mutations at one site (Fig. 2a). Similarly, peptide 1 has two mutations, while peptides 3 and 5 possess seven and nine mutations, respectively (Fig. 2a). We did not observe any mutation in peptide 4, suggesting that it is the most conserved of the epitopes (Fig. 2a). The details of the wild-type and mutant epitopes are given in Supplementary Table S3. We then analyzed the antigenicity, allergenicity and toxicity of the mutant epitopes. Interestingly, our data revealed that all mutant epitopes are non-allergenic and non-toxic (Supplementary Table S3); however, peptide 3 mutant 2 (P3M2) and peptide 5 mutant 1 (P5M1) lost their antigenic property and became non-antigenic (Figs. 2d and 2e). The mutant peptides of epitopes 1, 2, 6 and 7 do not show any significant alteration in antigenicity (Figs. 2b, 2c, 2f, and 2g). The loss of epitopes in P3M2 and P5M1 can also be seen in the IEDB epitope predictions. The “bepipred score graph” revealed that, compared to the wild type, the mutants P3M2 (compare Figs. 2h and 2i) and P5M1 (compare Figs. 2h and 2j) have lost epitopes. Altogether, our data suggest that the emerging mutations in the N-protein can alter the properties of its epitopes.
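
The mapping step and the construction of a mutant peptide for re-scoring can be sketched as below. The sketch assumes each epitope is described by 1-based inclusive start and end positions in the N-protein; the epitope names, coordinates, mutation positions and the toy peptide are all hypothetical and do not correspond to the epitopes in Table 1.

```python
from typing import Dict, List, Tuple

def map_to_epitopes(epitopes: Dict[str, Tuple[int, int]],
                    positions: List[int]) -> Dict[str, List[int]]:
    """Group mutated positions by the epitope (start, end range) containing them."""
    return {
        name: [p for p in positions if start <= p <= end]
        for name, (start, end) in epitopes.items()
    }

def mutate_peptide(peptide: str, peptide_start: int,
                   position: int, new_residue: str) -> str:
    """Return the peptide with the residue at a protein position replaced."""
    index = position - peptide_start  # 0-based offset within the peptide
    if not 0 <= index < len(peptide):
        raise ValueError("position lies outside the peptide")
    return peptide[:index] + new_residue + peptide[index + 1:]

# Hypothetical epitope coordinates and mutated positions
toy_epitopes = {"E1": (5, 10), "E2": (20, 30)}
print(map_to_epitopes(toy_epitopes, [3, 7, 9, 25, 40]))

# Hypothetical peptide starting at protein position 11, mutated at position 13
print(mutate_peptide("ACDEFG", 11, 13, "K"))
```

The mutant peptide string produced this way is what would then be submitted to the antigenicity, allergenicity and toxicity predictors alongside its wild-type counterpart.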

Fig. 2. Characterization of N-protein epitopes. (a) The linear sequence of the N-protein is shown along with the residue numbers. The locations of the B cell epitopes (peptides) are marked on the linear sequence of the N-protein. The asterisks denote the mutant residues identified among Indian SARS-CoV-2 isolates. Note that several mutations reside in the B cell epitope sequences. (b–g) Effect of mutations on the antigenicity of B cell epitopes. The antigenicity was analyzed by predicting the Vaxijen score of the wild-type and mutated peptide sequences. Each panel (b–g) represents an individual peptide and its mutant counterparts. In two cases, peptide 3 (d) and peptide 5 (e), the mutant showed a marked reduction in Vaxijen score (highlighted in the green dashed box). (h–j) Comparative analysis of B cell epitopes in the wild type and mutants. Panel h shows the wild-type N-protein graph, panel i represents the N-protein sequence containing the P3M2 mutation and panel j shows the N-protein sequence containing the P5M1 mutation. The Y-axis represents the “Bepipred score”, while the X-axis represents the N-protein residue number. The yellow shaded regions indicate the residues that reside in the putative B cell epitopes, while the green shaded regions represent non-antigenic residues of the N-protein.

Mutant Epitopes Alter Protein Disorder Parameters

We further characterized the effect of mutations on the two epitopes that lost their antigenicity by analyzing the per-residue disorder of the mutant peptides. Our data show that both mutant peptides are more ordered than their wild-type counterparts (Supplementary Figs. S1A and S1B). Altogether, our data suggest that the mutations alter the protein disorder score of these epitopes.

DISCUSSION

SARS-CoV-2 has undergone rapid evolution since its emergence in Wuhan, China, and this evolution is adversely impacting the development of vaccines and treatment strategies. Therefore, there is an urgent need to understand the biology of SARS-CoV-2 in order to design effective therapies. Although several in silico predictions have been conducted on SARS-CoV-2 to identify putative epitopes, those studies were aimed at vaccine development [20–23]. In this study, we focused on the mutations that have occurred in the epitopes and predicted their effects. Our data revealed that N-protein sequences reported from India carry 81 mutations. To assess how these mutations might affect N-protein epitopes, we mapped the 81 mutations onto the putative epitopes and found that 21 of them reside within those epitopes. Furthermore, we observed that two epitopes lost their antigenicity (Fig. 2) due to mutation, suggesting that the epitopes are changing as SARS-CoV-2 evolves. Our data strongly suggest that, as a consequence of mutations occurring in epitopes, the specificity and/or sensitivity of immune-based assays that use the N-protein might change, which would adversely affect their results. Several studies have established that new SARS-CoV-2 variants show altered interactions with host antibodies and that, in some rare cases, they might not be recognized by the host antibody [24, 25]. A recent study revealed that several SARS-CoV-2 variants have decreased sensitivity to neutralizing monoclonal antibodies [26]. Such variants are likely to become resistant to the host immune system. This study provides a comprehensive view of the B cell epitopes of the SARS-CoV-2 N-protein and discusses their evolution. Our findings indicate that future studies should link newly emerging SARS-CoV-2 mutations to their impact on protein structure, antigenicity and interaction with antibodies, and to the consequences of these changes.