Clinical status and contact/travel history
All patients tested positive for SARS-CoV-2 RNA by Real Time PCR as described above. Five of the patients had fever, while seven patients exhibited symptoms of infection such as sore throat, cough with sputum, runny nose or breathlessness. One patient suffered from Acute Respiratory Distress Syndrome (ARDS). Two patients did not exhibit any symptoms (table 1). Five individuals had contact with COVID-19 patients; in particular, both S2 and S3 had contact with the same patient (table 1). One individual had a history of international travel while another had a history of domestic travel.
The shotgun RNA-Seq data resulted in high coverage (greater than 100X median depth of coverage) of complete genome sequences of the SARS-CoV-2 in five samples (S2, S3, S5, S6 and S11) in which greater than 96% of the viral genome was covered at greater than 5X and greater than 99% of the viral genome was covered at greater than 1X. A negative correlation was found between viral load (represented by the Threshold Cycle or Ct value of the RNA samples in the Real Time PCR based diagnostic assay) and the number of reads mapped to the viral genome in the RNA-Seq library. Even with 9 samples, the Pearson Correlation Coefficient was found to be −0.63 (P value = 0.036) (table 1). In particular, it was observed that samples with Ct values greater than 25 mostly resulted in generation of low counts of viral sequence reads leading to less than 15X median depth of coverage of the viral genome. In the remaining four samples (S1, S8, S10 and S12), the median depth of coverage was less than 15X and hence the viral genome sequencing was achieved after amplification of the viral genome by a multiplex PCR approach. All the nine sequences have been submitted in the Global Initiative on Sharing All Influenza Data (GISAID) database.
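The coverage criteria used above (median depth, and the fraction of the genome covered at 1X and 5X) can be illustrated with a minimal Python sketch. The function name `coverage_summary`, the toy depth values and the reading of "greater than 5X" as "at least 5X" are assumptions made for illustration only; they are not the pipeline or the data used in this study. The 15X cutoff is the threshold below which samples were sequenced after multiplex PCR amplification.

```python
from statistics import median

def coverage_summary(depths, amplicon_cutoff=15):
    """Summarise per-base read depths for one sample (hypothetical helper)."""
    n = len(depths)
    med = median(depths)
    return {
        "median_depth": med,
        # Breadth of coverage: fraction of positions at or above a depth threshold.
        "breadth_1x": sum(d >= 1 for d in depths) / n,
        "breadth_5x": sum(d >= 5 for d in depths) / n,
        # Samples below the cutoff would go to the multiplex PCR approach.
        "needs_amplicon": med < amplicon_cutoff,
    }

# Toy depths for a 10-base "genome"; illustrative values, not study data.
toy_depths = [0, 3, 8, 12, 20, 20, 40, 7, 6, 2]
summary = coverage_summary(toy_depths)
print(summary)
```

In practice the per-base depths would come from the alignment of RNA-Seq reads to the viral reference genome; the sketch only shows how the summary numbers relate to those depths.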
Phylogenetic tree analysis of the sequences, along with other complete viral genome sequences submitted from India in GISAID, revealed that seven of these sequences belonged to the A2a clade while only one sequence belonged to clade B4 (figure 1 and table 1). We were unable to classify one of the nine sequences, S1, into any clade due to low sequence coverage. To understand transmission histories of these nine SARS-CoV-2 isolates from West Bengal, we aligned these sequences with more than 6000 global sequences, including thirty sequences submitted in GISAID from India (at the time of our analysis) to identify specific mutations that occur at the highest level of the tip in a branch leading to the specific subtype. The predicted origin of the transmitted subtype in each case was identified with 98-100% confidence from the branch in which our samples were located in the phylogenetic tree (table 1).
The mutations detected in the sequences from the nine samples are listed in table 2. Seven sequences harboured the signature mutations of the A2a clade: the 14408 C/T mutation, resulting in the P323L change in RdRp, and the 23403 A/G mutation, resulting in the D614G change in the Spike glycoprotein of the virus. In addition, the 24933 G/T mutation in the gene coding for the Spike glycoprotein (G1124V) was detected in S2 and S3, and the triple-base mutation 28881-28883 GGG/AAC in the gene coding for the nucleocapsid, resulting in two consecutive amino acid changes (R203K and G204R), was detected in S2, S3 and S5. While the 24933 G/T S gene mutation was unique to these samples and could not be found in any other sequence from India or the rest of the world, the nucleocapsid mutations could be detected in only three other sequences from India (figure 2). Of these, two sequences were obtained from individuals with a contact history with a COVID-19 patient who had travelled from Italy. Interestingly, two of the three sequences harbouring these mutations that we obtained were from Kolkata, from individuals with a contact history with one COVID-19 patient who had travelled from London (UK). The third sequence was obtained from a COVID-19 patient from Darjeeling, India, who had a history of travel from Chennai, India. These mutations have been found in 16% of SARS-CoV-2 sequences reported worldwide, from countries such as the UK, Netherlands, Iceland, Belgium, Portugal, USA, Australia, Brazil, etc.
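To illustrate how the GGG/AAC triple-base change at genome positions 28881-28883 (the coordinates used in the miRNA and structural sections) gives rise to two consecutive amino acid changes, the following Python sketch maps the substitutions onto codons 203 and 204 of the nucleocapsid gene. The gene start coordinate (28274) and the reference codons (AGG and GGA) are assumptions taken from the SARS-CoV-2 reference genome NC_045512.2 and are not stated in this manuscript; the helper name `apply_substitutions` is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumptions from the reference genome NC_045512.2 (not given in the text).
N_GENE_START = 28274                   # first base of the N start codon (1-based)
REF_CODONS = {203: "AGG", 204: "GGA"}  # reference codons for Arg203 and Gly204

def apply_substitutions(subs):
    """Apply {genome_position: new_base} to the reference codons and
    return the resulting amino acid changes, e.g. 'R203K'."""
    codons = {num: list(seq) for num, seq in REF_CODONS.items()}
    for pos, alt in subs.items():
        offset = pos - N_GENE_START    # 0-based offset within the gene
        codon_no = offset // 3 + 1     # 1-based codon number
        codons[codon_no][offset % 3] = alt
    return [
        f"{CODON_TABLE[REF_CODONS[num]]}{num}{CODON_TABLE[''.join(codons[num])]}"
        for num in sorted(codons)
    ]

changes = apply_substitutions({28881: "A", 28882: "A", 28883: "C"})
print(changes)  # ['R203K', 'G204R']
```

Under these assumptions the three adjacent substitutions straddle a codon boundary: the first two fall in codon 203 and the third in codon 204, which is why a single contiguous change alters two residues.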
RdRp (NSP12) gene of the SARS-CoV-2 codes for the RNA-dependent RNA polymerase and is vital for the replication machinery of the virus. We detected a total of six mutations in this gene in the nine samples, out of which four were nonsynonymous, including the A2a clade specific 14408 C/T (RdRp: P323L) mutation. Two individuals, S11 and S12, harboured viral genome sequences that shared a unique 13730 C/T (A88V) mutation which was not found in any other sequence reported from India or rest of the World. One individual S10, whose viral sequence belonged to B4 clade, harboured 3 mutations in RdRp, which appear to be clade specific, out of which 2 were nonsynonymous.
Impact of mutations in nucleocapsid gene on miRNA binding
To study the functional relevance of the mutations, we investigated the alteration in miRNA binding in the nucleocapsid coding region predicted to be caused by the 28881-3 GGG/AAC mutations. We found seven miRNAs that bind the original sequence and three that bind the mutated sequence exclusively (table 3 and figure 3). The number of bases in the sequence (GGG/AAC) that bind the seed sequence of each miRNA was also identified. The strength of each predicted miRNA binding is reflected by the ∆G value shown in figure 3: the lower (more negative) the value, the stronger the binding. The ∆G values obtained for the miRNAs binding to the N protein coding region are comparable to those of some experimentally validated miRNA bindings, for example miR-122 binding to HCV RNA, which has a ∆G of −18.3 kcal/mol for the S1 binding site and −22.6 kcal/mol for the S2 binding site (data not shown), suggesting their relevance under in vivo conditions.
We checked the levels of these miRNAs in cancers around the upper respiratory tract in the dbDEMC 2.0 database. We found that miR-24-1-5p and miR-299-5p were downregulated in most of the cancers. miR-24-2-5p was found to be upregulated in esophageal cancer (ESCA), head and neck cancer (HNSC) and lung cancer (LUCA), and downregulated in nasopharyngeal cancer (NSCA) (supplementary figure 1). Assuming that the binding of a miRNA inhibits viral replication/stability, a higher abundance of that miRNA would be protective against infection and a lower abundance would increase susceptibility to infection. Taken together, these results suggest that if a patient suffering from ESCA, HNSC or LUCA is infected with the original virus containing the GGG sequence, the upregulated miR-24-2-5p would be protective against the infection. However, if the same patient is infected with the mutated virus containing the AAC sequence, miR-24-2-5p would no longer be functional, and miR-299-5p, which targets the mutated site, is also downregulated. This could make patients suffering from these cancers highly susceptible to infection with the mutant virus.
We also checked whether these miRNAs are associated with other disease conditions and found that miR-299-5p is downregulated in Type 2 Diabetes Mellitus (T2DM); it could therefore be one of the factors that increase the susceptibility of T2DM patients to the mutated viral subtype and increase the risk of co-morbidity (Huang et al. 2018). Another miRNA, miR-3162-3p, which targets the original subtype, is reported to be higher in asthma patients (Fang et al. 2016). This could be one of the factors limiting propagation of the original virus, but the loss of its target site in the mutated viral subtype could increase host susceptibility to viral infection.
We further checked whether other conditions could alter the availability of these miRNAs at the site of infection. To this end, we used the TissueAtlas database to analyse the presence and correlation of these miRNAs in body fluids. We found that certain miRNAs are differentially expressed in the saliva of patients suffering from pancreatic cancer. miR-642b-5p, miR-3162-3p and miR-299-5p were found to be upregulated in the saliva of pancreatic cancer patients, which could have protective/susceptibility effects similar to those described above (supplementary figure 2). miRNAs are known to affect viral replication and stability by binding to protein coding regions of the genomes of H1N1, EV71, CVB3 and many other viruses (Bruscella et al. 2017; Trobaugh and Klimstra 2017). In most cases, binding of miRNAs leads to translational repression of the targeted protein and hence directly affects viral RNA replication. Targeting by miRNAs could decrease the levels of N protein, which is involved in various steps of the viral life cycle including replication, translation and coating of viral RNA to form the nucleocapsid. Hence, altered levels of the shortlisted miRNAs could regulate various viral processes and the severity of SARS-CoV-2 infection. The effect of the miRNAs would be the opposite if they assist viral replication/stability; this needs to be experimentally confirmed, but in either case the miRNAs targeting the original and mutated sites remain important.
Structural impact of mutations in nucleocapsid
We analysed the 28881-3 GGG/AAC mutations in the nucleocapsid gene, which result in the contiguous amino acid changes R203K and G204R, for their potential role in altering the structure of the encoded protein. The sites of these mutations are located in the SR-rich region, which is known to be intrinsically disordered (Chang et al. 2014). In addition, this region is known to encompass a few phosphorylation sites (Surjit et al. 2005), notably the GSK3 phosphorylation site at Ser202 and a CDK phosphorylation site at Ser206, which are in close proximity to these mutations. The sequence motifs ‘SRGTS’ (202-206) and ‘SPAR’ (206-209) are entirely consistent with GSK3 and CDK phosphorylation motifs, respectively. Phosphorylation of Ser202 attaches a large negatively charged group to the Ser sidechain; as seen in many other kinase substrates, this charge is likely to be neutralized by positively charged sidechains in the sequential and spatial vicinity. Arg203 is a part of the GSK3 phosphorylation motif and its sidechain could potentially contribute to charge neutralization at P-Ser202. Given the sequential, and therefore spatial, proximity of Arg203 to P-Ser206, the sidechain of Arg203 could potentially also interact with the phosphate group at position 206. This interaction would contribute to a reduction of conformational entropy. Similarly, Arg209, a part of the CDK phosphorylation motif, would contribute to charge neutralization at P-Ser206. In the variant, Arg203 and Gly204 are mutated to Lys and Arg, respectively (figure 4).
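The position of the two substitutions relative to the phosphorylation motifs can be made explicit with a short Python sketch. The wild-type segment 'SRGTSPAR' (residues 202-209) is assembled from the two motifs quoted above. The regular expressions are simplified consensus patterns (S/T-X-X-X-S/T for GSK3 and S/T-P-X-K/R for CDK) and are assumptions for illustration, not a kinase-site predictor; the helper names `mutate` and `motif_hits` are hypothetical.

```python
import re

SEGMENT_START = 202
WT_SEGMENT = "SRGTSPAR"  # residues 202-209, assembled from 'SRGTS' and 'SPAR'

# Simplified consensus patterns (assumptions; real kinase specificity is richer).
MOTIFS = {
    "GSK3": re.compile(r"[ST]...[ST]"),
    "CDK": re.compile(r"[ST]P.[KR]"),
}

def mutate(segment, start, changes):
    """Apply substitutions written like 'R203K' to a peptide segment."""
    residues = list(segment)
    for change in changes:
        ref, pos, alt = change[0], int(change[1:-1]), change[-1]
        idx = pos - start
        if residues[idx] != ref:
            raise ValueError(f"expected {ref} at {pos}, found {residues[idx]}")
        residues[idx] = alt
    return "".join(residues)

def motif_hits(segment, start):
    """Return {motif_name: (matched_text, first_residue_number)} or None per motif."""
    hits = {}
    for name, pattern in MOTIFS.items():
        match = pattern.search(segment)
        hits[name] = (match.group(), start + match.start()) if match else None
    return hits

mutant_segment = mutate(WT_SEGMENT, SEGMENT_START, ["R203K", "G204R"])
wt_hits = motif_hits(WT_SEGMENT, SEGMENT_START)
mutant_hits = motif_hits(mutant_segment, SEGMENT_START)
print(mutant_segment, wt_hits, mutant_hits)
```

Under these simplified patterns both consensus motifs are still matched in the mutant segment; what changes is the identity and placement of the positively charged residues next to Ser202 and Ser206, which is the aspect discussed above.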
Possible implications of D614G mutation (in SD domain) on protein structural stability
Spike protein (S) of coronaviruses is a class I viral fusion protein which is synthesized as a single chain precursor that trimerizes upon folding. It is composed of two subunits: S1 (at the amino terminus) containing the receptor binding domain (RBD) and S2 (at the carboxy terminus) that drives membrane fusion (figure 5). While the S1 N-terminal region comprises domain A, the S1 C-terminal half folds as three spatially distinct domains: B, C and D (Walls et al. 2016; Wrapp et al. 2020). The S protein predominantly exists in two structurally distinct conformations: pre-fusion and post-fusion (Li 2016). The pre-fusion state is metastable. Interactions between the protomers, facilitated through interlocking of the S1 subunits around the S2 trimer in a crown-like fashion, stabilize the pre-fusion conformation (Walls et al. 2016). Transition from the pre-fusion to the post-fusion state involves a large conformational change to fuse the viral membrane with the host cell membrane. This process is triggered when S binds to the hACE2 protein via the SB domain to enter the host cell. The ectodomain trimer of the S protein in coronaviruses is known to adopt multiple SB conformations (Walls et al. 2020). At the time of preparing this manuscript, three cryo-EM structures of the SARS-CoV-2 S protein in the pre-fusion state are available which encompass the two mutation sites of interest (D614 and G1124). While one of these three structures (PDB code: 6VXX) (Walls et al. 2020) is in a fully closed state (i.e., the SB domains in all three protomers of the trimeric S protein are in a closed conformation), the other two structures (PDB codes: 6VYB and 6VSB) (Walls et al. 2020; Wrapp et al. 2020) are in a partially open state. In 6VYB, the SB domain in chain B is in the open state whereas it has a closed conformation in chains A and C. In 6VSB, the SB domain in chain A is in the open state whereas it has a closed conformation in the other two chains (B and C).
In all three structures, D614 lies in a loop at the interface between two of the three protomers. The coordinates of the D614 side-chain in chains A and C of 6VYB are available only up to the Cβ atom, and the orientation of these atoms is similar to that of the corresponding atoms of D614 in 6VXX. The coordinates of all the side-chain atoms of D614 in chain B of 6VYB are available and they are similar to those observed in chain B of 6VXX. The side-chain of D614 in all the protomers of 6VXX and in chain B of 6VYB points outward from the core of the protein toward the solvent. The side-chain orientation of D614 in all three chains of 6VSB differs from that in the former two structures. This different orientation of the D614 side-chain in 6VSB facilitates formation of a hydrogen bond between D614 (present in the S1 subunit) and T859 (present in the S2 subunit) of the neighbouring chain at two of the three interfaces found in 6VSB (figure 6).
Taken together, these observations suggest that D614 is highly flexible and support a transient, wobbly nature of the inter-protomeric hydrogen bond observed between D614 and T859. A contribution of this transient hydrogen bond to the stability of the pre-fusion state cannot be ruled out. Interestingly, the S protein of mouse coronavirus (MHV-A59), which has a structural topology similar to that of the SARS-CoV-2 S protein but shares a low overall sequence identity (~32%), has a conservative substitution at the position equivalent to D614 of the latter: the Asn (N655) of MHV-A59 corresponds to Asp (D614) in SARS-CoV-2 (figure 5 and figure 7). In earlier literature, N655 has been suggested to offer inter-protomeric interactions that contribute toward maintenance of the S2 fusion machinery in its metastable state (Walls et al. 2016). Given the conservation of Asp at this position in closely related coronaviruses (Bat coronaviruses: BtCoV-RaTG13 and BtCoV-HKU3; SARS-CoV) and its conservative substitution in mouse coronavirus (MHV-A59), it is likely that D614 is important for the structural stability of the S protein.
As Gly lacks a side-chain, the transient hydrogen bond observed in the wild-type S protein would be lost in the variant with the D614G mutation. This could compromise the structural stability of the pre-fusion state of the S protein, possibly interfering with conformational transitions. Moreover, replacement of Asp with Gly at this position would give the polypeptide backbone greater conformational freedom (Ramakrishnan and Ramachandran 1965), resulting in an increase in local conformational entropy.
Possible implications of G1124V mutation (in S2 subunit) on protein structural stability
The Gly at this position is solvent exposed and is present at the C-terminal tip of a β-strand. This position is proximal to the region where the S protein attaches to the viral membrane (figure 5). Notably, the Gly at this position is conserved among the closely related coronaviruses (Bat coronaviruses RaTG13 and HKU3, SARS-CoV), hinting at a possible role in maintaining the structure and function of the S protein (figure 7). In general, as explained above, the backbone of Gly has higher conformational freedom than that of any other amino acid residue (Ramakrishnan and Ramachandran 1965). Therefore, substitution of Gly with Val would impart rigidity to the local region. The possible implication of such rigidity for the association of the S protein with the viral membrane could be understood from a structure of the S protein in association with the viral membrane. However, such a structure is currently unavailable.