SARS-CoV-2 RNA, nsP3c (non-structural Protein3c) spans the sequence of the so-called SARS Unique Domains (SUDs), first observed in SARS-CoV. Although the function of this viral protein is not fully elucidated, it is believed that it is crucial for the formation of the replication/transcription viral complex (RTC) and of the interaction of various viral “components” with the host cell; thus, it is essential for the entire viral life cycle. The first two SUDs, the so-called SUD-N (the N-terminal domain) and SUD-M (domain following SUD-N) domains, exhibit topological and conformational features that resemble the nsP3b macro (or “X”) domain. Indeed, they are all folded in a three-layer α/β/α sandwich structure, as revealed through crystallographic structural investigation of SARS-CoV SUDs, and they have been attributed to different substrate selectivity as they selectively bind to oligonucleotides. On the other hand, the C-terminal SUD (SUD-C) exhibit much lower sequence similarities compared to the SUD-N & SUD-M, as reported in previous crystallographic and NMR studies of SARS-CoV. In the absence of the 3D structures of SARS-CoV-2, we report herein the almost complete NMR backbone and side-chain resonance assignment (1H,13C,15N) of SARS-CoV-2 SUD-M and SUD-C proteins, and the NMR chemical shift-based prediction of their secondary structure elements. These NMR data will set the base for further understanding at the atomic-level conformational dynamics of these proteins and will allow the effective screening of a large number of small molecules as binders with potential biological impact on their function.
A novel coronavirus (SARS-CoV-2) that causes the disease Coronavirus Disease 2019 (COVID-19) emerged in a seafood and poultry market in the Chinese city of Wuhan in 2019 (Li et al. 2020). Cases have been detected in most countries worldwide, and on March 11, 2020, the World Health Organization characterized the outbreak as a pandemic. Other coronaviruses that have plagued humankind till these days are namely, SARS-CoV (identified in 2003) and MERS-CoV (Middle East Respiratory Syndrome; first reported in Saudi Arabia in 2012). Since these three viruses belong to the same family, they share significant similarities including giving rise to severe symptoms and having high pathogenicity in humans. However, despite the structural similarities and other common features in pathogenicity, studies have identified some differences inter alia in the spike protein (S), which is believed to be of immense importance for the increased efficiency in SARS-CoV-2 transmission and spread (Ou et al. 2020). Therefore, it is important to determine the protein structure differences among these coronaviruses in an attempt to elucidate the mechanisms and the factors that induce virulence of each individual virus.
Among the protein domains that are common in SARS viruses are, the so-called, SARS Unique Domains (SUDs), first identified in SARS-CoV. The polypeptide that includes these SUDs domains is part of the non-structural protein 3 (nsP3), and it is named nsP3c. It consists of three separate domains, SUD-N that is located at the N-terminal of SUD, SUD-M (middle domain) and SUD-C that is the smallest of the three (Tan et al. 2007; Johnson et al. 2010; Serrano et al. 2009; Kusov et al. 2015; Burrell et al. 2017; Lei et al. 2018). These three SARS-CoV-2 domains share sequence identity of 68.57%, 81.6% & 73.44%, respectively, with SARS-CoV SUD N, M and C domains (Fig. 1).
SUD-N and SUD-M exhibit a macro-like folding, α/β/α sandwich fold consisting of ~ 120–140 amino acids. According to the literature, they have a greater affinity for oligonucleotides instead of binding ADP-ribose (ADPr) and they lack the capacity of macro domains to hydrolyze attached ADPr molecules as well as their potential inability to de-MARylate substrates (Tan et al. 2009; Alhammad et al. 2020). Unlike the two previous domains, SUD-C is the shortest in length domain (~ 60–70 amino acids) and has a frataxin-like or a double-wing motif α/β fold, consisting of five antiparallel β sheets, packed against two α helices (Johnson et al. 2010; Chatterjee et al. 2009; Tan et al. 2009). The proposed function of SUD-M and SUD-C is that of binding of G-quadruplex forming RNA (Hammond et al. 2017). It has also been reported that SUD-C from bat coronavirus has DNA and metal ion-binding properties (Staup et al. 2019). Specifically, SUD-M, as a single domain, has been reported to bind (GGGA)2 and (GGGA)5 as well as (GGGA)2GG while SUD-MC, as a double domain, only binds to (GGGA)2GG but not (GGGA)2 or (GGGA)5, suggesting SUD-C might play a role in tuning the selectivity of binding of SARS Unique Domain (Johnson et al. 2010). Moreover, in vivo experiments shed light on SUD-M as an essential domain for the replication of the viral genome, in contrast to SUD-N and SUD-C, which are nonessential for virus genome replication (Kusov et al. 2015). One of the reported features of SUDs is the interaction with host proteins, like RCHY1, which is an E3 ligase that regulates the function of p53 protein, which might have an antiviral role (Ma-Lauer et al. 2016). These interactions might also be crucial for the acute symptoms experienced by the individuals infected with the virus. In addition, a recent study demonstrated that SUD-MC interacts with specific cellular components, affecting the pulmonary inflammation (Chang et al. 2020). Conformational dynamics and interaction properties of SUDs may be of great interest for the detailed functional characterization of the viral components and/or the discovery or the identification of new lead compounds that bind to these proteins in the quest for new antiviral drugs.
We report herein the complete backbone and side chains chemical shift assignments of the SARS-CoV-2 SUD-M and SUD-C (spanning the residues 551–675 and 680–743; according to nsP3 numbering, respectively). These data can be exploited for the elucidation at the atomic level of the structure, dynamics and interaction of these domains with a library of chemical compounds with potential antiviral properties.
Methods and experiments
The coding sequences of the SUD-M domain (551-675 of nsP3) and SUD-C domain (680-743 of nsP3) were amplified using primers (fwd: 5′ GAATTCCATATGGGTACCGTGAGCTGGAAC 3′ and rev: 5′ CCGCTCGAGTTATTAGCTGCTGGTCAG 3′) and (fwd: 5′ CGCGGATCCGAGGAACACTTCATCG 3′ and rev: 5′ CCGCTCGAGTTATTAGCTCAGCAGGG 3′, respectively. cDNA sequence encoding nsP3 residues 201–745 (GenBank entry: MT066156.1- nucleotide numbering of the whole genome 3319–4954- GenBank entry: QIA98553 orf1ab—protein numbering of 1019–1563) was used to design the primers. This sequence was synthesized, and codon optimized for expression in Escherichia coli, by GenScript, (Piscataway, NJ). SARS-CoV-2 SUD-M coding sequence was cloned into pET28a(+) expression vector, containing an N-terminal His-tag followed by a thrombin cleavage site. The produced protein contained four artificial N-terminal residues (GSHM) preceding the native protein sequence. The SARS-CoV-2 SUD-C coding sequence was cloned into pGEX4T-1 expression vector, containing an N-terminal GST-tag followed by a thrombin cleavage site. The produced protein contained two artificial N-terminal residues (GS) preceding the native protein sequence.
Protein expression and uniform 15N and 15N/13C labeling
For the expression of SARS UNIQUE DOMAIN M (SUD-M) and SARS UNIQUE DOMAIN C (SUD-C), in 0.5 L M9 culture (40 mM Na2HPO4, 22 mM KH2PO4, 8 mM NaCl) containing 0.5 g 15N labeled NH4Cl, 2 g unlabeled or 13C d-glucose, 1 mL from a solution containing 0.5 mg/L biotin, 0.5 mg/L thiamin, 0.5 mL 1 M Mg2SO4, 0.15 mL 1 M CaCl2, 1 mL solution Q (40 mM HCl, 50 mg/L FeCl2·4H2O, 184 mg/L CaCl2·2H2O, 64 mg/L H3BO3, 18 mg/L CoCl2·6H2O, 4 mg/L CuCl2·2H2O, 340 mg/L ZnCl2, 605 mg/L Na2MoO4·2H2O, 40 mg/L MnCl2·4H2O), and 1 mg/L of kanamycin (for SUD-M) and 1 mg/L of ampicillin (for SUD-C), an LB preculture that was inoculated with BL21 (DE3) E. coli cells transformed with the above mentioned plasmid (that was grown overnight at 37 °C, 180 rpm) was added. The culture was incubated in 37 °C, 180 rpm until the OD600 was between 0.6 and 0.8, then IPTG was added to final concentration of 1 mM and the culture incubated overnight (16 h) at 18 °C.
Protein purification performed according to standard protocols and details will be published elsewhere. The final NMR samples (concentration 0.9 mM for SUD-M and 0.7 mM for SUD-C) were prepared by adding 10% D2O and 0.25 mM DSS.
Data acquisition, processing and assignment
Protein NMR samples for SUD-M and SUD-C domains were prepared in 500 μL buffer at pH 7.2 containing 50 mM NaPi, 50 mM NaCl, 10% D2O, 2 mM DTT, 2 mM EDTA, 2 mM NaN3, bacterial inhibitor cocktail (Sigma Aldrich®) and 0.25 mM DSS (4,4-dimethyl-4-silapentane-1-sulfonic acid) as internal 1H chemical shift standard. 13C and 15N chemical shifts were referenced indirectly to the 1H standard using a conversion factor derived from the ratio of NMR frequencies (Wishart et al. 1995). The protein concentration in the NMR sample was 0.9 mM for SUD-M and 0.7 mM for SUD-C. All NMR experiments were recorded at 298 K on a Bruker Avance III High-Definition four-channel 700 MHz NMR spectrometer equipped with a cryogenically cooled 5 mm 1H/13C/15N/D Z-gradient probe (TCI). The acquired NMR experiments used for sequence specific assignment are summarized in Table 1. Backbone and sidechains assignments for both SARS-CoV-2 SUD-M and SUD-C domains were obtained from the following series of heteronuclear experiments: 2D [1H,15N]-HSQC and 2D [1H,15N]-TROSY, 3D HN(CO)CA, 3D HNCA, 3D TROSY HN(CO)CACB, 3D TROSY HNCACB, 3D HN(CA)CO, 3D HNCO, 3D HNHA, 3D HBHA(CO)NH, aliphatic 3D (H)CCH TOCSY, 2D [1,13C]-HSQC and 3D 15N-edited NOESY (Table 1). We also performed CBCA(CO)NH selective experiments in order to help the identification of residues without CG and residues such as Ala, Cys and Ser (Table 1). All NMR data were processed with TOPSPIN 4.0.6 and analyzed with CARA 1.9.2a4 (Keller 2004).
Extent of assignments and data deposition
The 2D 1H,15N-HSQC spectrum shows well-dispersed amide signals as shown in Fig. 2 for SUD-M and in Fig. 3 for SUD-C, respectively. For nsP3c SUD-M we assigned 98.6% of the resonances of the backbone atoms (HN, N, H α, CO, Cα and Cβ) and about 70% of all the atom of side chains including also the aromatic rings. The only unassigned HN and N resonances of nsP3c SUD-M belong to Ala579, Gly590 and the N of Pro572, Pro631, Pro654, Pro662 (not possible to assign following classic triple resonance proton detected experiments). Gly590 and the prolines are part of the loop regions or part of unstructured regions as shown in Fig. 4, instead Ala579 is positioned at the beginning of an α -helix. For nsP3c SUD-C we assigned 99.1% of the resonances of the backbone atoms (HN, N, Hα, CO, Cα and Cβ) and 80% of all the atom of side chains including the aromatic rings. The only unassigned HN and N resonances of nsP3c SUD-C are belonging to Gln704 and to the backbone nitrogen of Pro723, which are part of the loop regions or part of less structured regions as shown in Fig. 5.
Secondary structure prediction for both SUD domains (M and C) were performed using chemical shift assignments of five atoms (HN, H α, Cα, Cβ, CO, N) for each residue in the sequence using TALOS+ (Shen et al. 2009). The secondary structure elements for SUD-M protein (125 a.a.) are organized in the following order from N- to C-terminus: β/α/β/α/β/β/α/β/α/α/β/α (Fig. 4). The order of the secondary structure segments is very similar to that of the nsP3b protein (Cantini et al 2020) and to SUD-N domain of nsP3c, beside two extra β strands and an α -helix secondary structure elements. This domain has also high secondary structure identity in comparison with SUD-M domain from SARS-CoV. The secondary structure elements for SUD-C protein (64 a.a.) are organized in the following order from N- to C-terminus: α/β/β/β/β/α (Fig. 5). This domain has high secondary structure identity in comparison with SUD-C domain from SARS-CoV previously characterized and its secondary structure folding is very similar (Johnson et al. 2010).
Chemical shift values for the 1H, 13C and 15N resonances of SARS-CoV-2 nsP3c SUD-M and SUD-C have been deposited at the BioMagResBank (https://www.bmrb.wisc.edu) under accession numbers 50516 and 50517, respectively.
Alhammad YMO, Kashipathy MM, Roy A, Johnson DK, McDonald P, Battaile KP, Gao P, Lovell S, Fehr AR (2020) The SARS-CoV-2 conserved macrodomain is a highly efficient ADP-ribosylhydrolaseenzyme. bioRxiv. https://doi.org/10.1101/2020.05.11.089375
Burrell CJ, Howard CR, Murphy FA (2017) Coronaviruses. Fenner and white’s medical virology, 5th edn. Elsevier, Amsterdam, pp 437–446
Cantini F, Banci L, Altincekic N et al (2020) 1H, 13C, and 15N backbone chemical shift assignments of the apo and the ADP-ribose bound forms of the macrodomain of SARS-CoV-2 non-structural protein 3b. Biomol NMR Assign 14:339–346
Chang YS, Ko BH, Ju JC, Chang HH, Huang SH, Lin CW (2020) SARS Unique Domain (SUD) of severe acute respiratory syndrome coronavirus induces NLRP3 inflammasome-dependent CXCL10-mediated pulmonary inflammation. Int J Mol Sci 21:3179
Chatterjee A, Johnson MA, Serrano P, Pedrini B, Joseph JS, Neuman BW, Saikatendu K, Buchmeier MJ, Kuhn P, Wüthrich K (2009) Nuclear magnetic resonance structure shows that the severe acute respiratory syndrome coronavirus-unique domain contains a macrodomain fold. J Virol 83:1823–1836
Hammond RG, Tan X, Johnson MA (2017) SARS-unique fold in the Rousettus bat coronavirus HKU9. Protein Sci. 26:1726–1737
Johnson MA, Chatterjee A, Neuman BW, Wüthrich K (2010) SARS coronavirus unique domain: three-domain molecular architecture in solution and RNA binding. J Mol Biol 400:724–742
Keller R (2004) The computer aided resonance assignment tutorial, 1st edn. Cantina Verlag, Goldau. ISBN 3-85600-112-3
Kusov Y, Tan J, Alvarez E, Enjuanes L, Hilgenfeld R (2015) A G-quadruplex-binding macrodomain within the “SARS-unique domain” is essential for the activity of the SARS-coronavirus replication-transcription complex. Virology 484:313–322
Lei J, Kusov Y, Hilgenfeld R (2018) NsP3 of coronaviruses: structures and functions of a large multi-domain protein. Antivir Res 149:58–74
Li Q, Guan X, Wu P et al (2020) Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia. N Engl J Med 382:1199–1207
Ma-Lauer Y, Carbajo-Lozoya J, Hein MY, Müller MA, Deng W, Lei J et al (2016) P53 down-regulates SARS coronavirus replication and is targeted by the SARS-unique domain and PLpro via E3 ubiquitin ligase RCHY1. Proc Natl Acad Sci USA 113:E5192–E5201
Ou X, Liu Y, Lei X et al (2020) Characterization of spike glycoprotein of SARS-CoV-2 on virus entry and its immune cross-reactivity with SARS-CoV. Nat Commun 11:1620
Serrano P, Johnson MA, Chatterjee A, Neuman BW, Joseph JS, Buchmeier MJ, Kuhn P, Wüthrich K (2009) Nuclear magnetic resonance structure of the nucleic acid-binding domain of severe acute respiratory syndrome coronavirus nonstructural protein 3. J Virol 83:12998–13008
Shen Y, Delaglio F, Cornilescu G, Bax A (2009) TALOS+: a hybrid method for predicting protein backbone torsion angles from NMR chemical shifts. J Biomol NMR 44:213–223
Staup AJ, De Silva IU, Catt JT, Tan X, Hammond RG, Johnson MA (2019) Structure of the SARS-unique domain C from the bat coronavirus HKU4. Nat Prod Commun 14:1934578X19849202
Tan J, Kusov Y, Mutschall D, Tech S, Nagarajan K, Hilgenfeld R, Schmidt CL (2007) The “SARS-unique domain” (SUD) of SARS coronavirus is an oligo(G)-binding protein. Biochem Biophys Res Commun 364:877–882
Tan J, Vonrhein C, Smart OS, Bricogne G, Bollati M, Kusov Y, Hansen G, Mesters JR, Schmidt CL, Hilgenfeld R (2009) The SARS-unique domain (SUD) of SARS coronavirus contains two macrodomains that bind G-quadruplexes. PLoS Pathog 5:e1000428
Wishart DS, Bigam CG, Yao J, Abildgaard F, Dyson HJ, Oldfield E, Markley JL, Sykes BD (1995) 1H, 13C and 15N chemical shift referencing in biomolecular NMR. J Biomol NMR 6:135–140
Work at BMRZ is supported by the state of Hesse. Work in COVID19-NMR was supported by the Goethe Corona Funds and the DFG in CRC902: “Molecular Principles of RNA-based regulation.” Work at CERM is supported by the Italian Ministry for University and Research (FOE funding) to the Italian Center (CERM, University of Florence) of Instruct-ERIC, a European Research Infrastructure, ESFRI Landmark. This work was also supported by the INSPIRED (MIS 5002550) which is implemented under the Action ‘Reinforcement of the Research and Innovation Infrastructure,’ funded by the Operational Program ‘Competitiveness, Entrepreneurship and Innovation’ (NSRF 2014–2020) and co-financed by Greece and the European Union (European Regional Development Fund). EU FP7 REGPOT CT-2011-285950—“SEE-DRUG” project is acknowledged for the purchase of UPAT’s 700 MHz NMR equipment.
Conflict of interest
The authors declare no conflict of interest.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Gallo, A., Tsika, A.C., Fourkiotis, N.K. et al. 1H,13C and 15N chemical shift assignments of the SUD domains of SARS-CoV-2 non-structural protein 3c: “The SUD-M and SUD-C domains”. Biomol NMR Assign 15, 165–171 (2021). https://doi.org/10.1007/s12104-020-10000-9
- Non-structural protein
- SARS unique domain (SUD)
- Solution NMR-spectroscopy
- Protein druggability