Biological context

Over the last 20 years three major human infections related to Coronaviruses attracted the attention of the global scientific community. The most recent infection, named COVID-19, spread rapidly worldwide and turned into a pandemic as declared by WHO. This disease is caused by Severe Acute Respiratory Syndrome coronavirus 2 (SARS-CoV-2), which along with SARS-CoV and MERS-CoV (Middle Eastern respiratory syndrome) coronaviruses, belongs to the beta Coronavirinae subfamily (i.e. 4 genera of Coronaviridae). All of the aforementioned members share a common morphology possessing a positive-sense single-stranded RNA (+ ssRNA) genome of up to 30 kb in length, encapsulated in a spherical virion (Burrell 2017). The 30 kb RNA of coronaviruses is the largest known RNA viral genome, containing multiple ORFs and possessing 5’ cap, 3’ poly-A tail, 5’ and 3’ UTRs with important roles in the regulations of replication and transcription (Snijder et al. 2016; Lei et al. 2018; Alhammad and Fehr 2020a).

One third of this genome (located at the proximal 3’ end) encodes genes that correspond to the Coronaviruses virion main structural proteins: the spike glycoprotein (S), the transmembrane protein (M) and the internal phosphorylated nucleocapsid protein (N); another minor protein is a small transmembrane protein (E). Furthermore, some coronaviruses exhibit a gene for an additional envelope protein with esterase and hemagglutination functions (HE), the latter is also found in influenza virus (Burrell et al. 2017; Alhammad and Fehr 2020a).

The two thirds at the proximal 5′ end of the genome consists of two ORF (ORF 1a and 1b). A ribosomal frame-shift leads to the translation of two polyproteins (pp1a and pp1ab) which are further processed to the necessary elements for the formation of the replication/transcription viral complex (RTC) (Perlman et al. 2009; V’kovski et al. 2020). In general, three virus-encoded proteases play a key role in the polyproteins cleavage. The papain-like protease 1(PL1pro), papain-like protease 2 (PL2pro) and the 3C-like main protease (Mpro). Among the resulting 16 proteins, generated by this cleavage, the multi-domain non-structural protein 3 (nsP3) is the largest and it is considered as a key component with various roles in different stages of the viral-life, from RTC formation to host proteins interactions. Nevertheless, there are still many undefined functions that have to be further investigated (Snijder et al. 2016; Burrell et al. 2017; Lei et al. 2017).

In contrast with the most of coronaviruses, SARS-CoV-2 genome encodes only two proteases (PLpro and Mpro). nsP3 contains PLpro and other globular domains such as the macro or “X” domain (Snijder et al. 2016). The macro domain is conserved in many viruses with a three-layer α/β/α sandwich fold, exhibiting ADP-ribose binding properties and ADP-ribose hydrolase activity and it is termed here nsP3b (Lei et al. 2018; Frick et al. 2020; Michalska et al. 2020; Alhammad et al. 2020b). nsP3b is sequentially followed by nsP3c the so-called SARS Unique Domain (SUD) (Tan et al. 2007; Tan et al. 2009; Chatterjee et al. 2009; Serrano et al. 2009; Johnson et al. 2010; Kusov et al. 2015). These domains occur mainly in SARS-CoV and they have been also related to viruses found in certain bats (Lei et al. 2017). SUD exhibits a three-domain architecture; SUD N, SUD M for middle domain, and SUD C.

Despite the similar fold-type compared to the macro domain of the two first SUDs, N and M, there is no sufficient experimental evidence for common functional properties with the macro domain. However, it is believed that the dual domain polypeptide of SARS-CoV, SUD-NM, binds oligo(G)-nucleotides capable of forming G-quadruplexes. Mutations either on SARS-CoV SUD-N or SUD-M reduce or abolish the G-quadruplex binding capacity and selectivity of these domains (Tan et al. 2009). According to in vitro experiments, it is now believed that due to this binding affinity of SUDs to G-oligonucleotides, SUDs may be involved in the virus’ RTC machinery and they may control the host-cell’s response to the viral infection (Kusov et al. 2015). Therefore, considering the functional role of both SARS-CoV macro and SUDs and the recent SARS-CoV-2 outbreak, the discovery of antiviral therapeutics that target these domains is of great interest. In the absence of the 3D structure of any of the SARS-CoV-2 SUDs, we report herein a preliminary NMR study and the complete backbone and side chains chemical shifts assignment of the SARS-CoV-2 SUD-N (spanning the residues 409–548; according to nsP3 numbering). These data provide the basis for drug-screening attempts and may be exploited in the identification of a lead compound with potential antiviral interest through the testing of a large chemical compound library.

Methods and experiments

Construct design

The coding sequence of the SUD-N domain (residues 409-548 of nsP3) was amplified using primers (forward: 5′ CGCGGATCCCAAGACGATAAG 3′ and reverse 5′ CCGCTCGAGTTATTATTCTTGTTTCTC 3′ designed on cDNA sequence encoding nsP3 residues 201-745 (GenBank entry: MT066156.1- nucleotide numbering of the whole genome 3319–4954- GenBank entry: QIA985533 orf1ab—protein numbering of 1019–1563) obtained as a codon optimized for expression in E. coli, synthetic gene from GenScript, (Piscataway, NJ). SARS-CoV-2 SUD-N coding sequence was cloned into pGex4T-1 expression vector, containing an N-terminal GST-tag followed by a thrombin cleavage site. The produced protein contained two artificial N-terminal residues (G-1, S-0) preceding the native protein sequence.

Protein expression and uniform 15N and 15N/13C labeling

An LB preculture was inoculated with BL21 (DE3) E.coli cells transformed with the above mentioned plasmid, and was grown overnight at 37 ˚C, 180 rpm with 0.1 mg/mL ampicillin. A culture of 0.5 L M9 medium (40 mM Na2HPO4, 22 mM KH2PO4, 8 mM NaCl) containing 0.5 g 15N labeled NH4Cl and 2 g unlabeled or 13C D-glucose, 1 mL from a stock solution containing 0.5 mg/mL biotin and 0.5 mg/mL thiamine, 0.5 mL 1 M Mg2SO4, 0.15 mL 1 M CaCl2, 1 mL solution Q (40 mM HCl, 50 mg/L FeCl2 . 4H2O, 184 mg/L CaCl2 · 2H2O, 64 mg/L H3BO3, 18 mg/L CoCl2 · 6H2O, 4 mg/L CuCl2 · 2H2O, 340 mg/L ZnCl2, 710 mg/L Na2MoO4 · 2H2O, 40 mg/L MnCl2 · 4H2O), and 0.1 mg/mL ampicillin was inoculated with the preculture. When the OD600 reached 0.6–0.8, IPTG was added to final concentration of 1 mM and the culture was incubated at 18 °C. Sixteen hours (16 h) after induction the cells were harvested by centrifugation.

Protein purification and sample preparation

The cell pellet was re-suspended with 25 mL lysis buffer containing 50 mM Tris-HCl pH 8, 300 mM NaCl, 2 mM DTT. Then protease inhibitor cocktail (Sigma Aldrich® P8849) and 25 μL DNase (1 mg/mL) were added to the re-suspended cells, and they were sonicated (PMisonix®, Sonicator 4000), after the sonication took place, the suspension was centrifuged at 4 °C and 20.000 rpm (Thermo Scientific®, Sorvall Lynx 6000) for 30 min. The soluble fraction containing the GST-tagged SARS-CoV2 SUD N was loaded onto a GSTrap HP affinity column (GE Healthcare) that had been previously equilibrated with 25 mL binding buffer (50 mM Tris-HCl pH 8, 300 mM NaCl). The column was washed with 50 mM Tris-HCl pH 8, 300 mM NaCl and then 100 μL thrombin (10 mg/mL) in 5 mL binding buffer were added to the column in order to remove the GST tag from the SUD-N. The column was incubated with thrombin for 16 h (overnight) at 4 °C. After the overnight incubation, the SUD-N was eluted with 50 mM Tris-HCl pH 8, 300 mM NaCl, as was verified with a 17% SDS-PAGE gel and finally GST-elution (50 mM Tris-HCl, pH 8, 10 mM Reduced Glutathione). With the use of an Amicon® Ultra 15 mL Centrifugal Filter membrane (nominal molecular weight cutoff 10 kDa) the protein was concentrated to final volume of 1 mL and buffer exchange was performed, to 50 mM NaPi, pH 7.2, 50 mM NaCl, 2mM EDTA, 2mM DTT. Finally, size exclusion chromatography was performed using Superdex 75 Increase 10/300 GE Healthcare column. The fractions containing the pure protein were pooled together and were concentrated to final volume of 500 μL to prepare the NMR sample.

Data acquisition, processing and assignment

Protein samples for NMR experiments were prepared in 500 μL of a mixed solvent of 90% H2O (50 mM NaPi, 50 mM NaCl, pH 7.2) and 10% D2O, 2 mM DTT, 2 mM EDTA, 2 mM NaN3, bacterial inhibitor cocktail (Sigma Aldrich®) and 0.25 mM DSS (4,4-dimethyl-4-silapentane-1-sulfonic acid) as internal 1H chemical shift standard. 13C and 15N chemical shifts were referenced indirectly to the 1H standard using a conversion factor derived from the ratio of NMR frequencies (Wishart et al. 1995). The protein concentration in the NMR sample was in the range of 0.9-1 mM. All NMR experiments were recorded at 298 K on a Bruker Avance III High-Definition four-channel 700 MHz NMR spectrometer equipped with a cryogenically cooled 5 mm 1H/13C/15N/D Z-gradient probe (TCI). The acquired NMR experiments used for sequence specific assignment are summarized in Table 1.

Table 1. List of NMR experiments acquired to perform the sequence specific assignment of apo-nsP3c SUD-N domain and their main acquisition parameters

Results

Assignments and data deposition

Backbone and sidechains assignments for the SARS-CoV-2 SUD-N were obtained from the following heteronuclear experiments: 2D [1H,15N]-HSQC and 2D [1H,15N]-TROSY, 3D HN(CO)CA, 3D HNCA, 3D TROSY HN(CO)CACB, 3D TROSY HNCACB, 3D HN(CA)CO, 3D HNCO, 3D HNHA, 3D HBHA(CO)NH, aliphatic 3D (H)CCH TOCSY, 2D [1H,13C]-HSQC and 3D 15N-edited NOESY (Table 1). To help the assignment process CBCA(CO)NH selective experiments have been performed in order to identify residues without CG and residues such as Ala, Cys and Ser (Table 1). All NMR data were processed with TOPSPIN 4.0.6 and analyzed with CARA 1.9.2a4 (Keller 2004). The 2D 1H,15N-HSQC spectrum shows well-dispersed amide signals (Fig. 1). For nsP3c SUD-N we assigned 95% of the resonances of the backbone atoms (HN, N, Hα, CO, Cα and Cβ) and about 80% of all the atom of side chains. The unassigned residues of nsP3c SUD-N are Gln409, Lys486, Lys487, Thr491, Glu492, Leu516 and Asn517 and are part of the loop regions as shown in Fig. 2.

Fig. 1
figure 1

700 MHz 1H,15N-HSQC assigned spectrum of the 13C,15N-labelled SARS-CoV-2 SUD-N nsP3c at 1 mM concentration in 50 mM NaPi pH 7.2, 50 mM NaCl, 2 mM EDTA, 2 mM DTT and 10% D2O acquired at 298 K. Amino acid numbering is according to the sequence of the multi-domain non-structural protein 3 (nsP3)

Fig. 2
figure 2

Predicted secondary structure of SARS-CoV-2 SUD-N nsP3c using TALOS+

Secondary structure prediction was performed using chemical shift assignments of five atoms (HN, Hα, Cα, Cβ, CO, N) for each residue in the sequence using TALOS+ (Shen et al. 2009). The secondary structure elements are in the following order from N-terminal to C-terminal: β/α/β/α/α/β/β/α. The order of the secondary structure segments is very similar to that of the nsP3b protein (Cantini et al 2020).

Chemical shift values for the 1H, 13C and 15N resonances of SARS-CoV-2 nsP3c SUD-N have been deposited at the BioMagResBank (https://www.bmrb.wisc.edu) under accession numbers 50448.