Sample preparation
The primary sequence of N from SARS-CoV-2 was extracted from NCBI genome entry NC_045512.2 [GenBank entry MN908947.3 (Wu et al. 2020)]. Commercially synthesized genes (Genscript Biotech) were codon-optimized for expression in Escherichia coli and subcloned in a pET21b(+) vector. Hexa-histidine and TEV-cleavage tags were included at the N-terminus to facilitate protein purification. After protease cleavage, the proteins contain N-terminal GRR- extensions. For 15N and 13C isotope labelling, cells were grown in M9 minimal medium supplemented with 15N-NH4Cl and 13C6-D-glucose (1 and 2 g/L, respectively).
Both nucleoprotein constructs (N123, comprising residues 1–263, and N3, comprising residues 175–263) were cloned into a pESPRIT vector between the AatII and NotI cleavage sites with His8-tag and TEV cleavage sites at the N-terminus (GenScript Biotech Netherlands). Transformation was performed by heat-shock and proteins were expressed in E. coli BL21(DE3) (Novagen) for 5 h at 37°C after induction at an optical density of 0.6 with 1 mM isopropyl-β-D-thiogalactopyranoside. Cells were harvested by centrifugation at 5000 rpm, resuspended in 20 mM Tris (pH 8.0) and 500 mM NaCl buffer, lysed by sonication, and centrifuged again at 18,000 rpm at 4°C. The supernatant was subjected to standard Ni-affinity purification. Proteins were eluted with 20 mM Tris (pH 8), 500 mM NaCl and 500 mM imidazole. Samples were then dialysed against 20 mM Tris (pH 8), 500 mM NaCl, 2 mM DTT at room temperature overnight. Following TEV cleavage, samples were concentrated and subjected to size exclusion chromatography (SEC, Superdex 75/200) in 50 mM Na-Phosphate (pH 6.5), 250 mM NaCl, 2 mM DTT buffer (NMR buffer). Proteins were studied at 600 and 150 μM for N3 and 0.91 mM for N123.
NMR experiments
BEST and BEST-TROSY (BT) double and triple resonance assignment experiments, including BEST-HNCA, BEST-HN(CO)CA and BT-HNCO (Lescop et al. 2007; Solyom et al. 2013), were recorded on 15N,13C-labeled samples at 298 K using Bruker Avance III spectrometers equipped with a cryoprobe at 1H frequencies of 600 and 850 MHz. R1rho relaxation experiments (Lakomek et al. 2012) were recorded at 150 μM protein concentration and 298 K in a 50 mM Na-Phosphate (pH 6.5), 250 mM NaCl, 2 mM DTT buffer at 950 MHz.
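As an illustration of how a per-residue R1rho rate is commonly extracted from such an experiment, the sketch below fits a mono-exponential decay, I(t) = I0 exp(−R1rho t), to peak intensities by linear regression on ln(I). It is a minimal sketch under stated assumptions and not the analysis procedure used in this work; the function name, the relaxation delays and the intensities are hypothetical.

```python
import math


def fit_r1rho(delays_s, intensities):
    """Fit I(t) = I0 * exp(-R * t) by least squares on ln(I).

    delays_s    : relaxation delays in seconds (hypothetical values below)
    intensities : peak intensities at each delay, all strictly positive
    Returns (R in s^-1, I0).
    """
    if len(delays_s) != len(intensities) or len(delays_s) < 2:
        raise ValueError("need at least two (delay, intensity) pairs")
    if any(i <= 0 for i in intensities):
        raise ValueError("intensities must be positive for a log-linear fit")

    n = len(delays_s)
    log_i = [math.log(i) for i in intensities]
    mean_t = sum(delays_s) / n
    mean_y = sum(log_i) / n

    # Ordinary least-squares slope and intercept of ln(I) versus t.
    sxx = sum((t - mean_t) ** 2 for t in delays_s)
    if sxx == 0:
        raise ValueError("delays must not all be identical")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(delays_s, log_i))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t

    return -slope, math.exp(intercept)


# Synthetic, noise-free decay used only to exercise the function.
delays = [0.001, 0.02, 0.05, 0.1, 0.2]
true_rate = 5.0
peaks = [1000.0 * math.exp(-true_rate * t) for t in delays]
rate, i0 = fit_r1rho(delays, peaks)
```

A log-linear fit weights noisy low-intensity points more heavily than a direct non-linear fit would, so for real data a non-linear least-squares fit with error estimation is usually preferable; the linear form is used here only because it needs no external libraries.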
Residues 217–224 and 231–235 of N3 were assigned using a deuterated 13C,15N-labelled sample at 300 µM. BEST-HNCA and BEST-HN(CO)CA experiments with a single increment in the 15N dimension were recorded at 850 MHz, allowing detection of weak additional peaks from the helical region and their connection to those already assigned. All spectra were processed using NMRPipe (Delaglio et al. 1995) and analyzed using CCPNMR Analysis Assign (Skinner et al. 2016) and NMRFAM-SPARKY (Lee et al. 2015). Assignment was further assisted by comparison to BMRB entry 34511 (Dinesh et al. 2020) of the N-terminal RNA-binding domain of the SARS-CoV-2 nucleoprotein, specifically nucleoprotein residues 44 to 180 (N2).
Assignments and data deposition
The HSQC of N3 is typical of an intrinsically disordered protein (Fig. 1). Most peaks in N3 are reproduced in the spectrum of N123, which also reveals the presence of the folded RNA-binding domain N2, in addition to disordered peaks from N1 (intensities of peaks from the first 15 residues of N3 were reduced in N123, probably due to their proximity to N2). The assignment of N2 has been published (Dinesh et al. 2020), and the assignments of N1 in N123 (Fig. 2) and of N3, both in its isolated form and in the context of N123, were accomplished using standard triple resonance approaches. A high percentage of resonances could be assigned in N3 (84% 1HN, 84% 15NH, 82% 13Cα and 76% 13C′) and in N1 in the context of N123 (94% 1HN, 94% 15NH, 100% 13Cα, 57% 13Cβ and 100% 13C′). These assignments have been deposited in the Biological Magnetic Resonance Data Bank (BMRB IDs: 50557, comprising the backbone resonance assignment of N3, and 50558, comprising the backbone resonance assignment of N1 and of resolved resonances in N3, both in the context of N123).
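Completeness figures of this kind are the ratio of assigned to expected nuclei for each atom type. The sketch below shows one way such percentages could be computed; it is illustrative only, the function name and the toy peptide are hypothetical, and the choice of which nuclei count as expected (here, excluding the amide of proline and of the N-terminal residue) is an assumption of the sketch rather than a statement of how the values above were derived.

```python
def assignment_completeness(assigned, expected):
    """Return {atom_type: percent assigned}.

    assigned : dict mapping atom type -> set of residue numbers with a shift
    expected : dict mapping atom type -> set of residue numbers where that
               nucleus is expected to be observable
    Atom types with no expected residues are skipped.
    """
    result = {}
    for atom, expected_residues in expected.items():
        if not expected_residues:
            continue
        # Count only assignments that fall on expected residues.
        found = assigned.get(atom, set()) & expected_residues
        result[atom] = 100.0 * len(found) / len(expected_residues)
    return result


# Toy 10-residue peptide (hypothetical), with a proline at position 4.
residues = set(range(1, 11))
amide_residues = residues - {1, 4}  # no amide signal for N-terminus or proline

expected = {"H": amide_residues, "N": amide_residues, "CA": residues}
assigned = {
    "H": {2, 3, 5, 6, 7, 8},
    "N": {2, 3, 5, 6, 7, 8},
    "CA": residues - {9},
}
completeness = assignment_completeness(assigned, expected)
```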
Secondary structural propensity and dynamics
Resonance assignment of the two domains confirms the disordered nature of N1 and N3 (Figs. 1 and 2). The linker region (N3) connecting the two folded domains (N2 and N4) comprises intrinsically disordered SR-rich and polar termini, flanking a central hydrophobic strand that exhibits a pronounced helical propensity (> 30% from residues 216 to 224, with near 100% helical population from position 220) (Fig. 3). Assignment of 5 residues at the centre of this region was not possible, possibly due to dimerization mediated by the helix. This region is also predicted to form a hydrophobic helix in SARS-CoV (Chang et al. 2009). Recent simulations (Cubuk et al. 2020) also proposed the presence of a weakly (< 20%) helical motif in the SR region at the N-terminus of N3 (176–185), which is not seen experimentally, although the predicted helical propensity (< 30% helix, predicted to stretch from 213 to 225) overlaps with the experimentally observed helical region. Studies of isolated peptides (Savastano et al. 2020) also suggested a helical propensity in the SR region, which is not seen here.
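Helical propensity in a disordered chain is generally inferred from secondary chemical shifts, that is, the difference between observed shifts and random-coil reference values; positive 13Cα and negative 13Cβ deviations point towards helical sampling. The sketch below computes the combined indicator ΔδCα − ΔδCβ per residue. It is a simplified illustration and not the propensity calculation used for Fig. 3; the function name and all shift values are hypothetical, and the random-coil reference values must be supplied by the user from a published reference set.

```python
def secondary_shift_difference(observed, random_coil):
    """Return {residue: (dCA - dCB)} in ppm.

    observed, random_coil : dicts mapping residue number ->
        {"CA": shift_ppm, "CB": shift_ppm}
    Residues lacking CA or CB in either input are skipped.
    Positive values are indicative of helical propensity.
    """
    result = {}
    for residue, obs in observed.items():
        ref = random_coil.get(residue)
        if ref is None:
            continue
        if not all(atom in obs and atom in ref for atom in ("CA", "CB")):
            continue
        delta_ca = obs["CA"] - ref["CA"]
        delta_cb = obs["CB"] - ref["CB"]
        result[residue] = delta_ca - delta_cb
    return result


# Hypothetical shifts, not taken from the deposited assignments.
observed = {
    10: {"CA": 58.5, "CB": 30.0},
    11: {"CA": 56.0, "CB": 31.0},
    12: {"CA": 55.0},  # CB not assigned, so this residue is skipped
}
random_coil = {
    10: {"CA": 56.5, "CB": 30.5},
    11: {"CA": 56.0, "CB": 31.0},
    12: {"CA": 55.0, "CB": 30.0},
}
diff = secondary_shift_difference(observed, random_coil)
```

In this toy input, residue 10 shows a positive combined secondary shift while residue 11 sits at the random-coil value; converting such values into fractional helical populations requires a calibrated method and is outside the scope of the sketch.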
Assignment of the backbone resonances of the N-terminal domain of N (N1) within the N123 construct (Fig. 2) reveals the presence of an intrinsically disordered chain with no detectable secondary structural propensity and only very slight differences around residue 248 compared to free N3. Note that a number of resonances that were visible in N3 were not assigned in N123, most probably due to short relaxation times experienced in the vicinity of the helical motif that limit the transfer of magnetization in triple resonance experiments. Nevertheless, a putative transfer of assignment was proposed on the basis of the 15N–1H correlation spectra of N3 and N123 (starred resonances in Fig. 2). Spin relaxation measured in N123 confirms the highly dynamic nature of N1 and N3 in the context of the folded RNA-binding domain N2 (Fig. 3). Relaxation rates are in the expected range for a disordered domain (Adamski et al. 2019), with the exception of the helical element in N3 and three consecutive residues in N1 (R14-I15-T16), both of which show elevated rates.
In conclusion, NMR backbone assignment and preliminary relaxation studies lay the basis for further NMR studies of this important drug target, providing the tools necessary for the identification of inhibitors and for detailed functional studies of this essential protein.