1H, 13C, and 15N backbone chemical shift assignments of the nucleic acid-binding domain of SARS-CoV-2 non-structural protein 3e

The ongoing pandemic caused by the Betacoronavirus SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus-2) demonstrates the urgent need for coordinated and rapid research towards inhibitors of the COVID-19 lung disease. The covid19-nmr consortium seeks to support drug development by providing publicly accessible NMR data on the viral RNA elements and proteins. The SARS-CoV-2 genome encodes approximately 30 proteins, among them the 16 so-called non-structural proteins (Nsps) of the replication/transcription complex. The 217-kDa Nsp3 spans one polypeptide chain, but comprises multiple independent, yet functionally related domains including the viral papain-like protease. The Nsp3e sub-moiety contains a putative nucleic acid-binding domain (NAB) whose function and consensus target sequences are so far unknown; its targets are thought to include both viral and host RNAs and DNAs, and it may also engage in protein-protein interactions. Its NMR-suitable size renders it an attractive object to study, both for understanding the SARS-CoV-2 architecture and for exploring druggability beyond the classical viral proteases. We here report the near-complete NMR backbone chemical shifts of the putative Nsp3e NAB, which reveal the secondary structure and compactness of the domain and provide a basis for NMR-based investigations towards understanding and interfering with RNA and small-molecule binding by Nsp3e.


Biological context
SARS-CoV-2, the cause of the early 2020 pandemic accompanied by the respiratory disease called COVID-19, is the latest representative of the Coronaviridae family, which also comprises the 2002 first-generation SARS-CoV and the Middle East Respiratory Syndrome (MERS)-CoV. The fast spread of the virus, based on its unexpectedly high infectivity, demands rapid action towards both the development of a vaccine and potent viral inhibitors to weaken or eliminate symptoms that are a major threat to life, especially for older generations worldwide.
The almost 30-kb enveloped positive-sense single-stranded RNA of SARS-CoV-2 represents one of the largest known viral genomes. It contains possibly 14 open reading frames (ORFs) that encode up to 30 transcripts, the majority of which have been confirmed at the protein level (Gordon et al. 2020). Within the highly conserved proteins of Betacoronaviruses (Yoshimoto 2020), the ORF1a/b-encoded non-structural proteins (Nsp) 1-16 assemble the replication/transcription complex, which comprises an incompletely understood network of viral-viral and host-viral protein-protein and RNA-protein interactions. Besides the structural Spike protein, which is important for viral entry, it is a set of non-structural proteins that represent the canonical protein drug targets, among them the two proteases Nsp5 (Mpro) and Nsp3d (PLpro), the Nsp3b ADP-ribose-phosphatase macrodomain, and the Nsp7/8/12 RNA-dependent RNA polymerase complex.
Nsp3, the largest Nsp (Snijder et al. 2003), is one of the most enigmatic coronavirus proteins as it is composed of a plethora of functionally related, yet independent subunits. After cleavage from the full-length ORF1-encoded polypeptide chain, Nsp3 constitutes a 1945-residue multi-domain protein, with individual functional entities that are subclassified from Nsp3a to Nsp3e, followed by the ectodomain embedded in two transmembrane regions and the very C-terminal CoV-Y domain. Nsp3e is unique to Betacoronaviruses and consists of a nucleic acid-binding domain (NAB) and the so-called group 2-specific marker (G2M) (Neuman et al. 2008). Structural information is rare; while the G2M is predicted to be intrinsically disordered (Lei et al. 2018), the only available experimental structure of the Nsp3e NAB was solved from SARS-CoV by the Wüthrich lab using solution NMR (Serrano et al. 2009). The SARS-CoV Nsp3e NAB was shown to bind G-rich ssRNA and to possess DNA-unwinding capability (Neuman et al. 2008), while its precise function and well-defined consensus target sequences have remained unknown. Given its specific occurrence in Betacoronaviruses, Nsp3e represents a potential drug target for both the current and potential future Betacoronavirus epidemic waves.
The research consortium covid19-nmr, founded in 2020, seeks to rapidly and publicly support the search for antiviral drugs using an NMR-based screening approach. This requires the broad production of all druggable proteins and RNAs, an assignment of their NMR resonances that is as comprehensive as possible, and eventually the determination of structures to be used in rational drug design. We here provide the near-complete backbone assignment of the SARS-CoV-2 Nsp3e NAB and thereby enable its use in follow-up applications, such as residue-resolved drug screening and interaction mapping.

Construct design
This study uses the SARS-CoV-2 NCBI reference genome entry NC_045512.2, identical to GenBank entry MN908947.3 (Wu et al. 2020). The definition of domain boundaries for the Nsp3e NAB was guided by the available NMR structure (PDB 2K87) of its closest homologue, i.e. Nsp3e from the 2002 first-generation SARS-CoV (Serrano et al. 2009), which shares 82% sequence identity. Based on the sequence alignment of the entire SARS-CoV-2 Nsp3e with SARS-CoV Nsp3e and consideration of flexible overhangs observed in the structure, we defined our expression construct to span amino acids 1088-1203, counted in the overall Nsp3 primary sequence. A codon-optimized expression construct of SARS-CoV-2 Nsp3e NAB was obtained from GenScript Biotech (Netherlands) and inserted into the pET3b-based vector pKM263, which contains an N-terminal His6-tag, a GST-tag and a tobacco etch virus (TEV) cleavage site. Due to the nature of the TEV cleavage site, the produced protein contained four artificial N-terminal residues (Gly-3, Ala-2, Met-1 and Gly0) after cleavage, before the original protein sequence starts with Tyr1, which corresponds to Tyr1088 in the full-length Nsp3 sequence.
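The relation between construct numbering and full-length Nsp3 numbering stated above (Tyr1 corresponds to Tyr1088, and the construct spans residues 1088-1203) amounts to a constant offset of 1087. The following minimal Python sketch illustrates the conversion; the function and constant names are hypothetical and introduced only for illustration, and the four artificial N-terminal residues (Gly-3 to Gly0) are deliberately treated as having no Nsp3 counterpart.

```python
# Minimal sketch of the residue numbering used in this work.
# Names are hypothetical; only the offset and the range come from the text.

NSP3_OFFSET = 1087          # Tyr1 of the construct corresponds to Tyr1088 of Nsp3
FIRST_NATIVE, LAST_NATIVE = 1, 116   # native sequence Tyr1-Thr116


def construct_to_nsp3(residue: int) -> int:
    """Convert construct numbering (1-116) to full-length Nsp3 numbering."""
    if not FIRST_NATIVE <= residue <= LAST_NATIVE:
        # Residues -3..0 are tag remnants and have no Nsp3 equivalent.
        raise ValueError(f"residue {residue} is outside the native sequence")
    return residue + NSP3_OFFSET


def nsp3_to_construct(residue: int) -> int:
    """Convert full-length Nsp3 numbering (1088-1203) to construct numbering."""
    local = residue - NSP3_OFFSET
    if not FIRST_NATIVE <= local <= LAST_NATIVE:
        raise ValueError(f"Nsp3 residue {residue} is not part of the construct")
    return local


print(construct_to_nsp3(1))     # 1088
print(construct_to_nsp3(116))   # 1203
```

Keeping the offset in one place helps when comparing assignments in construct numbering with data reported in full-length Nsp3 numbering.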
Sequence-specific assignments of tryptophan side chain 1Hε1/15Nε1 resonances were obtained with a [15N,1H]-BEST-TROSY version of the HN(CDCG)CB experiment (Löhr and Rüterjans 2002) with proton pulses centered at 10 ppm and covering a bandwidth of 4 ppm. A slowly exchanging histidine imidazole 1H Nε2 resonance was assigned using a 2D BEST-TROSY-H(NCDCG)CB version with the magnetization transfer pathway adapted to histidine side chains and proton pulses centered at 12 ppm (Andersson et al. 1998). The 15N heteronuclear NOE experiment was performed as an interleaved pseudo-3D TROSY version (Lakomek et al. 2012) using 256 indirect complex points. All NMR experiments were carried out at a sample temperature of 25 °C using Bruker Avance spectrometers of 600, 700 and 950 MHz proton Larmor frequency, equipped with cryogenic z-axis gradient probes. Data acquisition and processing were undertaken using Topspin versions 3 and 4. Cosine-squared window functions were applied for apodization in all dimensions. Spectra were referenced with respect to internal DSS, and for 13C and 15N as described by Wishart et al. (1995).

Assignments and data deposition
All assignments of the Nsp3e NAB were performed using the CCPNMR analysis 2.4 software suite (Vranken et al. 2005) and the program Sparky (Lee et al. 2015).
The 1H,15N-HSQC of the Nsp3e NAB (Fig. 1) shows an excellent peak dispersion. Of note, we obtained an even better resolved amide correlation spectrum at 950 MHz proton frequency; however, we found some resonances, e.g. Phe23, to be exchange-broadened and only visible at lower field strength. For convenience, residues were numbered starting with 1 on Tyr1088. The overall high quality of all spectra allowed the assignment of > 98% of all backbone amides within the natural sequence (Tyr1-Thr116, corresponding to Tyr1088-Thr1203), all Trp and Gln sidechain amides, and 3 out of 10 Asn sidechain amides (17, 90, 101). The assignments are in good agreement with the previously published assignments of the 2002 SARS-CoV Nsp3e NAB 1066-1181 (Serrano et al. 2008), which reflects the high sequence similarity (Yoshimoto 2020). Only two residues of the natural sequence (Asn22 and Ser73, both likely in flexible loop regions) could not be assigned in their backbone amides due to line-broadening beyond detectability, which notably had also been observed for the SARS-CoV Nsp3e (Serrano et al. 2008). For amino acids Glu5, Ile7, Asn8, Asp57, Leu58, and Val114 we observed a second, minor conformation, which we attribute to the preceding prolines being present as both cis and trans isomers.
To assess the overall compactness of the NAB and its internal dynamics, we recorded hetNOE data (Fig. 2a) as a function of the primary sequence. For residues 8-109, hetNOE values of 0.65 or higher were measured, indicating an overall rigid structure of the protein. No regions of increased flexibility were observed except for the two termini (residues 1-7 and 110-116). We also calculated carbon secondary chemical shifts (SCS) based on the chemical shifts of Cα and Cβ (Fig. 2b, bottom) relative to random coil values, essentially as described by Wishart and Sykes (1994). Four consecutive residues with significant negative or positive shifts were used to define either β-strands or α-helices, respectively. Our data suggest a ββαββαββα-fold, which is in agreement with the structure of its homologue from SARS-CoV (Fig. 2b, top). While all secondary structure elements align well between the two homologues, helix-2 is, according to our data, shorter and directly connects to β-strand 4. The very terminal residues do not display secondary structure content, which is in line with the increased flexibility observed in the hetNOE experiment. Our data thus suggest that the NAB of SARS-CoV-2 Nsp3e adopts a structure similar to that of the SARS-CoV Nsp3e (Serrano et al. 2009). Our NMR resonance assignments and the spectral quality support the druggability of the Nsp3e NAB and will now pave the way towards a solution structure, RNA and protein interaction studies, and residue-resolved high-throughput drug screening as a crucial contribution to the initiative of screening all potential SARS-CoV-2 proteins.

Fig. 2 a […] (Vranken et al. 2005) based on the respective signal-to-noise of spectra. No values are shown for Asn22 and Ser73 (missing assignments) and Phe23, His82 and Lys95 due to large relative errors based on the overall low peak intensities of these amides. Additional gaps derive from prolines. b SCS are interpreted towards their underlying secondary structure as shown above the panel (experimental) and compared to the SARS-CoV Nsp3e homologue structure (Serrano et al. 2008, 2009) from PDB entry 2K87. α-helices are shown with red bars, β-strands with blue arrows. Light colors indicate the presence of elements with imperfect geometry in the structure or merely tentative secondary chemical shifts.

The assignments have been deposited at the BioMagResBank (https://www.bmrb.wisc.edu) under accession number 50334. Spectral raw data (upon request) and assignments are also accessible through https://covid19-nmr.de.
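The rule described above for deriving secondary structure from carbon secondary chemical shifts (at least four consecutive residues with significant negative shifts for β-strands, or positive shifts for α-helices) can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure actually used for Fig. 2b: the combined value ΔδCα − ΔδCβ, the 1.0 ppm significance threshold, and all function names are assumptions, and the input list is synthetic toy data rather than measured shifts.

```python
from typing import List, Optional


def secondary_shift(ca: float, cb: float, ca_rc: float, cb_rc: float) -> float:
    """Combined secondary shift (assumed form): (Ca - Ca_rc) - (Cb - Cb_rc)."""
    return (ca - ca_rc) - (cb - cb_rc)


def _sign(value: Optional[float], threshold: float) -> int:
    """+1 / -1 for significant positive / negative shifts, 0 otherwise."""
    if value is None:            # unassigned residue or missing shift
        return 0
    if value >= threshold:
        return 1
    if value <= -threshold:
        return -1
    return 0


def classify(scs: List[Optional[float]],
             threshold: float = 1.0,   # assumed significance cutoff in ppm
             min_run: int = 4) -> str:
    """Label each residue 'H' (helix), 'E' (strand) or '-' (undefined)."""
    labels = ["-"] * len(scs)
    i = 0
    while i < len(scs):
        sign = _sign(scs[i], threshold)
        if sign == 0:
            i += 1
            continue
        # Extend the run while consecutive residues keep the same sign.
        j = i
        while j < len(scs) and _sign(scs[j], threshold) == sign:
            j += 1
        if j - i >= min_run:
            labels[i:j] = ["H" if sign > 0 else "E"] * (j - i)
        i = j
    return "".join(labels)


# Synthetic toy values (not measured data); None marks a missing shift.
toy = [0.2, -1.5, -2.0, -1.8, -1.2, 0.1,
       2.0, 2.5, 3.0, 2.2, 1.9, None, 1.5, 1.4, -0.3]
print(classify(toy))   # -EEEE-HHHHH----
```

In this sketch a missing value interrupts a run, so a gap caused by an unassigned residue can shorten or split an element; whether to bridge such gaps is a design choice left open here.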

Conflict of interest
The authors declare no conflict of interest.
Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.