1H, 13C and 15N chemical shift assignment of the stem-loop 5a from the 5′-UTR of SARS-CoV-2

The SARS-CoV-2 (SCoV-2) virus is the causative agent of the ongoing COVID-19 pandemic. It contains a positive sense single-stranded RNA genome and belongs to the genus of Betacoronaviruses. The 5′- and 3′-genomic ends of the 30 kb SCoV-2 genome are potential antiviral drug targets. Major parts of these sequences are highly conserved among Betacoronaviruses and contain cis-acting RNA elements that affect RNA translation and replication. The 31 nucleotide (nt) long, highly conserved stem-loop 5a (SL5a) is located within the 5′-untranslated region (5′-UTR), which is important for viral replication. SL5a features a U-rich asymmetric bulge and is capped with a 5′-UUUCGU-3′ hexaloop, which is also found in stem-loop 5b (SL5b). We herein report the extensive 1H, 13C and 15N resonance assignment of SL5a as a basis for in-depth structural studies by solution NMR spectroscopy.

in genome replication, transcription of subgenomic (sg) mRNAs and the balanced translation of viral proteins (Madhugiri et al. 2016; Kelly et al. 2020; Tidu et al. 2020). While the development of antiviral therapeutics against COVID-19 is primarily focused on the viral proteins, the highly structured RNA elements provide an extensive reservoir of additional drug targets to be exploited. The architecture of the RNA genome of SCoV-2 and related viruses has so far been investigated mainly by sequence-based computational predictions and by chemical probing approaches in vitro and in vivo (e.g. Manfredonia et al. 2020; Rangan et al. 2020). Although structural probing methods have been established to map RNA-small molecule interactions even in cells (Martin et al. 2019), these tools are unable to define the tertiary structure and dynamics of the RNA elements in the SCoV-2 genome with sufficiently high resolution to enable structure-based drug design by virtual screening.
While the sequences of the individual structural elements vary between different Coronaviruses, their ubiquitous presence and highly conserved secondary structures suggest that these elements are critically important for viral viability and pathogenesis (reviewed in Madhugiri et al. 2016). One example of such an important structure is stem-loop 5 (SL5). SL5 is structurally conserved in the genomes of Alpha- and Betacoronaviruses and has been shown to be crucial for efficient viral replication (Chen and Olsthoorn 2010; Guan et al. 2011).
In SCoV-2, SL5 consists of four helices including nts 149-297 of the 5′-UTR and the first 29 nts of the Nsp1 coding region (Suppl. Figure 1A). Sub-elements are joined to the SL5 basal stem by a four-helix junction. These sub-elements are termed SLs 5a, 5b and 5c. SL5a consists of 31 nucleotides and represents the largest of the three stem-loops. Intriguingly, the apical loop sequences of SL5a and SL5b are identical (5′-UUUCGU-3′) and belong to the 5′-UUYCGU-3′ motif, which is also found in Alphacoronaviruses. This high level of sequence conservation suggests functional importance, e.g. in viral packaging (Masters 2019). Thus, we have recently obtained secondary structure models of SL5a-c and the basal stem segment of SL5 based on initial 1H and 15N assignments (Wacker et al. 2020). In order to characterize SL5a further, we provide here a near-complete 1H, 13C and 15N chemical shift assignment.

Sample preparation
RNA synthesis for NMR experiments: For DNA template production, the sequence of SL5a together with the T7 promoter was generated by hybridization of complementary oligonucleotides and introduced into the EcoRI and NcoI sites of an HDV ribozyme encoding plasmid (Schürer et al. 2002), based on the pSP64 vector (Promega). RNAs were transcribed as HDV ribozyme fusions to obtain a homogeneous 3′-end. The recombinant vector pHDV-5_SL5a was transformed and amplified in the Escherichia coli strain DH5α. Plasmid DNA was purified using a large scale DNA isolation kit (Gigaprep; Qiagen) according to the manufacturer's instructions and linearized with HindIII prior to in vitro transcription using the T7 RNA polymerase P266L mutant, which was prepared as described (Guillerez et al. 2005). Transcription reactions of 15 ml [20 mM DTT, 2 mM spermidine, 200 ng/µl template, 200 mM Tris/glutamate (pH 8.1), 40 mM Mg(OAc)2, 12 mM NTPs, 32 µg/ml T7 RNA polymerase, 20% DMSO] were performed to obtain sufficient amounts of SL5a RNA (5′-pppGGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCC-3′). Preparative transcription reactions (6 h at 37 °C and 70 rpm) were terminated by addition of 150 mM EDTA. SL5a RNA was purified as follows: RNAs were precipitated with one sample volume of ice-cold 2-propanol. RNA fragments were separated on 15% denaturing polyacrylamide (PAA) gels and visualized by UV shadowing at 254 nm. SL5a RNA was excised from the gel and eluted using the following protocol: The gel fragments were granulated in two gel volumes of 0.3 M NaOAc solution and incubated for 30 min at −80 °C, followed by 15 min at 65 °C. The RNA was further eluted from gel fragments overnight by passive diffusion into 0.3 M NaOAc, precipitated with EtOH and desalted via PD10 columns (GE Healthcare). Residual PAA was removed by reversed-phase HPLC using a Kromasil RP 18 column and a gradient of 0-40% 0.1 M acetonitrile/triethylammonium acetate.
After freeze-drying of RNA-containing fractions and cation exchange by LiClO4 precipitation (2% in acetone), the RNA was folded in water by heating to 80 °C followed by rapid cooling on ice. Buffer exchange to NMR buffer (25 mM potassium phosphate buffer, pH 6.2, 50 mM potassium chloride) was performed using Vivaspin centrifugal concentrators (2 kDa molecular weight cut-off). Purity of SL5a was verified by denaturing PAA gel electrophoresis and homogeneous folding was monitored by native PAA gel electrophoresis, loading the same RNA concentration as used in NMR experiments.
Using this protocol, two NMR samples of SL5a, an 810 µM uniformly 15N-labeled and a 680 µM uniformly 13C,15N-labeled sample, were prepared and used for the assignment presented herein.
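The reaction composition given above can be converted into absolute amounts for the 15 ml preparative scale. The following is a minimal sketch, not part of the published protocol: the variable and function names are illustrative, the DMSO percentage is assumed to be volume per volume, and the text does not state whether 12 mM NTPs is a total or a per-nucleotide concentration, so the value is carried through unchanged.

```python
# Sketch: absolute amounts for the 15 ml preparative transcription reaction.
# All concentrations are taken from the text; names are illustrative only.

REACTION_VOLUME_ML = 15.0

TEMPLATE_NG_PER_UL = 200.0        # linearized plasmid template
T7_POLYMERASE_UG_PER_ML = 32.0    # T7 RNA polymerase P266L mutant
DMSO_PERCENT = 20.0               # assumption: percent (v/v)

# Millimolar components; "NTPs" is kept as stated (total vs. each not specified).
MILLIMOLAR_COMPONENTS = {
    "DTT": 20.0,
    "spermidine": 2.0,
    "Tris/glutamate": 200.0,
    "Mg(OAc)2": 40.0,
    "NTPs": 12.0,
}


def micromoles(conc_mM: float, volume_ml: float) -> float:
    """Amount in micromoles: mM (= µmol/ml) times volume in ml."""
    return conc_mM * volume_ml


# ng/µl * µl gives ng; divide by 1e6 to obtain mg.
template_mg = TEMPLATE_NG_PER_UL * (REACTION_VOLUME_ML * 1000.0) / 1e6
polymerase_ug = T7_POLYMERASE_UG_PER_ML * REACTION_VOLUME_ML
dmso_ml = DMSO_PERCENT / 100.0 * REACTION_VOLUME_ML
amounts_umol = {
    name: micromoles(conc, REACTION_VOLUME_ML)
    for name, conc in MILLIMOLAR_COMPONENTS.items()
}

if __name__ == "__main__":
    print(f"template: {template_mg:.1f} mg")
    print(f"T7 RNA polymerase: {polymerase_ug:.0f} µg")
    print(f"DMSO: {dmso_ml:.1f} ml")
    for name, amount in amounts_umol.items():
        print(f"{name}: {amount:.0f} µmol")
```

Changing `REACTION_VOLUME_ML` rescales every amount, which is convenient when a smaller test transcription precedes the preparative reaction.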
At BMRZ and KI, experiments were performed at 298 K if not indicated otherwise. NMR spectra were processed and analyzed using Topspin versions 4.0.8 (GU) and 3.6.2 (KI). The chemical shift assignment was conducted using Sparky (Lee et al. 2015). NMR data were managed and archived using the platform LOGS (2020, version 2.1.54, Signals GmbH & Co KG, www.logs.repository.com). 1H chemical shifts were referenced externally to DSS, and 13C and 15N chemical shifts were indirectly referenced from the 1H chemical shift as described earlier (Wishart et al. 1995).
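Indirect referencing of this kind can be sketched in a few lines. The example assumes the frequency ratios Ξ recommended by Wishart et al. (1995) for 13C (relative to DSS) and 15N (relative to liquid ammonia); the spectrometer frequency used below is a hypothetical value, and the function names are illustrative rather than part of Topspin or Sparky.

```python
# Sketch: indirect 13C/15N referencing from the 1H frequency of DSS at 0 ppm.
# Xi ratios as recommended by Wishart et al. (1995).

XI = {
    "13C": 0.251449530,  # relative to DSS
    "15N": 0.101329118,  # relative to liquid ammonia
}


def zero_ppm_frequency_mhz(dss_1h_mhz: float, nucleus: str) -> float:
    """Absolute frequency (MHz) that corresponds to 0 ppm for the heteronucleus."""
    return dss_1h_mhz * XI[nucleus]


def chemical_shift_ppm(observed_mhz: float, zero_mhz: float) -> float:
    """Chemical shift in ppm of a signal observed at `observed_mhz`."""
    return (observed_mhz - zero_mhz) / zero_mhz * 1e6


if __name__ == "__main__":
    dss_1h_mhz = 600.130000  # hypothetical 1H frequency of the DSS methyl signal
    for nucleus in ("13C", "15N"):
        zero = zero_ppm_frequency_mhz(dss_1h_mhz, nucleus)
        print(f"{nucleus}: 0 ppm at {zero:.6f} MHz")
```

Only the 1H frequency of DSS has to be measured; the heteronuclear zero points follow from the fixed ratios, so all three dimensions share one reference.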
We have previously reported the imino and cytidine amino resonance assignment of SL5a (Wacker et al. 2020) that allowed us to determine the base pairing in this RNA element. The location of stable base pairs is confirmed by through space 2hJNN coupling constants (Dingley et al. 2008) reported in Suppl. Table S1. These assignments were available from experiments conducted on a 15N-labeled RNA sample and provided starting points of the aromatic proton resonance assignment using 1H,1H-NOESY (Tables 1 I, […] (Tables 1 II, 2 III) were assigned using a 3D 13C-NOESY-HSQC experiment (Table 1 VII), which was selective for the aromatic region. Cytidine and uridine C5-H5 resonances were assigned using 1H,1H-TOCSY (Table 1 VI, Fig. 1e) and 1H,13C-HSQC spectra (Table 1 III, Fig. 1d). Furthermore, quaternary carbon atoms were assigned using an HNCO type experiment ([…] Fig. 1c) linked the aromatic carbons to the anomeric C1′ resonances, where the nitrogen dimension aided in distinguishing between purine and pyrimidine nucleotides as well as between uridines and cytidines. Also, by correlating C6/8 to C1′, resonance overlap is minimized given the broader signal distribution in the carbon as opposed to the respective proton dimensions. Based on C1′ resonances obtained from the CNC spectrum and from sequential assignment in the NOESY spectra, H1′-C1′ correlations were assigned in the 1H,13C-HSQC spectrum (Table 1 III, Fig. 1f). A continuous sequential walk of H1′-to-H6/H8 was possible for both helices (Fig. 1c). The H1′-C1′ assignment was further confirmed with a 3D 13C-NOESY-HSQC experiment (Table 1 IX), which was selective for the C1′ resonances. Using two different 3D HCCH TOCSY experiments (Table 1 X, XI and XII), the remaining ribose carbon resonances C2′-C5′ were assigned.
The two experiments differed in the TOCSY mixing time such that with a short mixing time of 6 ms, C2′ and C3′ resonances could be distinguished by intensity differences, while with a long mixing time of 18 ms also C4′ and C5′ carbons were correlated to the C1′ resonances.

The U-rich bulge
One of the structural features of the SL5a RNA is an asymmetric U-rich bulge (Fig. 1c). In this likely more dynamic part of the RNA, a nearly complete sequential walk (H6/8 to H6/8 or H1′) was possible and thus all aromatic H6/8-C6/8 correlations were assigned. With the aromatic assignment at hand, the strong imino resonance of a uridine involved in non-canonical base pairing was assigned to residue U194 using the (H)C(CCN)H experiment at 283 K. The observation of this signal suggests the formation of a base pair involving U194 and likely either U211 or U212. This is further supported by an imino-to-imino NOE contact between U194 and a non-canonical uridine at 273 K. Furthermore, from the U194 carbon chemical shifts in the HNCO experiment, we conclude that the hydrogen bonding interaction is mediated through the C2 carbonyl group (Fürtig et al. 2003; Ohlenschläger et al. 2004). The existence of a GU wobble base pair involving residues U195 and G210 has not yet been confirmed. However, broadened imino proton resonances for an additional guanosine and uridine, which take part in non-canonical interactions, are observed at low temperature (283 K).
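The residue labels used for the bulge can be checked against the transcribed construct with a short sketch. It assumes that the first nucleotide of the construct carries genomic number 187; this offset is not stated in the text but is inferred from the residue names it uses, and the helper names are illustrative.

```python
# Sketch: map genomic residue numbers onto the transcribed SL5a construct.
# The construct sequence is taken from the sample preparation section.

SL5A_CONSTRUCT = "GGGCUGCUUACGGUUUCGUCCGUGUUGCAGCCC"

# Assumption: number of the first construct nucleotide, inferred from the
# residue labels in the text (for example U194 and G210), not stated there.
FIRST_RESIDUE_NUMBER = 187


def nucleotide_at(number: int) -> str:
    """Return the base found at a given genomic residue number."""
    index = number - FIRST_RESIDUE_NUMBER
    if not 0 <= index < len(SL5A_CONSTRUCT):
        raise ValueError(f"residue {number} lies outside the construct")
    return SL5A_CONSTRUCT[index]


def label(number: int) -> str:
    """Return a residue label such as 'U194'."""
    return f"{nucleotide_at(number)}{number}"


def motif_residues(motif: str) -> list:
    """Labels of the first occurrence of `motif`, or an empty list if absent."""
    start = SL5A_CONSTRUCT.find(motif)
    if start < 0:
        return []
    first = FIRST_RESIDUE_NUMBER + start
    return [label(first + offset) for offset in range(len(motif))]


# Residues discussed for the U-rich bulge and the apical hexaloop.
bulge_residues = [label(n) for n in (194, 195, 210, 211, 212)]
hexaloop_residues = motif_residues("UUUCGU")

if __name__ == "__main__":
    print("bulge:", bulge_residues)
    print("hexaloop:", hexaloop_residues)
```

Under this assumed offset, the bases at positions 194, 195, 210, 211 and 212 agree with the residue names given in the text, which supports the inferred numbering without proving it.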
Here, it is evident that the chemical shifts of the central two nucleotides of the 5′-UUUCGU-3′ hexaloop, U202 and C203, are in good agreement with the respective counterparts in the 5′-cUUCGg-3′ tetraloop. This observation is also reflected in the canonical coordinates (Ebrahimi et al. 2001; Cherepanov et al. 2010), which suggest the ribofuranosyl ring to adopt the C2′-endo conformation for U202 and C203, while the remaining nucleotides (with a complete ribose carbon assignment) adopt the canonical C3′-endo conformation (Fig. 2c). These spectral data suggest a structural similarity between the middle part of the 5′-UUUCGU-3′ hexa- and 5′-cUUCGg-3′ tetraloop. This might not hold true to the same extent for the flanking residues U201 and G204 as characteristic resonances are absent in the 1H,13C-HSQC
Experimental parameters and experiment-specific parameters are given. ns: number of scans; sw: spectral width; aq: acquisition time; o1/2/3: carrier frequencies on channels 1/2/3; rel. delay: relaxation delay; CT: constant time; JR: jump-return

Compliance with ethical standards
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 2
Comparison of 1H,13C-CT-HSQC spectra of the ribose regions of (a) SL5a and (b) a 14 nt RNA with 5′-cUUCGg-3′ tetraloop (Fürtig et al. 2004; Nozinovic et al. 2010). Positive contours are given in black, negative contours in red. Experimental details are given in