1H, 13C and 15N chemical shift assignment of the stem-loops 5b + c from the 5′-UTR of SARS-CoV-2

The ongoing pandemic of the respiratory disease COVID-19 is caused by the SARS-CoV-2 (SCoV2) virus. SCoV2 is a member of the Betacoronavirus genus. The 30 kb positive-sense, single-stranded RNA genome of SCoV2 features 5′- and 3′-genomic ends that are highly conserved among Betacoronaviruses. These genomic ends contain structured cis-acting RNA elements, which are involved in the regulation of viral replication and translation. Structural information about these potential antiviral drug targets supports the development of novel classes of therapeutics against COVID-19. The highly conserved branched stem-loop 5 (SL5) found within the 5′-untranslated region (5′-UTR) consists of a basal stem and three stem-loops, namely SL5a, SL5b and SL5c. Both SL5a and SL5b feature a 5′-UUUCGU-3′ hexaloop that is also found among Alphacoronaviruses. Here, we report the extensive 1H, 13C and 15N resonance assignment of the 37-nucleotide (nt) sequence spanning SL5b and SL5c (SL5b + c) as a basis for further in-depth structural studies by solution NMR spectroscopy. Supplementary Information: The online version contains supplementary material available at 10.1007/s12104-021-10053-4.

severe acute respiratory syndrome (SARS) causing agent SARS-CoV. Betacoronaviruses have large positive-sense, single-stranded RNA genomes with highly conserved 5′- and 3′-untranslated regions (UTRs) that do not code for viral proteins. These structured UTRs are important for replication, balanced transcription of subgenomic mRNAs and translation of viral proteins (Yang and Leibowitz 2015). So far, most efforts to develop new antiviral drugs target the proteins of SARS-CoV-2. The structured regulatory elements of the approx. 30,000 nucleotides (nts) long RNA genome remain unexploited as potential target sites for antiviral drugs. Between different Coronaviruses, the sequence of the individual elements varies, but their secondary structures are remarkably well conserved, suggesting a critical importance for viral viability and pathogenesis (Madhugiri et al. 2016). Until now, a large number of sequence-based computational predictions and different chemical probing approaches have been reported to map the architecture of these viral RNA elements (Zhao et al. 2020; Huston et al. 2021; Lan et al. 2021; Manfredonia and Incarnato 2021; Rangan et al. 2021). However, to establish the viral RNA as an antiviral drug target, high-resolution structural data that can also visualize structural dynamics and tertiary structure interactions are important. In response to the pandemic situation, the international COVID19-NMR initiative (https://covid19-NMR.de) has set the goal to provide this information by solution NMR, in order to initiate and guide structure-based drug screening, design and synthesis. The structured parts of the SARS-CoV-2 genome have been divided into fragments in a 'divide and conquer' approach, allowing us to determine the secondary structures of these RNA elements (Wacker et al. 2020). Further, fragment screening campaigns demonstrated that the RNA structural elements can be targeted differentially, revealing low micromolar binding affinities specific to molecules of low molecular weight (Sreeramulu et al. 2021).
An intriguing example of an RNA regulatory element from SARS-CoV-2 is the comparably large structural element SL5, spanning nts 149-265. The entire SL5 element consists of four helices, joining three sub-elements with stem-loop motifs to the SL5 basal stem by a four-way junction. These sub-elements are termed SL5a, SL5b and SL5c. Interestingly, SL5 forms junction-connected elements in the genomes of both Alpha- and Betacoronaviruses (Madhugiri et al. 2014, 2016). The regulatory function of SL5 has been linked to maintaining efficient viral replication (Chen and Olsthoorn 2010; Guan et al. 2011). In SL5b, an apical hexaloop sequence is found that is identical to the loop in SL5a (5′-UUUCGU-3′). Similar loop sequences with 5′-UUYCGU-3′ motifs can also be found in members of the Alphacoronavirus genus, suggesting a conserved function, e.g. in viral packaging (Masters 2019). Interestingly, currently available sequencing data for new SCoV2 variants emerging since March 2020 show that the 5′-UUUCGU-3′ loop in SL5a remains conserved compared to the original virus strain, while a C241U mutation resulting in a 5′-UUUUGU-3′ loop appeared in SL5b.
Fig. 1 A Schematic overview of 5′-UTR RNA elements of the SCoV2 genome. Black: SL5 element; AUG start codon and the 5′-terminal structural elements of the open reading frame ORF1a/b are highlighted in grey. B Elements used for the NMR-based divide-and-conquer approach. C Predicted secondary structures of RNA (sub-)elements used for the NMR chemical shift assignment of SL5b + c reported here. Genomic region, numbering and sample titles are given. B/C Black regions according to genomic sequence, grey regions contain stabilizing nucleotides. The actual investigated RNAs are represented by the sequences including the grey regions

In SCoV2, SL5 contains the first 29 nts of the open reading frame ORF1a/b that codes for nsp1, the first of the nonstructural proteins (Fig. 1), including the start codon A266 to G268, suggesting that the complex structural arrangement in SL5 is important for translation initiation. The SL5b stem-loop contains nucleotides 228 to 252 (25 nts), while the downstream SL5c consists of 10 nts, 253 to 262. We report here the NMR chemical shift assignments for SL5b + c (nts 227-263) containing both stems, which was aided by assigning the isolated SL elements based on initial 1H and 15N assignments of all sub-elements of SL5 (a-c) and the basal stem (Wacker et al. 2020). More recently, we reported the chemical shift assignments including 13C chemical shifts for SL5a (Schnieders et al. 2021).
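The nucleotide ranges quoted above can be cross-checked with a few lines of arithmetic. The following is only a minimal sketch: the dictionary keys and the helper name `span_length` are illustrative and not part of the original work. It confirms that the inclusive ranges give 25, 10 and 37 nts, so the SL5b + c construct carries one genomic nucleotide on each side (227 and 263) in addition to SL5b and SL5c.

```python
# Inclusive genomic nucleotide ranges (start, end) as quoted in the text.
RANGES = {
    "SL5b": (228, 252),
    "SL5c": (253, 262),
    "SL5b+c": (227, 263),
}


def span_length(start: int, end: int) -> int:
    """Number of nucleotides in the inclusive range start..end."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


# Length of every element, keyed by the illustrative names above.
LENGTHS = {name: span_length(*bounds) for name, bounds in RANGES.items()}

# Nucleotides of the construct that belong to neither SL5b nor SL5c.
FLANKING = LENGTHS["SL5b+c"] - LENGTHS["SL5b"] - LENGTHS["SL5c"]

if __name__ == "__main__":
    for name, n in LENGTHS.items():
        print(f"{name}: {n} nts")
    print(f"flanking nucleotides in SL5b+c: {FLANKING}")
```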

Sample preparation
RNA synthesis for NMR experiments: For DNA template production, the sequences of SL5b + c (genomic nucleotides 227 to 263) and SL5b_GC (5′-G-(genomic 227 to 252)-CC-3′) (Fig. 1C), together with the T7 promoter, were generated by hybridization of complementary oligonucleotides and introduced into the EcoRI and NcoI sites of a plasmid based on the pSP64 vector (Promega) encoding an HDV ribozyme (Schürer et al. 2002). RNAs were transcribed as HDV ribozyme fusions to obtain homogeneous 3′-ends. The recombinant vectors pHDV-5_SL5b + c and pHDV-5_SL5b_GC were transformed and amplified in the Escherichia coli strain DH5α. Plasmid DNA was purified with a large-scale DNA isolation kit (Gigaprep; Qiagen) according to the manufacturer's instructions and linearized with HindIII prior to in-vitro transcription by T7 RNA polymerase [P266L mutant, prepared as described in (Guillerez et al. 2005; Schnieders et al. 2020)]. 15 ml transcription reactions [20 mM DTT, 2 mM spermidine, 200 ng/µl template, 200 mM Tris/glutamate (pH 8.1), 40 mM Mg(OAc)2, 12 mM NTPs, 32 µg/ml T7 RNA polymerase, 20% DMSO (b + c) or 0% DMSO (b_GC)] were performed to obtain sufficient amounts of RNA. Preparative transcription reactions (6 h at 37 °C and 70 rpm) were terminated by addition of 150 mM EDTA. SL5b + c and SL5b_GC RNAs were purified as follows: RNAs were precipitated with one volume of 2-propanol at −20 °C overnight. RNA fragments were separated on 12-15% denaturing polyacrylamide (PAA) gels and visualized by UV shadowing at 254 nm. Bands containing SL5b + c and SL5b_GC RNA were excised from the gel and incubated for 30 min at −80 °C, followed by 15 min at 65 °C. Elution was done overnight by passive diffusion into 0.3 M NaOAc; the RNA was then precipitated with EtOH and desalted via PD10 columns (GE Healthcare). Residual PAA was removed by reversed-phase HPLC using a Kromasil RP 18 column and a gradient of 0-40% 0.1 M triethylammonium acetate in acetonitrile.
After freeze-drying of RNA-containing fractions and cation exchange by LiClO4 precipitation (2% in acetone), the RNA was folded in water by heating to 80 °C followed by rapid cooling on ice. Buffer exchange into NMR buffer (95% H2O/5% D2O, 25 mM potassium phosphate buffer, pH 6.2, 50 mM potassium chloride) was performed using Vivaspin centrifugal concentrators (2 kDa molecular weight cut-off). The purity of SL5b + c and SL5b_GC was verified by denaturing PAA gel electrophoresis and homogeneous folding was monitored by native PAA gel electrophoresis, loading the same RNA concentration as used in NMR experiments. 100% D2O samples were prepared by lyophilisation and redissolving in identical volumes of pure D2O to keep the buffer salt concentration constant.
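For orientation, the total amounts of selected components in one 15 ml transcription reaction follow directly from the concentrations given above. The sketch below is plain unit arithmetic; the function names are illustrative, the DMSO percentage is assumed to be volume per volume, and the NTP concentration is used exactly as quoted (the text does not state whether 12 mM refers to each NTP or to the sum).

```python
REACTION_VOLUME_ML = 15.0  # preparative transcription volume from the text


def amount_umol(conc_mM: float, volume_ml: float) -> float:
    """Amount in µmol; 1 mM equals 1 µmol per ml."""
    return conc_mM * volume_ml


def template_mass_mg(conc_ng_per_ul: float, volume_ml: float) -> float:
    """Template mass in mg; 1 ml = 1000 µl and 1 mg = 1e6 ng."""
    return conc_ng_per_ul * volume_ml * 1000.0 / 1e6


def enzyme_mass_ug(conc_ug_per_ml: float, volume_ml: float) -> float:
    """Enzyme mass in µg."""
    return conc_ug_per_ml * volume_ml


def cosolvent_volume_ml(percent_v_v: float, volume_ml: float) -> float:
    """Co-solvent volume in ml, assuming the percentage is v/v."""
    return percent_v_v * volume_ml / 100.0


if __name__ == "__main__":
    v = REACTION_VOLUME_ML
    print(f"DNA template: {template_mass_mg(200.0, v):.1f} mg")
    print(f"T7 RNA polymerase: {enzyme_mass_ug(32.0, v):.0f} µg")
    print(f"NTPs (as quoted): {amount_umol(12.0, v):.0f} µmol")
    print(f"Mg(OAc)2: {amount_umol(40.0, v):.0f} µmol")
    print(f"DMSO for SL5b + c (v/v assumed): {cosolvent_volume_ml(20.0, v):.1f} ml")
```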
Using this protocol, three NMR samples of SL5b + c were prepared: a 350 µM uniformly 15N-labelled sample and two uniformly 13C,15N-labelled samples (750 µM in buffer with 95% H2O/5% D2O and 430 µM in buffer with 100% D2O). In addition, two uniformly 13C,15N-labelled samples of SL5b_GC were prepared: an H2O sample at a concentration of 510 µM in buffer with 95% H2O/5% D2O and a D2O sample at a concentration of 300 µM in buffer with 100% D2O. For the divide-and-conquer approach, the unlabelled single stem-loop 5c (Fig. 1C) was purchased from Horizon Discovery LTD (Cambridge, UK). The sample was processed by reversed-phase HPLC in the same way as the in-vitro transcribed samples. LiClO4 precipitation, buffer exchange, folding check and sample preparation were performed as described above (1.1 mM in NMR buffer with 95% H2O/5% D2O, 1.0 mM in NMR buffer with 100% D2O).
Experiments were performed in a temperature range spanning 274 to 298 K. NMR spectra were processed and analysed using Topspin (versions 3.6.2 to 4.1.1), and chemical shift assignment was conducted using Sparky (Lee et al. 2015). NMR data were managed and archived using the platform LOGS (2020, version 2.1.54, Signals GmbH & Co KG, www.logs.repository.com). 1H chemical shifts were referenced externally to DSS, and 13C and 15N chemical shifts were indirectly referenced from the 1H chemical shift as previously described (Wishart et al. 1995).
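Indirect referencing of this kind derives the zero-ppm frequency of a heteronucleus from the DSS 1H zero-ppm frequency through a fixed frequency ratio Ξ. The sketch below illustrates the calculation under stated assumptions: the Ξ values are the ratios recommended by Wishart et al. (1995) for 13C (DSS) and 15N (liquid NH3), the 1H frequency of 600.13 MHz is a hypothetical spectrometer value, and the function names are illustrative. It is not the processing procedure used in Topspin.

```python
# Frequency ratios (X / 1H) for indirect referencing; values as recommended
# by Wishart et al. (1995): 13C relative to DSS, 15N relative to liquid NH3.
XI = {
    "13C": 0.251449530,
    "15N": 0.101329118,
}


def zero_ppm_frequency_mhz(h1_zero_mhz: float, nucleus: str) -> float:
    """Zero-ppm frequency of a heteronucleus from the DSS 1H zero-ppm frequency."""
    return h1_zero_mhz * XI[nucleus]


def chemical_shift_ppm(observed_mhz: float, zero_mhz: float) -> float:
    """Chemical shift in ppm of a signal relative to the zero-ppm frequency."""
    return (observed_mhz - zero_mhz) / zero_mhz * 1e6


if __name__ == "__main__":
    h1_zero = 600.13  # hypothetical DSS 1H frequency in MHz, for illustration only
    for nucleus in ("13C", "15N"):
        zero = zero_ppm_frequency_mhz(h1_zero, nucleus)
        print(f"{nucleus} zero-ppm frequency: {zero:.6f} MHz")
```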

Assignment and data deposition
The imino, aromatic and ribose resonances of SL5b + c were assigned using a 13C,15N-labelled SL5b_GC sample and an unlabelled SL5c model RNA (SL5c) (SI Fig. 1).
The latter experiment was also used to confirm assigned shifts for aromatic H6/H8 and H1′ found in a 3D-NOESY-HSQC and 3D-HCN. Additional carbons C4, C5 and C6 for adenosine were assigned using a 3D TROSY-HCCH-COSY. Intra-nucleotide and sequential aromatic-to-ribose H1′ correlations were detected for the predicted helical stem part from G-2 to U238 and from U243 to C+2. The H1′ assignments were confirmed by a 3D-NOESY-HSQC, leading to almost complete H1′ assignments with the help of a 1H,13C-HSQC for the H1′-C1′ region (Fig. 2A). From here, a 1H,13C-ct-HSQC and 3D-HCCH-TOCSYs with different mixing times gave further insight into the CH ribose resonance shifts. A canonical shift analysis for the sugar puckers of the almost completely assigned C1′ to C5′ resonances showed a C3′-endo conformation except for the bulge and loop nucleotides (Fig. 3). Additionally, the nitrogens N1 or N9 could be assigned in the 1H,15N-HCN experiment.
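A canonical shift analysis condenses the five ribose 13C shifts of a residue into two canonical coordinates, can1 (reporting on sugar pucker) and can2 (reporting on the exocyclic torsion angle γ). The sketch below shows how these coordinates may be computed. The linear coefficients are an assumption here, taken as they are commonly quoted in the canonical-coordinate literature, and should be verified against Cherepanov et al. (2010) before use; the class and function names and the example shifts are hypothetical. No classification thresholds are included, since these depend on the referencing convention.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RiboseShifts:
    """13C chemical shifts (ppm) of one ribose, C1' to C5'."""
    c1: float
    c2: float
    c3: float
    c4: float
    c5: float


def canonical_coordinates(s: RiboseShifts) -> Tuple[float, float]:
    """Return (can1, can2) in ppm.

    Coefficients are assumed values as commonly quoted for canonical
    coordinates; check them against the cited reference before use.
    """
    can1 = 0.179 * s.c1 - 0.225 * s.c4 - 0.0585 * s.c5
    can2 = -0.0605 * (s.c2 + s.c3) - 0.0556 * s.c4 - 0.0524 * s.c5
    return can1, can2


if __name__ == "__main__":
    # Hypothetical shifts for illustration only, not taken from this work.
    example = RiboseShifts(c1=93.0, c2=75.5, c3=73.0, c4=82.0, c5=65.0)
    can1, can2 = canonical_coordinates(example)
    print(f"can1 = {can1:.3f} ppm, can2 = {can2:.3f} ppm")
```

Plotting can1 against can2 for all residues with complete C1′ to C5′ assignments then separates residues by conformation, which is the kind of representation referred to as Fig. 3.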
The assignments obtained for the SL5b_GC sample were transferred to 1H,15N-TROSY, 1H,1H-NOESY, 1H,13C-HNCO and 1H,13C-HSQC spectra of SL5b + c (Table 1, I to IV), showing a fit of the shifts of SL5b_GC for nucleotides ranging from C230 to G250. Small chemical shift differences are in line with the differences in primary chemical structure.
The assigned imino resonances of the SL5c sample provided a starting point for the assignment of the aromatic proton resonances in the 1H,1H-NOESY experiment (for experimental data see SI Table 2). The sequential walk was only interrupted by missing cross peaks between G256-H8 and A257-H8 in the loop region in both the 95% H2O and the D2O (Fig. 4A) samples in NMR buffer. Using 1H,1H-TOCSY and 1H,13C-HSQC (Fig. 4B and C), the aromatic resonances of protons H2, H6, H8 and H5 as well as their corresponding carbons, except for U262-C5, were obtained. Using a 1H,1H-NOESY recorded for a sample in D2O (Fig. 4A), the H1′-C1′ ribose resonances were assigned by the analysis of intra-nucleotide and sequential NOEs. Similarly, the H2′-C2′ assignments were obtained by identification of H1′-H2′ intra-nucleotide and sequential NOEs.

The 5′-UUUCGU-3′ hexaloop in SL5b
The entire SL5 motif (Fig. 1A) spans three stem-loop elements. Interestingly, SL5a and SL5b both possess the same 5′-UUUCGU-3′ hexaloop, which in SL5b consists of nucleotides 238 to 243. While in SL5a the loop-closing stem is formed by at least three base pairs, the hexaloop of SL5b is closed by a stem consisting of only two GC base pairs, preceded by a bulge at residue A235. This bulged A235 shows downfield-shifted signals for the aromatic H2 and H8 in the aromatic 1H,13C-HSQC, as typically observed for non-stacked purines (Aeschbacher et al. 2013). Starting from H8 of residue G237, assignment of the ribose H1′ and aromatic H6/H8 was obtained by a sequential walk in an amino nitrogen filtered NOESY (x-filter-NOESY). Loop residues U240 and C241 show characteristic C1′-H1′ shifts in the 1H,13C-HSQC, which provide a fingerprint characteristic for the hexaloop. The overall resonance assignment of the hexaloop is in excellent agreement with the observations in SL5a. This loop arrangement is similar to a UUCG-tetraloop, for which detailed structural restraints are available (Fürtig et al. 2004; Nozinovic et al. 2010).
Remarkably, available sequencing data for observed mutations in SARS-CoV-2 (Hadfield et al. 2018; Cao et al. 2021) show significantly different susceptibility to mutation for the two hexaloop sequences. While the SL5a loop sequence remained mostly conserved until recently, in SL5b the mutation C241U appeared in variants emerging since March 2020. A first study on mutational frequency indicates, among others, high C-to-U mutation rates in the SCoV2 genome (Mourier et al. 2021). With the most recent mutation at SL5a, C203U, both hexaloop sequences change to 5′-UUUUGU-3′. With the chemical shifts provided here, structural differences in mutant versions of SCoV2 can be delineated from changes in chemical shifts monitored by NMR spectroscopy.
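The effect of the C241U substitution on the SL5b loop sequence can be reproduced with genomic numbering. The following is a minimal sketch, assuming the hexaloop 5′-UUUCGU-3′ occupies genomic positions 238 to 243 as stated above; the helper name `apply_point_mutation` and the mutation string format (reference base, position, new base) are illustrative.

```python
SL5B_LOOP_START = 238          # genomic position of the first loop nucleotide
SL5B_LOOP_WILD_TYPE = "UUUCGU"  # hexaloop sequence, positions 238 to 243


def apply_point_mutation(sequence: str, first_position: int, mutation: str) -> str:
    """Apply a substitution such as 'C241U' to a sequence with genomic numbering.

    The reference base in the mutation string is checked against the sequence
    so that numbering errors are detected instead of silently applied.
    """
    ref_base, new_base = mutation[0], mutation[-1]
    position = int(mutation[1:-1])
    index = position - first_position
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} lies outside the sequence")
    if sequence[index] != ref_base:
        raise ValueError(
            f"expected {ref_base} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + new_base + sequence[index + 1:]


if __name__ == "__main__":
    mutant = apply_point_mutation(SL5B_LOOP_WILD_TYPE, SL5B_LOOP_START, "C241U")
    print(f"5'-{SL5B_LOOP_WILD_TYPE}-3' -> 5'-{mutant}-3'")
```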

The GAAA-tetraloop of SL5c
The GAAA-tetraloop of the SL5c stem consists of G256 to A259, closed by two GC and one AU base pairs. All three G N1-H1 resonances observed for the shorter construct were superimposable in the 1H,15N-TROSY spectrum of SL5b + c. The chemical shifts observed in SL5b + c are in agreement with chemical shifts for a GAAA tetraloop (Jucker et al. 1996). In particular, the ribose H1′ shift of ~3.5 ppm of the guanosine residue in the loop-closing base pair is characteristic. 31P 1D data support the formation of a typical GAAA-tetraloop (SI Fig. 3; Legault and Pardi 1994). Legault and Pardi detected a stabilization of the G imino 1H (corresponding to G256 in our construct) by interaction with a phosphate oxygen in the backbone of a GAAA tetraloop. While we could not assign crosspeaks between the loop guanosine (G256) and the loop adenosine (A259) that would have confirmed the reported loop geometry in SL5c, we found an additional imino crosspeak assigned to G256. In addition to the imino and amino proton assignment, which is consistent with the published chemical shifts for SL5b + c (Wacker et al. 2020), complete assignments of the aromatic C-H resonances as well as the ribose resonances C1′-H1′ and C2′-H2′ were obtained for element SL5c. While in SL5c imino signals of base pairs were observable at 298 K, a vanishing of these signals as well as the appearance of additional signals and shifting of signals in the aromatic region of SL5c was noticed within the larger SL5b + c context. Thus, the SL5c stem is more stable and opens only at higher temperatures in the full-length construct (SI Fig. 4).

Fig. 3 … (Cherepanov et al. 2010). Data points are annotated by base numbering as used in the RNA secondary structure scheme on the right. Blue (both in the graph and the secondary structure) highlights residues with non-C3′-endo conformation or deviations in exocyclic torsion angle γ

Summary
We herein present the 1H, 13C and 15N chemical shifts of SL5b + c, using two fragments, SL5b_GC and SL5c, that subsequently allowed assignment of the SL5b + c element. The assignments of the sub-constructs were used as starting points for advancing the SL5b + c assignment. For the combined construct, an overall assignment was conducted at temperatures of 274 to 298 K. For SL5b + c, 95% of the aromatic H6-C6 and H8-C8 resonances were assigned, as well as the 8 adenosine H2-C2 and 17 uridine and cytidine H5-C5 resonances. Carbonyl and other quaternary carbon atoms of the nucleobases were partly assigned: in purines (C2: 75%, C4: 15% and C6: 50%) and pyrimidines (C2: 29% and C4: 24%). The assignment of the nitrogen atoms includes 70% of N1 for purines and 65% of N3 for pyrimidines, which represent mostly those involved in hydrogen bonding interactions. Through the assignment transfer, 90% of the H1′-C1′ chemical shifts were assigned. Further ribose resonances for H2′ to H5″ and C2′ to C5′ are partially assigned in ranges of 10 to 30%. In summary, an assignment of 65% of the 1H, 53% of the 13C and 63% of the 15N atoms in the nucleobases of SL5b + c has been achieved.

Conflict of interest
The authors declare no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.