Detecting RNA-RNA interactions in E. coli using a modified CLASH method
Bacterial small regulatory RNAs (sRNAs) play important roles in sensing environmental changes through sRNA-target mRNA interactions. However, the current strategy for detecting sRNA-mRNA interactions usually combines bioinformatics prediction with experimental verification, and is hampered by low prediction accuracy and low throughput. Additionally, among the 4736 sequenced bacterial genomes, only about 2164 sRNAs from 319 strains have been described. Furthermore, target mRNAs have been identified for only 157 sRNAs. Highly efficient methods are therefore needed to detect sRNA-mRNA interactions in the sequenced genomes. This study aimed to apply a modified CLASH (cross-linking, ligation and sequencing hybrids) method to detect RNA-RNA interactions in E. coli, a model bacterial organism.
Statistically significant interactions were detected in 29 transcript pairs. To the best of our knowledge, 24 of these pairs are novel RNA interactions reported here for the first time; they include tRNA-tRNA, tRNA-ncRNA (non-coding RNA), tRNA-rRNA, rRNA-mRNA, rRNA-ncRNA, rRNA-rRNA, rRNA-IGT (intergenic transcript), and tRNA-IGT interactions.
The discovery of novel RNA-RNA interactions in the present study suggests that RNA-RNA interactions may be far more complicated than previously expected, and new methods may be required to discover more of them. The present work describes a high-throughput protocol not only for discovering new RNA interactions but also for directly obtaining base-pairing sequences, which should be useful in assessing RNA structure and interactions.
Keywords: RNA-RNA interaction; High-throughput sequencing; Bacteria
Abbreviations
CLASH: Cross-linking, ligation and sequencing hybrids
HITS-CLIP: High-throughput sequencing of RNAs isolated by crosslinking immunoprecipitation
iPAR-CLIP: In vivo PAR-CLIP
PAR-CLIP: Photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation
sRNA: Small regulatory RNA
RNA-RNA interactions (RRIs) play important roles in multiple biological processes. For example, mounting evidence suggests that miRNA-mRNA interactions in eukaryotes and sRNA-mRNA interactions in bacteria mediate post-transcriptional regulation of gene expression [1, 2, 3]. In addition, snoRNAs guide chemical modifications of rRNAs through snoRNA-rRNA interactions. Many other types of RRIs have also been found, including miRNA-lncRNA, lncRNA-mRNA, lncRNA-lncRNA, snoRNA-mRNA, and scaRNA-rRNA interactions [4, 5, 6, 7]. Detecting RRIs, especially transcriptome-wide, is therefore an important strategy for understanding RNA functions and related biological processes, and many bioinformatics and experimental approaches have been developed to this end.
Current bioinformatics methods for predicting RRIs take RNA sequences as input and can be divided into two classes. The first class comprises general models for RRI prediction. The earliest methods, e.g. RNAfold or Mfold, detect RRIs by predicting the secondary structure of a single sequence formed by concatenating the two RNAs under study; base-paired regions between the two RNAs indicate an interaction. More complex models such as RNAcofold and RNAplex have also been developed. These models can locate binding sites between two RNA molecules, but cannot determine whether the two RNAs interact directly. The second class is designed for particular RNA types such as miRNAs or bacterial sRNAs. For instance, multiple models have been developed for miRNA target prediction [7, 9, 10], including TargetScan, PicTar, PITA, rna22, and RNAhybrid, and programs such as IntaRNA, CopraRNA, RNApredator, TargetRNA2, sRNATarget, and sTarPicker are designed for sRNA-target mRNA prediction. These models often provide candidate interactions for experimental validation, but their main shortcoming is a high false positive rate [7, 8]. To overcome this, high-throughput sequencing (HITS)-based protocols have been developed to detect RRIs.
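The concatenation idea behind the earliest approaches can be illustrated with a toy example. The sketch below is a deliberate simplification and not how RNAfold or Mfold work: it assumes Nussinov base-pair maximisation instead of free-energy minimisation, a minimum hairpin loop of 3 nt, and a hypothetical non-pairing linker ("LLL") between the two RNAs. It folds the two RNAs as one sequence and reports which predicted pairs span the two molecules.

```python
"""Toy 'concatenate and fold' illustration (Nussinov base-pair maximisation)."""

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_LOOP = 3     # minimum number of unpaired bases enclosed by a pair
LINKER = "LLL"   # hypothetical spacer; 'L' can never pair


def nussinov_pairs(seq: str) -> list[tuple[int, int]]:
    """Return one maximum-size set of non-crossing base pairs (i, j), i < j."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(MIN_LOOP + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])       # i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)   # i pairs with j
            for k in range(i + 1, j):                    # bifurcation
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best

    # Iterative traceback recovering one optimal structure.
    pairs: list[tuple[int, int]] = []
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= MIN_LOOP:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (seq[i], seq[j]) in PAIRS and dp[i][j] == dp[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return sorted(pairs)


def intermolecular_pairs(a: str, b: str) -> list[tuple[int, int]]:
    """Pairs (i, j) meaning a[i] pairs with b[j] in the folded concatenate."""
    seq = (a + LINKER + b).upper().replace("T", "U")
    offset = len(a) + len(LINKER)
    return [(i, j - offset) for i, j in nussinov_pairs(seq)
            if i < len(a) and j >= offset]
```

For example, `intermolecular_pairs("GGGG", "CCCC")` returns `[(0, 3), (1, 2), (2, 1), (3, 0)]`, an antiparallel four-base-pair duplex. As the text notes, such a prediction locates possible binding sites but cannot by itself show that the two RNAs interact in the cell.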
An early strategy was high-throughput sequencing of RNAs isolated by crosslinking immunoprecipitation (HITS-CLIP), which was developed to decode miRNA-mRNA interactions in the mouse brain. First, two HITS datasets (Ago-miRNA and Ago-mRNA) were generated; bioinformatics methods were then used to predict miRNA-mRNA interactions. Photoactivatable-ribonucleoside-enhanced crosslinking and immunoprecipitation (PAR-CLIP) was developed next; in this method, 4-thiouridine is used to introduce thymidine-to-cytidine (T-to-C) transitions during cDNA library preparation, and these transitions can be used to pinpoint miRNA target sites. However, the more recently developed CLASH (cross-linking, ligation and sequencing hybrids) and iPAR-CLIP (in vivo PAR-CLIP) methods are more effective at detecting RRIs directly. The key idea behind CLASH and iPAR-CLIP is to identify chimeric reads formed by ligation of interacting RNAs. So far, CLASH has been applied to identify snoRNA-rRNA interactions in yeast, sRNA-RNA interactions in bacteria [24, 25], and miRNA-mRNA interactions in humans, and iPAR-CLIP has been applied to assess miRNA-target interactions in C. elegans. However, these methods rely on specific proteins and therefore detect only the RRIs associated with those proteins, which do not cover the whole RNA-RNA interactome. This study aimed to apply a modified CLASH strategy to assess all RRIs in E. coli, regardless of whether they are associated with partner molecules such as proteins. To the best of our knowledge, CLASH or similar methods have not previously been applied to detect transcriptome-wide RRIs in prokaryotes.
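The T-to-C signature that PAR-CLIP relies on can be tallied with a short script. This is a minimal sketch, assuming reads that are already aligned without gaps and written in DNA letters; the function names are hypothetical and not part of any published PAR-CLIP pipeline. Positions that accumulate many transitions mark likely cross-linking, and hence binding, sites.

```python
from collections import Counter


def t_to_c_positions(reference: str, read: str, start: int = 0) -> list[int]:
    """Reference coordinates where the reference base is T but the read shows C.

    Assumes an ungapped alignment of `read` beginning at `start` on `reference`.
    """
    segment = reference[start:start + len(read)]
    if len(segment) != len(read):
        raise ValueError("read extends past the end of the reference")
    return [start + k for k, (ref_base, read_base) in enumerate(zip(segment, read))
            if ref_base == "T" and read_base == "C"]


def transition_profile(reference: str, alignments: list[tuple[str, int]]) -> Counter:
    """Count T-to-C transitions per reference position across aligned reads."""
    counts: Counter = Counter()
    for read, start in alignments:
        counts.update(t_to_c_positions(reference, read, start))
    return counts


reference = "GATTACATTG"
profile = transition_profile(reference, [("TCAC", 2), ("ACTG", 6), ("CCAC", 2)])
print(profile.most_common())
```

A real pipeline would also have to separate induced transitions from sequencing errors and SNPs, which this sketch ignores.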
UV cross-linking of living cells treated with AMT
Escherichia coli K12 MG1655 cells were collected by centrifugation, washed with PBS, resuspended in PBS at a density of 5 × 10^9 cells/ml, and incubated on ice. AMT (4'-aminomethyltrioxsalen; Sigma) was added to a final concentration of 0.3 mg/ml, and the cells were incubated on ice for 10 min. The cells were then kept on ice and irradiated at 365 nm with an intensity of 10 mW/cm² six times (10 min each); cells were shaken well before irradiation.
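The irradiation conditions imply a total UV-A radiant exposure that is not stated explicitly. Assuming the reported intensity was constant and uniform over the sample, the dose is irradiance × time: 10 mW/cm² × 600 s = 6 J/cm² per exposure, or 36 J/cm² over six exposures. The helper below simply encodes that arithmetic; its name is illustrative.

```python
def uv_dose_j_per_cm2(irradiance_mw_per_cm2: float, minutes_per_exposure: float,
                      n_exposures: int) -> float:
    """Radiant exposure (J/cm²) = irradiance (W/cm²) × total time (s)."""
    watts_per_cm2 = irradiance_mw_per_cm2 / 1000.0
    total_seconds = minutes_per_exposure * 60.0 * n_exposures
    return watts_per_cm2 * total_seconds


# Conditions stated in the protocol: 10 mW/cm², six 10-min exposures at 365 nm.
per_exposure = uv_dose_j_per_cm2(10, 10, 1)
total = uv_dose_j_per_cm2(10, 10, 6)
print(f"{per_exposure:.1f} J/cm² per exposure, {total:.1f} J/cm² in total")
```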
Cell lysis and RNA extraction
After cross-linking, cells were washed twice with PBS. Lysozyme solution (TIANGEN) and 10% SDS (Sigma) were added, and the cells were lysed at 64 °C for 2 min. Lysates were cooled to 4 °C. RNA was extracted by the acid guanidinium thiocyanate-phenol-chloroform method. Any contaminating DNA was removed with DNase I (NEB), which was then inactivated by heating to 90 °C for 2 min.
RNase T1 digestion
RNAs were trimmed with RNase T1 (Invitrogen) for 1 h.
RNase H digestion
20-mer oligodeoxyribonucleotides and buffer were added to the RNA. The mixture was heated to 90 °C for 2 min and allowed to cool naturally to room temperature. RNase H (Thermo Scientific) was then added for 1 h to digest the RNA in DNA/RNA duplexes. After three rounds of this treatment, the oligonucleotides were removed with DNase I (NEB).
RNA size selection
RNAs were resolved on 10% urea polyacrylamide gels. Bands corresponding to 40-100 nt were excised and recovered using a ZR small-RNA PAGE Recovery Kit (Zymo Research).
The recovered RNAs were incubated in a dephosphorylation mixture containing 8 U FastAP thermosensitive alkaline phosphatase (Thermo Scientific, EF0651) and 40 U RNase inhibitors in polynucleotide kinase (PNK) buffer for 45 min at 20 °C.
RNA 5’ end phosphorylation and intramolecular ligation
RNAs were subsequently phosphorylated with 10 U of T4 polynucleotide kinase in PNK buffer (TAKARA) for 30 min at 37 °C. Cross-linked RNA molecules were then ligated using 40 U of T4 RNA ligase 1 (New England Biolabs, M0204), 1 mM ATP, and 40 U RNase inhibitors in RNA ligase 1 buffer for 1 h at 15 °C, and kept for 16 h at 4 °C.
Photoreversal of cross-linking
For cross-linking reversal, the ligated RNAs were irradiated with 254 nm UV at a fluence of 400 mJ/cm², followed by 200 mJ/cm². RNAs were then precipitated overnight with isopropanol and washed twice with 75% ethanol.
Library preparation and high-throughput sequencing
Sequencing libraries were generated using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer’s recommendations, and were sequenced on an Illumina HiSeq 2000/2500 platform. To detect RRIs in E. coli, five samples were prepared. The fully treated sample was termed ‘TAN’. Relative to ‘TAN’, ‘AN’ omitted T4 RNA ligase treatment; ‘A’ omitted T4 RNA ligase and photoreversal treatments; ‘B’ omitted AMT, T4 RNA ligase and photoreversal treatments; and ‘U’ omitted AMT, UV irradiation, T4 RNA ligase and photoreversal treatments.
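Because each sample omits the treatments of the previous one plus one more, neighbouring samples in the series TAN, AN, A, B, U differ by exactly one step. The sketch below encodes the design as sets (the step labels are chosen here for illustration, and all shared steps such as RNase digestion and size selection are assumed to be common to every sample) and prints the single step each comparison isolates.

```python
# Variable treatment steps per sample, as described for the five libraries.
SAMPLES: dict[str, frozenset[str]] = {
    "TAN": frozenset({"UV", "AMT", "ligation", "photoreversal"}),
    "AN": frozenset({"UV", "AMT", "photoreversal"}),
    "A": frozenset({"UV", "AMT"}),
    "B": frozenset({"UV"}),
    "U": frozenset(),
}


def steps_isolated(treated: str, control: str) -> frozenset[str]:
    """Steps applied to `treated` but not to `control`."""
    return SAMPLES[treated] - SAMPLES[control]


ORDER = ["TAN", "AN", "A", "B", "U"]
for treated, control in zip(ORDER, ORDER[1:]):
    print(f"{treated} vs {control}: {sorted(steps_isolated(treated, control))}")
```

Read this way, TAN vs AN isolates ligation, AN vs A photoreversal, A vs B AMT, and B vs U UV irradiation.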
Adapter sequences were removed from the raw sequencing reads using Flexbar, which also trimmed the 3’ end of each read until a base with a Phred quality score of 30 or higher was reached. Reads shorter than 10 nt, reads in which undetermined bases made up more than 10% of the sequence, and reads in which more than 50% of bases had a quality score of 5 or lower were then filtered out. Paired-end reads were merged with an in-house program, requiring an overlap of at least 10 nt between the two reads of a pair. The reads were then mapped to the genome of Escherichia coli K12 MG1655 with BLAST, and only BLAST hits without mismatches or gaps were considered. For each read, potential helical regions were predicted using GUUGle. Chimeras (chimeric reads) were then identified for subsequent analysis as reads satisfying the following criteria: (1) the read does not map continuously to the genome; (2) the read generates two BLAST hits that together cover it fully; (3) the two parts of the read (corresponding to the two BLAST hits) are directly adjacent or overlap by up to 4 nt; (4) if the two BLAST hits are in the same gene, they overlap each other within the gene; and (5) the helical regions formed by the two parts of the read contain at least one classical AMT cross-linking site, i.e. 5’-UR or 5’-RU. Reads mapped to multiple gene pairs were discarded. Each combination of helical regions with AMT sites was used as a constraint to predict the dimeric structure of the two parts with RNAcofold, and the structure with the lowest free energy was selected as the interaction structure frozen by AMT.
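The read-level quality filters can be expressed compactly. This is a minimal sketch, assuming Phred+33 quality encoding; the 3’ trimming is a simplified "trim until a base of quality ≥ 30" rule rather than Flexbar's own implementation, and adapter removal and read merging are omitted. Thresholds default to the values stated in the text; the function names are hypothetical.

```python
def phred33(qual: str) -> list[int]:
    """Decode a Phred+33 (Sanger / Illumina 1.8+) quality string."""
    return [ord(ch) - 33 for ch in qual]


def trim_3prime(seq: str, quals: list[int], min_q: int = 30) -> tuple[str, list[int]]:
    """Drop 3' bases until the last remaining base has quality >= min_q."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]


def passes_filters(seq: str, quals: list[int], min_len: int = 10,
                   max_n_frac: float = 0.10, low_q: int = 5,
                   max_low_q_frac: float = 0.50) -> bool:
    """Apply the length, undetermined-base and low-quality filters to one read."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if len(seq) < min_len:                              # shorter than 10 nt
        return False
    if seq.upper().count("N") / len(seq) > max_n_frac:  # >10% undetermined
        return False
    low = sum(1 for q in quals if q <= low_q)           # bases with Q <= 5
    return low / len(quals) <= max_low_q_frac           # reject if >50%
```

For instance, a 10-nt read with one `N` still passes (exactly 10% undetermined), whereas a read with two `N`s does not.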
Probabilistic analysis of inter-molecular interactions in chimeric reads
Identification of ligated RNAs
Statistics of reads from the five samples
Each read from the TAN sample was mapped individually to the genome using BLAST, with no mismatches or gaps allowed. Some reads mapped wholly to a continuous genomic sequence; these most likely originated from a single transcript and were therefore called single reads. Other reads could be split into two parts, each mapping wholly to a continuous genomic sequence. When the two genomic sequences belonged to different transcripts, the read was called a chimera. When the two genomic sequences overlapped, the read was also called a chimera, as it most likely originated from the interaction of two copies of the same RNA molecule. When the two genomic sequences were separated within the same transcript, the read was defined as a single read. To eliminate chimeras arising from template switching during reverse transcription, the two parts were required to be directly adjacent or to have a gap or overlap of at most 4 nt in the read. The remaining reads had to be split into more than two parts to map fully to the genome. These reads might originate from multiple ligations of RNAs and/or cross-linked RNAs and were not analyzed further in this study.
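The rules for labelling a read split into two parts can be written as a small decision function. This is a sketch under simplifying assumptions: each BLAST hit is reduced to a hypothetical `Hit` record with 0-based, half-open coordinates on the read and on the genome, both hits are assumed to lie on the same strand, and reads needing more than two parts are handled elsewhere.

```python
from dataclasses import dataclass

MAX_JUNCTION_NT = 4  # parts must abut, or have a gap/overlap of at most 4 nt


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one ungapped BLAST hit (0-based, half-open)."""
    gene: str
    read_start: int
    read_end: int
    genome_start: int
    genome_end: int


def classify_two_part_read(h1: Hit, h2: Hit) -> str:
    """Return 'chimera', 'single' or 'discard' for a read split into two hits."""
    first, second = sorted((h1, h2), key=lambda h: h.read_start)
    junction = second.read_start - first.read_end  # > 0 gap, < 0 overlap
    if abs(junction) > MAX_JUNCTION_NT:
        return "discard"  # e.g. possible reverse-transcription template switch
    if first.gene != second.gene:
        return "chimera"  # two different transcripts
    genomic_overlap = (min(first.genome_end, second.genome_end)
                       - max(first.genome_start, second.genome_start))
    if genomic_overlap > 0:
        return "chimera"  # two copies of the same RNA
    return "single"       # separated segments of one transcript
```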
The ligated RNAs form stable structures with low free energy distribution
Interactions in chimeric and single reads
Only uniquely mapped reads were selected for further analysis. Because AMT was used to freeze RNA interactions and enrich them for sequencing, each chimera should contain site pairs in both RNA sequences that AMT can recognize and cross-link; such site pairs should also exist in single reads. We used GUUGle and in-house programs to search for classical AMT cross-linking sites, i.e. 5’-UR or 5’-RU. In total, 2652 chimeric and 11265 single reads were found to contain plausible cross-linking sites. For each read, there might be several possible combinations of base-paired regions containing AMT sites. The combination with the lowest energy was used as a constraint to determine the interaction structure of a chimera or non-continuous single read with RNAcofold, or the folding structure of a continuous single read with RNAfold.
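The check that a base-paired region carries an AMT site can be sketched as follows. The helix search is a simplified stand-in for GUUGle, assuming that a helix is an ungapped, antiparallel run of Watson-Crick or G-U pairs of at least four base pairs (the minimum length is an assumption made here); the AMT sites are the 5’-UR and 5’-RU dinucleotides named in the text, with R = A or G. All function names are hypothetical.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def _rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def amt_sites(seq: str) -> list[int]:
    """0-based starts of 5'-UR or 5'-RU dinucleotides (R = A or G)."""
    s = _rna(seq)
    return [i for i in range(len(s) - 1)
            if (s[i] == "U" and s[i + 1] in "AG") or (s[i] in "AG" and s[i + 1] == "U")]


def helices(x: str, y: str, min_len: int = 4) -> list[tuple[int, int, int, int]]:
    """Ungapped antiparallel helices as (x_start, x_end, y_start, y_end), half-open.

    x[x_start + k] pairs with y[y_end - 1 - k] for every k in the helix.
    """
    x, yr = _rna(x), _rna(y)[::-1]  # reversing y turns antiparallel into diagonal
    found = []
    for d in range(-(len(yr) - 1), len(x)):  # each diagonal satisfies i - j == d
        i, j, run = max(d, 0), max(-d, 0), 0
        while i < len(x) and j < len(yr):
            if (x[i], yr[j]) in PAIRS:
                run += 1
            else:
                if run >= min_len:
                    found.append((i - run, i, len(yr) - j, len(yr) - j + run))
                run = 0
            i += 1
            j += 1
        if run >= min_len:  # helix running to the end of the diagonal
            found.append((i - run, i, len(yr) - j, len(yr) - j + run))
    return found


def helix_has_amt_site(x: str, y: str, helix: tuple[int, int, int, int]) -> bool:
    """True if either strand of the helix contains a whole AMT dinucleotide."""
    xs, xe, ys, ye = helix
    return (any(xs <= p and p + 1 < xe for p in amt_sites(x))
            or any(ys <= p and p + 1 < ye for p in amt_sites(y)))
```

For example, `helices("UUACGGAUCC", "CCAUCCGUCC")` includes `(2, 8, 2, 8)`, a 6-bp helix whose x strand contains the 5’-AU (RU) site at position 6, so it would be retained; a pure G-C helix such as `GGCC` paired with `GGCC` carries no site and would be rejected.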
Identification of over-represented RNA-RNA interactions
Table 2 Statistics of inter-molecular RNA-RNA interactions
Interaction type | Number of gene pairs (gene pairs)
tRNA – tRNA | 8 (glyT – glyU/proM/serV/thrT/tyrU; glyU – tyrU; tyrU – valV/valW)
ncRNA – tRNA | 7 (ffs – glyT/serV/tyrU; ssrS – glyT/glyU/hisR/proM)
tRNA – rRNA | 5 (glyT – rrfA; proM – rrfA; serT – rrsG; serV – rrlA/rrlH)
rRNA – mRNA | 2 (rrlC – cadA/rpoB)
rRNA – ncRNA | 3 (ffs – rrfA/rrfF; ssrS – rrlH)
rRNA – rRNA | 1 (rrsG – rrsG)
rRNA – IGT | 1 (rrlC – IGT)
tRNA – tmRNA | 1 (serV – ssrA)
tRNA – IGT | 1 (hisR – IGT)
Global snapshot of the E. coli RNA-RNA interactome
Of the detected single reads, 10697 were located in known genes: 10054 in 11 rRNA genes, 409 in 14 tRNA genes, 145 in 5 non-coding RNAs (ncRNAs), 73 in 58 mRNAs, 15 in the transfer-messenger RNA (tmRNA) ssrA, and 1 in the pseudo mRNA yneO. Of the remaining reads, 86 were located in intergenic regions, 472 crossed gene boundaries, 7 crossed the boundaries of 3 repeat regions, and 3 were located in a repeat region named REP31. These reads may reflect new RNA transcripts whose functional roles remain to be discovered.
The 1082 over-represented chimeric reads originated from different types of inter-molecular interactions (Table 2). For example, 315 reads came from 8 tRNA-tRNA interaction pairs, 266 from 7 ncRNA-tRNA pairs, 199 from 5 tRNA-rRNA pairs, 141 from 2 rRNA-mRNA pairs, 74 from 3 rRNA-ncRNA pairs, 33 from 1 rRNA-rRNA pair, and 13 from 1 tRNA-tmRNA pair. Interactions involving intergenic transcripts (IGTs) were also found, such as 28 reads from the interaction between rRNA rrlC and a transcript from the region between the protein-coding genes udk and alkA, and 13 reads from the interaction between tRNA hisR and a transcript from a region downstream of leuT. The existence of the novel transcript interacting with rrlC was confirmed by PCR and sequencing (Additional file 1: Document S1).
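The read and pair counts reported in this section are internally consistent, which is easy to confirm. The snippet below only re-adds figures quoted in the text: the single-read categories should total 11265 (the number of single reads with AMT sites reported earlier), the chimeric categories should total 29 gene pairs and 1082 reads, and 22 of the pairs should involve a tRNA.

```python
# Counts copied from the text; no new data.
single_in_genes = {"rRNA": 10054, "tRNA": 409, "ncRNA": 145, "mRNA": 73,
                   "tmRNA (ssrA)": 15, "pseudo mRNA (yneO)": 1}
single_elsewhere = {"intergenic": 86, "crossing gene boundaries": 472,
                    "crossing repeat-region boundaries": 7, "within REP31": 3}
# interaction type: (number of gene pairs, number of chimeric reads)
chimeric = {"tRNA-tRNA": (8, 315), "ncRNA-tRNA": (7, 266), "tRNA-rRNA": (5, 199),
            "rRNA-mRNA": (2, 141), "rRNA-ncRNA": (3, 74), "rRNA-rRNA": (1, 33),
            "tRNA-tmRNA": (1, 13), "rRNA-IGT": (1, 28), "tRNA-IGT": (1, 13)}

in_genes = sum(single_in_genes.values())
all_single = in_genes + sum(single_elsewhere.values())
n_pairs = sum(pairs for pairs, _ in chimeric.values())
n_reads = sum(reads for _, reads in chimeric.values())
trna_pairs = sum(pairs for kind, (pairs, _) in chimeric.items()
                 if "tRNA" in kind.split("-"))
print(in_genes, all_single, n_pairs, n_reads, trna_pairs)
```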
Detailed information on the above reads is available at our web server http://ccb1.bmi.ac.cn/htsrr/. For each read, a dedicated web page illustrates its gene composition, base-pairing positions and sequences, interaction structure, and the probability distribution of being unpaired in the parent genes; this should help experimental research on RNA-RNA interactions and RNA structure.
In recent years, CLASH methods have been developed to detect RRIs in yeast, bacteria [24, 25] and human cells. In these studies, UV was used to cross-link proteins to their associated RNAs, and specific proteins were used to capture the cross-linked RNAs for RRI detection. Although many RRIs have been discovered in this way, they are limited to those associated with the chosen proteins and do not cover the whole RNA-RNA interactome.
In this study, the psoralen derivative AMT was used. Psoralens intercalate into RNA duplexes and, upon irradiation with 320–400 nm light, covalently cross-link uridines in the duplex that are in close proximity; irradiation with 254 nm UV reverses the cross-link. Psoralen induces intra- and inter-molecular cross-links within RNAs regardless of whether other molecules such as proteins are present. This makes it possible to ligate all interacting RNA sequences, whether they belong to the same or to different molecules. The concatenated RNA molecules can then be used to prepare cDNA libraries for high-throughput sequencing, which should reveal the whole RNA-RNA interactome. To the best of our knowledge, this is the first study to scan the bacterial whole RNA-RNA interactome using a modified CLASH protocol. As expected, we detected both intra- and inter-molecular RNA-RNA interactions. Furthermore, AMT's preference for cross-linking uridines helps to pinpoint the interacting sequences. To ensure reliability, the reads were analyzed and filtered strictly: no mismatch or gap was permitted during mapping to the genome, and an interacting RNA pair had to contain base-paired sequences with classical AMT cross-linking sites and be supported by at least 10 reads. In addition, the count of an interacting RNA pair had to be statistically significant. Although no sRNA-mRNA interaction reached statistical significance in this study, almost all functional classes of RNAs were found to participate in various inter-molecular RNA-RNA interactions; tRNA-tRNA, tRNA-ncRNA, tRNA-rRNA, rRNA-mRNA, rRNA-ncRNA, rRNA-rRNA, rRNA-IGT and tRNA-IGT interactions were also detected in an RNase E-CLASH study by Waters et al. Five interacting gene pairs reported by Waters et al. were detected with statistical significance in this study: one tRNA-tmRNA pair and four tRNA-tRNA pairs.
For each of these five pairs we detected alternative interaction regions, and for 3 tRNA-tRNA gene pairs we also detected the same interaction regions. Without the requirement for statistical significance, an additional 27 gene pairs reported by Waters et al. were detected in this study: the same interaction regions in 5, alternative interaction regions in 16, and both same and alternative interaction regions in 6 (Additional file 2: List S1). The alternative interaction regions may reflect the dynamics of RNA-RNA interactions, which may both affect and be affected by interactions with other partner molecules. Detecting both the same and alternative interaction regions indicates that this modified protocol can capture these dynamics, whether an interaction occurs before, during or after binding to other partner molecules such as proteins. These results reveal an unexpected complexity of RNA-RNA interactions, even in a simple bacterial cell, and should benefit functional studies of RNAs that explore these interactions, especially how they change under different conditions. For example, tRNAs appeared in 22 of the 29 interacting gene pairs identified, with tRNA, ncRNA, rRNA, tmRNA and an unknown transcript as partners. The tRNA fragments appearing in these chimeras may in fact be tRNA-derived fragments (tRFs). tRFs have been described in all three domains of life and are produced from mature tRNAs or their precursor transcripts; they are a source of small functional RNAs involved in many biological processes, such as translational inhibition and stress response. Research on tRFs is still at an early stage.
Known sRNA-mRNA interactions were not detected, possibly because: (1) sRNA-mRNA interactions depend on specific environmental changes, and the experimental conditions used here may differ from those under which the known interactions occur; (2) given the high abundance of rRNAs and tRNAs, the sequencing depth was insufficient to detect the known sRNA-mRNA interactions; or (3) other factors, such as the many experimental steps, hinder the detection of sRNA-mRNA interactions. In future work, we will try to improve the protocol to detect low-abundance RNA-RNA interactions by simplifying the experimental procedure, depleting rRNAs and tRNAs, and increasing the sequencing depth. We will also apply it in comparative studies of different conditions and strains to find the RNA-RNA interactions underlying various physiological and pathogenic phenomena.
Initially, we aimed to detect more sRNA-mRNA interactions using a modified CLASH protocol. Instead, an unexpected aspect of the complexity of RNA-RNA interactions was revealed, indicating the need to study RNA interactions broadly. The protocol applied here should be useful in future research for finding more RNA interactions and determining their interacting sequences. In addition, combining it with other CLASH protocols that detect protein-associated RRIs should make it possible to study the relationship between the RNA-RNA interactome and the protein world.
As the novel RNA-RNA interactions discovered in this study show, many RNA-RNA interactions may remain to be uncovered in E. coli, and different kinds of RNA-RNA interactions may require different detection methods. The present study provides a high-throughput protocol not only for discovering new RNA interactions but also for directly obtaining base-pairing sequences, which should be useful in assessing RNA structure and interactions.
This work was financially supported by the National Natural Science Foundation of China (31271404, 31471244, 91540202, 31300093). The funding body had no role in the design of the study, in the collection, analysis, and interpretation of data, or in writing the manuscript.
Availability of data and material
The raw RNA-Seq data for the five samples are available in the NCBI Sequence Read Archive (SRA) repository via accession number SRP103891, https://www.ncbi.nlm.nih.gov/sra/?term=SRP103891. The scripts used in the bioinformatics analysis and the eventual results are available at the webserver (http://ccb1.bmi.ac.cn/htsrr/).
WJL jointly conceived the study with XFZ. SX, HJF and BLT performed the experiments. TL, KYZ and ZW analyzed and interpreted the data. TL and WJL wrote the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Consent for publication
Ethics approval and consent to participate
29. Dodt M, Roehr JT, Ahmed R, Dieterich C. FLEXBAR-Flexible Barcode and Adapter Processing for Next-Generation Sequencing Platforms. Biology (Basel). 2012;1(3):895–905.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.