Invertebrate 7SK snRNAs
7SK RNA is a highly abundant noncoding RNA in mammalian cells whose function in transcriptional regulation has only recently been elucidated. Despite its highly conserved sequence throughout vertebrates, all attempts to discover 7SK RNA homologues in invertebrate species have failed so far. Here we report on a combined experimental and computational survey that succeeded in discovering 7SK RNAs in most of the major deuterostome clades and in two protostome phyla: mollusks and annelids. No candidates, however, were found in any of the many available ecdysozoan genomes, despite major efforts. The additional sequence data confirm the evolutionary conservation, and hence the functional importance, of the previously described 3′ and 5′ stem-loop motifs, and provide evidence for a third, structurally well-conserved domain.
Keywords: 7SK RNA; Polymerase III transcription; Noncoding RNA; Lophotrochozoans
The 7SK snRNA is a highly abundant noncoding RNA in vertebrate cells. The polymerase (Pol) III transcript, with a length of about 330 nucleotides (nt) (Krüger and Benecke 1987; Murphy et al. 1987), is highly conserved in vertebrates (Gürsoy et al. 2000). Due to its abundance, it has been known since the 1960s. Its function as a transcriptional regulator, however, has only recently been discovered. 7SK mediates the inhibition of the general transcription elongation factor P-TEFb by the HEXIM1 protein and thereby represses transcript elongation by Pol II (Blazek et al. 2005; Egloff et al. 2006; Michels et al. 2004; Peterlin and Price 2006). Furthermore, 7SK RNA suppresses the deaminase activity of APOBEC3C and sequesters this enzyme in the nucleolus (He et al. 2006).
Two distinct secondary structure elements are highly conserved throughout vertebrates (Egloff et al. 2006): a 5′-terminal hairpin structure that binds both HEXIM1 and P-TEFb, and a 3′-terminal hairpin that interacts with P-TEFb only. In contrast to the nearly perfect sequence conservation in jawed vertebrates, the 7SK RNA from the lamprey Lampetra fluviatilis differs in more than 30% of its nucleotide positions from its mammalian counterpart (Gürsoy et al. 2000). The highest sequence conservation is observed in the 5′ and 3′ hairpin regions. Sequence conservation seems to decline rapidly outside the gnathostomes. In Gürsoy et al. (2000), some of us also reported on an unsuccessful attempt to find 7SK RNA in hagfish and lancelet and suggested that the 7SK RNA might be a vertebrate innovation. In this paper we combine improved cloning strategies with systematic computational homology searches to detect highly divergent 7SK RNAs in invertebrate animals.
Materials and Methods
Cloning and Sequencing of 7SK RNAs
Total cellular RNA was isolated from frozen tissue that had been minced with scissors and homogenized in buffer containing guanidinium thiocyanate (Chomczynski and Sacchi 1987). Northern blots were performed with 5 μg of purified RNA separated in 2% agarose gels containing 0.67 M formaldehyde. After transfer to nylon membranes (Hybond-N; Amersham), immobilized RNA was hybridized with labeled antisense RNA probes generated by T7 transcription of inversely cloned cDNA fragments of the previously identified 7SK RNA of Lampetra fluviatilis (Gürsoy et al. 2000). Hybridization with labeled antisense RNA (2 × 10⁶ cpm/ml) was carried out in 50% formamide, 0.1% SDS, 5 × Denhardt’s reagent, 10 μg/ml each of yeast tRNA and denatured salmon sperm DNA, and 5 × SET (150 mM NaCl, 20 mM Tris-HCl, pH 7.9, 1 mM EDTA).
Cloning of new 7SK cDNAs was based on RT-PCR. In the first step, 4 μg of total RNA was reverse transcribed using the Omniscript RT-Kit (Qiagen) with a primer complementary to the 3′ end of 7SK RNA. An aliquot of this reaction was used for PCR with the same 3′ primer and a specific upstream primer. The combination that succeeded for Myxine and Branchiostoma used nt 54–72 as upstream primer and nt 197–215 as downstream primer, both numbered according to the human 7SK DNA sequence. For Helix, the same upstream primer was successful in combination with a lamprey downstream primer (corresponding to positions 299–316). Candidate fragments were cloned, sequenced, and used to deduce gene-specific “nested” primers for rapid amplification of cDNA ends (RACE) experiments (Frohman et al. 1988), as described earlier (Gürsoy et al. 2000). Briefly, for the 3′ ends, cellular RNA was first polyadenylated, and reverse transcription was primed with oligo(dT) carrying at its 5′ side an oligonucleotide sequence suitable for subsequent PCR with two nested gene-specific primers. The 5′ ends were obtained by reverse transcription with a specific primer, followed by ligation of an oligonucleotide (T4 RNA ligase) to the 3′ end of the first-strand cDNAs. As before, PCR amplification was achieved with nested gene-specific primers.
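The primer combinations above are specified as nucleotide ranges of a reference sequence (e.g., nt 54–72 upstream and nt 197–215 downstream of human 7SK). As a minimal sketch, assuming 1-based, inclusive coordinates, such ranges can be turned into primer sequences: the upstream primer is read directly from the sense strand, whereas the downstream primer must be the reverse complement of its range. The function names are hypothetical, and the toy reference used below is not the human 7SK sequence.

```python
# DNA complement table; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def primers_from_ranges(reference, upstream, downstream):
    """Derive a primer pair from 1-based, inclusive (start, end) ranges.

    The upstream (forward) primer equals the sense strand over its range;
    the downstream (reverse) primer anneals to the sense strand and is
    therefore the reverse complement of its range.
    """
    def region(start, end):
        if not 1 <= start <= end <= len(reference):
            raise ValueError(f"range {start}-{end} lies outside the reference")
        return reference[start - 1:end].upper()

    forward = region(*upstream)
    reverse = reverse_complement(region(*downstream))
    return forward, reverse


if __name__ == "__main__":
    toy_reference = "ACGTACGGTT"  # placeholder, not a real 7SK sequence
    print(primers_from_ranges(toy_reference, (1, 4), (7, 10)))
```

For a 19-nt range such as nt 197–215, the reverse primer is likewise 19 nt long. Mixing vertebrate and lamprey coordinates, as in the Helix amplification, only requires passing a different reference string for each primer.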
Computational Homology Search
Homology search was performed as a stepwise procedure. In the first stage, we started with the functional human 7SK sequence (X05490, X04236; Krüger and Benecke 1987; Murphy et al. 1984; Wassarman and Steitz 1991; Zieve and Penman 1976) and performed a blast search against the genome assemblies available in Ensembl (version 42). In this way, we identified candidates in other vertebrate genomes, including the following previously published sequences: Mus musculus (M63671, Moon and Krause 1991), Rattus norvegicus (K02909, Reddy et al. 1984), Takifugu rubripes (AJ890104, Egloff et al. 2006; Myslinski et al. 2004), Tetraodon nigroviridis (AJ890103, Egloff et al. 2006), Danio rerio (AJ890102, Egloff et al. 2006), and Gallus gallus (AJ890104, Egloff et al. 2006). In addition, we searched the shotgun traces of a selection of unfinished mammalian genomes as well as those of all unfinished nonmammalian animal genomes. Beyond jawed vertebrates, this initial blast search recovered a single candidate in the genome of the lamprey Petromyzon marinus, which turned out to be very closely related to the published sequence of Lampetra fluviatilis 7SK RNA (Gürsoy et al. 2000). A match to a single shotgun trace (1047111637562) from the nematode Brugia malayi was disregarded: it matched the human sequence exactly and is therefore almost certainly a contamination.
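The first-stage screen, including the rule used to discard the Brugia trace, can be expressed as a filter over tabular blast output. The following minimal sketch assumes the default 12-column BLAST+ tabular format (`-outfmt 6`) and an E-value cutoff of 10⁻⁴ (the value used for the Ensembl v.44 searches reported below). The contamination criterion (100% identity over the full query length) is our illustrative reading of the Brugia case rather than a published rule, and `screen_hits` is a hypothetical name.

```python
from dataclasses import dataclass

# Column order of BLAST+ tabular output (-outfmt 6, default fields).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


@dataclass
class Hit:
    subject: str
    pident: float
    length: int
    evalue: float


def screen_hits(lines, query_length, max_evalue=1e-4):
    """Split tabular blast hits into candidates and likely contaminants.

    Hits above max_evalue are dropped. A remaining hit is flagged as a
    likely contaminant when it matches the query with 100% identity over
    at least the full query length, i.e. an exact copy of the (human)
    query found in another organism's trace data.
    """
    candidates, contaminants = [], []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        hit = Hit(row["sseqid"], float(row["pident"]),
                  int(row["length"]), float(row["evalue"]))
        if hit.evalue > max_evalue:
            continue
        if hit.pident == 100.0 and hit.length >= query_length:
            contaminants.append(hit)
        else:
            candidates.append(hit)
    return candidates, contaminants
```

An exact full-length match is only suspicious when the subject is not the query organism itself, so in practice the filter would be applied per target genome.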
We then created a multiple sequence alignment, starting from an initial CLUSTALW alignment that was manually refined to conform to the experimentally determined structure model of human 7SK snRNA by Wassarman and Steitz (1991). The best-conserved blocks were marked and converted into search patterns using the aln2pattern program, which is part of the fragrep2 package (Mosig et al. 2007). This step was guided by the functional interpretation of the structural domains of 7SK RNA in Egloff et al. (2006). The program fragrep, which searches for fragmented approximate sequence patterns in genomic DNA, was then used to scan the available genomic data.
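The idea behind a fragmented approximate pattern is an ordered list of short, well-conserved fragments, each tolerating a few mismatches, separated by spacers whose lengths may vary within bounds. The brute-force sketch below illustrates only that idea. It is not fragrep's algorithm, scoring, or pattern-file format. It assumes mismatches only (no indels within fragments) and scans only the given strand, and `fragment_search` is a hypothetical name.

```python
def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def fragment_search(genome, fragments, gaps):
    """Find all placements of an ordered list of approximate fragments.

    fragments: list of (motif, max_mismatches)
    gaps:      list of (min_gap, max_gap), one per pair of consecutive
               fragments, giving the allowed spacer length between them
    Returns a list of tuples of 0-based start positions, one per match.
    """
    if len(gaps) != len(fragments) - 1:
        raise ValueError("need exactly one gap range between fragments")
    genome = genome.upper()

    def matches_at(k, pos):
        motif, max_mm = fragments[k]
        window = genome[pos:pos + len(motif)]
        return (len(window) == len(motif)
                and hamming(window, motif.upper()) <= max_mm)

    results = []

    def extend(k, pos, placed):
        # Fragment k matched at pos; try every allowed spacer for fragment k+1.
        placed = placed + (pos,)
        if k == len(fragments) - 1:
            results.append(placed)
            return
        end = pos + len(fragments[k][0])
        lo, hi = gaps[k]
        for nxt in range(end + lo, end + hi + 1):
            if matches_at(k + 1, nxt):
                extend(k + 1, nxt, placed)

    for start in range(len(genome)):
        if matches_at(0, start):
            extend(0, start, ())
    return results
```

For example, two exact GATC fragments separated by a 3–6-nt spacer are found once in "TTGATCAAAAAGATCTT", at positions (2, 11). Tightening the spacer bounds to 0–2 nt yields no match. Because GATC is its own reverse complement, this toy motif happens to be strand-independent; real patterns would require scanning both strands.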
The blast and fragrep searches were performed on the genomic sequences from Ensembl and pre-Ensembl (versions 44 and 45), on the genomes of Branchiostoma floridae and Nematostella vectensis (downloaded from the Joint Genome Institute), and on the metazoan sequences contained in the Ensembl trace archive. More details are given in the electronic supplement.
Computational Identification of Putative 7SK Promoters
It is well known that the promoters of the RNA Pol III-transcribed 7SK, U6, and U6atac snRNA genes contain three common elements: the proximal sequence element (PSE), about 50 nt upstream of the gene; a TATA box-like element; and distal enhancer elements (Dahlberg and Lund 1988; Wassarman and Steitz 1991). The PSE of the Pol III snRNAs is very similar to that of the snRNAs transcribed by Pol II (U1, U2, U4, U5, U11, U12, U4atac).
In order to distinguish functional 7SK genes from pseudogenes, we investigated their upstream regions for snRNA-specific promoter elements. Since these sequence motifs can vary significantly between species (Hernandez Jr et al. 2007), we also searched the genomes for spliceosomal snRNAs, extracted their 100-nt upstream regions, and used meme (version 3.5.4; Bailey and Elkan 1994) to identify the PSE consensus separately for each species. For this study we used the applet available from http://meme.nbcr.net/downloads/ with default options and the parameters -nmotifs 5 -minw 10 -maxw 30. The PSE patterns obtained in this way were then used to identify those 7SK candidates that have an snRNA-like PSE. The results of the homology search for the spliceosomal RNA genes will be reported elsewhere (Marz et al. 2007).
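Extracting the 100-nt upstream regions is strand-sensitive. For a gene on the minus strand, the upstream region lies to the right of the gene in assembly coordinates and must be reverse-complemented. The following minimal sketch assumes 1-based, inclusive gene coordinates on the plus strand of an in-memory chromosome string. The function names are hypothetical, and the FASTA text it produces is merely the kind of input one would pass to meme with the parameters quoted above.

```python
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def upstream_region(chrom_seq, start, end, strand, length=100):
    """Return `length` nt directly upstream of a gene, 5'->3' on its strand.

    start/end are 1-based inclusive coordinates on the + strand of
    chrom_seq. The region is silently truncated at the chromosome ends.
    """
    if strand == "+":
        lo = max(0, start - 1 - length)
        return chrom_seq[lo:start - 1]
    if strand == "-":
        # Upstream of a minus-strand gene lies beyond its + strand end.
        hi = min(len(chrom_seq), end + length)
        return chrom_seq[end:hi].translate(_COMPLEMENT)[::-1]
    raise ValueError("strand must be '+' or '-'")


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text, e.g. as meme input."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Keeping the upstream sequences in transcript orientation means that PSE instances from plus- and minus-strand snRNA genes line up in the same orientation for motif discovery.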
A structural alignment of the vertebrate sequences based on the experimentally determined structure of human 7SK snRNA (Krüger and Benecke 1987; Wassarman and Steitz 1991) was constructed manually using the RALEE mode (Griffiths-Jones 2005) for the emacs editor. As new candidate sequences were added during the analysis, the model was iteratively refined with the help of consensus structure predictions by RNAalifold (Hofacker et al. 2002).
The 5′ stem sequences of the basal deuterostomes and lophotrochozoans have diverged too much from the vertebrate consensus to be aligned on the basis of sequence similarity alone. We therefore used the absolutely conserved GATC-GATC stem in the center of this region as an anchor, since it defines both sequence and structure constraints. The alignment was then edited so as to maximize the number of base pairs in the consensus structure.
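The criterion of maximizing the number of base pairs in the consensus structure can be made concrete. For each pair in a dot-bracket consensus, we count how many aligned sequences can form a canonical (Watson-Crick or G-U) pair at those columns. The minimal sketch below is a simplification: it uses a plain count with no covariation bonus and no penalty for non-pairing sequences, so it is not RNAalifold's scoring. The function names are hypothetical.

```python
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"),
             ("G", "U"), ("U", "G")}


def pair_table(structure):
    """Map each paired column of a dot-bracket string to its partner."""
    stack, pairs = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unbalanced ')' at column {i}")
            j = stack.pop()
            pairs[i], pairs[j] = j, i
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def count_consensus_pairs(alignment, structure):
    """Count canonical base pairs formed by aligned rows on `structure`.

    alignment: equal-length rows, '-' for gaps; T is treated as U.
    Returns the total number of (row, pair) combinations that are
    Watson-Crick or G-U pairs; gapped positions never count.
    """
    pairs = pair_table(structure)
    total = 0
    for row in alignment:
        if len(row) != len(structure):
            raise ValueError("alignment row and structure differ in length")
        row = row.upper().replace("T", "U")
        for i, j in pairs.items():
            if i < j and (row[i], row[j]) in CANONICAL:
                total += 1
    return total
```

The GATC-GATC anchor illustrates why the motif constrains both sequence and structure. Read antiparallel as a stem, GAUC pairs perfectly with itself (G-C, A-U, U-A, C-G), so a row such as GATCAAAAGATC contributes four pairs to the consensus "((((....))))". Comparing this count between alternative alignments gives a simple way to rank manual edits.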
Northern Blot Verification of 7SK Sequences
Numerous attempts to identify 7SK RNA in invertebrate phyla have been unsuccessful in the past: neither RT-PCR experiments with mammalian primers nor northern blot analyses with oligonucleotide-primed cDNA probes were successful (Gürsoy et al. 2000). We therefore decided to increase the sensitivity of the northern blots by using radioactively labeled antisense RNA probes. For this purpose, the two regions of the lamprey (Lampetra fluviatilis) 7SK RNA gene (Gürsoy et al. 2000) that are most conserved relative to the human sequence were subcloned in inverted orientation under the control of the T7 RNA polymerase promoter. These two clones allowed the in vitro synthesis of labeled transcripts with very high specific activity. The resulting two antisense RNA probes were complementary to regions 1–94 (A) and 283–316 (B) of lamprey 7SK RNA, respectively.
For normalization among samples, a labeled full-length antisense U6 snRNA probe was included in both hybridizations. The weaker U6 signals throughout the upper panel are due to rehybridization of the stripped blot. In both rounds, the U6 antisense RNA hybridizes to the same target sequence, whereas the two 7SK probes bind to different regions of the 7SK RNA. The broad smear observed with sea urchin RNA is due to cross-contamination of the 7SK antisense probe with ribosomal RNA and its degradation products.
cDNA Cloning of Novel 7SK RNAs
Invertebrate 7SK cDNAs were cloned by RT-PCR with primers deduced from the most conserved elements of vertebrate 7SK RNA. Routinely, about 10 different primer combinations had to be tested. In many cases, PCR fragments of the expected length were obtained; after subcloning and sequencing, however, most of them turned out to represent pieces of ribosomal DNA. Only a single previously unidentified sequence was amplified from hagfish RNA. Two sequences that were identical but differed in length were obtained from amphioxus, and a single new clone was obtained from snail. These clones showed limited but significant sequence homology with the corresponding central sections of vertebrate 7SK DNA. The 3′- and 5′-flanking sequences of these clones were therefore amplified by RACE. After subcloning and sequencing, composite 7SK RNA sequences were obtained for Myxine glutinosa (329 nt), Branchiostoma lanceolatum (304 nt), and Helix pomatia (303 nt).
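Building a composite sequence from the 5′ RACE product, the central RT-PCR clone, and the 3′ RACE product amounts to merging fragments on their overlaps. The minimal sketch below assumes error-free fragments supplied in 5′→3′ order that overlap exactly by at least `min_overlap` nt (an arbitrary, hypothetical threshold). Real data would additionally require trimming adapter- and primer-derived ends and tolerating sequencing errors. The function names are hypothetical.

```python
def merge_pair(left, right, min_overlap=15):
    """Join two fragments on their longest exact suffix/prefix overlap."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    for olen in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left[-olen:] == right[:olen]:
            return left + right[olen:]
    raise ValueError("no overlap of sufficient length")


def composite(fragments, min_overlap=15):
    """Fold a 5'->3' ordered list of fragments into one composite sequence."""
    seq = fragments[0]
    for frag in fragments[1:]:
        seq = merge_pair(seq, frag, min_overlap)
    return seq
```

Trying the longest overlap first avoids joining fragments on a short, spurious repeat. The minimum-overlap threshold guards against accepting chance matches of a few nucleotides.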
These three 7SK RNA clones share between 44% (snail) and 59% (hagfish) sequence identity with lamprey 7SK DNA. Several interspersed elements (7 to 11 nt long) that match the vertebrate 7SK RNA sequence perfectly strongly support the conclusion that 7SK cDNA was successfully cloned from two basal chordates (hagfish and amphioxus) and from one non-deuterostome invertebrate. In addition, we report here sequences for Gadus morhua and Mustelus asterias. All sequences have been deposited in GenBank under accession numbers AM773429–AM773436. Multiple sequence alignments can be found in the electronic supplement.
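Identity values such as 44% and 59% depend on how gap columns are counted, and the "interspersed elements" correspond to runs of consecutive identical positions. The minimal sketch below computes both from a pairwise alignment. It assumes one particular convention: columns gapped in both sequences are dropped, and identity is the number of identical residue columns divided by the remaining columns. Other conventions, such as dividing by the shorter sequence length, give different numbers. The function name and the 7-nt block threshold are illustrative.

```python
def identity_and_blocks(a, b, min_block=7):
    """Percent identity and exact-match blocks of a pairwise alignment.

    a, b: equal-length aligned strings with '-' for gaps.
    Returns (percent_identity, blocks), where blocks are (start, end)
    ranges (0-based, end exclusive) of at least `min_block` consecutive
    identical residues. Coordinates refer to the alignment after
    columns that are gapped in both sequences have been removed.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns that are gaps in both rows (common in sub-alignments
    # extracted from a multiple alignment).
    cols = [(x, y) for x, y in zip(a.upper(), b.upper())
            if not (x == "-" and y == "-")]
    matches = [x == y and x != "-" for x, y in cols]
    pident = 100.0 * sum(matches) / len(cols) if cols else 0.0

    blocks, run_start = [], None
    for i, m in enumerate(matches + [False]):  # sentinel closes a final run
        if m and run_start is None:
            run_start = i
        elif not m and run_start is not None:
            if i - run_start >= min_block:
                blocks.append((run_start, i))
            run_start = None
    return pident, blocks
```

Stating the convention alongside the percentages makes identity values from different studies comparable.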
Within vertebrates, homology search turned out to be rather straightforward: simple blastn searches were sufficient. In eutherians, however, it is difficult to identify the functional 7SK gene among a large number of 7SK-derived pseudogenes. Searching Ensembl v.44 with an E-value cutoff of 10⁻⁴ returns more than 100 hits in every eutherian genome, compared with only 31 hits in Monodelphis domestica and 11 hits in the chicken genome. The current assembly of the Xenopus tropicalis genome features two adjacent copies. These copies are also identical over an extended flanking sequence, indicating either a recent segmental duplication of the locus or an assembly artifact (see, e.g., Cheung et al. 2003). Each of the five sequenced teleost fish has only a single copy of 7SK. Three blast hits were found in the pre-Ensembl release of the sea lamprey genome; only one of them, located on Contig17254, matches the published Lampetra fluviatilis sequence over its full length.
In three vertebrate species, however, we failed to find a complete 7SK gene. Only a single partial hit was recovered from the low-coverage genome of the elephant shark Callorhinchus milii. All convincing blastn hits of the chicken 7SK sequence against the available Taeniopygia guttata shotgun reads appear to belong to a single locus. The corresponding sequence matches the chicken sequence very well but contains a 398-nt insert, which we interpret as an artifact. To our surprise, only a single blastn hit was found in the platypus genome. This sequence deviates significantly from the vertebrate consensus both in the first ∼8 nt and in the last ∼100 nt, and it is not located in a region syntenic to the functional 7SK genes of other vertebrates; it is thus most likely a pseudogene. Since the locus around the platypus ICK homologue is incompletely assembled, it is reasonable to assume that we failed to find the platypus 7SK because of missing data, not because the platypus has lost its functional 7SK RNA.
While vertebrate 7SK RNAs are very well conserved at the sequence level (Gürsoy et al. 2000), blast searches soon reached their limits outside this clade. A weak blast hit of the human query sequence in the Branchiostoma floridae genome was easily verified by comparison with the experimentally determined Branchiostoma lanceolatum 7SK RNA sequence. In total, we found six nearly identical 7SK candidates on five different scaffolds. These sequences are also nearly identical over at least 100 nt upstream of the 7SK gene. It remains unclear whether there really are multiple functional copies of 7SK RNA dispersed in the amphioxus genome.
Beyond amphioxus, three further candidates were found by means of fragrep only: two closely related sequences from the urochordates Ciona intestinalis and Ciona savignyi, and a single candidate from the hemichordate Saccoglossus kowalevskii. In the C. intestinalis genome there is only a single 7SK locus. In contrast, the current assembly of C. savignyi features four nearly identical copies within 8 kb on reftig_107.
The PSE of the basal deuterostomes, located 63–48 nt upstream of the 7SK gene, is well conserved relative to the U6 snRNA PSE in all investigated organisms. The TATA box is located 32–25 nt upstream. An exception is Ciona intestinalis, which has a 15-nt insertion between the TATA box and the 7SK gene and a 9-nt deletion between the PSE and the TATA box. The Ciona TATA box is not conserved relative to those of other snRNAs; nevertheless, a slightly modified TATA box is present. Branchiostoma has a canonical TATA box.
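Statements such as "PSE at −63 to −48" or "TATA box at −32 to −25" locate the best motif match within an upstream window relative to the first transcribed nucleotide. The minimal sketch below builds a log-odds position weight matrix from aligned example sites, scans an upstream sequence, and reports the best window in negative coordinates, where −1 is the nucleotide immediately upstream of the gene. The pseudocount of 0.5, the uniform background, and the function names are assumptions. The TATA-like toy sites in the test are invented and are not real promoter elements. In the workflow described above, the sites would instead be the species-specific PSE instances found upstream of the spliceosomal snRNAs.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5):
    """Log2-odds PWM (uniform background) from equal-length aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("sites must have equal length")
    pwm = []
    for col in range(width):
        counts = {b: pseudocount for b in BASES}
        for s in sites:
            base = s[col].upper()
            if base in counts:
                counts[base] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm


def best_hit(pwm, upstream):
    """Best-scoring window in `upstream` (5'->3', ending just before the gene).

    Returns (score, first, last), where first/last are negative positions
    relative to the gene start (-1 = nucleotide directly upstream), or
    None if the sequence is shorter than the motif. Non-ACGT bases
    receive the column minimum.
    """
    width, n = len(pwm), len(upstream)
    best = None
    for i in range(n - width + 1):
        window = upstream[i:i + width].upper()
        score = sum(col.get(base, min(col.values()))
                    for col, base in zip(pwm, window))
        if best is None or score > best[0]:
            best = (score, i - n, i + width - 1 - n)
    return best
```

Reporting positions relative to the gene start, rather than as string indices, makes spacing differences directly visible, such as the Ciona-specific insertion between the TATA box and the gene.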
Despite significant efforts, we did not find credible candidates in the genome of the sea urchin Strongylocentrotus purpuratus. The three best candidates lack the 3′ hairpin structure, and their 5′ hairpin region can hardly be aligned with other deuterostome 7SK sequences. Our search also failed for the shotgun traces of the urochordate Oikopleura dioica: we found a good candidate for the 3′ stem-loop structure, but the 300 nt upstream of this hit do not match other 7SK sequences. We suspect that these negative results might be due to incomplete genomic data in these cases.
Protostome 7SK RNAs
The fragrep search was successful in three protostome genomes: the mollusk Lottia gigantea and the two annelids Capitella capitata and Helobdella robusta. All three sequences are easily recognizable as homologues of the 7SK sequence cloned from the escargot Helix pomatia. In addition, a blast search using the experimentally determined escargot 7SK sequence as query recovered a partial sequence from Aplysia californica.
The PSE of Aplysia californica is located 67–41 nt upstream of the 7SK snRNA gene, and the sequence motif TGTATAGA, 35–28 nt upstream, matches the typical TATA-box sequence. In Lottia gigantea we find CTTATATA (positions −31 to −24) and the PSE 15 nt upstream of the TATA box. In Capitella we find TATACA at positions −27 to −21 and a possible PSE, although the latter does not match the upstream sequence of the U6 snRNA in this species well. The single shotgun read from Helobdella robusta shows no recognizable TATA-box region but an alignable PSE region. It is not clear whether this sequence is a functional gene or a 7SK-derived pseudogene.
On the other hand, despite extensive efforts, no 7SK candidate was found in any of the many available insect and nematode genomes. Searches in the genomes of the two platyhelminths Schmidtea mediterranea and Schistosoma mansoni were also unsuccessful. Thus, among protostomes, 7SK RNA has so far been found only in lophotrochozoans.
Given the lack of success on ecdysozoan genomes and the four highly derived lophotrochozoan sequences, we were not surprised that searches in the genomes of diploblastic animals and in the choanoflagellate Monosiga brevicollis were also not successful.
Refined Structural Models of 7SK RNAs
The secondary structures of the 5′ and 3′ stems have already been proposed in previous publications. Wassarman and Steitz (1991) derived a model for the human 7SK snRNA based on chemical probing data. Egloff et al. (2006) used site-directed mutagenesis to demonstrate that both the 5′ and 3′ stems are functionally relevant. The structural model in Fig. 5 is derived from a sequence alignment that takes into account both sequence covariation and thermodynamic considerations. Our consensus model agrees with the previously published structures with one marginal exception: in Egloff et al. (2006) the regions marked in Fig. 5 are shown as an interior loop, whereas Wassarman and Steitz (1991) show only the terminal A-U pair as part of the interior loop.
The 5′ stem models for the basal deuterostomes and for the lophotrochozoans differ in size, sequence, and structure. The only common ground among all three models is the GATC-GATC structure/sequence pattern at the beginning of the topmost stem.
The vertebrate-specific stem B, which is not necessary for P-TEFb binding (Egloff et al. 2006), also agrees very well with both experimental models, again with a minor difference affecting a small interior loop. It does not appear to have a counterpart in basal deuterostomes and protostomes.
The central region of the 3′ stem is structurally conserved in all 7SK RNAs, the only exception being an extension of the most central stem by a single GC pair in Branchiostoma and Saccoglossus. In vertebrates, and to a lesser extent also in the other 7SK snRNAs, the 3′ stem-loop structure can be extended by an additional 5 base pairs (bp). The exact pairing pattern in this extended region, however, does not seem to be very well conserved.
Finally, the small stem A feature is also highly conserved in sequence across all known 7SK snRNAs, although the size of the loop region varies among the lophotrochozoan sequences. So far, no specific function has been reported for this region.
Assuming that the commonly accepted sister-group relationship of Protostomia and Deuterostomia is indeed correct, our findings imply that 7SK originated at the latest in the bilaterian ancestor. In contrast, we found no trace of a 7SK RNA candidate in platyhelminthes, in any of the numerous ecdysozoan species for which genomic data are available, or in the genome of the cnidarian Nematostella vectensis. In summary, our data support an (early) bilaterian origin of 7SK snRNA.
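The origin argument above is a last-common-ancestor inference. If 7SK is present in both deuterostomes and protostomes, it must have existed in their last common ancestor, and absences inside that clade represent secondary losses or detection failures. The minimal sketch below makes this explicit. The tree is a deliberately coarse topology, and the choice of representative taxa (with platyhelminths placed within Lophotrochozoa) is our simplifying assumption. The presence/absence calls follow the results reported here.

```python
# Coarse metazoan topology as child -> parent links (simplifying assumption).
PARENT = {
    "Bilateria": "Metazoa", "Cnidaria": "Metazoa",
    "Deuterostomia": "Bilateria", "Protostomia": "Bilateria",
    "Ecdysozoa": "Protostomia", "Lophotrochozoa": "Protostomia",
    "Homo": "Deuterostomia", "Branchiostoma": "Deuterostomia",
    "Helix": "Lophotrochozoa", "Capitella": "Lophotrochozoa",
    "Schmidtea": "Lophotrochozoa",
    "Drosophila": "Ecdysozoa", "Caenorhabditis": "Ecdysozoa",
    "Nematostella": "Cnidaria",
}


def lineage(taxon):
    """Path from a taxon up to the root, starting with the taxon itself."""
    path = [taxon]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def last_common_ancestor(taxa):
    """Most recent node shared by the lineages of all given taxa."""
    paths = [lineage(t) for t in taxa]
    shared = set(paths[0]).intersection(*paths[1:])
    # Walking up from the first taxon, the first shared node is the LCA.
    return next(node for node in paths[0] if node in shared)


def descends_from(taxon, ancestor):
    """True if `ancestor` lies on the lineage of `taxon`."""
    return ancestor in lineage(taxon)


if __name__ == "__main__":
    present = ["Homo", "Branchiostoma", "Helix", "Capitella"]
    absent = ["Schmidtea", "Drosophila", "Caenorhabditis", "Nematostella"]
    origin = last_common_ancestor(present)
    print("origin at the latest in:", origin)
    print("loss or non-detection:",
          [t for t in absent if descends_from(t, origin)])
```

Absent taxa outside the inferred clade, here Nematostella, cannot distinguish an origin in the bilaterian stem from an earlier origin followed by loss. This is why the data support a bilaterian origin "at the latest" rather than a precise date.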
The monophyly of the Ecdysozoa is, among other arguments (Telford 2006), also supported by the shared secondary absence of large numbers of genes in euarthropods and nematodes (Hughes and Friedman 2004). No functionally characterized HEXIM1 orthologue has been described in insects. The current release of the Ensembl (v. 44) homology annotation, however, lists HEXIM homologues in Drosophila melanogaster (CG3508), Aedes aegypti (AAEL013291), and Anopheles gambiae (AGAP002875). It is at least conceivable, therefore, that an ancestral 7SK gene has been secondarily lost in this clade. Alternatively, the 7SK sequence might have diverged so far that it is not recognizable with currently available bioinformatic approaches.
The analysis of sequences and secondary structures revealed a striking difference between vertebrate and invertebrate sequences. While vertebrate 7SK RNAs are highly conserved in both sequence and structure, the molecule is highly variable in the other clades. Consensus structure models derived using a combination of thermodynamic folding and evaluation of compensatory mutations reveal three structural motifs that are conserved throughout all known 7SK sequences. The central domain (stem B), however, is present in vertebrates only, while elsewhere this region is so variable that our attempts to construct plausible alignments failed.
Supplemental data, in particular, machine-readable sequence alignments, tables of genomic coordinates, and lists of PSE elements can be found at http://www.bioinf.uni-leipzig.de/Publications/SUPPLEMENTS/07-021/.
This work has multiple roots: Dorota Koper-Emde’s Ph.D. thesis at the Ruhr University Bochum (2004), Manja Marz’ M.Sc. thesis at the University of Leipzig (2006), and an Advanced Bioinformatics Computer Lab Course on “RNA Homology Search” organized by PFS at the University of Vienna in Fall 2006. It was supported in part by the DFG Bioinformatics Initiative and the GK Wissensrepräsentation (Leipzig) and by the Austrian GEN-AU projects “Non-Coding RNA” and “Bioinformatics Integration Network II” (Vienna). We thank Guido Fritzsch for taking care of the GenBank submissions.
- Bailey T, Elkan C (1994) Fitting a mixture model by expectation maximization to discover motifs in biopolymers. In: Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology. AAAI Press, Menlo Park, CA, pp 28–36
- Dahlberg JE, Lund E (1988) The genes and transcription of the major small nuclear RNAs. In: Birnstiel M (ed) Structure and function of major and minor small nuclear ribonucleoprotein particles. Springer-Verlag, Berlin, pp 38–70
- Marz M, Kirsten T, Stadler PF (2007) Evolution of spliceosomal snRNA genes (manuscript in preparation)
- Moon IS, Krause MO (1991) Common RNA polymerase I, II, and III upstream elements in mouse 7SK gene locus revealed by the inverse polymerase chain reaction. DNA Cell Biol 10:23–32
- Mosig A, Chen JL, Stadler PF (2007) Homology search with fragmented nucleic acid sequence patterns. In: Giancarlo R, Hannenhalli S (eds) WABI 2007. Lecture Notes in Computer Science, Vol 4645. Springer Verlag, Berlin, pp 335–345
- Reddy R, Henning D, Subrahmanyam CS, Busch H (1984) Primary and secondary structure of 7-3 (K) RNA of Novikoff hepatoma. J Biol Chem 259:12265–12270