Mammalian Genome, Volume 19, Issue 7, pp 581–586

piRNA-like RNAs in the marsupial Monodelphis domestica identify transcription clusters and likely marsupial transposon targets

Authors

  • E. J. Devor
    • Molecular Genetics and Biophysics, Integrated DNA Technologies
  • Lingyan Huang
    • Molecular Genetics and Biophysics, Integrated DNA Technologies
  • Paul B. Samollow
    • Department of Veterinary Integrative Biosciences, Texas A&M University

DOI: 10.1007/s00335-008-9109-x

Cite this article as:
Devor, E.J., Huang, L. & Samollow, P.B. Mamm Genome (2008) 19: 581. doi:10.1007/s00335-008-9109-x

Abstract

PIWI-interacting RNAs (piRNAs) are a recently discovered class of small noncoding RNAs that have been detected in human, mouse, rat, zebrafish, and Drosophila genomes. We have utilized a size-directed small-RNA cloning procedure to clone and map more than 300 candidate piRNA-like small RNAs in the genome of the marsupial species Monodelphis domestica. Our results are consistent with those from other species in that the piRNA-like candidate sequences range in size from 28 to 31 nucleotides, show a pronounced preference for uridine at the 5′ end, are transcribed from a few large clusters, appear to target transposons, and display virtually no sequence conservation.

Introduction

Beginning with the discovery of the first microRNA (miRNA) more than a decade ago (Lee et al. 1993), the previously unsuspected world of small, noncoding, regulatory RNAs has grown to include not only hundreds of miRNAs but several additional classes of small RNAs, including endogenous siRNAs, 21U RNAs, and trans-acting siRNAs (Borsani et al. 2005; Lee et al. 2006; Peragine et al. 2004; Ruby et al. 2006). Recently, another new member was added to the growing list of small-RNA classes. Called PIWI-interacting RNAs (piRNAs), they are distinct in both size (28–31 nt long) and their specific interactions with members of the Piwi clade of argonaute proteins (Kim 2006).

The first piRNAs were independently discovered in mouse testes by four groups (Aravin et al. 2006; Girard et al. 2006; Grivna et al. 2006; Watanabe et al. 2006). Since their initial discovery, it has been shown that there are two subclasses of piRNAs in mice, a 29–31-nt-long MIWI-associated subclass and a 26–28-nt-long MILI-associated subclass. These two subclasses are expressed at different phases of spermatogenesis, with the MILI-associated piRNAs being expressed earlier than MIWI-associated piRNAs, although there is some overlap during which both classes are expressed (Aravin et al. 2006). Aravin et al. (2006) and Grivna et al. (2006) confirmed the existence of piRNAs in humans while Lau et al. (2006) and Houwing et al. (2007) specifically confirmed their existence in rats and zebrafish, respectively. In all species so far examined, piRNAs are found to be consistently 28–31 nt long, have a pronounced preference for a 5′ uridine base, are transcribed from large genomic transcription clusters, and display virtually no interspecies sequence conservation. It has also been reported that piRNAs are uniformly 2′-O-methylated at their 3′ ends (Kirino and Mourelatos 2007; Ohara et al. 2007).

Here, we report on several hundred piRNA-like small RNAs cloned from testes RNA of a marsupial (metatherian) mammal, the gray short-tailed opossum Monodelphis domestica. The choice of using testes RNA was determined by the findings cited above which were all derived from testes RNA. Our sequencing and mapping results confirm all of the major characteristics of previously reported piRNAs in placental (eutherian) mammals, namely, that they range in size from 28 to 31 nt long, are unevenly distributed in the M. domestica genome, appear to be transcribed from long transcripts, and show a pronounced preference for uridine in the 5′ terminal position. Moreover, many of the sequences appear to target marsupial-specific transposons, in keeping with the piRNA “ping-pong” function model of transposon suppression in which the piRNA and its target transposon cyclically generate the suppressor sequences (Brennecke et al. 2007).

Materials and methods

Total RNA was purified from testes of Monodelphis domestica using the mirVana procedure (Ambion, Austin, TX). The putative piRNA-containing fraction was enriched from 50 μg of total RNA by excising a slice from a 12% denaturing (7 M urea) polyacrylamide gel, the location and size of which were determined by the position of a 31-mer internal control RNA (piSPIKE™, Integrated DNA Technologies, Coralville, IA). RNAs purified from this size-directed gel slice were then cloned using the miRCat™ small-RNA cloning procedure (Integrated DNA Technologies). Briefly, purified small RNAs, including the spiked 31-mer internal control, were 3′ ligated with a preactivated, adenylated linker (rAppCTGTAGGCACCATCAATddC) using T4 RNA Ligase in the absence of ATP (Lau et al. 2001). Linkered RNAs were then purified from a second 12% denaturing (7 M urea) polyacrylamide gel. These purified RNAs were 5′ linkered with the DNA/RNA chimeric 5′ M.R.S. Linker (TGGAATrUrCrUrCrGrGrGrCrArCrCrArArGrGrU) using T4 RNA Ligase in the presence of 10 mM ATP. The piSPIKE internal control is synthesized without a 5′ phosphate and, therefore, is unable to accept the 5′ M.R.S. Linker. Thus, the only doubly ligated RNA species were those from the original tissue RNA purification. These species were reverse transcribed and then PCR amplified using the RT/REV primer 5′-GATTGATGGTGCCTACAG-3′ (Tm = 50.2°C) and the FOR primer 5′-TGGAATTCTCGGGCACC-3′ (Tm = 55.0°C). PCR amplification conditions were 95.0°C for 5 min followed by 35 cycles of 95.0°C for 30 sec, 52.0°C for 30 sec, and 72.0°C for 30 sec, with a final polymerase extension at 72.0°C for 7 min. The expected amplicon size was 70 ± 2 bp. Finally, amplicons were digested with BanI restriction endonuclease, restriction fragments were concatamerized using T4 DNA Ligase, and concatamers were nontemplate adenylated with Taq polymerase and cloned into pGEM T-EASY (Promega, Madison, WI). Clones were sequenced on an ABI Model 3130xl Genetic Analyzer (Applied Biosystems, Foster City, CA).
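The relationship between the linkers, the primers, and the expected amplicon size can be checked with a short sketch. It is illustrative only: the sequences are those quoted above, written as DNA, the terminal ddC is counted as part of the 3′ linker, and the BanI recognition pattern GGYRCC is taken from general knowledge of the enzyme rather than from this article. The variable and function names are hypothetical.

```python
import re

# Linker and primer sequences quoted in the text, written as DNA (U -> T).
LINKER_5P = "TGGAAT" + "UCUCGGGCACCAAGGU".replace("U", "T")  # 5' M.R.S. Linker
LINKER_3P = "CTGTAGGCACCATCAAT" + "C"  # 3' linker; terminal ddC counted as C
FOR_PRIMER = "TGGAATTCTCGGGCACC"
RT_REV_PRIMER = "GATTGATGGTGCCTACAG"

# Assumption: BanI recognizes GGYRCC (Y = C or T, R = A or G).
BANI_SITE = re.compile(r"GG[CT][AG]CC")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def amplicon_length(insert_len: int) -> int:
    """Length of a doubly ligated product: 5' linker + insert + 3' linker."""
    return len(LINKER_5P) + insert_len + len(LINKER_3P)


# The FOR primer matches the start of the 5' linker, and the RT/REV primer
# is the reverse complement of the 3' linker.
print(LINKER_5P.startswith(FOR_PRIMER))
print(reverse_complement(LINKER_3P) == RT_REV_PRIMER)

# Inserts of 28-31 nt give amplicons within the stated 70 +/- 2 bp window.
print([amplicon_length(n) for n in range(28, 32)])

# Both linkers carry a BanI site, which is what allows digestion and
# subsequent concatamerization of the amplicons.
print(bool(BANI_SITE.search(LINKER_5P)), bool(BANI_SITE.search(LINKER_3P)))
```

Under these assumptions the two linkers contribute 40 bp in total, so a 30-nt insert yields the nominal 70-bp amplicon.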

Only clones in which there was an identifiable sequence between the two constant linkers shown above were taken as positives. Each of these positive sequences was then screened via BLAST against the MonDom5 M. domestica genome assembly in Ensembl. BLAST searches were carried out using default parameters with the exception that the RepeatMasker filter was turned off. Sequences were accepted as potential candidate piRNA-like RNAs only if there was a full-length match in MonDom5. Chromosome coordinates were recorded for every full-length hit obtained for that sequence. Candidate sequences were given sequential identification numbers by chromosome, i.e., MdopiR-1, MdopiR-2, etc. Candidate sequences for which multiple full-length matches were obtained were given sequential subidentifications within the unique ID numbers, i.e., MdopiR-100.1, MdopiR-100.2, etc. (see Supplementary Table 1).
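The acceptance rule and naming scheme described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the `Hit` record and both function names are hypothetical, and "full-length match" is interpreted here as an alignment that spans the whole query at 100% identity, which the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Hit:
    """One BLAST alignment of a cloned sequence (hypothetical record)."""
    chrom: str
    start: int
    end: int
    aligned_len: int
    identity: float  # percent identity over the aligned region


def full_length_hits(query_len: int, hits: List[Hit]) -> List[Hit]:
    """Keep hits covering the entire query (assumed: at 100% identity)."""
    return [h for h in hits
            if h.aligned_len == query_len and h.identity == 100.0]


def candidate_ids(number: int, hits: List[Hit]) -> List[str]:
    """One ID per accepted hit: MdopiR-N, or MdopiR-N.1, N.2, ... if several."""
    base = f"MdopiR-{number}"
    if len(hits) <= 1:
        return [base] * len(hits)
    return [f"{base}.{i}" for i in range(1, len(hits) + 1)]


hits = [
    Hit("1", 100, 129, 30, 100.0),
    Hit("1", 500, 520, 21, 100.0),  # partial match, rejected
    Hit("2", 10, 39, 30, 100.0),
]
kept = full_length_hits(30, hits)
print(candidate_ids(100, kept))
```

A sequence with no accepted hits receives no ID under this sketch, mirroring the rule that only sequences with a full-length match were retained as candidates.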

Results

Approximately 600 clones were selected for sequencing and, of these, 406 were found to contain concatamer inserts. The average concatamer contained three doubly ligated target sequences with a range of one to six. BLAST searches of all target sequences revealed that 89 (22%) could be identified as scraps of tRNAs, rRNAs, and a few pieces of mRNAs from annotated genes. The remaining sequences were screened for multiple occurrences of the same sequence, leaving a total of 310 unique sequence signatures. The size range of these unique signatures was 20–40 nt with 87.8% falling between 28 and 31 nt (Fig. 1). Consistent with the 5′ terminal base bias seen in piRNAs from other species, the 5′-most base of 259 of the 310 sequences (83.5%) is U, while the other bases are much less frequent, i.e., G (22, 7.1%), A (18, 5.8%), and C (11, 3.5%).
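The 5′ base percentages quoted above follow directly from the reported counts, as the sketch below shows. The helper `five_prime_composition` is a hypothetical name, and the five example sequences are copied from Table 2 purely to illustrate how such a tally would be made from cloned inserts.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Reported counts of the 5'-most base among the 310 unique sequences.
reported = {"U": 259, "G": 22, "A": 18, "C": 11}
total = sum(reported.values())
percent = {base: round(100 * n / total, 1) for base, n in reported.items()}
print(total, percent)


def five_prime_composition(seqs: Iterable[str]) -> Dict[str, Tuple[int, float]]:
    """Count and percentage of each 5'-terminal base in a set of sequences."""
    counts = Counter(s[0].upper() for s in seqs if s)
    n_seqs = sum(counts.values())
    return {b: (n, round(100 * n / n_seqs, 1)) for b, n in counts.items()}


# A few insert sequences from Table 2, used only as example input.
examples = [
    "UCAUCUAUAAAAUUAGUCGGAGAAGGAAA",    # MdopiR-245
    "UGGAUUUGGAAUCAGAGGAUGUGGGU",       # MdopiR-246
    "GAGUCACUUAACCUGUUUGCCUCAGAUUCC",   # MdopiR-251
    "AGUGGAUUGAGAGCCAGGCCUAGAGAUG",     # MdopiR-252
    "CUUGAAUUCAAGACCUCCUGACUCUAGGCC",   # MdopiR-256
]
print(five_prime_composition(examples))
```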
Fig. 1

Distribution of clone target sequence sizes from M. domestica testes RNA. Clones were derived from a size-targeted small-RNA fraction using a 31-mer internal RNA control (piSPIKE™, Integrated DNA Technologies). Target sequence size refers to the base count in each concatamer clone found between the constant 5′ linker sequences and the constant 3′ linker sequences given in the Materials and methods section. Average target sequence size was 29.1 bp with a range of 20–40 bp

Chromosome coordinates could be determined via BLAST against the M. domestica genome (MonDom5 in Ensembl) for 244 of the 310 unique signatures. The distribution of BLAST results for all 310 sequences is summarized in Table 1. Twenty-five unique sequences were found to be present in six or more copies (range = 6–51 copies) in the same chromosome vicinity. In some of these cases, for example, MdopiR-161 and MdopiR-162, the coordinates were seen to overlap (Supplementary Table 1 and Supplementary Fig. 1). Similar findings have been reported for placental mammals (Ro et al. 2007a, b). Ro et al. (2007b) have also observed that some piRNA-like RNAs cloned from mouse ovary are associated with repetitive sequences (rapiRNAs) while others are non-repeat-associated piRNAs (napiRNAs). Although there is no definite cutoff size, the former group tends to be longer than conventional piRNAs (32–38 nt). The overlapping sequences MdopiR-161 and MdopiR-162, both 30 nt long, are associated with a tandem repeat and, therefore, meet the criterion for designation as rapiRNA-like RNAs. We assessed the 20 candidate piRNA-like RNAs among our clones that are 32 nt or longer and only two, MdopiR-267 and MdopiR-280, were found to be associated with any type of repetitive sequences and both are associated with transposons. On the other hand, because we pursued a size-selected cloning strategy that focused intentionally on the 28–31-nt size range, rapiRNA-size RNAs were excluded.
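The categories of Table 1 amount to binning each sequence by its number of full-length matches. The sketch below is one way to express that rule and to check that the tabulated counts add up; the function name and category labels are hypothetical, and the seven "unassigned" sequences were set aside by the authors on other grounds, so the rule does not reproduce that row.

```python
def table1_category(n_full_length: int) -> str:
    """Bin a sequence by its number of full-length matches in MonDom5."""
    if n_full_length == 0:
        return "no full-length match"
    if n_full_length == 1:
        return "one match"
    if n_full_length <= 5:
        return "two to five matches"
    if n_full_length <= 51:
        return "6 to 51 matches"
    if n_full_length >= 100:
        return "100 or more matches"
    return "uncategorized"  # 52-99 matches: not a category in Table 1


# Counts as tabulated in Table 1.
TABLE1 = {
    "one match": 200,
    "two to five matches": 19,
    "6 to 51 matches": 25,
    "100 or more matches": 36,
    "unassigned": 7,
    "no full-length match": 23,
}

# Sequences with definite chromosome coordinates: the first three rows.
mapped = sum(TABLE1[k] for k in
             ("one match", "two to five matches", "6 to 51 matches"))
print(mapped, sum(TABLE1.values()))
```

The first three rows sum to 244 sequences with definite coordinates, and all six rows sum to the 310 unique signatures.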

A majority of the 244 sequences for which definite chromosome coordinates could be determined were found in 16 monodirectional transcription clusters shown in Figure 2. Of these clusters, 15 lie in intergenic regions while one cluster mapped to the nearly 200-kb-long third intron of the phosphodiesterase 1C gene on chromosome 6.
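The text does not state the criteria used to call a transcription cluster. As a hedged illustration of how monodirectional clusters might be derived from mapped coordinates, the sketch below groups hits that lie on the same chromosome and strand and are separated by no more than a chosen gap. The function name, the 10-kb gap, and the minimum of three hits per cluster are all assumptions made for this example.

```python
from itertools import groupby
from typing import List, Tuple

MappedHit = Tuple[str, str, int, int]  # (chromosome, strand, start, end)


def cluster_hits(hits: List[MappedHit],
                 max_gap: int = 10_000,
                 min_size: int = 3) -> List[List[MappedHit]]:
    """Group same-strand hits on one chromosome into clusters.

    A new cluster starts whenever the distance from the furthest end seen
    so far to the next hit's start exceeds max_gap. Clusters with fewer
    than min_size hits are discarded.
    """
    clusters: List[List[MappedHit]] = []
    ordered = sorted(hits, key=lambda h: (h[0], h[1], h[2]))
    for _, group in groupby(ordered, key=lambda h: (h[0], h[1])):
        current: List[MappedHit] = []
        current_end = 0
        for hit in group:
            if current and hit[2] - current_end > max_gap:
                if len(current) >= min_size:
                    clusters.append(current)
                current = []
            current.append(hit)
            current_end = hit[3] if len(current) == 1 else max(current_end, hit[3])
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


# Toy coordinates, invented for the example.
toy_hits = [
    ("6", "+", 1_000, 1_030),
    ("6", "+", 5_000, 5_029),
    ("6", "+", 9_000, 9_030),
    ("6", "-", 9_500, 9_530),      # opposite strand, not in the cluster
    ("6", "+", 500_000, 500_030),  # too far away, not in the cluster
]
clusters = cluster_hits(toy_hits)
print(len(clusters), [len(c) for c in clusters])
```

Requiring a shared strand is what makes each cluster monodirectional; the gap and size thresholds would need to be tuned against the actual data.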
Fig. 2

Relative position of 16 piRNA transcription clusters so far identified in the M. domestica genome. The chromosome position of each cluster was determined relative to the placement of 415 BACs by Duke et al. (2007) using the chromosome coordinates of each putative cluster and the chromosome coordinates of the mapped BACs

Of the remaining 66 sequences, 23 failed to return any matches in MonDom5 and another 36 could not be uniquely mapped because each returned more than 100 full-length matches scattered throughout the genome. The latter group of sequences was evaluated in RepeatMasker [A.F.A. Smit, R. Hubley, and P. Green, RepeatMasker Open-3.0 (1996–2004). Available at http://www.repeatmasker.org] using the cloned sequence plus 600 bp of flanking sequence randomly chosen from among the MonDom5 matches. Of these 36 cloned sequences, three (MdopiR-246, MdopiR-255, and MdopiR-261) could not be identified as known repeat sequences by RepeatMasker. Two sequences, MdopiR-270 and MdopiR-274, were identified as generic repeat classes, Tigger 3 and the tRNA repeat GLY-GGG, respectively. These were the only two sequences among these 36 that were also found in the mouse, wallaby, and platypus genomes. The remaining 31 sequences were specifically identified as marsupial transposons. Consistent with previous reports, the transposon classes represented were varied and included SINEs, LINEs, and LTRs (Table 2 and Supplementary Table 1).
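Preparing a RepeatMasker query of this kind involves picking one genomic match at random and extending its coordinates by the flanking sequence. The sketch below illustrates that step under stated assumptions: the text does not say how the 600 bp of flank was distributed, so 300 bp on each side is assumed here, coordinates are treated as 1-based and inclusive, and the function names are hypothetical.

```python
import random
from typing import List, Tuple

Match = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


def pick_match(matches: List[Match], seed: int = 0) -> Match:
    """Choose one genomic match at random (seeded for reproducibility)."""
    return random.Random(seed).choice(matches)


def flanked_window(start: int, end: int, chrom_len: int,
                   flank_each_side: int = 300) -> Tuple[int, int]:
    """Extend a match by the flank on both sides, clamped to the chromosome."""
    return (max(1, start - flank_each_side),
            min(chrom_len, end + flank_each_side))


# Toy matches, invented for the example.
matches = [("1", 1_000, 1_029), ("3", 52_400, 52_429), ("8", 100, 129)]
chrom, start, end = pick_match(matches)
print(chrom, flanked_window(start, end, chrom_len=1_000_000))
```

The extracted window, rather than the short insert alone, gives RepeatMasker enough context to assign the match to a repeat family.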
Table 1

Summary of MonDom5 BLAST results for 310 unique sequence signatures

Category                                                     Number of sequences
Sequences with one full-length match in MonDom5              200
Sequences with two to five full-length matches in MonDom5    19
Sequences with 6 to 51 full-length matches in MonDom5        25
Sequences with 100 or more full-length matches in MonDom5    36 (a)
Unassigned sequences                                         7 (b)
Sequences with no full-length matches in MonDom5             23

(a) These sequences are presented in Table 2

(b) These sequences include one microRNA (miR-451), three predicted protein-coding genes, two unassigned sequence matches, and a marsupial transposon (LTR-176_MD)

Table 2

piRNA-like RNA sequences returning more than 100 full-length BLAST hits in the MonDom5 M. domestica genome assembly

piRNA ID      Insert sequence (5′ to 3′)           RepeatMasker transposon ID
MdopiR-245    UCAUCUAUAAAAUUAGUCGGAGAAGGAAA        Mar1a Mdo
MdopiR-246    UGGAUUUGGAAUCAGAGGAUGUGGGU           No ID
MdopiR-247    UAGUGCCAAUAGAGCGUAAGGUCAAAGAGU       LTR-ERV1
MdopiR-248    UUGAGGUAGUCUAUUUCAUUCGGUGCUGG        L1 Mdo
MdopiR-249    UGGCAAACCUUUUAGAGACAGAGUGCCCA        OposCharlie3a
MdopiR-250    UGGGUCUGGAGUCAGGAAGCCUCAU            Mar1a Mdo
MdopiR-251    GAGUCACUUAACCUGUUUGCCUCAGAUUCC       Mar1b Mdo
MdopiR-252    AGUGGAUUGAGAGCCAGGCCUAGAGAUG         SINE1 Mdo
MdopiR-253    UUGGCGAUUACAUUCCUGGGGGGUUGU          L1 Mdo
MdopiR-254    UCAGGUCAUGCAGAGAAAAGUCUAAUGGUCC      Mdo ERV2
MdopiR-255    UGUUGAAUGAAUGAAUGGAGGUUAUUUC         No ID
MdopiR-256    CUUGAAUUCAAGACCUCCUGACUCUAGGCC       SINE1 Mdo
MdopiR-257    UUUUGUGUCAUGGACCCCUUUGGUAGUCU        MIR3 MarsA
MdopiR-258    UGCGGAUGACGUGUCCAGACCAUUGUAGC        RTE Mdo
MdopiR-259    UGGUAUCCAUUUUCUACAAAACCCUGUUGC       Mdo ERV2
MdopiR-260    UCAUUUUAUGUAUGAGAAACUGAGAUAAA        Mar1a Mdo
MdopiR-261    UGGGAUAUAAACUUGCCGGGACCAAUGCC        No ID
MdopiR-262    UUCUAUGUUAACCACUCGGGGAUUAUUAGG       Mdo ERV15
MdopiR-264    UGGAUUCAUAUCUGACCUCAGACACUUC         SINE1 Mdo
MdopiR-265    GUUAAUAUUAAUUUGUACCCCUUUUAGGCCC      L1 Opos
MdopiR-266    UGAUACAUACUAGCUGUGUAACCGUGGAC        Mar 1c Mdo
MdopiR-267    GGAUUGAGAGCCAGGCCUAGAGAUAGGAGGUCC    SINE1 Mdo
MdopiR-268    AGUGGAAUGAGAACCAGGCCUAGAGAUG         SINE1 Mdo
MdopiR-269    UGUAAAAUGAGAGAGUUGGUGUAGGUGGC        MIR3 MarsB
MdopiR-270    UUAUUUUAUAGAUAAGGAAACUGAGGCU         Tigger 3
MdopiR-271    UGUGAUUGGUAGAUAUAAGGACUUGGGGGU       LTR1k Mdo
MdopiR-272    UGGACUGAGAGCCAGGCCUAGAGACUGGAGU      SINE1 Mdo
MdopiR-273    UCAUGAGUCCCUUGGAGUUGUCUUGGGU         L1 Opos
MdopiR-274    GCAUUGGUGGUUCAGUGGUAGAAUUCUCG        tRNA-GLY
MdopiR-275    UUGUGGAUAAUUUCCAUUUUGGGAGGCA         L1 Mdo
MdopiR-276    UGAUGAUGUUUGAGCAGGGAUGGACAGA         LTR2e MD
MdopiR-277    UGCUUUGUUUCUUCUCAGGCUGGUCAC          LTR106 MD
MdopiR-278    UUGCAGCCAUAUUAACCCGGAAGUCCGCUC       L1 Mdo
MdopiR-279    UUAAAAAAAAAUACUGGUGUAGA              L1 Mdo
MdopiR-280    UACACAGCCAGUUAGUGUCUGAGGCCACAAAA     Mar1a Mdo
MdopiR-281    UGGCAAACCUUUUAGAGACAGAGUGCCCA        OposCharlie3a

Flanking sequence from MonDom5 was used to query each insert sequence in RepeatMasker to obtain a transposon identification. One sequence, MdopiR-263, was deleted from this group and reassigned as MdopiR-162 after further analysis

Finally, a group of seven sequences returned a mixture of BLAST results, including one microRNA (miR-451), three (MdopiR-284, MdopiR-285, and MdopiR-286) in predicted M. domestica genes (LOC100030896, LOC100025308, and LOC100078421 respectively), two in unassigned chromosome positions, and one (MdopiR-287) that matched a marsupial transposon (LTR-176_MD) but with a limited number of hits (51).

Discussion

We have confirmed the expression of piRNA-like small RNAs in the genome of the marsupial Monodelphis domestica using a size-directed small-RNA cloning strategy against testes RNA. The cloned sequences conform to the canonical characteristics of piRNAs discovered in placental mammals. That is, the expressed sequences are primarily 28 to 31 nucleotides in length, display a pronounced preference for a 5′ uridine, map into distinct transcription clusters, and many appear to be complementary to marsupial transposons.

Ro et al. (2007a) confirm that while other biological functions are likely, the principal regulatory function of piRNAs appears to be transposon suppression during spermatogenesis. This function has been summarized in the piRNA “ping-pong” model proposed by Hannon and colleagues (Aravin et al. 2007; Brennecke et al. 2007). The essence of the ping-pong model is that 28–31-nt RNAs are processed from long, multi-piRNA transcripts by an as yet unknown mechanism, after which the processed piRNAs are bound to a PIWI protein and transported to a target transposon transcript. The PIWI-piRNA complex then binds complementarily to a transposon target, creating double-stranded RNA substrates for a DICER-like siRNA enzyme. Double-stranded cleavage products now contain a piRNA antisense strand that is transported back to the piRNA transcripts and additional piRNA sense strands are generated by the action of an argonaute protein. One important aspect of this function is that piRNA ping-pong would predict a sort of molecular “arms race” between transposons and piRNAs (Aravin et al. 2007). A corollary of this arms race would be the prediction that piRNA sequences would tend to diverge relatively rapidly between lineages to keep up with the fairly rapid evolution of transposon sequences within lineages (Cardazzo et al. 2003; Martin et al. 1985). This prediction is supported by the observation that while the chromosomal locations of piRNA transcription clusters are fairly well conserved in human, mouse, and rat, the piRNA sequences themselves are not (Ro et al. 2007).

We examined the predicted lack of sequence conservation in our candidate M. domestica piRNA-like small RNAs via BLAST of all 310 sequences with both mouse (Mus musculus) and platypus (Ornithorhynchus anatinus) genome assemblies. Using the criteria that were used for mapping, the comparison with mouse returned nine matches (2.9%) while the comparison with platypus returned 19 matches (6.1%). Six of the nine mouse sequence matches were also found in platypus. Of these, one is the microRNA miR-451, three are annotated mouse piRNAs, and the other two are transposons. When the same criteria were applied to a BLAST search of the tammar wallaby (Macropus eugenii) genome (GenBank Trace Archive), 43 matches (13.8%) were found. Ten of the 43 matches were for sequences that matched transposons in M. domestica. These identities were confirmed in the wallaby by submitting match plus flanking sequence to RepeatMasker. An 11th sequence match in this group is one in which a transposon was not identified in M. domestica but was identified in the wallaby sequence as a marsupial SINE (MIR3_MarsA).

In invertebrates and fishes, and to a lesser but substantial extent in eutherian mammals, piRNAs are believed to act as suppressors of retrotransposon function and mobilization during gametogenesis by binding Piwi proteins and cleaving specific sequences in transposon transcripts (Brennecke et al. 2007; O’Donnell and Boeke 2007). Because piRNAs have been detected in species as distant as insects, fish, and eutherian mammals, it is no surprise that we have also identified them in a marsupial mammal and that these sequences conform to the canonical characteristics of mammalian piRNAs. Analysis of the M. domestica genome (Mikkelsen et al. 2007) has revealed that the greater size of the opossum genome (∼3.7 Gb) relative to those of sequenced eutherian genomes can be accounted for by the very high proportion of the genome comprising repetitive element families. Specifically, more than half of the opossum genome appears to be composed of such elements, and a substantial fraction of these bear the hallmarks of transposable elements (Gentles et al. 2007). In addition, comparison with the opossum genome indicates that genomic evolution among eutherian lineages has occurred largely through innovation in conserved, noncoding elements (CNEs) rather than in protein-coding genes (Gentles et al. 2007; Mikkelsen et al. 2007), and there is every reason to suppose that similar lineage-specific innovations and diversification among CNEs and transposable elements have occurred among marsupial lineages as well. If so, the evolutionary arms race between transposon evolution and piRNA evolution would be expected to produce marsupial-specific piRNA species and drive rapid differentiation of piRNAs between marsupial lineages. Identification of canonical marsupial-specific piRNA-like sequences in the opossum fits this scenario well, as does the fact that the piRNA-like sequences detected so far in the opossum display very little overlap with the tammar wallaby. We anticipate that a deep-sequencing strategy utilizing one of the recently developed massively parallel sequencing platforms will reveal many more piRNAs in the opossum genome and will further aid in the evaluation of lineage-specific piRNA evolution.

Acknowledgments

The authors thank Dr. Matthew Breen for assistance with the opossum chromosome figures. This study was funded in part by Grant Number RR014214 from the National Institutes of Health.

Supplementary material

335_2008_9109_MOESM1_ESM.doc (926 kb)
335_2008_9109_MOESM2_ESM.doc (22 kb)

Copyright information

© Springer Science+Business Media, LLC 2008