Background

In eukaryotic genomes, a large fraction of repetitive sequences originates from mobile genetic elements (MGEs), which are grouped into 2 classes. Class I elements transpose via RNA intermediates, whereas class II elements transpose directly as DNA. Tc1-like class II transposons, named after the founding element Tc1 of Caenorhabditis elegans, are probably the most widespread MGEs in nature and are found in fungi, plants, ciliates, nematodes, arthropods, fish, amphibians and mammals (reviewed in [1]). These elements contain a single reading frame encoding the enzyme transposase, flanked by terminal inverted repeats. Transposition of class II MGEs has limited requirements for host cellular factors, which may account for their remarkable ability to undergo horizontal transfer across great taxonomic distances [2]. MGEs are regarded as parasitic genes, and their proliferation is deleterious to the host; therefore, transposition is commonly followed by inactivation. MGEs may play an important role in the evolution of teleost fish and comprise a substantial fraction of their genome. Multiple copies of Tc1-like transposons have been found in several fish species from different orders [3–6]; however, transcription of teleost Tc1-like genes has not been documented. Recent high-throughput sequencing of salmonid cDNA libraries has revealed a surprisingly large number of transposon transcripts. Most, if not all, of these sequences contain incapacitating mutations in their reading frames and can be regarded as transcribed pseudogenes or null alleles. At present, the rainbow trout (Oncorhynchus mykiss) and Atlantic salmon (Salmo salar) TIGR Gene Indices [7] contain 50,773 and 31,341 unique cDNA sequences, respectively, among which we found several hundred MGEs, with Tc1-like genes being the most abundant. This wealth of sequence information provides insight into the structure and evolution of transposons.
We also cloned several copies of two rainbow trout Tc1-like genes with complete reading frames, which adds to the understanding of the transposon life cycle. Multiple gene expression analyses with a high-density cDNA microarray indicate that transcription of rainbow trout transposons is stimulated in response to stress, toxicity and pathogens.

Results

In order to search for transcribed transposons in salmonid fish, we compared the unique cDNA sequences from the TIGR gene indices with 262 metazoan transposon proteins retrieved from Swissprot. Blastx found matches in 273 rainbow trout and 163 Atlantic salmon sequences at a cutoff of e < 10⁻²⁰ (Table 1). The ratio of transposon sequences to all cDNA sequences in salmonids was 2.35–31.6 times greater than in other vertebrate species with available gene indices, and a large fraction of the matches (68.3%) showed similarity to 11 proteins of the Tc1 family. Tc1-like transcripts were found in the gene indices of 4 other teleost fish species and of the African clawed frog, Xenopus laevis, but not in higher vertebrates. To estimate the approximate number of Tc1-like genes, we analyzed 6 Atlantic salmon genomic clones covering 1.2 Mb [Genbank:AC148723, Genbank:AC149099, Genbank:AC148779, Genbank:AC148618, Genbank:AC148617 and Genbank:AC148616]; a blastx search found 56 matches at a cutoff of e < 10⁻²⁰. The haploid Atlantic salmon genome is about 3 billion base pairs. Assuming a relatively homogeneous distribution of Tc1-like transposons, about 140,000 copies can be expected, which is 3 orders of magnitude greater than the number of Tc1-like sequences in the salmonid gene indices. It should be noted that TIGR contigs are produced by automatic assembly of EST sequences that share at least 95% identity over overlaps of at least 40 base pairs [8]. Therefore, transcripts of recently diverged transposon copies could be merged unless they are flanked by different 5'- and 3'-untranslated sequences. The number of transcribed transposons may thus be greater than the number estimated by searching the gene indices, but it is likely that only a minor fraction of salmonid Tc1-like genes is transcriptionally active.
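
The copy-number figure above follows from a linear extrapolation of hit density. As a minimal sketch, assuming that each of the 56 blastx hits is a separate copy and that the sampled 1.2 Mb is representative of the ~3 Gb haploid genome, the arithmetic can be written out as below. Using the 163 Atlantic salmon gene-index matches (Table 1, all transposon families) as an upper bound for the number of Tc1-like transcripts is an assumption made for illustration, and the helper name `extrapolate_copies` is hypothetical.

```python
import math


def extrapolate_copies(hits: int, sampled_bp: float, genome_bp: float) -> float:
    """Scale the hit count in the sampled sequence up to the whole genome,
    assuming copies are distributed homogeneously."""
    return hits * genome_bp / sampled_bp


# Values from the text: 56 blastx hits in ~1.2 Mb of Atlantic salmon genomic
# clones; haploid genome of ~3 billion bp.
copies = extrapolate_copies(hits=56, sampled_bp=1.2e6, genome_bp=3.0e9)

# 163 Atlantic salmon gene-index sequences matched any transposon (Table 1),
# so the number of Tc1-like transcripts is assumed to be at most this value.
transcripts_upper_bound = 163
orders = math.log10(copies / transcripts_upper_bound)

print(f"expected copies: {copies:,.0f}")                        # expected copies: 140,000
print(f"orders of magnitude above transcripts: {orders:.1f}")  # ... 2.9
```

The logarithm of the ratio makes the "3 orders of magnitude" statement explicit: roughly 860 genomic copies per transcribed transposon sequence under these assumptions.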

Table 1 Transcribed vertebrate transposons.

Most Tc1-like sequences from the rainbow trout gene index contained incomplete reading frames. To analyse the structural relatedness of these genes, we used 38 fragments that encode at least 170 C-terminal amino acids: 31 sequences from the TIGR database and 7 produced in this study (copies of the newly identified genes named Glan and Barb [Genbank:AY880883–AY880888]). The maximum likelihood (ML) tree consisted of 3 single genes and 11 clades containing 2 to 5 sequences each (Figure 1). Seven clades (I–VII) could be regarded as part of a multigene family, as sequence identity with their nearest neighbors was in the range of 35–73%; the remaining 4 clades were highly divergent. Only 1 of the 6 clades containing more than 3 genes (X in Figure 1) was split into clusters supported by high bootstrap values. The highest sequence identity among copies was observed for Glan and Barb. However, divergence within the other clades could, in theory, be overestimated owing to forced assembly of similar transcripts.
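
Percent identity between aligned sequences underlies the 35–73% nearest-neighbour values quoted above. The text does not state how gaps were handled; the sketch below assumes pairwise identity computed over ungapped columns of a multiple alignment, and the toy sequences and helper names are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.
    Columns where either sequence has a gap ('-') are ignored (assumption)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        matches += x == y
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def nearest_neighbour_identity(name: str, aligned: dict) -> float:
    """Highest identity between `name` and any other sequence in the alignment."""
    return max(percent_identity(aligned[name], seq)
               for other, seq in aligned.items() if other != name)


# Hypothetical toy alignment of three protein fragments.
toy = {"A": "MKR-LLV", "B": "MKRQLIV", "C": "MTK-ALV"}
print(f"{nearest_neighbour_identity('A', toy):.1f}%")  # 83.3%
```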

Figure 1

Structural relatedness of transcribed rainbow trout Tc1-like transposons. The ML tree is based on sequences encoding at least 170 C-terminal amino acids. TIGR sequences are designated by their accession numbers; transposons Barb and Glan were identified in this study. The tree was produced using Dnaml (Phylip package); nodes with bootstrap values greater than 0.75 are indicated. Accession numbers of TIGR contigs in the clusters are: I-1 – BX884691; I-2 – TC46229; II-1 – TC52875; II-2 – TC46539; III-1 – TC46343; III-2 – TC46498; III-3 – TC46491; III-4 – CB488722; IV-1 – TC46455; IV-2 – CA377451; IV-3 – TC47500; IV-4 – TC47499; IV-5 – CA369142; V-1 – TC46391; V-2 – CA369399; VIII-1 – TC54663; VIII-2 – TC54666; IX-1 – CB488927; IX-2 – TC46493; X-1 – TC46521; X-2 – TC46197; X-3 – TC46383; X-4 – TC46308; XI-1 – CA361855; XI-2 – TC54683; XI-3 – CR368829.

Sequencing of complete reading frames for 3 copies of Glan and 4 copies of Barb allowed us to study the molecular evolution of transposons within the rainbow trout genome. All 7 sequences contain incapacitating mutations that prevent translation of the transposase. Barb copies have diverged by up to 11.4 ± 1.4% (mean ± SD), and the accumulation of deletions (Figure 2) impeded reconstruction of the ancestral protein. The low divergence of Glan copies (1.9 ± 0.8%) suggests relatively recent transposition into the rainbow trout genome. The consensus sequence of the 3 Glan reading frames was identical to TIGR contig [TGI:TC46394], which encodes a protein with the characteristic features of a Tc1-like transposase: domains required for nuclear localization, DNA binding, and DNA cleavage and joining, including the DDE motif found in the catalytic domains of diverse MGEs and retroviruses (Figure 3). Notably, all Glan transcripts contained mutations that prevented translation of the transposase, yet the consensus contig sequence, assembled from a large number of ESTs from different cDNA libraries, appeared intact. Given that the rate of spontaneous mutations in vertebrate germ cell lines is ~10⁻⁵ [9], transposition of Glan could have taken place as recently as a few thousand years ago. We also screened fish from inland reservoirs of Finland for this gene by PCR and detected it in 17 species from different orders (Table 2). Interestingly, three of the four species in which Glan was not found (grayling, whitefish and vendace) are more closely related to rainbow trout than most of the species carrying this gene. The low divergence of the copies and their discontinuous distribution are evidence of horizontal transmission.
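
The recency estimate above rests on a molecular-clock argument, time ≈ divergence / rate. The text gives neither the units of the ~10⁻⁵ rate nor whether the 1.9% refers to divergence between copies or from their common ancestor, so the sketch below, built around a hypothetical helper `divergence_time`, shows both readings; either gives a time on the order of 10³ units of the rate's time base.

```python
def divergence_time(d: float, mu: float, pairwise: bool = False) -> float:
    """Molecular-clock estimate of time since divergence, in the time unit of `mu`.

    d        -- mean divergence as a fraction of sites (0.019 for 1.9%)
    mu       -- substitution rate per site per unit time (units assumed)
    pairwise -- False: d is each copy's divergence from a shared ancestor (T = d / mu);
                True: d is divergence between two copies, which accumulates
                along both lineages (T = d / (2 * mu)).
    """
    return d / (2 * mu) if pairwise else d / mu


GLAN_DIVERGENCE = 0.019  # 1.9% mean divergence of Glan copies (from the text)
RATE = 1e-5              # "~10^-5" rate cited in the text; units not stated

print(round(divergence_time(GLAN_DIVERGENCE, RATE)))                 # 1900
print(round(divergence_time(GLAN_DIVERGENCE, RATE, pairwise=True)))  # 950
```

If the rate is read as per site per year, these correspond to roughly one to two thousand years; other unit assumptions rescale the result proportionally.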
We analysed the rates of synonymous (Ks) and non-synonymous (Ka) substitutions in the newly identified rainbow trout transposons, using as a reference the sequence of the nearest Swissprot protein, a hypothetical transposase of plaice with 77% homology [Genbank:CAB51372] (Table 3). Relative to this transposase, the Ks/Ka ratio was high and significantly greater in the younger gene (4.85 ± 0.30 in Glan and 3.35 ± 0.04 in Barb). Comparison among copies indicated possible divergent evolution in Glan (Ks/Ka = 0.69 ± 0.05, i.e., a higher rate of non-synonymous than synonymous substitution). In Barb, the ratio of synonymous to non-synonymous substitution rates approached unity (Ks/Ka = 1.03 ± 0.12), which is consistent with protracted accumulation of mutations in a solely neutral mode.
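
The Ks/Ka values were obtained with DnaSP (see Methods). To illustrate what the ratio measures, the sketch below is a simplified counting method in the spirit of Nei and Gojobori (1986), with a Jukes–Cantor correction; it is not DnaSP's implementation. Its conventions are assumptions: mutations to stop codons count as non-synonymous when counting sites, codons containing gaps, ambiguous bases or stop codons are skipped, and mutational pathways through stop codons are excluded. Sequences must be aligned in frame, and the example sequences are hypothetical.

```python
from itertools import permutations
from math import log

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered TTT, TTC, TTA, ...; '*' marks stops.
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def synonymous_sites(codon):
    """Synonymous sites in a codon: at each position, the fraction of the three
    possible point mutations that leave the amino acid unchanged."""
    s = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    s += 1 / 3
    return s


def codon_differences(c1, c2):
    """(synonymous, non-synonymous) differences between two sense codons,
    averaged over all mutational pathways that avoid stop codons."""
    diff = [p for p in range(3) if c1[p] != c2[p]]
    if not diff:
        return 0.0, 0.0
    syn = non = 0.0
    paths = 0
    for order in permutations(diff):
        cur, s, n, valid = c1, 0, 0, True
        for pos in order:
            nxt = cur[:pos] + c2[pos] + cur[pos + 1:]
            if CODE[nxt] == "*":
                valid = False
                break
            if CODE[nxt] == CODE[cur]:
                s += 1
            else:
                n += 1
            cur = nxt
        if valid:
            syn, non, paths = syn + s, non + n, paths + 1
    if paths == 0:  # every pathway passes through a stop: count all as non-synonymous
        return 0.0, float(len(diff))
    return syn / paths, non / paths


def jukes_cantor(p):
    """Jukes-Cantor correction for multiple hits (undefined for p >= 0.75)."""
    return -0.75 * log(1 - 4 * p / 3)


def ka_ks(seq1, seq2):
    """Return (Ka, Ks) for two coding sequences aligned in frame."""
    S = N = Sd = Nd = 0.0
    for i in range(0, min(len(seq1), len(seq2)) // 3 * 3, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        # Skip codons with gaps or ambiguous bases, and stop codons (common in pseudogenes).
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue
        sites = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S += sites
        N += 3 - sites
        sd, nd = codon_differences(c1, c2)
        Sd += sd
        Nd += nd
    return jukes_cantor(Nd / N), jukes_cantor(Sd / S)


ka, ks = ka_ks("ATGTTTCTGAAA", "ATGTTCCTGAGA")
print(f"Ka = {ka:.3f}, Ks = {ks:.3f}, Ks/Ka = {ks / ka:.2f}")
```

Table 3 reports Ks/Ka, i.e. `ks / ka` here: values above 1 mean synonymous changes dominate (constraint on the protein), values near 1 are expected under neutral evolution, and values below 1 indicate an excess of amino acid replacements.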

Table 2 Presence of Glan in genomic DNA of fish from inland waters of Finland.
Table 3 Synonymous (Ks) and non-synonymous (Ka) divergences of the rainbow trout transposons Glan and Barb. Plaice transposase was used as a reference.
Figure 2

Alignment of protein-coding sequences of the newly identified transcribed rainbow trout Tc1-like transposases cloned in this study.

Figure 3

Alignment of deduced amino acid sequences of the rainbow trout transposon Glan with the Tc1-like transposon of plaice, Pleuronectes platessa (Genbank:CAB51372), TPA of the frog Rana pipiens (Genbank:DAA01561) and tcb1 of the nematode C. elegans (Genbank:NP_741053). The homeodomain (indicated by a box) is involved in DNA binding; the DDE/D motif (indicated by arrows) is present in diverse MGEs [1].

We did not find sequences of other known proteins in the salmonid Tc1-like contigs, so the transposons are probably transcribed from their own promoters. Microarray analyses provided evidence that the transcription rate of transposons is regulated. We used a platform designed for studies of responses to environmental stress, toxicity and pathogens in salmonid fish [10, 11]. Overall, this platform included more than 1300 genes, 7 of which were similar to Tc1-like transposons. Five transposons showed marked differential expression in response to external stimuli, such as handling stress, exposure to toxic compounds and injection of cortisol or bacterial antigens; the microarray results were confirmed by real-time qPCR. A consensus profile of the transposons correlated with the profiles of 27 protein-coding genes in 35 microarray experiments (Pearson r² > 0.63); examples are presented in Figure 4. The highest correlations (r² > 0.8) were shown by classical markers of cellular stress, such as the aryl hydrocarbon receptor, MAP kinase 13 and hypoxia-inducible factor. We also searched for enrichment of Gene Ontology [12] categories in this list of genes. Significant over-representation was found for functional classes implicated in protective reactions to acute conditions (i.e., response to stress and oxidative stress, defense and humoral immune response, receptors and regulators of transcription; Table 4).
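
The co-expression screen compares each gene's profile across experiments with a transposon consensus profile and keeps genes above an r² cutoff. The text does not define how the consensus was built; the sketch below assumes it is the per-experiment mean of the transposon log-ratios and uses hypothetical toy profiles (four experiments instead of 35). Note that r² alone does not distinguish positive from negative correlation.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def consensus_profile(profiles):
    """Per-experiment mean of several transposon profiles (assumed definition)."""
    return [mean(values) for values in zip(*profiles)]


def coexpressed(genes, reference, r2_cutoff=0.63):
    """Genes whose r^2 with the reference profile exceeds the cutoff, sorted by r^2.
    r^2 is blind to the sign of the correlation."""
    hits = {}
    for name, profile in genes.items():
        r2 = pearson_r(profile, reference) ** 2
        if r2 > r2_cutoff:
            hits[name] = r2
    return dict(sorted(hits.items(), key=lambda item: item[1], reverse=True))


# Hypothetical log-ratio profiles over four experiments.
transposon_profiles = [[1.0, 2.0, 3.0, 4.0], [1.2, 2.1, 2.9, 4.1]]
reference = consensus_profile(transposon_profiles)
genes = {"g1": [2.0, 4.0, 6.0, 8.0], "g2": [4.0, 1.0, 3.0, 2.0]}
hits = coexpressed(genes, reference)
print(hits)  # only "g1" passes the 0.63 cutoff
```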

Figure 4

Differential expression of transposons in rainbow trout. The panel presents profiles of transposons and a group of genes that showed coordinated expression in 35 microarray experiments (Pearson r² is indicated). Selected experiments are shown: 1–8, exposure of yolk-sac rainbow trout fry to model contaminants [10]: β-naphthoflavone, low (1) and high (2) doses; cadmium, low (3) and high (4) doses; carbon tetrachloride, low (5) and high (6) doses; pyrene, low (7) and high (8) doses. 9–12, response to handling stress [11, GEO:GSM22355]: kidney, 1 day (9) and 5 days (10); brain, 1 day (11) and 5 days (12).

Table 4 Over-representation of Gene Ontology classes in the list of genes that showed coordinated expression with Tc1-like transposons. The composition of the microarray was used as a reference. The gene names and expression profiles are shown in Figure 4.
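
Table 4 tests GO classes against the composition of the array. The statistical test is not named in the text; a one-sided hypergeometric test is one common choice for this design and is assumed here. The array size (1,300 genes) and list size (27 genes) come from the text; the class size and overlap in the example are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided P(X >= k) for X ~ Hypergeometric(N, K, n): the chance of seeing
    at least k genes of a GO class (K of the N array genes) in a list of n genes."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


# Class size (40) and observed overlap (6) are hypothetical.
p = hypergeom_enrichment_p(k=6, n=27, K=40, N=1300)
expected = 27 * 40 / 1300
print(f"expected overlap {expected:.2f}, observed 6, p = {p:.2e}")
```

Using the array rather than the whole genome as the background matters here, because the platform was designed around stress- and immunity-related genes and would otherwise inflate enrichment for those classes.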

Discussion

A large number of transposons and a preponderance of Tc1-like genes are characteristic features of salmonid genomes [3–5]. Sequence analysis of the transcribed genes (Figure 1) suggested repeated transpositions at protracted intervals. The wide distribution of Tc1 transposons is believed to result from their limited requirements for host cellular factors. Sleeping Beauty, an artificially reconstructed salmonid transposon [13], is capable of integration into the genomes of a wide range of vertebrate species; however, the different efficiencies observed in various cell lines point to a possible involvement of the recipient's proteins in transposition [14]. This is in line with the wide, though limited, distribution of homologs of the transcribed salmonid DNA transposons, which have not been found among ESTs of warm-blooded vertebrates. The variety of salmonid Tc1-like genes is truly remarkable. Phylogenetic analysis of 38 sequences encoding homologous C-terminal fragments identified 14 distinct types of Tc1-like genes, and the real number of different genes is probably much greater. Our search was based on similarity to proteins available from Swissprot, and many transposons could remain unidentified due to the lack of known homologs. Furthermore, the rapid decay of transposons could impede the discovery of anciently transposed genes.

Despite the widespread occurrence of Tc1-like transposons in vertebrates, not a single active gene has been identified to date [14]. Inactivation of salmonid DNA transposons could take place within a relatively short period of time after transmission. The cloning of 2 transposons with relatively low divergence indicates rapid accumulation of incapacitating mutations, such as insertions or deletions, frameshifts and premature stop codons (Figure 2). Analysis of synonymous and non-synonymous substitutions suggests that inactivation of the younger transposon could have been preceded by selective divergence within a limited period of time, whilst the evolution of the older gene appeared entirely neutral. A study of recent transpositions in insects from four different orders suggests that selective constraints operate only during horizontal gene transfer [15]. A comparison of the rainbow trout genes with the Tc1-like transposon from plaice confirms the conservation of functionally important domains in distantly related proteins, which is gradually obscured during the course of neutral evolution (Ks/Ka ratios in the younger Glan and older Barb genes are 4.85 ± 0.31 and 3.35 ± 0.04, respectively).

Silencing of transposons takes place at the transcriptional or post-transcriptional level [16], and both mechanisms could act in salmonid fish. Based on their frequency in 1.2 Mb of genomic sequence, we can assume that Tc1-like genes comprise nearly 5% of the Atlantic salmon genome and that only a minor fraction have retained transcription after losing the capacity for translation. A survey of salmonid ESTs found untranslated transposons in both sense and antisense polarities, which is the main prerequisite for the formation of double-stranded RNA. RNA interference (RNAi) is implicated in the control of transposition in the germ line of the nematode C. elegans [17], and the existence of an RNAi pathway in rainbow trout was recently demonstrated [18]. Suppression of intact transposases by mutant copies has also been reported in insects; this control mechanism is referred to as dominant-negative complementation [19].
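
The ~5% genome fraction depends on the average length of Tc1-like sequence per copy, which the text does not state. A minimal sketch, reusing the 140,000-copy extrapolation from the Results and the ~3 Gb genome size, back-calculates the mean length implied by 5%; the helper names are hypothetical.

```python
def genome_fraction(copies: float, mean_copy_bp: float, genome_bp: float) -> float:
    """Fraction of the genome occupied by `copies` elements of a given mean length."""
    return copies * mean_copy_bp / genome_bp


def implied_mean_length(fraction: float, copies: float, genome_bp: float) -> float:
    """Mean length per copy needed for `copies` elements to occupy `fraction`."""
    return fraction * genome_bp / copies


COPIES = 140_000     # extrapolated copy number from the Results
GENOME_BP = 3.0e9    # haploid Atlantic salmon genome size used in the text

print(f"{implied_mean_length(0.05, COPIES, GENOME_BP):.0f} bp per copy for 5%")  # 1071 bp
```

Under these assumptions, "nearly 5%" corresponds to roughly 1.1 kb of Tc1-like sequence per copy on average.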

Given the efficient protection against transposition in animals, the tenacity and variety of transposons may seem surprising. The sustained persistence of transposons could, in theory, be explained by their residence in unknown reservoir species; e.g., parasites have been hypothesized to act as potential vectors of horizontal transfer across phylogenetically remote organisms [20]. However, this can hardly explain the remarkable diversity of these genes. The ML tree (Figure 1) suggests that at each transposition event the rainbow trout genome was invaded by a new transposon, although several genes could have a common ancestor. If the expression of translated genes is under RNAi control, successful recurrent transposition of identical or highly similar genes appears unlikely. Hence, neutral or divergent evolution within a genome combined with transfer across phylogenetic boundaries may be the most efficient strategy for the survival and diversification of transposons. The PCR screen detected Glan in the genomes of many fish species from phylogenetically remote taxonomic groups (Table 2). Clades I–VII of the ML tree (Figure 1) may correspond to genes that evolved independently. However, it is also possible that descendants of a founder gene have returned several times into the rainbow trout genome after passage through a chain of co-evolutionary hosts.

The results of our microarray studies suggest that a large fraction of transcribed Tc1-like genes can be stimulated under acute conditions, but it remains unclear whether the transposon transcripts have any functional importance. In theory, they could be transcribed from cryptic promoters that are activated by chromatin remodelling. However, input from stress-responsive promoters is also plausible. The transcripts could be required for the control of transposition through RNAi; however, such an explanation appears unlikely for highly mutated genes that were probably silenced long ago in evolutionary time. There is a growing body of evidence for the involvement of non-coding RNA in the regulation of gene expression at different levels. The role of small and large RNAs in modifying chromatin structure was reviewed recently [21–23]. Stress-induced transcription of short interspersed repeated sequences (SINEs) has been reported in human, mouse and silkworm [24–27], and SINE transcripts were shown to enhance translation of reporter genes [28, 29]. Stress also activates the transcription of satellite III repeats [30]. Because this large non-coding RNA is consistently associated with chromatin, it may be required for the protection of sensitive regions from stress-induced damage. Synthetic double-stranded RNA (dsRNA) enhances the expression of anti-viral proteins in salmonid fish [31, 32], and, in theory, endogenous dsRNA could mimic a viral infection by triggering protective reactions.

Tc1-like transposons are co-regulated with a group of genes implicated in the defense response, signal transduction and regulation of transcription. In this respect, it is noteworthy that Tc1-like fragments reside in a number of immune and stress-related salmonid genes, such as the non-classical MHC class I antigen [Genbank:AF091779, Genbank:AF091780], immunoglobulin heavy chain IgD [Genbank:AF141605, Genbank:AF278717], inducible nitric oxide synthase iNOS/NOS2 [Genbank:AJ295231] and aryl hydrocarbon receptor 2b, AhR2 [Genbank:AY463929]. Multiple copies of Glan in sense and antisense polarity are found in the rainbow trout MHC class Ia [Genbank:AB162342.1] and Ib [Genbank:AB162343.1] regions, in the vicinity of genes encoding the complement proteasome subunit and several MHC I loci. Modulation of gene expression due to the insertion of transposons has been documented in many studies (reviewed in [33]), and the involvement of dispersed repeated sequences in the coordination of expression of genes with similar functions was hypothesized more than three decades ago [34]. A role of transposon transcripts in the regulation of gene expression was recently discovered in yeast [35], where the induction of an RNAi-dependent silent chromatin configuration resulted in reduced transcription of several meiotic genes. The possible involvement of transposon transcripts in the regulation of gene expression in salmonid fish remains to be studied.

Conclusion

Information produced by sequencing salmonid fish cDNA libraries and identifying recently transmitted transposons provides new insights into the structure, diversity, molecular evolution and life cycle of mobile genetic elements. High expression levels in rainbow trout tissues and marked responses to external stimuli indicate potential functional roles of transposon pseudogenes, which require further investigation. These genes could be used as sensitive molecular biomarkers of acute conditions in salmonid fish.

Methods

Sequence analyses

The expressed transposons were analysed in the rainbow trout and Atlantic salmon TIGR Gene Indices, and sequence comparisons were conducted with stand-alone BLAST [36]. Multiple sequence alignments were performed with ClustalW [37], and conserved protein domains were searched in InterPro [38]. Synonymous and non-synonymous substitutions in the newly cloned genes were determined with DnaSP [39]. Maximum likelihood (ML) phylogenetic analyses were performed with Phylip [40].

PCR cloning

The conserved sequence in the untranslated regions of rainbow trout Tc1-like transposases was inferred from EST sequences. RNA was extracted from rainbow trout brain and treated with RNase-free DNase (Promega). Reverse transcription with SuperScript III (Invitrogen) was primed with oligo(dT). PCR was performed with the primer 5'-ATACAGTGCCTTGCGAGAGTATTC-3' using a TripleMaster kit (Eppendorf), and the product was cloned into pcDNA3.1/V5-His-TOPO (Invitrogen). Seven of the nine sequenced clones contained complete reading frames.

PCR analyses of genomic DNA

The fish samples were collected from inland reservoirs in Finland, and DNA from fin clips was prepared by salt extraction [41]. In brief, fin samples were digested at 60°C in 440 μl of buffer (1.8 mM EDTA, 9 mM Tris-HCl, pH 8; 1.8% SDS) containing 160 μg of proteinase K. After addition of 300 μl of 6 M NaCl, lysates were centrifuged at 12,000 × g for 30 min. DNA in the supernatants was precipitated with isopropanol, washed with 70% aqueous ethanol and dissolved in water. The 654-bp fragments of Glan were amplified by PCR using the HotMaster Taq kit (Eppendorf). Primers (5'-TGAAGAATCGACAACAAGTGGGACA-3' and 5'-GCTTTCTTCTTGCCACTCTTCCATA-3') were annealed to templates at 68°C.

Microarray analyses

Fish experiments, design of the rainbow trout cDNA microarray, hybridization protocol and data analyses are described in detail elsewhere [10, 11]. In brief, the platform included 1,300 genes printed in 6 replicates each. A dye-swap design was used: each sample, containing RNA from 4 individuals, was hybridized to slides with reverse assignment of fluorescent dyes (Cy3- and Cy5-dCTP from Amersham Pharmacia). Labels were incorporated at the stage of cDNA synthesis. Spot measurements were filtered by the criteria I/B ≥ 3 and (I − B)/(S_I + S_B) ≥ 0.6, where I and B are the mean signal and background intensities, respectively, and S_I and S_B are their standard deviations. Lowess normalization was performed, and differential expression was analysed with Student's t-test (p < 0.01). The genes were ranked by log(p-value).
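
The two spot-quality criteria translate directly into a predicate. The sketch below implements only that filtering step, not Lowess normalization or the t-test; the `Spot` record is a hypothetical type, and rejecting spots with non-positive background or zero total standard deviation is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Spot:
    """Summary statistics of one microarray spot (hypothetical record type)."""
    signal_mean: float      # I
    background_mean: float  # B
    signal_sd: float        # S_I
    background_sd: float    # S_B


def passes_quality_filter(spot: Spot, min_ratio: float = 3.0,
                          min_separation: float = 0.6) -> bool:
    """Keep a spot only if I/B >= 3 and (I - B)/(S_I + S_B) >= 0.6."""
    i, b = spot.signal_mean, spot.background_mean
    sd_sum = spot.signal_sd + spot.background_sd
    if b <= 0 or sd_sum <= 0:  # assumption: degenerate spots are rejected
        return False
    return i / b >= min_ratio and (i - b) / sd_sum >= min_separation


spots = [Spot(3000, 500, 2000, 300),   # passes both criteria
         Spot(1200, 500, 900, 200),    # fails I/B >= 3
         Spot(3000, 900, 3000, 600)]   # fails the separation criterion
print([passes_quality_filter(s) for s in spots])  # [True, False, False]
```

The second criterion requires the signal to exceed background by a margin relative to the combined spot and background noise, so a bright but very heterogeneous spot can still be rejected.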