Background

Transposable elements (TEs) are widespread in the living world. Few multicellular eukaryotes lack these genetic elements, which have the ability to move and multiply throughout the genome[1], but see[2] for exceptions in endosymbiont genomes. There is substantial variation in the proportion of mobile genetic elements and especially TEs across genomes[3]. Numerous studies have explored whether TE history reflects the evolutionary history of their hosts (that is, co-evolution[47]). Due to their high densities in eukaryotic genomes and their irreversible mode of insertion, some authors have proposed that SINEs (Short INterspersed Elements, non-autonomous Class I TEs) may be used for inferring the evolutionary history of their host[8]. The phylogeny of a strictly vertically transmitted TE (from parents to offspring) should reflect that of its host(s). However, genetic and evolutionary processes, which may occur during speciation, may affect subsequent phylogenetic inferences based on TE sequences. For example, horizontal transfer (that is, transmission by vectors unrelated to host reproduction), incomplete lineage sorting (that is, the maintenance of ancient polymorphism in closely related species), lack of genetic variability or hybridization/introgression may account for incongruent phylogenetic patterns between TEs and their hosts[913]. Hybrid genomes should carry TE(s) from the genomes of their parental species. In contrast, if horizontal transfers occur between species that share ecological niches or are infected by the same parasites, we do not expect that TE phylogenies will match species phylogenies.

The Daphnia pulex complex has been intensively studied due to its dominance in freshwater habitats in North America, and its variation in reproductive mode and ploidy level. Daphnia usually reproduce by cyclic parthenogenesis, which is clonal reproduction interrupted by bouts of sexual reproduction. However, some lineages reproduce by obligate parthenogenesis[1418]. The D. pulex complex includes seven species that have been distinguished on the basis of morphological, ecological, and genetic data[1922]. In North America, two species of the complex, D. pulex and Daphnia pulicaria, are dominant in freshwater habitats, and are morphologically similar but ecologically distinct[19], although they hybridize in nature[18, 2325]. Diploid hybrids always have D. pulex mitochondrial genomes (mtDNA) and have been found to reproduce by obligate parthenogenesis in nature based on patterns of allozyme variation[18, 26] and laboratory crosses[26], but see[27] for experimentally produced hybrids capable of sexual reproduction. Analyses of allozyme variation indicate that introgression is rare or nonexistent in areas where the two species co-occur[16, 18, 25] but a recent study using mitochondrial and nuclear markers has shown that introgression between these species had a substantial impact on their evolutionary history[28].

Hybridization has also occurred among other species in the D. pulex complex[2932]. For example, hybridization between North American species and circumarctic or European Daphnia species has been suggested based on allozyme, mtDNA or nuclear markers such as the ribosomal intergenic spacer (IGS)[21, 33, 34]. Some hybrids found in arctic and subarctic regions are polyploids (mainly triploids[35]). Polyploids with D. pulex, D. pulicaria and Daphnia middendorffiana (a circumarctic species) mtDNA are thought to have arisen from hybridization between D. pulex and D. pulicaria, or with another species which no longer exists as a cyclic parthenogen[22, 29, 30, 32]. Another group of polyploids has been found in Daphnia tenebrosa, a circumarctic species[31] and includes both triploid and tetraploid clones[35] but their hybrid nature is unknown[22].

The D. pulex genome has recently been sequenced[36] and numerous Class II DNA transposons have been identified from it[37]. One Class II element, Pokey (Subclass I; sensu[38]) of the piggyBac superfamily, has been extensively studied in non-hybrid populations. It inserts in a site-specific fashion into a unique TTAA site in the tandemly repeated 28S rRNA genes[3941] and is also encountered in other parts of the genome. Two Pokey elements from D. pulicaria have been completely sequenced[39] and show that the element has 16 bp imperfect terminal inverted repeats (TIRs) and encodes a putative transposase that is much longer than originally estimated and contains an intron (Y. Bigot, personal communication,[40]). In addition, the non-coding region upstream of the transposase gene contains a repeated sequence that is similar to a sequence in the unique region of the host’s ribosomal IGS[34, 39, 40]. Based on patterns of polymorphism observed among natural populations, previous studies[41, 42] have suggested that Pokey may be active in sexual populations of D. pulex but not in obligate parthenogens. Although Class II DNA transposons are more prone to transfer horizontally between lineages than retrotransposons, horizontal transfer does not seem to play a major role in the evolution of Pokey as 28S rRNA genes and mtDNA phylogenies of species in the subgenus Daphnia are congruent with the phylogeny of Pokey elements[7] leading to the conclusion that Pokey elements may co-evolve with their hosts.

The purpose of the present study is to describe Pokey sequences in hybrid isolates from the D. pulex complex. Specifically, we aim to determine if Pokey sequences inserted in rRNA genes reflect the putative parental species origins of the hybrid isolates. As Pokey elements insert in 28S rRNA genes, they are subject to genetic processes such as inter-allelic recombination, gene conversion and unequal crossing over[43] that may lead to the concerted evolution of these genes. If Pokey elements are strictly vertically inherited and if recombination is negligible, we expect that a phylogeny of Pokey elements should follow the host phylogeny, and Pokey elements encountered in hybrid genomes should correspond to those encountered in their host parental species. If recombination is important in Pokey evolution and has occurred between elements showing recent activity and after hybridization, hybrid isolates will carry either sequences inherited from both parents or mosaic sequences between parental types. Alternatively, if recombination occurs between inactivated elements that diverged in the distant past, we could see Pokey mosaic sequences that do not necessarily reflect the progenitor species of the hybrid isolates in the D. pulex complex.

After cloning and sequencing partial Pokey elements from representatives of species in the D. pulex complex, we: (a) detected recombinant elements and located the recombination breakpoints in each, (b) estimated recombination rate parameters across sequences using a coalescent based method, (c) determined parental sequences from which the recombinant elements arose and (d) reconstructed the evolutionary history of the Pokey sequences. Moreover we used a PCR-RFLP protocol to search for the presence of different Pokey elements in additional hybrid and non-hybrid isolates of the D. pulex complex across freshwater habitats and geographical locations.

Methods

Daphnia samples

We analyzed 51 isolates representing six of the seven lineages of the D. pulex complex identified by Colbourne et al.[20]. Only D. melanica was not included. Nine Pokey sequences were obtained from nine isolates studied by Penton and Crease[7]. We established parthenogenetic lines (isolates) in the laboratory from 42 individual females sampled from nature between 2004 and 2010 (see Additional file1) and cultured them using standard techniques[44]. For 50 of these isolates, genomic DNA was extracted from 10 to 30 individuals weighing approximately 100 mg (wet weight) using the DNeasy Tissue kit (QIAGEN Inc., Mississauga, ON, Canada) according to the supplier’s protocol. Species identity of these isolates was assessed by combining information on morphology, haplotype of the mitochondrial ND5 gene and genotype of the nuclear Lactate dehydrogenase (Ldh) gene as described in[22]. Ploidy levels were previously assessed using microsatellite genotypes and flow cytometry[35].

Structure of the Pokey transposase gene

RT-PCR was performed on RNA samples extracted from a sexual D. pulex isolate to determine if Pokey transposase transcripts were present and to provide further evidence for the presence of an intron in the transposase gene. RNA was extracted from the parthenogenetic offspring of one sexual D. pulex isolate PX2-ON-9 (provided by Dr. M. Cristescu at the University of Windsor). The tissue was stored in RNA Later (Qiagen) at −20°C and RNA was extracted using the RNAqueous-4PCR kit (Life Technologies Inc., Grand Island, NY, USA) and standard manufacturer’s protocols. The absence of DNA contamination was verified using standard PCR and Pokey transposase primers. Reverse transcription of the RNA used the SuperScript III One-Step RT-PCR with Platinum Taq Kit (Life Technologies) and standard manufacturer’s protocols. Primers used were Pok4065F (5’-TGATTCACCGAGGCCTCAGTTC) with Pok4488R (5’-GAATCGCTCGCGAGTCATGG) and Pok5026F (5-TCGAACCTGCAGCCGGACGAATTTGCAG) with Pok5985R (5’-CACGTCGGTTAGAATATTCTGGCTCGTCGG). Numbers correspond to nucleotides in the 6.6 kb element from D. pulicaria (GenBank accession # AY115589.1,[8]).

Cloning and sequencing Pokey alleles

To examine the impact of hybridization on Pokey diversity, Pokey alleles inserted in 28S rRNA genes from various isolates of the D. pulex complex were cloned and sequenced. An approximately 1600 bp fragment of Pokey from 16 isolates (2 diploid hybrids, 5 polyploid hybrids, 1 introgressed polyploid (TE3-MB-4), 3 diploids, 3 polyploids with unknown hybrid state, 3 diploid non-hybrids) of the D. pulex complex (see Additional file1) were amplified using Pok5026F and the primer 28SR (5-TCCATTCGTGCGCGTCACTAATTAGATGAC), which is located 46 bp downstream of the Pokey TTAA target site. PCR reactions were performed using the Phusion™ high-fidelity PCR kit (Finnzymes, Woburn, MA, USA) according to the supplier’s protocol. The thermocycling profile consisted of 1 cycle of 1 minute at 94°C, 35 cycles of 30 seconds at 94°C, 30 seconds at 55°C and 2 minutes at 72°C, with a final incubation of 5 minutes at 72°C. PCR amplicons were cloned using the StrataClone™ Blunt PCR Cloning Kit (Agilent Technologies Inc., Santa Clara, CA, USA) according to the supplier’s protocol. For each isolate, 10 to 12 plasmids chosen randomly from a minimum of 40 isolated colonies were purified using the E.Z.N.A. Plasmid Mini Kit II (Omega Bio-tek Inc., Norcross, GA, USA). Pokey sequences obtained from these plasmids were named using the labels of the isolate from which they were cloned plus the number that corresponds to the colony from which they were picked (for example PX2-QC-8_28 is the Pokey allele isolated from colony 28 produced during the cloning of PX2-QC-8 Pokey sequences). These Pokey alleles were compared to Pokey sequences from a previous study which focused on non-hybrid isolates[7] to ensure that the elements analyzed in this study belong to the Pokey lineage. Partial Pokey sequences were manually aligned using the ClustalW module of BioEdit v.7.0.5[45]; available athttp://www.mbio.ncsu.edu/BioEdit/bioedit.html.

Detecting putative recombination events

Potential recombination events among Pokey alleles were explored in a dataset of 53 sequences in which ambiguous sites and indels were deleted to prevent identification of false recombination events. The phi-test[46], implemented in Splitstree v4.10[47], and GARD (Genetic Algorithm for Recombination Detection, available via the DataMonkey website athttp://www.datamonkey.org[48]) were used to determine if recombination events may be detected in our Pokey sequence dataset as well as in partial transposase sequences included in them.

Hudson and Kaplan minimum recombination events, Rm[49] were calculated with the DnaSP v5 software[50]. Moreover we used kwarg (available at http://www.stats.ox.ac.uk/~lyngsoe/section26/), a heuristic alternative to the branch and bound method implemented in beagle[51], to calculate Rh[52], which gives an estimate of the lower bound of the minimum number of recombination events. Kwarg does not guarantee that the minimal recombination history will be found but returns a history with a low number of recombination events under the infinite sites assumption. We ran kwarg 2,048 times with the default scoring scheme and recorded the number of recombination events for each iteration.

Recombination rate parameter

As all the recombination analyses detected recombi nation events (see Results), we estimated Pokey‘s population recombination rate parameters along the sequence using a full likelihood coalescent approach based on a finite-sites model implemented in LAMARC v2.1.6[53]. This approach estimates the parameter r LAM = c/μ where c is the recombination rate per site per generation and μ is the neutral mutation rate per site per generation. This model considers that (a) all recombination is homologous, (b) there is no association between recombination frequency and sequence divergence, (c) gene convergence and interference did not occur, (d) recombination events are selectively neutral and (e) recombination rate does not vary along the sequence. The F84 evolutionary model was used as it is a closer fit to the substitution model, Tamura 3-parameter with gamma shape and invariant sites, inferred with MEGA5[54] than the other model (GTR) offered by LAMARC. The F84 model is also computationally faster. To account for substitution rate differences between non-synonymous sites and all other sites, two categories of relative mutation rate were assigned. Our sampling strategy included 20 initial chains of 20,000 genealogies and two final chains of 1,000,000 genealogies with 2,500 samples discarded per chain. Adaptive heating was used to improve the search of parameter space with initial relative temperature of 1, 1.1 and 2. The entire analysis was replicated four times and the results were combined using the algorithm of Geyer[55]. Coalescent-based analyses in LAMARC estimate both genetic diversity at the population level, θ LAM = 4N e μ (where μ is the mutation rate per site for a diploid population and Ne is the effective population size) and the recombination rate r LAM = c/μ (where c is the rate of recombination per site per generation). Therefore, the population recombination rate parameter can be calculated as R = θ LAM r LAM  = (4N e μ)(c/μ) = 4Nec.

Detection of recombination breakpoints

Nucleotide diversity among sequences (that is, π and θ) and mutation rate heterogeneity along sequences (the Gamma shape parameter, α) may have a dramatic impact on the performance of different methods to detect recombination breakpoints[5658]. For example, Posada and Crandall[57] have shown that different methods may have a different propensity to detect recombination when nucleotide diversity varies and mutation rate heterogeneity is fixed. Moreover, they have shown that mutation rate heterogeneity may lead some methods to detect false positives[5658].

We estimated nucleotide diversity (π and θ) using DnaSP v5.10.01[50]. Mutation rate heterogeneity (α) and a nucleotide substitution model were inferred using a maximum likelihood framework in MEGA5[54]. To detect recombination breakpoints and identify which parental sequences formed the recombinant alleles, we applied a stepwise protocol of the maximum chi-square method[59, 60] using the stepwise package v0.1.1 (available athttp://stat.sfu.ca/statgen/research/stepwise.html). The maximum chi-square method is a sliding window approach that computes a chi-square statistic from a 2 x 2 table with counts of matches and mismatches between the two half windows separated by a proposed breakpoint. As high mutation rate heterogeneity in Pokey sequences (see Results) may lead to false negatives when window half-widths are too small, we used different values of window half-width (70, 80, 90 and 100 nucleotides) and 100,000 Monte Carlo replicates for the permutation distribution. The maximum chi-square analysis was reiterated including breakpoints detected by previous steps, as the stepwise protocol specifies, until no further breakpoint was detected. This maximum chi-square analysis identifies two sequences involved (one derived, one parental) for each recombination event detected. Recombination breakpoint detection by the maximum chi-square method is relatively more powerful and less prone to type I error (false positives) than other algorithms[56, 57] when sequences are subject to high mutation rate heterogeneity. Each event detected between different sequence pairs was checked by eye.

Phylogenetic analyses and ancestral recombination graphs

Dendrograms were produced for each fragment separated by recombination breakpoints highlighted by GARD analysis of the 53 sequences using the Neighbor-joining algorithm[61]. We constructed a phylogenetic network using the NeighborNet algorithm implemented in Splitstree v4.10[47]. To understand more accurately the process and evolutionary history of recombination in the Pokey fragments cloned here, we constructed an ancestral recombination graph using kwarg[51, 52, 62]. We chose the simulation with the lowest Rh value to build this graph.

RFLP analysis

Given the intra- and intergenomic variability of Pokey alleles in isolates from which elements were cloned, and the high number of recombination events in these sequences (see Results), we aimed to explore the presence/absence of these alleles in additional isolates belonging to the D. pulex complex. To do this, we developed a restriction enzyme analysis (that is, RFLP) based on Pokey sequences analyzed so far and the recombination breakpoints detected by the GARD analysis. We then screened Pokey alleles from 41 isolates (see Results) for RFLP haplotypes. PCR reactions (two replicates for each isolate) were performed in 25 μL reaction mixtures containing 1x Econotaq PCR buffer, 2 mM MgCl2, 50 μM dNTP, 0.1 μM of each primer (Pok5026F and 28SR), and 1 unit of EconoTaq polymerase (Lucigen Corporation, Middleton, WI, USA) or using the Phusion™high-fidelity PCR kit according to the manufacturer’s protocol (Finnzymes). PCR amplifications were performed using the thermocycling profile described above. Restriction enzymes that cut at specific sites in some Pokey alleles may be used to highlight Pokey sequences that have undergone recombination events. We used NebCutter v2.0 (http://tools.neb.com/NEBcutter2/) on each of the 53 sequences to choose enzymes that would highlight different pure and recombinant Pokey alleles. The restriction enzymes Dra I, Bsp HI and Bst EII (New England BioLabs Inc., Ipswich, MA, USA) were used in a two-step protocol. Dra I, Bsp HI and Bst EII enzymes cut at 380 bp, approximately 790 bp, and approximately 1080 bp. The eight possible conformations were differentiated on a 3% agarose gel. In the first step, Dra I and Bsp HI were used in the same mix containing NEBbuffer4 and digestion was conducted at 37°C for two hours. In the second step, NEBbuffer3 at a 1X final concentration, BSA at 1U/μl and Bst EII at 1U/μl were added to the solution and digestion was conducted at 60°C for one hour. Digested PCR products were separated on 3% agarose gels for two hours at 95 volts.

Cloning and PCR amplifications may produce chimeric sequences (for example, recombinants between alleles) and lead to an overestimation of recombination in nature. We assessed this hypothesis by re-amplifying Pokey sequences from isolates used in our phylogenetic survey using the PhusionTM high-fidelity PCR kit and by increasing elongation steps of the thermocycling profile to three minutes instead of two minutes. Increasing the elongation steps should reduce the probability of producing chimeric sequences[63]. Results were compared to PCR-RFLP results with two-minute elongation steps to determine if the recombination pattern is reduced. Moreover, if recombination events identified in the data are not artifacts, PCR-RFLP results should be concordant with expectations based on the cloned elements that we sequenced.

Results

Structure of the Pokey transposase gene

Sequencing of RT-PCR products, performed on RNA extracted from a sexual isolate of D. pulex (PX2-ON-9, Additional file1), showed that a sequence upstream of the proposed stop codon in the 6.6 kb Pokey element is actually the beginning of a 68 bp canonical GT-AG intron. When spliced out, this adds another exon (exon 2) and extends the transposase coding region by 585 bp (Figure 1).

Figure 1
figure 1

Schematic view of a Daphnia Pokey element. IGS repeats (gray boxes) are sequences similar to a region of the D. pulex intergenic spacer. 5- and 3-ITR are imperfect 16 bp inverted terminal repeats. IGS, intergenic spacer.

Recombination signals

Forty-four sequences of the 3’ end of Pokey elements were added to nine Pokey alleles previously sequenced from non-hybrid isolates for a total of 53 sequences analyzed. These sequences have variable lengths due to multiple indels and vary from 1,419 bp (PX2-QC-8_30) to 1,471 bp (TE3-MB-4_9). These sequences contain two exons of the Pokey transposase gene separated by an intron of 68 bp to 70 bp. Only the last 259 bp of exon 1 is included, but exon 2 is complete and varies in length from 585 bp to 627 bp. Nucleotide diversity estimates based on the full-length Pokey sequences are π = 0.048 and θ = 0.037, and the gamma shape parameter (α) is 0.8423 with 63.3% of sites being invariant. Nucleotide diversity in exon 2 is an order of magnitude higher than exon 1 and about twice as high as diversity in the non-coding region downstream of the transposase gene (Table 1). Total amino acid (aa) sequences encoded by the two exons range from 282 aa in most sequences to 297 aa with most of the length variation (196 aa to 210 aa) occurring in exon 2. One sequence (TE3-MB-1_14) has a one base pair insertion that causes a frameshift mutation in the first exon.

Table 1 Nucleotide diversity and mean similarity between sequence pairs for each part of the Pokey sequences

Recombination analyses performed on the 53 Pokey sequences, as well as in the reduced dataset (about 852 bp to 891 bp in length) of 53 partial transposase sequences, showed evidence of recombination events (phi-test; P <0.01). Two recombination breakpoints were found using GARD analysis of the entire sequences (at bp 540 and bp 820; Kishino-Hasegawa-tests, P <0.001 for each breakpoint) and in the transposase coding region dataset (at bp 498 and bp 686; Kishino-Hasegawa-tests, P <0.001 for each breakpoint). Six of the seven recombination events detected by the chi-square method are encountered in exon 2 (see Additional file2). One step, using the stepwise protocol of the chi-square maximum method with the different half width windows (70, 80, 90 and 100), was sufficient to detect at least 20 recombination events, named A to T (see Additional file3). The intricate way in which the numerous recombination events shape the evolution of Pokey elements led us to identify groups of sequences that we named a, b, c, d, e, f, g, h and i (see Additional file4). The phylogenetic network generated from 53 partial Pokey sequences shows clear groups of sequences (Figure 2 and Additional file4). Six sequences (TE3-MB-4_9, PX2-QC-8_1, PX2-QC-8_29, PX3-QC-1_20, EPC2-SP-2_1 and EPC-DE-3_1) were not included in groups as they may have originated from different recombination events than did the sequences with which they cluster in the network (Table 2 and Figure 2). Sequences amplified from some isolates belonging to the same species clustered together, each in a distinct group (for example, D. tenebrosa group d, European D. pulex group e, D. pulex group h , Figure 2). Sequences from the isolate TE3-MB-4 (a + TE3-MB-4_9) and sequences from triploid isolates from Churchill (PC3-MB-6 and MI3-MB-2, group c), which have D. pulicaria or D. middendorffiana sensu stricto mtDNA haplotypes[35], also form separate groups. Various sequences amplified from D. pulicaria (PC2-QC-4 and PC2-SK-5) or hybrid isolates (PX2-QC-8, PX3-QC-1 and PC3-QC-1) clustered together in group g. Three groups included sequences amplified from isolates that belong to different species (Figure 2). Two groups (‘b + EPC2-SP-2_1 + EPC-DE-3_1’ and f) included sequences from pulex-pulicaria hybrid isolates (PX2-QC-8 and PC3-QC-3), D. tenebrosa isolates (TE3-MB and TE2-MB) and European D. pulicaria isolates (EPC2-SP-2 and EPC-DE-3). Group i is represented by sequences from D. arenata (AR2-OR-1), D. pulicaria (PC2-QC-4) and a pulex-pulicaria hybrid (PX2-QC-8). An allele from one D. tenebrosa isolate (TE2-MB-2) grouped with sequences from pulex-pulicaria hybrid isolates in group g.

Figure 2
figure 2

Phylogenetic network of 53 partial Pokey sequences from isolates of the Daphnia pulex complex. The network was constructed using the NeighborNet algorithm. Dashed lines delineate sequence groups presented in Table 2. Colors represent species or hybrid state of isolates: red = D. pulex, light blue = D. arenata, grey = D. tenebrosa, black = European D. pulex, dark blue = European D. pulicaria, green = D. pulicaria, orange = putative pulex-pulicaria hybrids, purple = the isolate with D. tenebrosa mtDNA and a D. pulex nuclear genome.

Table 2 Recombination events in which groups of sequences (see text for more explanation) are involved according to maximum χ 2 and GARD dendrogram analyses

Groups of sequences with the same recombination site may represent a single recombination event between ancestral sequences. For example, the recombination breakpoint of events A (a : c), B (a : h), C (a : PX2-QC-8_29), D (j : TE3-MB-4_9) and E (i : a) were estimated to be bp 592 to bp 607 for all pairs. Thus, we considered them to be the same recombination event, I. This approach results in seven different recombination events named I (bp 592 to bp 607), II (bp 748 to bp 800), III (bp 815 to bp 856), IV (bp 825 to bp 856), V (bp 917 to bp 920), VI (bp 952 to bp 957) and VII (bp 1,038 to bp 1,073) (see Additional file4). Although it is still possible, it seems unlikely that all recombination events between sequence pairs were independent and not due to past events in the common ancestors of groups of sequences.

The Hudson-Kaplan minimum number of recombination events (Rm) in our partial Pokey sequences dataset is 30 according to the algorithm implemented in DnaSP v5.10.01. According to the kwarg algorithm, Rh, the estimated lower bound of the minimum number of recombination events, follows a normal distribution with mean 96.39 and a standard deviation of 3.10.

Recombination rate

Using LAMARC to co-estimate mutation rate θ LAM = 5.94 × 10-2 (with 95% support intervals of 3.63 × 10-2 and 8.16 × 10-2) and the overall recombination rate r LAM = 5.09 × 10-1 (with 95% support intervals of 3.57 × 10-1 and 7.68 × 10-1) allowed us to estimate a population recombination rate of 3.02 × 10-2 recombination events per site per generation for Pokey elements.

Ancestry of recombinant Pokey sequences

Recombination breakpoints estimated by the GARD analysis (at bp 540 and bp 820) seem to correspond to the recombination breakpoints of events I and IV estimated using the maximum chi-square analysis (at bp 592 to bp 607 and bp 825 to bp 856, respectively). Dendrograms (Figure 3) were produced from each fragment (from bp 1 to bp 540, from bp 541 to bp 820, and from bp 821 to bp 1,450, respectively) of the Pokey sequences bound by recombination breakpoints identified by the GARD analysis. Combining the dendrogram analysis (Figure 3) and the maximum chi-square method of recombination breakpoint detection (see Additional file3) allowed us to estimate from which parental sequences the recombinant Pokey sequences originated. Recombination events and probable parental sequences of each group of sequences are summarized in Table 2. While groups d, e and f do not seem to derive from any recombination event (Figure 3, Table 2), other sequences may have originated from multiple events (TE3-MB-4_9 and PX3-QC-1_20; Table 2). Moreover, some sequences may have originated from recombination events between sequences that were themselves recombinant. For example, sequences from group a originated from recombination between sequences from groups h or i and from group c. This last group may originate from a recombination event between sequences from groups d and i (Table 2, Figure 3).

Figure 3
figure 3

Unrooted dendrograms of 53 partial Pokey sequences from isolates of the Daphnia pulex complex. (A), (B) and (C) represent trees constructed from three regions of recombinant Pokey sequences (bp 1 to approximately bp 540, bp approximately 540 to approximately bp 820, and bp approximately 820 to approximately bp 1,450, respectively) defined by recombination breakpoints identified by GARD analysis. Colors represent species or hybrid state of isolates: red = D. pulex, light blue = D. arenata, grey = D. tenebrosa, black = European D. pulex, dark blue = European D. pulicaria, green = D. pulicaria, orange = putative pulex-pulicaria hybrids, purple = the isolate with D. tenebrosa mtDNA and a D. pulex nuclear genome. GARD, Genetic Algorithm for Recombination Detection.

Non-synonymous polymorphism at the nucleotide level leads to a high level of polymorphism at the aa level and, coupled with recombination events, produces a diversity of partial aa sequences of the Pokey transposase (see Additional file2).

Ancestral recombination graphs

Additional file5 shows an ancestral recombination graph with 89 recombination events (Rh) identified by the Song and Hein algorithm[52]. The evolutionary history of Pokey allele TE3-MB-4_12, which contains 38 recombination events, is highlighted in red. This ancestral recombination graph shows putative past recombination events that were not detected by the maximum chi-square method. For example, the evolution of EPX2-DE-1_1, encountered in a European D. pulex isolate, seems to involve 25 past recombination events.

Pokey alleles in the v1.1 Daphnia genome

The v1.1 genome sequence of D. pulex[36] was scanned for our partial Pokey sequence using TBLASTN to explore which alleles are present. Sequences from contigs with E-value = 0.0 and without indels more than 10 nucleotides long were aligned with our 53 sequences to generate a dataset of 58 sequences. A NeighborNet phylogenetic network (see Additional file6) shows that Pokey elements from the genome sequence group with sequences encountered in diploid and polyploid isolates from the pulicaria group (PC2-SK-5 and PC3-QC-3), in diploid D. tenebrosa (TE2-MB-1) and in European D. pulicaria (EPC2-CZ-1).

RFLP analysis of Pokey alleles in additional isolates

To test if recombinant Pokey alleles are encountered in additional clones of the D. pulex complex, we used a restriction enzyme analysis (that is, PCR-RFLP) of alleles amplified from the 28S rRNA genes from 41 isolates (see Additional file7). Eight different haplotypes were found (numbered 1 to 8) using three restriction enzymes, some of which correspond to sequences from the groups encountered in our phylogenetic analysis. The distribution of these haplotypes is summarized in Table 3. We found that rRNA genes from different genomes may contain up to five different Pokey RFLP haplotypes with an average of 1.74 per isolate and a standard deviation of 0.96.

Table 3 Taxonomic, habitat and geographical distributions of RFLP haplotypes

Discussion

Hybridization among Daphnia species

Our results show that recombination has a dramatic impact on the evolution of Pokey alleles in the rDNA of Daphnia. The recombinant sequences do not seem to be due to cloning artifacts as some isolates may have just one recombinant Pokey allele in their genome (for example, triploid isolates PC3-SK-5 and PC3-MB-4 or diploid isolates EPC2-CZ-1, EPC2-SP-2 and PX2-QC-1, Additional file7). Increasing the extension time of the PCR cycles to avoid artefactual recombination events during amplification did not change the RFLP results (data not shown). Moreover, Daphnia isolates show RFLP patterns that are concordant with those expected from the cloned Pokey sequences (see Additional file7).

Pokey sequences show substantial nucleotide diversity (π = 0.048 and θ = 0.037, Table 1), undergo recombination at a rate of 3.02 × 10-2 events per site per generation and rarely spread by horizontal transfer[7]. As predicted in the case of recombination events between Pokey elements that co-evolve with their host and have undergone recent activity, hybrid isolates from the D. pulex complex carry either sequences inherited from putative parental species or mosaic sequences produced by recombination between parental types. Most Pokey sequences from pulex-pulicaria hybrids (for example, PX2-MB-1, PX3-QC-1_1, PC3-QC-1_28 or PX2-QC-8_28) either cluster with Pokey alleles from D. pulex (group h) or D. pulicaria (PC2-QC-4 and PC-SK-5_2 in group g) in the phylogenetic network (Figure 2) or are recombinants between sequences from these groups (that is, PX3-QC-1_20). Pokey sequences from polyploid isolates with mtDNA from either D. pulicaria or D. middendorffiana from Manitoba, Canada seem to be recombinants between sequences found in D. tenebrosa (group d) and D. pulicaria, D. pulex or D. arenata (that is, groups g, h or i, respectively, Table 2). This suggests that D. tenebrosa may be one of the parental species of these polyploid isolates, which are thought to originate from hybridization between D. pulex and D. pulicaria or with another species that no longer exists as a cyclic parthenogen[22, 29, 30, 32]. Phylogenetic analysis of Ldh A sequences is also concordant with a hybrid origin of polyploid isolates with D. middendorffiana mtDNA (Figure 3 in[64]).

The introgressed triploid isolate, TE3-MB-4, which has D. tenebrosa mtDNA and most likely a D. pulex nuclear genome[22, 31]), carries alleles from group ‘a + TE3-MB-4_9’ with a minor trace from D. tenebrosa and major traces from D. pulex or D. pulicaria alleles. The presence of alleles with traces of D. pulex or D. pulicaria in D. tenebrosa or European D. pulicaria isolates (TE3-MB-1, TE3-MB-3, TE2-MB-1, TE2-MB-3, EPC2-MB-2 and EPC2-MB-3) may be a remnant of hybridization events between these four species. Similarly, Pokey alleles from European D. pulicaria isolates (EPC2-SP-2_1 and EPC2-DE-3_1) also seem to be recombinants between alleles from North American D. pulex or D. pulicaria and D. tenebrosa (recombination events G, K and L in Table 2 and Figure 2). Although Vergilino et al.[22] did not detect evidence of hybridization between species in the pulicaria group (North American D. pulex and D. pulicaria) and the tenebrosa group (D. tenebrosa and European D. pulicaria) using microsatellite loci, Weider et al.[33] did find evidence for it using allozyme data. Moreover, Ambrose and Crease[34] have shown that IGS segments from some isolates of European D. pulex may have originated from North American D. pulicaria. As North American D. pulex and D. pulicaria have been found in Europe[21], introgression of Pokey alleles from North American species into European species is possible and may generate recombinants that persist for long periods of time and are able to spread over broad geographic areas.

Although most Pokey sequences from hybrid isolates are mosaics, which is consistent with recent recombination between alleles from different parent species, some recombinant alleles may be more ancient and represent retained ancestral polymorphism (that is, incomplete lineage sorting). The clustering of recombinant alleles in group b, which are encountered in D. tenebrosa isolates, a hybrid D. pulex (PX2-QC-8) and a polyploid with D. pulicaria mtDNA (PC3-QC-3) suggest that these recombinant alleles are ancient.

Pokey alleles from the isolate whose genome was sequenced[36] cluster with alleles from group f (Figure 3 and Additional file6) whereas Pokey alleles from other D. pulex isolates cluster in group h. This may be the result of incomplete lineage sorting or a signature of past hybridization events that have occurred in the population from which the genome isolate was sampled. Alternatively, the genomic Pokey alleles may have a different evolutionary history than their paralogs in rDNA. Moreover, these genomic alleles show a limited signature of recombination according to the GARD analysis. As the recombination rate is variable across the Daphnia genome[65, 66], we may expect this result if Pokey elements are inserted in genomic regions showing low recombination rates.

Recombination parameters among Pokey elements

The number of recombination breakpoints detected in our sampled sequences by different methods was highly variable. Most recombination breakpoint detection algorithms underestimate the number of recombination events[57] and their accuracy varies depending on whether the method uses summary statistics or the maximal information in the sample[58]. The maximum chi-square method performed in a stepwise manner and for different window sizes detected a minimum of seven unique recombination events between Pokey alleles from different species of the D. pulex complex (see Additional file3, Table 2). This value is likely to be an underestimate as a high recombination rate may lead to overlapping recombination events. As the maximum chi-square is a substitution-based method that compares substitution distributions from two windows flanking a tested site, recombination breakpoint detection will be dependent on the size of the windows used when mutation rate heterogeneity is high and/or nucleotide diversity is low. Small windows should increase the probability of detecting recombination breakpoints in high diversity regions but may be unable to detect recombination events in low nucleotide diversity regions. Increasing the size of the windows to increase recombination detection accuracy in regions with low nucleotide diversity may lead to less accuracy in the regions with multiple recombination events. The estimate of Hudson-Kaplan minimum recombination events (Rm) is 30 and the haplotype lower bound (Rh) estimate is 96.39 ± 3.10. Rm is a parsimony-based method (that is, the four gamete test) and may underestimate the actual lower bound of recombination events, particularly when the recombination rate is high[67, 68]. In contrast to the former estimators, Rh, estimated using a maximum-likelihood framework that uses the maximal information in the sample, is probably more reliable than Rm[68] and shows the highest lower bound of recombination events. The accuracy of Rm and Rh to estimate the lower bound of recombination events has not been tested on sequences subjected to low and high heterogeneous mutation rates. If numerous recombination events occurred during the evolutionary history of Pokey elements, we may need sophisticated phylogenetic tools and analyses, such as the Ancestral Recombination Graph, to deepen our understanding of their evolutionary history.

The recombination rate of Pokey sequences estimated in our survey is high (3.02 × 10-2 events/site/generation) but comparable to the value (2.0 to 6.0 × 10-2 events/generation) calculated by McTaggart et al.[65] based on changes in the relative frequency of rDNA length variants in four apomictically-propagated Daphnia obtusa lines over 90 generations. To our knowledge, this is the first time that the rate of recombination has been explicitly estimated for a DNA transposon. This estimate has to be viewed with caution as some of the assumptions of the algorithm used, such as the absence of gene conversion and interference, and the selective neutrality of recombination events, may be violated. Even so, Rh and Rm estimates lead to the conclusion that the recombination rate is high between Pokey elements. Such a high rate of recombination between Pokey elements may explain why no significant differences were found between the amount of variation in 28S rRNA genes with or without Pokey insertions in cyclic and obligately asexual D. pulex isolates[69] as recombination may take place between the Pokey elements as well as the 28S rRNA genes themselves.

Effects of recombination on evolution of Pokey elements

Numerous studies have focused on the relationship between recombination rates in host genomes and TE distributions[7073] or dynamics[37, 41, 74]. The distribution and abundance of TEs may be influenced by selective forces[7577], the stochastic process of mutation[78] and/or the availability of insertion sites. One theoretical model based on the action of selection posits that the impact of recombination events between TEs inserted in non-homologous loci is deleterious (that is, ectopic exchange model) and, therefore, insertions will accumulate in regions of the genome with low rates of recombination[77]. However, empirical data from sequenced genomes show that the relationship between TE density and recombination rate depends on the organism and the TE class or family[4, 39, 71, 7982].

Recombination between TE copies of the same family has been reported before[8387], but the significance of this for TEs in general is rarely discussed. Our survey focused on variation in an approximately 1,600 bp sequence. This dataset does not include the TIRs known to be essential for transposition or the IGS sequences upstream of the transposase gene that may influence expression of the transposase[39]. If the TIRs or transposase gene have diverged in different species and recombination events produce mosaic sequences, the transposase may not be able to recognize TIRs with which it has not co-evolved. Thus, we might expect that many recombination events between divergent TE alleles will inactivate their mosaic product. Alternatively, Schaack et al.[74] recently suggested that TEs might benefit from recombination in a similar manner to the host, by generating new variants that are able to evade sequence-specific host suppression machinery. Recombination between different insertions could bring together independent, beneficial nucleotide substitutions that may then be favored by intragenomic selection and result in an increase in copy number of the new, recombinant insertion.

Fragments of the Pokey transposase gene have different evolutionary histories. Moreover, non-synonymous polymorphism coupled with recombination has produced a substantial level of Pokey variability at the aa level (Table 1, Additional file2). This variability is primarily seen in exon 2, which is thought to contain a nuclear localization signal and a sequence similar to a Plant Homeo Domain or cysteine-rich zinc finger that is possibly involved in chromatin or protein-protein interactions[40, 88, 89]. For example, the fourth cysteine residue of the putative zinc finger is replaced either by a serine or a tyrosine residue in 3 of the 40 different aa sequences we analyzed (see Additional file2). This variation might translate to flexibility in target site interactions or changes to dimerization dynamics during the formation of the transposition complex, which could have implications for mobility and transposition efficiency.

DNA transposons in Caenorhabditis elegans are located preferentially in recombination hotspot regions whereas retrotransposons are not, and Duret et al.[71] suggested a role for recombination in the transposition process. Glass et al.[69] did find indirect evidence that Pokey insertions in rDNA could spread through unequal crossing-over, so perhaps there is some credence to this suggestion. In addition to Pokey in Daphnia, the retrotransposons R1 and R2 in Drosophila and numerous other arthropods are also found inserted into rDNA. Using the ectopic exchange model as a framework, Zhang et al.[90] demonstrated that eukaryotic hosts may tolerate a high load of retrotransposable elements in their rDNA because they generally have many more rDNA copies than the minimum required for rRNA synthesis. The location of Pokey elements in rDNA may be advantageous for both element and host, as frequent recombination between different copies located in these loci may increase the efficiency of intragenomic selection, and insertions may be only mildly harmful due to the multi-copy nature of rDNA. Pokey haplotype 2 is a recombinant allele (Figures 1, Table 2) that was found in numerous hybrid and non-hybrid Daphnia isolates having different origins and occupying different locations and ecosystems (Table 3). This might be such a successful haplotype due to the amalgamation, through recombination, of separately beneficial mutations into a single lineage that was then able to spread to different populations and species within the D. pulex complex.

Conclusions

Most of the hybrid isolates from the D. pulex complex analyzed in this study carry Pokey sequences inherited from putative parental species or mosaic sequences produced by recombination between parental types. In addition, recombination may play an important role in generating life history variation in this TE. Future studies should test the activity of recombinant Pokey alleles to assess if recombination events have had a significant effect on their transposition capacity. Additionally, testing for recombinants among the other families of TEs found within D. pulex would be useful to determine whether rates of recombination among TEs depend in part on the region of the genome they inhabit or on other properties such as mode of transposition. Moreover, the possible beneficial effects of recombination on TE-level evolution in general should be investigated more thoroughly by estimating rates of recombination for different TEs in various genomes.