Introduction

LTR-retrotransposons are important sources of genetic diversity and have had a major impact on the structure of plant genomes (reviewed in Kumar and Bennetzen 1999). They have been implicated in various chromosomal modifications (Lönnig and Saedler 2002) and have played an important role in the variability of genome sizes (Tikhonov et al. 1999). Models of genome evolution have recently emerged whereby bursts of retrotransposon amplifications are offset by removal of retrotransposon sequences (Vitte and Panaud 2003; Bennetzen et al. 2005; Vitte and Panaud 2005). Removal processes can involve unequal homologous recombination leading to the formation of solo-LTRs (Shirasu et al. 2000) or illegitimate recombination that removes all or part of retrotransposon sequences to generate truncated or fragmented remnants (Devos et al. 2002; Ma et al. 2004). Together, amplification and removal lead to a rapid turnover of retrotransposon populations, and most complete elements found in several plant species are thought to be less than 3 million years old (Vitte and Panaud 2003; Bennetzen et al. 2005). Conditions leading to bursts of amplification have been associated with stress and changes in environmental conditions (Grandbastien 1998; Grandbastien et al. 2005; Kalendar et al. 2000), and with breakdown of epigenetic controls (Okamoto and Hirochika 2001). However, the processes leading to removal are still poorly understood, and it is not known whether they occur continuously or are triggered by specific stimuli over short evolutionary periods.

Allopolyploidy (interspecific hybridization and polyploidy) is a widespread and recurrent evolutionary process in plants and is often accompanied by major structural, cytogenetic, epigenetic and functional changes to the genome, leading to new phenotypes and to reproductive isolation (reviewed in Wendel 2000; Comai 2000; Ramsey and Schemske 2002; Adams and Wendel 2005; Chen and Ni 2006). Bursts of genomic instability, including DNA loss and gain, have been shown to occur at the onset of polyploidization. Transcriptional activation and/or mobilization of transposable elements have been reported in response to species crosses with wheat, Arabidopsis, and rice (Kashkush et al. 2002, 2003; Madlung et al. 2005; Liu and Wendel 2000; Liu et al. 2004; Shan et al. 2005), as well as with mammalian hybrids (O'Neill et al. 1998). In other work, losses of retroelement sequences have been observed in Triticeae interspecific and intergeneric hybrids (Han et al. 2003), and 20% of AFLP changes detected in introgressed rice lines were due to transposable element sequences (Wang et al. 2005). Therefore, transposable elements are suspected to act as recombinational hotspots mediating recombinations and deletions, as recently described at the wheat Ha locus (Chantret et al. 2005). Interspecific hybridization may thus trigger extensive and rapid turnover of retrotransposon populations. However, no quantitative study has simultaneously considered retrotransposon populations and divergence in the progenitor parents and in the derived polyploid. This is the purpose of this paper.

Tobacco (Nicotiana tabacum) is a recent allotetraploid species that formed less than 200,000 years ago (Clarkson et al. 2005) and most probably less than 20,000 years ago (M. Chase personal communication) from two distinct progenitor species that contributed tobacco's S- and T-genomes. The nearest relative of the maternal S-genome is Nicotiana sylvestris (Clarkson et al. 2004), with plastid DNA of these two species being 99.9% identical (Yukawa et al. 2006). The nearest relative of the paternal T-genome was identified as a particular accession of Nicotiana tomentosiformis (Murad et al. 2002). Although most allopolyploid species display lower DNA 1C value than expected from the sum of parental genomes (Leitch and Bennett 2004), the tobacco genome size is very close to the sum of N. sylvestris and N. tomentosiformis (source: http://www.rbgkew.org.uk/cval/database1.html). However, the structure of the tobacco genome is not the sum of the genomes found in the two progenitor diploids N. sylvestris and N. tomentosiformis. Deviations from additivity include intergenomic translocations (Lim et al. 2004a), rearrangements of rDNA units (Volkov et al. 1999; Lim et al. 2000a), loss of satellite repeats related to intergenic spacers (Lim et al. 2004b) and alteration in copy numbers of endogenous pararetroviruses (Gregor et al. 2004) and Tnt1 copia-type retrotransposon (Melayah et al. 2004).

In this paper we better characterize the evolutionary dynamics of retrotransposons subsequent to the creation of tobacco by extending our previous analyses of Tnt1 copia-type elements to include four retrotransposon populations and an analysis of several accessions of each species. In this way a holistic view of retrotransposon diversity and population dynamics in polyploid tobacco and diploid parents can be derived. We use these data to evaluate the contribution retrotransposons have made to the divergence of the genomes of tobacco and its diploid progenitor species. We used amplified fragment length polymorphism (AFLP) to calibrate the evolutionary histories of the retrotransposon populations reconstructed using SSAP (Sequence-Specific Amplification Polymorphism) profiles. SSAP is a high resolution retrotransposon-anchored PCR strategy which allows the simultaneous detection of multiple insertions (Waugh et al. 1997). The SSAP profiles provided a comparative estimate of the retrotransposon-linked genetic diversity of tobacco and progenitor species, allowed quantification of the contribution each diploid progenitor made to tobacco retrotransposon populations, and provided estimates of any losses or amplifications of retrotransposon sequences subsequent to tobacco's formation.

Materials and methods

Plant material

For SSAP and AFLP analysis plant material shown in Table 1 was used. Accessions of diploid progenitors N. sylvestris and N. tomentosiformis originate from Nicotiana germplasm sources from several countries [Institut du Tabac, Bergerac, France; USDA, USA; Institut für Pflanzengenetik und Kulturpflanzenforschung (IPK), Gatersleben, Germany; Nijmegen Botanical Garden, The Netherlands] and horticultural sources. The lines of tobacco were obtained from Institut du Tabac (Altadis, Bergerac, France) and represent diverse landrace accessions that have not been derived from breeding programs and are termed “ancestral lines” (Delon et al. 1999). Nuclear DNA was extracted from fresh young leaves of all accessions as in Dellaporta et al. (1983).

Table 1 List of accessions

For in situ hybridizations, the N. tabacum line cv. 095-55 (Royal Botanic Gardens, Kew), N. sylvestris ac. TW137 (USDA), and N. tomentosiformis ac. NIC479/84 were used.

Ty1-copia elements used for SSAP

Four populations of copia-type retrotransposons endogenous to N. tabacum were selected for this study, one derived from Tto1 (Hirochika 1993), two from Tnt1 (Grandbastien et al. 1989), and one from the newly characterized Tnt2 element. Tnt1 and Tto1 are present in tobacco in several hundred copies and a few tens of copies, respectively, and both possess active copies able to amplify in response to stress (Hirochika 1993; Melayah et al. 2001). The Tto1-1R primer (5′-CACTCCCCTGTTAGGAAACATTC-3′, positions +267, +289) was designed in the LTR of the tobacco Tto1 (D83003) element, orientated towards the 5′ end. For Tnt1, two different primers were used, Tnt1-OL13 (5′-CTTATACCTTGTCTGTGAAACC-3′, positions +265, +286) and Tnt1-OL16 (5′-TTCCCACCTCACTACAATATCGC-3′, positions +317, +339), both designed in the LTR U5 region of the Tnt1-94 (X13777) element. An alignment of U5 sequences of various Tnt1 elements (Fig. 1) indicates that the Tnt1-OL16 target sequence is conserved across Solanaceae species while the Tnt1-OL13 target sequence is conserved in Nicotiana only. We also observed that Tnt1-OL13 amplifies Tnt1 sequences from Nicotiana species only (Vernhettes et al. 1998), while Tnt1-OL16 also amplifies Tnt1-related sequences in other Solanaceae, such as tomato and pepper (Costa et al. 1999; Araujo et al. 2001; Tam et al. 2005). We selected both Tnt1 primers for this study, on the assumption that each primer probably introduces a bias in its recognition of Tnt1 elements, the Tnt1-OL13 primer recognizing populations more recently amplified in Nicotiana (and presumably more active), while the Tnt1-OL16 primer amplifies more ancient Tnt1 populations.
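As a quick consistency check on the primer descriptions above, the following sketch compares the length of each primer with the span implied by its reported LTR positions and also reports GC content. The sequences and coordinates are those given in the text; the dictionary layout and the helper names (`span_length`, `gc_percent`, `check_primers`) are illustrative assumptions and not part of the original protocol.

```python
# Primer sequences and inclusive LTR coordinates (start, end) as reported in the text.
# The data layout and helper names are hypothetical, chosen for illustration only.
PRIMERS = {
    "Tto1-1R": ("CACTCCCCTGTTAGGAAACATTC", 267, 289),
    "Tnt1-OL13": ("CTTATACCTTGTCTGTGAAACC", 265, 286),
    "Tnt1-OL16": ("TTCCCACCTCACTACAATATCGC", 317, 339),
}


def span_length(start, end):
    """Number of bases covered by the inclusive coordinates start..end."""
    return end - start + 1


def gc_percent(seq):
    """Percentage of G and C bases in a primer sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def check_primers(primers):
    """Map each primer name to True if its length equals the stated span."""
    return {
        name: len(seq) == span_length(start, end)
        for name, (seq, start, end) in primers.items()
    }


if __name__ == "__main__":
    for name, ok in check_primers(PRIMERS).items():
        seq = PRIMERS[name][0]
        print(f"{name}: {len(seq)} nt, GC {gc_percent(seq):.1f}%, span consistent: {ok}")
```

A mismatch between length and span would point to a transcription error in either the sequence or the coordinates, which is worth ruling out before ordering oligonucleotides.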

Fig. 1

Alignment of partial LTR sequences of Tnt1 elements from various Solanaceae species. Only sequences downstream from the TATA box are shown. The boundaries of the U3, R, and U5 regions are indicated, as well as the position of the Tnt1-OL13 and Tnt1-OL16 primers. Similarities above 60% are indicated by light grey backgrounds, above 80% by dark grey backgrounds and complete similarities by white letters on a black background. N. tabacum = Nicotiana tabacum Tnt1-94 element (X13777); N. plumbaginifolia = Nicotiana plumbaginifolia Tnp2-F97 element (Leprince et al. 2001); N. benthamiana = partial LTR present in the Nicotiana benthamiana mixed tissue hybrid transcript EST760241 (CK297527: LTR position 1–366); S. peruvianum = Solanum peruvianum Retrolyc1-1 element (AF228701); S. lycopersicum = partial LTR present in the Solanum lycopersicum EST274036 callus hybrid transcript EST274036 (AW030781: LTR position 1–338); S. tuberosum = solo-LTR in first intron of Solanum tuberosum nematode resistance-like protein Gro1-1 gene (AY196159, LTR position 724–1303). Dashes indicate gaps and dots indicate missing sequences (ESTs truncated in 5′)

A partial Tnt2 sequence was isolated from the tobacco cv Xanthi (XHFD8 line) using an SSAP-based strategy developed by Pearce et al. (1999), which isolates the 3′ end of POL coding sequences of any copia-type element, together with downstream sequences including putative 5′ ends of 3′ LTRs. We followed the protocol of Pearce et al. (1999), except that we used Hot Star Taq (Qiagen) and omitted the motif 2 PCR primer, replacing it by a non-biotinylated version of the motif 1 primer (RTKHID/E) for the second PCR amplification. PCR products were cloned in pGEMT-easy vectors (Promega) according to the manufacturer's recommendations, and used to transform DH10B Escherichia coli cells. Inserts were sequenced by Eurogentec with universal primers. Blastn analysis of one PCR clone, named Tnt2 (EF437960), revealed a disrupted copy of the corresponding retrotransposon at the N. sylvestris Lhcb1 locus (Hasegawa et al. 2002). Tnt2-Lhcb1 is inserted in reverse orientation 1.2 kb downstream from the Lhcb1*1 gene and 0.6 kb upstream from the Lhcb1*2 gene, and is located in two non-overlapping sequence accessions [AB012636, positions 2133–3659, includes the 3′ LTR (2133–2485) and the final 357 amino acids of the pol domain; AB012637, positions 1–1299, includes the first 226 amino acids of the gag region and the 5′ LTR (947–1299)]. The 5′ and 3′ LTRs of Tnt2-Lhcb1 are 353 bp long and 99.4% identical, suggesting that this copy inserted recently and that Tnt2 elements may be active. No overall nucleotide similarity was detected between Tnt2 and Tnt1 or Tto1, whether in LTR or internal sequences. At the amino acid level, the partial gag region of Tnt2-Lhcb1 shows 35% identity and 54% similarity with Tnt1, while the partial pol domain shows 56% identity and 73% similarity with Tnt1. The Tnt2d primer (5′-CCGAACCTCGTAAATTCTGGTG-3′) was designed in the LTR of Tnt2-Lhcb1 (positions +224, +245 of the LTR), orientated towards the 3′ end.
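To make the LTR comparison concrete, the sketch below works out how many nucleotide differences between two 353 bp LTRs are compatible with a reported identity of 99.4%. It assumes an ungapped alignment over all 353 positions and identity rounded to one decimal place; both are assumptions of this illustration, and the function name is hypothetical.

```python
def mismatches_consistent_with_identity(length, identity_percent, decimals=1):
    """List the mismatch counts k for which 100 * (length - k) / length,
    rounded to `decimals` places, equals the reported identity.

    Assumes an ungapped alignment of `length` positions (an assumption of
    this sketch, not a detail stated in the text).
    """
    return [
        k
        for k in range(length + 1)
        if round(100.0 * (length - k) / length, decimals) == identity_percent
    ]


if __name__ == "__main__":
    # 353 bp LTRs reported as 99.4% identical.
    print(mismatches_consistent_with_identity(353, 99.4))
```

Under these assumptions only two differences between the 5′ and 3′ LTRs reproduce the reported value, which is in line with the interpretation of a recent insertion, since both LTRs are identical at the time of integration.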

Marker analysis

SSAP

The SSAP technique (Waugh et al. 1997) was adapted as described in Tam et al. (2005) except that 5 μl of the digested–ligated genomic template DNA was used for PCR amplification. Genomic DNA was digested with EcoRI, ligated to the corresponding adaptor, and amplified using the non-labelled adaptor primer E00 (Tam et al. 2005) and the 33P-labelled retrotransposon primers described above. Amplified 33P-labelled products were separated on 6% denaturing polyacrylamide gels and exposed after drying to Kodak Biomax films (Eastman Kodak, Rochester).

AFLP

EcoRI–MseI AFLP (Vos et al. 1995) was performed as previously described (Julio et al. 2006), using the following primer combinations: E-AAG + M-CAA, E-AGG + M-CAC, E-ACG + M-CTT, E-ACC + M-CAA, E-ACA + M-CTA, E-AAG + M-CAC. Samples were analysed by capillary electrophoresis on an ABI 310 Genetic Analyzer (Applied Biosystems, Foster City, CA, USA) and data were processed with Genescan software.

Data analysis

SSAP and AFLP bands were scored as present (1) or absent (0). Binary data were analysed using PAUP*4.10 (Swofford 2002). Pairwise distances were computed using the Nei–Li distance option in PAUP*4.10 with the minimum evolution option. Dendrograms were constructed using the UPGMA method (unweighted pair group using arithmetic averages, Sneath and Sokal 1973) and support for each group was estimated by 1,000 bootstrap replicates. Diversity levels were estimated as percentages of polymorphic bands. For interspecies diversity, percentages of polymorphisms were calculated among all accessions of each species combination, either two by two or across all three species. For diversity analyses (Tables 2, 3, 4), only bands for which the presence/absence status was clear for all accessions were scored, while for the analysis of the contribution of each progenitor diploid to the tobacco genome (Table 5), we scored all bands for which the origin could be established unequivocally, regardless of their intra-specific polymorphism status. The number of bands scored for the latter analysis is therefore higher than the number of bands scored for diversity analyses. In tobacco, the proportions of bands from each progenitor species were evaluated excluding common bands shared between the three species, and calculated relative to the total number of bands in tobacco, scoring each tobacco band shared with both parental species as a single band. This calculation was assumed to be the least biased, as it is not possible to determine whether the two orthologous parental insertions have both been transferred to tobacco or have been lost from one of the two parental sub-genomes.
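The two quantities used throughout this analysis can be illustrated with a minimal sketch on binary band profiles. It computes the band-sharing coefficient of Nei and Li, F = 2n_xy / (n_x + n_y), with 1 − F taken as a simple dissimilarity, and the percentage of polymorphic bands within a set of accessions. This is not a reproduction of PAUP* output, which may apply a further transformation to F; the function names and the toy profiles are hypothetical and are not data from this study.

```python
def nei_li_similarity(a, b):
    """Band-sharing coefficient F = 2 * n_xy / (n_x + n_y) for two 0/1 profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of band positions")
    shared = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    total = sum(a) + sum(b)
    if total == 0:
        raise ValueError("at least one profile must contain a band")
    return 2.0 * shared / total


def nei_li_dissimilarity(a, b):
    """Simple dissimilarity 1 - F (an illustrative choice, see lead-in)."""
    return 1.0 - nei_li_similarity(a, b)


def percent_polymorphic(profiles):
    """Percentage of bands that are present in some but not all accessions.

    Band positions absent from every accession in the set are ignored, so the
    denominator is the number of bands actually observed in this set.
    """
    columns = list(zip(*profiles))
    observed = [col for col in columns if any(col)]
    if not observed:
        raise ValueError("no bands observed in this set of accessions")
    polymorphic = sum(1 for col in observed if not all(col))
    return 100.0 * polymorphic / len(observed)


if __name__ == "__main__":
    # Hypothetical toy profiles: three accessions scored at five band positions.
    acc1 = [1, 1, 0, 1, 0]
    acc2 = [1, 0, 0, 1, 0]
    acc3 = [1, 1, 1, 1, 0]
    print(nei_li_similarity(acc1, acc2))
    print(percent_polymorphic([acc1, acc2, acc3]))
```

A matrix of such pairwise dissimilarities is the kind of input an average-linkage (UPGMA) clustering would take; the clustering and bootstrap steps themselves are left to dedicated software.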

Distribution of GRD sequences

GRD53 sequences were detected by PCR using primers F1, R1, and R2, as previously described (Murad et al. 2002).

Fluorescent in situ hybridizations (FISH)

FISH was carried out as previously described (Leitch et al. 2001) with modifications as in Lim et al. (2000b). The Tnt2 probes were amplified from total genomic DNA. The LTR probe was amplified using primers Tnt2-LTR-F: (5′-TGTCAAAATTTCAGATTCCCAC-3′) and Tnt2-LTR-R (5′-AATTGTTGCGGAAGCCAAATG-3′). The GAG probe was amplified using primers Tnt2-GAG-F (5′-ATCTTGATTTGTGTCGTCAG-3′) and Tnt2-GAG-R: (5′-CTCGCTTTCTTTGCTCATAG-3′). PCR amplifications were performed with 100 ng of tobacco genomic DNA as template, in a reaction volume of 25 μl containing 1× Taq buffer, 1.5 mM of MgCl2, each nucleotide at 0.2 mM, each primer at 0.4 μM, and 1.5 units of Taq DNA polymerase (Invitrogen). The PCR was run under the following conditions: 5 min initial denaturation at 95°C; 35 cycles of 30 s at 95°C, 30 s at 60 or 56°C for LTR and GAG, respectively, 1 min at 72°C; followed by 10 min at 72°C. PCR generated a product whose size is 348 and 682 bp for LTR and GAG, respectively. The LTR and GAG probes were labelled using digoxigenin-11-dUTP (LTR) or biotin-16-dUTP (GAG), and PCR amplification with the same primers and conditions as before, using 5 μl of PCR product diluted 50× as template, in a reaction volume of 50 μl containing 1× Taq Buffer, 1.5 mM MgCl2, 0.1 mM of each nucleotide, 50 μM of digoxigenin-11-dUTP or biotin-16-dUTP, each primer at 0.4 μM and 2.5 units of Taq DNA polymerase (Bioline). For genomic in situ hybridization (GISH) analysis, total N. sylvestris and N. tomentosiformis genomic DNAs were labelled with digoxigenin-11-dUTP and biotin-16-dUTP, respectively, as described in Lim et al. (2000b). For both FISH and GISH, digoxigenin-11-dUTP was detected with FITC-conjugated anti-digoxigenin IgG (Roche Biochemicals) giving green fluorescence, and biotin-16-dUTP was detected with Cy3-conjugated avidin, giving red fluorescence. FISH analyses were conducted using the LTR and GAG probes simultaneously. 
After stripping the LTR and GAG labels from tobacco metaphases, the material was reprobed with labelled N. sylvestris and N. tomentosiformis genomic DNAs simultaneously. Chromosomes were counterstained with DAPI (4′,6-diamidino-2-phenylindole, giving blue fluorescence) and imaged using a Leica DMRA2 epifluorescent microscope, photographed with an Orca ER camera, and analysed using Improvision Openlab software.

Results

Interspecies diversity

Six AFLP primer sets yielded a total of 281 bands across 18 accessions, of which 273 (97.2%) are polymorphic (Table 2). The AFLP dendrogram (Fig. 2a) shows good separation of the three species, with tobacco sister to N. sylvestris, in accordance with polymorphism levels being higher between tobacco and N. tomentosiformis (83.8%) than between tobacco and N. sylvestris (76.1%) (Table 2). SSAP dendrograms also clearly separate the three species (Fig. 2b–f), except for Tto1-1R where only a few unscorable bands of weak intensities are detected in N. tomentosiformis. Overall polymorphism levels are variable depending on the retrotransposon population analysed, but when the five datasets are combined, polymorphism levels are similar to those for AFLP. All retrotransposon populations, as well as the combination dataset, show tobacco as a sister group to N. sylvestris, with lowest polymorphism levels between this pair of species (Table 2), and with a closer relationship between tobacco and N. sylvestris for Tnt2d.

Table 2 Interspecies polymorphism levels between allotetraploid N. tabacum (tab) and progenitor diploids N. sylvestris (syl), and N. tomentosiformis (tom)
Fig. 2

UPGMA dendrograms of relationships among the 19 Nicotiana accessions. a AFLP analysis; b−e SSAP analysis with each of the four retrotransposon populations; f combination of the four retrotransposon datasets. Numbers next to the branches indicate bootstrap support

Diversity in diploid progenitor species

AFLP diversity is low in N. sylvestris (4.2%) compared to N. tomentosiformis (31.7%) (Table 3). The N. tomentosiformis collection forms two groups (G1 and G2, Fig. 2), with high diversity between groups (24.1%), and little diversity within groups (G1 = 8.7%, G2 = 5.4%). Levels of SSAP polymorphisms follow similar patterns to AFLP. There are low levels of SSAP polymorphisms in N. sylvestris, ranging from 0 to 3.4% depending on the retrotransposon population analysed, with an average of 0.9% for the combined datasets (Table 3). In contrast, N. tomentosiformis has between 22.2% (Tnt1-OL16) and 46.7% (Tnt2d) polymorphic bands, with an average of 28.7% for the combined datasets. For all retrotransposon populations analysed (except Tto1-1R), SSAP resolves the G1 and G2 groups (Fig. 2b–e), with most diversity occurring between groups (Table 3).

Table 3 Polymorphism levels in progenitor diploids N. sylvestris and N. tomentosiformis

A previous study analysing the ancestry of tobacco traced back the closest living ascendant of the T-genome of tobacco to N. tomentosiformis accession NIC479/84 (Murad et al. 2002). This accession most closely resembles tobacco in the presence/absence and chromosomal locations of several repetitive sequences (45S rDNA, NTRS, GRD3, GRD53) (Murad et al. 2002). Since recurrent simultaneous gain or loss of various tandem repeats at different chromosomal positions is unlikely, the most parsimonious interpretation of these data is that NIC479/84 belongs to a lineage of N. tomentosiformis that is closely related to the tobacco paternal T-genome. Accession NIC479/84 clusters in the G1 group. The distribution of these repetitive sequences differs for other N. tomentosiformis accessions, such as TW142, which clusters in the G2 group. We tested by PCR whether other accessions of group G1 also contained GRD53. Fig. 3 shows that GRD53 sequences are present in the three accessions of group G1, but are absent from the two accessions of group G2. All these data indicate that G1 and G2 are distinct N. tomentosiformis populations, and that those from the G1 group are most closely related to the T-genome of tobacco. Therefore, in subsequent analysis comparing the N. tomentosiformis contribution to the tobacco retrotransposon content, we considered only accessions of N. tomentosiformis from the G1 group (i.e. we excluded accessions TW142 and ITB647). As it is not possible to determine whether some accessions from the G1 group are more closely related to tobacco than others, all accessions from the G1 group were equally considered in our subsequent analysis.

Fig. 3

Distribution of GRD53 geminivirus-related sequences in the Nicotiana collection. L molecular weight ladder. 1 ITB645 (G1); 2 ITB647 (G2); 3 TW142 (G2); 4 ITB646 (G1); 5 NIC 479/84 (G1); 6 Alipes; 7 Ambalema; 8 Calycina; 9 Chinensis; 10 Fructicosa; 11 Lacerata; 12 Petiolaris; 13 Purpurea B; 14 Ducrettet; 15 TW137; 16 ITB626; 17 934750005; 18 934750138; 19 904750319

Diversity in allotetraploid tobaccos

The levels of AFLP polymorphisms observed for the eight N. tabacum landraces examined here (30.3%) are considerably higher than the levels observed for breeding material (6.3%) using similar conditions (Julio et al. 2006). Therefore the eight landraces are a source of higher genetic diversity. In addition, AFLP polymorphisms observed among the eight tobacco lines are in a comparable range to those observed between N. tomentosiformis G1 and G2 groups (24.1%). Tobacco SSAP polymorphisms range from 3.6% (Tto1-1R) to 8.5% (Tnt1-OL16), with an average of 7% (Table 4), levels that are significantly lower than AFLP polymorphisms. Due to low SSAP diversity, not all families of retroelements generated dendrograms with well-supported branching patterns; those that did showed material differences to the AFLP dendrograms (Fig. 2b–e). We scored separately bands specific to tobacco (i.e. not observed in the progenitor species, Table 4, “tab-sp”) and those bands shared with parents (Table 4, “tab-sh”). Although tobacco-specific SSAP bands are more polymorphic than shared bands, they are still predominantly monomorphic (83–100%), indicating that they pre-date the divergence of the tobacco lines. Moreover, most tobacco SSAP polymorphisms are due to band loss in one to three lines. There are only two occurrences (one Tnt1-OL13 band, one Tnt1-OL16 band, data not shown) where a tobacco-specific band appeared in only one or two accessions, suggestive of a recent insertion.

Table 4 Polymorphism levels in allotetraploid N. tabacum

Contribution of the diploid parental species to allotetraploid tobacco

In order to evaluate the contribution each progenitor diploid made to the tobacco genome we evaluated the proportions of bands specific to tobacco and those that are shared with each parent (Table 5). Similar-sized bands are interpreted as orthologous loci. Bands specific to tobacco (Table 5, “tab-sp”) are assumed to have appeared after tobacco was formed, and bands shared between tobacco and one progenitor diploid (Table 5, “tab-syl”, “tab-tom”) are considered inherited by tobacco from that parent. Likewise bands shared by all three species (Table 5, “common”) are thought to be orthologous loci pre-dating the divergence of N. sylvestris and N. tomentosiformis. We excluded from the analysis the rare bands shared by both parents that were not found in tobacco (three Tnt1-OL16 bands and one AFLP band). Finally, we also evaluated the proportion of bands present in only one diploid (Table 5, “syl-sp” and “tom-sp”).
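The scoring scheme described above amounts to classifying each band by its presence in the three species. The following sketch expresses that rule as a small function; the category labels follow Table 5 as quoted in the text, while the function name, the boolean encoding, and the "excluded" and "absent" labels are assumptions made for illustration.

```python
def classify_band(in_tab, in_syl, in_tom):
    """Assign a band to a Table 5 category from its presence in each species.

    in_tab, in_syl, in_tom: True if the band is scored in N. tabacum,
    N. sylvestris, or N. tomentosiformis, respectively.
    """
    if in_tab and in_syl and in_tom:
        return "common"      # shared by all three species
    if in_tab and in_syl:
        return "tab-syl"     # considered inherited from N. sylvestris
    if in_tab and in_tom:
        return "tab-tom"     # considered inherited from N. tomentosiformis
    if in_tab:
        return "tab-sp"      # specific to tobacco
    if in_syl and in_tom:
        return "excluded"    # shared by both parents but not found in tobacco
    if in_syl:
        return "syl-sp"      # present only in N. sylvestris
    if in_tom:
        return "tom-sp"      # present only in N. tomentosiformis
    return "absent"          # not scored in any species (should not occur)


if __name__ == "__main__":
    print(classify_band(True, True, False))
    print(classify_band(False, False, True))
```

Counting the categories over all scored bands of one retrotransposon population would give the kind of distribution summarised in Table 5.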

Table 5 Distribution of insertions in allotetraploid N. tabacum and diploid N. sylvestris and N. tomentosiformis progenitor species

The number of AFLP bands scored in each of the diploids is similar (0.95 syl/tom ratio, Table 5). But AFLP bands in tobacco originating from the N. sylvestris progenitor are more abundant than those from the N. tomentosiformis progenitor (1.23 S/T ratio, Table 5), indicating a higher loss of parental genetic material from the T-genome, and explaining the closer genetic relationship between tobacco and its maternal diploid progenitor. The contribution of each parental genome to tobacco's retrotransposon content differs for each retrotransposon population. For all four populations, and especially for Tnt2d, there is a higher relative abundance of SSAP bands in N. sylvestris than N. tomentosiformis, suggesting that relative copy numbers of the four retrotransposon populations are higher in N. sylvestris, despite similar genome sizes for both diploids. In tobacco there is a 1.87-fold and 1.27-fold larger number of Tnt1-OL13 and Tnt1-OL16 SSAP bands inherited from N. sylvestris than from N. tomentosiformis (S/T ratios). These ratios are similar to their relative abundance in the diploid progenitors (1.83 and 1.22 syl/tom ratios, respectively). Therefore the contribution of each parent to the tobacco Tnt1 content is proportionally similar, even though the tobacco genome has more insertions of N. sylvestris origin. In contrast, Tnt2d bands in tobacco of N. sylvestris origin are 7.67-fold more abundant than those of N. tomentosiformis origin while they are only 2.88-fold more abundant in N. sylvestris than N. tomentosiformis. These data suggest specific losses of Tnt2d bands from the tobacco T-genome.
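One way to summarise the comparison made in this paragraph is to divide each S/T ratio observed in tobacco by the corresponding syl/tom ratio in the diploids: values near 1 indicate proportional inheritance, and values well above 1 indicate a deficit of T-genome bands. The ratios below are those quoted in the text; the ratio-of-ratios summary and the names used are illustrative assumptions and not a statistic reported by the authors.

```python
# (S/T ratio in tobacco, syl/tom ratio in the diploid progenitors), from the text.
RATIOS = {
    "Tnt1-OL13": (1.87, 1.83),
    "Tnt1-OL16": (1.27, 1.22),
    "Tnt2d": (7.67, 2.88),
}


def relative_s_bias(tobacco_ratio, diploid_ratio):
    """Ratio of ratios: S/T in tobacco divided by syl/tom in the diploids."""
    return tobacco_ratio / diploid_ratio


if __name__ == "__main__":
    for name, (tobacco_ratio, diploid_ratio) in RATIOS.items():
        bias = relative_s_bias(tobacco_ratio, diploid_ratio)
        print(f"{name}: {bias:.2f}")
```

For the two Tnt1 populations this summary stays close to 1, whereas for Tnt2d it is well above 2, which is the pattern the text interprets as specific loss of Tnt2d bands from the T-genome.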

The proportion of bands specific to tobacco (“tab-sp”, Table 5) varies depending on the retrotransposon population analysed. Tobacco-specific bands constitute 28.8 and 25.7% of the Tnt1-OL13 and Tnt2d tobacco populations, respectively, and only 12.5 and 16.1% of total Tto1-1R and Tnt1-OL16 tobacco bands, respectively. Conversely, the proportion of ancient bands common to all three species is reduced for Tnt1-OL13 and Tnt2d (6.1 and 4.1%, respectively), compared with Tnt1-OL16 (16.1%).

Loss of parental bands in tobacco

For each retrotransposon population, the number of SSAP bands scored in tobacco is lower than the sum of SSAP bands detected in each progenitor species, and a high number of SSAP bands in the progenitor diploids are absent in tobacco (Table 5: “syl-sp” and “tom-sp”). The smallest difference between tobacco and a progenitor is observed for the Tto1-1R population, where approximately 85% of the bands observed in N. sylvestris also occur in tobacco. Bigger differences occur elsewhere: between 29% (Tnt2d) and 39.6% (Tnt1-OL13) of the retrotransposon content of N. sylvestris, and between 25.9% (Tnt1-OL16) and 62.5% (Tnt2d) of the retrotransposon content of N. tomentosiformis, are absent in tobacco. In comparison, 40% of N. sylvestris AFLP bands and 51.6% of N. tomentosiformis AFLP bands are absent in tobacco.

We investigated the possibility that SSAP bands specific to each parent represented retrotransposon amplifications in the diploid since tobacco evolved, rather than loss in tobacco subsequent to its formation. We reasoned that recent insertions in the diploids could be identified if they are not fixed in all accessions. Most bands are monomorphic in N. sylvestris (Table 3), indicating that all accessions are closely related and preventing us from determining if N. sylvestris-specific bands are subsequent to tobacco’s formation. In contrast, we might expect new insertions in N. tomentosiformis to differ between the G1 and G2 groups, whose divergence pre-dates tobacco’s formation. However, the majority of bands specific to N. tomentosiformis are shared between accessions of groups G1 and G2 (70% for Tnt1-OL13; 93% for Tnt1-OL16; 73% for Tnt2d, data not shown). The absence of these bands in tobacco is therefore best explained by their loss in tobacco subsequent to its formation.

Chromosomal distribution of Tnt2

In order to investigate the loss of N. tomentosiformis-derived Tnt2 SSAP bands in tobacco, we performed FISH using probes against the LTR (digoxigenin-labelled, FITC detected, green) and the GAG coding region (biotin-labelled, Cy3 detected, red) (Fig. 4). With both probes, the highest signal densities are observed on N. sylvestris chromosomes (Fig. 4a–d). The signal occurs as small to medium-sized dots scattered over most chromosomes. Some signals are localized into clusters (particularly near subtelomeric regions), and there is some evidence of reduced signal densities at the centromere. The FISH signal on N. tomentosiformis chromosomes (Fig. 4e–h) is different in that there is materially less signal, and the signals that do occur are larger and more clustered. These data are in accordance with relative abundances revealed by SSAP in progenitor diploids. In tobacco (Fig. 4i–l) there is a striking contrast in hybridization intensities between S- and T-genome chromosomes (as identified by GISH, Fig. 4l). Most LTR and GAG sequences occur on the S-genome chromosomes, where they cluster and signal sizes appear larger than those observed in N. sylvestris. The few sporadic signals on T-genome chromosomes occur at a lower density than in N. tomentosiformis. These data are consistent with SSAP data showing a specific loss of tobacco Tnt2d elements originating from the N. tomentosiformis progenitor and indicate that there has been no spreading of Tnt2 signals from the S- to the T-genome chromosomes. There is no indication of mobility of Tnt2 sequences from S- to T-derived regions of recombinant chromosomes (Fig. 4i–l).

Fig. 4

FISH analysis of Tnt2 distribution on metaphase chromosomes of tobacco and its two diploid parental species. a–d N. sylvestris: a GAG probe, b LTR probe, c superimposition of the GAG and LTR images, d DAPI image showing chromosome morphology. e–h N. tomentosiformis: e GAG probe, f LTR probe, g superimposition of the GAG and LTR images, h DAPI image showing chromosome morphology. i–l N. tabacum: i GAG probe, j LTR probe, k superimposition of the GAG and LTR images, l GISH using N. sylvestris total genomic DNA to detect the S-genome of tobacco (green fluorescence) and N. tomentosiformis total genomic DNA to detect the T-genome of tobacco (red fluorescence). White arrows indicate recombinant chromosomes in tobacco: arrow head S9/t, large arrow S2/t, small, thin arrow T1/s. Scale bar 10 μm

Discussion

Amplification of retrotransposons has not contributed to tobacco diversification

Our results show that retrotransposon insertions are useful for diversity analysis in Nicotiana. Phenetic reconstructions using SSAP are similar to those using AFLP, with clear clusters of accessions to species and a similar tree topology with tobacco sister to N. sylvestris as shown previously (Ren and Timko 2001). For N. sylvestris and N. tomentosiformis, SSAP diversity levels fall in comparable ranges to those from AFLP. The N. sylvestris accessions examined have low diversity levels, suggesting either: (1) N. sylvestris has little genetic diversity in the wild [e.g. having minimal population substructure, as it is an uncommon plant occurring in a narrow belt on the Andean foothills of northwestern Argentina (Goodspeed 1954) or having undergone a recent population bottleneck] or (2) the accessions originated from similar wild populations. The accessions examined were from Nicotiana germplasm sources in several countries, but in each case the precise provenance of the materials is unavailable. In addition, we tested two N. sylvestris accessions from USDA (TW136 and TW138, originally supplied by T.H. Goodspeed) and found them to be similar to the six accessions used in this work (data not shown). Interestingly, Goodspeed (1954) reported low phenotypic variation in N. sylvestris, indicating that perhaps the genetic base of N. sylvestris is indeed narrow. All N. tomentosiformis accessions studied were also obtained from Nicotiana germplasm from several countries, but again the precise provenance of these materials is unavailable. N. tomentosiformis occurs in the wild as two races with distinct phenotypes (Goodspeed 1954) and has high genetic diversity (Murad et al. 2002; Lim et al. 2004b). Here AFLP and SSAP profiles reveal two distinct clusters of accessions (clusters G1 and G2). This genetic diversity could reflect the population substructure of the species, and the accessions may reflect much of the genetic diversity found in the wild.
The genetic variation between N. tomentosiformis accessions revealed by AFLP and SSAP supports the hypothesis that tobacco formed after the divergence of two N. tomentosiformis lineages (Murad et al. 2002). Murad et al. (2002) showed that two N. tomentosiformis lineages are distinct in the distribution of 45S rDNA, the tandem repeat NTRS, and the occurrence of the tandem repeats GRD53 and GRD3, involving loci on 3 of the 12 chromosomes. GRD53 and GRD3 are found neither in any other diploid relative of N. tomentosiformis (Lim et al. 2000b) nor in all lines of that species, and are likely newly evolved sequences derived from a horizontal gene transfer involving geminivirus DNA (Murad et al. 2004). The distinctive character of all these sequences is restricted to one N. tomentosiformis lineage and the T-genome of tobacco. For this reason we used GRD53 as a marker for the N. tomentosiformis lineage most similar to tobacco. Only N. tomentosiformis accessions in the G1 cluster carry GRD53 repeats, and because of this similarity with the tobacco T-genome, our study concentrated on G1 group N. tomentosiformis.

In contrast to parental species, tobacco SSAP diversity is lower than AFLP diversity. SSAP polymorphisms reveal insertion polymorphisms and changes at insertion sites, including rearrangements, indels and mutations of flanking EcoRI sites. Most previous studies using retrotransposons that greatly differ in their copy numbers showed SSAP to be three- to fivefold more polymorphic than AFLP, indicating a predominance of insertion polymorphisms over other diversity-inducing processes (Waugh et al. 1997; Ellis et al. 1998; Tam et al. 2005). This was not observed in tobacco. The five retrotransposon populations studied have had little impact on tobacco diversification after it was created, confirming the hypothesis of a monophyletic origin for tobacco. In addition, while Tnt2 activation conditions are unknown, Tnt1 and Tto1 are known to be somatically quiescent in healthy plants and activated in stress conditions (Grandbastien 1998; Grandbastien et al. 2005). This pattern of activity may explain the lack of SSAP diversity observed in tobacco.

The SSAP polymorphisms that do occur in tobacco are mostly band losses in one to three lines (data not shown), rather than new bands which would be indicative of potential transpositions. In addition, EcoRI/MseI AFLP polymorphisms are ca. fourfold higher than EcoRI SSAP polymorphisms. This suggests that most AFLP diversity in tobacco occurs at or near MseI restriction sites, and that polymorphisms at or near EcoRI restriction sites are very low, as the latter would also be expected to generate SSAP polymorphisms in our study.
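The comparison of polymorphism levels between marker systems can be illustrated with a minimal sketch, assuming that bands are scored as present (1) or absent (0) in each tobacco line. The function name `polymorphic_fraction` and the toy scores are hypothetical and chosen only for illustration; they are not the data or the scoring procedure of this study.

```python
from typing import Dict, List


def polymorphic_fraction(bands: Dict[str, List[int]]) -> float:
    """Return the fraction of scored bands that vary between lines.

    A band is counted as polymorphic when it is present (1) in some
    lines and absent (0) in others.
    """
    if not bands:
        raise ValueError("no bands scored")
    polymorphic = sum(1 for scores in bands.values() if len(set(scores)) > 1)
    return polymorphic / len(bands)


# Toy presence/absence scores across four lines (invented for illustration).
aflp_toy = {
    "a1": [1, 1, 1, 1],
    "a2": [1, 0, 1, 1],
    "a3": [1, 1, 0, 0],
    "a4": [1, 1, 1, 1],
    "a5": [0, 1, 1, 1],
}
ssap_toy = {
    "s1": [1, 1, 1, 1],
    "s2": [1, 1, 1, 1],
    "s3": [1, 1, 0, 1],
    "s4": [1, 1, 1, 1],
    "s5": [1, 1, 1, 1],
}

aflp_rate = polymorphic_fraction(aflp_toy)
ssap_rate = polymorphic_fraction(ssap_toy)
print(f"AFLP polymorphic fraction: {aflp_rate:.2f}")
print(f"SSAP polymorphic fraction: {ssap_rate:.2f}")
```

Dividing one fraction by the other gives the fold difference between the two marker systems for the same set of lines.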

Differential contribution of each parent to tobacco’s retrotransposon populations

Our data reveal that the contribution of each parent to the tobacco genome is unequal. Tobacco retains a higher proportion of AFLP bands of N. sylvestris origin than of N. tomentosiformis origin. These data indicate a faster divergence of the T-genome of tobacco, as reported previously (Lim et al. 2004a; Skalicka et al. 2003, 2005), and a closer relationship of the tobacco S-genome with the genome of modern-day N. sylvestris. We also show that different retrotransposon populations behave differently. For Tnt1-OL13 and Tnt1-OL16, the number of bands in tobacco derived from each parent reflects their relative abundance in the diploid species. More drastic changes were observed (both by SSAP and FISH) for the Tnt2d population, with more than half of the bands expected from N. tomentosiformis absent in tobacco. A notable exception is Tto1-1R, for which most of the N. sylvestris content was transmitted to tobacco.

The loss of SSAP bands does not reflect homologous recombination leading to solo-LTRs, as SSAP primers were designed against the LTR and do not discriminate between complete elements and solo-LTRs. However, a number of mechanisms could result in the loss of SSAP bands, including changes in, or near, EcoRI restriction sites and deletions/truncations of retrotransposon sequences, in particular of element termini, leading to loss of the SSAP priming target sequence. Such deletions could be small or large, up to several kb as recently described at the wheat Ha locus (Chantret et al. 2005). Our study does not allow us to determine the precise nature and extent of changes occurring at Tnt1 and Tto1 insertion sites. The reduction in Tnt2d bands of N. tomentosiformis origin in tobacco correlates with a reduction in Tnt2 signals detected by FISH on tobacco T-genome chromosomes, indicative of the physical removal of large portions of Tnt2 sequences. These losses most likely reflect major truncations or fragmentation of retrotransposon sequences, as described in Arabidopsis and rice (Devos et al. 2002; Ma et al. 2004). Losses of parental bands are mostly monomorphic, i.e. detected in all tobacco lines, indicating that they occurred early in the evolution of tobacco, possibly in response to allopolyploidy. In support of this hypothesis, rapid genomic rearrangements have been shown in newly formed allopolyploids (Wendel 2000; Chen and Ni 2006), including in synthetic N. tabacum (Skalicka et al. 2003, 2005).

Distinct inheritance patterns for each retrotransposon population in tobacco indicate that SSAP band losses do not reflect global genome reorganization, but features specific to each retrotransposon population. Although retrotransposons generally show widespread distributions in large plant genomes (Heslop-Harrison et al. 1997), recent work has shown that different copia-type families have markedly different distributions in Brassica (Alix et al. 2005). Transpositional history clearly also affects genomic distribution, as the distribution of copia-type retrotransposons differs depending on insertion age in Arabidopsis (Pereira 2004; Peterson-Burch et al. 2004). Different distributions of particular retrotransposons, linked to insertion specificities or to age, probably influence evolutionary outcome. FISH revealed that Tnt2 elements tend to cluster in N. tomentosiformis, while they are more widespread in N. sylvestris. This difference may have contributed to the large-scale elimination of Tnt2 from the T-genome of tobacco, perhaps because the elements occurred in recombination-prone regions, or perhaps because they were deleted from repeated or nested locations. Therefore in tobacco, differences in the global distribution of each retrotransposon population are likely to have influenced insertion stability.

Retrotransposons were differentially amplified at early stages of tobacco creation

The proportions of SSAP bands specific to tobacco were twofold higher for Tnt1-OL13 and Tnt2d than for Tnt1-OL16 and Tto1-1R. These patterns inversely correlate with levels of insertions common to all three species, i.e. pre-dating the divergence of N. sylvestris and N. tomentosiformis, which are good indicators of the relative age of each retrotransposon population. Younger populations of elements are associated with high levels of new tobacco-specific SSAP bands, probably arising by retrotransposon amplification. AFLP-type changes at or near EcoRI sites can also generate new SSAP bands. However, these AFLP-type changes are low in tobacco and do not explain the high levels of tobacco-specific SSAP bands. The correlation between relatively young retrotransposons and increased levels of new SSAP bands in tobacco argues, at least for Tnt1-OL13 and Tnt2d, that the new SSAP bands were derived from new retrotransposon insertions.
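The band categories used in this reasoning (bands common to all three species, bands of single-parent origin, tobacco-specific bands and parental bands absent from tobacco) can be sketched as a simple classification of presence/absence scores. This is a minimal illustration under the assumption that each band is scored once per species; the names `Band` and `classify` and the toy bands are hypothetical and do not reproduce the scoring used in this study.

```python
from collections import Counter
from typing import NamedTuple


class Band(NamedTuple):
    """Presence of one band in tobacco and in each diploid progenitor."""
    tobacco: bool
    sylvestris: bool
    tomentosiformis: bool


def classify(band: Band) -> str:
    """Assign a band to an origin category from its presence pattern."""
    if band.tobacco:
        if band.sylvestris and band.tomentosiformis:
            # Insertion pre-dating the divergence of the two progenitors.
            return "common to all three species"
        if band.sylvestris:
            return "N. sylvestris origin"
        if band.tomentosiformis:
            return "N. tomentosiformis origin"
        # Candidate new insertion (or AFLP-type change) in tobacco.
        return "tobacco-specific"
    if band.sylvestris or band.tomentosiformis:
        return "parental band absent from tobacco"
    return "not scored in any species"


# Toy bands (invented for illustration only).
toy_bands = [
    Band(True, True, True),
    Band(True, True, False),
    Band(True, False, False),
    Band(False, False, True),
    Band(True, False, False),
]

counts = Counter(classify(b) for b in toy_bands)
for category, n in counts.items():
    print(f"{category}: {n}")
```

Comparing the share of tobacco-specific bands with the share of bands common to all three species, per primer, mirrors the inverse relationship discussed above.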

The eight tobacco landraces studied here were selected to provide the widest possible genetic diversity of this species. Our AFLP studies indicate that they show more diversity (30%) than cultivated lines (6%) analysed under the same conditions. Tobacco cultivated lines were presumably selected from a small subpopulation of the available genetic diversity. However, most tobacco-specific SSAP bands are monomorphic in tobacco landraces, demonstrating that amplification occurred early in the evolution of tobacco. This early amplification of retrotransposons may have occurred in response to the genomic shock caused by interspecific hybridization, as postulated by McClintock (1984). Activation of retrotransposons may result from breakdown in epigenetic controls in the early allopolyploid (Liu and Wendel 2003), or from transcriptional activation via specific stress–response pathways, as previously shown for Tnt1 in response to microbial stress (Grandbastien et al. 2005). Features specific to particular retrotransposons may also play a role in activity, e.g. ancient populations may contain lower proportions of potentially active elements.

In contrast to the intergenomic mobility of retroelements observed in allotetraploid cotton (Zhao et al. 1998; Hanson et al. 1998), there is no large-scale spreading of Tnt2 copies between tobacco’s S- and T-genome chromosomes, or even between S- and T-derived regions of recombinant chromosomes. Similar results were reported for Tnt1 (Melayah et al. 2004). The difference from cotton may be due to the more recent origin of tobacco (<0.2 million years) compared to cotton (1–2 million years), with insufficient time elapsed to allow spreading of retrotransposons between parental sub-genomes. The reduction in Tnt2 copies observed on tobacco's T-genome chromosomes and the high proportion of novel Tnt2d insertions detected in tobacco indicate preferential targeting to S-genome chromosomes. As retrotransposons amplify via a cytoplasmic RNA intermediate, this S-genome-specific amplification remains to be explained. It could reflect targeting preferences, e.g. Tnt2 preferentially inserting in or next to itself, or structural features specific to the S-genome that favour Tnt2 integration. Such targeting preferences may also explain why Tnt2d copies are threefold more abundant in modern-day N. sylvestris than in modern-day N. tomentosiformis.

Retrotransposon families are composed of populations that behave differently

Our results demonstrate that a given retrotransposon family can be composed of populations differing in age and activity, as exemplified with Tnt1. The Tnt1-OL13 and Tnt1-OL16 primer pairs were designed in regions differing in levels of sequence conservation between species. Despite the fact that the pools of elements recognized by Tnt1-OL16 must overlap those recognized by Tnt1-OL13, we observed differences in age and behavior of the two populations. Tnt1 population structure in Solanaceae has been shown to resemble viral quasispecies, evolving via selective bursts of amplification of different subfamilies (Casacuberta et al. 1997; Araujo et al. 2001). Our results show that retrotransposon families are likely composed of a continuum of related populations that behave differently, and that SSAP profiles obtained for a given primer are not necessarily representative of the behavior of the entire retrotransposon family. As a consequence, the design of SSAP primers will influence the retrotransposon profile obtained and the interpretation of retrotransposon diversity.

Conclusion

Allopolyploids provide excellent opportunities to study the processes that shape retrotransposon populations in plant genomes. The tobacco genome results from turnover of retrotransposon sequences, with removals counterbalanced by new insertions. We have detected unique behaviours specific to particular retrotransposon populations. Differences may reflect distinct evolutionary histories and activities of particular elements, and perhaps genome distribution and coverage. The retrotransposon populations studied here are present in low to medium copy numbers in tobacco and its diploid progenitors. High copy number elements may show very different patterns of transmission to the hybrid tobacco genome. Nevertheless, our work shows that amplification and removal of retrotransposon sequences can occur rapidly and that the retrotransposon content of a given plant species may be strongly influenced by features specific to each retrotransposon population and by the host evolutionary history, with periods of rapid turnover influenced by allopolyploidy events. Further analyses using newly synthesized tobaccos will be necessary to elucidate the precise timing of these changes. The diploid progenitors of tobacco, N. sylvestris and N. tomentosiformis, are distantly related Nicotiana species (Knapp et al. 2004). It may prove informative to study allopolyploids of a similar age to tobacco, but with more closely related diploid progenitors (e.g., the allopolyploid Nicotiana arentsii derived from progenitors Nicotiana undulata and Nicotiana wigandioides, both in section Undulatae), to determine whether the response of retrotransposon populations to allopolyploidy in these species is similar.