Atlantys is ubiquitous in the genus Oryza
The Ty3-Gypsy Atlantys element was first identified in O. sativa by McCarthy and co-workers as part of an extensive survey to identify and classify LTR-RTs in rice. To isolate and analyze Atlantys elements and LTRs from other Oryza species, a sample of 27 sequenced BACs (totaling more than 4.2 Mbp of sequence) from 11 different Oryza species was searched using LTR sequences from two O. sativa Atlantys elements (Table 1) as queries in nucleotide Basic Local Alignment Search Tool (BlastN) searches under relaxed settings. Complete elements from O. punctata [BB], O. granulata [GG], and O. alta [CCDD] and a nearly complete element (with a truncated 5′ LTR) from O. officinalis [CC] were identified. The length of these elements ranged from 11,226 bp in O. granulata to 13,728 bp in O. sativa subspecies japonica. These elements were classified as members of the Atlantys family because of their reciprocal LTR similarity and the topology of a neighbor-joining (NJ) phylogenetic tree built from the conserved reverse transcriptase (RT) coding domains of the isolated elements and homologous sequences of several other O. sativa Ty3-Gypsy elements. All the RT domains of the putative Atlantys elements clustered with the O. sativa Atlantys sequence in a single clade with 100% bootstrap support (Fig. 1). Furthermore, the structure of the complete Atlantys-like elements (Fig. 2) was conserved across the various species, including an unusual third partial open reading frame (ORF) located close to the 3′ LTR on the strand opposite to the gag_pol coding region. This partial ORF encodes a putative protein with sequence similarity to an “aminotransferase-like” protein from O. sativa. However, this ORF, like all the other Atlantys coding domains, contained several stop codons and frameshift mutations.
Other canonical features of LTR retroelements, such as the primer binding site (PBS) and the polypurine tract (PPT), appeared to be highly conserved in the various Atlantys elements. Fifteen of 18 nucleotides (all except the terminal 5′-CCA-3′) of the Atlantys PBS are complementary to the 3′ end of the tRNAArg isolated in Arabidopsis thaliana, whereas the putative polypurine tract showed few pyrimidines interrupting the long stretch of purines (Fig. 2).
Atlantys solo-LTRs were found in O. alta [CCDD], O. australiensis [EE], O. minuta [BBCC], and O. ridleyi [HHJJ] (one for each BAC). To isolate additional Atlantys LTRs from the wild relatives of rice, LTR sequences from the O. sativa Atlantys element were used as queries in similarity searches against a set of random sheared Oryza sequences (Table 2) generated in a previous project, which enabled us to reconstruct putative Atlantys LTRs for O. glaberrima [AA], O. nivara [AA], and O. rufipogon [AA]. Overall, we obtained 18 Atlantys LTRs from complete elements (10 LTRs), solo LTRs (four), truncated elements (one), and joined random sheared sequences (three). The length of the complete LTRs isolated ranged from 1,299 bp (in O. ridleyi) to 1,833 bp (solo-LTR in O. alta; Table 1). Their overall similarity to the LTR of O. sativa Atlantys ranged from 87.1% (O. nivara Atlantys) down to 50.7% (O. granulata Atlantys).
Dramatic abundance variation of Atlantys elements in the Oryza
We estimated the abundance of Atlantys elements in the Oryza in three ways. First, we determined the number of Atlantys elements (complete, truncated, and solo LTRs) in the genome sequences of both japonica and indica rice using LTRs as the query sequence in similarity searches. Only hits longer than 90% of the query length were taken into account. In O. sativa ssp. indica, a total of 501 Atlantys LTRs were identified: 70 belonged to 35 complete elements (a complete element consists of the two LTRs plus the internal coding domains); 164 were part of truncated elements (one LTR was missing and the remaining LTR had a PBS, a PPT, or the conserved Ty3-Gypsy integrase domain close to one of its ends); 182 were solo-LTRs originating from intra-element recombination (all showed the canonical 5-bp target site duplication, TSD); and 85 could not be classified because no signals such as a PBS, PPT, or TSD were found in their proximity. These unclassified LTRs could be solo-LTRs originating from inter-element unequal recombination, or they could be part of highly rearranged and incomplete elements. In japonica rice, 529 complete LTRs were identified: 102 from 51 complete retroelements, 166 from truncated elements, 162 solo-LTRs, and 99 unclassified. Considering only complete elements and solo-LTRs showing a TSD, the ratio of complete elements to solo-LTRs in indica and japonica rice was roughly 1:5 and 1:3, respectively. It should, however, be noted that the reliability of the indica ratio is questionable because the sequence used to estimate it is not yet complete. Because the settings used in the search were extremely stringent (all hits shorter than 90% of the query length were discarded), these figures underestimate the actual number of Atlantys LTR-related sequences in O. sativa.
A BlastN search collecting all the hits having significant similarity (e-value lower than 1e–10) with the Atlantys LTRs gave much higher figures—1,892 hits in O. sativa ssp. japonica and 1,945 in O. sativa ssp. indica.
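The LTR tallies from the stringent search can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts reported in the text; the dictionary layout and the function name `summarize` are illustrative choices, not part of the original analysis.

```python
# Atlantys LTR counts per category, copied from the stringent search
# (hits longer than 90% of the query length).
ltr_counts = {
    "indica": {"complete": 70, "truncated": 164, "solo": 182, "unclassified": 85},
    "japonica": {"complete": 102, "truncated": 166, "solo": 162, "unclassified": 99},
}


def summarize(counts):
    """Return (total LTRs, complete elements, solo-LTRs per complete element)."""
    total = sum(counts.values())
    # A complete element contributes two LTRs to the tally.
    complete_elements = counts["complete"] // 2
    solo_per_complete = counts["solo"] / complete_elements
    return total, complete_elements, solo_per_complete


for subspecies, counts in ltr_counts.items():
    total, elements, ratio = summarize(counts)
    print(f"{subspecies}: {total} LTRs, {elements} complete elements, "
          f"complete:solo = 1:{ratio:.1f}")
```

The category counts sum to the reported totals (501 and 529), and the solo-LTR counts divided by the number of complete elements give about 5.2 for indica and 3.2 for japonica, matching the rounded 1:5 and 1:3 ratios.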
Second, to estimate the abundance of Atlantys elements in the remaining species, we used Atlantys LTRs as queries in similarity searches against sequences derived from a set of random sheared small insert shotgun libraries generated from the 12 Oryza species under investigation (Table 2). We considered both the number of significant hits (all those with an e-value lower than 1e–10 in BlastN searches) and the number of nucleotides masked (as obtained from RepeatMasker analysis; www.repeatmasker.org). The results of this analysis showed that in all cases, Atlantys seemed to be significantly more abundant in the Officinalis complex (BB, CC, BBCC, CCDD, and EE genomes) than in the AA genomes. In O. alta [CCDD], O. australiensis [EE], O. minuta [BBCC], and O. officinalis [CC], 4.62%, 3.31%, 3.31%, and 3.47% of the bases searched were masked, respectively. A substantial fraction of masked bases was also found in O. punctata [BB] (2.48%) and O. granulata [GG] (2.45%). In contrast to species of the Officinalis complex, Atlantys appeared to be depleted in the AA genomes (0.86% in O. nivara, 0.85% in O. rufipogon, and 0.79% in O. glaberrima) as well as in the [FF] genome of O. brachyantha (0.33%; Table 3).
We estimated the copy number of Atlantys LTR-related sequences in each of the 12 Oryza genomes using the results of BlastN searches and the method proposed by Hawkins et al. Surprisingly, the estimated number of Atlantys LTR-related sequences in several genomes was well above 10,000 copies: 16,293 in O. officinalis; 11,500 in O. coarctata; 18,578 in O. granulata; 28,055 in O. minuta; 22,583 in O. australiensis; 26,466 in O. alta; and 20,929 in O. ridleyi (Table 3). The reliability of this approach was confirmed by analyzing 10 sets of 3,000 in silico random sheared O. sativa sequences each. The 22 significant hits obtained on average translated into 1,426 predicted instances for O. sativa ssp. japonica, which is in good agreement with the 1,892 hits obtained when the complete O. sativa ssp. japonica genome sequence was scanned using BlastN (Table 3).
Finally, to approximate the total genomic contribution of Atlantys-related sequences to the Oryza genus, we assumed that the ratio of complete elements to solo LTRs was 1:3, as previously calculated for O. sativa ssp. japonica (we did not use the ratio calculated for O. sativa ssp. indica because we consider the map-based O. sativa ssp. japonica sequence more reliable), and that the Atlantys element and LTR sizes were 13,000 and 1,500 bp, respectively. When these assumptions were applied to the rice RefSeq, we obtained a figure of 4.99 Mbp, which is about 6% lower than the number of bases actually masked using RepeatMasker (5.32 Mbp). Under these assumptions, we calculated that over 8.31%, 7.90%, 7.93%, 5.93%, and 7.41% of the O. alta, O. minuta, O. officinalis, O. punctata, and O. australiensis genomes, respectively, are composed of Atlantys-related sequences. Furthermore, Atlantys-related elements also appeared to make up a considerable portion of the O. ridleyi and O. granulata genomes, approximately 66.3 Mbp and 58.8 Mbp, respectively (Table 3).
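The exact arithmetic behind the 4.99 Mbp figure is not spelled out in the text. One reading that reproduces it is sketched below, under two assumptions that are ours rather than the authors': (i) every predicted LTR-related instance corresponds to one LTR, and (ii) the 1,426 instances predicted for O. sativa ssp. japonica are the input. With a 1:3 ratio, each complete element (two LTRs, 13,000 bp) is accompanied by three solo LTRs (1,500 bp each), so five LTRs account for 17,500 bp. The function name and parameters are hypothetical.

```python
def atlantys_contribution_bp(n_ltrs, element_bp=13_000, solo_ltr_bp=1_500,
                             solo_per_complete=3):
    """Scale an LTR count to base pairs under the 1:3 complete:solo assumption.

    Assumption (not stated in the source): each counted instance is one LTR,
    so one complete element plus `solo_per_complete` solo LTRs accounts for
    2 + solo_per_complete counted LTRs.
    """
    ltrs_per_unit = 2 + solo_per_complete
    bp_per_unit = element_bp + solo_per_complete * solo_ltr_bp
    return n_ltrs * bp_per_unit / ltrs_per_unit


# 1,426 predicted instances for japonica (from the in silico shearing test).
estimate_bp = atlantys_contribution_bp(1_426)
masked_bp = 5.32e6  # bases masked by RepeatMasker, as reported in the text
shortfall = (masked_bp - estimate_bp) / masked_bp

print(f"estimated contribution: {estimate_bp / 1e6:.2f} Mbp")
print(f"shortfall relative to RepeatMasker: {shortfall:.1%}")
```

Under these assumptions the estimate comes to about 4.99 Mbp, roughly 6% below the masked total, which agrees with the values quoted for the rice RefSeq. The sketch is not claimed to reproduce the per-species figures in Table 3, which may rest on additional parameters.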
Combined, these results suggest that the Atlantys transposable element family is significantly responsible for genome size increase in many of the Oryza genomes studied in this work and especially in those belonging to the Officinalis complex.
Phylogenetic relationships and variability of Atlantys in the Oryza
To investigate the phylogenetic relationship of the Atlantys elements in the Oryza, 137 paralogous Atlantys RT sequence domains, isolated from the random sheared libraries, were aligned and used to build a neighbor-joining phylogenetic tree (Fig. 3). The results showed that paralogs from different genome types usually do not mix in the same clades. This observation also held true in many cases for paralogs from different species (with the exception of polyploids to be discussed later). These results indicate that proliferation of the Atlantys elements in the Oryza took place after events leading to the formation of different genome types and in many cases even after the occurrence of single speciation events.
The evolutionary history of Atlantys was analyzed in detail in the polyploids O. minuta [BBCC] and O. alta [CCDD] and in their diploid BB and CC genome counterparts. To obtain greater numerical power in our analysis, we investigated multiple sequence alignments between the first 5′ 300 nucleotides of LTR sequences rather than using the RT sequences. The NJ tree made using O. punctata, O. officinalis, O. minuta, and O. alta Atlantys LTRs showed a constant mixing of O. punctata and O. minuta paralogs as well as of O. officinalis and O. minuta paralogs (Fig. 4). In particular, the two O. minuta counterparts (BB and CC) separate accordingly, mixing with O. punctata and O. officinalis elements, respectively. This topology suggests that the retrotranspositional activity of the Atlantys element in these species took place before the polyploidization events leading to O. minuta. In contrast, most of the Atlantys LTR sequences of O. alta cluster in a separate clade, thereby indicating that the retrotranspositional events happened after polyploidization leading to O. alta (Fig. 4). Both scenarios were also observed when RT sequences were used for phylogenetic analysis (data not shown).
Nucleotide diversity of Atlantys retrotransposons in different Oryza species was studied by analyzing the alignment of 137 Atlantys RT paralogs. We calculated nucleotide diversity using the Jukes–Cantor method (Pi-JC) and the average number of nucleotide differences (K). Pi-JC values varied from 0.09 in O. punctata [BB] to 0.28 in O. coarctata [HHKK] and the K parameter ranged from 40.6 in O. granulata [GG] to 91.9 in O. coarctata [HHKK]. With the exception of O. alta, the highest variability values were found in the polyploid species, which reflects the contribution of Atlantys elements from each of the two subgenomes. However, the interpopulation Pi value was greater than any of the single populations, again confirming that the timing of retrotransposition events primarily occurred after speciation (Table 4).
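As an illustration of the two statistics used above, the following sketch computes K (the average number of pairwise nucleotide differences) and a Jukes–Cantor corrected nucleotide diversity from an alignment. It is a minimal sketch, not the software used in the study: it assumes the correction is applied to each pairwise comparison before averaging, and that sites with gaps or ambiguous bases are dropped pair by pair; both choices, the function names, and the toy alignment are assumptions.

```python
from itertools import combinations
from math import log

VALID_BASES = set("ACGT")


def pairwise_differences(seq_a, seq_b):
    """Return (mismatches, compared sites), skipping gaps and ambiguous bases."""
    diffs = sites = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in VALID_BASES and y in VALID_BASES:
            sites += 1
            diffs += x != y
    return diffs, sites


def jukes_cantor(p):
    """Jukes-Cantor distance for a proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * log(1 - 4 * p / 3)


def diversity(aligned_seqs):
    """Return (K, Pi-JC) averaged over all pairs of aligned sequences."""
    total_diffs = total_jc = 0.0
    n_pairs = 0
    for seq_a, seq_b in combinations(aligned_seqs, 2):
        diffs, sites = pairwise_differences(seq_a, seq_b)
        total_diffs += diffs
        total_jc += jukes_cantor(diffs / sites)
        n_pairs += 1
    return total_diffs / n_pairs, total_jc / n_pairs


# Toy alignment for illustration only (not Atlantys data).
toy = ["ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
k, pi_jc = diversity(toy)
print(f"K = {k:.3f}, Pi-JC = {pi_jc:.4f}")
```

Because the Jukes–Cantor correction accounts for multiple substitutions at the same site, Pi-JC is always at least as large as the uncorrected proportion of differences, and the gap widens as sequences become more divergent.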
To investigate nucleotide diversity in greater depth, we measured Pi-JC values for 257 complete Atlantys LTRs (both solo-LTRs and LTRs from complete elements) identified in the O. sativa ssp. japonica reference sequence. High levels of diversity were observed in four regions: the 5′ and 3′ LTR ends, plus two internal regions, as shown in Fig. 5. A similar pattern was also found for the indica subspecies reference genome sequence (data not shown). We then tried to correlate this variation with the presence of putative biological signals along the LTR sequence. A search for putative TATA boxes revealed seven positions, five of which were situated in low-diversity regions and two in more variable regions (Fig. 5). While it is impossible to validate the function of the putative TATA boxes in silico, it should be pointed out that the TATA box found around 400 nucleotides downstream of the 5′ LTR end is highly conserved in all 14 complete Atlantys LTRs isolated from the other Oryza species (Fig. 5).
Solo-LTRs, timing of Atlantys retrotransposition, and removal through unequal homologous recombination
Several mechanisms have been proposed to explain the ability of an organism to counteract genome size expansion caused by LTR retroelement transposition, such as unequal homologous recombination at LTR sites leading to the creation of solo-LTRs. Although we could not study this mechanism in the wild relatives of rice, because the random sheared sequence reads are short relative to full-length Atlantys elements, we were able to determine the ratio of full-length Atlantys elements to solo-LTRs in the O. sativa ssp. japonica and indica reference sequences at 1:3 and 1:5, respectively. These ratios were significantly lower than the expected ratio of 1:1.5 and provided evidence that unequal homologous recombination may have played a significant role in the attenuation of Atlantys copy number in domesticated rice. Phylogenetic analysis of a random set of 91 LTRs from complete Atlantys elements and solo-LTRs from both rice subspecies could not resolve the LTRs into distinct clades, suggesting that the counteracting mechanism leading to the formation of solo-LTRs was contemporary with the retrotranspositional events (Fig. 6).
Furthermore, it should be noted that many of the BACs used to identify Atlantys elements from the wild relatives of rice also contained solo LTRs (e.g., O. alta, O. minuta, O. australiensis, and O. ridleyi). Although the sample is limited, the presence of solo LTRs provides additional evidence that removal of complete Atlantys elements by unequal homologous recombination was active in other Oryza species.
To determine the timing of Atlantys element insertion into the japonica and indica genomes, we used the “genome paleontology” approach proposed by SanMiguel et al., whereby the divergence between the two LTRs of each complete Atlantys element (51 from O. sativa ssp. japonica and 35 from ssp. indica) is converted into an insertion time using a substitution rate of 2 × 10⁻⁸ substitutions per synonymous site per year. Results from both subspecies were very similar and were therefore evaluated together. The large majority of insertion events appear to have occurred recently in evolutionary time, as more than 97% of the elements inserted less than 3 million years ago (MYA) and only 2.8% inserted more than 3 MYA (Table 5). Although the sample size is small, estimates of the insertion times of Atlantys elements in the other Oryza species (Table 1), where all the insertions date back to less than 1 MYA, suggest a time scale of insertion events similar to that obtained in domesticated rice. To further support the recent retrotranspositional history of Atlantys elements in Oryza, we used blastclust (ftp://ftp.ncbi.nih.gov/blast/documents/blastclust.html) to cluster the 5′ 300-bp Atlantys LTR tracts from each of the Oryza species individually. For many of the species with a sufficient number of isolated tracts (at least 10), it was possible to find LTR tracts sharing more than 97% similarity over their complete length. In O. alta, five out of 23 tracts clustered; in O. granulata, seven out of 51 clustered; and in O. minuta and O. officinalis, 12 out of 51 and five out of 27 clustered, respectively. This degree of similarity could be roughly translated, using a molecular paleontology approach, into insertion times as recent as 1 MYA, thereby supporting the concept that the retrotranspositional history of the Atlantys element family in the Oryza is recent.
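The dating logic can be sketched as follows. The two LTRs of an element are identical at insertion and then diverge independently, so a divergence K between them corresponds to an insertion time T = K / (2r), where r is the substitution rate. The sketch below uses the rate given in the text; the Jukes–Cantor correction and the function names are assumptions made for illustration, since the distance model actually used is not stated here.

```python
from math import log

SUBSTITUTION_RATE = 2e-8  # substitutions per site per year, as given in the text


def jc_distance_from_identity(identity):
    """Jukes-Cantor distance from fractional identity (assumed distance model)."""
    p = 1.0 - identity
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * log(1 - 4 * p / 3)


def insertion_time_years(divergence, rate=SUBSTITUTION_RATE):
    """Insertion time T = K / (2r): both LTRs accumulate substitutions."""
    return divergence / (2 * rate)


# Two LTR tracts sharing 97% identity, the clustering threshold used above.
t = insertion_time_years(jc_distance_from_identity(0.97))
print(f"estimated insertion time: {t / 1e6:.2f} million years")
```

With 97% identity and this rate, the estimate falls below 1 million years, consistent with the statement that tracts sharing more than 97% similarity correspond to insertions as recent as about 1 MYA.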