DcSto: carrot Stowaway-like elements are abundant, diverse, and polymorphic

We investigated nine families of Stowaway-like miniature inverted-repeat transposable elements (MITEs) in the carrot genome, named DcSto1 to DcSto9. All of them were AT-rich and shared a highly conserved 6 bp-long TIR typical for Stowaways. The copy number of DcSto1 elements was estimated as ca. 5,000 per diploid genome. We observed preference for clustered insertions of DcSto and other MITEs. Distribution of DcSto1 hybridization signals revealed presence of DcSto1 clusters within euchromatic regions along all chromosomes. An arrangement of eight regions encompassing DcSto insertion sites, studied in detail, was highly variable among plants representing different populations of Daucus carota. All of these insertions were polymorphic, which most likely suggests a very recent mobilization of those elements. Insertions of DcSto near carrot genes and presence of putative promoters, regulatory motifs, and polyA signals within their sequences might suggest a possible involvement of DcSto in the regulation of gene expression.

Electronic supplementary material: The online version of this article (doi:10.1007/s10709-013-9725-6) contains supplementary material, which is available to authorized users.


Introduction
Transposable elements (TEs), DNA segments capable of changing their chromosomal position, are present in the genomes of almost all living organisms. In plants, TEs can constitute from ca. 10 % of the Arabidopsis thaliana genome (Arabidopsis Genome Initiative 2000) to 85 % of the B73 maize genome (Schnable et al. 2009). Based on the mechanism of transposition, TEs are divided into two groups: class I, retrotransposons, and class II, DNA transposons. Retrotransposons transpose via an RNA intermediate (a 'copy-and-paste' mechanism) and each transposition event leads to an increase in their copy number. DNA transposons (class II) comprise two subclasses, distinguished by the number of strands that are cleaved during transposition. Class II TEs belonging to subclass 1 are mobilized via a 'cut-and-paste' mechanism, where both DNA strands are cleaved at each end during transposition, while mobilization of subclass 2 elements does not require double-strand cleavage (Finnegan 1989; Wicker et al. 2007). Subclass 1 is further divided into two orders, i.e. TIR, characterized by the presence of terminal inverted repeats (TIRs), and Crypton, identified in fungi and devoid of TIR sequences.
Miniature inverted-repeat transposable elements (MITEs) are usually the most abundant group of DNA transposons. They are characterized by a small size (<600 bp) and similarity of their TIRs to the termini of related Class II TIR transposons (Wicker et al. 2007). Stowaway MITEs were described in maize (Bureau and Wessler 1994) as short (<500 base pairs), AT-rich, having a potential to form secondary structures and forming 2-nt 'TA' target site duplications (TSD) upon insertion. Some Stowaways may provide polyadenylation sites and cis-acting regulatory regions to adjacent genes. Stowaways were identified in both monocots and dicots (Feschotte et al. 2003). Due to similarity of TIRs (5′-CTG CCT CCR T-3′, where R stands for A or G) and TSD, it was suggested that Stowaway elements could be mobilized in trans by transposases encoded by Tc1/mariner-like DNA transposons (Feschotte and Wessler 2002; Feschotte et al. 2003, 2005; Macas et al. 2005). Also, it was shown that a single source of transposase can interact with Stowaways belonging to several distinct families. The transposase has a relatively weak binding specificity and the function of TIRs might be suppressed or enhanced by an internal sequence of the element (Feschotte et al. 2005; Yang et al. 2009). Excision of mariners and Stowaways leaves a characteristic footprint, which is element-specific, corresponds to the 5′ and 3′ TIR nucleotides, and can be of variable length (Lampe et al. 1996). Thus, a previously occupied locus can be distinguished from the ancestral empty site. Stowaway elements are relatively hypomethylated and frequently occur in genic regions (Mao et al. 2000; Takata et al. 2007). Recently, an active Stowaway element was identified in potato. Mobilization took place in the course of leaf protoplast cultures and caused somaclonal variation of skin color of potato tubers (Momose et al. 2010).
Availability of genomic sequences significantly increased the number of TE families identified and facilitated their classification. Nevertheless, in species for which sequence data are restricted, new transposons are often identified following the analysis of knock-out mutations caused by TE insertions into coding regions.
Carrot is one of the most important vegetable crops and a major source of carotenoids that are precursors of vitamin A. However, sequencing data for carrot are not extensive and little is known about the organization of the repetitive portion of the carrot genome, including transposable elements. To date, only two families of TEs have been identified in the carrot genome, i.e. Tdc1 of the CACTA superfamily (Ozeki et al. 1997; Itoh et al. 2003) and DcMaster of the PIF/Harbinger superfamily (Grzebelus et al. 2007), the latter associated with a family of Tourist-like MITEs named Krak. In the present study, we investigated carrot Stowaway-like MITEs, DcSto (Daucus carota Stowaway-like), estimated their copy number, identified local rearrangements created upon insertion, and pointed to putative functional implications of their presence for adjacent coding regions.

Plant materials
DNA was extracted from plants representing carrot diversity, comprising cultivated carrots of different origin and breeding history (supplementary Table 1), previously used for a carrot genetic diversity study.

A primer complementary to the TIR sequence of DcSto1 (DcS-TIR) was designed manually based on the sequence of the DcSto1 element identified in the Rs locus. To design the DcS-TIR primer, both characteristic Stowaway features, the 16 bp consensus TIR sequence and the characteristic TSD, were considered. To amplify full-length DcSto1 elements, the reaction was prepared in 10 µl and contained 20 ng genomic DNA, 1 mM DcS-TIR primer (5′-TAC TCC CTC CGT CCC ACC-3′), 0.25 mM dNTPs (Fermentas), 0.5 U Taq DNA polymerase (Fermentas) and 1× Taq buffer. The amplification profile was as follows: 94°C (1 min), 30 cycles of 94°C (30 s), 50°C (40 s), 68°C (1 min) and 68°C (4 min). PCR products were separated in 1 % agarose gels, purified, cloned, and sequenced.

Estimation of the Copy Number of DcSto1
Copy number of DcSto1 elements was estimated essentially as proposed by Grzebelus et al. (2006). Two rounds of amplification with the DcS-TIR primer were carried out. The first PCR was set up to check whether at least one element was present per 384-well BAC plate and to confirm the specificity of amplification. For this purpose, pools by plate of B8503 carrot genomic BACs (Cavagnaro et al. 2009) were used. The PCR reaction contained 10-30 ng BAC DNA, 1 mM DcS-TIR primer (5′-TAC TCC CTC CGT CCC ACC-3′), 0.25 mM dNTPs (Fermentas), 0.5 U Dream Taq DNA polymerase (Fermentas) and 1× Dream Taq buffer. The thermal profile was as follows: 94°C (2 min), 30 cycles of 94°C (30 s), 50°C (40 s), 68°C (1 min) and 68°C (5 min). PCR products were separated in 1 % agarose gels; three products of expected size and all amplicons larger than expected were cut out, purified, cloned, and sequenced. Subsequently, PCR under the same conditions as above was carried out using DNAs of 141 randomly chosen BAC clones. Amplicons were separated in 1 % agarose gels and scored. We estimated the copy number of DcSto1 elements in the carrot genome taking into account that the average BAC clone size was 0.121 Mbp (Cavagnaro et al. 2009) and that the 2n carrot genome was approximately 980 Mbp (Bennett and Leith 1995).
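The arithmetic behind this estimate can be sketched as follows. This is a minimal illustration, not the authors' script: it assumes that each PCR-positive BAC clone is counted as carrying one DcSto1 copy, and it uses the count reported in the Results (87 positive clones out of 141). The function name is hypothetical.

```python
def estimate_copy_number(positive_bacs, screened_bacs, bac_size_mbp, genome_size_mbp):
    """Return (kb of genomic DNA per element, estimated copies per genome).

    Assumption: every positive BAC clone contributes exactly one element,
    so the result is best read as a rough, conservative estimate.
    """
    screened_mbp = screened_bacs * bac_size_mbp       # total DNA sampled
    mbp_per_element = screened_mbp / positive_bacs    # average spacing
    copies = genome_size_mbp / mbp_per_element        # extrapolate to genome
    return mbp_per_element * 1000, copies


# Values taken from the text: 87 of 141 BACs positive, 0.121 Mbp average
# insert size, 980 Mbp diploid genome.
kb_per_element, copies = estimate_copy_number(87, 141, 0.121, 980)
print(f"one element per ~{kb_per_element:.0f} kb, ~{copies:.0f} copies per 2n genome")
```

With these inputs the sketch gives roughly one element per 196 kb and just under 5,000 copies, in line with the figures reported in the Results.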

Inverse PCR, Design, and Validation of Site-Specific Primers
Inverse PCR was set up as described by Collins and Weissman (1984). Ca. 100 ng of genomic DNA was digested with 1 U TaqI (5′-T^CGA-3′), MspI (5′-C^CGG-3′), EcoRI (5′-G^AATTC-3′), or NdeI (5′-CA^TATG-3′) (Fermentas) at 37°C for 3 h in a total volume of 10 µl and incubated for 20 min at 65°C for thermal inactivation of the enzyme. Restriction sites of the applied enzymes were not present within the DcSto sequence. To achieve intramolecular circularization, 5 µl of each digestion mixture was incubated overnight at 4°C with 5 U of T4 DNA ligase (Fermentas).
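The requirement that none of the enzymes cuts inside the element can be checked with a simple string search. The sketch below is illustrative only: the recognition sites are those listed above, the helper name is hypothetical, and the test sequences are toy strings rather than real DcSto sequences. Because all four sites are palindromic, scanning one strand is sufficient.

```python
# Recognition sites of the four enzymes named in the text.
ENZYME_SITES = {
    "TaqI": "TCGA",
    "MspI": "CCGG",
    "EcoRI": "GAATTC",
    "NdeI": "CATATG",
}


def enzymes_cutting(sequence):
    """Return the sorted names of enzymes whose site occurs in `sequence`."""
    seq = sequence.upper()
    return sorted(name for name, site in ENZYME_SITES.items() if site in seq)


# Toy sequences (not real DcSto elements):
print(enzymes_cutting("AAGAATTCAA"))  # contains an EcoRI site
print(enzymes_cutting("ATATATAT"))    # no site for any of the four enzymes
```

An element is suitable for inverse PCR with a given enzyme only if that enzyme is absent from the returned list.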
Local BLAST search was used to mine BAC-end sequences (BES) database with DcSto1 sequence as a query (e-value cutoff was 0.01). Identified BES sequences with insertions of DcSto elements carrying characteristic TSD, TIR sequence, and for which enough flanking sequence was available, were used for further analysis. Boundary sequences of insertion sites obtained following iPCR or in silico analysis of BES, were used to design site-specific primers with Primer3 (v. 0.4.0) using default parameters.
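The structural criteria used to accept a candidate insertion (a 'TA' TSD on both sides and terminal inverted repeats) can be expressed as a short check. This is a sketch under stated assumptions, not the procedure used in the study: the function names, the requirement of a perfect TIR, the minimum TIR length of 6 bp, and the 500 bp size limit are illustrative choices, and the sequences are toy examples.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def perfect_tir_length(element):
    """Length of the perfect terminal inverted repeat of `element`."""
    rc = reverse_complement(element)
    n = 0
    for a, b in zip(element.upper(), rc):
        if a != b:
            break
        n += 1
    # A TIR cannot be longer than half of the element.
    return min(n, len(element) // 2)


def looks_like_stowaway(locus, start, end, min_tir=6, max_len=500):
    """Check a candidate element at locus[start:end] (hypothetical criteria)."""
    locus = locus.upper()
    element = locus[start:end]
    left_tsd = locus[max(start - 2, 0):start]
    right_tsd = locus[end:end + 2]
    return (
        left_tsd == "TA"
        and right_tsd == "TA"
        and len(element) < max_len
        and perfect_tir_length(element) >= min_tir
    )


# Toy element: 8 bp TIRs around a short AT-rich core, flanked by TA.
toy_element = "CTCCCTCC" + "ATATATTTAA" + "GGAGGGAG"
toy_locus = "GGC" + "TA" + toy_element + "TA" + "CCG"
print(perfect_tir_length(toy_element))
print(looks_like_stowaway(toy_locus, 5, 5 + len(toy_element)))
```

Real elements often carry imperfect TIRs, so a practical screen would tolerate a few mismatches instead of stopping at the first one.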
Site-specific PCR was carried out in 10 µl containing around 20 ng genomic DNA, 1 mM forward and reverse primer, 0.25 mM dNTPs (Fermentas), 0.5 U Taq DNA polymerase (Fermentas), and 1× Taq buffer supplied with MgCl2 (Fermentas). The amplification profile was as follows: 94°C (1 min), 30 cycles of 94°C (30 s), variable annealing temperature from 55 to 58°C (depending on the primer combination) (30 s), 68°C (variable time depending on the primer combination, from 1 to 3 min), and 68°C (6 min). All primer sequences, the corresponding annealing temperatures and times of elongation are provided in supplementary Table 2. Products were separated on 1 % agarose gels and selected amplicons were sequenced.

Sequence evaluation and analysis
DcSto sequences were aligned using ClustalX (Thompson et al. 1997) and manually edited in BioEdit (Hall 1999). Genetic distances were calculated with Dnadist in PHYLIP (Felsenstein 1996) using the Kimura two-parameter model of nucleotide substitution. A Neighbor Joining (NJ) tree was produced with Neighbor and plotted with TreeView (Page 1996). To represent relationships among DcSto1 elements amplified from different sources, an NJ tree was generated using MEGA 5.05 (Tamura et al. 2011). Consensus sequences of DcSto1 to DcSto9 were used to predict secondary structures in RNAfold (Hofacker 2003), to search for putative promoter regions using TSSP (Softberry), 3′-end cleavage and polyadenylation sites using POLYAH (Softberry), and regulatory DNA sequences in the RegSite database using NSITE-PL (Softberry), and to identify transposons inserted in or close to coding regions in the sequences deposited in GenBank using the blastn algorithm (Altschul et al. 1997).
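For readers unfamiliar with the Kimura two-parameter model, the textbook closed-form distance is d = -(1/2) ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of transitions and transversions between two aligned sequences. The sketch below implements that formula only; it is not a reimplementation of Dnadist, whose estimates may differ in detail (for example in how the transition/transversion ratio and gaps are handled). The function name is hypothetical.

```python
import math

TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def k2p_distance(seq1, seq2):
    """Closed-form Kimura two-parameter distance for two aligned sequences.

    Columns containing gaps or ambiguous bases are skipped (an assumption
    of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        if (a, b) in TRANSITIONS:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition in ten sites: P = 0.1, Q = 0.
print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

The formula is undefined for highly diverged pairs (when the argument of the logarithm is not positive), in which case `math.log` raises a `ValueError`.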

Fluorescence in situ hybridization
Localization of DcSto1 elements on chromosomes of cv. Amsterdam 3 (AS33) was carried out by means of fluorescence in situ hybridization (FISH). The DcSto1 probe was amplified with DcS-TIR primer and cloned into pGEM-T vector. All steps of multi color FISH were carried out as described by Nowicka et al. (2012).

Identification and characteristics of DcSto elements
Upon a routine analysis of length polymorphism at the rs locus (Yau et al. 2005) in carrot breeding materials of American origin, we identified an undescribed 274 bp-long insertion characterized by a two-nucleotide 'TA' TSD and 16 bp-long TIRs. Based on the presence of TSD and TIR sequences, the inserted element was classified as a Stowaway-like MITE and named DcSto1-rs. The sequence of that element served as a starting point to identify more copies of DcSto. In total, we identified 89 elements using different methods (Table 1). On the basis of the commonly accepted 80-80-80 criterion (Wicker et al. 2007), the DcSto elements were divided into nine families, DcSto1 to DcSto9. Grouping of twenty DcSto for which the full sequence was available (i.e. excluding those identified by amplification with the DcS-TIR primer) is shown in Fig. 1. DcSto1, DcSto2, and DcSto3 were represented by more than one element in each category. Elements DcSto4, DcSto5, and DcSto6 were related to DcSto1, DcSto2, and DcSto3, respectively, while DcSto7, DcSto8, and DcSto9 were more distant.
Among the DcSto elements identified, 73 copies represented DcSto1 and their average sequence similarity was 86 %. They could be grouped into branches on the Neighbor-Joining tree; however, the grouping did not correspond with their origin from cultivated versus wild Daucus (Fig. 2). For DcSto2 and DcSto3, four and six copies, respectively, were observed and each of the remaining six families was represented by one element. The TIR length varied from 13 bp for DcSto4 to complete sequence folding into almost perfect hairpin structures in DcSto3, DcSto6, and DcSto9 (supplementary Figure 1). However, a highly conserved 6 bp-long terminal motif CTCCCT was always distinguishable (Fig. 3). As observed for other MITEs, the AT content of DcSto sequences was high (60-72 %).
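The 80-80-80 criterion can be illustrated on a pairwise alignment: two elements are placed in the same family when at least 80 bp are aligned, the aligned part covers at least 80 % of the sequence, and identity within it is at least 80 %. The sketch below is one possible reading of the rule, not the authors' procedure; in particular, measuring coverage against the longer of the two sequences is an assumption, and the function name is hypothetical.

```python
def same_family(aln_a, aln_b, min_identity=0.8, min_coverage=0.8, min_aligned_bp=80):
    """Apply an 80-80-80 style test to two gapped, equal-length alignment rows."""
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    len_a = sum(c != "-" for c in aln_a)
    len_b = sum(c != "-" for c in aln_b)
    aligned = matches = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # gap columns do not count as aligned positions
        aligned += 1
        if a == b:
            matches += 1
    if aligned < min_aligned_bp:
        return False
    identity = matches / aligned
    coverage = aligned / max(len_a, len_b)  # assumption: relative to longer sequence
    return identity >= min_identity and coverage >= min_coverage


# Toy 100 bp sequences (not real DcSto elements):
reference = "ACGT" * 25
diverged = "T" * 30 + reference[30:]      # 77 % identical to the reference
print(same_family(reference, reference))  # identical sequences
print(same_family(reference, diverged))   # below the identity threshold
```

Applied to all pairs of elements, such a test yields a graph whose connected groups correspond to candidate families.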

Abundance and genomic distribution of DcSto1
PCR amplification with the DcS-TIR primer was set up to screen 141 randomly chosen BAC clones. Amplicons of expected size were present in 87 of them. Thus, on average one DcSto1 element was present per 196 kb and the copy number of DcSto1 could be estimated as ca. 5,000 per diploid carrot genome. The high copy number of DcSto1 elements was confirmed by Southern hybridization with the DcSto1 element used as a probe. Hybridization to EcoRI-digested DNA of cultivated and wild carrots resulted in a smear, indicating presence of DcSto1 in numerous copies (supplementary Figure 2). We also investigated the physical distribution of DcSto1 along the carrot chromosomes using fluorescence in situ hybridization. DcSto1 elements were present on both arms of all chromosomes but they were absent in the centromeric, telomeric, subtelomeric, and nucleolar organizer regions. In general, DcSto1 showed clustered distribution along the euchromatic regions of chromosome arms and their co-localization with the DAPI-stained blocks of heterochromatin was not observed (Fig. 4).

Local rearrangements resulting from the activity of DcSto elements
PCR amplification of DcSto transposons from carrot BAC clones using the DcS-TIR primer produced three bands of sizes longer than expected for DcSto1. One of them was a DcSto1 element carrying an additional 59 nt similar to the terminal part of the transposon at the 5′ end (Fig. 5b). Those extra nucleotides, excluding the sequence of the TIR primer, were more similar to other DcSto copies than to the adjacent element, suggesting a nested insertion into another DcSto element. Two other products were derived from DcSto1 carrying nested insertions of a complete DcSto2 element or an unrelated Tourist-like MITE (Fig. 5c, d).
To characterize local variation in DcSto insertion sites, primers flanking DcSto insertions identified in BES were used to re-amplify eight loci in unrelated individuals. All insertion sites were polymorphic among the analyzed plants of cultivated carrot. In the case of three loci, besides the expected size variants representing empty and occupied sites, more complex rearrangements were also observed. In the BS2.1 region, a complete DcSto1 was identified, but also two of its derivatives, likely resulting from abortive gap repair following an excision event. Additionally, one of the derived DcSto1 variants was accompanied by an insertion of a DcSto6 element 20 bp upstream of the DcSto1 insertion (Fig. 6). We have not identified a variant carrying a solo insertion of DcSto6, which suggests that the latter element was inserted into the BS2.1 variant with the internally truncated DcSto1 element already present. The BS2.2 region was characterized by alternative insertions of two different DcSto elements, DcSto3 and DcSto4, in exactly the same position. Also, a short deletion around the insertion site, likely resulting from an excision event, was observed in one of the variants (supplementary Figure 3). In the BS4 region, three independent insertions of DcSto1, DcSto5, and an uncharacterized 549 bp-long indel were found in different genetic backgrounds (supplementary Figure 4). DcSto5 was inserted 48 bp upstream of the DcSto1 insertion site and a variant carrying insertions of both elements was not identified.
The presence of non-fixed DcSto insertions in Daucus carota suggests an extensive recent transpositional activity of those elements. One particularly interesting case of DcSto mobilization came from an analysis of size polymorphism in the first intron of the carrot chxb1 gene. A longer version of the intron, resulting from the insertion of a DcSto2 element was present and segregating only in AS38, one of more than 160 screened accessions (supplementary Figure 5), indicating a very recent insertion event limited to a single population of cultivated carrots.

Presence of DcSto elements in the vicinity of coding regions
DcSto elements were identified upon examination of published sequences in the vicinity of genes of Apioideae (carrot, Petroselinum crispum, and Bupleurum kaoi), especially within 5′ UTR regions and upstream of transcription start sites (Table 2). Complete elements were identified upstream of two carrot genes, i.e. gDcPAL3 and Inv*Dc5. Interestingly, other DcSto copies associated with genic regions were devoid of TIR and the typical TSD at one or both ends, thus they were immobilized. DcSto6 showed 70 % similarity over the entire element to the complete Stowaway-like MITE in parsley, which we named PcSto (Petroselinum crispum Stowaway), identified in a region upstream of four P. crispum genes and in an intron of one gene (Table 2). The average sequence similarity of the four identified PcSto elements was 74 %, and only PcSto-PR2 and PcSto-CMPG1 were over 80 % similar. Interestingly, DcSto6 and DcSto3 were more similar to the PcSto elements than to other carrot DcSto elements (supplementary Figure 6). DcSto elements belonging to five families carried putative promoters and TATA boxes, while four families might provide polyA sites for adjacent genes. Also, at least five putative regulatory motifs were present within sequences of every DcSto family, and only three families, i.e. DcSto5, DcSto7, and DcSto8, carried fewer than ten putative regulatory motifs (Table 3).

Discussion
The present study demonstrated that DcSto MITEs were abundant and diverse in the carrot genome. We investigated the distribution of DcSto1 elements across cultivated and wild carrot, as well as two closely related species, D. capillifolius and D. sahariensis. We did not observe any DcSto1 lineages differentiating the investigated groups of accessions, which might reflect their previously documented intra- and interspecific crossability. We showed that DcSto elements in carrot were present in thousands of copies, which is in agreement with the general characteristics of Stowaway elements present in other plants. For example, 18 Stowaway-like MITE families, present in over 18,000 copies, were described in the wheat genome (Yaakov et al. 2013). Also, analysis of Stowaway elements in the relatively small genome of rice revealed presence of over 22,000 Stowaway elements divided into 36 families (Feschotte et al. 2003).
The high level of DcSto insertion polymorphism and the low frequency of carrot plants harboring insertions of DcSto1 into the rs gene and DcSto2 into the chxb1 gene suggest recent mobilization of DcSto elements. Insertion and excision events resulted in high local variability, including deletions of sequences surrounding the excision site, which was reported previously in rice (Yang et al. 2006, 2009). We speculate that the DcSto3 and DcSto6 families were vertically inherited from the common ancestor by the Daucus and Petroselinum lineages, as suggested by their high similarity to PcSto identified in the P. crispum genome. Despite the apparent long evolutionary history of these families, indicated by their likely presence in the genome of a common ancestor of Daucus and Petroselinum, a recent mobilization event of DcSto6 in the BS2.1 region was documented. As Stowaway MITEs were shown to be evolutionarily related to autonomous elements from the Tc1/mariner superfamily (Turcotte et al. 2001; Menzel et al. 2006), it has been commonly accepted that their mobilization relies on the availability of mariner transposases (Feschotte et al. 2003). However, autonomous elements serving as transposase donors for DcSto elements remain to be identified. As proposed by Jiang et al. (2004), MITE precursors originated from autonomous elements, but their proliferation was driven later by transposases encoded by autonomous elements that were not directly related but still capable of recognizing MITE termini. Also, relatively few changes in TIRs may have a dramatic effect on transposase binding to the element ends (Lampe et al. 2001). According to this scenario, divergence of TIR sequences of DcSto elements may have resulted in mobilization of a particular group by different mariner-like transposases within overlapping time-frames. A hypothesis of multiple bursts of transposition was also proposed to explain the diversity and evolutionary history of Medicago PIF/Harbinger-related MITEs.
We found evidence for clustered insertions of DcSto. FISH with DcSto1 revealed clustered signals over all carrot chromosomes. Analysis of the local structure of carrot DcSto insertion sites revealed clustered insertions (BS2.1) and independent insertions of DcSto into the same position or very close to the insertion site of other elements in plants of different origin (BS2.2, BS4). Analysis of insertion site preference of maize and rice MITEs showed that insertions of MITEs into other members of the same family were common. It was proposed that nested and clustered insertions may act as a mechanism limiting transposition frequency and as a 'safe haven' where further integration of transposons would be tolerated (Rothnie et al. 1990; Jiang and Wessler 2001).
Independent insertions of Stowaway elements into the same location were shown in rice R genes (Hu et al. 2000) and the Triticeae β-amylase gene (Mason-Gamer 2007). In addition, Stowaways were identified in the maize bz locus, referred to as a transpositional 'hot spot' and characterized by multiple insertions of MITEs, DNA transposons, and retrotransposons (Wang and Dooner 2006). The first DcSto element was found in the first intron of carrot soluble invertase isozyme II (rs). Interestingly, the same intron was previously reported as harboring an insertion of a non-autonomous PIF/Harbinger element, DcMaster1 (Grzebelus et al. 2006), which might suggest that the intron acted as a similar transpositional 'hot spot'.
As observed for other MITEs, DcSto elements were frequently inserted in the vicinity of genes (Mao et al. 2000; Yaakov et al. 2013). Besides the rs gene, we found a copy of DcSto in the first intron of the chxb1 gene and reanalyzed previous reports on insertions in or near carrot genes. Cardoso et al. (2009) identified an indel in the third intron of the AOX2a gene which we found to be a DcSto. Kimura et al. (2008) described an insertion of a 299 bp MITE in the promoter region of the phenylalanine ammonia-lyase gene (DcPAL3) close to another MITE element. The former MITE could be attributed to the DcSto group. Also, we identified a DcSto element inserted in the region upstream of the carrot lipid body membrane protein gene (Hatzopoulos et al. 1990).
The presence of putative promoters and polyA motifs within the sequences of most of the analyzed DcSto elements, and the presence of sequences constituting putative regulatory elements in all families, might indicate their possible influence on the expression of adjacent genes. An analysis of the effect of DcSto8 on the expression of the carrot phenylalanine ammonia-lyase gene (DcPAL3) by Kimura et al. (2008) showed that the clustered insertion of DcSto8 and another MITE significantly increased the transcription level. That effect may be one of the reasons for the relatively frequent occurrence of DcSto elements in clusters with other MITEs, observed previously and confirmed in the present study. Analysis of the binding capacity of a region upstream of a lipid body membrane protein gene from carrot, overlapping with the DcSto9 insertion, showed that the region had a potential to form complexes with nuclear extracts from embryos (Hatzopoulos et al. 1990).
Interestingly, the region identified as responsible for DNA-binding, harbored by DcSto9, was not present in other DcSto element families. Very recently, it was shown that more than half of miRNAs associated with rice transposable elements originated from MITEs (Yu et al. 2010; Sanan-Mishra et al. 2009). Similar results were observed in Solanaceae (Kuang et al. 2009) and Arabidopsis (Hollister et al. 2011). Readthrough transcription of MITEs inserted into UTR regions of genes may lead to their folding into hairpin structures, which are further processed into small RNAs. As shown in Arabidopsis, such MITE-derived miRNAs have a significant impact on the decrease of expression of adjacent genes, owing to higher methylation of those regions (Hollister et al. 2011). A predicted miRNA encoded by DcSto7, as proposed by Cardoso et al. (2009), showed significant similarity to a rice miRNA. Moreover, DcSto3 and DcSto6 copies, capable of forming fold-back structures, were most frequently associated with carrot transcripts (Iorizzo et al. 2011). A copy of DcSto6 lacking one of the TIRs is present in the region upstream of the carrot Dc8 gene, which shows differential expression during embryo development related to changes in the methylation pattern of the promoter region (Zhou et al. 1998). Furthermore, DcSto6 elements shared 69 % similarity over their entire sequence with PcSto elements, identified adjacent to coding regions of Petroselinum crispum, a related Apiaceae species. In addition, DcSto and related elements in Apioideae species identified in the vicinity of coding regions were frequently characterized by the lack of one or both TIRs. Loss of functional termini prevents mobilization of DcSto elements, which might suggest their retention due to a putative adaptive effect on the expression of adjacent genes.
We conclude that abundance of DcSto elements in euchromatic regions, their presence in carrot transcripts, and presence of putative regulatory motifs within their sequences may indicate their involvement in the regulation of gene expression.