Volume 68, Issue 1, pp 80-89
Open Access: This content is freely available online to anyone, anywhere, at any time.

The Contribution of Transposable Elements to Expressed Coding Sequence in Arabidopsis thaliana


The goal of this study was to assess the extent to which transposable elements (TEs) have contributed to protein-coding regions in Arabidopsis thaliana. To do this, we first characterized the extent of chimeric TE-gene constructs by comparing a genome-wide TE database to genomic sequences, annotated coding regions, and EST data. The comparison revealed that 7.8% of expressed genes contained a region with close similarity to a known TE sequence. Some groups of TEs, such as helitrons, were underrepresented in exons relative to their genome-wide distribution; in contrast, Copia-like and En/Spm-like sequences were overrepresented in exons. This 7.8% of genes was enriched for some GO-based functions, particularly kinase activity, and depleted for others, notably structural molecule activity. We also examined gene family evolution for these genes. Gene family information helped clarify whether the sequence similarity between a TE and a gene arose because the TE contributed sequence to the gene or, instead, because the TE co-opted a portion of the gene. Most (66%) of these genes were not easily assigned to a gene family, and for these we could not infer the direction of the relationship between TE and gene. For the remainder, where appropriate, we built phylogenetic trees to infer the direction of the TE-gene relationship by parsimony. By this method, we verified examples in which TEs contributed to expressed proteins. Our estimates are undoubtedly conservative, but they suggest that TEs may have contributed small protein segments to as many as 1.2% of all expressed, annotated A. thaliana genes.
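
As an illustration of what "underrepresented" and "overrepresented" in exons relative to the genome-wide distribution can mean in practice, the following minimal sketch computes, for each TE group, the ratio of its share among exon hits to its share genome-wide. The abstract does not state how the comparison was actually carried out, so the function name `fold_representation`, the group labels, and every count below are hypothetical placeholders and not data from the study. A real analysis would also need a significance test, which is omitted here.

```python
from typing import Dict


def fold_representation(
    genome_counts: Dict[str, int], exon_counts: Dict[str, int]
) -> Dict[str, float]:
    """Return exon share divided by genome-wide share for each TE group.

    A ratio above 1 means the group makes up a larger share of exon hits
    than of all TE copies genome-wide; a ratio below 1 means a smaller share.
    """
    genome_total = sum(genome_counts.values())
    exon_total = sum(exon_counts.values())
    if genome_total == 0 or exon_total == 0:
        raise ValueError("both count tables must contain at least one hit")

    ratios: Dict[str, float] = {}
    for group, genome_n in genome_counts.items():
        if genome_n == 0:
            continue  # ratio is undefined for a group absent genome-wide
        genome_share = genome_n / genome_total
        exon_share = exon_counts.get(group, 0) / exon_total
        ratios[group] = exon_share / genome_share
    return ratios


# Placeholder counts invented for illustration only (NOT from the study).
toy_genome = {"group_A": 600, "group_B": 300, "group_C": 100}
toy_exons = {"group_A": 20, "group_B": 30, "group_C": 50}

ratios = fold_representation(toy_genome, toy_exons)
for group, ratio in sorted(ratios.items()):
    if ratio > 1:
        label = "overrepresented in exons"
    elif ratio < 1:
        label = "underrepresented in exons"
    else:
        label = "proportional"
    print(f"{group}: {ratio:.2f} ({label})")
```

With these toy numbers, `group_A` holds 60% of genome-wide copies but only 20% of exon hits, so its ratio falls below 1, while `group_C` holds 10% of copies and 50% of exon hits, so its ratio rises above 1.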