Abstract
The availability of large amounts of genomic sequence data has provided unique opportunities for understanding the composition and dynamics of transposable elements (TEs) in plants. As the cost of sequencing declines, the genomic sequences of most crop plants will become available within the next few years. Thus, annotation of genomic sequences, rather than sequence availability, will become the “bottleneck” for genome studies. Because TEs are the largest component of most plant genomes, automating TE identification and classification is essential for future genome annotation as well as for the characterization of TEs. This chapter reviews the functions and mechanisms of different repeat-finding tools, with a focus on de novo repeat identification programs. It also covers the further processing of results from de novo identification programs and the construction of repeat libraries for downstream genome analyses.
Acknowledgments
I thank Dr. Frank Dennis (Michigan State Univ.) for critical reading of the manuscript. This work was supported by NSF grant IOS-1126998.
Copyright information
© 2013 Springer Science+Business Media, New York
About this protocol
Cite this protocol
Jiang, N. (2013). Overview of Repeat Annotation and De Novo Repeat Identification. In: Peterson, T. (eds) Plant Transposable Elements. Methods in Molecular Biology, vol 1057. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-568-2_20
DOI: https://doi.org/10.1007/978-1-62703-568-2_20
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-567-5
Online ISBN: 978-1-62703-568-2
eBook Packages: Springer Protocols