Overview of Repeat Annotation and De Novo Repeat Identification

Jiang, Ning

doi:10.1007/978-1-62703-568-2_20

Part of the book series: Methods in Molecular Biology (MIMB, volume 1057)

Abstract

The availability of large amounts of genomic sequence has provided unique opportunities for understanding the composition and dynamics of transposable elements (TEs) in plants. As the cost of sequencing declines, the genomic sequences of most crop plants will become available within the next few years. Thus, the annotation of genomic sequences, rather than sequence availability, will become the “bottleneck” for genome study. Since TEs are the largest component of most plant genomes, automating TE identification and classification is essential for future genome annotation as well as for the characterization of TEs. This chapter reviews the functions and mechanisms of different repeat-finding tools, with a focus on de novo repeat identification programs. It also covers the further processing of results from de novo identification programs and the construction of repeat libraries for downstream genome analyses.

Acknowledgments

I thank Dr. Frank Dennis (Michigan State Univ.) for critical reading of the manuscript. This work was supported by NSF grant IOS-1126998.

Author information

Authors and Affiliations

Department of Horticulture, Michigan State University, East Lansing, MI, USA
Ning Jiang

Editor information

Editors and Affiliations

Dept. Genetics, Development & Cell Biology, Iowa State University, Ames, Iowa, USA
Thomas Peterson

About this protocol

Cite this protocol

Jiang, N. (2013). Overview of Repeat Annotation and De Novo Repeat Identification. In: Peterson, T. (ed.) Plant Transposable Elements. Methods in Molecular Biology, vol 1057. Humana Press, Totowa, NJ. https://doi.org/10.1007/978-1-62703-568-2_20

DOI: https://doi.org/10.1007/978-1-62703-568-2_20
Published: 05 July 2013
Publisher Name: Humana Press, Totowa, NJ
Print ISBN: 978-1-62703-567-5
Online ISBN: 978-1-62703-568-2
eBook Packages: Springer Protocols
