Roadmap for Annotating Transposable Elements in Eukaryote Genomes

  • Emmanuelle Permal
  • Timothée Flutre
  • Hadi Quesneville
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 859)

Abstract

Current high-throughput techniques have made it feasible to sequence even the genomes of non-model organisms. However, the annotation process now represents a bottleneck to genome analysis, especially when dealing with transposable elements (TEs). Combined approaches, using both de novo and knowledge-based methods to detect TEs, are likely to produce reasonably comprehensive and sensitive results. This chapter provides a roadmap for researchers involved in genome projects to address this issue. At each step of the TE annotation process, from the identification of TE families to the annotation of TE copies, we outline the tools and good practices to be used.

Key words

Transposable elements, Genome annotation, Sequence analysis, Bioinformatics, Genomics

Notes

Acknowledgments

This work was supported in part by grants from the Agence Nationale de la Recherche (Holocentrism project, to HQ [grant number ANR-07-BLAN-0057]) and the Centre National de la Recherche Scientifique, Groupement de Recherche 2157 “Elements Transposables.” TF was supported by a PhD studentship from the Institut National de la Recherche Agronomique. EP was supported by a Post-Doctoral fellowship from the Agence Nationale de la Recherche.

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Emmanuelle Permal (1)
  • Timothée Flutre (1)
  • Hadi Quesneville (1)

  1. Unité de Recherches en Génomique Info – URGI (UR1164), INRA, Centre de Versailles, Versailles cedex, France
