An Overview of Best Practices for Transposable Element Identification, Classification, and Annotation in Eukaryotic Genomes

  • Protocol
Transposable Elements

Part of the book series: Methods in Molecular Biology ((MIMB,volume 2607))

Abstract

Transposable elements (TEs) exert an increasingly diverse spectrum of influences on eukaryotic genome structure, function, and evolution. A deluge of genomic, transcriptomic, and proteomic data provides the foundation for turning essentially any non-model eukaryotic species into an emerging model to study any and all aspects of organismal biology, ultimately shaping future directions for biomedical, environmental, and biodiversity research. However, identification and annotation of the mobile genome component still lag behind the standards accepted for host gene annotation. To achieve the objective of providing every genome project with a comprehensive description of its mobilome component in addition to the standard genic and transcriptomic datasets, each step of TE identification, classification, and annotation should focus on improving TE boundary designation, reducing identification error rates, and providing accurate information on the type and integrity of TE insertions. Here, we offer practical advice for generating TE models in de novo assemblies for non-model organisms, provide step-by-step instructions to guide inexperienced TE annotators through some of the commonly used TE analysis pipelines, and offer suggestions for tool improvement that could be implemented by interested developers.


Acknowledgments

Work in the laboratory is supported by grant R01GM111917 from the US National Institutes of Health to I.A.

Author information

Correspondence to Fernando Rodriguez or Irina R. Arkhipova.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature

About this protocol

Cite this protocol

Rodriguez, F., Arkhipova, I.R. (2023). An Overview of Best Practices for Transposable Element Identification, Classification, and Annotation in Eukaryotic Genomes. In: Branco, M.R., de Mendoza Soler, A. (eds) Transposable Elements. Methods in Molecular Biology, vol 2607. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2883-6_1

  • DOI: https://doi.org/10.1007/978-1-0716-2883-6_1

  • Publisher Name: Humana, New York, NY

  • Print ISBN: 978-1-0716-2882-9

  • Online ISBN: 978-1-0716-2883-6

  • eBook Packages: Springer Protocols
