Abstract
Transposable elements (TEs) are increasingly recognized as exerting a diverse spectrum of influences on eukaryotic genome structure, function, and evolution. A deluge of genomic, transcriptomic, and proteomic data provides the foundation for turning essentially any non-model eukaryotic species into an emerging model for studying any and all aspects of organismal biology, ultimately shaping future directions in biomedical, environmental, and biodiversity research. However, identification and annotation of the mobile genome component still lag behind the standards accepted for host gene annotation. To provide every genome project with a comprehensive description of its mobilome in addition to the standard genic and transcriptomic datasets, each step of TE identification, classification, and annotation should focus on improving TE boundary designation, reducing identification error rates, and providing accurate information on the type and integrity of TE insertions. Here, we offer practical advice for generating TE models in de novo assemblies of non-model organisms, provide step-by-step instructions to guide inexperienced TE annotators through some of the commonly used TE analysis pipelines, and discuss suggestions for tool improvements that interested developers could implement.
Acknowledgments
Work in the laboratory is supported by grant R01GM111917 from the US National Institutes of Health to I.A.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Science+Business Media, LLC, part of Springer Nature
About this protocol
Cite this protocol
Rodriguez, F., Arkhipova, I.R. (2023). An Overview of Best Practices for Transposable Element Identification, Classification, and Annotation in Eukaryotic Genomes. In: Branco, M.R., de Mendoza Soler, A. (eds) Transposable Elements. Methods in Molecular Biology, vol 2607. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2883-6_1
DOI: https://doi.org/10.1007/978-1-0716-2883-6_1
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2882-9
Online ISBN: 978-1-0716-2883-6
eBook Packages: Springer Protocols