Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2

Novák, Petr; Neumann, Pavel; Macas, Jiří

doi:10.1038/s41596-020-0400-y

Protocol
Published: 23 October 2020

Volume 15, pages 3745–3776 (2020)

Abstract

RepeatExplorer2 is a novel version of a computational pipeline that uses graph-based clustering of next-generation sequencing reads for characterization of repetitive DNA in eukaryotes. The clustering algorithm facilitates repeat identification in any genome by using relatively small quantities of short sequence reads, and additional tools within the pipeline perform automatic annotation and quantification of the identified repeats. The pipeline is integrated into the Galaxy platform, which provides a user-friendly web interface for script execution and documentation of the results. Compared to the original version of the pipeline, RepeatExplorer2 provides automated annotation of transposable elements, identification of tandem repeats and enhanced visualization of analysis results. Here, we present an overview of the RepeatExplorer2 workflow and provide procedures for its application to (i) de novo repeat identification in a single species, (ii) comparative repeat analysis in a set of species, (iii) development of satellite DNA probes for cytogenetic experiments and (iv) identification of centromeric repeats based on ChIP-seq data. Each procedure takes approximately 2 d to complete. RepeatExplorer2 is available at https://repeatexplorer-elixir.cerit-sc.cz.
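
The graph-based clustering idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the RepeatExplorer2 implementation: shared k-mers stand in for the pipeline's read similarity search, connected components stand in for its clustering algorithm, and every name and threshold (`build_read_graph`, `cluster_reads`, `cluster_proportions`, `k`, `min_shared`, the toy reads) is an assumption made for illustration. Reads are graph vertices, an edge joins two reads that look similar, and each resulting cluster is a candidate repeat family whose share of the reads gives a rough estimate of its genome proportion, assuming the reads are a random low-coverage sample.

```python
from itertools import combinations


def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def build_read_graph(reads, k=8, min_shared=3):
    """Connect two reads when they share at least `min_shared` k-mers.

    Shared k-mers are a toy stand-in for a pairwise similarity search;
    `k` and `min_shared` are illustrative assumptions.
    """
    profiles = [kmers(read, k) for read in reads]
    graph = {i: set() for i in range(len(reads))}
    for i, j in combinations(range(len(reads)), 2):
        if len(profiles[i] & profiles[j]) >= min_shared:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def cluster_reads(n_reads, graph):
    """Group reads into connected components, largest cluster first."""
    seen, clusters = set(), []
    for start in range(n_reads):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            component.append(node)
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(component))
    return sorted(clusters, key=len, reverse=True)


def cluster_proportions(clusters, n_reads):
    """Fraction of all reads that falls into each cluster."""
    return [len(cluster) / n_reads for cluster in clusters]


# Toy data (hypothetical): five 30-bp reads sampled from a tandem array
# of a 20-bp monomer, plus three unrelated reads built from disjoint
# two-letter alphabets so that they cannot share 8-mers with anything.
monomer = "ACGTTGCAAGGCTTAACCGT"
array = monomer * 5
reads = [array[start:start + 30] for start in (0, 7, 13, 25, 38)]
reads += [
    "AGGAGAAGGGAGAGGAAAGAGGGAAGAGGA",
    "CTTCTCCTTTCTCTTCCCTCTTTCCTCTCC",
    "ACCACAACCCACACCAAACACCCAACACCA",
]

graph = build_read_graph(reads)
clusters = cluster_reads(len(reads), graph)
proportions = cluster_proportions(clusters, len(reads))

for cluster, share in zip(clusters, proportions):
    print(f"reads {cluster}: {share:.1%} of the sample")
```

On the toy data, the five reads sampled from the tandem array fall into one cluster holding 62.5% of the reads, while the three unrelated reads remain singletons. Real data would call for a proper similarity search and a community-detection step rather than these simplifications.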

**Fig. 1: Schematic representation of RepeatExplorer (a) and TAREAN (b) pipelines.**

**Fig. 2: Decision tree for automatic annotation.**

**Fig. 3: Principle of comparative analysis.**

**Fig. 5: Graphical summary of clustering results for *V. villosa*.**

**Fig. 6: Comparative analysis summary.**

**Fig. 7: Design of an oligonucleotide probe and primers for PCR.**

**Fig. 8: Example visualization of ChIP-seq Mapper output.**

Data availability

Example datasets that include WGS reads and ChIP-seq reads (Table 1) were published in refs. ¹⁸ and ¹⁹ and are freely available from the European Nucleotide Archive (ENA; https://www.ebi.ac.uk/ena).

Code availability

The source code for all pipelines is available for public use at https://bitbucket.org/petrnovak/repex_tarean/ and https://bitbucket.org/repeatexplorer/re_utilities/ under a GNU General Public License.

References

Pellicer, J., Hidalgo, O., Dodsworth, S. & Leitch, I. J. Genome size diversity and its impact on the evolution of land plants. Genes (Basel) 9, 88 (2018).
Vu, G. T. H. et al. Comparative genome analysis reveals divergent genome size evolution in a carnivorous plant genus. Plant Genome 8, 1–14 (2015).
Schnable, P. S. et al. The B73 maize genome: complexity, diversity, and dynamics. Science 326, 1112–1115 (2009).
Garrido-Ramos, M. A. Satellite DNA: an evolving topic. Genes (Basel) 8, 230 (2017).
Bennetzen, J. L. & Wang, H. The contributions of transposable elements to the structure, function, and evolution of plant genomes. Annu. Rev. Plant Biol. 65, 505–530 (2014).
Metzker, M. L. Sequencing technologies—the next generation. Nat. Rev. Genet. 11, 31–46 (2009).
Goerner-Potvin, P. & Bourque, G. Computational tools to unmask transposable elements. Nat. Rev. Genet. 19, 688–704 (2018).
Lower, S. S., McGurk, M. P., Clark, A. G. & Barbash, D. A. Satellite DNA evolution: old ideas, new approaches. Curr. Opin. Genet. Dev. 49, 70–78 (2018).
Novák, P., Neumann, P. & Macas, J. Graph-based clustering and characterization of repetitive sequences in next-generation sequencing data. BMC Bioinforma. 11, 378 (2010).
Novák, P., Neumann, P., Pech, J., Steinhaisl, J. & Macas, J. RepeatExplorer: a Galaxy-based web server for genome-wide characterization of eukaryotic repetitive elements from next-generation sequence reads. Bioinformatics 29, 792–793 (2013).
Weiss-Schneeweiss, H., Leitch, A. R., McCann, J., Jang, T.-S. & Macas, J. Employing next generation sequencing to explore the repeat landscape of the plant genome. In Next Generation Sequencing in Plant Systematics Vol. 158 (eds. Hörandl, E. & Appelhans, M.) 155–179 (Koeltz Scientific Books, 2015).
Macas, J., Neumann, P. & Navrátilová, A. Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula. BMC Genomics 8, 427 (2007).
Pertea, G. et al. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics 19, 651–652 (2003).
Afgan, E. et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2018 update. Nucleic Acids Res. 46, W537–W544 (2018).
Neumann, P., Novák, P., Hoštáková, N. & Macas, J. Systematic survey of plant LTR-retrotransposons elucidates phylogenetic relationships of their polyprotein domains and provides a reference for element classification. Mob. DNA 10, 1 (2019).
Novák, P. et al. TAREAN: a computational tool for identification and characterization of satellite DNA from unassembled short reads. Nucleic Acids Res. 45, e111 (2017).
Blondel, V. D., Guillaume, J. L., Lambiotte, R. & Lefebvre, E. Fast unfolding of communities in large networks. J. Stat. Mech. 2008, P10008 (2008).
Macas, J. et al. In depth characterization of repetitive DNA in 23 plant genomes reveals sources of genome size variation in the legume tribe Fabeae. PLoS ONE 10, e0143424 (2015).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2014).
Zytnicki, M., Akhunov, E. & Quesneville, H. Tedna: a transposable element de novo assembler. Bioinformatics 30, 2656–2658 (2014).
Goubert, C. et al. De novo assembly and annotation of the Asian tiger mosquito (Aedes albopictus) repeatome with dnaPipeTE from raw genomic reads and comparative analysis with the yellow fever mosquito (Aedes aegypti). Genome Biol. Evol. 7, 1192–1205 (2015).
Koch, P., Platzer, M. & Downie, B. R. RepARK—de novo creation of repeat libraries from whole-genome NGS reads. Nucleic Acids Res. 42, e80 (2014).
Chu, C., Nielsen, R. & Wu, Y. REPdenovo: inferring de novo repeat motifs from short sequence reads. PLoS ONE 11, e0150719 (2016).
Kumke, K. et al. Plantago lagopus B chromosome is enriched in 5S rDNA-derived satellite DNA. Cytogenet. Genome Res. 148, 68–73 (2016).
Grant, J. R., Pilotte, N. & Williams, S. A. A case for using genomics and a bioinformatics pipeline to develop sensitive and species-specific PCR-based diagnostics for soil-transmitted helminths. Front. Genet. 10, 883 (2019).
Neumann, P. et al. Stretching the rules: monocentric chromosomes with multiple centromere domains. PLoS Genet. 8, e1002777 (2012).
Howley, P. M., Israel, M. A., Law, M. F. & Martin, M. A. A rapid method for detecting and mapping homology between heterologous DNAs. Evaluation of polyomavirus genomes. J. Biol. Chem. 254, 4876–4883 (1979).
Ávila Robledillo, L. et al. Extraordinary sequence diversity and promiscuity of centromeric satellites in the legume tribe Fabeae. Mol. Biol. Evol. 37, 2341–2356 (2020).
Ávila Robledillo, L. et al. Satellite DNA in Vicia faba is characterized by remarkable diversity in its sequence composition, association with centromeres, and replication timing. Sci. Rep. 8, 5838 (2018).

Acknowledgements

This work was supported by the ERDF/ESF project ELIXIR-CZ - Capacity building (No. CZ.02.1.01/0.0/0.0/16_013/0001777) and the ELIXIR-CZ research infrastructure project (MEYS No: LM2015047) including access to computing and storage facilities.

Author information

Authors and Affiliations

Biology Centre, Czech Academy of Sciences, Institute of Plant Molecular Biology, České Budějovice, Czech Republic
Petr Novák, Pavel Neumann & Jiří Macas

Contributions

P. Novák, P. Neumann and J.M. conceptualized, designed or developed analysis workflows, tools or procedures and wrote the manuscript.

Corresponding author

Correspondence to Jiří Macas.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Protocols thanks Francisco Ruiz-Ruano and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Data 1

Complete list of classification categories used for annotation in Viridiplantae

Supplementary Data 2

Repeat quantification performed on example dataset

About this article

Cite this article

Novák, P., Neumann, P. & Macas, J. Global analysis of repetitive DNA from unassembled sequence reads using RepeatExplorer2. Nat Protoc 15, 3745–3776 (2020). https://doi.org/10.1038/s41596-020-0400-y

Received: 19 February 2020
Accepted: 21 August 2020
Published: 23 October 2020
Issue Date: November 2020
DOI: https://doi.org/10.1038/s41596-020-0400-y
Springer Nature Limited
