RAIDER: Rapid Ab Initio Detection of Elementary Repeats

Figueroa, Nathaniel; Liu, Xiaolin; Wang, Jiajun; Karro, John

doi:10.1007/978-3-319-02624-4_16

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8213)

Included in the following conference series:

Brazilian Symposium on Bioinformatics

Abstract

Here we present RAIDER, a tool for the de novo identification of elementary repeats. Searching for genomic repeats without reference to a compiled profile library is important both for annotating new genomes and for discovering new repeat classes. Several tools have attempted to address the problem, but they generally suffer from either an inability to run at the whole-genome scale or a loss of sensitivity due to sequence variation between repeat copies. To address this, Zheng and Lonardi defined elementary repeats: building blocks that can be assembled into a repeat library while allowing spurious fragments to be filtered out. However, their tool was too slow for use on large inputs, and subsequent attempts to improve efficiency have been unable to handle the expected variation between repeat instances. RAIDER addresses both problems by implementing a novel algorithm for elementary repeat detection and incorporating the spaced seed strategy of PatternHunter to allow for copy variation. RAIDER can process the human genome in under 6.4 hours, and initial results indicate a coverage rate comparable to or better than that achieved by competing de novo search tools when paired with the library-based RepeatMasker.
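
To illustrate why a spaced seed tolerates variation between repeat copies, the following is a minimal Python sketch of the general idea only. It is not RAIDER's algorithm or PatternHunter's implementation; the function name `spaced_seed_groups`, the toy seed `1101`, and the example sequence are all hypothetical. A seed is a string of `1` (position must match) and `0` (position is ignored), and windows of the sequence are grouped by the bases found at the `1` positions.

```python
from collections import defaultdict


def spaced_seed_groups(sequence, seed):
    """Group window start positions that agree at every '1' position of the seed.

    Hypothetical helper for illustration; positions marked '0' in the seed
    are ignored, so windows differing only there land in the same group.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    groups = defaultdict(list)
    for start in range(len(sequence) - len(seed) + 1):
        window = sequence[start:start + len(seed)]
        key = "".join(window[i] for i in care)
        groups[key].append(start)
    # Keep only keys that occur at least twice, i.e. candidate repeats.
    return {key: starts for key, starts in groups.items() if len(starts) > 1}


# Toy input: "ACGT" at position 0 and "ACTT" at position 6 differ in one base.
toy_sequence = "ACGTGGACTT"
spaced = spaced_seed_groups(toy_sequence, "1101")      # third base ignored
contiguous = spaced_seed_groups(toy_sequence, "1111")  # exact 4-mer match
```

In this toy input the spaced seed groups the two diverged copies together, while the contiguous seed of the same length finds no repeated window, which is the sensitivity gain the abstract attributes to the spaced seed strategy.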

References

Smit, A.F.A., Hubley, R., Green, P.: RepeatMasker Open-1.0 (1996-2010), http://www.repeatmasker.org
Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25(17), 3389 (1997)
Bao, Z., Eddy, S.R.: Automated de novo identification of repeat sequence families in sequenced genomes. Genome Research 12(8), 1269–1276 (2002)
Bergman, C.M., Quesneville, H.: Discovering and detecting transposable elements in genome sequences. Briefings in Bioinformatics 8(6), 382–392 (2007)
Edgar, R.C., Myers, E.W.: PILER: identification and classification of genomic repeats. Bioinformatics 21(suppl. 1), i152–i158 (2005)
Google: sparsehash - An extremely memory-efficient hash_map implementation - Google Project Hosting, http://code.google.com/p/sparsehash/
Hardison, R.C.: Covariation in Frequencies of Substitution, Deletion, Transposition, and Recombination During Eutherian Evolution. Genome Research 13(1), 13–26 (2003)
He, D.: Using suffix tree to discover complex repetitive patterns in DNA sequences. In: Conference Proceedings: ... of Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 1, pp. 3474–3477. IEEE Engineering in Medicine and Biology Society (2006)
Huo, H., Wang, X., Stojkovic, V.: An Adaptive Suffix Tree Based Algorithm for Repeats Recognition in a DNA Sequence. Bioinformatics and Bioengineering, 181–184 (2009)
Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., Walichiewicz, J.: Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and Genome Research 110(1-4), 462–467 (2005)
Karro, J.E., Peifer, M., Hardison, R.C., Kollmann, M., von Grünberg, H.H.: Exponential decay of GC content detected by strand-symmetric substitution rates influences the evolution of isochore structure. Molecular Biology and Evolution 25(2), 362–374 (2008)
Kurtz, S., Choudhuri, J.V., Ohlebusch, E., Schleiermacher, C., Stoye, J., Giegerich, R.: REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Research 29(22), 4633–4642 (2001)
Lander, E.S., et al.: Initial sequencing and analysis of the human genome. Nature 409(6822), 860–921 (2001)
Li, M., Ma, B., Kisman, D., Tromp, J.: Patternhunter II: highly sensitive and fast homology search. Journal of Bioinformatics and Computational Biology 2(3), 417–439 (2004)
Li, R., Ye, J., Li, S., Wang, J., Han, Y., Ye, C., Wang, J., Yang, H., Yu, J., Wong, G.K.S., Wang, J.: ReAS: Recovery of ancestral sequences for transposable elements from the unassembled reads of a whole genome shotgun. PLoS Computational Biology 1(4), e43 (2005)
Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics (2002)
Pevzner, P.A., Tang, H., Tesler, G.: De novo repeat classification and fragment assembly. Genome Research 14(9), 1786–1796 (2004)
Price, A.L., Jones, N.C., Pevzner, P.A.: De novo identification of repeat families in large genomes. Bioinformatics 21(suppl. 1), i351–8 (2005)
Saha, S., Bridges, S., Magbanua, Z.V., Peterson, D.G.: Computational Approaches and Tools Used in Identification of Dispersed Repetitive DNA Sequences. Tropical Plant Biology 1(1), 85–96 (2008)
Saha, S., Bridges, S., Magbanua, Z.V., Peterson, D.G.: Empirical comparison of ab initio repeat finding programs. Nucleic Acids Research 36(7), 2284–2294 (2008)
Zabala, G., Vodkin, L.: Novel exon combinations generated by alternative splicing of gene fragments mobilized by a CACTA transposon in Glycine max. BMC Plant Biology 7, 38 (2007)
Zerbino, D.R., Birney, E.: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18(5), 821–829 (2008)
Zheng, J., Lonardi, S.: Discovery of repetitive patterns in DNA with accurate boundaries … (2005)
Zhi, D., Raphael, B.J., Price, A.L., Tang, H., Pevzner, P.A.: Identifying repeat domains in large genomes. Genome Biology 7(1), R7 (2006)

Author information

Authors and Affiliations

Department of Computer Science and Software Engineering, Miami University, Oxford, Ohio, USA
Nathaniel Figueroa, Jiajun Wang & John Karro
Center for Molecular and Structural Biology, Miami University, Oxford, Ohio, USA
Xiaolin Liu
Department of Microbiology, Miami University, Oxford, Ohio, USA
John Karro
Department of Statistics, Miami University, Oxford, Ohio, USA
John Karro

Editor information

Editors and Affiliations

Institute of Chemistry, University of São Paulo, Avenida Prof. Lineu Prestes, 748 sala 911, 05508-000, São Paulo, SP, Brazil
João C. Setubal
School of Computing, Facom-UFMS, Federal University of Mato Grosso do Sul, CP 549, Mato Grosso do Sul, 79070-900, MS, Brazil
Nalvo F. Almeida

About this paper

Cite this paper

Figueroa, N., Liu, X., Wang, J., Karro, J. (2013). RAIDER: Rapid Ab Initio Detection of Elementary Repeats. In: Setubal, J.C., Almeida, N.F. (eds) Advances in Bioinformatics and Computational Biology. BSB 2013. Lecture Notes in Computer Science, vol 8213. Springer, Cham. https://doi.org/10.1007/978-3-319-02624-4_16

DOI: https://doi.org/10.1007/978-3-319-02624-4_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02623-7
Online ISBN: 978-3-319-02624-4
eBook Packages: Computer Science, Computer Science (R0)
