RAIDER: Rapid Ab Initio Detection of Elementary Repeats

  • Conference paper
Advances in Bioinformatics and Computational Biology (BSB 2013)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 8213)

Abstract

Here we present RAIDER, a tool for the de novo identification of elementary repeats. The problem of searching for genomic repeats without reference to a compiled profile library is important in the annotation of new genomes and the discovery of new repeat classes. Several tools have attempted to address the problem, but they generally suffer from either an inability to run at the whole-genome scale or a loss of sensitivity due to sequence variation between repeat copies. To address this, Zheng and Lonardi define elementary repeats: building blocks that can be assembled into a repeat library while allowing spurious fragments to be filtered out. However, their tool was too slow for use on large inputs, and subsequent attempts to improve efficiency have been unable to deal with the expected variation between repeat instances. RAIDER addresses both of these problems by implementing a novel algorithm for elementary repeat detection and incorporating the spaced seed strategy of PatternHunter to allow for copy variation. RAIDER can process the human genome in under 6.4 hours, and initial results indicate a coverage rate comparable to or better than that achieved by competing de novo search tools when paired with the library-based RepeatMasker.
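The spaced seed idea mentioned in the abstract can be illustrated with a minimal sketch. This is not RAIDER's algorithm, which the abstract does not describe; it only shows how a seed mask lets two windows match despite a mismatch at a "don't care" position. The function names and the short seed `1101` are hypothetical choices made for illustration.

```python
from collections import defaultdict


def spaced_seed_index(sequence, seed):
    """Group window start positions by the bases under the seed's '1' positions.

    `seed` is a string of '1' (position must match) and '0' (position ignored).
    Hypothetical helper for illustration only.
    """
    care = [i for i, c in enumerate(seed) if c == "1"]
    index = defaultdict(list)
    for start in range(len(sequence) - len(seed) + 1):
        # Build the key from the "care" positions only, skipping the '0' slots.
        key = "".join(sequence[start + i] for i in care)
        index[key].append(start)
    return index


def repeated_positions(sequence, seed, min_copies=2):
    """Return only the keys that occur at `min_copies` or more positions."""
    index = spaced_seed_index(sequence, seed)
    return {key: starts for key, starts in index.items() if len(starts) >= min_copies}


if __name__ == "__main__":
    dna = "ACGTGGACTT"  # ACGT (position 0) and ACTT (position 6) differ in one base
    print(repeated_positions(dna, "1101"))  # spaced seed ignores the third base
    print(repeated_positions(dna, "1111"))  # contiguous seed requires exact match
```

In this toy input the spaced seed groups the windows at positions 0 and 6 together, while the contiguous seed of the same length finds no repeated window, which is the sense in which spaced seeds tolerate variation between copies.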

References

  1. Smit, A.F.A., Hubley, R., Green, P.: RepeatMasker Open-1.0 (1996-2010), http://www.repeatmasker.org

  2. Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.: Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25(17), 3389 (1997)

  3. Bao, Z., Eddy, S.R.: Automated de novo identification of repeat sequence families in sequenced genomes. Genome Research 12(8), 1269–1276 (2002)

  4. Bergman, C.M., Quesneville, H.: Discovering and detecting transposable elements in genome sequences. Briefings in Bioinformatics 8(6), 382–392 (2007)

  5. Edgar, R.C., Myers, E.W.: PILER: identification and classification of genomic repeats. Bioinformatics 21(suppl. 1), i152–i158 (2005)

  6. Google: sparsehash - An extremely memory-efficient hash_map implementation - Google Project Hosting, http://code.google.com/p/sparsehash/

  7. Hardison, R.C.: Covariation in Frequencies of Substitution, Deletion, Transposition, and Recombination During Eutherian Evolution. Genome Research 13(1), 13–26 (2003)

  8. He, D.: Using suffix tree to discover complex repetitive patterns in DNA sequences. In: Conference Proceedings: ... of Annual International Conference of the IEEE Engineering in Medicine and Biology Society, vol. 1, pp. 3474–3477. IEEE Engineering in Medicine and Biology Society (2006)

  9. Huo, H., Wang, X., Stojkovic, V.: An Adaptive Suffix Tree Based Algorithm for Repeats Recognition in a DNA Sequence. Bioinformatics and Bioengineering, 181–184 (2009)

  10. Jurka, J., Kapitonov, V.V., Pavlicek, A., Klonowski, P., Kohany, O., Walichiewicz, J.: Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and Genome Research 110(1-4), 462–467 (2005)

  11. Karro, J.E., Peifer, M., Hardison, R.C., Kollmann, M., von Grünberg, H.H.: Exponential decay of GC content detected by strand-symmetric substitution rates influences the evolution of isochore structure. Molecular Biology and Evolution 25(2), 362–374 (2008)

  12. Kurtz, S., Choudhuri, J.V., Ohlebusch, E., Schleiermacher, C., Stoye, J., Giegerich, R.: REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Research 29(22), 4633–4642 (2001)

  13. Lander, E.S., et al.: Initial sequencing and analysis of the human genome. Nature 409(6822), 860–921 (2001)

  14. Li, M., Ma, B., Kisman, D., Tromp, J.: Patternhunter II: highly sensitive and fast homology search. Journal of Bioinformatics and Computational Biology 2(3), 417–439 (2004)

  15. Li, R., Ye, J., Li, S., Wang, J., Han, Y., Ye, C., Wang, J., Yang, H., Yu, J., Wong, G.K.S., Wang, J.: ReAS: Recovery of ancestral sequences for transposable elements from the unassembled reads of a whole genome shotgun. PLoS Computational Biology 1(4), e43 (2005)

  16. Ma, B., Tromp, J., Li, M.: PatternHunter: faster and more sensitive homology search. Bioinformatics (2002)

  17. Pevzner, P.A., Tang, H., Tesler, G.: De novo repeat classification and fragment assembly. Genome Research 14(9), 1786–1796 (2004)

  18. Price, A.L., Jones, N.C., Pevzner, P.A.: De novo identification of repeat families in large genomes. Bioinformatics 21(suppl. 1), i351–8 (2005)

  19. Saha, S., Bridges, S., Magbanua, Z.V., Peterson, D.G.: Computational Approaches and Tools Used in Identification of Dispersed Repetitive DNA Sequences. Tropical Plant Biology 1(1), 85–96 (2008)

  20. Saha, S., Bridges, S., Magbanua, Z.V., Peterson, D.G.: Empirical comparison of ab initio repeat finding programs. Nucleic Acids Research 36(7), 2284–2294 (2008)

  21. Zabala, G., Vodkin, L.: Novel exon combinations generated by alternative splicing of gene fragments mobilized by a CACTA transposon in Glycine max. BMC Plant Biology 7, 38 (2007)

  22. Zerbino, D.R., Birney, E.: Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Research 18(5), 821–829 (2008)

  23. Zheng, J., Lonardi, S.: Discovery of repetitive patterns in DNA with accurate boundaries … (2005)

  24. Zhi, D., Raphael, B.J., Price, A.L., Tang, H., Pevzner, P.A.: Identifying repeat domains in large genomes. Genome Biology 7(1), R7 (2006)

Copyright information

© 2013 Springer International Publishing Switzerland

About this paper

Cite this paper

Figueroa, N., Liu, X., Wang, J., Karro, J. (2013). RAIDER: Rapid Ab Initio Detection of Elementary Repeats. In: Setubal, J.C., Almeida, N.F. (eds) Advances in Bioinformatics and Computational Biology. BSB 2013. Lecture Notes in Computer Science, vol 8213. Springer, Cham. https://doi.org/10.1007/978-3-319-02624-4_16

  • DOI: https://doi.org/10.1007/978-3-319-02624-4_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-02623-7

  • Online ISBN: 978-3-319-02624-4

  • eBook Packages: Computer Science, Computer Science (R0)
