Searching for Supermaximal Repeats in Large DNA Sequences

  • Chen Na Lian
  • Mihail Halachev
  • Nematollaah Shiri
Part of the Communications in Computer and Information Science book series (CCIS, volume 13)

Abstract

We study the problem of finding supermaximal repeats in large DNA sequences. For this, we propose an algorithm called SMR which uses an auxiliary index structure (POL), which is derived from and replaces the suffix tree index STTD64 [1]. The results of our numerous experiments using the 24 human chromosomes data indicate that SMR outperforms the solution provided as part of the Vmatch [2] software tool. In searching for supermaximal repeats of size at least 10 bases, SMR is twice faster than Vmatch; for a minimum length of 25 bases, SMR is 7 times faster; and for repeats of length at least 200, SMR is about 9 times faster. We also study the cost of POL in terms of time and space requirements.

Keywords

DNA sequences supermaximal repeats suffix tree performance 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Halachev, M., Shiri, N., Thamildurai, A.: Efficient and scalable indexing techniques for biological sequence data. In: Hochreiter, S., Wagner, R. (eds.) BIRD 2007. LNCS (LNBI), vol. 4414, pp. 464–479. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  2. 2.
    Vmatch: large scale sequence analysis software, http://www.vmtach.de
  3. 3.
    Gusfield, D.: Algorithms on strings, trees and sequences: computer science and computational biology. Cambridge University Press, New York (1997)MATHGoogle Scholar
  4. 4.
    Korf, B.R.: Human Genetics: A Problem-Based Approach. Blackwell, Boston (2000)Google Scholar
  5. 5.
    Watson, J., Hopkins, N., Roberts, J., Steitz, J., Weiner, A.: Molecular Biology of the Gene, 6th edn. Benjamin-Cummings, Menlo Park (2007)Google Scholar
  6. 6.
    Kurtz, S.: Reducing the space requirement of suffix trees. Software Practice and Experience 29(13), 1149–1171 (1999)CrossRefGoogle Scholar
  7. 7.
    Grossi, R., Vitter, J.S.: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM Journal on Computing 35(2), 378–407 (2005)MATHCrossRefMathSciNetGoogle Scholar
  8. 8.
    Ferragina, P., Manzini, G.: Opportunistic data structures with applications. In: 41st IEEE Symposium on Foundations of Computer Science, pp. 390–398 (2000)Google Scholar
  9. 9.
    Valimaki, N., Gerlach, W., Dixit, K., Makinen, V.: Compressed suffix tree - a basis for genome-scale sequence analysis. Bioinformatics 23(5), 629–630 (2007)CrossRefGoogle Scholar
  10. 10.
    Hon, W.-K., Lam, T.W., Sung, W.-K., Tse, W.-L., Wong, C.-K., Yiu, S.-M.: Practical Aspects of Compressed Suffix Arrays and FM-index in Searching DNA Sequences. In: 6th Workshop on Algorithm Engineering and Experiments, pp. 31–38 (2004)Google Scholar
  11. 11.
    Kurtz, S., Schleiermacher, C.: REPuter: Fast Computation of Maximal Repeats in Complete Genomes. Bioinformatics 15, 426–427 (1999)CrossRefGoogle Scholar
  12. 12.
  13. 13.
  14. 14.
    Bedell, J.A., Korf, I., Gish, W.: MaskerAid: a Performance Enhancement to RepeatMasker. Bioinformatics 16(11), 1040–1041 (2000)CrossRefGoogle Scholar
  15. 15.
    Gotoh, O.: An Improved Algorithm for Matching Biological Sequences. Journal of Molecular Biology 162(3), 705–708 (1982)CrossRefGoogle Scholar
  16. 16.
    Abouelhoda, M.I., Kurtz, S., Ohlebusch, E.: Replacing suffix tree with enhances suffix arrays. Journal of Discrete Algorithms 2(1), 53–86 (2004)MATHCrossRefMathSciNetGoogle Scholar
  17. 17.
    Miki, B.L., Neelin, J.M.: DNA repeat lengths of erythrocyte chromatins differing in content of histones H1 and H5. Nucleic Acids Res. 8(3), 529–542 (1980)CrossRefGoogle Scholar
  18. 18.
    National Center for Biotechnology Information, http://www.ncbi.nim.nih.gov

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Chen Na Lian
    • 1
  • Mihail Halachev
    • 1
  • Nematollaah Shiri
    • 1
  1. 1.Dept. of Computer Science & Software EngineeringConcordia UniversityMontrealCanada

Personalised recommendations