Fast and Cache-Oblivious Dynamic Programming with Local Dependencies

  • Philip Bille
  • Morten Stöckel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7183)

Abstract

String comparison, including sequence alignment, edit distance computation, longest common subsequence computation, and approximate string matching, is a key task (and often a computational bottleneck) in large-scale textual information retrieval. For instance, sequence alignment algorithms are widely used in bioinformatics to compare DNA and protein sequences. All of these problems can be solved with essentially the same dynamic programming scheme over a two-dimensional matrix, in which each entry depends locally on at most three neighboring entries. We present a simple, fast, and cache-oblivious algorithm for this type of local dynamic programming, suitable for comparing large-scale strings. Our algorithm outperforms the previous state-of-the-art solutions. Surprisingly, our new simple algorithm is also competitive with a complicated, optimized, and tuned implementation of the best cache-aware algorithm. Additionally, our new algorithm generalizes the best known theoretical complexity trade-offs for the problem.
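To make the "local dependency" concrete: in the unit-cost edit distance recurrence, entry D[i][j] depends only on D[i-1][j], D[i][j-1], and D[i-1][j-1]. The Python sketch below assumes unit costs (the abstract does not fix a scoring scheme). It fills the matrix by recursively splitting it into quadrants and passing only boundary rows and columns between them. This is the general divide-and-conquer pattern used by cache-oblivious algorithms, not the specific algorithm of this paper. The names `edit_distance`, `_solve`, `_base_block`, and the `cutoff` parameter are hypothetical, and Python lists will not show real cache effects; the sketch only illustrates the access pattern.

```python
from typing import List, Sequence, Tuple


def _base_block(a: Sequence, b: Sequence, i0: int, j0: int,
                top: List[int], left: List[int]) -> Tuple[List[int], List[int]]:
    """Fill one block row by row, keeping only the previous row in memory.

    top  = D[i0][j0 .. j0+w]  (includes the corner D[i0][j0])
    left = D[i0 .. i0+h][j0]  (includes the same corner)
    Returns (bottom, right) = (D[i0+h][j0 .. j0+w], D[i0 .. i0+h][j0+w]).
    """
    h, w = len(left) - 1, len(top) - 1
    prev = list(top)
    right = [top[-1]]
    for r in range(1, h + 1):
        cur = [left[r]] + [0] * w
        ca = a[i0 + r - 1]
        for c in range(1, w + 1):
            cost = 0 if ca == b[j0 + c - 1] else 1
            # Local dependency: only the up, left, and upper-left neighbours.
            cur[c] = min(prev[c] + 1, cur[c - 1] + 1, prev[c - 1] + cost)
        right.append(cur[w])
        prev = cur
    return prev, right


def _solve(a, b, i0, j0, top, left, cutoff):
    """Recursively split a block into quadrants, passing only boundaries."""
    h, w = len(left) - 1, len(top) - 1
    if h <= cutoff or w <= cutoff:
        return _base_block(a, b, i0, j0, top, left)
    hm, wm = h // 2, w // 2
    # Top-left quadrant: needs only the parent's boundaries.
    tl_bottom, tl_right = _solve(a, b, i0, j0, top[:wm + 1], left[:hm + 1], cutoff)
    # Top-right: its left boundary is the top-left's right boundary.
    tr_bottom, tr_right = _solve(a, b, i0, j0 + wm, top[wm:], tl_right, cutoff)
    # Bottom-left: its top boundary is the top-left's bottom boundary.
    bl_bottom, bl_right = _solve(a, b, i0 + hm, j0, tl_bottom, left[hm:], cutoff)
    # Bottom-right: depends on both the top-right and bottom-left outputs.
    br_bottom, br_right = _solve(a, b, i0 + hm, j0 + wm, tr_bottom, bl_right, cutoff)
    # Stitch quadrant boundaries together, dropping the shared corner entries.
    return bl_bottom + br_bottom[1:], tr_right + br_right[1:]


def edit_distance(a: Sequence, b: Sequence, cutoff: int = 32) -> int:
    """Unit-cost edit distance computed by recursive boundary passing."""
    top = list(range(len(b) + 1))   # D[0][j] = j
    left = list(range(len(a) + 1))  # D[i][0] = i
    # cutoff >= 1 guarantees that every recursive call works on a smaller block.
    bottom, _ = _solve(a, b, 0, 0, top, left, max(1, cutoff))
    return bottom[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Here `cutoff` only limits recursion overhead in the base case. It is not a cache-size parameter, so the order in which blocks are visited does not depend on any property of the memory hierarchy.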


Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Philip Bille, DTU Informatics, Technical University of Denmark, Copenhagen, Denmark
  • Morten Stöckel, DTU Informatics, Technical University of Denmark, Copenhagen, Denmark
