New Refinement Techniques for Longest Common Subsequence Algorithms

  • Conference paper
String Processing and Information Retrieval (SPIRE 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2857)

Abstract

Certain properties of the input strings have a dominating influence on the running time of an algorithm selected to solve the longest common subsequence (lcs) problem of two input strings. It has turned out to be difficult, both theoretically and practically, to develop an lcs algorithm that is superior for all problem instances. Furthermore, implementing the most sophisticated of the recently presented lcs algorithms is laborious.

This paper shows that it is still beneficial to refine the traditional lcs algorithms to obtain new variants that are, in practice, competitive with the modern lcs methods on certain problem instances. We present and analyse a general-purpose algorithm, NKY-MODIF, which has moderate time and space efficiency and is easy to implement correctly. The algorithm is based on the so-called diagonal-wise method of Nakatsu, Kambayashi and Yajima (NKY). The NKY algorithm was selected for further consideration because it is algorithmically independent of the size of the input alphabet and has a light pre-processing phase.
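For orientation, the diagonal-wise idea can be sketched in Python as follows. This is a minimal illustration of the NKY scheme as it is commonly described, not the NKY-MODIF algorithm of this paper: the function name `nky_lcs_length` is hypothetical, and a full table is kept for readability, whereas a practical implementation would store far less.

```python
def nky_lcs_length(a, b):
    """Length of the lcs of a and b, computed diagonal by diagonal (sketch)."""
    if len(a) > len(b):
        a, b = b, a                      # keep a as the shorter input
    m, n = len(a), len(b)
    # L[i][k] = largest 1-based position h in b such that a[i..m] and b[h..n]
    # share a common subsequence of length k; 0 means no such position exists.
    L = [[0] * (m + 1) for _ in range(m + 2)]
    for i in range(1, m + 2):
        L[i][0] = n + 1                  # the empty subsequence fits anywhere
    for d in range(m):                   # diagonal d tests for lcs length m - d
        i, k = m - d, 1
        while i >= 1:
            upper = L[i + 1][k - 1]      # a match must lie strictly left of this
            lower = L[i + 1][k]          # value already known without using a[i]
            j = upper - 1
            # Scan b leftwards for a[i], stopping at the known lower value.
            while j > lower and b[j - 1] != a[i - 1]:
                j -= 1
            L[i][k] = j                  # equals lower when no better match exists
            if L[i][k] == 0:
                break                    # the rest of this diagonal is undefined
            i -= 1
            k += 1
        else:
            return m - d                 # the diagonal reached row 1
    return 0


print(nky_lcs_length("abcbdab", "bdcaba"))
```

The early exit is what makes the method attractive for similar strings: when the lcs is nearly as long as the shorter input, only a few diagonals are evaluated. No pre-processing that depends on the alphabet is needed, since the inner loop compares symbols directly.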

The NKY-MODIF algorithm refines the NKY method in essentially three ways: by reducing unnecessary scanning over the input sequences, by storing the intermediate results more locally, and by utilizing lower and upper bound knowledge about the lcs. To demonstrate that some of the presented ideas are not specific to NKY, we apply lower bound information to two lcs algorithms whose processing approach differs from that of NKY. This introduces a new way to solve the lcs problem.
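The abstract does not state which bounds the paper uses, so the following Python sketch only illustrates what cheap lower and upper bounds on the lcs length can look like. Both functions and their names are assumptions made for illustration: the upper bound counts symbols shared by the two strings, and the lower bound is the length of a common subsequence found by a single greedy pass.

```python
from collections import Counter


def lcs_upper_bound(a, b):
    """No common subsequence can use a symbol more often than either string has it."""
    count_a, count_b = Counter(a), Counter(b)
    return sum(min(count_a[s], count_b[s]) for s in count_a)


def lcs_lower_bound(a, b):
    """Length of a greedily built common subsequence (valid, not necessarily longest)."""
    pos = 0                      # next unused index in b
    matched = 0
    for s in a:
        j = b.find(s, pos)       # first occurrence of s at or after pos
        if j != -1:
            matched += 1
            pos = j + 1
    return matched


a, b = "abcbdab", "bdcaba"
print(lcs_lower_bound(a, b), lcs_upper_bound(a, b))
```

For this pair the true lcs length is 4, which lies between the two values. An exact algorithm could use such bounds to skip work, for example by not testing lengths above the upper bound or by stopping as soon as a solution matching it is found.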

The lcs problem has two variants: calculating only the length of the lcs, and also determining the symbols belonging to one instance of the lcs. We verify the presented ideas for both of these problem types by extensive test runs.
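The difference between the two variants can be illustrated with the textbook dynamic programming approach, sketched below in Python. This is not the method of this paper; the function names are hypothetical. The length-only variant needs just two rows of the table, while recovering one lcs instance, in this simple form, keeps the whole table for the traceback.

```python
def lcs_length(a, b):
    """Length only: two rows of the dynamic programming table suffice."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_string(a, b):
    """One lcs instance: fill the full table, then trace back from the corner."""
    m, n = len(a), len(b)
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])         # this symbol belongs to the lcs
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(lcs_length("AGGTAB", "GXTXAYB"), lcs_string("AGGTAB", "GXTXAYB"))
```

The extra memory for the second variant is the reason the two problem types are usually evaluated separately.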

References

  1. Wagner, R.A., Fischer, M.J.: The string to string correction problem. Journal of the Association for Computing Machinery 21(1), 168–173 (1974)
  2. Hirschberg, D.S.: Algorithms for the Longest Common Subsequence problem. Journal of the Association for Computing Machinery 24(4), 664–675 (1977)
  3. Hunt, J.W., Szymanski, T.G.: A Fast Algorithm for Computing Longest Common Subsequences. Communications of the ACM 20(5), 350–353 (1977)
  4. Mukhopadhyay, A.: A Fast Algorithm for the Longest-Common-Subsequence Problem. Information Sciences 20, 69–82 (1980)
  5. Bergroth, L., Hakonen, H., Raita, T.: A Survey of Longest Common Subsequence Algorithms. In: Proceedings of SPIRE 2000, A Coruña, Spain, pp. 39–47 (2000)
  6. Chin, F.Y.L., Poon, C.K.: A Fast Algorithm for Computing Longest Common Subsequences of Small Alphabet Size. Journal of Information Processing 13(4), 463–469 (1990)
  7. Hsu, W.J., Du, M.W.: New Algorithms for the LCS Problem. Journal of Computer and System Sciences 29, 133–152 (1984)
  8. Apostolico, A., Guerra, C.: The Longest Common Subsequence Problem Revisited. Algorithmica 2, 315–336 (1987)
  9. Rick, C.: New Algorithms for the Longest Common Subsequence Problem, Institut für Informatik der Universität Bonn, Research Report No. 85123-Cs (October 1994)
  10. Miller, W., Myers, E.W.: A File Comparison Program. Software – Practice and Experience 15(11), 1025–1040 (1985)
  11. Myers, E.W.: An O(ND) Difference Algorithm and Its Variations. Algorithmica 1, 251–266 (1986)
  12. Wu, S., Manber, U., Myers, G., Miller, W.: An O(NP) Sequence Comparison Algorithm. Information Processing Letters 35, 317–323 (1990)
  13. Nakatsu, N., Kambayashi, Y., Yajima, S.: A Longest Common Subsequence Algorithm Suitable for Similar Text Strings. Acta Informatica 18, 171–179 (1982)
  14. Chin, F., Poon, C.K.: Performance Analysis of Some Simple Heuristics for Longest Common Subsequences. Algorithmica 12, 293–311 (1994)
  15. Bergroth, L., Hakonen, H., Raita, T.: New Approximation Algorithms for Longest Common Subsequences. In: Proceedings of SPIRE 1998, Santa Cruz de la Sierra, Bolivia (September 1998)
  16. Johtela, T., Smed, J., Hakonen, H., Raita, T.: An Efficient Heuristic for the LCS Problem. In: Third South American Workshop on String Processing, WSP 1996, Recife, Brazil, August 1996, pp. 126–140 (1996)
  17. Kuo, S., Cross, G.R.: An Improved Algorithm to Find the Length of the Longest Common Subsequence of Two Strings. ACM SIGIR Forum 23(3-4), 89–99 (1989)
  18. Rick, C.: Simple and Fast Linear Space Computation of Longest Common Subsequences. Information Processing Letters 75(6), 275–281 (2000)
  19. Goeman, H., Clausen, M.: A New Practical Linear Space Algorithm for the Longest Common Subsequence Problem. In: Proceedings of the Prague Stringology Club Workshop 1999 (1999)

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bergroth, L., Hakonen, H., Väisänen, J. (2003). New Refinement Techniques for Longest Common Subsequence Algorithms. In: Nascimento, M.A., de Moura, E.S., Oliveira, A.L. (eds) String Processing and Information Retrieval. SPIRE 2003. Lecture Notes in Computer Science, vol 2857. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39984-1_22

  • DOI: https://doi.org/10.1007/978-3-540-39984-1_22

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-20177-9

  • Online ISBN: 978-3-540-39984-1

  • eBook Packages: Springer Book Archive
