SPIRE 2005: String Processing and Information Retrieval, pp 301-314
Utilizing Dynamically Updated Estimates in Solving the Longest Common Subsequence Problem
Abstract
The running time of longest common subsequence (lcs) algorithms is shown to depend on several parameters, such as the size of the input alphabet, the distribution of the characters in the input strings, and the degree of similarity between the strings. It is therefore very difficult to design an lcs algorithm that is efficient for all relevant problem instances. As a consequence, many of these algorithms are intended to be applied only to a restricted set of the possible inputs. Some of them are, moreover, quite tricky to implement.
In order to speed up lcs algorithms in general, one of the most crucial prerequisites is that preliminary information about the input strings can be utilized. In addition, this information should be available after a reasonably quick preprocessing phase. One informative a priori value to calculate is a lower bound estimate for the length of the lcs. However, the obtained lower bound might not be as accurate as desired, in which case the preprocessing yields no appreciable advantage.
In this paper, a straightforward method for dynamically updating the lower bound value for the lcs is presented. The purpose is to refine the estimate gradually in order to prune the search space of the exact lcs algorithm in use more effectively. Furthermore, simulation tests of the new method are performed to demonstrate its benefits.
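To make the idea of a quickly computed lower bound concrete, the following is a minimal Python sketch. It is not the preprocessing used in the paper; the function names and the two heuristics (a greedy left-to-right matching and a single-symbol count) are assumptions chosen for illustration. Any common subsequence found by a heuristic has a length that cannot exceed the length of the lcs, so the largest such length is a valid lower bound.

```python
from collections import Counter


def greedy_common_subsequence_length(a: str, b: str) -> int:
    """Length of the common subsequence found by one greedy left-to-right scan.

    Hypothetical helper: each symbol of `a` is matched to its first
    occurrence in `b` to the right of the previous match, if one exists.
    """
    pos = 0       # next index in b that is still available for matching
    matched = 0
    for ch in a:
        hit = b.find(ch, pos)
        if hit != -1:
            matched += 1
            pos = hit + 1
    return matched


def single_symbol_bound(a: str, b: str) -> int:
    """Longest common subsequence that consists of one repeated symbol."""
    count_a, count_b = Counter(a), Counter(b)
    return max((min(count_a[ch], count_b[ch]) for ch in count_a), default=0)


def lcs_lower_bound(a: str, b: str) -> int:
    """Best of several cheap heuristics; always at most the true lcs length."""
    return max(
        greedy_common_subsequence_length(a, b),
        greedy_common_subsequence_length(b, a),
        single_symbol_bound(a, b),
    )


if __name__ == "__main__":
    print(lcs_lower_bound("ABCBDAB", "BDCABA"))
```

Each heuristic runs in at most quadratic time in the worst case because of the repeated `find` calls; a production version would likely precompute occurrence lists per symbol. How tight the bound is depends strongly on the inputs, which is the weakness the abstract points out.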
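The interplay between a lower bound that improves during the computation and the pruning of an exact search can be sketched as follows. This is an illustrative branch-and-bound search under stated assumptions, not the algorithm of the paper: the function names, the greedy initial estimate, and the pruning rule (discard a state when the matches so far plus the shorter remaining suffix cannot exceed the current bound) are all choices made for this example.

```python
def greedy_lower_bound(a: str, b: str) -> int:
    """Initial estimate: length of a greedily matched common subsequence."""
    pos = 0
    matched = 0
    for ch in a:
        hit = b.find(ch, pos)
        if hit != -1:
            matched += 1
            pos = hit + 1
    return matched


def lcs_length_pruned(a: str, b: str) -> int:
    """Exact lcs length via depth-first search with a dynamically updated bound."""
    m, n = len(a), len(b)
    best = greedy_lower_bound(a, b)   # preliminary lower bound
    seen = {}                         # (i, j) -> largest match count seen at that state
    stack = [(0, 0, 0)]               # (index in a, index in b, matches so far)
    while stack:
        i, j, k = stack.pop()
        if k > best:
            best = k                  # refine the lower bound during the search
        if i == m or j == n:
            continue
        # Upper bound for this state: every remaining symbol of the shorter
        # suffix matches. If that cannot beat `best`, prune the state.
        if k + min(m - i, n - j) <= best:
            continue
        # A state already reached with at least as many matches is redundant.
        if seen.get((i, j), -1) >= k:
            continue
        seen[(i, j)] = k
        if a[i] == b[j]:
            stack.append((i + 1, j + 1, k + 1))   # taking a match is always safe
        else:
            stack.append((i + 1, j, k))
            stack.append((i, j + 1, k))
    return best


if __name__ == "__main__":
    print(lcs_length_pruned("ABCBDAB", "BDCABA"))
```

The better the current value of `best`, the more states fail the upper-bound test, which is the effect the refinement aims for. The sketch favors clarity over speed; the worst-case number of visited states is still quadratic in the input lengths.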
Keywords
Longest common subsequence, string algorithms, heuristic algorithms