Sublinear Space Algorithms for the Longest Common Substring Problem

  • Tomasz Kociumaka
  • Tatiana Starikovskaya
  • Hjalte Wedel Vildhøj
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8737)

Abstract

Given m documents of total length n, we consider the problem of finding a longest string common to at least d ≥ 2 of the documents. This problem is known as the longest common substring (LCS) problem and has a classic \(\mathcal{O}(n)\) space and \(\mathcal{O}(n)\) time solution (Weiner [FOCS’73], Hui [CPM’92]). However, the use of linear space is impractical in many applications. In this paper we show that for any trade-off parameter 1 ≤ τ ≤ n, the LCS problem can be solved in \(\mathcal{O}(\tau)\) space and \(\mathcal{O}(n^2/\tau)\) time, thus providing the first smooth deterministic time-space trade-off from constant to linear space. The result uses a new and very simple algorithm, which computes a τ-additive approximation to the LCS in \(\mathcal{O}(n^2/\tau)\) time and \(\mathcal{O}(1)\) space. We also show a time-space trade-off lower bound for deterministic branching programs, which implies that any deterministic RAM algorithm solving the LCS problem on documents from a sufficiently large alphabet in \(\mathcal{O}(\tau)\) space must use \(\Omega\bigl(n\sqrt{\log(n/(\tau\log n))/\log\log(n/(\tau\log n))}\bigr)\) time.
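
To make the problem statement concrete, the following is a minimal brute-force sketch in Python. It is not the algorithm from the paper and achieves none of the stated bounds; it only illustrates what "a longest string common to at least d of the documents" means. The function name `longest_common_substring` and its parameters are illustrative assumptions.

```python
def longest_common_substring(docs, d=2):
    """Return one longest string that occurs in at least d of the documents.

    Naive baseline for illustration only (hypothetical helper, not the
    paper's method): it tries candidate substrings and counts, for each,
    how many documents contain it.
    """
    best = ""
    for doc in docs:
        for i in range(len(doc)):
            # Only candidates strictly longer than the current best matter.
            for j in range(i + len(best) + 1, len(doc) + 1):
                cand = doc[i:j]
                # Each document is counted at most once, however often
                # the candidate occurs inside it.
                if sum(cand in other for other in docs) >= d:
                    best = cand
                else:
                    # Any extension of cand contains cand, so it cannot
                    # occur in more documents than cand does.
                    break
    return best


if __name__ == "__main__":
    documents = ["abcde", "xbcdy", "zzcdz"]
    print(longest_common_substring(documents, d=2))
    print(longest_common_substring(documents, d=3))
```

The early `break` relies on monotonicity: if a string occurs in fewer than d documents, so does every extension of it. Ties between equally long answers are resolved in favour of the first one found.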

Keywords

String Match, Constant Space, Partial Input, Pattern Match Algorithm, Marked Node
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Afek, Y., Bremler-Barr, A., Landau Feibish, S.: Automated signature extraction for high volume attacks. In: Proc. 9th ANCS, pp. 147–156 (2013)
  2. Beame, P., Clifford, R., Machmouchi, W.: Element Distinctness, Frequency Moments, and Sliding Windows. In: Proc. 54th FOCS, pp. 290–299 (2013)
  3. Beame, P., Saks, M., Sun, X., Vee, E.: Time-Space Trade-Off Lower Bounds for Randomized Computation of Decision Problems. Journal of the ACM 50(2), 154–195 (2003)
  4. Borodin, A., Cook, S.A.: A Time-Space Tradeoff for Sorting on a General Sequential Model of Computation. SIAM Journal on Computing 11(2), 287–297 (1982)
  5. Breslauer, D., Grossi, R., Mignosi, F.: Simple Real-Time Constant-Space String Matching. Theor. Comput. Sci. 483, 2–9 (2013)
  6. Crochemore, M., Hancart, C., Lecroq, T.: Algorithms on Strings. Cambridge University Press (2007)
  7. Farach-Colton, M.: Optimal Suffix Tree Construction with Large Alphabets. In: Proc. 38th FOCS, pp. 137–143 (1997)
  8. Grossi, R., Vitter, J.S.: Compressed Suffix Arrays and Suffix Trees with Applications to Text Indexing and String Matching. SIAM Journal on Computing 35(2), 378–407 (2005)
  9. Gusfield, D.: Algorithms on Strings, Trees and Sequences: Computer Science and Computational Biology. Cambridge University Press (1997)
  10. Han, Y.: Deterministic sorting in O(n log log n) time and linear space. Journal of Algorithms 50(1), 96–105 (2004)
  11. Hui, L.C.K.: Color Set Size Problem with Applications to String Matching. In: Apostolico, A., Galil, Z., Manber, U., Crochemore, M. (eds.) CPM 1992. LNCS, vol. 644, pp. 230–243. Springer, Heidelberg (1992)
  12. Kreibich, C., Crowcroft, J.: Honeycomb: Creating Intrusion Detection Signatures Using Honeypots. ACM SIGCOMM Comput. Commun. Rev. 34(1), 51–56 (2004)
  13. Navarro, G., Mäkinen, V.: Compressed Full-Text Indexes. ACM Computing Surveys (CSUR) 39(1), 2 (2007)
  14. Ružić, M.: Constructing Efficient Dictionaries in Close to Sorting Time. In: Aceto, L., Damgård, I., Goldberg, L.A., Halldórsson, M.M., Ingólfsdóttir, A., Walukiewicz, I. (eds.) ICALP 2008, Part I. LNCS, vol. 5125, pp. 84–95. Springer, Heidelberg (2008)
  15. Starikovskaya, T., Vildhøj, H.W.: Time-Space Trade-Offs for the Longest Common Substring Problem. In: Fischer, J., Sanders, P. (eds.) CPM 2013. LNCS, vol. 7922, pp. 223–234. Springer, Heidelberg (2013)
  16. Wang, K., Cretu, G.F., Stolfo, S.J.: Anomalous Payload-Based Worm Detection and Signature Generation. In: Valdes, A., Zamboni, D. (eds.) RAID 2005. LNCS, vol. 3858, pp. 227–246. Springer, Heidelberg (2006)
  17. Weiner, P.: Linear Pattern Matching Algorithms. In: Proc. 14th FOCS (SWAT), pp. 1–11 (1973)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Tomasz Kociumaka (1)
  • Tatiana Starikovskaya (2)
  • Hjalte Wedel Vildhøj (3)

  1. Institute of Informatics, University of Warsaw, Poland
  2. Higher School of Economics (HSE), National Research University, Russia
  3. Technical University of Denmark, DTU Compute, Denmark
