Longest Common Extensions via Fingerprinting

  • Philip Bille
  • Inge Li Gørtz
  • Jesper Kristensen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7183)

Abstract

The longest common extension (LCE) problem is to preprocess a string in order to allow for a large number of LCE queries, such that the queries are efficient. The LCE value, LCE s (i,j), is the length of the longest common prefix of the pair of suffixes starting at index i and j in the string s. The LCE problem can be solved in linear space with constant query time and a preprocessing of sorting complexity. There are two known approaches achieving these bounds, which use nearest common ancestors and range minimum queries, respectively. However, in practice a much simpler approach with linear query time, no extra space and no preprocessing achieves significantly better average case performance. We show a new algorithm, Fingerprint k , which for a parameter k, 1 ≤ k ≤ ⌈log n ⌉, on a string of length n and alphabet size σ, gives O(k n 1/k ) query time using O(k n) space and O(k n + sort(n,σ)) preprocessing time, where sort(n,σ) is the time it takes to sort n numbers from σ. Though this solution is asymptotically strictly worse than the asymptotically best previously known algorithms, it outperforms them in practice in average case and is almost as fast as the simple linear time algorithm. On worst case input, this new algorithm is significantly faster in practice compared to the simple linear time algorithm. We also look at cache performance of the new algorithm, and we show that for k = 2, cache optimization can improve practical query time.

Keywords

Average Case Query Time Input String Alphabet Size Query Algorithm 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Farach-Colton, M., Ferragina, P., Muthukrishnan, S.: On the sorting-complexity of suffix tree construction. J. ACM 47(6), 987–1011 (2000)MathSciNetCrossRefMATHGoogle Scholar
  2. 2.
    Fischer, J., Heun, V.: Theoretical and Practical Improvements on the RMQ-Problem, with Applications to LCA and LCE. In: Lewenstein, M., Valiente, G. (eds.) CPM 2006. LNCS, vol. 4009, pp. 36–48. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  3. 3.
    Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)MathSciNetCrossRefMATHGoogle Scholar
  4. 4.
    Ilie, L., Navarro, G., Tinta, L.: The longest common extension problem revisited and applications to approximate string searching. J. Disc. Alg. 8(4), 418–428 (2010)MathSciNetCrossRefMATHGoogle Scholar
  5. 5.
    Karp, R.M., Miller, R.E., Rosenberg, A.L.: Rapid identification of repeated patterns in strings, trees and arrays. In: Proc. 4th Symp. on Theory of Computing, pp. 125–136 (1972)Google Scholar
  6. 6.
    Landau, G.M., Vishkin, U.: Introducing efficient parallelism into approximate string matching and a new serial algorithm. In: Proc. 18th Symp. on Theory of Computing, pp. 220–230 (1986)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Philip Bille
    • 1
  • Inge Li Gørtz
    • 1
  • Jesper Kristensen
    • 1
  1. 1.DTU InformaticsTechnical University of DenmarkCopenhagenDenmark

Personalised recommendations