Longest Common Extensions via Fingerprinting

Bille, Philip; Gørtz, Inge Li; Kristensen, Jesper

doi:10.1007/978-3-642-28332-1_11

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7183)

Included in the following conference series:

International Conference on Language and Automata Theory and Applications

Abstract

The longest common extension (LCE) problem is to preprocess a string so that a large number of LCE queries can be answered efficiently. The LCE value, LCE_s(i, j), is the length of the longest common prefix of the two suffixes of the string s starting at indices i and j. The LCE problem can be solved in linear space with constant query time and preprocessing of sorting complexity. Two known approaches achieve these bounds, using nearest common ancestors and range minimum queries, respectively. In practice, however, a much simpler approach with linear query time, no extra space and no preprocessing achieves significantly better average-case performance. We present a new algorithm, Fingerprint_k, which for a parameter k, 1 ≤ k ≤ ⌈log n⌉, on a string of length n over an alphabet of size σ, gives O(k n^(1/k)) query time using O(k n) space and O(k n + sort(n, σ)) preprocessing time, where sort(n, σ) is the time it takes to sort n numbers from an alphabet of size σ. Although this solution is asymptotically strictly worse than the best previously known algorithms, it outperforms them in practice in the average case and is almost as fast as the simple linear-time algorithm. On worst-case input, the new algorithm is significantly faster in practice than the simple linear-time algorithm. We also examine the cache performance of the new algorithm and show that, for k = 2, cache optimization can improve practical query time.
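
To make the trade-off in the abstract concrete, the following Python sketch contrasts the simple linear-time query with a k-level table of substring fingerprints. It is an illustration under stated assumptions, not the authors' implementation. The class name `FingerprintLCE`, the block lengths of roughly n^(l/k), and the dictionary-based naming of substrings are all hypothetical choices made here. The naming step is exact but far slower than the O(k n + sort(n, σ)) preprocessing stated above. The sketch only aims to show how O(k n) stored fingerprints can give O(k n^(1/k)) comparisons per query.

```python
def lce_direct(s, i, j):
    """Simple approach: scan both suffixes until a mismatch or the end of s."""
    n = len(s)
    v = 0
    while i + v < n and j + v < n and s[i + v] == s[j + v]:
        v += 1
    return v


class FingerprintLCE:
    """Hypothetical k-level fingerprint table (illustration only)."""

    def __init__(self, s, k=2):
        self.n = n = len(s)
        # Assumed block lengths: t_0 = 1 and t_l roughly n^(l/k), deduplicated.
        self.t = sorted({max(1, round(n ** (l / k))) for l in range(k)})
        self.tables = []
        for t in self.t:
            names = {}  # substring -> integer id; equal ids iff equal substrings
            self.tables.append(
                [names.setdefault(s[p:p + t], len(names)) for p in range(n - t + 1)]
            )

    def query(self, i, j):
        if i == j:
            return self.n - i
        v = 0
        hi = max(i, j)
        # Largest blocks first: skip ahead while whole blocks have equal ids.
        # The last level has block length 1, so the result is exact.
        for t, table in zip(reversed(self.t), reversed(self.tables)):
            while hi + v + t <= self.n and table[i + v] == table[j + v]:
                v += t
        return v


s = "abracadabra"
print(lce_direct(s, 0, 7))               # 4: "abra" is shared, then s ends
print(FingerprintLCE(s, k=2).query(0, 7))  # 4, found by one 3-block and one 1-block step
```

With k = 1 the table holds only character ids and the query degenerates to direct comparison. Larger k adds one table per level and shortens the walk on highly repetitive inputs such as "aaaa…", which is the worst case for the simple scan.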


Author information

Authors and Affiliations

DTU Informatics, Technical University of Denmark, Copenhagen, Denmark
Philip Bille, Inge Li Gørtz & Jesper Kristensen

Editor information

Editors and Affiliations

Research Group on Mathematical Linguistics, Universitat Rovira i Virgili, Avinguda Catalunya, 35, 43002, Tarragona, Spain
Adrian-Horia Dediu & Carlos Martín-Vide

Cite this paper

Bille, P., Gørtz, I.L., Kristensen, J. (2012). Longest Common Extensions via Fingerprinting. In: Dediu, AH., Martín-Vide, C. (eds) Language and Automata Theory and Applications. LATA 2012. Lecture Notes in Computer Science, vol 7183. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28332-1_11

DOI: https://doi.org/10.1007/978-3-642-28332-1_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28331-4
Online ISBN: 978-3-642-28332-1
eBook Packages: Computer Science (R0)
