Abstract
Given strings P and Q, the (exact) string matching problem is to find all positions of substrings in Q matching P. The classical Knuth-Morris-Pratt algorithm [SIAM J. Comput., 1977] solves the string matching problem in linear time, which is optimal if we can only read one character at a time. However, most strings are stored in a computer in a packed representation with several characters in a single word, giving us the opportunity to read multiple characters simultaneously. In this paper we study the worst-case complexity of string matching on strings given in packed representation. Let m ≤ n be the lengths of P and Q, respectively, and let σ denote the size of the alphabet. On a standard unit-cost word-RAM with logarithmic word size we present an algorithm using time \(O\left(\frac{n}{\log_\sigma n} + m + {\mathrm{occ}}\right)\).
Here occ is the number of occurrences of P in Q. For m = o(n) this improves the O(n) bound of the Knuth-Morris-Pratt algorithm. Furthermore, if \(m = O(n/\log_\sigma n)\), our algorithm is optimal, since any algorithm must spend at least \(\Omega(\frac{(n+m)\log \sigma}{\log n} + {\mathrm{occ}}) = \Omega(\frac{n}{\log_\sigma n} + {\mathrm{occ}})\) time to read the input and report all occurrences. The result is obtained by a novel automaton construction based on the Knuth-Morris-Pratt algorithm combined with a new compact representation of subautomata allowing an optimal tabulation-based simulation.
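To make the setting concrete, the following minimal Python sketch shows the two ingredients the abstract contrasts: the classical character-at-a-time Knuth-Morris-Pratt scan, and a packed representation that stores several characters per machine word. It is an illustration only and does not implement the automaton construction or the tabulation of this paper. The function names `kmp_search` and `pack`, the 64-bit word size, and the choice of ⌈log₂ σ⌉ bits per character are assumptions made for the example.

```python
def kmp_search(pattern, text):
    """Classical Knuth-Morris-Pratt: return all start positions of pattern in text."""
    m = len(pattern)
    if m == 0:
        return []
    # failure[i] = length of the longest proper border of pattern[:i + 1]
    failure = [0] * m
    k = 0
    for i in range(1, m):
        while k > 0 and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    occurrences = []
    k = 0  # number of pattern characters currently matched
    for j, c in enumerate(text):  # reads exactly one character per step
        while k > 0 and c != pattern[k]:
            k = failure[k - 1]
        if c == pattern[k]:
            k += 1
        if k == m:
            occurrences.append(j - m + 1)
            k = failure[k - 1]
    return occurrences


def pack(text, alphabet, word_bits=64):
    """Pack text into word_bits-bit integers (assumed layout, lowest bits first).

    Each character uses ceil(log2(sigma)) bits, so one word holds
    word_bits // bits characters.
    """
    code = {c: i for i, c in enumerate(sorted(alphabet))}
    bits = max(1, (len(code) - 1).bit_length())  # ceil(log2(sigma)), at least 1
    per_word = word_bits // bits
    words = []
    for start in range(0, len(text), per_word):
        w = 0
        for offset, c in enumerate(text[start:start + per_word]):
            w |= code[c] << (offset * bits)
        words.append(w)
    return words, bits, per_word
```

With a four-letter alphabet such as DNA, `pack` uses 2 bits per character, so a 64-bit word holds 32 characters and a text of n characters occupies about n/32 words. Reading the packed input therefore takes far fewer word reads than the n character reads performed by `kmp_search`, which is the gap the bound above exploits.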
References
Amir, A., Benson, G.: Efficient two-dimensional compressed matching. In: Proceedings of the 2nd Data Compression Conference, pp. 279–288 (1992)
Amir, A., Benson, G.: Two-dimensional periodicity and its applications. In: Proceedings of the 3rd Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 440–452 (1992)
Amir, A., Benson, G., Farach, M.: Let sleeping files lie: pattern matching in Z-compressed files. J. Comput. System Sci. 52(2), 299–307 (1996)
Arlazarov, V.L., Dinic, E.A., Kronrod, M.A., Faradzev, I.A.: On economic construction of the transitive closure of a directed graph (in Russian). English translation in Soviet Math. Dokl. 11, 1209–1210 (1975); Dokl. Acad. Nauk. 194, 487–488 (1970)
Baeza-Yates, R., Gonnet, G.H.: A new approach to text searching. Commun. ACM 35(10), 74–82 (1992)
Baeza-Yates, R.A.: Improved string searching. Softw. Pract. Exp. 19(3), 257–271 (1989)
Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)
Faro, S., Lecroq, T.: Efficient pattern matching on binary strings. In: Proceedings of the 35th International Conference on Current Trends in Theory and Practice of Computer Science (2009)
Fredriksson, K.: Faster string matching with super-alphabets. In: Laender, A.H.F., Oliveira, A.L. (eds.) SPIRE 2002. LNCS, vol. 2476, pp. 44–57. Springer, Heidelberg (2002)
Fredriksson, K.: Shift-or string matching with super-alphabets. Inf. Process. Lett. 87(4), 201–204 (2003)
Gusfield, D.: Algorithms on strings, trees, and sequences: computer science and computational biology. Cambridge (1997)
Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)
Klein, S.T., Ben-Nissan, M.: Accelerating Boyer Moore searches on binary texts. In: Holub, J., Žďárek, J. (eds.) CIAA 2007. LNCS, vol. 4783, pp. 130–143. Springer, Heidelberg (2007)
Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
Masek, W., Paterson, M.: A faster algorithm for computing string edit distances. J. Comput. System Sci. 20, 18–31 (1980)
Myers, E.W.: A four-Russian algorithm for regular expression pattern matching. J. ACM 39(2), 430–448 (1992)
Navarro, G., Raffinot, M.: Flexible Pattern Matching in Strings – Practical on-line search algorithms for texts and biological sequences, 280 pages. Cambridge University Press, Cambridge (2002)
Rytter, W.: Algorithms on compressed strings and arrays. In: Bartosek, M., Tel, G., Pavelka, J. (eds.) SOFSEM 1999. LNCS, vol. 1725, pp. 48–65. Springer, Heidelberg (1999)
Tarhio, J., Peltola, H.: String matching in the DNA alphabet. Softw. Pract. Exp. 27, 851–861 (1997)
Welch, T.A.: A technique for high-performance data compression. IEEE Computer 17(6), 8–19 (1984)
Wu, S., Manber, U., Myers, E.W.: A subquadratic algorithm for approximate regular expression matching. J. Algorithms 19(3), 346–360 (1995)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bille, P. (2009). Fast Searching in Packed Strings. In: Kucherov, G., Ukkonen, E. (eds) Combinatorial Pattern Matching. CPM 2009. Lecture Notes in Computer Science, vol 5577. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02441-2_11
DOI: https://doi.org/10.1007/978-3-642-02441-2_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02440-5
Online ISBN: 978-3-642-02441-2
eBook Packages: Computer Science, Computer Science (R0)