A Very Fast String Matching Algorithm Based on Condensed Alphabets

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9778)

Abstract

String matching is the problem of finding all the substrings of a text that correspond to a given pattern. It is one of the most investigated problems in computer science, mainly due to its applications in many fields. In recent years most solutions to the problem have focused on the efficiency and flexibility of the searching procedure, and effective techniques have appeared to speed up previous solutions. In this paper we present a simple and very efficient algorithm for string matching. It can be seen as an extension of the Skip-Search algorithm to condensed alphabets, with the aim of reducing the number of verifications during the searching phase. Our experimental results show that the new variant obtains, in most cases, the best running time when compared against the most effective algorithms in the literature. This makes the new algorithm one of the most flexible solutions in practical cases.
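
The idea of running Skip-Search over a condensed alphabet can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that "condensed alphabet" means treating each group of q consecutive characters (a q-gram) as a single symbol, and the function name `skip_search_qgram` and the parameter `q` are hypothetical. The pattern's q-grams are stored in buckets, the text is sampled once every m - q + 1 positions, and a full verification is attempted only at alignments suggested by a bucket hit.

```python
from collections import defaultdict


def skip_search_qgram(pattern, text, q=2):
    """Return the sorted start positions of all occurrences of pattern in text.

    Illustrative sketch only: q-grams stand in for the condensed alphabet
    (an assumption, not a detail taken from the paper).
    """
    m, n = len(pattern), len(text)
    if m == 0 or n < m:
        return []
    if m < q:
        # Pattern too short to contain a q-gram: fall back to a naive scan.
        return [s for s in range(n - m + 1) if text[s:s + m] == pattern]

    # Preprocessing: bucket every pattern position by the q-gram starting there.
    buckets = defaultdict(list)
    for i in range(m - q + 1):
        buckets[pattern[i:i + q]].append(i)

    # Searching: every occurrence contains m - q + 1 consecutive q-gram starts,
    # so sampling one text q-gram per m - q + 1 positions cannot miss it.
    step = m - q + 1
    matches = []
    for p in range(m - q, n - q + 1, step):
        for i in buckets.get(text[p:p + q], ()):
            s = p - i  # candidate alignment of the pattern in the text
            if s >= 0 and s + m <= n and text[s:s + m] == pattern:
                matches.append(s)
    return sorted(matches)
```

With larger q, fewer pattern positions share a bucket with a given text q-gram, so fewer candidate alignments reach the verification step; the price is a shorter sampling step of m - q + 1 instead of m.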

Keywords

Exact text analysis, String matching, Experimental algorithms, Text processing

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Università di Catania, Catania, Italy
