Theory of Computing Systems, Volume 55, Issue 1, pp 41–60

String Indexing for Patterns with Wildcards

  • Philip Bille
  • Inge Li Gørtz
  • Hjalte Wedel Vildhøj
  • Søren Vind


We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results.
  • A linear space index with query time \(O(m+\sigma^{j}\log\log n+occ)\). This significantly improves the previously best known linear space index by Lam et al. (in Proc. 18th ISAAC, pp. 846–857, 2007), which requires query time Θ(jn) in the worst case.

  • An index with query time O(m+j+occ) using space \(O(\sigma^{k^{2}} n \log^{k} \log n)\), where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.

  • A time-space trade-off, generalizing the index by Cole et al. (in Proc. 36th STOC, pp. 91–100, 2004).

We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
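To make the problem statement concrete, the following minimal Python sketch shows what it means to report the occurrences of a pattern with wildcards. It is not one of the indexes from the paper: it uses an uncompressed suffix trie with quadratic space, and the names `WILDCARD`, `build_suffix_trie`, `report`, and `scan_occurrences` are hypothetical, chosen only for this illustration. The search follows one child for each ordinary character and branches into every child at a wildcard, so this naive strategy may explore up to σ^j paths for a pattern with j wildcards.

```python
from typing import Dict, List

WILDCARD = "*"  # assumed wildcard symbol; must not occur in the text alphabet


class Node:
    """Node of an uncompressed suffix trie (illustration only, O(n^2) space)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.starts: List[int] = []  # start positions of suffixes passing through


def build_suffix_trie(t: str) -> Node:
    """Insert every suffix of t, recording its start position along the path."""
    root = Node()
    root.starts = list(range(len(t)))
    for i in range(len(t)):
        node = root
        for c in t[i:]:
            node = node.children.setdefault(c, Node())
            node.starts.append(i)
    return root


def report(root: Node, p: str) -> List[int]:
    """Report all start positions in t where p matches.

    An ordinary character follows a single child; a wildcard branches
    into every child of every node on the current frontier.
    """
    frontier = [root]
    for c in p:
        next_frontier: List[Node] = []
        for node in frontier:
            if c == WILDCARD:
                next_frontier.extend(node.children.values())
            elif c in node.children:
                next_frontier.append(node.children[c])
        frontier = next_frontier
    return sorted(i for node in frontier for i in node.starts)


def scan_occurrences(t: str, p: str) -> List[int]:
    """Index-free baseline: check every alignment of p against t."""
    return [
        i
        for i in range(len(t) - len(p) + 1)
        if all(pc == WILDCARD or pc == t[i + k] for k, pc in enumerate(p))
    ]


if __name__ == "__main__":
    trie = build_suffix_trie("banana")
    print(report(trie, "a*a"))  # positions where "a", any character, "a" occurs
```

The baseline `scan_occurrences` serves as a direct definition of an occurrence: a wildcard matches exactly one arbitrary character, so a pattern ending in a wildcard cannot match at the very end of the text. Both functions should return the same positions on any input.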


Keywords: String indexing · Wildcard · Variable length gap · Suffix tree · LCP data structure


  1. Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proc. 39th FOCS, pp. 534–543 (1998)
  2. Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with k mismatches. J. Algorithms 50(2), 257–275 (2004)
  3. Belazzougui, D.: Faster and space-optimal edit distance “1” dictionary. In: Proc. 20th CPM, pp. 154–167 (2009)
  4. Bille, P., Gørtz, I.L.: Substring range reporting. In: Proc. 22nd CPM, pp. 299–308 (2011)
  5. Bille, P., Gørtz, I.L., Vildhøj, H., Wind, D.: String matching with variable length gaps. In: Proc. 17th SPIRE, pp. 385–394 (2010)
  6. Bucher, P., Bairoch, A.: A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In: Proc. 2nd ISMB, pp. 53–61 (1994)
  7. Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., Wong, S.S.: A linear size index for approximate pattern matching. J. Discrete Algorithms 9(4), 358–364 (2011)
  8. Chazelle, B.: Filtering search: a new approach to query-answering. SIAM J. Comput. 15(3), 703–724 (1986)
  9. Chen, G., Wu, X., Zhu, X., Arslan, A., He, Y.: Efficient string matching with wildcards and length constraints. Knowl. Inf. Syst. 10(4), 399–419 (2006)
  10. Clifford, P., Clifford, R.: Simple deterministic wildcard matching. Inf. Process. Lett. 101(2), 53–54 (2007)
  11. Coelho, L., Oliveira, A.: Dotted suffix trees a structure for approximate text indexing. In: Proc. 13th SPIRE, pp. 329–336 (2006)
  12. Cole, R., Hariharan, R.: Approximate string matching: a simpler faster algorithm. SIAM J. Comput. 31(6), 1761–1782 (2002)
  13. Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proc. 34th STOC, pp. 592–601 (2002)
  14. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proc. 36th STOC, pp. 91–100 (2004)
  15. Fischer, M.J., Paterson, M.S.: String-matching and other products. In: Complexity of Computation, SIAM-AMS Proceedings, pp. 113–125 (1974)
  16. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a sparse table with O(1) worst case access time. J. ACM 31, 538–544 (1984)
  17. Fredriksson, K., Grabowski, S.: Efficient algorithms for pattern matching with general gaps, character classes, and transposition invariance. Inf. Retr. 11(4), 335–357 (2008)
  18. Fredriksson, K., Grabowski, S.: Nested counters in bit-parallel string matching. In: Proc. 3rd LATA, pp. 338–349 (2009)
  19. Galil, Z., Giancarlo, R.: Improved string matching with k mismatches. SIGACT News 17(4), 52–54 (1986)
  20. Hagerup, T.: Sorting and searching on the word RAM. In: Proc. 15th STACS, pp. 366–398 (1998)
  21. Harel, D., Tarjan, R.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)
  22. Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucleic Acids Res. 27(1), 215–219 (1999)
  23. Iliopoulos, C.S., Rahman, M.S.: Pattern matching algorithms with don’t cares. In: Proc. 33rd SOFSEM, pp. 116–126 (2007)
  24. Kalai, A.: Efficient pattern-matching with don’t cares. In: Proc. 13th SODA, pp. 655–656 (2002)
  25. Lam, T.W., Sung, W.K., Tam, S.L., Yiu, S.M.: Space efficient indexes for string matching with don’t cares. In: Proc. 18th ISAAC, pp. 846–857 (2007)
  26. Landau, G., Vishkin, U.: Efficient string matching with k mismatches. Theor. Comput. Sci. 43, 239–249 (1986)
  27. Landau, G., Vishkin, U.: Fast parallel and serial approximate string matching. J. Algorithms 10(2), 157–169 (1989)
  28. Lewenstein, M.: Indexing with gaps. In: Proc. 18th SPIRE, pp. 135–143 (2011)
  29. Maas, M., Nowak, J.: Text indexing with errors. J. Discrete Algorithms 5(4), 662–681 (2007)
  30. Mehldau, G., Myers, G.: A system for pattern matching applications on biosequences. Comput. Appl. Biosci. 9(3), 299–314 (1993)
  31. Morgante, M., Policriti, A., Vitacolonna, N., Zuccolo, A.: Structured motifs search. J. Comput. Biol. 12(8), 1065–1082 (2005)
  32. Myers, E.: Approximate matching of network expressions with spacers. J. Comput. Biol. 3(1), 33–51 (1996)
  33. Navarro, G., Raffinot, M.: Fast and simple character classes and bounded gaps pattern matching, with applications to protein searching. J. Comput. Biol. 10(6), 903–923 (2003)
  34. Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing methods for approximate string matching. IEEE Data Eng. Bull. 24(4), 19–27 (2001)
  35. Rahman, M.S., Iliopoulos, C.S., Lee, I., Mohamed, M., Smyth, W.F.: Finding patterns with variable length gaps or don’t cares. In: Proc. 12th COCOON, pp. 146–155 (2006)
  36. Sahinalp, S., Vishkin, U.: Efficient approximate and dynamic matching of patterns using a labeling paradigm. In: Proc. 37th FOCS, pp. 320–328 (1996)
  37. Tam, A., Wu, E., Lam, T., Yiu, S.: Succinct text indexing with wildcards. In: Proc. 16th SPIRE, pp. 39–50 (2009)
  38. Tsur, D.: Fast index for approximate string matching. J. Discrete Algorithms 8(4), 339–345 (2010)
  39. Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th SWAT, pp. 1–11 (1973)

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Philip Bille (1)
  • Inge Li Gørtz (1)
  • Hjalte Wedel Vildhøj (1)
  • Søren Vind (1)

  1. DTU Compute, Technical University of Denmark, Lyngby, Denmark
