String Indexing for Patterns with Wildcards

  • Philip Bille
  • Inge Li Gørtz
  • Hjalte Wedel Vildhøj
  • Søren Vind
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7357)

Abstract

We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results.

  • A linear space index with query time \(O(m + \sigma^j \log\log n + occ)\). This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time Θ(jn) in the worst case.

  • An index with query time O(m + j + occ) using space \(O(\sigma^{k^2} n \log^k\log n)\), where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.

  • A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].

Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
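
To make the problem statement concrete, the following minimal sketch reports the occurrences of a pattern with wildcards in two ways: a brute-force scan, and a search in an uncompressed suffix trie that branches on every child at a wildcard. It is not the index described in the paper. The trie uses quadratic space and is only meant to illustrate one natural source of a σ^j factor: each of the j wildcards can multiply the number of active branches by up to σ. The wildcard symbol `*` and all function names are assumptions made for this illustration.

```python
WILDCARD = "*"  # assumed wildcard symbol; the abstract does not fix one


def occurrences_naive(t: str, p: str) -> list[int]:
    """Brute force: test every alignment of p against t (p has length m + j)."""
    length = len(p)
    return [
        i
        for i in range(len(t) - length + 1)
        if all(pc == WILDCARD or pc == tc for pc, tc in zip(p, t[i:i + length]))
    ]


def build_suffix_trie(t: str) -> dict:
    """Uncompressed suffix trie; each node stores the suffix starts passing through it.

    Illustrative only: this takes O(n^2) space, unlike a linear space index.
    """
    root = {"children": {}, "starts": list(range(len(t)))}
    for i in range(len(t)):
        node = root
        for c in t[i:]:
            node = node["children"].setdefault(c, {"children": {}, "starts": []})
            node["starts"].append(i)
    return root


def occurrences_trie(root: dict, p: str) -> list[int]:
    """Walk the trie along p, following all children whenever p has a wildcard."""
    frontier = [root]
    for pc in p:
        next_frontier = []
        for node in frontier:
            if pc == WILDCARD:
                # Up to sigma branches per wildcard, hence up to sigma^j in total.
                next_frontier.extend(node["children"].values())
            elif pc in node["children"]:
                next_frontier.append(node["children"][pc])
        frontier = next_frontier
    return sorted(s for node in frontier for s in node["starts"])
```

For example, with t = "banana" and p = "a*a", both functions report the start positions 1 and 3.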

References

  1. Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proc. 39th FOCS, pp. 534–543 (1998)
  2. Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with k mismatches. In: Proc. 11th SODA, pp. 794–803 (2000)
  3. Bille, P., Gørtz, I.L.: Substring Range Reporting. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 299–308. Springer, Heidelberg (2011)
  4. Bille, P., Li Gørtz, I., Vildhøj, H.W., Wind, D.K.: String Matching with Variable Length Gaps. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 385–394. Springer, Heidelberg (2010)
  5. Bucher, P., Bairoch, A.: A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In: Proc. 2nd ISMB, pp. 53–61 (1994)
  6. Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., Wong, S.S.: A linear size index for approximate pattern matching. J. Disc. Algorithms 9(4), 358–364 (2011)
  7. Chazelle, B.: Filtering search: A new approach to query-answering. SIAM J. Comput. 15(3), 703–724 (1986)
  8. Chen, G., Wu, X., Zhu, X., Arslan, A., He, Y.: Efficient string matching with wildcards and length constraints. Knowl. Inf. Sys. 10(4), 399–419 (2006)
  9. Clifford, P., Clifford, R.: Simple deterministic wildcard matching. Inf. Process. Lett. 101(2), 53–54 (2007)
  10. Coelho, L.P., Oliveira, A.L.: Dotted Suffix Trees A Structure for Approximate Text Indexing. In: Crestani, F., Ferragina, P., Sanderson, M. (eds.) SPIRE 2006. LNCS, vol. 4209, pp. 329–336. Springer, Heidelberg (2006)
  11. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proc. 36th STOC, pp. 91–100 (2004)
  12. Cole, R., Hariharan, R.: Approximate string matching: A simpler faster algorithm. In: Proc. 9th SODA, pp. 463–472 (1998)
  13. Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proc. 34th STOC, pp. 592–601 (2002)
  14. Fischer, M.J., Paterson, M.S.: String-Matching and Other Products. In: Complexity of Computation, SIAM-AMS Proceedings, pp. 113–125 (1974)
  15. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM 31, 538–544 (1984)
  16. Galil, Z., Giancarlo, R.: Improved string matching with k mismatches. ACM SIGACT News 17(4), 52–54 (1986)
  17. Harel, D., Tarjan, R.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)
  18. Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucleic Acids Res. 27(1), 215–219 (1999)
  19. Iliopoulos, C.S., Rahman, M.S.: Pattern matching algorithms with don’t cares. In: Proc. 33rd SOFSEM, pp. 116–126 (2007)
  20. Kalai, A.: Efficient pattern-matching with don’t cares. In: Proc. 13th SODA, pp. 655–656 (2002)
  21. Lam, T.-W., Sung, W.-K., Tam, S.-L., Yiu, S.-M.: Space Efficient Indexes for String Matching with Don’t Cares. In: Tokuyama, T. (ed.) ISAAC 2007. LNCS, vol. 4835, pp. 846–857. Springer, Heidelberg (2007)
  22. Landau, G., Vishkin, U.: Efficient string matching with k mismatches. Theoret. Comput. Sci. 43, 239–249 (1986)
  23. Landau, G., Vishkin, U.: Fast parallel and serial approximate string matching. J. Algorithms 10(2), 157–169 (1989)
  24. Lewenstein, M.: Indexing with Gaps. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 135–143. Springer, Heidelberg (2011)
  25. Maas, M., Nowak, J.: Text indexing with errors. J. Disc. Algorithms 5(4), 662–681 (2007)
  26. Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing methods for approximate string matching. IEEE Data Eng. Bull. 24(4), 19–27 (2001)
  27. Sahinalp, S., Vishkin, U.: Efficient approximate and dynamic matching of patterns using a labeling paradigm. In: Proc. 37th FOCS, pp. 320–328 (1996)
  28. Tam, A., Wu, E., Lam, T.-W., Yiu, S.-M.: Succinct Text Indexing with Wildcards. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 39–50. Springer, Heidelberg (2009)
  29. Tsur, D.: Fast index for approximate string matching. J. Disc. Algorithms 8(4), 339–345 (2010)
  30. Vildhøj, H.W., Vind, S.: String Indexing for Patterns with Wildcards. Master’s thesis, Technical University of Denmark (2011), http://www.imm.dtu.dk/~hwvi/
  31. Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th SWAT, pp. 1–11 (1973)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Philip Bille (1)
  • Inge Li Gørtz (1)
  • Hjalte Wedel Vildhøj (1)
  • Søren Vind (1)

  1. DTU Informatics, Technical University of Denmark, Denmark
