Skip to main content

String Indexing for Patterns with Wildcards

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNTCS,volume 7357)

Abstract

We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results.

  • A linear space index with query time O(m + σ j loglogn + occ). This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time Θ(jn) in the worst case.

  • An index with query time O(m + j + occ) using space \(O(\sigma^{k^2} n \log^k\log n)\), where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.

  • A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].

Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.

This is a preview of subscription content, access via your institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • DOI: 10.1007/978-3-642-31155-0_25
  • Chapter length: 12 pages
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
eBook
USD   64.99
Price excludes VAT (USA)
  • ISBN: 978-3-642-31155-0
  • Instant PDF download
  • Readable on all devices
  • Own it forever
  • Exclusive offer for individuals only
  • Tax calculation will be finalised during checkout
Softcover Book
USD   84.99
Price excludes VAT (USA)

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proc. 39th FOCS, pp. 534–543 (1998)

    Google Scholar 

  2. Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with k mismatches. In: Proc. 11th SODA, pp. 794–803 (2000)

    Google Scholar 

  3. Bille, P., Gørtz, I.L.: Substring Range Reporting. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 299–308. Springer, Heidelberg (2011)

    CrossRef  Google Scholar 

  4. Bille, P., Li Gørtz, I., Vildhøj, H.W., Wind, D.K.: String Matching with Variable Length Gaps. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 385–394. Springer, Heidelberg (2010)

    CrossRef  Google Scholar 

  5. Bucher, P., Bairoch, A.: A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In: Proc. 2nd ISMB, pp. 53–61 (1994)

    Google Scholar 

  6. Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., Wong, S.S.: A linear size index for approximate pattern matching. J. Disc. Algorithms 9(4), 358–364 (2011)

    MathSciNet  MATH  CrossRef  Google Scholar 

  7. Chazelle, B.: Filtering search: A new approach to query-answering. SIAM J. Comput. 15(3), 703–724 (1986)

    MathSciNet  MATH  CrossRef  Google Scholar 

  8. Chen, G., Wu, X., Zhu, X., Arslan, A., He, Y.: Efficient string matching with wildcards and length constraints. Knowl. Inf. Sys. 10(4), 399–419 (2006)

    CrossRef  Google Scholar 

  9. Clifford, P., Clifford, R.: Simple deterministic wildcard matching. Inf. Process. Lett. 101(2), 53–54 (2007)

    MathSciNet  MATH  CrossRef  Google Scholar 

  10. Coelho, L.P., Oliveira, A.L.: Dotted Suffix Trees A Structure for Approximate Text Indexing. In: Crestani, F., Ferragina, P., Sanderson, M. (eds.) SPIRE 2006. LNCS, vol. 4209, pp. 329–336. Springer, Heidelberg (2006)

    CrossRef  Google Scholar 

  11. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proc. 36th STOC, pp. 91–100 (2004)

    Google Scholar 

  12. Cole, R., Hariharan, R.: Approximate string matching: A simpler faster algorithm. In: Proc. 9th SODA, pp. 463–472 (1998)

    Google Scholar 

  13. Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proc. 34rd STOC, pp. 592–601 (2002)

    Google Scholar 

  14. Fischer, M.J., Paterson, M.S.: String-Matching and Other Products. In: Complexity of Computation, SIAM-AMS Proceedings, pp. 113–125 (1974)

    Google Scholar 

  15. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM 31, 538–544 (1984)

    MATH  CrossRef  Google Scholar 

  16. Galil, Z., Giancarlo, R.: Improved string matching with k mismatches. ACM SIGACT News 17(4), 52–54 (1986)

    CrossRef  Google Scholar 

  17. Harel, D., Tarjan, R.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)

    MathSciNet  MATH  CrossRef  Google Scholar 

  18. Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucleic Acids Res. 27(1), 215–219 (1999)

    CrossRef  Google Scholar 

  19. Iliopoulos, C.S., Rahman, M.S.: Pattern matching algorithms with don’t cares. In: Proc. 33rd SOFSEM, pp. 116–126 (2007)

    Google Scholar 

  20. Kalai, A.: Efficient pattern-matching with don’t cares. In: Proc. 13th SODA, pp. 655–656 (2002)

    Google Scholar 

  21. Lam, T.-W., Sung, W.-K., Tam, S.-L., Yiu, S.-M.: Space Efficient Indexes for String Matching with Don’t Cares. In: Tokuyama, T. (ed.) ISAAC 2007. LNCS, vol. 4835, pp. 846–857. Springer, Heidelberg (2007)

    CrossRef  Google Scholar 

  22. Landau, G., Vishkin, U.: Efficient string matching with k mismatches. Theoret. Comput. Sci. 43, 239–249 (1986)

    MathSciNet  MATH  CrossRef  Google Scholar 

  23. Landau, G., Vishkin, U.: Fast parallel and serial approximate string matching. J. Algorithms 10(2), 157–169 (1989)

    MathSciNet  MATH  CrossRef  Google Scholar 

  24. Lewenstein, M.: Indexing with Gaps. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 135–143. Springer, Heidelberg (2011)

    CrossRef  Google Scholar 

  25. Maas, M., Nowak, J.: Text indexing with errors. J. Disc. Algorithms 5(4), 662–681 (2007)

    CrossRef  Google Scholar 

  26. Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing methods for approximate string matching. IEEE Data Eng. Bull. 24(4), 19–27 (2001)

    Google Scholar 

  27. Sahinalp, S., Vishkin, U.: Efficient approximate and dynamic matching of patterns using a labeling paradigm. In: Proc. 37th FOCS, pp. 320–328 (1996)

    Google Scholar 

  28. Tam, A., Wu, E., Lam, T.-W., Yiu, S.-M.: Succinct Text Indexing with Wildcards. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 39–50. Springer, Heidelberg (2009)

    CrossRef  Google Scholar 

  29. Tsur, D.: Fast index for approximate string matching. J. Disc. Algorithms 8(4), 339–345 (2010)

    MathSciNet  MATH  CrossRef  Google Scholar 

  30. Vildhøj, H.W., Vind, S.: String Indexing for Patterns with Wildcards. Master’s thesis, Technical University of Denmark (2011), http://www.imm.dtu.dk/~hwvi/

  31. Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th SWAT, pp. 1–11 (1973)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and Permissions

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bille, P., Gørtz, I.L., Vildhøj, H.W., Vind, S. (2012). String Indexing for Patterns with Wildcards. In: Fomin, F.V., Kaski, P. (eds) Algorithm Theory – SWAT 2012. SWAT 2012. Lecture Notes in Computer Science, vol 7357. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31155-0_25

Download citation

  • DOI: https://doi.org/10.1007/978-3-642-31155-0_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31154-3

  • Online ISBN: 978-3-642-31155-0

  • eBook Packages: Computer ScienceComputer Science (R0)