String Indexing for Patterns with Wildcards

  • Conference paper
Algorithm Theory – SWAT 2012 (SWAT 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7357)

Abstract

We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results.

  • A linear space index with query time \(O(m + \sigma^{j} \log\log n + occ)\). This significantly improves the previously best known linear space index by Lam et al. [ISAAC 2007], which requires query time Θ(jn) in the worst case.

  • An index with query time O(m + j + occ) using space \(O(\sigma^{k^2} n \log^k\log n)\), where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.

  • A time-space trade-off, generalizing the index by Cole et al. [STOC 2004].

Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
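
To make the problem statement concrete, the following is a minimal sketch of the query itself, not of any index from the paper. It scans t directly, with no preprocessing, and reports every position where p occurs when a wildcard is allowed to match any single character. The function name `wildcard_occurrences` and the choice of `*` as the wildcard symbol are assumptions made for illustration; the abstract does not fix a notation.

```python
from typing import List

WILDCARD = "*"  # assumed wildcard symbol; not specified in the abstract


def wildcard_occurrences(t: str, p: str, wildcard: str = WILDCARD) -> List[int]:
    """Return every start position i at which p occurs in t.

    p holds m ordinary characters and j wildcards, so len(p) == m + j.
    A wildcard matches any single character of t. This is a naive scan
    taking O(n * (m + j)) time per query; an index aims to avoid this
    dependence on n.
    """
    n, length = len(t), len(p)
    hits = []
    for i in range(n - length + 1):
        window = t[i:i + length]
        # Every pattern position must be a wildcard or equal the text character.
        if all(pc == wildcard or pc == tc for pc, tc in zip(p, window)):
            hits.append(i)
    return hits


if __name__ == "__main__":
    print(wildcard_occurrences("abracadabra", "a*a"))
    print(wildcard_occurrences("abracadabra", "a*ra"))
```

Here occ is simply the length of the returned list. The results listed above concern how quickly that list can be produced once t has been preprocessed.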


References

  1. Alstrup, S., Husfeldt, T., Rauhe, T.: Marked ancestor problems. In: Proc. 39th FOCS, pp. 534–543 (1998)

  2. Amir, A., Lewenstein, M., Porat, E.: Faster algorithms for string matching with k mismatches. In: Proc. 11th SODA, pp. 794–803 (2000)

  3. Bille, P., Gørtz, I.L.: Substring Range Reporting. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 299–308. Springer, Heidelberg (2011)

  4. Bille, P., Gørtz, I.L., Vildhøj, H.W., Wind, D.K.: String Matching with Variable Length Gaps. In: Chavez, E., Lonardi, S. (eds.) SPIRE 2010. LNCS, vol. 6393, pp. 385–394. Springer, Heidelberg (2010)

  5. Bucher, P., Bairoch, A.: A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In: Proc. 2nd ISMB, pp. 53–61 (1994)

  6. Chan, H.L., Lam, T.W., Sung, W.K., Tam, S.L., Wong, S.S.: A linear size index for approximate pattern matching. J. Disc. Algorithms 9(4), 358–364 (2011)

  7. Chazelle, B.: Filtering search: A new approach to query-answering. SIAM J. Comput. 15(3), 703–724 (1986)

  8. Chen, G., Wu, X., Zhu, X., Arslan, A., He, Y.: Efficient string matching with wildcards and length constraints. Knowl. Inf. Sys. 10(4), 399–419 (2006)

  9. Clifford, P., Clifford, R.: Simple deterministic wildcard matching. Inf. Process. Lett. 101(2), 53–54 (2007)

  10. Coelho, L.P., Oliveira, A.L.: Dotted Suffix Trees: A Structure for Approximate Text Indexing. In: Crestani, F., Ferragina, P., Sanderson, M. (eds.) SPIRE 2006. LNCS, vol. 4209, pp. 329–336. Springer, Heidelberg (2006)

  11. Cole, R., Gottlieb, L., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proc. 36th STOC, pp. 91–100 (2004)

  12. Cole, R., Hariharan, R.: Approximate string matching: A simpler faster algorithm. In: Proc. 9th SODA, pp. 463–472 (1998)

  13. Cole, R., Hariharan, R.: Verifying candidate matches in sparse and wildcard matching. In: Proc. 34th STOC, pp. 592–601 (2002)

  14. Fischer, M.J., Paterson, M.S.: String-Matching and Other Products. In: Complexity of Computation, SIAM-AMS Proceedings, pp. 113–125 (1974)

  15. Fredman, M.L., Komlós, J., Szemerédi, E.: Storing a Sparse Table with O(1) Worst Case Access Time. J. ACM 31, 538–544 (1984)

  16. Galil, Z., Giancarlo, R.: Improved string matching with k mismatches. ACM SIGACT News 17(4), 52–54 (1986)

  17. Harel, D., Tarjan, R.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)

  18. Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The PROSITE database, its status in 1999. Nucleic Acids Res. 27(1), 215–219 (1999)

  19. Iliopoulos, C.S., Rahman, M.S.: Pattern matching algorithms with don’t cares. In: Proc. 33rd SOFSEM, pp. 116–126 (2007)

  20. Kalai, A.: Efficient pattern-matching with don’t cares. In: Proc. 13th SODA, pp. 655–656 (2002)

  21. Lam, T.-W., Sung, W.-K., Tam, S.-L., Yiu, S.-M.: Space Efficient Indexes for String Matching with Don’t Cares. In: Tokuyama, T. (ed.) ISAAC 2007. LNCS, vol. 4835, pp. 846–857. Springer, Heidelberg (2007)

  22. Landau, G., Vishkin, U.: Efficient string matching with k mismatches. Theoret. Comput. Sci. 43, 239–249 (1986)

  23. Landau, G., Vishkin, U.: Fast parallel and serial approximate string matching. J. Algorithms 10(2), 157–169 (1989)

  24. Lewenstein, M.: Indexing with Gaps. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 135–143. Springer, Heidelberg (2011)

  25. Maas, M., Nowak, J.: Text indexing with errors. J. Disc. Algorithms 5(4), 662–681 (2007)

  26. Navarro, G., Baeza-Yates, R., Sutinen, E., Tarhio, J.: Indexing methods for approximate string matching. IEEE Data Eng. Bull. 24(4), 19–27 (2001)

  27. Sahinalp, S., Vishkin, U.: Efficient approximate and dynamic matching of patterns using a labeling paradigm. In: Proc. 37th FOCS, pp. 320–328 (1996)

  28. Tam, A., Wu, E., Lam, T.-W., Yiu, S.-M.: Succinct Text Indexing with Wildcards. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 39–50. Springer, Heidelberg (2009)

  29. Tsur, D.: Fast index for approximate string matching. J. Disc. Algorithms 8(4), 339–345 (2010)

  30. Vildhøj, H.W., Vind, S.: String Indexing for Patterns with Wildcards. Master’s thesis, Technical University of Denmark (2011), http://www.imm.dtu.dk/~hwvi/

  31. Weiner, P.: Linear pattern matching algorithms. In: Proc. 14th SWAT, pp. 1–11 (1973)

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Bille, P., Gørtz, I.L., Vildhøj, H.W., Vind, S. (2012). String Indexing for Patterns with Wildcards. In: Fomin, F.V., Kaski, P. (eds) Algorithm Theory – SWAT 2012. SWAT 2012. Lecture Notes in Computer Science, vol 7357. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31155-0_25

  • DOI: https://doi.org/10.1007/978-3-642-31155-0_25

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31154-3

  • Online ISBN: 978-3-642-31155-0

  • eBook Packages: Computer Science (R0)
