Theory of Computing Systems, Volume 55, Issue 1, pp 41–60

String Indexing for Patterns with Wildcards

  • Philip Bille
  • Inge Li Gørtz
  • Hjalte Wedel Vildhøj
  • Søren Vind
Article

DOI: 10.1007/s00224-013-9498-4

Cite this article as:
Bille, P., Gørtz, I.L., Vildhøj, H.W. et al. Theory Comput Syst (2014) 55: 41. doi:10.1007/s00224-013-9498-4

Abstract

We consider the problem of indexing a string t of length n to report the occurrences of a query pattern p containing m characters and j wildcards. Let occ be the number of occurrences of p in t, and σ the size of the alphabet. We obtain the following results.
  • A linear space index with query time \(O(m+\sigma^{j} \log\log n+occ)\). This significantly improves the previously best known linear space index by Lam et al. (in Proc. 18th ISAAC, pp. 846–857, 2007), which requires query time \(\Theta(jn)\) in the worst case.

  • An index with query time O(m+j+occ) using space \(O(\sigma^{k^{2}} n \log^{k} \log n)\), where k is the maximum number of wildcards allowed in the pattern. This is the first non-trivial bound with this query time.

  • A time-space trade-off, generalizing the index by Cole et al. (in Proc. 36th STOC, pp. 91–100, 2004).

We also show that these indexes can be generalized to allow variable length gaps in the pattern. Our results are obtained using a novel combination of well-known and new techniques, which could be of independent interest.
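To make the problem statement concrete, the following is a minimal sketch of the query being answered, written as a naive scan over t with no index at all. It is not one of the indexes described above; it only illustrates what "occurrences of p in t" means when p contains wildcards, and how m and j are counted. The wildcard symbol `*` and the function names are assumptions chosen for illustration. The scan takes O(n(m+j)) time per query, which is the kind of cost the indexes are designed to avoid.

```python
WILDCARD = "*"  # hypothetical wildcard symbol; the abstract does not fix one


def pattern_parameters(p: str, wildcard: str = WILDCARD) -> tuple[int, int]:
    """Return (m, j): the number of ordinary characters and of wildcards in p."""
    j = p.count(wildcard)
    m = len(p) - j
    return m, j


def naive_wildcard_occurrences(t: str, p: str, wildcard: str = WILDCARD) -> list[int]:
    """Report every start position in t where p matches.

    A wildcard in p matches any single character of t; every other
    character of p must match exactly. This is a baseline scan that
    compares p against each window of t, not an index.
    """
    length = len(p)
    occurrences = []
    for i in range(len(t) - length + 1):
        window = t[i:i + length]
        if all(pc == wildcard or pc == tc for pc, tc in zip(p, window)):
            occurrences.append(i)
    return occurrences
```

For example, with t = "banana" and p = "a*a", the pattern has m = 2 characters and j = 1 wildcard, and the scan reports the two start positions 1 and 3 (0-based), so occ = 2.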

Keywords

String indexing · Wildcard · Variable length gap · Suffix tree · LCP data structure

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Philip Bille (1)
  • Inge Li Gørtz (1)
  • Hjalte Wedel Vildhøj (1)
  • Søren Vind (1)
  1. DTU Compute, Technical University of Denmark, Lyngby, Denmark
