Less Space: Indexing for Queries with Wildcards

  • Moshe Lewenstein
  • J. Ian Munro
  • Venkatesh Raman
  • Sharma V. Thankachan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8283)


Text indexing is a fundamental problem in computer science, where the task is to index a given text (string) T[1..n] such that, whenever a pattern P[1..p] arrives as a query, we can efficiently report all locations where P occurs as a substring of T. In this paper, we consider the case where P contains wildcard characters (which can match any other character). The first non-trivial solution to the problem was given by Cole et al. [STOC 2004], where the index space is $O(n \log^{k} n)$ words, or $O(n \log^{k+1} n)$ bits, and the query time is $O(p + 2^{h} \log\log n + occ)$, where k is the maximum number of wildcard characters allowed in P, h ≤ k is the number of wildcard characters in P, and occ is the number of occurrences of P in T. Even though many indexes offering different space-time trade-offs were later proposed, a clear improvement on this result is still not known. In this paper, we first propose an $O(n \log^{k+\varepsilon} n)$-bit index achieving the same query time as Cole et al.’s index, where 0 < ε < 1 is an arbitrarily small constant. We then propose another index of size $O(n \log^{k} n \log\sigma)$ bits, but with a slightly higher query time of $O(p + 2^{h} \log n + occ)$, where σ denotes the alphabet size.
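To make the query semantics concrete, the following is a minimal brute-force sketch of wildcard matching in Python. It is not the index proposed in the paper or the one of Cole et al.: it builds no index at all and simply scans T in O(np) time, which is the baseline an index is meant to beat. The wildcard symbol `?` and the function names `matches_at` and `wildcard_occurrences` are assumptions made for illustration; positions are reported 1-based to agree with the T[1..n] convention.

```python
WILDCARD = "?"  # hypothetical wildcard symbol; the abstract does not fix one


def matches_at(text: str, pattern: str, start: int) -> bool:
    """Return True if pattern matches text at 0-based offset start.

    A wildcard in the pattern matches any single character of the text.
    """
    return all(
        pc == WILDCARD or pc == text[start + i]
        for i, pc in enumerate(pattern)
    )


def wildcard_occurrences(text: str, pattern: str) -> list[int]:
    """Report all 1-based positions where pattern occurs in text.

    Brute-force scan with no preprocessing: O(n * p) time per query.
    """
    n, p = len(text), len(pattern)
    return [s + 1 for s in range(n - p + 1) if matches_at(text, pattern, s)]
```

For example, `wildcard_occurrences("abracadabra", "a?a")` returns `[4, 6]`: here h = 1 wildcard is used and occ = 2 occurrences are reported.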


