Less Space: Indexing for Queries with Wildcards

  • Moshe Lewenstein
  • J. Ian Munro
  • Venkatesh Raman
  • Sharma V. Thankachan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8283)

Abstract

Text indexing is a fundamental problem in computer science: the task is to index a given text (string) $T[1..n]$ so that, whenever a pattern $P[1..p]$ arrives as a query, we can efficiently report all locations where $P$ occurs as a substring of $T$. In this paper, we consider the case where $P$ contains wildcard characters (each of which can match any character). The first non-trivial solution to this problem was given by Cole et al. [STOC 2004]: the index space is $O(n \log^k n)$ words, or $O(n \log^{k+1} n)$ bits, and the query time is $O(p + 2^h \log\log n + occ)$, where $k$ is the maximum number of wildcard characters allowed in $P$, $h \le k$ is the number of wildcard characters in $P$, and $occ$ is the number of occurrences of $P$ in $T$. Even though many indexes offering different space-time trade-offs were proposed later, no clear improvement on this result is known. In this paper, we first propose an index of $O(n \log^{k+\epsilon} n)$ bits that achieves the same query time as the index of Cole et al., where $0 < \epsilon < 1$ is an arbitrarily small constant. We then propose another index of $O(n \log^k n \log \sigma)$ bits, with a slightly higher query time of $O(p + 2^h \log n + occ)$, where $\sigma$ denotes the alphabet size.
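
To make the query semantics concrete, the following is a minimal sketch of the naive, index-free baseline: it scans every alignment of P against T and reports the matching start positions. It is not the index proposed in the paper or that of Cole et al.; it only illustrates what an occurrence of a pattern with wildcards means. The function name and the choice of `?` as the wildcard symbol are assumptions made for illustration.

```python
from typing import List


def find_wildcard_occurrences(text: str, pattern: str, wildcard: str = "?") -> List[int]:
    """Naive O(n * p) scan: return the 1-based start positions in `text`
    where `pattern` matches, treating `wildcard` as matching any character.

    Hypothetical helper for illustration only; an index answers the same
    query without rescanning the text.
    """
    n, p = len(text), len(pattern)
    occurrences = []
    # Try every alignment of the pattern against the text.
    for start in range(n - p + 1):
        # A position matches if the pattern has a wildcard there
        # or the characters are equal.
        if all(pc == wildcard or pc == text[start + j]
               for j, pc in enumerate(pattern)):
            occurrences.append(start + 1)  # report 1-based, as in T[1..n]
    return occurrences


if __name__ == "__main__":
    print(find_wildcard_occurrences("abracadabra", "a?a"))   # [4, 6]
    print(find_wildcard_occurrences("abracadabra", "a??a"))  # [1, 8]
```

In the abstract's notation, the first query has h = 1 and occ = 2. The scan costs time proportional to n·p per query, which is the cost the indexes discussed above avoid by paying in space instead.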

References

  1. Alstrup, S., Brodal, G.S., Rauhe, T.: New data structures for orthogonal range searching. In: FOCS, pp. 198–207 (2000)
  2. Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)
  3. Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Monotone minimal perfect hashing: searching a sorted table with O(1) accesses. In: SODA, pp. 785–794 (2009)
  4. Belazzougui, D., Navarro, G., Valenzuela, D.: Improved compressed indexes for full-text document retrieval. J. Discrete Algorithms 18, 3–13 (2013)
  5. Bille, P., Gørtz, I.L.: Substring range reporting. In: Giancarlo, R., Manzini, G. (eds.) CPM 2011. LNCS, vol. 6661, pp. 299–308. Springer, Heidelberg (2011)
  6. Bille, P., Gørtz, I.L., Vildhøj, H.W., Vind, S.: String indexing for patterns with wildcards. In: Fomin, F.V., Kaski, P. (eds.) SWAT 2012. LNCS, vol. 7357, pp. 283–294. Springer, Heidelberg (2012)
  7. Bucher, P., Bairoch, A.: A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In: ISMB, pp. 53–61 (1994)
  8. Chan, H.-L., Lam, T.-W., Sung, W.-K., Tam, S.-L., Wong, S.-S.: Compressed indexes for approximate string matching. Algorithmica 58(2), 263–281 (2010)
  9. Chan, T.M., Larsen, K.G., Patrascu, M.: Orthogonal range searching on the RAM, revisited. In: Symposium on Computational Geometry, pp. 1–10 (2011)
  10. Chien, Y.-F., Hon, W.-K., Shah, R., Thankachan, S.V., Vitter, J.S.: Geometric BWT: Compressed text indexing via sparse suffixes and range searching. Algorithmica (2013)
  11. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC, pp. 91–100 (2004)
  12. Golynski, A., Munro, J.I., Rao, S.S.: Rank/select operations on large alphabets: a tool for text indexing. In: SODA, pp. 368–373 (2006)
  13. Hofmann, K., Bucher, P., Falquet, L., Bairoch, A.: The prosite database, its status in 1999. Nucleic Acids Research 27(1), 215–219 (1999)
  14. Hon, W.-K., Ku, T.-H., Shah, R., Thankachan, S.V., Vitter, J.S.: Compressed text indexing with wildcards. J. Discrete Algorithms 19, 23–29 (2013)
  15. Huynh, T.N.D., Hon, W.-K., Lam, T.-W., Sung, W.-K.: Approximate string matching using compressed suffix arrays. Theoretical Comp. Science 352(1), 240–249 (2006)
  16. Iliopoulos, C.S., Rahman, M.S.: Indexing factors with gaps. Algorithmica 55(1), 60–70 (2009)
  17. Kärkkäinen, J., Puglisi, S.J.: Medium-space algorithms for inverse BWT. In: ESA (1), pp. 451–462 (2010)
  18. Lam, T.-W., Sung, W.-K., Tam, S.-L., Yiu, S.-M.: Space efficient indexes for string matching with don’t cares. In: Tokuyama, T. (ed.) ISAAC 2007. LNCS, vol. 4835, pp. 846–857. Springer, Heidelberg (2007)
  19. Lewenstein, M.: Indexing with gaps. In: Grossi, R., Sebastiani, F., Silvestri, F. (eds.) SPIRE 2011. LNCS, vol. 7024, pp. 135–143. Springer, Heidelberg (2011)
  20. Lewenstein, M.: Orthogonal range searching for text indexing. In: Brodnik, A., López-Ortiz, A., Raman, V., Viola, A. (eds.) Ianfest-66. LNCS, vol. 8066, pp. 267–302. Springer, Heidelberg (2013)
  21. Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
  22. Rahman, M.S., Iliopoulos, C.S.: Pattern matching algorithms with don’t cares. In: SOFSEM (2), pp. 116–126 (2007)
  23. Raman, R., Raman, V., Satti, S.R.: Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Transactions on Algorithms 3(4) (2007)
  24. Sadakane, K., Navarro, G.: Fully-functional succinct trees. In: SODA, pp. 134–149 (2010)
  25. Sleator, D.D., Tarjan, R.E.: A data structure for dynamic trees. J. Comput. Syst. Sci. 26(3), 362–391 (1983)
  26. Tam, A., Wu, E., Lam, T.-W., Yiu, S.-M.: Succinct text indexing with wildcards. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds.) SPIRE 2009. LNCS, vol. 5721, pp. 39–50. Springer, Heidelberg (2009)
  27. Thachuk, C.: Compressed indexes for text with wildcards. Theor. Comput. Sci. 483, 22–35 (2013)
  28. Weiner, P.: Linear pattern matching algorithms. In: SWAT (FOCS), pp. 1–11 (1973)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Moshe Lewenstein (1)
  • J. Ian Munro (2)
  • Venkatesh Raman (3)
  • Sharma V. Thankachan (4)
  1. Bar-Ilan University, Israel
  2. University of Waterloo, Canada
  3. The Institute of Mathematical Sciences, India
  4. Louisiana State University, USA
