Advertisement

Fast String Dictionary Lookup with One Error

  • Timothy Chan
  • Moshe Lewenstein
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9133)

Abstract

A set of strings, called a string dictionary, is a basic string data structure. The most primitive query, where one seeks the existence of a pattern in the dictionary, is called a lookup query. Approximate lookup queries, i.e., to lookup the existence of a pattern with a bounded number of errors, is a fundamental string problem. Several data structures have been proposed to do so efficiently. Almost all solutions consider a single error, as will this result. Lately, Belazzougui and Venturini (CPM 2013) raised the question whether one can construct efficient indexes that support lookup queries with one error in optimal query time, that is, \(O(|p|/\omega + \textit{occ})\), where \(p\) is the query, \(\omega \) the machine word-size, and \(occ\) the number of occurrences.

Specifically, for the problem of one mismatch and constant alphabet size, we obtain optimal query time. For a dictionary of \(d\) strings our proposed index uses \(O(\omega d\log ^{1+\epsilon }d)\) additional bit space (beyond the space required to access the dictionary data, which can be maintained in compressed form). Our results are parameterized for a space-time tradeoff.

We propose more results for the case of lookup queries with one insertion/deletion on dictionaries over a constant sized alphabet. These results are especially effective for large patterns.

Keywords

Range Query Edit Distance Query Time Binary Alphabet Encode Model 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. 1.
    Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)zbMATHMathSciNetCrossRefGoogle Scholar
  2. 2.
    Amir, A., Levy, A., Porat, E., Shalom, B.R.: Dictionary matching with one gap. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 11–20. Springer, Heidelberg (2014) Google Scholar
  3. 3.
    Belazzougui, D.: Faster and space-optimal edit distance “1” dictionary. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009 Lille. LNCS, vol. 5577, pp. 154–167. Springer, Heidelberg (2009) CrossRefGoogle Scholar
  4. 4.
    Belazzougui, D., Venturini, R.: Compressed string dictionary look-up with edit distance one. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 280–292. Springer, Heidelberg (2012) CrossRefGoogle Scholar
  5. 5.
    Bille, P., Gørtz, I.L., Vildhøj, H.W., Vind, S.: String indexing for patterns with wildcards. Theory Comput. Syst. 55(1), 41–60 (2014)zbMATHMathSciNetCrossRefGoogle Scholar
  6. 6.
    Brodal, G.S., Davoodi, P., Rao, S.S.: On space efficient two dimensional range minimum data structures. Algorithmica 63(4), 815–830 (2012)zbMATHMathSciNetCrossRefGoogle Scholar
  7. 7.
    Brodal, G.S., Gasieniec, L.: Approximate dictionary queries. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 65–74. Springer, Heidelberg (1996) CrossRefGoogle Scholar
  8. 8.
    Brodal, G.S., Venkatesh, S.: Improved bounds for dictionary look-up with one error. Inf. Process. Lett. 75(1–2), 57–59 (2000)MathSciNetCrossRefGoogle Scholar
  9. 9.
    Chan, H., Lam, T.W., Sung, W., Tam, S., Wong, S.: Compressed indexes for approximate string matching. Algorithmica 58(2), 263–281 (2010)zbMATHMathSciNetCrossRefGoogle Scholar
  10. 10.
    Chan, H.-L., Lam, T.-W., Sung, W.-K., Tam, S.-L., Wong, S.-S.: A linear size index for approximate pattern matching. J. Discrete Algorithms 9(4), 358–364 (2011)zbMATHMathSciNetCrossRefGoogle Scholar
  11. 11.
    Chan, T.M., Larsen, K.G., Pǎtraşcu, M.: Orthogonal range searching on the RAM, revisited. In: Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, June 13–15, 2011, pp. 1–10 (2011)Google Scholar
  12. 12.
    Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of Symposium on Theory of Computing (STOC), pp. 91–100 (2004)Google Scholar
  13. 13.
    Demaine, E.D., López-Ortiz, A.: A linear lower bound on index size for text retrieval. J. Algorithms 48(1), 2–15 (2003)zbMATHMathSciNetCrossRefGoogle Scholar
  14. 14.
    Ferragina, P., Muthukrishnan, S., de Berg, M.: Multi-method dispatching: a geometric approach with applications to string matching problems. In: Proceedings of Symposium on Theory of Computing (STOC), pp. 483–491 (1999)Google Scholar
  15. 15.
    Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds. Theor. Comput. Sci. 372(1), 115–121 (2007)zbMATHMathSciNetCrossRefGoogle Scholar
  16. 16.
    Hon, W.-K., Ku, T.-H., Shah, R., Thankachan, S.V., Vitter, J.S.: Compressed dictionary matching with one error. In: Data Compression Conference (DCC), pp.13–122 (2011)Google Scholar
  17. 17.
    Lam, T.-W., Sung, W.-K., Wong, S.-S.: Improved approximate string matching using compressed suffix data structures. Algorithmica 51(3), 298–314 (2008)zbMATHMathSciNetCrossRefGoogle Scholar
  18. 18.
    Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Doklady 10, 707–710 (1966)MathSciNetGoogle Scholar
  19. 19.
    Lewenstein, M.: Orthogonal range searching for text indexing. In: Brodnik, A., López-Ortiz, A., Raman, V., Viola, A. (eds.) Ianfest-66. LNCS, vol. 8066, pp. 267–302. Springer, Heidelberg (2013) CrossRefGoogle Scholar
  20. 20.
    Lewenstein, M., Munro, J.I., Nekrich, Y., Thankachan, S.V.: Document retrieval with one wildcard. In: Csuhaj-Varjú, E., Dietzfelbinger, M., Ésik, Z. (eds.) MFCS 2014, Part II. LNCS, vol. 8635, pp. 529–540. Springer, Heidelberg (2014) Google Scholar
  21. 21.
    Lewenstein, M., Nekrich, Y., Vitter, J.S.: Space-efficient string indexing for wildcard pattern matching. In: 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), pp. 506–517 (2014)Google Scholar
  22. 22.
    Policriti, A., Prezza, N.: Hashing and indexing: succinct data structures and smoothed analysis. In: Ahn, H.-K., Shin, C.-S. (eds.) ISAAC 2014. LNCS, vol. 8889, pp. 157–168. Springer, Heidelberg (2014) Google Scholar
  23. 23.
    Tsur, D.: Fast index for approximate string matching. J. Discrete Algorithms 8(4), 339–345 (2010)zbMATHMathSciNetCrossRefGoogle Scholar
  24. 24.
    Yao, A.C., Yao, F.F.: Dictionary look-up with one error. J. Algorithms 25(1), 194–202 (1997)zbMATHMathSciNetCrossRefGoogle Scholar

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. 1.Cheriton School of Computer ScienceUniversity of WaterlooWaterlooCanada
  2. 2.Department of Computer ScienceBar-Ilan UniversityRamat GanIsrael

Personalised recommendations