Abstract
A set of strings, called a string dictionary, is a basic string data structure. The most primitive query, where one seeks the existence of a pattern in the dictionary, is called a lookup query. Approximate lookup queries, i.e., to lookup the existence of a pattern with a bounded number of errors, is a fundamental string problem. Several data structures have been proposed to do so efficiently. Almost all solutions consider a single error, as will this result. Lately, Belazzougui and Venturini (CPM 2013) raised the question whether one can construct efficient indexes that support lookup queries with one error in optimal query time, that is, \(O(|p|/\omega + \textit{occ})\), where \(p\) is the query, \(\omega \) the machine word-size, and \(occ\) the number of occurrences.
Specifically, for the problem of one mismatch and constant alphabet size, we obtain optimal query time. For a dictionary of \(d\) strings our proposed index uses \(O(\omega d\log ^{1+\epsilon }d)\) additional bit space (beyond the space required to access the dictionary data, which can be maintained in compressed form). Our results are parameterized for a space-time tradeoff.
We propose more results for the case of lookup queries with one insertion/deletion on dictionaries over a constant sized alphabet. These results are especially effective for large patterns.
T. Chan—The research is supported by an NSERC grant.
M. Lewenstein—This research is supported by a BSF grant 2010437 and a GIF grant 1147/2011.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
The space attributed to this algorithm in [4] is \(O(d|p|^2\log d)\) bits. However, this is probably because it was assumed that the generated strings, which are of size \(O(d|p|^2\log d)\) bits, need to be maintained. However, this is not the case. It is sufficient to maintain the hash function and not the fully generated strings.
References
Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)
Amir, A., Levy, A., Porat, E., Shalom, B.R.: Dictionary matching with one gap. In: Kulikov, A.S., Kuznetsov, S.O., Pevzner, P. (eds.) CPM 2014. LNCS, vol. 8486, pp. 11–20. Springer, Heidelberg (2014)
Belazzougui, D.: Faster and space-optimal edit distance “1” dictionary. In: Kucherov, G., Ukkonen, E. (eds.) CPM 2009 Lille. LNCS, vol. 5577, pp. 154–167. Springer, Heidelberg (2009)
Belazzougui, D., Venturini, R.: Compressed string dictionary look-up with edit distance one. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 280–292. Springer, Heidelberg (2012)
Bille, P., Gørtz, I.L., Vildhøj, H.W., Vind, S.: String indexing for patterns with wildcards. Theory Comput. Syst. 55(1), 41–60 (2014)
Brodal, G.S., Davoodi, P., Rao, S.S.: On space efficient two dimensional range minimum data structures. Algorithmica 63(4), 815–830 (2012)
Brodal, G.S., Gasieniec, L.: Approximate dictionary queries. In: Hirschberg, D.S., Meyers, G. (eds.) CPM 1996. LNCS, vol. 1075, pp. 65–74. Springer, Heidelberg (1996)
Brodal, G.S., Venkatesh, S.: Improved bounds for dictionary look-up with one error. Inf. Process. Lett. 75(1–2), 57–59 (2000)
Chan, H., Lam, T.W., Sung, W., Tam, S., Wong, S.: Compressed indexes for approximate string matching. Algorithmica 58(2), 263–281 (2010)
Chan, H.-L., Lam, T.-W., Sung, W.-K., Tam, S.-L., Wong, S.-S.: A linear size index for approximate pattern matching. J. Discrete Algorithms 9(4), 358–364 (2011)
Chan, T.M., Larsen, K.G., Pǎtraşcu, M.: Orthogonal range searching on the RAM, revisited. In: Proceedings of the 27th ACM Symposium on Computational Geometry, Paris, France, June 13–15, 2011, pp. 1–10 (2011)
Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of Symposium on Theory of Computing (STOC), pp. 91–100 (2004)
Demaine, E.D., López-Ortiz, A.: A linear lower bound on index size for text retrieval. J. Algorithms 48(1), 2–15 (2003)
Ferragina, P., Muthukrishnan, S., de Berg, M.: Multi-method dispatching: a geometric approach with applications to string matching problems. In: Proceedings of Symposium on Theory of Computing (STOC), pp. 483–491 (1999)
Ferragina, P., Venturini, R.: A simple storage scheme for strings achieving entropy bounds. Theor. Comput. Sci. 372(1), 115–121 (2007)
Hon, W.-K., Ku, T.-H., Shah, R., Thankachan, S.V., Vitter, J.S.: Compressed dictionary matching with one error. In: Data Compression Conference (DCC), pp.13–122 (2011)
Lam, T.-W., Sung, W.-K., Wong, S.-S.: Improved approximate string matching using compressed suffix data structures. Algorithmica 51(3), 298–314 (2008)
Levenshtein, V.I.: Binary codes capable of correcting deletions, insertions, and reversals. Sov. Phys. Doklady 10, 707–710 (1966)
Lewenstein, M.: Orthogonal range searching for text indexing. In: Brodnik, A., López-Ortiz, A., Raman, V., Viola, A. (eds.) Ianfest-66. LNCS, vol. 8066, pp. 267–302. Springer, Heidelberg (2013)
Lewenstein, M., Munro, J.I., Nekrich, Y., Thankachan, S.V.: Document retrieval with one wildcard. In: Csuhaj-Varjú, E., Dietzfelbinger, M., Ésik, Z. (eds.) MFCS 2014, Part II. LNCS, vol. 8635, pp. 529–540. Springer, Heidelberg (2014)
Lewenstein, M., Nekrich, Y., Vitter, J.S.: Space-efficient string indexing for wildcard pattern matching. In: 31st International Symposium on Theoretical Aspects of Computer Science (STACS 2014), pp. 506–517 (2014)
Policriti, A., Prezza, N.: Hashing and indexing: succinct data structures and smoothed analysis. In: Ahn, H.-K., Shin, C.-S. (eds.) ISAAC 2014. LNCS, vol. 8889, pp. 157–168. Springer, Heidelberg (2014)
Tsur, D.: Fast index for approximate string matching. J. Discrete Algorithms 8(4), 339–345 (2010)
Yao, A.C., Yao, F.F.: Dictionary look-up with one error. J. Algorithms 25(1), 194–202 (1997)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chan, T., Lewenstein, M. (2015). Fast String Dictionary Lookup with One Error. In: Cicalese, F., Porat, E., Vaccaro, U. (eds) Combinatorial Pattern Matching. CPM 2015. Lecture Notes in Computer Science(), vol 9133. Springer, Cham. https://doi.org/10.1007/978-3-319-19929-0_10
Download citation
DOI: https://doi.org/10.1007/978-3-319-19929-0_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-19928-3
Online ISBN: 978-3-319-19929-0
eBook Packages: Computer ScienceComputer Science (R0)