Volume 72, Issue 3, pp 791–817

Improved Space-Time Tradeoffs for Approximate Full-Text Indexing with One Edit Error

  • Djamal Belazzougui


In this paper we are interested in indexing texts for substring matching queries with one edit error. That is, given a text T of n characters over an alphabet of size σ, we are asked to build a data structure that answers the following query: find all the occ substrings of the text that are at edit distance at most 1 from a given string q of length m. We show two new results for this problem. The first result, suitable for an unbounded alphabet, uses O(n log^ε n) words of space (where ε is any constant such that 0 < ε < 1) and answers queries in time O(m + occ). This improves simultaneously in space and time over the result of Cole et al. The second result, suitable only for a constant alphabet, relies on compressed text indices and comes in two variants: the first variant uses O(n log^ε n) bits of space and answers queries in time O(m + occ), while the second variant uses O(n log log n) bits of space and answers queries in time O((m + occ) log log n). This second result improves on the previously best results for constant alphabets achieved by Lam et al. and Chan et al.
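To make the query semantics concrete, the following is a minimal brute-force sketch in Python. It is not the index described in the abstract, only a naive baseline that scans the text directly; the function names `within_one_edit` and `naive_one_error_matches` are hypothetical and chosen for illustration. It reports every substring of T, as a half-open interval (start, end), whose edit distance to q is at most 1.

```python
from typing import List, Tuple


def within_one_edit(a: str, b: str) -> bool:
    """Return True if the edit distance between a and b is at most 1."""
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # ensure len(a) <= len(b)
    # Skip the longest common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # Same length: at most one substitution at position i.
        return a[i + 1:] == b[i + 1:]
    # b is one character longer: deleting b[i] must yield a.
    return a[i:] == b[i + 1:]


def naive_one_error_matches(text: str, q: str) -> List[Tuple[int, int]]:
    """Brute-force baseline (assumption: not the paper's method).

    Returns sorted half-open intervals (start, end) such that
    text[start:end] is at edit distance at most 1 from q.
    """
    m = len(q)
    hits = []
    # Only substrings of length m - 1, m, or m + 1 can qualify.
    for length in (m - 1, m, m + 1):
        if length < 1:
            continue
        for start in range(len(text) - length + 1):
            if within_one_edit(text[start:start + length], q):
                hits.append((start, start + length))
    return sorted(hits)


if __name__ == "__main__":
    print(naive_one_error_matches("banana", "nan"))
```

This baseline takes time proportional to n·m per query because it compares q against every candidate substring, which is exactly the cost an index with O(m + occ) query time is designed to avoid.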


Compressed index; Edit distance; Approximate string matching



The author wishes to thank the anonymous reviewers for their helpful comments and corrections and Travis and Meg Gagie for their many helpful corrections and suggestions.


  1. Amir, A., Keselman, D., Landau, G.M., Lewenstein, M., Lewenstein, N., Rodeh, M.: Text indexing and dictionary matching with one error. J. Algorithms 37(2), 309–325 (2000)
  2. Belazzougui, D.: Faster and space-optimal edit distance “1” dictionary. In: Proceedings of the 20th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 154–167 (2009)
  3. Belazzougui, D., Navarro, G.: Alphabet-independent compressed text indexing. In: Proceedings of the 19th Annual European Symposium on Algorithms (ESA), pp. 748–759 (2011)
  4. Belazzougui, D., Boldi, P., Pagh, R., Vigna, S.: Fast prefix search in little space, with applications. In: Proceedings of the 18th Annual European Symposium on Algorithms (ESA), pp. 427–438 (2010)
  5. Bille, P., Gørtz, I.L., Sach, B., Vildhøj, H.W.: Time-space trade-offs for longest common extensions. In: Proceedings of the 23rd Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 293–305 (2012)
  6. Brodal, G.S., Gąsieniec, L.: Approximate dictionary queries. In: Proceedings of the 7th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 65–74 (1996)
  7. Buchsbaum, A.L., Goodrich, M.T., Westbrook, J.: Range searching over tree cross products. In: Proceedings of the 8th Annual European Symposium on Algorithms (ESA), pp. 120–131 (2000)
  8. Chan, H.-L., Lam, T.W., Sung, W.-K., Tam, S.-L., Wong, S.-S.: Compressed indexes for approximate string matching. Algorithmica 58(2), 263–281 (2010)
  9. Chan, H.-L., Lam, T.-W., Sung, W.-K., Tam, S.-L., Wong, S.-S.: A linear size index for approximate pattern matching. J. Discrete Algorithms 9(4), 358–364 (2011)
  10. Clark, D.R., Munro, J.I.: Efficient suffix trees on secondary storage (extended abstract). In: Proceedings of the 8th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 383–391 (1996)
  11. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: Proceedings of the 36th ACM Symposium on Theory of Computing (STOC), pp. 91–100 (2004)
  12. Crochemore, M., Rytter, W.: Jewels of Stringology: Text Algorithms. World Scientific, Singapore (2003)
  13. Elias, P.: Efficient storage and retrieval by content and address of static files. J. ACM 21(2), 246–260 (1974)
  14. Fano, R.M.: On the number of bits required to implement an associative memory. Memorandum 61, Computer Structures Group, Project MAC, MIT, Cambridge, Mass., n.d. (1971)
  15. Ferragina, P., Manzini, G.: Indexing compressed text. J. ACM 52(4), 552–581 (2005)
  16. Fischer, J.: Optimal succinctness for range minimum queries. In: Proceedings of the 9th Latin American Theoretical Informatics Symposium (LATIN), pp. 158–169 (2010)
  17. Fredkin, E.: Trie memory. Commun. ACM 3(9), 490–499 (1960)
  18. Grossi, R., Vitter, J.S.: Compressed suffix arrays and suffix trees with applications to text indexing and string matching. SIAM J. Comput. 35(2), 378–407 (2005)
  19. Gusfield, D.: Algorithms on Strings, Trees, and Sequences—Computer Science and Computational Biology. Cambridge University Press, Cambridge (1997)
  20. Harel, D., Tarjan, R.E.: Fast algorithms for finding nearest common ancestors. SIAM J. Comput. 13(2), 338–355 (1984)
  21. Jacobson, G.: Space-efficient static trees and graphs. In: Proceedings of the 30th Annual Symposium on Foundations of Computer Science (FOCS), pp. 549–554 (1989)
  22. Karp, R.M., Rabin, M.O.: Efficient randomized pattern-matching algorithms. IBM J. Res. Dev. 31(2), 249–260 (1987)
  23. Knuth, D.E.: The Art of Computer Programming, Volume III: Sorting and Searching. Addison-Wesley, Reading (1973)
  24. Lam, T.W., Sung, W.-K., Wong, S.-S.: Improved approximate string matching using compressed suffix data structures. In: Proceedings of the 16th International Symposium on Algorithms and Computation (ISAAC), pp. 339–348 (2005)
  25. Lam, T.W., Sung, W.-K., Wong, S.-S.: Improved approximate string matching using compressed suffix data structures. Algorithmica 51(3), 298–314 (2008)
  26. Maaß, M.G., Nowak, J.: Text indexing with errors. In: Proceedings of the 16th Annual Symposium on Combinatorial Pattern Matching (CPM), pp. 21–32 (2005)
  27. Manber, U., Myers, E.W.: Suffix arrays: a new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
  28. McCreight, E.M.: A space-economical suffix tree construction algorithm. J. ACM 23(2), 262–272 (1976)
  29. Munro, J.I.: Tables. In: Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), pp. 37–42 (1996)
  30. Muthukrishnan, S.: Efficient algorithms for document retrieval problems. In: Proceedings of the 14th Annual ACM-SIAM Symposium on Discrete Algorithms, pp. 657–666 (2002)
  31. Raman, R., Raman, V., Satti, S.R.: Succinct indexable dictionaries with applications to encoding k-ary trees, prefix sums and multisets. ACM Trans. Algorithms 3(4) (2007)
  32. Rao, S.S.: Time-space trade-offs for compressed suffix arrays. Inf. Process. Lett. 82(6), 307–311 (2002)
  33. Sadakane, K.: Succinct data structures for flexible text retrieval systems. J. Discrete Algorithms 5(1), 12–22 (2007)
  34. Weiner, P.: Linear pattern matching algorithms. In: Proceedings of the 14th Annual Symposium on Switching and Automata Theory (FOCS), pp. 1–11 (1973)

Copyright information

© Springer Science+Business Media New York 2014

Authors and Affiliations

  1. Helsinki Institute for Information Technology (HIIT), Department of Computer Science, University of Helsinki, Helsinki, Finland
