On Hardness of Several String Indexing Problems

  • Kasper Green Larsen
  • J. Ian Munro
  • Jesper Sindahl Nielsen
  • Sharma V. Thankachan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8486)


Let \({\cal D} =\{d_1,d_2,...,d_D\}\) be a collection of D string documents of n characters in total. The two-pattern matching problems ask to index \({\cal D}\) for answering the following queries efficiently.

  • report/count the unique documents containing both \(P_1\) and \(P_2\).

  • report/count the unique documents containing \(P_1\), but not \(P_2\).

Here \(P_1\) and \(P_2\) represent input patterns of length \(p_1\) and \(p_2\) respectively. Linear space data structures with \(O(p_1+p_2+\sqrt{nk}\log^{O(1)} n)\) query cost are already known for the reporting version, where k represents the output size. For the counting version (i.e., report the value k), a simple linear-space index with \(O(p_1+p_2+ \sqrt{n})\) query cost can be constructed in \(O(n^{3/2})\) time. However, it is still not known if these are the best possible bounds for these problems. In this paper, we show a strong connection between these string indexing problems and the boolean matrix multiplication problem. Based on this, we argue that these results cannot be improved significantly using purely combinatorial techniques. We also provide an improved upper bound for a related problem known as two-dimensional substring indexing.
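
To make the query semantics concrete, the following is a minimal Python sketch. It is a brute-force baseline that scans every document, so it does not achieve any of the bounds stated above. The second half illustrates one simple way that boolean matrix multiplication can be phrased in terms of two-pattern queries; this encoding is an illustrative assumption and is not claimed to be the construction used in the paper. All function names and the `#`-delimited token format are hypothetical.

```python
from typing import List, Set


def docs_containing(docs: List[str], pattern: str) -> Set[int]:
    """Indices of documents that contain `pattern` as a substring (naive scan)."""
    return {i for i, d in enumerate(docs) if pattern in d}


def report_both(docs: List[str], p1: str, p2: str) -> List[int]:
    """Report the unique documents containing both p1 and p2."""
    return sorted(docs_containing(docs, p1) & docs_containing(docs, p2))


def report_first_not_second(docs: List[str], p1: str, p2: str) -> List[int]:
    """Report the unique documents containing p1 but not p2."""
    return sorted(docs_containing(docs, p1) - docs_containing(docs, p2))


def count_both(docs: List[str], p1: str, p2: str) -> int:
    """Counting version: the output size k of the reporting query."""
    return len(report_both(docs, p1, p2))


def boolean_product_via_queries(A: List[List[int]], B: List[List[int]]) -> List[List[int]]:
    """Boolean product of A (m x r) and B (r x c) using two-pattern queries.

    Assumed encoding (illustration only): document k lists a token r<i> for
    every row i with A[i][k] = 1 and a token c<j> for every column j with
    B[k][j] = 1. Then (AB)[i][j] = 1 exactly when some document contains
    both r<i> and c<j>. The '#' delimiters keep r1 from matching inside r11.
    """
    m, r, c = len(A), len(B), len(B[0])
    docs = []
    for k in range(r):
        tokens = [f"r{i}" for i in range(m) if A[i][k]]
        tokens += [f"c{j}" for j in range(c) if B[k][j]]
        docs.append("#" + "#".join(tokens) + "#")
    return [
        [1 if count_both(docs, f"#r{i}#", f"#c{j}#") > 0 else 0 for j in range(c)]
        for i in range(m)
    ]
```

For example, with the documents `["banana", "bandana", "cabana"]`, the patterns `"ban"` and `"nd"` occur together only in the second document, while the first and third contain `"ban"` but not `"nd"`. Under the assumed encoding, an index that answers such queries quickly would also yield a boolean matrix product, which gives some intuition for why the two problems are related.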







Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Kasper Green Larsen (1)
  • J. Ian Munro (2)
  • Jesper Sindahl Nielsen (1)
  • Sharma V. Thankachan (2)

  1. Aarhus University, Denmark
  2. University of Waterloo, Canada
