
On Hardness of Jumbled Indexing

  • Amihood Amir
  • Timothy M. Chan
  • Moshe Lewenstein
  • Noa Lewenstein
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8572)

Abstract

Jumbled indexing is the problem of indexing a text T for queries that ask whether some substring of T matches a pattern given as a Parikh vector, i.e., the vector of frequency counts of each character. Jumbled indexing has garnered a lot of interest in the last four years; for a partial list see [2,6,13,16,17,20,22,24,26,30,35,36]. For a text of length n over an alphabet Σ, there is a naive algorithm that preprocesses all answers in \(O(n^2|\Sigma|)\) time, allowing quick queries afterwards, and another naive algorithm that requires no preprocessing but has \(O(n\log|\Sigma|)\) query time. Despite a tremendous amount of effort, there has been little improvement over these running times.
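
The two naive baselines can be made concrete with a short Python sketch. It assumes, for illustration only, that a pattern is a dictionary mapping characters to counts; the helper names `preprocess_all`, `query_index`, and `query_scan` are hypothetical and do not come from the paper. `preprocess_all` stores the Parikh vector of every substring, so a query becomes a single set lookup. `query_scan` builds no index and slides a window of length m (the pattern's total count) across T, tracking how many characters currently disagree with the pattern. The \(\log|\Sigma|\) factor in the stated query bound presumably reflects character lookups in a comparison-based dictionary; this sketch uses Python's hash-based `Counter` instead.

```python
from collections import Counter

# Hypothetical helpers: a pattern is assumed to be a dict mapping characters to
# counts, e.g. {"a": 2, "b": 1}. Only non-empty patterns can match.


def preprocess_all(text: str) -> set:
    """Naive index: store the Parikh vector of every non-empty substring.

    Growing each substring one character at a time visits O(n^2) substrings;
    freezing and hashing each vector costs O(|Sigma|), for O(n^2 |Sigma|) total.
    """
    index = set()
    for i in range(len(text)):
        counts = Counter()
        for j in range(i, len(text)):
            counts[text[j]] += 1
            index.add(frozenset(counts.items()))
    return index


def query_index(index: set, pattern: dict) -> bool:
    """Answer a query with one set lookup; zero counts are dropped to match the keys."""
    key = frozenset((c, k) for c, k in pattern.items() if k > 0)
    return key in index


def query_scan(text: str, pattern: dict) -> bool:
    """No preprocessing: slide a window of length m = sum of counts over text."""
    need = {c: k for c, k in pattern.items() if k > 0}
    m = sum(need.values())
    if m == 0 or m > len(text):
        return False
    window = Counter(text[:m])
    # Number of characters whose count in the window differs from the pattern.
    diff = sum(1 for c in set(need) | set(window) if window[c] != need.get(c, 0))
    if diff == 0:
        return True
    for i in range(m, len(text)):
        # Add text[i] on the right, then drop text[i - m] on the left.
        for c, delta in ((text[i], 1), (text[i - m], -1)):
            was_ok = window[c] == need.get(c, 0)
            window[c] += delta
            now_ok = window[c] == need.get(c, 0)
            diff += int(was_ok) - int(now_ok)
        if diff == 0:
            return True
    return False


if __name__ == "__main__":
    text = "aabcb"
    index = preprocess_all(text)
    print(query_index(index, {"a": 1, "b": 1, "c": 1}))  # True: "abc" occurs
    print(query_scan(text, {"a": 2, "c": 1}))            # False
```

The trade-off mirrors the one described above: the index answers each query in time independent of n but costs quadratic preprocessing, while the scan needs no preprocessing but spends linear time on every query.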

In this paper we give a good reason for this. We show that, under a 3SUM-hardness assumption, jumbled indexing for alphabets of size \(\omega(1)\) requires \(\Omega(n^{2-\epsilon})\) preprocessing time or \(\Omega(n^{1-\delta})\) query time for any \(\epsilon,\delta > 0\). In fact, under a stronger 3SUM-hardness assumption, for any constant alphabet size \(r \ge 3\) there exist describable fixed constants \(\epsilon_r\) and \(\delta_r\) such that jumbled indexing requires \(\Omega(n^{2-\epsilon_r})\) preprocessing time or \(\Omega(n^{1-\delta_r})\) query time.
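
For reference, the standard 3SUM problem asks whether, among n given integers, three at distinct positions sum to zero. The sketch below (the function name `has_3sum` is hypothetical) is the textbook \(O(n^2)\) sort-and-two-pointer algorithm. 3SUM-hardness assumptions conjecture, roughly, that no algorithm solves 3SUM in \(O(n^{2-\epsilon})\) time for any \(\epsilon > 0\); only subpolynomial-factor improvements over \(n^2\) are known, e.g., [9]. The sketch illustrates only the source problem. It is not the paper's reduction, and the stronger assumption mentioned above may concern a different variant of 3SUM.

```python
def has_3sum(nums: list[int]) -> bool:
    """Return True if three entries at distinct positions sum to zero.

    Sort once (O(n log n)), then for each i scan the remaining suffix with two
    pointers (O(n) per i), for O(n^2) time overall.
    """
    a = sorted(nums)
    n = len(a)
    for i in range(n - 2):
        lo, hi = i + 1, n - 1
        while lo < hi:
            s = a[i] + a[lo] + a[hi]
            if s == 0:
                return True
            if s < 0:
                lo += 1  # sum too small: move the left pointer to a larger value
            else:
                hi -= 1  # sum too large: move the right pointer to a smaller value
    return False


if __name__ == "__main__":
    print(has_3sum([-5, 1, 4, 10]))  # True: -5 + 1 + 4 == 0
    print(has_3sum([1, 2, 3]))       # False
```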

References

  1. Amir, A., Apostolico, A., Landau, G.M., Satta, G.: Efficient text fingerprinting via Parikh mapping. J. Discrete Algorithms 1(5-6), 409–421 (2003)
  2. Amir, A., Butman, A., Porat, E.: On the relationship between histogram indexing and block-mass indexing. In: Philosophical Transactions A (to appear)
  3. Amir, A., Church, K.W., Dar, E.: Separable attributes: a technique for solving the sub matrices character count problem. In: SODA, pp. 400–401 (2002)
  4. Amir, A., Farach, M., Muthukrishnan, S.: Alphabet dependence in parameterized matching. Inf. Process. Lett. 49(3), 111–115 (1994)
  5. Phanendra Babu, G., Mehtre, B.M., Kankanhalli, M.S.: Color indexing for efficient image retrieval. Multimedia Tools and Applications 1(4), 327–348 (1995)
  6. Badkobeh, G., Fici, G., Kroon, S., Lipták, Z.: Binary jumbled string matching for highly run-length compressible texts. Inf. Process. Lett. 113(17), 604–608 (2013)
  7. Baker, B.S.: Parameterized pattern matching: Algorithms and applications. J. Comput. Syst. Sci. 52(1), 28–42 (1996)
  8. Baker, B.S.: Parameterized duplication in strings: Algorithms and an application to software maintenance. SIAM J. Comput. 26(5), 1343–1362 (1997)
  9. Baran, I., Demaine, E.D., Pătraşcu, M.: Subquadratic algorithms for 3SUM. In: Dehne, F., López-Ortiz, A., Sack, J.-R. (eds.) WADS 2005. LNCS, vol. 3608, pp. 409–421. Springer, Heidelberg (2005)
  10. Böcker, S.: Simulating multiplexed SNP discovery rates using base-specific cleavage and mass spectrometry. Bioinformatics 23(2), 5–12 (2007)
  11. Boyer, R.S., Moore, J.S.: A fast string searching algorithm. Commun. ACM 20(10), 762–772 (1977)
  12. Bremner, D., Chan, T.M., Demaine, E.D., Erickson, J., Hurtado, F., Iacono, J., Langerman, S., Pătraşcu, M., Taslakian, P.: Necklaces, convolutions, and X + Y. Algorithmica 69, 294–314 (2014)
  13. Burcsi, P., Cicalese, F., Fici, G., Lipták, Z.: On table arrangements, scrabble freaks, and jumbled pattern matching. In: Boldi, P. (ed.) FUN 2010. LNCS, vol. 6099, pp. 89–101. Springer, Heidelberg (2010)
  14. Butman, A., Eres, R., Landau, G.M.: Scaled and permuted string matching. Inf. Process. Lett. 92(6), 293–297 (2004)
  15. Butman, A., Lewenstein, N., Munro, I.J.: Permuted scaled matching. In: CPM 2014 (to appear, 2014)
  16. Cicalese, F., Fici, G., Lipták, Z.: Searching for jumbled patterns in strings. In: Prague Stringology Conference, pp. 105–117 (2009)
  17. Cicalese, F., Laber, E.S., Weimann, O., Yuster, R.: Near linear time construction of an approximate index for all maximum consecutive sub-sums of a sequence. In: Kärkkäinen, J., Stoye, J. (eds.) CPM 2012. LNCS, vol. 7354, pp. 149–158. Springer, Heidelberg (2012)
  18. Cole, R., Gottlieb, L.-A., Lewenstein, M.: Dictionary matching and indexing with errors and don’t cares. In: STOC, pp. 91–100 (2004)
  19. Crochemore, M., Iliopoulos, C.S., Kociumaka, T., Kubica, M., Langiu, A., Pissis, S.P., Radoszewski, J., Rytter, W., Waleń, T.: Order-preserving incomplete suffix trees and order-preserving indexes. In: Kurland, O., Lewenstein, M., Porat, E. (eds.) SPIRE 2013. LNCS, vol. 8214, pp. 84–95. Springer, Heidelberg (2013)
  20. Durocher, S., Ian Munro, J., Mondal, D., Thankachan, S.V.: Jumbled pattern matching over large alphabets (2014) (manuscript, personal communication)
  21. Eres, R., Landau, G.M., Parida, L.: Permutation pattern discovery in biosequences. Journal of Computational Biology 11(6), 1050–1060 (2004)
  22. Gagie, T., Hermelin, D., Landau, G.M., Weimann, O.: Binary jumbled pattern matching on trees and tree-like structures. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 517–528. Springer, Heidelberg (2013)
  23. Gajentaan, A., Overmars, M.H.: On a class of O(n²) problems in computational geometry. Comput. Geom. 5, 165–185 (1995)
  24. Giaquinta, E., Grabowski, S.: New algorithms for binary jumbled pattern matching. Inf. Process. Lett. 113(14-16), 538–542 (2013)
  25. Hazay, C., Lewenstein, M., Sokol, D.: Approximate parameterized matching. ACM Transactions on Algorithms 3(3) (2007)
  26. Hermelin, D., Landau, G.M., Rabinovich, Y., Weimann, O.: Binary jumbled pattern matching via all-pairs shortest paths (2014) (manuscript), http://arxiv.org/abs/1401.2065
  27. Holub, S.: Parikh test sets for commutative languages. ITA 42(3), 525–537 (2008)
  28. Huang, X., Ali, H., Sadanandam, A., Singh, R.: SRPVS: a new motif searching algorithm for protein analysis. In: Proceedings of the 2004 IEEE Computational Systems Bioinformatics Conference, CSB 2004, pp. 674–675 (2004)
  29. Knuth, D.E., Morris Jr., J.H., Pratt, V.R.: Fast pattern matching in strings. SIAM J. Comput. 6(2), 323–350 (1977)
  30. Kociumaka, T., Radoszewski, J., Rytter, W.: Efficient indexes for jumbled pattern matching with constant-sized alphabet. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 625–636. Springer, Heidelberg (2013)
  31. Kopczynski, E., Widjaja, A.: Parikh images of grammars: Complexity and applications. In: LICS, pp. 80–89 (2010)
  32. Kubica, M., Kulczyński, T., Radoszewski, J., Rytter, W., Waleń, T.: A linear time algorithm for consecutive permutation pattern matching. Inf. Process. Lett. 113(12), 430–433 (2013)
  33. Lee, L.-K., Lewenstein, M., Zhang, Q.: Parikh matching in the streaming model. In: Calderón-Benavides, L., González-Caro, C., Chávez, E., Ziviani, N. (eds.) SPIRE 2012. LNCS, vol. 7608, pp. 336–341. Springer, Heidelberg (2012)
  34. Manber, U., Myers, E.W.: Suffix arrays: A new method for on-line string searches. SIAM J. Comput. 22(5), 935–948 (1993)
  35. Moosa, T.M., Rahman, M.S.: Indexing permutations for binary strings. Inf. Process. Lett. 110(18-19), 795–798 (2010)
  36. Moosa, T.M., Rahman, M.S.: Sub-quadratic time and linear space data structures for permutation matching in binary strings. J. Discrete Algorithms 10, 5–9 (2012)
  37. Parikh, R.: On context-free languages. J. ACM 13(4), 570–581 (1966)
  38. Pătraşcu, M.: Towards polynomial lower bounds for dynamic problems. In: STOC, pp. 603–610 (2010)
  39. Swain, M.J., Ballard, D.H.: Color indexing. International Journal of Computer Vision 7(1), 11–32 (1991)
  40. Weiner, P.: Linear pattern matching algorithms. In: SWAT (FOCS), pp. 1–11 (1973)
  41. Williams, R.: Faster all-pairs shortest paths via circuit complexity. In: STOC (to appear, 2014)
  42. Williams, V.V., Williams, R.: Subcubic equivalences between path, matrix and triangle problems. In: FOCS, pp. 645–654 (2010)

Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Amihood Amir (Bar-Ilan University, Israel; Johns Hopkins University, USA)
  • Timothy M. Chan (University of Waterloo, Canada)
  • Moshe Lewenstein (Bar-Ilan University, Israel)
  • Noa Lewenstein (Netanya College, Israel)