Advertisement

On Table Arrangements, Scrabble Freaks, and Jumbled Pattern Matching

  • Péter Burcsi
  • Ferdinando Cicalese
  • Gabriele Fici
  • Zsuzsanna Lipták
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6099)

Abstract

Given a string s, the Parikh vector of s, denoted p(s), counts the multiplicity of each character in s. Searching for a match of Parikh vector q (a “jumbled string”) in the text s requires to find a substring t of s with p(t) = q. The corresponding decision problem is to verify whether at least one such match exists. So, for example for the alphabet Σ = {a, b, c}, the string s = abaccbabaaa has Parikh vector p(s) = (6,3,2), and the Parikh vector q = (2,1,1) appears once in s in position (1,4). Like its more precise counterpart, the renown Exact String Matching, Jumbled Pattern Matching has ubiquitous applications, e.g., string matching with a dyslectic word processor, table rearrangements, anagram checking, Scrabble playing and, allegedly, also analysis of mass spectrometry data. We consider two simple algorithms for Jumbled Pattern Matching and use very complicated data structures and analytic tools to show that they are not worse than the most obvious algorithm. We also show that we can achieve non-trivial efficient average case behavior, but that’s less fun to describe in this abstract so we defer the details to the main part of the article, to be read at the reader’s risk...well, at the reader’s discretion.

Keywords

String Match Current Window Window Algorithm Approximate String Match Wavelet Tree 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Amir, A., Apostolico, A., Landau, G.M., Satta, G.: Efficient text fingerprinting via Parikh mapping. J. Discrete Algorithms 1(5-6), 409–421 (2003)zbMATHCrossRefMathSciNetGoogle Scholar
  2. 2.
    Babai, L., Felzenszwalb, P.F.: Computing rank-convolutions with a mask. ACM Trans. Algorithms 6(1), 1–13 (2009)Google Scholar
  3. 3.
    Bellman, R., Karush, W.: Mathematical programming and the maximum transform. Journal of the Soc. for Industrial and Applied Math. 10(3), 550–567 (1962)zbMATHCrossRefMathSciNetGoogle Scholar
  4. 4.
    Benson, G.: Composition alignment. In: Benson, G., Page, R.D.M. (eds.) WABI 2003. LNCS (LNBI), vol. 2812, pp. 447–461. Springer, Heidelberg (2003)Google Scholar
  5. 5.
    Böcker, S.: Simulating multiplexed SNP discovery rates using base-specific cleavage and mass spectrometry. Bioinformatics 23(2), 5–12 (2007)CrossRefGoogle Scholar
  6. 6.
    Böcker, S., Lipták, Z.: A fast and simple algorithm for the Money Changing Problem. Algorithmica 48(4), 413–432 (2007)zbMATHCrossRefMathSciNetGoogle Scholar
  7. 7.
    Bremner, D., Chan, T.M., Demaine, E.D., Erickson, J., Hurtado, F., Iacono, J., Langerman, S., Taslakian, P.: Necklaces, convolutions, and X + Y. In: Azar, Y., Erlebach, T. (eds.) ESA 2006. LNCS, vol. 4168, pp. 160–171. Springer, Heidelberg (2006)CrossRefGoogle Scholar
  8. 8.
    Butman, A., Eres, R., Landau, G.M.: Scaled and permuted string matching. Inf. Process. Lett. 92(6), 293–297 (2004)zbMATHCrossRefMathSciNetGoogle Scholar
  9. 9.
    Chan, T.M.: All-pairs shortest paths with real weights in O(n 3/logn) time. Algorithmica 50(2), 236–243 (2008)zbMATHCrossRefMathSciNetGoogle Scholar
  10. 10.
    Cicalese, F., Fici, G., Lipták, Z.: Searching for jumbled patterns in strings. In: Proc. of the Prague Stringology Conference 2009, pp. 105–117 (2009)Google Scholar
  11. 11.
    Cieliebak, M., Erlebach, T., Lipták, Z., Stoye, J., Welzl, E.: Algorithmic complexity of protein identification: combinatorics of weighted strings. Discrete Applied Mathematics 137(1), 27–46 (2004)zbMATHCrossRefMathSciNetGoogle Scholar
  12. 12.
    Clark, D.: Compact pat trees. PhD thesis, University of Waterloo, Canada (1996)Google Scholar
  13. 13.
    Eppstein, D.A.: Efficient algorithms for sequence analysis with concave and convex gap costs. PhD thesis, New York, NY, USA (1989)Google Scholar
  14. 14.
    Eres, R., Landau, G.M., Parida, L.: Permutation pattern discovery in biosequences. Journal of Computational Biology 11(6), 1050–1060 (2004)CrossRefGoogle Scholar
  15. 15.
    Felzenszwalb, P.F., Huttenlocher, D.P., Kleinberg, J.M.: Fast algorithms for large-state-space HMMs with applications to web usage analysis. In: Thrun, S., Saul, L.K., Schölkopf, B. (eds.) NIPS. MIT Press, Cambridge (2003)Google Scholar
  16. 16.
    Goczyła, K.: The generalized Banach match-box problem: Application in disc storage management. Acta Applicandae Mathematicae 5, 27–36 (1986)zbMATHCrossRefMathSciNetGoogle Scholar
  17. 17.
    Grossi, R., Gupta, A., Vitter, J.S.: High-order entropy-compressed text indexes. In: SODA, pp. 841–850 (2003)Google Scholar
  18. 18.
    Jokinen, P., Tarhio, J., Ukkonen, E.: A comparison of approximate string matching algorithms. Software Practice and Experience 26(12), 1439–1458 (1996)CrossRefGoogle Scholar
  19. 19.
    Mendelson, H., Pliskin, J., Yechiali, U.: Optimal storage allocation for serial files. Communications of the ACM 22, 124–130 (1979)zbMATHCrossRefMathSciNetGoogle Scholar
  20. 20.
    Mendelson, H., Pliskin, J., Yechiali, U.: A stochastic allocation problem. Operations Research 28, 687–693 (1980)zbMATHCrossRefMathSciNetGoogle Scholar
  21. 21.
    Munro, J.I.: Tables. In: Chandru, V., Vinay, V. (eds.) FSTTCS 1996. LNCS, vol. 1180, pp. 37–42. Springer, Heidelberg (1996)Google Scholar
  22. 22.
    Navarro, G., Mäkinen, V.: Compressed full-text indexes. ACM Comput. Surv. 39(1) (2007)Google Scholar
  23. 23.
    Parida, L.: Gapped permutation patterns for comparative genomics. In: Bücher, P., Moret, B.M.E. (eds.) WABI 2006. LNCS (LNBI), vol. 4175, pp. 376–387. Springer, Heidelberg (2006)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Péter Burcsi
    • 1
  • Ferdinando Cicalese
    • 2
  • Gabriele Fici
    • 3
  • Zsuzsanna Lipták
    • 4
  1. 1.Department of Computer AlgebraEötvös Loránd UniversityHungary
  2. 2.Dipartimento di Informatica ed ApplicazioniUniversity of SalernoItaly
  3. 3.I3S, UMR6070, CNRS et Université de Nice-Sophia AntipolisFrance
  4. 4.AG Genominformatik, Technische FakultätBielefeld UniversityGermany

Personalised recommendations