New results and open problems related to non-standard stringology

  • S. Muthukrishnan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 937)

Abstract

There are a number of string matching problems for which the best known algorithms rely on algebraic convolutions (an approach pioneered by Fischer and Paterson [FP74]). These include for instance the classical string matching with wild cards and the k-mismatches problem. In [MP94], the authors studied generalizations of these problems which they called the non-standard stringology. There they derived upper and lower bounds for non-standard string matching problems.
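As an illustration of the convolution approach mentioned above, here is a minimal sketch of counting mismatches at every alignment with one convolution per pattern symbol. It is a textbook-style rendering of the idea and is not taken from this paper or from [MP94]. The helper names (`fft`, `convolve`, `mismatch_counts`) are hypothetical, and the small recursive FFT is included only to keep the example self-contained.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def convolve(x, y):
    """Linear convolution of two integer sequences via FFT."""
    out_len = len(x) + len(y) - 1
    size = 1
    while size < out_len:
        size *= 2
    fx = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    fy = fft([complex(v) for v in y] + [0j] * (size - len(y)))
    prod = fft([p * q for p, q in zip(fx, fy)], invert=True)
    # The unnormalized inverse transform needs a division by size.
    return [round(v.real / size) for v in prod[:out_len]]


def mismatch_counts(text, pattern):
    """Number of mismatching positions for each alignment of pattern in text."""
    n, m = len(text), len(pattern)
    if m > n:
        return []
    matches = [0] * (n - m + 1)
    for c in set(pattern):
        # Indicator vectors for symbol c; the pattern is reversed so that
        # the convolution lines up text[i + j] with pattern[j].
        t_ind = [1 if ch == c else 0 for ch in text]
        p_ind = [1 if ch == c else 0 for ch in reversed(pattern)]
        conv = convolve(t_ind, p_ind)
        for i in range(n - m + 1):
            matches[i] += conv[i + m - 1]
    return [m - k for k in matches]
```

For example, `mismatch_counts("abracadabra", "abra")` reports zero mismatches at alignments 0 and 7. The cost is one convolution per distinct pattern symbol, which is why alphabet size matters for this family of methods.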

In this paper, we pose several novel problems in the area of non-standard stringology. Some we have been able to resolve here; others we leave open. Among the technical results in this paper are:
  1. improved bounds for string matching when a symbol in the string matches at most d others (motivated by noisy string matching),
  2. first-known bounds for approximately counting mismatches in noisy string matching as above, and
  3. improved bounds for the k-witnesses problem and its applications.

Our results are obtained by using the probabilistic proof technique and randomized algorithmic methods; these techniques, although standard, have seldom been used in combinatorial pattern matching.

References

  1. [A89] A. Aho. Algorithms for finding patterns in strings. Handbook of theoretical computer science, Vol 1, Van Leeuwen Ed., 1989.
  2. [Ab87] K. Abrahamson. Generalized string matching. SIAM J. Comp., 1987, 1039–1051.
  3. [AC75] A. Aho and M. Corasick. Efficient string searching: An aid to bibliographic search. Comm. of the ACM, 18(6), 1975, 333–340.
  4. [AF91] A. Amir and M. Farach. Efficient 2-dimensional Approximate Matching of Non-rectangular Figures. Proc of 2nd Ann ACM Symp on Discrete Algorithms, 1991, 212–222.
  5. [AGMN92] N. Alon, Z. Galil, O. Margalit and M. Naor. Witnesses for boolean matrix multiplication and for shortest paths. Proc. 33rd Ann. IEEE Symp. Foundations of CS, 1992, 417–426.
  6. [AHU74] A. Aho, J. Hopcroft, and J. Ullman. The design and analysis of computer algorithms. Addison-Wesley Publishers, 1974.
  7. [AL88] A. Amir and G. Landau. Fast serial and parallel multidimensional approximate array matching. Theoretical Computer Science, 81, 1991, 97–115.
  8. [AS93] N. Alon and J. Spencer. The probabilistic method. Wiley, 1993.
  9. [BYG89] R. Baeza-Yates and G. Gonnet. A new approach to text searching. Proc. ACM SIGIR, Cambridge, Mass., 12:168–175, 1989.
  10. [C71] S. Cook. Linear time simulation of deterministic two-way pushdown automata. Proc IFIP Congress, 1971.
  11. [DSO79] M. Dayhoff, R. Schwartz and B. Orcutt. A model for evolutionary change in proteins, in Dayhoff, ed., Atlas of Protein Sequence and Structure, 5, 1979, 345–352.
  12. [FP74] M. Fischer and M. Paterson. String Matching and other Products. SIAM-AMS Proceedings, Vol. 7, 113–125, 1974.
  13. [Ga79] Z. Galil. Some open problems in the theory of computation as questions about two-way deterministic pushdown automaton languages. Mathematical Systems Theory, 1979, 211–228.
  14. [Ga85] Z. Galil. Open problems in stringology. Combinatorial Algorithms on Words, A. Apostolico and Z. Galil Eds, Springer-Verlag Lecture Notes, 1985, 1–8.
  15. [GG88] Z. Galil and R. Giancarlo. Data structures and algorithms for approximate string matching. Journal of Complexity, 4(1988), 33–72.
  16. [HU79] J. Hopcroft and J. Ullman. Introduction to Automata Theory, Languages and Computation, Addison-Wesley, Reading, Mass., 1979.
  17. [K93] H. Karloff. Fast algorithms for approximately counting mismatches. Manuscript, 1993.
  18. [Ko87] S.R. Kosaraju. Efficient string searching. Manuscript, 1987.
  19. [Ko89] S.R. Kosaraju. Efficient tree pattern matching. Proc IEEE Ann. Symp. on FOCS, 1989, 178–183.
  20. [KMP77] D. E. Knuth, J. H. Morris, and V. R. Pratt. Fast pattern matching in strings. SIAM J. Computing, 6:323–350, 1977.
  21. [KNSW92] M. Karchmer, I. Newman, M. Saks and A. Wigderson. Non-deterministic communication complexity with few witnesses. Manuscript, 1992.
  22. [KR87] R. Karp and M.O. Rabin. Efficient randomized pattern matching algorithms. IBM Journal of Research and Development, 31(2), 249–260.
  23. [Lov] L. Lovasz. Communication complexity: a survey. Paths, Flows and VLSI Layout, Korte, Lovasz, Promel, Schrijver Eds., Springer-Verlag (1990), 235–266.
  24. [LV89] G.M. Landau and U. Vishkin. Fast parallel and serial approximate string matching. Journal of Algorithms, 10(2), 1989, 262–272.
  25. [LW75] R. Lowrance and R. Wagner. An extension of the string-to-string correction problem. Journal of Association of Computing Machinery, 22, 1975, 177–183.
  26. [MP94] S. Muthukrishnan and K. Palem. Non-standard stringology: algorithms and complexity. Proc. 26th Annual ACM Symp. on the Theory of Computing, 1994, 770–779.
  27. [MR92] S. Muthukrishnan and H. Ramesh. String matching under general match relation. Proc 12th FST & TCS, India, LNCS, Springer-Verlag, Vol. 652, 1992, 356–367.
  28. [MaP80] W. Masek and M. Paterson. A faster algorithm for computing string-edit distances. Journal of Computer and System Sciences, 20(1), 1980, 18–31.
  29. [P94] V. Pan. Personal Communication, 1994.
  30. [Pi85] R.Y. Pinter. Efficient string matching with don't-care in patterns. Combinatorial Algorithms on Words, NATO-ASI series, pp. 11–29, 1985. Editors: A. Apostolico and Z. Galil.
  31. [Se92] R. Seidel. On the all-pairs-shortest-path problems. Proc. 24th Ann. ACM Symp. Theory of Computing, 1992, 745–749.
  32. [Uk85] E. Ukkonen. Finding approximate patterns in strings. Journal of Algorithms, Vol. 6, 1985, 132–137.
  33. [WM92] S. Wu and U. Manber. Fast text searching allowing errors. Communications of ACM, 35, 1992, 83–91.

Copyright information

© Springer-Verlag 1995

Authors and Affiliations

  • S. Muthukrishnan, DIMACS, Rutgers University
