Multiple approximate string matching

  • Ricardo Baeza-Yates
  • Gonzalo Navarro
Session 5B: Invited Lecture
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1272)

Abstract

We present two new algorithms for on-line multiple approximate string matching. Both extend previous algorithms that search for a single pattern. The single-pattern version of the first is based on a bit-parallel simulation of a non-deterministic finite automaton that is built from the pattern and takes the text as input. To search for multiple patterns, we superimpose their automata and use the result as a filter. The second algorithm partitions the pattern into sub-patterns that are searched with no errors, using a fast exact multipattern search algorithm. To handle multiple patterns, we search the sub-patterns of all of them together. In both cases the average running time is O(n) for moderate error levels, pattern lengths and numbers of patterns. Both algorithms adapt, at higher cost, to the other cases. However, they differ in speed and in their thresholds of usefulness. We analyze theoretically when each algorithm should be used, and show experimentally that they are faster than previous solutions in a wide range of cases.
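The partitioning idea can be illustrated with a minimal sketch. It is not the authors' implementation, and several details are assumptions: each pattern is split into k + 1 pieces (so that any occurrence with at most k errors must contain one piece unaltered), candidate areas are verified with a classical dynamic-programming edit-distance scan, and Python's `str.find` stands in for a real exact multipattern searcher. The names `split_pattern`, `approx_end_positions` and `multi_approx_search` are hypothetical.

```python
def split_pattern(pattern, k):
    """Split pattern into k + 1 contiguous pieces; return (offset, piece) pairs."""
    parts = k + 1
    base, extra = divmod(len(pattern), parts)
    pieces, start = [], 0
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        pieces.append((start, pattern[start:start + length]))
        start += length
    return pieces


def approx_end_positions(pattern, text, k, offset=0):
    """Return the set of text indices (shifted by offset) at which some
    substring ends whose edit distance to pattern is at most k."""
    m = len(pattern)
    col = list(range(m + 1))  # col[i]: best distance for pattern[:i]
    ends = set()
    for j, ch in enumerate(text):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(prev_diag + cost, old + 1, col[i - 1] + 1)
            prev_diag = old
        if col[m] <= k:
            ends.add(offset + j)
    return ends


def multi_approx_search(patterns, text, k):
    """Return sorted (pattern index, end index) pairs for all occurrences
    of any pattern in text with at most k errors."""
    # Pieces of all patterns go into one table and are searched together.
    piece_table = {}
    for p_idx, pat in enumerate(patterns):
        if len(pat) <= k:
            raise ValueError("each pattern must be longer than k")
        for off, piece in split_pattern(pat, k):
            piece_table.setdefault(piece, []).append((p_idx, off))

    matches = set()
    for piece, owners in piece_table.items():
        # Assumption: str.find replaces a true multipattern search algorithm.
        pos = text.find(piece)
        while pos != -1:
            for p_idx, off in owners:
                pat = patterns[p_idx]
                # Any occurrence containing this piece lies in this window.
                lo = max(0, pos - off - k)
                hi = min(len(text), pos - off + len(pat) + k)
                matches |= {(p_idx, end) for end in
                            approx_end_positions(pat, text[lo:hi], k, lo)}
            pos = text.find(piece, pos + 1)
    return sorted(matches)


if __name__ == "__main__":
    print(multi_approx_search(["hello", "world"], "say hallo wurld", 1))
```

The exact search acts only as a filter: most of the text is discarded cheaply, and the quadratic verification runs only on the short windows around piece occurrences. As the error level grows, the pieces become shorter and occur more often, so more of the text has to be verified.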

Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • Ricardo Baeza-Yates (1)
  • Gonzalo Navarro (1)
  1. Department of Computer Science, University of Chile, Santiago, Chile