Multiple approximate string matching

  • Session 5B: Invited Lecture
  • Conference paper
Algorithms and Data Structures (WADS 1997)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1272)

Abstract

We present two new algorithms for on-line multiple approximate string matching. Both extend previous algorithms that search for a single pattern. The single-pattern version of the first is based on a bit-parallel simulation of a non-deterministic finite automaton built from the pattern and fed with the text as input. To search for multiple patterns, we superimpose their automata and use the result as a filter. The second algorithm partitions the pattern into sub-patterns that are searched with no errors, using a fast exact multipattern search algorithm. To handle multiple patterns, we search the sub-patterns of all of them together. In both cases the average running time is O(n), where n is the text length, for moderate error levels, pattern lengths, and numbers of patterns; the algorithms also adapt, at higher cost, to the other cases. However, they differ in speed and in their thresholds of usefulness. We analyze theoretically when each algorithm should be used, and show experimentally that they are faster than previous solutions in a wide range of cases.
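
The partitioning idea behind the second algorithm can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the usual split of each pattern into k + 1 pieces, so that any occurrence with at most k errors must contain at least one piece unaltered. Repeated calls to `str.find` stand in for the fast exact multipattern searcher, and candidates are verified with classical dynamic programming. The names `split_pieces`, `best_match_ends`, and `multi_approx_search` are hypothetical.

```python
from collections import defaultdict


def split_pieces(pattern, k):
    """Split pattern into k + 1 contiguous, non-empty pieces of near-equal length."""
    m, parts = len(pattern), k + 1
    if m < parts:
        raise ValueError("pattern too short to split into k + 1 non-empty pieces")
    bounds = [i * m // parts for i in range(parts + 1)]
    return [(bounds[i], pattern[bounds[i]:bounds[i + 1]]) for i in range(parts)]


def best_match_ends(pattern, window, k):
    """Column-wise edit-distance DP: positions in window where a match with <= k errors ends."""
    m = len(pattern)
    col = list(range(m + 1))           # column for the empty text prefix
    ends = []
    for j, c in enumerate(window):
        prev_diag, col[0] = col[0], 0  # a match may start at any text position
        for i in range(1, m + 1):
            cur = min(prev_diag + (pattern[i - 1] != c),  # match or substitution
                      col[i] + 1,                         # extra text character
                      col[i - 1] + 1)                     # missing text character
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            ends.append(j)
    return ends


def multi_approx_search(text, patterns, k):
    """Return sorted (pattern index, end position) pairs for matches with <= k errors."""
    # Assumption: one shared table holds the pieces of all patterns. A real
    # implementation would hand these to a fast exact multipattern searcher
    # rather than scanning the text once per piece with str.find.
    pieces = defaultdict(list)         # piece -> [(pattern index, offset in pattern)]
    for p_idx, pat in enumerate(patterns):
        for offset, piece in split_pieces(pat, k):
            pieces[piece].append((p_idx, offset))

    hits = set()
    for piece, owners in pieces.items():
        pos = text.find(piece)
        while pos != -1:
            for p_idx, offset in owners:
                pat = patterns[p_idx]
                # Any occurrence with <= k errors that contains this exact piece
                # lies inside this window, so only the window is verified.
                lo = max(0, pos - offset - k)
                hi = min(len(text), pos - offset + len(pat) + k)
                for end in best_match_ends(pat, text[lo:hi], k):
                    hits.add((p_idx, lo + end))
            pos = text.find(piece, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    sample = "the quick brown fox jumps over the lazy dog"
    print(multi_approx_search(sample, ["quikc", "lazzy", "browm"], 1))
```

The filter pays off when exact piece occurrences are rare: most of the text is never touched by the quadratic verification step. As the error level grows, the pieces get shorter and match more often, which is consistent with the abstract's remark that each algorithm has a threshold of usefulness.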

This work has been supported in part by FONDECYT grants 1950622 and 1960881.

Editor information

Frank Dehne Andrew Rau-Chaplin Jörg-Rüdiger Sack Roberto Tamassia

Copyright information

© 1997 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Baeza-Yates, R., Navarro, G. (1997). Multiple approximate string matching. In: Dehne, F., Rau-Chaplin, A., Sack, JR., Tamassia, R. (eds) Algorithms and Data Structures. WADS 1997. Lecture Notes in Computer Science, vol 1272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63307-3_57

  • DOI: https://doi.org/10.1007/3-540-63307-3_57

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-63307-5

  • Online ISBN: 978-3-540-69422-9

  • eBook Packages: Springer Book Archive
