Abstract
We present two new algorithms for on-line multiple approximate string matching. Both extend previous algorithms that search for a single pattern. The single-pattern version of the first is based on a bit-parallel simulation of a non-deterministic finite automaton that is built from the pattern and takes the text as input. To search for multiple patterns, we superimpose their automata and use the result as a filter. The second algorithm partitions the pattern into sub-patterns that are searched with no errors, using a fast exact multipattern search algorithm. To handle multiple patterns, we search the sub-patterns of all of them together. In both cases the average running time is O(n), where n is the text length, for moderate error levels, pattern lengths, and numbers of patterns; both algorithms adapt, at higher cost, to the other cases. However, the algorithms differ in speed and in their thresholds of usefulness. We analyze theoretically when each algorithm should be used, and show experimentally that they are faster than previous solutions in a wide range of cases.
This work has been supported in part by FONDECYT grants 1950622 and 1960881.
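As an illustration of the partitioning idea described in the abstract, the following is a minimal Python sketch, not the authors' implementation. It assumes the standard pigeonhole argument: if a pattern is cut into k+1 pieces and occurs with at most k errors, at least one piece must occur exactly. The function names (`sellers_ends`, `split_into_pieces`, `multi_approx_search`) are hypothetical. The exact multipattern search is replaced by a plain `str.find` loop per piece as a stand-in, and candidates are verified with a classical dynamic-programming column (in the style of Sellers' algorithm) rather than with any method specified by the paper.

```python
from collections import defaultdict


def sellers_ends(pattern, text, k):
    """Return end positions in `text` where `pattern` matches with <= k errors.

    Classical column-wise dynamic programming; col[0] stays 0 so that a
    match may start at any text position.
    """
    m = len(pattern)
    col = list(range(m + 1))
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == c else 1
            col[i] = min(prev_diag + cost, old + 1, col[i - 1] + 1)
            prev_diag = old
        if col[m] <= k:
            ends.append(j)
    return ends


def split_into_pieces(pattern, k):
    """Cut `pattern` into k+1 non-empty pieces; return (offset, piece) pairs."""
    m, pieces = len(pattern), k + 1
    if m < pieces:
        raise ValueError("pattern too short to split into k+1 pieces")
    bounds = [i * m // pieces for i in range(pieces + 1)]
    return [(bounds[i], pattern[bounds[i]:bounds[i + 1]]) for i in range(pieces)]


def multi_approx_search(patterns, text, k):
    """Return the set of (pattern_index, end_position) approximate matches."""
    # Pool the pieces of all patterns so they are searched together.
    piece_index = defaultdict(list)
    for pid, pat in enumerate(patterns):
        for off, piece in split_into_pieces(pat, k):
            piece_index[piece].append((pid, off))

    matches = set()
    for piece, owners in piece_index.items():
        # Stand-in for a fast exact multipattern search algorithm.
        pos = text.find(piece)
        while pos != -1:
            for pid, off in owners:
                pat = patterns[pid]
                # Any occurrence containing this piece lies inside [lo, hi).
                lo = max(0, pos - off - k)
                hi = min(len(text), pos + len(pat) - off + k)
                for end in sellers_ends(pat, text[lo:hi], k):
                    matches.add((pid, lo + end))
            pos = text.find(piece, pos + 1)
    return matches


text = "the quick brown fox jumps over the lazy dog"
patterns = ["quick", "lazy", "jumped"]
print(sorted(multi_approx_search(patterns, text, 1)))
```

In this sketch the piece "jum" of "jumped" is found exactly in the text, but verification rejects the candidate because "jumped" does not occur with at most one error. This shows why the exact search acts only as a filter: every candidate it produces must still be verified.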
Copyright information
© 1997 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Baeza-Yates, R., Navarro, G. (1997). Multiple approximate string matching. In: Dehne, F., Rau-Chaplin, A., Sack, JR., Tamassia, R. (eds) Algorithms and Data Structures. WADS 1997. Lecture Notes in Computer Science, vol 1272. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-63307-3_57
DOI: https://doi.org/10.1007/3-540-63307-3_57
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-63307-5
Online ISBN: 978-3-540-69422-9
eBook Packages: Springer Book Archive