Multiple filtration and approximate pattern matching
- Cite this article as:
- Pevzner, P.A. & Waterman, M.S. Algorithmica (1995) 13: 135. doi:10.1007/BF01188584
- 153 Downloads
Given a text of lengthn and a query of lengthq, we present an algorithm for finding all locations ofm-tuples in the text and in the query that differ by at mostk mismatches. This problem is motivated by the dot-matrix constructions for sequence comparison and optimal oligonucleotide probe selection routinely used in molecular biology. In the caseq=m the problem coincides with the classicalapproximate string matching with k mismatches problem. We present a new approach to this problem based on multiple hashing, which may have advantages over some sophisticated and theoretically efficient methods that have been proposed. This paper describes a two-stage process. The first stage (multiple filtration) uses a new technique to preselect roughly similarm-tuples. The second stage compares thesem-tuples using an accurate method. We demonstrate the advantages of multiple filtration in comparison with other techniques for approximate pattern matching.