Algorithmica, Volume 13, Issue 1, pp 135–154

Multiple filtration and approximate pattern matching

  • P. A. Pevzner
  • M. S. Waterman
Article

DOI: 10.1007/BF01188584

Cite this article as:
Pevzner, P.A. & Waterman, M.S. Algorithmica (1995) 13: 135. doi:10.1007/BF01188584

Abstract

Given a text of length n and a query of length q, we present an algorithm for finding all locations of m-tuples in the text and in the query that differ by at most k mismatches. This problem is motivated by the dot-matrix constructions for sequence comparison and optimal oligonucleotide probe selection routinely used in molecular biology. In the case q = m the problem coincides with the classical approximate string matching with k mismatches problem. We present a new approach to this problem based on multiple hashing, which may have advantages over some sophisticated and theoretically efficient methods that have been proposed. This paper describes a two-stage process. The first stage (multiple filtration) uses a new technique to preselect roughly similar m-tuples. The second stage compares these m-tuples using an accurate method. We demonstrate the advantages of multiple filtration in comparison with other techniques for approximate pattern matching.
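
The two-stage structure described above (filter first, then verify) can be illustrated with a minimal Python sketch. It is not the paper's multiple filtration, whose details the abstract does not give. As a stand-in it uses a simpler single filter based on the pigeonhole principle: if two m-tuples differ in at most k positions, then at least one of the k + 1 aligned blocks of length l = floor(m / (k + 1)) matches exactly. The function names `hamming` and `approx_tuple_matches` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def approx_tuple_matches(text, query, m, k):
    """Return sorted (i, j) pairs with hamming(text[i:i+m], query[j:j+m]) <= k.

    Illustrative stand-in only: a single pigeonhole filter, not the
    multiple filtration of the paper.
    """
    if m < k + 1:
        raise ValueError("the pigeonhole filter needs m >= k + 1")
    l = m // (k + 1)  # block length; k mismatches cannot hit all k + 1 blocks

    # Stage 1a: hash every l-tuple of the query to its start positions.
    index = defaultdict(list)
    for r in range(len(query) - l + 1):
        index[query[r:r + l]].append(r)

    # Stage 1b: each exact l-tuple hit proposes candidate m-tuple pairs,
    # one for every block position the hit could occupy.
    candidates = set()
    for p in range(len(text) - l + 1):
        for r in index.get(text[p:p + l], ()):
            for block in range(k + 1):
                i, j = p - block * l, r - block * l
                if 0 <= i <= len(text) - m and 0 <= j <= len(query) - m:
                    candidates.add((i, j))

    # Stage 2: verify each surviving candidate with an exact mismatch count.
    return sorted(
        (i, j)
        for i, j in candidates
        if hamming(text[i:i + m], query[j:j + m]) <= k
    )
```

For example, `approx_tuple_matches("ACGT", "ACGA", m=4, k=1)` returns `[(0, 0)]`, since the two 4-tuples differ only in their last character. The filter never discards a true match, so the verification stage only removes false positives; a stronger filter reduces how many candidates reach that stage.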

Key words

  • String matching
  • Computational molecular biology

Copyright information

© Springer-Verlag New York Inc. 1995

Authors and Affiliations

  • P. A. Pevzner (1, 2)
  • M. S. Waterman (1, 3)
  1. Department of Mathematics, University of Southern California, Los Angeles, USA
  2. Computer Science Department, The Pennsylvania State University, University Park, USA
  3. Department of Molecular Biology, University of Southern California, Los Angeles, USA
