Algorithms in Bioinformatics

Volume 3240 of the series Lecture Notes in Computer Science, pp. 74-86

Gapped Local Similarity Search with Provable Guarantees

  • Manikandan Narayanan, Computer Science Division, University of California
  • Richard M. Karp, Computer Science Division, University of California; International Computer Science Institute


We present a program, qhash, based on q-gram filtration and high-dimensional search, to find gapped local similarities between two sequences. Our approach differs from past q-gram-based approaches in two main respects. First, our filtration step uses algorithms for a sparse all-pairs problem, while past studies use suffix-tree-like structures and counters. Second, our program works in sequence-sequence mode, while most past programs (except QUASAR) work in pattern-database mode.

We leverage existing research in high-dimensional proximity search to discuss sparse all-pairs algorithms, and show them to be subquadratic under certain reasonable input assumptions. Our qhash program has provable sensitivity (even on worst-case inputs) and average-case performance guarantees. It is significantly faster than a fully sensitive dynamic-programming-based program for strong similarity search on long sequences.
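
To make the idea of q-gram filtration concrete, here is a minimal Python sketch. It is not the qhash implementation, and every function name and parameter in it is hypothetical. It assumes fixed-length windows of equal size, represents each window as a sparse vector of q-gram counts, and uses the standard q-gram lemma (two strings of length w within edit distance e share at least w - q + 1 - q*e q-grams) as the filter threshold. A simple inverted index stands in for the sparse all-pairs algorithms discussed in the paper.

```python
from collections import Counter, defaultdict


def qgram_profile(s, q):
    """Sparse count vector of the q-grams occurring in s."""
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))


def shared_qgrams(p1, p2):
    """Size of the multiset intersection of two q-gram profiles."""
    return sum(min(c, p2[g]) for g, c in p1.items() if g in p2)


def qgram_threshold(w, q, e):
    """q-gram lemma bound for two length-w strings within edit distance e."""
    return w - q + 1 - q * e


def candidate_window_pairs(a, b, w, q, e, step=1):
    """Return (i, j) such that a[i:i+w] and b[j:j+w] pass the q-gram filter.

    Assumption: an inverted index replaces the paper's sparse all-pairs
    algorithms; it is illustrative only and carries no complexity guarantee.
    """
    t = qgram_threshold(w, q, e)
    if t < 1:
        raise ValueError("vacuous threshold: use a longer window or fewer errors")

    # Profile every window of b and index windows by the q-grams they contain.
    b_profiles = {j: qgram_profile(b[j:j + w], q)
                  for j in range(0, len(b) - w + 1, step)}
    index = defaultdict(set)
    for j, prof in b_profiles.items():
        for g in prof:
            index[g].add(j)

    pairs = []
    for i in range(0, len(a) - w + 1, step):
        pa = qgram_profile(a[i:i + w], q)
        # Only windows of b sharing at least one q-gram can reach the threshold.
        seen = set()
        for g in pa:
            seen |= index.get(g, set())
        for j in sorted(seen):
            if shared_qgrams(pa, b_profiles[j]) >= t:
                pairs.append((i, j))
    return pairs
```

Pairs that survive the filter are only candidates; in a filtration scheme of this kind they would still need to be verified by an alignment step, which this sketch omits.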