Hardness of Optimal Spaced Seed Design

  • Conference paper
Combinatorial Pattern Matching (CPM 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3537)

Abstract

Speeding up approximate pattern matching has been a line of research in stringology since the 1980s. The approaches that are fast in practice belong to the class of filtration algorithms, in which text regions dissimilar to the pattern are excluded (filtered out) in a first step, and the remaining regions are compared to the pattern by dynamic programming in a second step. Among the necessary conditions used to test similarity between the regions and the pattern, many require a minimum number of common substrings between them. When only substitutions are taken into account for measuring dissimilarity, it was shown recently that counting spaced subwords instead of substrings improves the filtration efficiency. However, a preprocessing step is required to design one or more patterns, called gapped seeds, for the subwords, depending on the search parameters. The seed design problems proposed so far differ in the way the similarities to detect are given: either a set of similarities is given in extenso (this is a “region specific” problem), or one wishes to detect all similar regions having at most k substitutions (general detection problem). Several articles exhibit exponential algorithms for these problems. In this work, we provide hardness and inapproximability results for both the region specific and general seed design problems, thereby justifying the exponential complexity of known algorithms. Moreover, we introduce a new formulation of the region specific seed design problem, in which the weight of the seed (i.e., the number of characters in the subwords) has to be maximized, and show that it is as difficult to approximate as Maximum Independent Set.
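
To make the notions of seed, weight, and detection concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes a common notation that the abstract does not fix: a seed is a string over '1' (position that must match) and '*' (don't-care position), and a similarity is a string over '1' (matching column) and '0' (substitution). The function names `seed_hits`, `seed_weight`, and `is_lossless` are hypothetical. The brute-force check for the general detection problem enumerates all placements of k substitutions, which illustrates why naive approaches are exponential.

```python
from itertools import combinations


def seed_weight(seed: str) -> int:
    """Weight of a seed: the number of positions that must match ('1')."""
    return seed.count("1")


def seed_hits(seed: str, similarity: str) -> bool:
    """Return True if the seed detects the similarity at some offset.

    Assumed notation: seed over {'1', '*'}, similarity over {'1', '0'}.
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    for offset in range(len(similarity) - len(seed) + 1):
        # A hit needs a matching column under every '1' of the seed.
        if all(similarity[offset + i] == "1" for i in required):
            return True
    return False


def is_lossless(seed: str, m: int, k: int) -> bool:
    """Brute-force test: does the seed hit every similarity of length m
    with at most k substitutions? Assumes 0 <= k <= m.
    """
    # Testing exactly k substitutions suffices: turning a '0' back into
    # a '1' can only preserve or create hits.
    for mismatches in combinations(range(m), k):
        columns = ["1"] * m
        for p in mismatches:
            columns[p] = "0"
        if not seed_hits(seed, "".join(columns)):
            return False
    return True
```

For instance, under these assumptions the spaced seed `11*1` and the contiguous seed `111` both have weight 3, but only the spaced one detects the similarity `1101101`.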

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nicolas, F., Rivals, E. (2005). Hardness of Optimal Spaced Seed Design. In: Apostolico, A., Crochemore, M., Park, K. (eds) Combinatorial Pattern Matching. CPM 2005. Lecture Notes in Computer Science, vol 3537. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11496656_13

  • DOI: https://doi.org/10.1007/11496656_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-26201-5

  • Online ISBN: 978-3-540-31562-9

  • eBook Packages: Computer Science, Computer Science (R0)
