Subset Seed Automaton

  • Conference paper
Implementation and Application of Automata (CIAA 2007)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 4783)

Abstract

We study the pattern matching automaton introduced in [1] for the purpose of seed-based similarity search. We show that our definition provides a compact automaton, much smaller than the one obtained by applying the Aho-Corasick construction. We study properties of this automaton and present an efficient implementation of the automaton construction. We also present some experimental results and show that this automaton can be successfully applied to more general situations.

References

  1. Kucherov, G., Noé, L., Roytberg, M.: A unifying framework for seed sensitivity and its application to subset seeds. Journal of Bioinformatics and Computational Biology 4, 553–569 (2006)

  2. Burkhardt, S., Kärkkäinen, J.: Better filtering with gapped q-grams. Fundamenta Informaticae 56, 51–70 (2003)

  3. Ma, B., Tromp, J., Li, M.: PatternHunter: Faster and more sensitive homology search. Bioinformatics 18, 440–445 (2002)

  4. Brown, D., Li, M., Ma, B.: A tutorial of recent developments in the seeding of local alignment. Journal of Bioinformatics and Computational Biology 2, 819–842 (2004)

  5. Brown, D.: A survey of seeding for sequence alignments. In: Bioinformatics Algorithms: Techniques and Applications (to appear, 2007)

  6. Li, M., Ma, B., Kisman, D., Tromp, J.: PatternHunter II: Highly sensitive and fast homology search. Journal of Bioinformatics and Computational Biology 2, 417–439 (2004)

  7. Noé, L., Kucherov, G.: YASS: enhancing the sensitivity of DNA similarity search. Nucleic Acids Research 33(web-server issue), W540–W543 (2005)

  8. Califano, A., Rigoutsos, I.: Flash: A fast look-up algorithm for string homology. In: Proceedings of the 1st International Conference on Intelligent Systems for Molecular Biology (ISMB), pp. 56–64 (1993)

  9. Tsur, D.: Optimal probing patterns for sequencing by hybridization. In: Bücher, P., Moret, B.M.E. (eds.) WABI 2006. LNCS (LNBI), vol. 4175, pp. 366–375. Springer, Heidelberg (2006)

  10. Schwartz, S., Kent, J., Smit, A., Zhang, Z., Baertsch, R., Hardison, R., Haussler, D., Miller, W.: Human–mouse alignments with BLASTZ. Genome Research 13, 103–107 (2003)

  11. Sun, Y., Buhler, J.: Choosing the best heuristic for seeded alignment of DNA sequences. BMC Bioinformatics 7 (2006)

  12. Csürös, M., Ma, B.: Rapid homology search with two-stage extension and daughter seeds. In: Wang, L. (ed.) COCOON 2005. LNCS, vol. 3595, pp. 104–114. Springer, Heidelberg (2005)

  13. Mak, D., Gelfand, Y., Benson, G.: Indel seeds for homology search. Bioinformatics 22, e341–e349 (2006)

  14. Brejová, B., Brown, D., Vinar, T.: Vector seeds: An extension to spaced seeds. Journal of Computer and System Sciences 70, 364–380 (2005)

  15. Keich, U., Li, M., Ma, B., Tromp, J.: On spaced seeds for similarity search. Discrete Applied Mathematics 138, 253–263 (2004); preliminary version in 2002

  16. Buhler, J., Keich, U., Sun, Y.: Designing seeds for similarity search in genomic DNA. In: Proceedings of the 7th Annual International Conference on Computational Molecular Biology (RECOMB), pp. 67–75 (2003)

  17. Brejová, B., Brown, D., Vinar, T.: Optimal spaced seeds for homologous coding regions. Journal of Bioinformatics and Computational Biology 1, 595–610 (2004)

  18. Cole, R., Hariharan, R., Indyk, P.: Tree pattern matching and subset matching in deterministic O(n log^3 n)-time. In: Proceedings of 10th Symposium on Discrete Algorithms (SODA), pp. 245–254 (1999)

  19. Holub, J., Smyth, W.F., Wang, S.: Fast pattern-matching on indeterminate strings. Journal of Discrete Algorithms (2006)

  20. Rahman, S., Iliopoulos, C., Mouchard, L.: Pattern matching in degenerate DNA/RNA sequences. In: Proceedings of the Workshop on Algorithms and Computation (WALCOM), pp. 109–120 (2007)

  21. Noé, L., Kucherov, G.: Improved hit criteria for DNA local alignment. BMC Bioinformatics 5 (2004)

  22. Aho, A.V., Corasick, M.J.: Efficient string matching: An aid to bibliographic search. Communications of the ACM 18, 333–340 (1975)

  23. Amir, A., Porat, E., Lewenstein, M.: Approximate subset matching with don’t cares. In: Proceedings of 12th Symposium on Discrete Algorithms (SODA), pp. 305–306 (2001)

Editor information

Jan Holub, Jan Žďárek

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kucherov, G., Noé, L., Roytberg, M. (2007). Subset Seed Automaton. In: Holub, J., Žďárek, J. (eds) Implementation and Application of Automata. CIAA 2007. Lecture Notes in Computer Science, vol 4783. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76336-9_18

  • DOI: https://doi.org/10.1007/978-3-540-76336-9_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-76335-2

  • Online ISBN: 978-3-540-76336-9

  • eBook Packages: Computer Science, Computer Science (R0)
