Faster Algorithms for Sampling and Counting Biological Sequences

Conference paper in String Processing and Information Retrieval (SPIRE 2009), Lecture Notes in Computer Science, vol. 5721.

Abstract

A set of sequences S is pairwise bounded if the Hamming distance between any pair of sequences in S is at most 2d. The Consensus Sequence problem asks to distinguish pairwise bounded sets that have a consensus from those that do not and, when a consensus exists, to find one such sequence s*. This problem is closely related to the motif-recognition problem, which abstractly models finding important subsequences in biological data. We give an efficient algorithm for sampling pairwise bounded sets, referred to as MarkovSampling, and show that it generates pairwise bounded sets uniformly at random. We illustrate how MarkovSampling can be applied to solve motif-recognition instances efficiently. Computing the expected number of motif sets has been a long-standing open problem in motif recognition [1,3]. We consider the related problem of counting pairwise bounded sets, give new bounds on the number of pairwise bounded sets, and present an algorithmic approach to counting them.
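
The definitions above can be checked on toy inputs with a small brute-force sketch. It assumes the usual closest-string reading of "consensus": a sequence s* within Hamming distance d of every sequence in S (by the triangle inequality, any set with such an s* is pairwise bounded). It also assumes that "counting pairwise bounded sets" means counting ordered n-tuples with repetition allowed, which may differ from the paper's exact notion. All function names are hypothetical. The exhaustive searches are exponential in the sequence length, so this is neither the paper's MarkovSampling algorithm nor its counting method; it is only a reference check for very small cases.

```python
from itertools import combinations, product
from typing import Optional, Sequence


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_pairwise_bounded(seqs: Sequence[str], d: int) -> bool:
    """True if every pair of sequences is within Hamming distance 2d."""
    return all(hamming(a, b) <= 2 * d for a, b in combinations(seqs, 2))


def find_consensus(seqs: Sequence[str], d: int, alphabet: str = "ACGT") -> Optional[str]:
    """Brute-force search for s* with hamming(s*, s) <= d for every s in seqs.

    Assumption: 'consensus' in the closest-string sense. Tries all
    len(alphabet) ** length candidates; returns None if none qualifies.
    """
    length = len(seqs[0])
    for letters in product(alphabet, repeat=length):
        candidate = "".join(letters)
        if all(hamming(candidate, s) <= d for s in seqs):
            return candidate
    return None


def count_pairwise_bounded_tuples(n: int, length: int, d: int, alphabet: str = "01") -> int:
    """Count ordered n-tuples of length-`length` words that are pairwise bounded.

    Assumption: tuples are ordered and may repeat words; exhaustive enumeration.
    """
    words = ["".join(p) for p in product(alphabet, repeat=length)]
    return sum(1 for tup in product(words, repeat=n) if is_pairwise_bounded(tup, d))


if __name__ == "__main__":
    with_consensus = ["ACGT", "ACGA", "TCGT"]
    # Pairwise distances are all 2, but no word is within distance 1 of all four.
    without_consensus = ["AAA", "CCA", "ACC", "CAC"]
    print(is_pairwise_bounded(with_consensus, 1), find_consensus(with_consensus, 1))  # True ACGT
    print(is_pairwise_bounded(without_consensus, 1), find_consensus(without_consensus, 1))  # True None
    print(count_pairwise_bounded_tuples(2, 3, 1))  # 56
```

The second example shows why the Consensus Sequence problem is non-trivial. The set is pairwise bounded with d = 1, yet any word within distance 1 of both "AAA" and "CCA" must be "ACA" or "CAA", and each of those is at distance 3 from one of the remaining sequences. The count of 56 is 8 × 8 ordered binary pairs minus the 8 complementary pairs at distance 3.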

References

  1. Buhler, J., Tompa, M.: Finding motifs using random projections. J. Comp. Bio. 9(2), 225–242 (2002)

  2. Chin, F.Y.L., Leung, C.M.: Voting algorithms for discovering long motifs. In: Proc. APBC 2005, pp. 261–271 (2005)

  3. Davila, J., Balla, S., Rajasekaran, S.: Fast and practical algorithms for planted (l, d) motif search. IEEE/ACM Trans. Comput. Biol. Bioinf. 4(4), 544–552 (2007)

  4. Dyer, M.: Approximate counting by dynamic programming. In: Proc. STOC 2003, pp. 693–699 (2003)

  5. Dyer, M., Frieze, A.: Randomly colouring graphs with lower bounds on girth and maximum degree. In: Proc. FOCS 2001, pp. 579–587 (2001)

  6. Dyer, M., Frieze, A., Jerrum, M.: Approximately counting Hamilton paths and cycles in dense graphs. SIAM J. Comput. 27(5), 1262–1272 (1998)

  7. Dyer, M., Frieze, A., Jerrum, M.: On counting independent sets in sparse graphs. In: Proc. FOCS 1999, pp. 210–217 (1999)

  8. Eskin, E., Pevzner, P.A.: Finding composite regulatory patterns in DNA sequences. Bioinformatics 18(1), 354–363 (2002)

  9. Evans, P.A., Smith, A., Wareham, H.T.: On the complexity of finding common approximate substrings. Th. Comp. Sci. 306, 407–430 (2003)

  10. Frances, M., Litman, A.: On covering problems of codes. Th. Comp. Sys. 30, 113–119 (1997)

  11. Gramm, J., Niedermeier, R., Rossmanith, P.: Fixed-parameter algorithms for closest string and related problems. Algorithmica, 37, 25–42

  12. Hayes, T.P., Vigoda, E.: A non-Markovian coupling for randomly sampling colorings. In: Proc. FOCS 2003, pp. 618–627 (2003)

  13. Li, M., Ma, B., Wang, L.: Finding similar regions in many strings. J. Comp. and Sys. Sci. 65(1), 73–96 (2002)

  14. Molloy, M.: The Glauber dynamics on colorings of a graph with high girth and maximum degree. In: Proc. STOC 2002, pp. 91–98 (2002)

  15. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (1995)

  16. Pevzner, P., Sze, S.: Combinatorial approaches to finding subtle signals in DNA sequences. In: Proc. ISMB 2000, pp. 344–354 (2000)

  17. Rajasekaran, S., Balla, S., Huang, C.H.: Exact algorithms for the planted motif problem. J. Comp. Bio. 12(8), 1117–1128 (2005)

  18. Sinclair, A., Jerrum, M.: Approximate counting, uniform generation and rapidly mixing. Inform. and Comput. 82, 93–133

  19. Sze, S., Lu, S., Chen, J.: Integrating sample-driven and pattern-driven approaches in motif finding. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS (LNBI), vol. 3240, pp. 438–449. Springer, Heidelberg (2004)

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Boucher, C. (2009). Faster Algorithms for Sampling and Counting Biological Sequences. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds) String Processing and Information Retrieval. SPIRE 2009. Lecture Notes in Computer Science, vol 5721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03784-9_24

  • DOI: https://doi.org/10.1007/978-3-642-03784-9_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03783-2

  • Online ISBN: 978-3-642-03784-9

  • eBook Packages: Computer Science, Computer Science (R0)
