Abstract
A set of sequences S is pairwise bounded if the Hamming distance between any pair of sequences in S is at most 2d. The Consensus Sequence problem asks whether a pairwise bounded set has a consensus sequence and, if it does, to find one such sequence s*. This problem is closely related to the motif-recognition problem, which abstractly models finding important subsequences in biological data. We give an efficient algorithm for sampling pairwise bounded sets, referred to as MarkovSampling, and show that it generates pairwise bounded sets uniformly at random. We illustrate the applicability of MarkovSampling to efficiently solving motif-recognition instances. Computing the expected number of motif sets has been a long-standing open problem in motif-recognition [1,3]. We consider the related problem of counting the number of pairwise bounded sets, give new bounds on the number of pairwise bounded sets, and present an algorithmic approach to counting them.
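The two definitions above can be made concrete with a minimal sketch. It is not the paper's MarkovSampling algorithm; it only checks the pairwise bounded condition and searches for a consensus by brute force. It assumes, as in the usual closest-string formulation, that a consensus s* is a sequence within Hamming distance d of every sequence in S (the abstract does not spell this out). The function names and the `alphabet` parameter are hypothetical, chosen for illustration.

```python
from itertools import combinations, product


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def is_pairwise_bounded(seqs, d):
    """True if every pair of sequences is within Hamming distance 2d."""
    return all(hamming(a, b) <= 2 * d for a, b in combinations(seqs, 2))


def find_consensus(seqs, d, alphabet="ACGT"):
    """Brute-force search for a sequence within distance d of every input.

    Assumption: this is the standard closest-string notion of consensus.
    The search is exponential in the sequence length, so it is only
    suitable for tiny illustrative inputs.
    """
    length = len(seqs[0])
    for candidate in product(alphabet, repeat=length):
        s = "".join(candidate)
        if all(hamming(s, t) <= d for t in seqs):
            return s
    return None


# A pairwise bounded set that has a consensus.
with_consensus = ["ACGT", "ACGA", "ACCT"]
print(is_pairwise_bounded(with_consensus, 1), find_consensus(with_consensus, 1))

# A pairwise bounded set with no consensus: every pair is at distance 2,
# yet no sequence is within distance 1 of all four.
without_consensus = ["AAA", "ACC", "CAC", "CCA"]
print(is_pairwise_bounded(without_consensus, 1), find_consensus(without_consensus, 1))
```

The second input shows why the problem is not trivial: being pairwise bounded is necessary for a consensus to exist under this assumption, but it is not sufficient.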
References
Buhler, J., Tompa, M.: Finding motifs using random projections. J. Comp. Bio. 9(2), 225–242 (2002)
Chin, F.Y.L., Leung, C.M.: Voting algorithms for discovering long motifs. In: Proc. APBC 2005, pp. 261–271 (2005)
Davila, J., Balla, S., Rajasekaran, S.: Fast and practical algorithms for planted (l, d) motif search. IEEE/ACM Trans. Comput. Biol. Bioinf. 4(4), 544–552 (2007)
Dyer, M.: Approximate counting by dynamic programming. In: Proc. STOC 2003, pp. 693–699 (2003)
Dyer, M., Frieze, A.: Randomly colouring graphs with lower bounds on girth and maximum degree. In: Proc. FOCS 2001, pp. 579–587 (2001)
Dyer, M., Frieze, A., Jerrum, M.: Approximately counting Hamilton paths and cycles in dense graphs. SIAM J. Comput. 27(5), 1262–1272 (1998)
Dyer, M., Frieze, A., Jerrum, M.: On counting independent sets in sparse graphs. In: Proc. FOCS 1999, pp. 210–217 (1999)
Eskin, E., Pevzner, P.A.: Finding composite regulatory patterns in DNA sequences. Bioinformatics 18(1), 354–363 (2002)
Evans, P.A., Smith, A., Wareham, H.T.: On the complexity of finding common approximate substrings. Th. Comp. Sci. 306, 407–430 (2003)
Frances, M., Litman, A.: On covering problems of codes. Th. Comp. Sys. 30, 113–119 (1997)
Gramm, J., Niedermeier, R., Rossmanith, P.: Fixed-parameter algorithms for closest string and related problems. Algorithmica 37, 25–42 (2003)
Hayes, T.P., Vigoda, E.: A non-Markovian coupling for randomly sampling colorings. In: Proc. FOCS 2003, pp. 618–627 (2003)
Li, M., Ma, B., Wang, L.: Finding similar regions in many strings. J. Comp. and Sys. Sci. 65(1), 73–96 (2002)
Molloy, M.: The Glauber dynamics on colorings of a graph with high girth and maximum degree. In: Proc. STOC 2002, pp. 91–98 (2002)
Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, Cambridge (1995)
Pevzner, P., Sze, S.: Combinatorial approaches to finding subtle signals in DNA sequences. In: Proc. ISMB 2000, pp. 344–354 (2000)
Rajasekaran, S., Balla, S., Huang, C.H.: Exact algorithms for the planted motif problem. J. Comp. Bio. 12(8), 1117–1128 (2005)
Sinclair, A., Jerrum, M.: Approximate counting, uniform generation and rapidly mixing Markov chains. Inform. and Comput. 82, 93–133 (1989)
Sze, S., Lu, S., Chen, J.: Integrating sample-driven and pattern-driven approaches in motif finding. In: Jonassen, I., Kim, J. (eds.) WABI 2004. LNCS (LNBI), vol. 3240, pp. 438–449. Springer, Heidelberg (2004)
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Boucher, C. (2009). Faster Algorithms for Sampling and Counting Biological Sequences. In: Karlgren, J., Tarhio, J., Hyyrö, H. (eds) String Processing and Information Retrieval. SPIRE 2009. Lecture Notes in Computer Science, vol 5721. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03784-9_24
DOI: https://doi.org/10.1007/978-3-642-03784-9_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03783-2
Online ISBN: 978-3-642-03784-9
eBook Packages: Computer Science (R0)