Generalized Planted (l,d)-Motif Problem with Negative Set

Leung, Henry C. M.; Chin, Francis Y. L.

doi:10.1007/11557067_22

Henry C. M. Leung & Francis Y. L. Chin

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 3692)

Included in the following conference series:

International Workshop on Algorithms in Bioinformatics

Abstract

Finding similar patterns (motifs) in a set of sequences is an important problem in Computational Molecular Biology. Pevzner and Sze [18] defined the planted (l,d)-motif problem as finding a length-l pattern that occurs in each input sequence with at most d substitutions. When d is large, this problem is difficult to solve because the input sequences do not contain enough information about the motif. In this paper, we propose a generalized planted (l,d)-motif problem that takes as additional input a set of sequences containing no substring similar to the motif (the negative set). We analyze the effect of this negative set on motif finding, and define a set of unsolvable problems and another set of most difficult problems, known as “challenging generalized problems”. We develop an algorithm called VANS, based on voting and other novel techniques, which can solve the (9,3), (11,4), (15,6) and (20,8)-motif problems, which were unsolvable before, as well as challenging problems of the planted (l,d)-motif problem such as the (9,2), (11,3), (15,5) and (20,7)-motif problems.
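The problem statement above can be made concrete with a small brute-force check. The following is only a minimal sketch of the problem definition, not the VANS algorithm, whose details are not given in the abstract. The function names are hypothetical, and the sketch assumes that "similar" for the negative set means the same threshold of at most d substitutions, which the abstract does not specify.

```python
from itertools import product


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs(motif: str, seq: str, d: int) -> bool:
    """True if some length-l substring of seq is within d substitutions of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def is_valid_motif(motif, positives, negatives, d):
    """Motif must occur in every positive sequence and in no negative one.

    Assumption: the negative set uses the same threshold d as the positive set.
    """
    return (all(occurs(motif, s, d) for s in positives)
            and not any(occurs(motif, s, d) for s in negatives))


def brute_force_motifs(positives, negatives, l, d, alphabet="ACGT"):
    """Enumerate all |alphabet|^l candidates; feasible only for very small l."""
    candidates = ("".join(p) for p in product(alphabet, repeat=l))
    return [m for m in candidates
            if is_valid_motif(m, positives, negatives, d)]


# Toy usage with made-up sequences (not data from the paper).
positives = ["TTACGTT", "GGACCTG"]
negatives = ["AAAAAAA"]
solutions = brute_force_motifs(positives, negatives, l=4, d=1)
```

Exhaustive enumeration grows as 4^l for DNA, so it is useful only to illustrate how a negative set can rule out candidates that the positive set alone would accept.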

This research is supported in part by an RGC grant HKU 7135/04E.

References

1. Bailey, T., Elkan, C.: Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21, 51–80 (1995)
2. Barash, Y., Bejerano, G., Friedman, N.: A Simple Hyper-Geometric Approach for Discovering Putative Transcription Factor Binding Sites. Workshop on Algorithms in Bioinformatics WABI 1, 278–293 (2001)
3. Brazma, A., Jonassen, I., Eidhammer, I., Gilbert, D.: Approaches to the automatic discovery of patterns in biosequences. Jour. Comp. Biol. 5, 279–305 (1998)
4. Buhler, J., Tompa, M.: Finding motifs using random projections. Research in Computational Molecular Biology RECOMB 1, 69–76 (2001)
5. Chin, F., Leung, H.: Voting Algorithms for Discovering Long Motifs. Asia-Pacific Bioinformatics Conference APBC 3, 261–271 (2005)
6. Chin, F., Leung, H., Yiu, S.M., Lam, T.W., Rosenfeld, R., Tsang, W.W., Smith, D., Jiang, Y.: Finding Motifs for Insufficient Number of Sequences with Strong Binding to Transcription Factor. Research in Computational Molecular Biology RECOMB 4, 125–132 (2004)
7. Chin, F., Leung, H., Yiu, S.M., Rosenfeld, R., Tsang, W.W.: Finding Motifs with Insufficient Number of Strong Binding Sites. Jour. Comp. Biol. (to appear)
8. Fraenkel, Y., Mandel, Y., Friedberg, D., Margalit, H.: Identification of common motifs in unaligned DNA sequences: application to Escherichia coli Lrp regulon. Bioinformatics 11, 379–387 (1995)
9. Gelfand, M., Koonin, E., Mironov, A.: Prediction of transcription regulatory sites in archaea by a comparative genomic approach. Nucl. Acids Res. 28, 695–705 (2000)
10. van Helden, J., Andre, B., Vides, J.C.: Extracting regulatory sites from the upstream region of yeast genes by computational analysis of oligonucleotide frequencies. Journal of Molecular Biology 281(5), 827–842 (1998)
11. Hertz, G.Z., Stormo, G.D.: Identification of consensus patterns in unaligned DNA and protein sequences: a large-deviation statistical basis for penalizing gaps. International Conference on Bioinformatics and Genome Research 3, 201–216 (1995)
12. Lawrence, C., Altschul, S., Boguski, M., Liu, J., Neuwald, A., Wootton, J.: Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262, 208–214 (1993)
13. Lawrence, C., Reilly, A.: An expectation maximization (EM) algorithm for the identification and characterization of common sites in unaligned biopolymer sequences. Proteins: Structure, Function and Genetics 7, 41–51 (1990)
14. Leung, H., Chin, F.: Finding Exact Optimal Motif in Matrix Representation by Partitioning. In: European Conference on Computational Biology ECCB (2005) (to appear)
15. Liang, S.: cWINNOWER Algorithm for Finding Fuzzy DNA Motifs. Computer Society Bioinformatics Conference 2, 260–265 (2003)
16. Marsan, L., Sagot, M.F.: Algorithms for extracting structured motifs using a suffix tree with an application to promoter and regulatory site consensus identification. Jour. Comp. Biol. 7(3-4), 345–362 (2000)
17. Pesole, G., Prunella, N., Liuni, S., Attimonelli, M., Saccone, C.: Wordup: an efficient algorithm for discovering statistically significant patterns in DNA sequences. Nucl. Acids Res. 20(11), 2871–2875 (1992)
18. Pevzner, P., Sze, S.H.: Combinatorial approaches to finding subtle signals in DNA sequences. In: International Conference on Intelligent Systems for Molecular Biology, vol. 8, pp. 269–278 (2000)
19. Sagot, M.F.: Spelling approximate repeated or common motifs using a suffix tree. In: Lucchesi, C.L., Moura, A.V. (eds.) LATIN 1998. LNCS, vol. 1380, pp. 111–127. Springer, Heidelberg (1998)
20. Sinha, S.: Discriminative motifs. Jour. Comp. Biol. 10, 599–616 (2003)
21. Zhu, J., Zhang, M.: SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics 15, 563–577 (1999), http://cgsigma.cshl.org/jian/

Author information

Authors and Affiliations

Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong
Henry C. M. Leung & Francis Y. L. Chin

Editor information

Editors and Affiliations

Biocomputing Group, University of Bologna, Italy
Rita Casadio
Janelia Farm Research Campus, Howard Hughes Medical Institute, Ashburn, Virginia, USA
Gene Myers

Cite this paper

Leung, H.C.M., Chin, F.Y.L. (2005). Generalized Planted (l,d)-Motif Problem with Negative Set. In: Casadio, R., Myers, G. (eds) Algorithms in Bioinformatics. WABI 2005. Lecture Notes in Computer Science, vol 3692. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11557067_22

DOI: https://doi.org/10.1007/11557067_22
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29008-7
Online ISBN: 978-3-540-31812-5
eBook Packages: Computer Science, Computer Science (R0)
