Graphical Approach to Weak Motif Recognition in Noisy Data Sets

  • Loi Sy Ho
  • Jagath C. Rajapakse
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4146)

Abstract

Accurate recognition of motifs in biological sequences has become a central problem in computational biology. Although previous approaches perform reasonably well in detecting motifs with a clear consensus, they are inapplicable to the recognition of weak motifs in noisy datasets, where only a fraction of the sequences may contain motif instances. This paper presents a graphical approach that handles real biological sequences, which are noisy in nature, and finds potential weak motifs in higher eukaryotic datasets. We evaluate our approach on synthetic datasets embedded with degenerate motifs and show that it outperforms earlier techniques. Moreover, the approach is able to find wet-lab proven motifs as well as other significant, previously unreported consensus sequences in real biological datasets.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Loi Sy Ho (1)
  • Jagath C. Rajapakse (1, 2, 3)
  1. BioInformatics Research Center, School of Computer Engineering, Nanyang Technological University, Singapore
  2. Biological Engineering Division, Massachusetts Institute of Technology, Cambridge, USA
  3. Singapore-MIT Alliance, Singapore
