Identifying Conserved Discriminative Motifs

  • Jyotsna Kasturi
  • Raj Acharya
  • Ross Hardison
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5265)


The identification of regulatory motifs underlying gene expression is a challenging problem, particularly in eukaryotes. An algorithm to identify statistically significant discriminative motifs that distinguish between gene expression clusters is presented. The predictive power of the identified motifs is assessed with a supervised Naïve Bayes classifier. An information-theoretic feature selection criterion helps find the most informative motifs. Results on benchmark and real data demonstrate that our algorithm accurately identifies discriminative motifs. We show that the integration of comparative genomics information into the motif finding process significantly improves the discovery of discriminative motifs and overall classification accuracy.
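As an illustration of the information-theoretic selection criterion mentioned above, the following minimal sketch ranks candidate motifs by the mutual information between motif presence and expression-cluster label. It is not the authors' implementation: the function names, the toy sequences, the cluster labels, and the choice of exact substring matching for motif presence are all assumptions made for this example.

```python
import math
from collections import Counter


def mutual_information(presence, labels):
    """Mutual information I(X;Y) in bits between a binary motif-presence
    vector and the cluster labels, estimated from empirical frequencies."""
    n = len(labels)
    joint = Counter(zip(presence, labels))
    px = Counter(presence)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_motifs(sequences, labels, motifs):
    """Score each candidate motif by how informative its presence is about
    the cluster label, and return (motif, score) pairs, best first."""
    scores = {}
    for motif in motifs:
        # Assumption: a motif "occurs" if it appears as an exact substring.
        presence = [int(motif in seq) for seq in sequences]
        scores[motif] = mutual_information(presence, labels)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical toy data: six promoter-like sequences in two clusters.
sequences = ["ACGTGATA", "TTGATAAC", "GGGATACC",
             "ACGTCCCC", "TTTTCACG", "CCACGTAA"]
labels = ["A", "A", "A", "B", "B", "B"]
motifs = ["GATA", "ACGT", "TT"]

ranking = rank_motifs(sequences, labels, motifs)
for motif, score in ranking:
    print(f"{motif}: {score:.3f} bits")
```

In this toy data the motif GATA occurs in every cluster-A sequence and in no cluster-B sequence, so it ranks first. The top-ranked motifs could then serve as features for a classifier such as Naïve Bayes.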


Discriminative motifs, regulatory elements, comparative genomics, classification, Naïve Bayes, mutual information



Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Jyotsna Kasturi (1, 2)
  • Raj Acharya (2)
  • Ross Hardison (3, 4)
  1. Non-Clinical Biostatistics, Johnson & Johnson Pharmaceutical Research & Development, New Jersey, USA
  2. Department of Computer Science and Engineering, USA
  3. Center for Comparative Genomics and Bioinformatics, Huck Institutes of Life Sciences, USA
  4. Department of Biochemistry and Molecular Biology, Pennsylvania State University, USA
