BayCis: A Bayesian Hierarchical HMM for Cis-Regulatory Module Decoding in Metazoan Genomes

  • Tien-ho Lin
  • Pradipta Ray
  • Geir K. Sandve
  • Selen Uguroglu
  • Eric P. Xing
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4955)


The transcriptional regulatory sequences in metazoan genomes often consist of multiple cis-regulatory modules (CRMs). Each CRM contains locally enriched occurrences of binding sites (motifs) for a certain array of regulatory proteins, and can integrate, amplify or attenuate multiple regulatory signals via combinatorial interaction with these proteins. The architecture of CRM organization is reminiscent of the grammatical rules underlying a natural language, and presents a particular challenge to computational motif and CRM identification in metazoan genomes. In this paper, we present BayCis, a Bayesian hierarchical HMM that attempts to capture the stochastic syntactic rules of CRM organization. Under the BayCis model, all candidate sites are evaluated based on a posterior probability measure that takes into consideration their similarity to known binding sites, their contrast against the local genomic context, their first-order dependencies on upstream sequence elements, as well as priors reflecting general knowledge of CRM structure. We compare our approach to five existing methods for the discovery of CRMs, and demonstrate competitive or superior prediction results evaluated against experimentally based annotations on a comprehensive selection of Drosophila regulatory regions. The software, database and Supplementary Materials will be available at http://www.sailing.cs. .
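
The idea of scoring each position by a posterior probability can be illustrated with a much simpler model than the one the abstract describes. The following is a minimal sketch, assuming a flat two-state HMM (a "bg" background state and a "site" state) decoded with the scaled forward-backward algorithm. It is not the BayCis hierarchical model: the state names, the function `posterior_decode`, and every probability below are hypothetical values chosen for illustration only.

```python
def posterior_decode(seq, states, start, trans, emit):
    """Per-position posterior state probabilities via scaled forward-backward."""
    n = len(seq)
    fwd, scales = [], []
    # Forward pass: normalise each column so long sequences do not underflow.
    for t, sym in enumerate(seq):
        row = {}
        for s in states:
            if t == 0:
                prior = start[s]
            else:
                prior = sum(fwd[t - 1][r] * trans[r][s] for r in states)
            row[s] = prior * emit[s][sym]
        z = sum(row.values())
        scales.append(z)
        fwd.append({s: row[s] / z for s in states})
    # Backward pass, reusing the forward scaling factors.
    bwd = [None] * n
    bwd[n - 1] = {s: 1.0 for s in states}
    for t in range(n - 2, -1, -1):
        bwd[t] = {
            s: sum(trans[s][r] * emit[r][seq[t + 1]] * bwd[t + 1][r]
                   for r in states) / scales[t + 1]
            for s in states
        }
    # Combine both passes and renormalise to obtain posteriors.
    post = []
    for t in range(n):
        row = {s: fwd[t][s] * bwd[t][s] for s in states}
        z = sum(row.values())
        post.append({s: row[s] / z for s in states})
    return post


# Hypothetical toy parameters (assumptions, not taken from the paper).
STATES = ("bg", "site")
START = {"bg": 0.9, "site": 0.1}
TRANS = {"bg": {"bg": 0.9, "site": 0.1},
         "site": {"bg": 0.2, "site": 0.8}}
EMIT = {"bg": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
        "site": {"A": 0.05, "C": 0.45, "G": 0.45, "T": 0.05}}

SEQ = "ATATATGCGCGCGCATATAT"
POST = posterior_decode(SEQ, STATES, START, TRANS, EMIT)
for base, p in zip(SEQ, POST):
    print(base, round(p["site"], 3))
```

In this toy setting the "site" posterior rises inside the GC-rich stretch and falls in the flanking AT-rich sequence. A hierarchical HMM replaces the single "site" state with nested states for modules, motifs and buffers, but the posterior is still obtained by summing over all state paths rather than committing to a single best path.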


Keywords: Hidden Markov Model · Position Weight Matrix · Distal Buffer · Global Background · Motif Instance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Tien-ho Lin (1)
  • Pradipta Ray (1)
  • Geir K. Sandve (2)
  • Selen Uguroglu (3)
  • Eric P. Xing (1)

  1. School of Computer Science, Carnegie Mellon University, Pittsburgh, USA
  2. Dept of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
  3. Dept of Computer Science and Engineering, Sabanci University, Istanbul, Turkey
