Abstract
The transcriptional regulatory sequences in metazoan genomes often consist of multiple cis-regulatory modules (CRMs). Each CRM contains locally enriched occurrences of binding sites (motifs) for a certain array of regulatory proteins, and it can integrate, amplify or attenuate multiple regulatory signals through combinatorial interaction with these proteins. The organization of CRMs is reminiscent of the grammatical rules underlying a natural language, and it presents a particular challenge to computational motif and CRM identification in metazoan genomes. In this paper, we present BayCis, a Bayesian hierarchical hidden Markov model (HMM) that attempts to capture the stochastic syntactic rules of CRM organization. Under the BayCis model, all candidate sites are evaluated by a posterior probability measure that takes into consideration their similarity to known binding sites, their contrast against the local genomic context, their first-order dependencies on upstream sequence elements, and priors reflecting general knowledge of CRM structure. We compare our approach to five existing methods for the discovery of CRMs and demonstrate competitive or superior prediction results, evaluated against experimentally based annotations on a comprehensive selection of Drosophila regulatory regions. The software, database and Supplementary Materials will be available at http://www.sailing.cs.cmu.edu/baycis.
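The abstract lists the ingredients of the site-level posterior: similarity to known binding sites, contrast against local genomic context, first-order dependence on upstream sequence, and a prior. The following is a minimal sketch of how such ingredients can combine in a simplified two-class setting (binding site modeled by a position weight matrix versus a first-order Markov background). It is not the BayCis hierarchical HMM. The function names, the toy 3-bp motif, the transition tables and the prior value of 0.01 are all hypothetical, chosen only for illustration.

```python
import math

def motif_loglik(site, pwm):
    """Log-likelihood of `site` under a PWM whose columns are independent."""
    return sum(math.log(pwm[i][base]) for i, base in enumerate(site))

def background_loglik(site, upstream, init, trans):
    """Log-likelihood of `site` under a first-order Markov background.

    The first base is conditioned on the upstream nucleotide when one is
    available; otherwise the initial distribution `init` is used.
    """
    if upstream is None:
        ll = math.log(init[site[0]])
    else:
        ll = math.log(trans[upstream][site[0]])
    for prev, cur in zip(site, site[1:]):
        ll += math.log(trans[prev][cur])
    return ll

def site_posterior(site, upstream, pwm, init, trans, prior):
    """Two-class posterior P(binding site | sequence), given a prior on sites."""
    # Log posterior odds: log[prior * P(site | motif)] - log[(1 - prior) * P(site | background)].
    z = (math.log(prior) + motif_loglik(site, pwm)) - (
        math.log(1.0 - prior) + background_loglik(site, upstream, init, trans))
    # Numerically stable logistic function of the log odds.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

# Hypothetical toy inputs: a 3-bp motif that prefers "TAA".
def column(preferred, p=0.85):
    return {b: (p if b == preferred else (1.0 - p) / 3) for b in "ACGT"}

PWM = [column("T"), column("A"), column("A")]
UNIFORM = {b: 0.25 for b in "ACGT"}
UNIFORM_TRANS = {a: dict(UNIFORM) for a in "ACGT"}
AT_RICH_TRANS = {a: {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1} for a in "ACGT"}

p_uniform = site_posterior("TAA", "G", PWM, UNIFORM, UNIFORM_TRANS, prior=0.01)
p_at_rich = site_posterior("TAA", "G", PWM, UNIFORM, AT_RICH_TRANS, prior=0.01)
p_decoy = site_posterior("GGG", "G", PWM, UNIFORM, UNIFORM_TRANS, prior=0.01)
```

With these toy numbers, the same candidate "TAA" scores lower when the background is AT-rich, because the site stands out less from its local context. This is the kind of contrast the abstract describes. According to the abstract, BayCis evaluates such terms inside a hierarchical HMM that also models CRM structure, which this isolated-window sketch does not attempt.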
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, Th., Ray, P., Sandve, G.K., Uguroglu, S., Xing, E.P. (2008). BayCis: A Bayesian Hierarchical HMM for Cis-Regulatory Module Decoding in Metazoan Genomes. In: Vingron, M., Wong, L. (eds) Research in Computational Molecular Biology. RECOMB 2008. Lecture Notes in Computer Science, vol 4955. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-78839-3_7
DOI: https://doi.org/10.1007/978-3-540-78839-3_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-78838-6
Online ISBN: 978-3-540-78839-3
eBook Packages: Computer Science (R0)