Computational Molecular Biology of Genome Expression and Regulation

Zhang, Michael Q.

doi:10.1007/11590316_5

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 3776)

Included in the following conference series:

International Conference on Pattern Recognition and Machine Intelligence

Abstract

Technological advances in experimental and computational molecular biology have revolutionized the fields of biology and medicine. Large-scale sequencing, expression and localization data have provided us with a great opportunity to study biology at the system level. I will introduce some outstanding problems in genome expression and regulation networks for which better modern statistical and machine learning technologies are desperately needed.

The recent revolution in genomics has transformed life science. For the first time in history, mankind has been able to sequence its own entire genome. Bioinformatics, especially computational molecular biology, has played a vital role in extracting knowledge from the vast amount of information generated by high-throughput genomics technologies. Today, I am very happy to deliver this key lecture at the First International Conference on Pattern Recognition and Machine Intelligence at the world-renowned Indian Statistical Institute (ISI), where such luminaries as Mahalanobis, Bose, Rao and others once worked. It is also very timely that genomics has attracted a new generation of talented young statisticians, reminding us that statistics was essentially conceived from, and has been continuously nurtured by, biological problems. Pattern/rule recognition is at the heart of every learning process and hence of all scientific disciplines, and comparison is the fundamental method: it is the similarities that allow us to infer common rules, and it is the differences that allow us to derive new rules.

Gene expression, normally referring to the cellular processes that lead to protein production, is controlled and regulated at multiple levels. Cells use this elaborate system of “circuits” and “switches” to decide when, where and by how much each gene should be turned on (activated, expressed) or off (repressed, silenced) in response to environmental cues. Genome expression and regulation refer to the coordinated expression and regulation of many genes on a large scale, for which advanced computational methods become indispensable. Due to space limitations, I can only highlight some of the pattern recognition problems in transcriptional regulation, which is the most important and best-studied level.

Currently, there are two general outstanding problems in transcriptional regulation studies: (1) how to find the regulatory regions, in particular the promoter regions, in the genome (throughout most of this lecture, we use promoter to refer to proximal promoters, e.g. ~1 kb of DNA at the beginning of each gene); (2) how to identify functional cis-regulatory DNA elements within each such region.
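To make the working definition of a proximal promoter concrete, here is a minimal sketch in Python. It is not the author's method. It assumes the transcription start site (TSS) and strand of a gene are already known, reads "~1 kb at the beginning of each gene" as a window immediately upstream of the TSS, and uses 0-based half-open coordinates. The function name `proximal_promoter` and its parameters are hypothetical, chosen only for illustration.

```python
def proximal_promoter(tss, strand, chrom_length, upstream=1000, downstream=0):
    """Return (start, end), a 0-based half-open window around a TSS.

    Assumptions (not from the lecture): `tss` is the 0-based coordinate of
    the transcription start site, and the promoter is taken to be
    `upstream` bases before it plus `downstream` bases starting at it.
    """
    if strand == "+":
        start, end = tss - upstream, tss + downstream
    elif strand == "-":
        # On the minus strand, "upstream" lies at higher coordinates.
        start, end = tss + 1 - downstream, tss + 1 + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Clip the window so it never runs off either end of the chromosome.
    return max(0, start), min(chrom_length, end)


# Hypothetical genes on a 10 kb toy chromosome.
plus_window = proximal_promoter(5000, "+", 10000)
minus_window = proximal_promoter(5000, "-", 10000)
```

The window size is kept as a parameter because the ~1 kb figure is only a convention. In practice the TSS itself is often unknown, and locating it is the first of the two problems stated above.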

Author information

Authors and Affiliations

Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724, USA
Michael Q. Zhang

Editor information

Editors and Affiliations

Center for Soft Computing Research, Machine Intelligence Unit, Indian Statistical Institute, India
Sankar K. Pal
Machine Intelligence Unit, Indian Statistical Institute, 203 B. T. Road, 700108, Kolkata
Sanghamitra Bandyopadhyay
Machine Intelligence Unit, Indian Statistical Institute, 700 108, Kolkata, India
Sambhunath Biswas

Cite this paper

Zhang, M.Q. (2005). Computational Molecular Biology of Genome Expression and Regulation. In: Pal, S.K., Bandyopadhyay, S., Biswas, S. (eds) Pattern Recognition and Machine Intelligence. PReMI 2005. Lecture Notes in Computer Science, vol 3776. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11590316_5

DOI: https://doi.org/10.1007/11590316_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-30506-4
Online ISBN: 978-3-540-32420-1
eBook Packages: Computer Science, Computer Science (R0)
