Abstract
Organisms must continually adapt to changing cellular and environmental factors (e.g., oxygen levels) by altering their gene expression patterns. At the same time, all organisms must have stable gene expression patterns that are robust to small fluctuations in environmental factors and to genetic variation. Learning and characterizing the structure and dynamics of regulatory networks (RNs) on a whole-genome scale is a key problem in systems biology. Here, we review the challenges of inferring RNs in a solely data-driven manner and concisely discuss the implications and contingencies of the procedures available, focusing on one such procedure, the Inferelator. Importantly, the Inferelator explicitly models the temporal component of regulation, can learn the interactions between transcription factors and environmental factors, and attaches a statistically meaningful weight to every edge. The Inferelator produces a dynamical model of the RN that can be used to model the time evolution of cell state.
Copyright information
© 2009 Humana Press, a part of Springer Science+Business Media, LLC
About this protocol
Cite this protocol
Madar, A., Bonneau, R. (2009). Learning Global Models of Transcriptional Regulatory Networks from Data. In: Ireton, R., Montgomery, K., Bumgarner, R., Samudrala, R., McDermott, J. (eds) Computational Systems Biology. Methods in Molecular Biology, vol 541. Humana Press. https://doi.org/10.1007/978-1-59745-243-4_9
DOI: https://doi.org/10.1007/978-1-59745-243-4_9
Publisher Name: Humana Press
Print ISBN: 978-1-58829-905-5
Online ISBN: 978-1-59745-243-4
eBook Packages: Springer Protocols