Abstract
Genetic regulatory networks (GRNs) are causal structures that can be represented as large directed graphs. Their inference is a central problem in bioinformatics. Because the available data are scarce and subject to high levels of noise, machine learning is essential for good, tractable inference of the underlying causal structure.
This chapter serves as a review of the GRN field as a whole, as well as a roadmap for researchers new to the field. It describes the relevant theoretical and empirical biochemistry, the different types of GRN inference, and the data that can be used to perform GRN inference. With this biologically centred material as background, the chapter surveys previous applications of machine learning techniques and computational intelligence to GRN inference. It covers clustering, logical and mathematical formalisms, Bayesian approaches and some combinations of these. Each is briefly explained in theoretical terms, and important examples of previous research using each are highlighted. Finally, the chapter analyses wider statistical problems in the field, and concludes with a summary of the main achievements of previous research and some open research questions.
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Fogelberg, C., Palade, V. (2009). Machine Learning and Genetic Regulatory Networks: A Review and a Roadmap. In: Hassanien, AE., Abraham, A., Vasilakos, A.V., Pedrycz, W. (eds) Foundations of Computational Intelligence Volume 1. Studies in Computational Intelligence, vol 201. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-01082-8_1
DOI: https://doi.org/10.1007/978-3-642-01082-8_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-01081-1
Online ISBN: 978-3-642-01082-8
eBook Packages: Engineering, Engineering (R0)