Abstract
In this chapter, we study several methods for learning gene regulatory networks, based on penalized linear regressions (the Lasso regression and the Dantzig Selector), Bayesian networks, and random forests. We also replicate the learning scheme on bootstrapped sub-samples of the observations. The biological motivation is a hard problem in Systems Biology: understanding the intertwined action of genome elements and gene activity in order to model the gene regulatory features of an organism. We introduce the methodologies used, and then assess the methods on simulated “Systems Genetics” (or genetical genomics) datasets. Our results show that the methods perform very differently depending on the simulation settings tested: total number of genes in the network considered, sample size, gene expression heritability, and chromosome length. We observe that the proposed approaches are able to capture important interaction patterns, but that parameter tuning or ad hoc pre- and post-processing may also have an important effect on the overall learning quality.
Acknowledgments
We are very grateful to the staff of the GenoToul (Toulouse, France) Bioinformatics platform for the computational support it provided during this work. We would also like to thank our colleagues from CRS4 Bioinformatica for creating the datasets of the present study.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Allouche, D. et al. (2013). A Panel of Learning Methods for the Reconstruction of Gene Regulatory Networks in a Systems Genetics Context. In: de la Fuente, A. (eds) Gene Network Inference. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45161-4_2
DOI: https://doi.org/10.1007/978-3-642-45161-4_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45160-7
Online ISBN: 978-3-642-45161-4
eBook Packages: Biomedical and Life Sciences (R0)