A Panel of Learning Methods for the Reconstruction of Gene Regulatory Networks in a Systems Genetics Context

  • David Allouche
  • Christine Cierco-Ayrolles
  • Simon de Givry
  • Gérald Guillermin
  • Brigitte Mangin
  • Thomas Schiex
  • Jimmy Vandel
  • Matthieu Vignes
Chapter

Abstract

In this chapter, we study several gene regulatory network learning methods based on penalized linear regression (the Lasso and the Dantzig Selector), Bayesian networks, and random forests. We also replicated the learning schemes on bootstrapped sub-samples of the observations. The biological motivation is a hard problem in Systems Biology: understanding the intertwined action of genome elements and gene activity in order to model the gene regulatory features of an organism. We introduce the methodologies used and then assess them on simulated “Systems Genetics” (or genetical genomics) datasets. Our results show that the performance of the methods varies considerably across the tested simulation settings: total number of genes in the network, sample size, gene expression heritability, and chromosome length. We observe that the proposed approaches are able to capture important interaction patterns, but that parameter tuning and ad hoc pre- and post-processing may also have an important effect on the overall learning quality.
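The bootstrap replication of a penalized-regression learning scheme can be illustrated for a single target gene. The Python sketch below is not the authors' implementation; it assumes one Lasso regression per target gene with the candidate regulators as the columns of `X`, data centered within each resample (so no intercept), resampling of observations with replacement, and a fixed penalty `lam`. The function names (`lasso_cd`, `bootstrap_selection_freq`, etc.) are hypothetical. Each resample is fitted by cyclic coordinate descent, and the fraction of resamples in which a regulator gets a nonzero coefficient is returned as a possible edge confidence score.

```python
import random


def soft_threshold(z, gamma):
    """Soft-thresholding operator used by the Lasso coordinate update."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def center(X, y):
    """Center each column of X and the response y (so no intercept is fitted)."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    return Xc, yc


def lasso_cd(X, y, lam, max_iter=500, tol=1e-8):
    """Minimize (1/(2n))*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # constant column: coefficient stays at 0
            # Correlation of column j with the partial residual (excluding j).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def bootstrap_selection_freq(X, y, lam, n_boot=50, seed=0):
    """Fraction of bootstrap resamples in which each candidate regulator is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample rows with replacement
        Xb, yb = center([X[i] for i in idx], [y[i] for i in idx])
        beta = lasso_cd(Xb, yb, lam)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_boot for c in counts]


if __name__ == "__main__":
    # Toy data: the target gene depends on candidates 0 and 2 only.
    rng = random.Random(1)
    n, p = 60, 5
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [2.0 * row[0] - 1.5 * row[2] + rng.gauss(0.0, 0.3) for row in X]
    freq = bootstrap_selection_freq(X, y, lam=0.1, n_boot=30)
    print([round(f, 2) for f in freq])
```

On this toy example, candidates 0 and 2 should be selected in every resample, while the unrelated candidates should be selected less often. Ranking all regulator-target pairs by such frequencies is one way to turn bootstrap replicates into a single edge score; the choice of `lam` and of the selection threshold remains a tuning decision.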

Keywords

Bayesian Network, Random Forest, Gene Regulatory Network, Confidence Score, Allelic State (machine-generated keywords, not supplied by the authors)

Acknowledgments

We are very grateful to the staff of the GenoToul (Toulouse, France) Bioinformatics platform for the computational support it provided during this work. We would also like to thank our colleagues from CRS4 Bioinformatica for creating the datasets used in this study.

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. UR875 MIA-T, INRA Toulouse, Castanet-Tolosan, France (affiliation of all authors; corresponding author: Matthieu Vignes)