Overview and Evaluation of Recent Methods for Statistical Inference of Gene Regulatory Networks from Time Series Data

  • Marco Grzegorczyk
  • Andrej Aderhold
  • Dirk Husmeier (corresponding author)
Protocol
Part of the Methods in Molecular Biology book series (MIMB, volume 1883)

Abstract

A challenging problem in systems biology is the reconstruction of gene regulatory networks from postgenomic data. A variety of reverse engineering methods from machine learning and computational statistics have been proposed in the literature. However, deciding on the best method to adopt for a particular application or data set can be a confusing task. The present chapter provides a broad overview of state-of-the-art methods with an emphasis on conceptual understanding rather than a deluge of mathematical details, and discusses the pros and cons of the various approaches. Guidance on practical applications, with pointers to publicly available software implementations, is included. The chapter concludes with a comprehensive comparative benchmark study on simulated data and a real-world application taken from current plant systems biology.

Key words

Gene regulatory networks; Gaussian graphical models; Sparse regression; Hierarchical Bayesian models; Gaussian processes; Bayesian networks; Chemical model averaging; Bio-PEPA; Network inference scoring scheme; Circadian regulation; Arabidopsis thaliana

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Marco Grzegorczyk (1)
  • Andrej Aderhold (2)
  • Dirk Husmeier (3, corresponding author)

  1. Johann Bernoulli Institute, University of Groningen, Groningen, The Netherlands
  2. Center for Computer Science, Universidade Federal do Rio Grande, Rio Grande, Brazil
  3. School of Mathematics and Statistics, University of Glasgow, Glasgow, UK
