Inferring Gene Regulatory Networks from Multiple Datasets
Gaussian process dynamical systems (GPDS) are Bayesian nonparametric approaches to inference of nonlinear dynamical systems, and provide a principled framework for learning biological networks from multiple perturbed time series measurements of gene or protein expression. Such approaches can capture the full richness of complex ODE models, and can be scaled for inference in moderately large systems containing hundreds of genes. Related hierarchical approaches allow inference from multiple datasets in which the underlying generative networks are assumed to have been rewired, whether by context-dependent changes in network structure, evolutionary processes, or synthetic manipulation. These approaches can also be used to transfer experimentally determined network structures from one species to another in which the network structure is unknown. Collectively, these methods provide a comprehensive and flexible platform for inference from a diverse range of data, with applications in systems and synthetic biology, as well as spatiotemporal modelling of embryo development. In this chapter we provide an overview of GPDS approaches and highlight their applications in the biological sciences, with accompanying tutorials available as a Jupyter notebook from https://github.com/cap76/GPDS.
Key words: Nonlinear dynamical systems, Gaussian process dynamical systems, Causal structure identification, Learning from multiple data sources, Spatiotemporal models
CAP is supported by the Wellcome Trust (grant 083089/Z/07/Z). IG is supported by EPSRC/BBSRC research grant EP/L016494/1. AS is supported by a 4-year Wellcome Trust PhD Scholarship and Cambridge International Trust Scholarship. DLW acknowledges support from the Engineering and Physical Sciences Research Council (grant EP/R014337/1).
CAP, IG, and AS acknowledge support from the BBSRC-EPSRC funded OpenPlant Synthetic Biology Research Centre (BB/L014130/1) through the OpenPlant Fund scheme. CAP and AS also thank M. Azim Surani for his support.