Approximate Bayesian inference in semi-mechanistic models
Abstract
Inference of interaction networks represented by systems of differential equations is a challenging problem in many scientific disciplines. In the present article, we follow a semi-mechanistic modelling approach based on gradient matching. We investigate the extent to which key factors, including the kinetic model, statistical formulation and numerical methods, affect network reconstruction performance. We emphasize general lessons for computational statisticians when faced with the challenge of model selection, and we assess the accuracy of various alternative paradigms, including recent widely applicable information criteria and different numerical procedures for approximating Bayes factors. We conduct the comparative evaluation with a novel inferential pipeline that systematically disambiguates confounding factors via an ANOVA scheme.
Keywords
Network inference; Semi-mechanistic model; Bayesian model selection; Widely applicable information criteria (WAIC, WBIC); Markov jump processes; ANOVA; Systems biology
1 Introduction
This article extends the work of Oates et al. (2014), which won the best paper award at the European Conference on Computational Biology (ECCB) in 2014, in four respects: accuracy and efficiency, model selection, benchmarking, and application extension. An overview can be found in Fig. 1.
1.1 Accuracy and efficiency
Robust gradient estimation is critical for semi-mechanistic modelling. The numerical differentiation proposed in Oates et al. (2014) is known to be susceptible to noise amplification. We propose instead to apply Gaussian process (GP) regression and to exploit the fact that, under fairly general assumptions, GPs are closed under differentiation. Our approach effectively implements a low-pass filter that counteracts the noise amplification of the differentiation step, and we quantify the resulting improvement in network reconstruction accuracy. We further critically assess the influence of the parameter prior in the underlying Bayesian hierarchical model. In particular, we compare the g-prior with the ridge regression prior (see e.g. Chapter 3 in Marin and Robert (2007)) in the context of the proposed semi-mechanistic model and demonstrate that the latter significantly improves both accuracy and computational efficiency.
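Closure under differentiation can be illustrated with a minimal sketch. It assumes a zero-mean GP with a squared-exponential (RBF) kernel and fixed, hand-picked hyperparameters. These values are illustrative assumptions; in practice they would be fitted, e.g. by maximising the GP marginal likelihood. Differentiation is a linear operator, so the derivative of the posterior mean is obtained by differentiating the kernel, and no difference quotient of noisy observations is ever formed. The function names are hypothetical and are not taken from the authors' implementation, which also considers other kernels.

```python
import numpy as np


def rbf_kernel(a, b, signal_var, length_scale):
    """Squared-exponential kernel matrix with entries k(a_i, b_j)."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * diff**2 / length_scale**2)


def gp_mean_and_derivative(t_obs, y_obs, t_new, signal_var=1.0,
                           length_scale=1.0, noise_var=1e-2):
    """Posterior mean of f(t) and of df/dt at t_new under a zero-mean GP prior.

    For the RBF kernel, d/dt k(t, t') = -(t - t') / length_scale**2 * k(t, t'),
    so the derivative of the posterior mean reuses the same weights alpha.
    """
    K = rbf_kernel(t_obs, t_obs, signal_var, length_scale)
    alpha = np.linalg.solve(K + noise_var * np.eye(t_obs.size), y_obs)
    K_new = rbf_kernel(t_new, t_obs, signal_var, length_scale)
    dK_new = -(t_new[:, None] - t_obs[None, :]) / length_scale**2 * K_new
    return K_new @ alpha, dK_new @ alpha


# Toy comparison on a noisy sine wave (true derivative: cos t).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 2.0 * np.pi, 25)
y = np.sin(t) + rng.normal(scale=0.1, size=t.size)
_, grad_gp = gp_mean_and_derivative(t, y, t, noise_var=0.1**2)
grad_dq = np.diff(y) / np.diff(t)  # forward difference quotient at t[:-1]
print("GP gradient max error:", np.max(np.abs(grad_gp - np.cos(t))))
print("Difference quotient max error:", np.max(np.abs(grad_dq - np.cos(t[:-1]))))
```

The difference quotient divides differences of noisy values by a small time step, which amplifies the noise. The GP derivative instead inherits the smoothness of the kernel.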
1.2 Model selection
1.3 Benchmarking
Assessing methodological innovation calls for an objective performance evaluation. We have carried out a comprehensive comparative evaluation of the proposed semi-mechanistic model against 11 state-of-the-art network inference methods from computational statistics and machine learning. The evaluation is based on a realistic stochastic process model of the underlying molecular processes (Guerriero et al. 2012) and on six distinct regulatory networks with different degrees of connectivity. The analysis of such a complex simulation study is hampered by the influence of various confounding factors, which tend to blur naive graphical representations. We therefore apply an ANOVA scheme, which enables us to disentangle the various effects and thereby extract clear trends and patterns in the results. In this way we show that, by integrating prior domain knowledge via a system-specific mathematical representation, the resulting semi-mechanistic model can significantly outperform state-of-the-art generic machine learning and computational statistics methods. We provide an application pipeline (Fig. 2) with which a user can objectively quantify this performance gain.
1.4 Application extension
2 Model
2.1 Interaction model
2.2 Rate (or gradient) estimation
2.3 Model and prior distributions
Table 1: Pseudocode of the MCMC sampling scheme for the iCheMA model
Table 2: Computational costs for CheMA and iCheMA
Dimensions | 1 | 2 | 3 | 4
CheMA (s) | 5.81 | 13.39 | 47.57 | 585.43
iCheMA (s) | 7.61 | 7.93 | 9.62 | 13.39
3 Inference
3.1 Posterior inference
3.2 Model selection
The ultimate objective of inference is model selection, i.e. to infer the n regulator sets \(\pi _i\) (\(i=1,\ldots ,n\)) of the interaction processes described by Eq. (6). We compare and critically assess five alternative paradigms:
- the deviance information criterion (DIC), proposed by Spiegelhalter et al. (2002);
- the ‘widely applicable information criterion’ (WAIC), proposed by Watanabe (2010);
- the ‘widely applicable Bayesian information criterion’ (WBIC), proposed by Watanabe (2013);
- the cross-validation information criterion (CVIC), proposed by Gelfand et al. (1992);
- the marginal likelihood (also called ‘model evidence’).

For the marginal likelihood paradigm, we compare two numerical methods: Chib’s method (Chib), proposed by Chib and Jeliazkov (2001), and thermodynamic integration (TI), proposed by Friel and Pettitt (2008). For TI, we have further assessed the numerical stabilization proposed by Friel et al. (2013) in a simulation study, which can be found in Appendix 3.
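To make two of the information criteria concrete, the sketch below computes WAIC and DIC from MCMC output. It assumes the variance-based WAIC penalty described by Gelman et al. (2014b) and the standard DIC definition of Spiegelhalter et al. (2002), both on the deviance scale, where lower is better. This excerpt does not state whether the article uses exactly these variants, and the function names are hypothetical.

```python
import numpy as np


def log_mean_exp(a, axis=0):
    """Numerically stable log(mean(exp(a))) along the given axis."""
    a_max = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(a_max, axis=axis) + np.log(np.mean(np.exp(a - a_max), axis=axis))


def waic(log_lik):
    """WAIC (deviance scale) from an (S draws x N observations) array of
    pointwise log-likelihoods log p(y_n | theta_s)."""
    log_lik = np.asarray(log_lik, float)
    lppd = np.sum(log_mean_exp(log_lik, axis=0))        # log pointwise predictive density
    p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))    # effective number of parameters
    return -2.0 * (lppd - p_waic)


def dic(total_log_lik, log_lik_at_post_mean):
    """DIC from the total log-likelihood of each draw and the log-likelihood
    evaluated at the posterior mean of the parameters."""
    deviance = -2.0 * np.asarray(total_log_lik, float)
    d_bar = deviance.mean()
    p_d = d_bar - (-2.0 * log_lik_at_post_mean)         # effective number of parameters
    return d_bar + p_d
```

Both criteria add a complexity penalty to a fit term. WAIC measures complexity pointwise from the posterior variance of the log-likelihood. DIC measures it through a plug-in estimate at the posterior mean, which makes it sensitive to the parametrisation.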
3.3 Posterior probabilities of interactions
3.4 Network inference scoring scheme
For the CheMA model (Oates et al. 2014) and the novel model variant (iCheMA), the marginal interaction posterior probabilities in Eq. (25) can be used to rank the network interactions in descending order. If the true regulatory network is known, this ranking defines the receiver operating characteristic (ROC) curve (Hanley and McNeil 1982). For all possible threshold values, the ROC curve plots the sensitivity (or recall) against the complementary specificity (1 − specificity, i.e. the false positive rate). By numerical integration we then obtain the area under the curve (AUROC) as a global measure of network reconstruction accuracy. Larger values indicate better performance, ranging from AUROC = 0.5 for random expectation to AUROC = 1 for perfect network reconstruction. A second well-established measure, closely related to the AUROC score, is the area under the precision-recall curve (AUPREC), i.e. the area under the curve obtained by plotting precision against recall (Davis and Goadrich 2006). AUPREC scores have the advantage over AUROC scores that a large number of false positives is reflected more clearly through the precision. These two scores (AUROC and AUPREC) are widely used in the systems biology community to score global network reconstruction accuracy (Marbach et al. 2012).
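Both scores can be computed directly from the ranked interaction posterior probabilities. The sketch below uses hypothetical function names. AUROC is computed in its rank (Mann–Whitney) form, which equals the area under the empirical ROC curve when ties are counted as one half. AUPREC is approximated by average precision. Average precision is only one of several conventions, and Davis and Goadrich (2006) discuss how to interpolate correctly in precision-recall space.

```python
import numpy as np


def auroc(scores, labels):
    """Probability that a randomly chosen true edge is ranked above a randomly
    chosen non-edge; ties count one half."""
    scores = np.asarray(scores, float)
    labels = np.asarray(labels, bool)
    pos, neg = scores[labels], scores[~labels]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (pos.size * neg.size)


def auprec(scores, labels):
    """Average precision: the precision at the rank of each true edge,
    averaged over true edges (ties broken by input order)."""
    order = np.argsort(-np.asarray(scores, float), kind="stable")
    hits = np.asarray(labels, bool)[order]
    precision = np.cumsum(hits) / np.arange(1, hits.size + 1)
    return precision[hits].mean()


posterior = [0.9, 0.8, 0.7, 0.6]   # hypothetical edge posterior probabilities
truth = [1, 0, 1, 0]               # 1 = edge present in the true network
print(auroc(posterior, truth), auprec(posterior, truth))
```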
3.5 Causal sufficiency
4 Evaluation
4.1 ANOVA

\({\mathbf{N_n}}\) (\(n=1,\ldots ,6\)) is the effect of the Network structure (see Fig. 4 for the six structures),

\({\mathbf{K_k}}\) (\(k=1,\ldots ,4\)) is the effect of the GP Kernel used for the computation of the analytical gradient (‘RBF’, ‘PER’, ‘MAT32’, or ‘MAT52’),

\({\mathbf{G_g}}\) (\(g=1,2\)) is the effect of the Gradient type, i.e. numerical (‘difference quotient’) versus analytical (‘GP interpolation’),

\({\mathbf{M_m}}\) (\(m=1,\ldots ,12\)) is the effect of the inference Method, i.e. the iCheMA model and the eleven competing methods listed in Table 5,

and \({\mathbf{P_p}}\) (\(p=1,2\)) is the effect of the Prior on \({\mathbf{V}}_i\), i.e. the g-prior from Eq. (11) versus the ridge regression prior from Eq. (12).
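The ANOVA model equation itself is not reproduced in this excerpt. As a minimal sketch, the code below assumes a purely additive main-effects model for a performance score such as AUROC. It uses treatment coding, with the first level of each factor as the reference, and estimates the effects by ordinary least squares. The factor sizes and effect values are hypothetical toy numbers; because the toy data are noise-free and additive, the fit recovers them exactly.

```python
import itertools

import numpy as np


def fit_main_effects(codes, n_levels, y):
    """Least-squares fit of an additive (main-effects-only) ANOVA model.

    codes: (N, F) integer array, codes[r, f] = level of factor f in run r.
    Treatment coding: the first level of each factor has effect 0.
    Returns the intercept and one effect vector per factor.
    """
    codes = np.asarray(codes)
    columns = [np.ones(len(y))]
    for f, n in enumerate(n_levels):
        for level in range(1, n):
            columns.append((codes[:, f] == level).astype(float))
    X = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    effects, pos = [], 1
    for n in n_levels:
        effects.append(np.concatenate([[0.0], coef[pos:pos + n - 1]]))
        pos += n - 1
    return coef[0], effects


# Hypothetical toy design: 6 networks x 2 gradient types x 2 priors.
n_levels = [6, 2, 2]
true_effects = [np.array([0.0, 0.1, -0.2, 0.05, 0.3, -0.1]),
                np.array([0.0, 0.15]),
                np.array([0.0, -0.05])]
codes = np.array(list(itertools.product(*[range(n) for n in n_levels])))
y = 0.6 + sum(eff[codes[:, f]] for f, eff in enumerate(true_effects))
intercept, effects = fit_main_effects(codes, n_levels, y)
print(intercept, effects)
```

With real, noisy scores, each factor's effect would be read off in the same way, and F-tests on the fitted model would assess which factors matter.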
4.2 Simulation details
In our study we have included the four information criteria (IC) DIC, WAIC, CVIC, and WBIC. We have also employed four different numerical methods to approximate the marginal likelihood:
- Chib’s original method (Chib and Jeliazkov 2001) (Chib naive);
- a stabilized version of Chib’s method (Chib), proposed here;
- thermodynamic integration with the trapezoid rule (TI), see Eq. (38);
- thermodynamic integration with the numerical correction (TISTAB), see Eq. (41).
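Both TI variants can be sketched for a hypothetical toy conjugate model. In this model the mean and variance of the log-likelihood under each power posterior are available in closed form, so the estimates can be checked against the exact log evidence. The ladder \(t_j=(j/J)^m\) is an assumption about the form of the ladder controlled by the power m in Eq. (42). The correction uses the fact that the derivative of the integrand with respect to t equals the variance of the log-likelihood under the power posterior. This identity underlies the second-order correction of Friel et al. (2013). The model, data and function names are illustrative only.

```python
import numpy as np


def ti_log_evidence(temps, mean_loglik, var_loglik=None):
    """Trapezoid-rule thermodynamic integration of E_t[log p(y|theta)] over [0, 1].

    temps: increasing ladder with temps[0] = 0 and temps[-1] = 1.
    mean_loglik[j], var_loglik[j]: mean and variance of log p(y|theta) under the
    power posterior at temps[j]. If var_loglik is given, each trapezoid is
    corrected by -(dt**2 / 12) * (V[j+1] - V[j]), using dE_t/dt = V_t.
    """
    t = np.asarray(temps, float)
    e = np.asarray(mean_loglik, float)
    dt = np.diff(t)
    estimate = np.sum(dt * (e[1:] + e[:-1]) / 2.0)
    if var_loglik is not None:
        v = np.asarray(var_loglik, float)
        estimate -= np.sum(dt**2 * (v[1:] - v[:-1]) / 12.0)
    return estimate


def power_posterior_moments(y, tau2, t):
    """Closed-form mean and variance of log p(y|theta) under the power posterior
    (proportional to p(y|theta)^t p(theta)) for the toy model
    y_i ~ N(theta, 1), theta ~ N(0, tau2)."""
    n, ybar = y.size, y.mean()
    s = np.sum((y - ybar) ** 2)
    v = 1.0 / (t * n + 1.0 / tau2)            # power-posterior variance of theta
    mu = t * y.sum() * v - ybar               # power-posterior mean of theta minus ybar
    mean = -0.5 * n * np.log(2 * np.pi) - 0.5 * s - 0.5 * n * (mu**2 + v)
    var = 0.25 * n**2 * (2 * v**2 + 4 * mu**2 * v)
    return mean, var


def exact_log_evidence(y, tau2):
    """log p(y) for the toy model, using y ~ N(0, I + tau2 * 1 1^T)."""
    n = y.size
    quad = np.sum(y**2) - tau2 * y.sum() ** 2 / (1.0 + n * tau2)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1.0 + n * tau2) - 0.5 * quad


rng = np.random.default_rng(1)
y, tau2 = rng.normal(0.5, 1.0, size=20), 10.0
temps = (np.arange(101) / 100.0) ** 5        # assumed ladder t_j = (j/J)^m, J = 100, m = 5
mean, var = power_posterior_moments(y, tau2, temps)
print("TI:    ", ti_log_evidence(temps, mean))
print("TISTAB:", ti_log_evidence(temps, mean, var))
print("exact: ", exact_log_evidence(y, tau2))
```

In an MCMC setting, `mean` and `var` would be sample moments of the log-likelihood from separate runs at each temperature. In that case, noise in the estimated variances enters the correction term directly.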
The computation of the model selection scores (DIC, WAIC, CVIC, WBIC, and the marginal log-likelihood (MLL) with both Chib’s method and TI) requires MCMC simulations; the pseudocode is given in Table 1. We monitored the convergence of the MCMC chains with standard convergence diagnostics based on potential scale reduction factors (Gelman and Rubin 1992). The application of Chib’s method is based on the selection of a particular ‘pivot’ parameter vector \(\tilde{\varvec{\theta }}\), as described under Eq. (34). Initially, we chose \(\tilde{\varvec{\theta }}\) to be the MAP (maximum a posteriori) estimator from the entire MCMC simulation. This was found to lead to numerical instabilities, however, as seen from the right panel of Fig. 8. We found a way to stabilize Chib’s method numerically, which we discuss in Appendix 1. We also found that the transition from TI to TISTAB proposed by Friel et al. (2013) can be counterproductive, as seen from Table 4; we have investigated this unexpected effect in more detail in a simulation study in Appendix 3. Finally, the improved variant of CheMA (iCheMA) substantially reduces the computational costs of the MCMC-based inference, as shown in Table 2 and discussed in more detail in Appendix 1.
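The potential scale reduction factor can be computed from several parallel chains. The sketch below, with a hypothetical function name, implements the basic Gelman–Rubin ratio without the degrees-of-freedom correction of the original paper. A common rule of thumb treats values below about 1.1 as acceptable, but the threshold used in this study is not stated here.

```python
import numpy as np


def psrf(chains):
    """Potential scale reduction factor for one scalar parameter.

    chains: array of shape (m, n): m parallel chains, n post-burn-in draws each.
    Basic Gelman-Rubin ratio without the degrees-of-freedom correction.
    """
    chains = np.asarray(chains, float)
    _, n = chains.shape
    w = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    b = n * chains.mean(axis=1).var(ddof=1)        # between-chain variance
    var_plus = (n - 1) / n * w + b / n             # pooled posterior variance estimate
    return np.sqrt(var_plus / w)


rng = np.random.default_rng(6)
mixed = rng.normal(size=(4, 2000))                 # four chains sampling the same target
stuck = mixed + 3.0 * np.arange(4)[:, None]        # chains stuck in different regions
print(psrf(mixed), psrf(stuck))
```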
5 Data
5.1 Synthetic data
Table 3: Parameter settings for the synthetic data of Sect. 5.1. The parameters \(v_{0,y}\) and \(v_{2,y}\) are the maximum reaction rates.
Parameter | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
\(v_{0,y}\) | 1 | 0.5 | 1.5 | 2 | 0.2 | 2 | 3 | 0.2 | 0.1
\(v_{2,y}\) | 1 | 1 | 1 | 1 | 1 | 0.2 | 0.1 | 2 | 2
Table 4: Score differences for iCheMA, applied with thermodynamic integration: TI versus TISTAB
Spread factor (sf) | TI \((\delta ^2=sf, \nu =0.5)\) | TISTAB \((\delta ^2=sf, \nu =0.5)\) | TI \((\delta ^2=sf, \nu =sf)\) | TISTAB \((\delta ^2=sf, \nu =sf)\)
0.01 | 39 | 33 | 166 | 156
0.1 | 38 | 33 | 76 | 68
1 | 27 | 23 | 27 | 22
10 | 8 | 5 | 8 | 6
100 | 4 | 5 | 4 | 4
10000 | 5 | 119 | 5 | 119
1e+08 | 52 | 3.1e+10 | 52 | 1.7e+08
1e+16 | 52 | 9.8e+24 | 53 | 2.7e+24
1e+20 | 52 | 3.2e+32 | 51 | 3.3e+34
5.2 Realistic data
For an objective model evaluation, we use the benchmark data from Aderhold et al. (2014), which contain simulated gene expression and protein concentration time series for ten genes in the circadian clock of A. thaliana. The time series correspond to measurements at 2-h intervals over 24 h, and are repeated 11 times, corresponding to different experimental conditions. We use time series generated from six variants of the circadian gene regulatory network in A. thaliana, shown in Fig. 4; these variants correspond to different protein (i.e. transcription factor) knockdowns. The molecular interactions in these networks were modelled as individual discrete events with a Markov jump process, using the mathematical formulation from Guerriero et al. (2012). They were simulated in practice with Bio-PEPA (Ciocchetta and Hillston 2009), based on the Gillespie algorithm (Gillespie 1977). For large cell volumes, the concentration time series converge to the solutions of ODEs of the form in Eq. (5). For smaller volumes, however, time series simulated with Markov jump processes contain stochastic fluctuations that mimic the mismatch between the ODE model and genuine molecular processes. The volume size was chosen as described in Guerriero et al. (2012) so as to match the fluctuations observed in real quantitative reverse transcription polymerase chain reaction (qRT-PCR) profiles. For the network reconstruction task, we kept only the gene expression time series and discarded the protein concentrations. This emulates the common problem of systematically missing values for certain types of molecular species (in our case, protein concentrations).
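The Gillespie algorithm can be illustrated with a toy birth-death process for a single mRNA species, with constant production and linear degradation. This is much simpler than the circadian clock model of Guerriero et al. (2012) and serves only as a hypothetical example. Propensities scale with the volume \(\Omega\). Dividing molecule counts by \(\Omega\) gives concentrations whose time average approaches the steady state \(k/\gamma\) of the ODE \(dc/dt = k - \gamma c\), while the relative fluctuations shrink as \(\Omega\) grows.

```python
import numpy as np


def gillespie_birth_death(k, gamma, volume, x0, t_end, rng):
    """Exact stochastic simulation of 0 -> X (propensity k*volume) and
    X -> 0 (propensity gamma*x). Returns jump times and molecule counts."""
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_birth, a_death = k * volume, gamma * x
        a_total = a_birth + a_death
        t_next = t + rng.exponential(1.0 / a_total)    # waiting time to next event
        if t_next > t_end:
            break
        t = t_next
        x += 1 if rng.random() * a_total < a_birth else -1   # choose the reaction
        times.append(t)
        counts.append(x)
    times.append(t_end)                                  # close the last holding interval
    counts.append(x)
    return np.array(times), np.array(counts)


def time_average(times, values):
    """Time-weighted mean of a piecewise-constant trajectory."""
    return np.sum(values[:-1] * np.diff(times)) / (times[-1] - times[0])


rng = np.random.default_rng(4)
k, gamma = 1.0, 0.5                                      # ODE steady state: k / gamma = 2
for volume in [10, 100]:
    times, counts = gillespie_birth_death(k, gamma, volume, int(2 * volume), 200.0, rng)
    conc = counts / volume
    mean_c = time_average(times, conc)
    sd_c = np.sqrt(time_average(times, conc**2) - mean_c**2)  # time-weighted spread
    print(volume, mean_c, sd_c)
```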
5.3 Real data
Table 5: State-of-the-art network reconstruction methods
Abbreviation | Full name
HBR | Hierarchical Bayesian regression
Lasso | Sparse regression with L\(_1\) penalty
ElasticNet | Sparse regression with L\(_1\) and L\(_2\) penalty
Tesla | Sparse regression with time-varying changepoints
GGM | Graphical Gaussian models
SBR | Sparse Bayesian regression
BSA | Bayesian spline autoregression
SSM | State-space models
GP | Gaussian processes
MBN | Mixture Bayesian networks
BGe | Gaussian Bayesian networks
6 Results
This section discusses the effect of gradient approximation (Sect. 6.1), the influence of the prior (Sect. 6.2), the accuracy of model selection (Sect. 6.3), the relative performance compared to the current state of the art (Sect. 6.4), and the problem of model mismatch (Sect. 6.5).
6.1 Evaluating the effect of the gradient computation
6.2 Evaluating the influence of the parameter prior
6.3 Model selection
The parameter priors in Eqs. (10) and (12) are Gaussians centred on \(\mu =1\), with different variances. For low spread factors sf (i.e. for low prior variances), both groups of criteria (IC and MLL) clearly favour the true model. This is because the prior ‘pulls’ the spurious interaction parameter from its true value of zero towards the wrong value of \(\mu =1\). As the prior becomes more diffuse, the score differences become less pronounced, but the criteria still select the true model up to spread factors of about \(sf\approx 100\). When the spread factor exceeds \(sf> 100\), the IC occasionally fail to select the correct model. A more detailed representation focusing on the IC and larger spread factors is given in Fig. 9. It shows that among the IC it is mainly DIC that repeatedly fails to select the true model: the interquartile range of the score difference distribution, between the first and third quartile, includes negative values. For the other information criteria, the selection of the wrong model is relatively unlikely, since the interquartile range does not include negative values. Two of the four MLL methods, namely TI8 and Chib, increasingly favour the true model as the spread factor increases beyond \(sf\approx 1000\). This is a consequence of Lindley’s paradox (Lindley 1957), whereby the MLL increasingly penalizes the overcomplex model as the prior becomes more vague. TI4 in principle shows a very similar trend, but its score difference is lower than for TI8. This indicates that the choice of the discretization points (i.e. the temperature ladder) implied by the power \(m\in \{4,8\}\) in Eq. (42) can critically affect the result.
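Lindley’s paradox can be reproduced in a hypothetical toy setting that mirrors the experiment described above. The data are generated with a zero parameter. A ‘true’ model fixes the parameter at zero, and an overcomplex model places a Gaussian prior centred on \(\mu=1\) with variance sf on it. Both marginal likelihoods are available in closed form, so no MCMC is needed. The log Bayes factor in favour of the true model behaves as follows: it is large for small sf, where the prior pulls towards the wrong value; it is smallest at intermediate sf; and it grows again roughly like \(\tfrac{1}{2}\log(n\,sf)\) for vague priors.

```python
import numpy as np


def log_evidence_fixed(y, theta=0.0):
    """log p(y | M0) for y_i ~ N(theta, 1) with theta fixed."""
    return -0.5 * y.size * np.log(2 * np.pi) - 0.5 * np.sum((y - theta) ** 2)


def log_evidence_gaussian_prior(y, mu, sf):
    """log p(y | M1) for y_i ~ N(theta, 1), theta ~ N(mu, sf),
    using the marginal y ~ N(mu * 1, I + sf * 1 1^T)."""
    n, r = y.size, y - mu
    quad = np.sum(r**2) - sf * r.sum() ** 2 / (1.0 + n * sf)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1.0 + n * sf) - 0.5 * quad


y = np.array([-1.0, 1.0, -0.5, 0.5])         # toy data consistent with theta = 0
for sf in [0.01, 1.0, 100.0, 1e4, 1e8]:
    log_bf = log_evidence_fixed(y) - log_evidence_gaussian_prior(y, mu=1.0, sf=sf)
    print(f"sf = {sf:g}: log Bayes factor (true vs overcomplex) = {log_bf:.2f}")
```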
Among the MLL methods, the naive application of Chib’s method, Chib naive, as proposed in Chib and Jeliazkov (2001), shows a completely different pattern and systematically fails to select the correct model for large spread factors. A theoretical explanation for this instability is provided in Appendix 1. We achieve a stabilization of Chib’s method, referred to as Chib, by selecting the pivot parameter set \(\tilde{\varvec{\theta }}\) with the highest posterior probability within the set of actually sampled parameters (excluding the parameter states from the burn-in phase). We refer to Appendix 1 for details.
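Chib’s identity and the pivot choice described above can be illustrated with a hypothetical conjugate toy model in which the posterior density is known exactly. Chib and Jeliazkov (2001) estimate the posterior ordinate \(p(\tilde{\theta}\mid y)\) from MCMC output. Here it is analytic, and exact posterior draws stand in for the MCMC sample. The identity itself holds at any pivot. What the stabilization targets is the accuracy of the estimated ordinate, which is best in regions the sampler has actually visited.

```python
import numpy as np


def norm_logpdf(x, mean, var):
    """Log density of N(mean, var), elementwise."""
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mean) ** 2 / var


# Hypothetical conjugate toy model: y_i ~ N(theta, 1), theta ~ N(0, tau2).
rng = np.random.default_rng(2)
tau2 = 10.0
y = rng.normal(0.5, 1.0, size=20)
post_var = 1.0 / (y.size + 1.0 / tau2)
post_mean = post_var * y.sum()


def log_lik(theta):
    return np.sum(norm_logpdf(y, theta, 1.0))


def log_prior(theta):
    return norm_logpdf(theta, 0.0, tau2)


# Stand-in for MCMC output: exact posterior draws, with a "burn-in" prefix discarded.
draws = rng.normal(post_mean, np.sqrt(post_var), size=5000)[500:]

# Pivot: the retained draw with the highest unnormalised posterior density.
log_post_unnorm = np.array([log_lik(th) + log_prior(th) for th in draws])
pivot = draws[np.argmax(log_post_unnorm)]

# Chib's identity: log p(y) = log p(y|pivot) + log p(pivot) - log p(pivot|y).
log_evidence = log_lik(pivot) + log_prior(pivot) - norm_logpdf(pivot, post_mean, post_var)
print(pivot, log_evidence)
```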
6.4 Comparison with state-of-the-art network reconstruction methods
6.5 Model selection for network identification
Table 6: Rank difference for Chib’s MLL and different information criteria
Criterion | Additive terms: rank diff. (se) | Explicit product terms: rank diff. (se)
Chib’s MLL | 5.4 (0.35) | 2.2 (0.18)
DIC | 7.0 (0.26) | 6.2 (0.32)
WAIC | 6.8 (0.37) | 1.4 (0.11)
CVIC | 6.8 (0.35) | 1.6 (0.10)
WBIC | 7.8 (0.51) | 3.0 (0.13)
where the symbols have the same meaning as in Eq. (6); see Pokhilko et al. (2010) for explicit mathematical expressions.
We included prior knowledge about molecular complex formation and expanded the iCheMA model accordingly to include the corresponding product of Michaelis–Menten terms in Eq. (6). We then computed the MLL as before and repeated the analysis. The results are shown in Fig. 12b. They demonstrate that making the model more faithful to the data-generating process substantially improves model selection: the average ranks of the true network (shown in the diagonal elements of the matrix) are never worse than 2 (out of 6), and reach the optimal value of 1 in 50 % of the cases. The corresponding results for the various IC are displayed in Table 6. When the model is restricted to additive terms (greater mismatch between data and model), MLL outperforms the IC, as might be expected. Interestingly, when the mismatch between data and model is reduced by including the product terms, two IC, WAIC and CVIC, are competitive with MLL and even perform slightly better. DIC is substantially outperformed by MLL and by the competitive information criteria WAIC and CVIC; these findings are consistent with the earlier results from Sect. 6.3. The performance of WBIC lies between that of WAIC/CVIC and DIC.
7 Discussion
Automatic inference of regulatory structures in our study rests on two elements. The first is a bipartition of the variables into putative regulators (transcription factor proteins) and regulatees (mRNAs). The second is a physical model of the regulation processes based on Michaelis–Menten kinetics. This effectively conditions the inference on assumed prior knowledge and is, as such, contingent on the accuracy of these assumptions. In our study we have allowed for a mismatch between the assumed prior knowledge and the ground truth. First, the assumed model is deterministic, defined in terms of ordinary differential equations, while the data-generation mechanism is stochastic (simulated with a Markov jump process). Second, the interaction model is additive, while the data-generation mechanism includes multiplicative terms. Third, we have allowed for missing data (missing protein concentrations). Our results show that, owing to this mismatch, the true causal system cannot be learned exactly (see e.g. Fig. 11, which shows AUROC and AUPREC scores clearly below 1). However, our work suggests that causal inference based on a simplified physical model achieves significantly better results than inference based on an empirical model (see Fig. 11; note that only iCheMA is based on a physical model, whereas all the other methods are machine learning methods based on empirical modelling). Our study also quantifies how the performance improves as the physical model is made more realistic; see Fig. 12.
Semi-mechanistic modelling is a topical research area, as evidenced by the recent publication by Babtie et al. (2014). Our article complements this work by addressing a different research question. The objective of Babtie et al. is to investigate how uncertainty about the model structure (i.e. the interaction network defined by the ODEs) affects parameter uncertainty, and how parameter confidence or credible intervals are systematically underestimated when model uncertainty is not taken into account. Our article addresses questions that have not been investigated by Babtie et al.: how accurate is the network reconstruction or ODE model selection, which factors determine this accuracy, and to what extent? Our work has been motivated by Oates et al. (2014). We have shown that the authors’ seminal work, which won the best paper award at ECCB 2014, can be further improved with two methodological modifications. The first is a different gradient computation, based on GP regression. The second is a different parameter prior, replacing the g-prior used by Oates et al. with the ridge regression prior more commonly used in machine learning. These two priors are discussed, e.g., in Chapter 3 of Marin and Robert (2007), but without any conclusions about their relative merits. Our study provides empirical evidence for the superiority of the ridge regression prior (Fig. 6) in the context of semi-mechanistic models, and a theoretical explanation for the reason behind it (Sect. 6.2). Table 2 shows that the new iCheMA variant reduces the computational costs drastically; a theoretical explanation for the reduction is provided in Appendix 1.
Our work has led to deeper insight into the strengths and shortcomings of different scoring schemes and numerical procedures. We have investigated the effectiveness of DIC as a method of semi-mechanistic model ranking. DIC is routinely used for model selection in WinBUGS (Lunn et al. 2012), and the paper in which it was introduced (Spiegelhalter et al. 2002) had received over 5000 citations at the time of the submission of the present article. However, in the context of network learning, we found that DIC often prefers a model with additional spurious complexity over the true model (Fig. 9). This finding calls into question its viability as a selection tool for semi-mechanistic models.
We have further compared different methods for computing the MLL. We have shown that Chib’s method (Chib and Jeliazkov 2001) can lead to numerical instabilities. These instabilities have also been reported by Lunn et al. (2012) and are presumably the reason why Chib’s method is not available in WinBUGS. We have identified the cause of the numerical instability (see Appendix 1), and propose a modified implementation that substantially improves the robustness and practical viability of the method. This modification even appears to be preferable to thermodynamic integration. Thermodynamic integration has a higher computational complexity and shows noticeable variation with the discretization of the integral in Eq. (37) and with the number of ‘temperatures’. It has been suggested (Friel et al. 2013) that the accuracy of thermodynamic integration can be improved by including second-order terms in the trapezium sum (see Eq. (41)). Our study finds, however, that this correction is no panacea for a general improvement in numerical accuracy, and that there are scenarios in which the second-order correction can be counterproductive. It has come to our attention that a more recent method for improving thermodynamic integration has been proposed by Oates et al. (2016). Including this method in our benchmark study would be an interesting project for future research.
Owing to the high computational complexity and potential instability of the MLL computation, several articles in the recent computational statistics literature have investigated faster, approximate, but numerically more stable alternatives. In our work, we have included WAIC, CVIC and WBIC as alternatives to MLL and evaluated their potential for model selection in two benchmark studies. It turns out that these more recent IC significantly outperform DIC (Figs. 8 and 9), and that WAIC and CVIC are comparable in performance to model selection based on the MLL (Table 6). Independent researchers should carry out further studies on different systems in the near future. Nevertheless, our study points to the possibility that statistical model selection in complex systems may be feasible with a comparable degree of accuracy but at substantially lower computational cost than with MLL.
Footnotes
 1.
Throughout this paper we refer to the resulting gradients as the numerical gradient (Oates et al. 2014) and the analytical gradient (proposed here).
 2.
Using the whole distribution, as on page 58 of Holsclaw et al. (2013), would provide an additional indication of uncertainty, akin to a distribution of measurement errors. Owing to the increased computational costs (additional matrix operations), this has not been attempted, though.
 3.
In the original CheMA model, a different decay term is used (for protein dephosphorylation), and no inhibitory interactions are included for numerical reasons. The linear decay term in Eq. (6) is more appropriate for transcriptional regulation, and the inclusion of inhibitory interactions achieves better results (as shown in Appendix 2).
 4.
This issue applies to the CheMA model, as proposed by Oates et al. and to the new improved variant (iCheMA), proposed here. For both models we implement the exact uncollapsed rather than the approximate collapsed Gibbs sampling step for \(\sigma _i^2\).
 5.
The truncation of \({\mathbf{V}}_i\) is then automatically taken into account, as the update only conditions on values of \({\mathbf{V}}_i\) that fulfil the required constraint.
 6.
The Hastings ratio (HR) is equal to 1, as the proposal moves are symmetric. New candidates \({\mathbf{K}}_i^{\star }\) with negative elements are never accepted, as they have prior probability zero, \(P({\mathbf{K}}_i^{\star })=0\).
 7.
In the realistic data study, we assume gene measurements to take place at intervals of \(\delta _t = 2\) h. This mimics typical sampling intervals for the qRT-PCR experiments in Flis et al. (2015).
 8.
\(\int _a^b f(x)\,dx = (b-a)\,\frac{f(b)+f(a)}{2} - \frac{(b-a)^3}{12} f''(c)\) for some \(c\in [a,b]\).
 9.
Loosely speaking, under ’ideal circumstances’ the MCMC sample should contain parameters with high posterior probabilities, while parameters deviating from the sampled ones, such as \(\tilde{\varvec{\theta }}\), should be assumed to be very unlikely.
 10.
Note that \(\mu =1\) is the prior expectation of \(v_{u,i}\) and \(v_{0,i}\), see Eq. (11).
 11.
In real-world applications the maximal number of regulators for each species can be restricted to a maximal ‘fan-in’ (or ‘in-degree’) value, but this fan-in is rarely set to a value lower than 3. Hence, taking the degradation process (i.e. the first column of the design matrix \({\mathbf{D}}_i\)) into account, the MCMC inference requires at least 4-dimensional multivariate Gaussian integrals to be computed. For completeness, we note that substantially more efficient algorithms for computing multivariate Gaussian CDFs are only available for the bivariate and trivariate cases; see, e.g., the algorithms in Drezner and Wesolowsky (1989) and Genz (2004).
 12.
The score in Eq. (60) is the deviation between the true and an approximated log Bayes factor.
 13.
A full list of growing conditions and strains is available from the Mockler Lab (Mockler et al. 2007) at ftp://www.mocklerlab.org/diurnal/expression_data/Arabidopsis_thaliana_conditions.xlsx
 14.
The equation for protein translation and degradation of P only depends on the status of light and darkness. Thus, protein P is produced throughout darkness with a sharp decline at dawn. By multiplication with a continuous (or binary) light indicator in the range [0,1] we obtain a sharp peak at dawn for Light*P. This peak might act as an initial ‘start of day’ impulse for some of the clock genes.
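The acceptance rule described in Footnote 6 can be sketched as a random-walk Metropolis sampler. Proposals are symmetric, so the Hastings ratio equals 1, and candidates with negative elements are rejected automatically. The exponential toy target and all names are hypothetical stand-ins for the conditional distribution of \({\mathbf{K}}_i\).

```python
import numpy as np


def rw_metropolis_positive(log_target, x0, n_iter, step, rng):
    """Random-walk Metropolis for a non-negative parameter vector.

    The Gaussian proposal is symmetric, so the Hastings ratio is 1 and the
    acceptance probability reduces to the target ratio. Candidates with
    negative elements have zero prior (hence target) density and are rejected.
    """
    x = np.asarray(x0, float)
    lp = log_target(x)
    out = np.empty((n_iter, x.size))
    for i in range(n_iter):
        cand = x + step * rng.normal(size=x.size)
        if np.all(cand >= 0):                        # otherwise: reject, keep x
            lp_cand = log_target(cand)
            if np.log(rng.random()) < lp_cand - lp:
                x, lp = cand, lp_cand
        out[i] = x
    return out


# Toy target: independent Exponential(1) components (density zero for x < 0).
rng = np.random.default_rng(3)
draws = rw_metropolis_positive(lambda x: -np.sum(x), [1.0, 1.0], 50_000, 1.0, rng)
print(draws[5000:].mean(axis=0))
```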
Notes
Acknowledgments
This project was partially supported by a grant from the Engineering and Physical Sciences Research Council (EPSRC) of the UK, grant reference number EP/L020319/1, and a grant from the European Commission FP7, “Timet”, grant agreement 245143. We would like to thank Andrew Millar for helpful discussions.
References
 Aderhold, A., Husmeier, D., Grzegorczyk, M.: Statistical inference of regulatory networks for circadian regulation. Stat. Appl. Genet. Mol. Biol. 13(3), 227–273 (2014)
 Babtie, A.C., Kirk, P., Stumpf, M.P.H.: Topological sensitivity analysis for systems biology. PNAS 111(51), 18507–18512 (2014)
 Barenco, M., Tomescu, D., Brewer, D., Callard, R., Stark, J., Hubank, M.: Ranked prediction of p53 targets using hidden variable dynamic modeling. Genome Biol. 7(3), R25 (2006)
 Brandt, S.: Data Analysis: Statistical and Computational Methods for Scientists and Engineers. Springer, New York (1999)
 Chatfield, C.: The Analysis of Time Series. Chapman & Hall, Boca Raton (1989). ISBN 0412318202
 Chib, S., Jeliazkov, I.: Marginal likelihood from the Metropolis–Hastings output. J. Am. Stat. Assoc. 96(453), 270–281 (2001)
 Ciocchetta, F., Hillston, J.: Bio-PEPA: a framework for the modelling and analysis of biological systems. Theor. Comput. Sci. 410(33), 3065–3084 (2009)
 Davis, J., Goadrich, M.: The relationship between precision-recall and ROC curves. In: Proceedings of the 23rd International Conference on Machine Learning (ICML), ACM, pp. 233–240 (2006)
 De Jong, H.: Modeling and simulation of genetic regulatory systems: a literature review. J. Comput. Biol. 9(1), 67–103 (2002)
 Drezner, Z., Wesolowsky, G.: On the computation of the bivariate normal integral. J. Stat. Comput. Simul. 35, 101–107 (1989)
 Flis, A., Fernández, A.P., Zielinski, T., Mengin, V., Sulpice, R., Stratford, K., Hume, A., Pokhilko, A., Southern, M.M., Seaton, D.D., et al.: Defining the robust behaviour of the plant clock gene circuit with absolute RNA time-series and open infrastructure. Open Biol. 5(10), 150042 (2015)
 Fogelmark, K., Troein, C.: Rethinking transcriptional activation in the Arabidopsis circadian clock. PLoS Comput. Biol. 10(7), e1003705 (2014)
 Friel, N., Pettitt, A.: Marginal likelihood estimation via power posteriors. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 70, 589–607 (2008)
 Friel, N., Hurn, M., Wyse, J.: Improving power posterior estimation of statistical evidence. Stat. Comput. 24(5), 709–723 (2013)
 Gelfand, A.E., Dey, D.K., Chang, H.: Model determination using predictive distributions with implementation via sampling-based methods. Bayesian Stat. 4, 147–167 (1992)
 Gelman, A., Rubin, D.B.: Inference from iterative simulation using multiple sequences. Stat. Sci. 7(4), 457–472 (1992)
 Gelman, A., Carlin, J.B., Stern, H., Dunson, D.B., Vehtari, A., Rubin, D.: Bayesian Data Analysis, 3rd edn. Chapman & Hall, Boca Raton (2014a)
 Gelman, A., Hwang, J., Vehtari, A.: Understanding predictive information criteria for Bayesian models. Stat. Comput. 24(6), 997–1016 (2014b)
 Genz, A.: Numerical computation of rectangular bivariate and trivariate normal and t probabilities. Stat. Comput. 14(3), 251–260 (2004)
 Genz, A., Bretz, F.: Numerical computation of multivariate t probabilities with application to power calculation of multiple contrasts. J. Stat. Comput. Simul. 63, 361–378 (1999)
 Genz, A., Bretz, F.: Comparison of methods for the computation of multivariate t probabilities. J. Comput. Graph. Stat. 11(4), 950–971 (2002)
 Gillespie, D.T.: Exact stochastic simulation of coupled chemical reactions. J. Phys. Chem. 81(25), 2340–2361 (1977)
 Grzegorczyk, M., Aderhold, A., Husmeier, D.: Inferring bidirectional interactions between circadian clock genes and metabolism with model ensembles. Stat. Appl. Genet. Mol. Biol. 14(2), 143–167 (2015)
 Guerriero, M.L., Pokhilko, A., Fernández, A.P., Halliday, K.J., Millar, A.J., Hillston, J.: Stochastic properties of the plant circadian clock. J. R. Soc. Interface 9(69), 744–756 (2012)
 Hanley, J.A., McNeil, B.J.: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143(1), 29–36 (1982)
 Holsclaw, T., Sansó, B., Lee, H.K., Heitmann, K., Habib, S., Higdon, D., Alam, U.: Gaussian process modeling of derivative curves. Technometrics 55(1), 57–67 (2013)
 Lawrence, N.D., Girolami, M., Rattray, M., Sanguinetti, G.: Learning and Inference in Computational Systems Biology. Computational Molecular Biology. MIT Press, Cambridge (2010)
 Lindley, D.V.: A statistical paradox. Biometrika 44(1/2), 187–192 (1957)
 Lunn, D., Jackson, C., Best, N., Thomas, A., Spiegelhalter, D.: The BUGS Book: A Practical Introduction to Bayesian Analysis. Chapman & Hall, Boca Raton (2012)
 Marbach, D., Costello, J.C., Küffner, R., Vega, N.M., Prill, R.J., Camacho, D.M., Allison, K.R., Kellis, M., Collins, J.J., Stolovitzky, G., et al.: Wisdom of crowds for robust gene network inference. Nat. Methods 9(8), 796–804 (2012)
 Marin, J.M., Robert, C.P.: Bayesian Core: A Practical Approach to Computational Bayesian Statistics. Springer, New York (2007)MATHGoogle Scholar
 Mockler, T., Michael, T., Priest, H., Shen, R., Sullivan, C., Givan, S., McEntee, C., Kay, S., Chory, J.: The DIURNAL project: DIURNAL and circadian expression profiling, modelbased pattern matching, and promoter analysis. In: Cold Spring Harbor Symposia on Quantitative Biology, Cold Spring Harbor Laboratory Press, vol. 72, pp. 353–363 (2007)Google Scholar
Murphy, K.: Machine Learning: A Probabilistic Perspective. MIT Press, Cambridge (2012)
Oates, C.J., Dondelinger, F., Bayani, N., Korkola, J., Gray, J.W., Mukherjee, S.: Causal network inference using biochemical kinetics. Bioinformatics 30(17), i468–i474 (2014)
Oates, C.J., Papamarkou, T., Girolami, M.: The controlled thermodynamic integral for Bayesian model evidence evaluation. J. Am. Stat. Assoc. (2016). doi: 10.1080/01621459.2015.1021006
Ota, K., Yamada, T., Yamanishi, Y., Goto, S., Kanehisa, M.: Comprehensive analysis of delay in transcriptional regulation using expression profiles. Genome Inform. 14, 302–303 (2003)
Pokhilko, A., Hodge, S., Stratford, K., Knox, K., Edwards, K., Thomson, A., Mizuno, T., Millar, A.: Data assimilation constrains new connections and components in a complex, eukaryotic circadian clock model. Mol. Syst. Biol. 6(1), 416 (2010)
Pokhilko, A., Fernández, A., Edwards, K., Southern, M., Halliday, K., Millar, A.: The clock gene circuit in Arabidopsis includes a repressilator with additional feedback loops. Mol. Syst. Biol. 8, 574 (2012)
Pokhilko, A., Mas, P., Millar, A.: Modelling the widespread effects of TOC1 signalling on the plant circadian clock and its outputs. BMC Syst. Biol. 7(1), 1–12 (2013)
Ramsay, J., Hooker, G., Campbell, D., Cao, J.: Parameter estimation for differential equations: a generalized smoothing approach. J. R. Stat. Soc.: Ser. B 69(5), 741–796 (2007)
Rasmussen, C.E.: Evaluation of Gaussian processes and other methods for nonlinear regression. PhD thesis, University of Toronto, Dept. of Computer Science (1996)
Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge (2006)
Rasmussen, C.E., Neal, R.M., Hinton, G.E., van Camp, D., Revow, M., Ghahramani, Z., Kustra, R., Tibshirani, R.: The DELVE manual (1996). http://www.cs.toronto.edu/~delve
Smolen, P., Baxter, D.A., Byrne, J.H.: Modeling transcriptional control in gene networks: methods, recent results, and future directions. Bull. Math. Biol. 62(2), 247–292 (2000)
Solak, E., Murray-Smith, R., Leithead, W.E., Leith, D.J., Rasmussen, C.E.: Derivative observations in Gaussian process models of dynamic systems. In: Advances in Neural Information Processing Systems (NIPS). MIT Press, Cambridge (2002)
Spiegelhalter, D.J., Best, N.G., Carlin, B.P., Van Der Linde, A.: Bayesian measures of model complexity and fit. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 64(4), 583–639 (2002)
Toni, T., Welch, D., Strelkowa, N., Ipsen, A., Stumpf, M.P.: Approximate Bayesian computation scheme for parameter inference and model selection in dynamical systems. J. R. Soc. Interface 6(31), 187–202 (2009)
Vanhatalo, J., Riihimäki, J., Hartikainen, J., Jylänki, P., Tolvanen, V., Vehtari, A.: GPstuff: Bayesian modeling with Gaussian processes. J. Mach. Learn. Res. 14(1), 1175–1179 (2013)
Vyshemirsky, V., Girolami, M.A.: Bayesian ranking of biochemical system models. Bioinformatics 24(6), 833–839 (2008)
Watanabe, S.: Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory. J. Mach. Learn. Res. 11, 3571–3594 (2010)
Watanabe, S.: A widely applicable Bayesian information criterion. J. Mach. Learn. Res. 14, 867–897 (2013)
Yang, Z.: Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. J. Mol. Evol. 39, 306–314 (1994)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.