Background

Developing computational models of biochemical networks is complicated by the complexity of their interaction mechanisms [1–8]. Typically, hypotheses on how the system operates are formalized as computational models [9–12]. These models are subsequently calibrated to experimental data using inferential techniques [13–19]. Despite the steady increase in data availability originating from new quantitative experimental techniques, the modeler is often faced with the problem that several different model topologies can describe the measured data to an acceptable degree [20–22]. The uncertainty associated with the predictions hinders the investigator when trying to make a clear distinction between competing models. In such cases, additional data is required. Optimal Experiment Design (OED) methods can be used to determine which experiments would be most useful [23]. These methods typically involve specifying an optimality criterion or design aim and finding the experiment that most effectively attains this goal while considering the current parameter uncertainty. Existing methods of OED for model selection are usually based on assuming an uncertainty distribution around best parameter estimates [24, 25] or on model linearization [26]. Due to the non-linearity of the model and the non-Gaussian shape of the parameter distribution, these methods are rarely appropriate for Systems Biology models [27] (see Figure 1 for an example of how model linearization can skew predictive distributions in cases of large parameter uncertainty). In this work, we employ a Bayesian approach using the Posterior Predictive Distribution (PPD), which directly reflects the prediction uncertainty and accounts for both model non-linearity and non-Gaussianity of the parameter distribution. PPDs are defined as distributions of new observations conditioned on the observed data. Samples from the PPD can be obtained by drawing from the posterior parameter probability distribution and simulating predictions for each parameter set. By simulating a sample from the PPDs for all experimentally accessible moieties and fluxes, differences between models can be explored [28].

Figure 1

Limitations of estimating prediction variances using linearization. In this figure, the gray dots indicate model predictions corresponding to simulations of the nonlinear model. On the prediction side, the black distribution is based on linearly projecting the parameter uncertainties onto the predictions, while the gray distribution is based on the non-linear model. Shown on the left is the case with small uncertainty, where the linear parameter sensitivities provide an adequate description for projecting the parameter uncertainty onto the predictions, as can be seen from the overlapping black and gray lines. Shown on the right is the case with large parameter uncertainty, where the non-linearity of the model results in a poor estimate of the predictive distribution when it is estimated via linear projection, i.e., the black and gray lines do not overlap.

Previously, predictive distributions have been used to perform experiment design targeted at reducing the uncertainty of specific predictions [29–31]. In the field of machine learning, optimal experiment design based on information-theoretic considerations is typically referred to as active learning [32]. In the neurosciences, the Bayesian inversion and selection of nonlinear state-space models is known as dynamic causal modelling (DCM). Although DCM is dominated by variational (approximate) Bayesian model inversion, the basic problems and ensuing model selection issues are identical to those considered in this work. In DCM, optimizing the experimental design focuses on the Laplace-Chernoff risk for model selection and its relationship with classical design optimality criteria. Daunizeau et al. (2011) consider the problem of detecting feedback connections in neuronal networks and how this depends upon the duration of design stimulation [33]. We consider a similar problem in biochemical networks, in terms of identifying molecular interactions and deciding when to sample data. We present a method that uses samples from simulated predictive distributions to select experiments useful for model selection. Considering the increased use of Bayesian inference in the field [14, 34–39], this approach is particularly timely since it enables investigators to extract additional information from their inferences.

In a Bayesian setting, model selection is typically based on the Bayes factor, which measures the amount of evidence the data provides for one model over another [40, 41]. For every pair of models, a Bayes factor can be computed, defined as the ratio of their integrated likelihoods. One advantage of the Bayes factor is that it automatically penalizes unnecessary model complexity in light of the experimental data. It therefore reduces the risk of unwarranted model rejections. This penalization occurs because more parameters or unnecessarily wide priors lead to a lower weighting of the high likelihood region. This is illustrated in Figure 2.

Figure 2

Effect of model complexity on marginal likelihood. Three different illustrative examples of integrated likelihoods. Left: integrated likelihood under wide priors. The mismatch of the prior with respect to the high likelihood region results in low weights for the high likelihood region and therefore low model evidence. This situation is comparable to a case where the model contains too many parameters: a surplus of model parameters leads to a larger parameter space and therefore lower weights in the high likelihood region. Middle: a good match between prior and likelihood. Right: a model that does not have sufficient freedom to describe the data well.

What the Bayesian selection methodology does not provide, however, is a means to determine which experiment would optimally increase the separation between models. Determining which measurements to perform in order to optimally increase the Bayes factor in favor of the correct model is a difficult task. We propose a method that ranks combinations of new experiments according to their efficacy at increasing the Bayes factor in favor of the correct model. Predictions whose distributions do not overlap between competing models are good measurement candidates [42, 43]. Often, however, the distributions for a single prediction show a large degree of overlap, hampering a decisive outcome. Fortunately, PPDs also contain information on how model predictions are related to each other. The relations between the different prediction uncertainties depend on both the data and the model. Differences in these inter-prediction relations between competing models can be probed and used (see Figure 3). We quantify these differences in predictive distributions by means of the Jensen Shannon divergence (JSD) [44].

Figure 3

Different models can imply different inter-prediction relations. An illustrative example of how different models can imply different relations between predictions. The top right panel shows the 67% (dashed) and 95% (solid) probability contours of the joint probability density functions of two predictions obtained with models M 1 and M 2, while the other two panels show the marginal distribution of each individual prediction. Note how measuring only one of the two predictions would yield no additional discriminatory power, while measuring both predictions would.

There are many design parameters that one could optimize. In this paper, we focus on a simple example: which system variable should be measured and at which time point. We argue that by measuring those time points at which the models show the largest difference in their predictive distributions, large improvements in the Bayes factors can be obtained. By applying the methodology to an analytically tractable model, we show that the JSD is nearly monotonically related to the predicted change in Bayes factor. Subsequently, the Jensen Shannon divergence is computed between predictions of a non-linear biochemical network. Since each model implies different relations between the predictive distributions, certain combinations of predictions lead to more discriminability than others. When compared to the Bayes factors obtained after the measurements have been performed, the method serves as a good predictor of effective experiments. The approach can also be used to design multiple experiments simultaneously, revealing benefits that arise from combinations of experiments.

Methods

Consider biochemical networks that can be modeled using a system of ordinary differential equations. These models comprise equations f(x(t), u(t), p), which contain parameters p (constant in time), inputs u(t) and state variables x(t). Given a set of parameters, inputs, and initial conditions x(0), these equations can be simulated. Measurements y(t) = g(x(t), q, r) are performed on a subset and/or a combination of the total number of state variables in the model. Measurements are hampered by measurement noise ε, while many techniques used in biology necessitate the use of scaling and offset parameters r [45]. The vector θ, defined as θ = {p, r, x_0}, lists all quantities required to simulate the model. The parameters q determine the experimental design and could include differences in when the various responses are measured or the mapping from hidden model states x to observed responses y. We will refer to these as 'design parameters'; they are under experimental control and are, crucially, distinct from the model parameters θ. In what follows, we try to optimize the discriminability of models in terms of Bayesian model comparison by optimizing an objective function with respect to q. In the examples, we will consider q as the timing of extra observations.
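As an illustration of this formulation, the following minimal MATLAB sketch simulates a hypothetical two-state network with one input and maps the hidden states to an observation through scaling and offset parameters. The rate constants, observation parameters, and function name are illustrative assumptions and not one of the models used later in this paper.

```matlab
% Minimal sketch (hypothetical two-state model, not one of the networks used
% in this study): states x evolve according to dx/dt = f(x, u, p) and are
% observed through a scaled and offset readout g(x; r).
function ode_observation_sketch()
    p  = [0.5; 0.2; 0.1];   % kinetic parameters (illustrative values)
    r  = [2.0; 0.05];       % scaling and offset parameters of the readout
    x0 = [1.0; 0.0];        % initial conditions x(0)
    u  = 1.0;               % constant input (e.g. stimulus level)

    f = @(t, x) [ -p(1)*u*x(1) + p(2)*x(2);    % state 1: converted under stimulus
                   p(1)*u*x(1) - p(3)*x(2) ];  % state 2: produced and degraded

    [t, x] = ode45(f, [0 100], x0);            % simulate the hidden states x(t)

    y = r(1)*x(:, 2) + r(2);                   % observation y(t) = g(x(t); r)
    plot(t, y); xlabel('time'); ylabel('observed y');
end
```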

To perform inference and experiment design, an error model is required. Considering R time series of lengths N_1, N_2, …, N_R, each hampered by independent noise, one obtains:

p(\mathbf{y}^D \mid \boldsymbol{\theta}, M_i) = \prod_{k=1}^{R} \prod_{j=1}^{N_k} p\left( y_k^D(t_j) \mid \boldsymbol{\theta}, M_i \right)
(1)

where M_i indicates a model and y^D the observed data. The parameters are given by θ, while y_k^D(t_j) denotes the data point of observable k at time point t_j.
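For a concrete instance of Eq. (1), the sketch below evaluates the log-likelihood under an independent Gaussian error model with a common noise standard deviation. The Gaussian choice and the function name are assumptions made for illustration; other error models follow the same template.

```matlab
% Sketch of Eq. (1) assuming independent Gaussian noise with standard
% deviation sigma. ydata and ysim are cell arrays with one vector per
% observable (R entries), each holding the N_k measured/simulated values.
function logL = loglikelihood_sketch(ydata, ysim, sigma)
    logL = 0;
    for k = 1:numel(ydata)                          % product over the R time series
        res  = ydata{k}(:) - ysim{k}(:);            % residuals at the measured time points
        Nk   = numel(res);
        logL = logL - 0.5*sum((res./sigma).^2) ...  % sum over the N_k time points
                    - Nk*log(sigma*sqrt(2*pi));
    end
end
```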

Predictive Distributions

Posterior Predictive Distributions (PPDs) are defined as distributions of new observations conditioned on the observed data. They correspond to the predicted distribution of future experiments, given the current model assumptions and uncertainty. To obtain a sample from these predictive distributions, we propagate the uncertainty from the data to the predictions. By specifying prior distributions on the parameters and applying Bayes rule, it is possible to define a posterior distribution over the parameters. The posterior parameter probability distribution p(θ | y^D) is obtained by normalizing the product of the likelihood and the prior to unit area:

p(\boldsymbol{\theta} \mid \mathbf{y}^D) = \frac{p(\mathbf{y}^D \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{p(\mathbf{y}^D)} = \frac{p(\mathbf{y}^D \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})}{\int p(\mathbf{y}^D \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})\, d\boldsymbol{\theta}}
(2)

where p(y^D | θ) is the distribution of the observed data given parameters θ. The parameter prior distributions p(θ) typically reflect the prior uncertainty associated with the parameters. To sample from the posterior parameter distribution, one needs to verify that the posterior distribution is proper. This can be checked by profiling the different parameters and confirming that the likelihood times the prior does not flatten out [28, 46]. Once the posterior distribution of the parameters has been confirmed to be proper, a sample from this distribution can be obtained using Markov Chain Monte Carlo (MCMC) [22, 28]. MCMC can generate samples from probability distributions whose probability densities are known up to a normalizing factor [47] (see Additional file 1: section S1). A sample of the posterior parameter distribution reflects the uncertainty associated with the parameter values and can subsequently be used to simulate different predictions. The predictive distribution can then be sampled by simulating the model for each of the samples in the posterior parameter distribution and adding noise generated by the associated error model. The latter is required because future observations will also be affected by noise.
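The sketch below illustrates this two-step procedure with a basic random-walk Metropolis sampler followed by simulation of a noisy prediction for every retained parameter draw. It is a simplified stand-in for the population MCMC used later in the paper; logpost, simulate_prediction, the proposal scale, and the Gaussian observation noise are placeholders or assumptions.

```matlab
% Sketch: sample p(theta | yD) with random-walk Metropolis and turn the
% parameter sample into a posterior predictive sample. 'logpost' must return
% log(likelihood x prior) and 'simulate_prediction' the model prediction for
% a given parameter vector; both are model-specific placeholders.
function [thetas, ppd] = sample_ppd_sketch(logpost, simulate_prediction, theta0, nsteps, sigma_noise)
    d      = numel(theta0);
    thetas = zeros(nsteps, d);
    theta  = theta0(:)';  lp = logpost(theta);
    step   = 0.1;                              % proposal scale (tune for reasonable acceptance)
    for i = 1:nsteps
        cand = theta + step*randn(1, d);       % symmetric Gaussian proposal
        lpc  = logpost(cand);
        if log(rand) < lpc - lp                % Metropolis acceptance rule
            theta = cand;  lp = lpc;
        end
        thetas(i, :) = theta;
    end
    ppd = zeros(nsteps, 1);                    % posterior predictive sample:
    for i = 1:nsteps                           % simulate each parameter draw and add
        ppd(i) = simulate_prediction(thetas(i, :)) + sigma_noise*randn;
    end                                        % observation noise (future data are noisy too)
end
```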

Model selection

In a Bayesian setting, model selection is often performed using the Bayes factor [40, 48, 49]. This pivotal quantity in Bayesian model selection expresses the change of relative belief in both models after observing experimental data. By applying Bayes rule to the problem of assigning model probabilities, one obtains:

p(M \mid \mathbf{y}^D) = \frac{p(\mathbf{y}^D \mid M)\, p(M)}{p(\mathbf{y}^D)}
(3)

where p(M | y^D) represents the probability of model M given observed data y^D, while p(M) is the prior probability of the model and p(y^D) the marginal probability of the data. Rather than explicitly computing the model probability, one usually considers ratios of model probabilities, allowing direct comparison between different models. As the prior model probability can be specified a priori (equal if no preference is given), the only quantity that still requires evaluation is p(y^D | M), which can be obtained by integrating the likelihood function over the parameters:

p(\mathbf{y}^D \mid M) = \int p(\mathbf{y}^D \mid M, \boldsymbol{\theta}_M)\, p(\boldsymbol{\theta}_M \mid M)\, d\boldsymbol{\theta}_M .
(4)

The Bayes factor is the ratio of these integrated (also called marginal or marginalized) likelihoods and is defined as:

B_{12} = \frac{p(\mathbf{y}^D \mid M_1)}{p(\mathbf{y}^D \mid M_2)} = \frac{\int p(\mathbf{y}^D \mid M_1, \boldsymbol{\theta}_{M_1})\, p(\boldsymbol{\theta}_{M_1} \mid M_1)\, d\boldsymbol{\theta}_{M_1}}{\int p(\mathbf{y}^D \mid M_2, \boldsymbol{\theta}_{M_2})\, p(\boldsymbol{\theta}_{M_2} \mid M_2)\, d\boldsymbol{\theta}_{M_2}}
(5)

where M_1 and M_2 refer to the different models under consideration. Unnecessarily complex models are implicitly penalized because they result in a lower weighting of the high likelihood region and therefore a lower value of the integrated likelihood. This is illustrated in Figure 2. Maximizing the model evidence thus corresponds to maintaining an accurate explanation of the data while minimizing complexity [50].

Bounds can be defined where the Bayes factor value becomes decisive for one model over the other. Typically, a ratio of 100:1 is considered decisive in terms of model selection [40, 51]. In dynamic causal modelling, variational methods predominate, usually under the Laplace assumption. This assumes that the posterior density has a Gaussian shape, which greatly simplifies both the integration problems and the numerics. Note that assuming a Gaussian posterior over the parameters does not necessarily mean that the posterior predictive distribution over the data is Gaussian (see Figure 1). Computing the required marginal likelihoods is challenging for non-linear problems where such asymptotic approximations to the posterior distribution are not appropriate. Here one must resort to more advanced methods such as thermodynamic integration (see Additional file 1: section S2) [52] or annealed importance sampling [40]. Though the Bayes factor is a useful tool for model selection, determining what to measure in order to improve the Bayes factor in favor of the correct model is a non-trivial problem. In other words, it provides a means to perform model selection, but not to optimally select which data to acquire.

Experiment design

The approach is based on selecting measurements that provide the largest discriminatory power between competing models in terms of their predictive distributions. The design parameter in the proposed methodology corresponds to the choice and timing of a new observation. In other words, we want to determine which observable should be measured, and when, in order to maximize the divergence between the posterior predictive densities, thereby maximally informing our model comparison. This divergence is quantified by means of the Jensen Shannon divergence (JSD), which provides a measure of the dissimilarity between the probability density functions of competing models. The JSD is defined as the weighted average Kullback Leibler divergence D_KL between the individual probability distributions and their mixture:

D_{JS} = \sum_{i=1}^{K} p(M_i)\, D_{KL}\!\left( p(\mathbf{y} \mid M_i),\; \sum_{i=1}^{K} p(M_i)\, p(\mathbf{y} \mid M_i) \right) .
(6)

Here K represents the number of probability densities, p(M i ) the (prior) probability of model M i and p(y| M i ) the Posterior Predictive Distribution. Additionally, this metric is monotonically related to an upper and lower bound of the classification error rate in clustering problems [33, 53] and is bounded between 0 and 1. In the case where the model that generated the data is in the set of competing models, it is analogous to the mutual information between a new measurement (or sample) coming from a mixture of the candidate models and a model classifier (see Additional file 1: section S3). In this study, we opted for comparing models in a pairwise fashion (K=2). This allows us to determine which models are distinguished by a new experiment. Mutual information has been considered before in the context of experimental design for constraining predictions or parameters of interest [29], but not in the setting of model selection. Though appealing for its properties, estimating the Jensen Shannon divergence for one or more experiments requires integration over the predictive densities, since:

D_{KL}(P, Q) = \int p(x) \log_2 \frac{p(x)}{q(x)}\, dx .
(7)

Here P and Q are random variables with p and q their associated densities. Since only a sample of the PPDs is available, a density estimate suitable for integration is required. Density estimation can be approached in two ways: by Kernel Density Estimation (KDE) or by k-Nearest Neighbor (kNN) density estimation. In KDE, an estimate of the density is made by centering normalized kernels on each sample and computing weighted averages. This results in a density estimate with which computations can be performed. The kernels typically contain a bandwidth parameter, which is estimated by means of cross validation [54, 55].

For well-behaved, low-dimensional distributions, KDE often performs well. Considering the strongly non-linear nature of both the parameter and predictive distributions, however, a Gaussian kernel with constant covariance is not appropriate. As the dimensionality and non-uniformity of the problem increase, more and more weights in the KDE become small and estimation accuracy is negatively affected [56]. Additionally, choosing an appropriate bandwidth by means of cross-validation is a computationally expensive procedure to perform for each experimental candidate.

With k-Nearest Neighbor (kNN) density estimation, the density is estimated from the volume required to include the k nearest neighbors of the current sample [55–57]:

p(\boldsymbol{\theta}) = \frac{1}{N} \frac{k}{\rho_k(\boldsymbol{\theta})^d\, v_d}
(8)

In this equation, ρ_k(θ) represents the distance to the k-th nearest neighbor, d the number of dimensions and v_d the volume of the unit ball in R^d. Furthermore, N denotes the number of included samples and v_d is given by:

v_d = \frac{\pi^{d/2}}{\Gamma(d/2 + 1)}
(9)

where Γ corresponds to the Gamma function. The advantage of using the kNN estimate is that this estimator adapts to the local sampling density, adjusting its volume where sampling is sparse. Note, however, that, similar to the KDE, this estimator also suffers from a loss of accuracy when estimating high dimensional densities. Fortunately, the number of experiments designed simultaneously, and therefore the dimensionality of the density, is typically low. Consider y_j^{M_i}, a vector of predictions simulated with model M_i and parameter set θ_j, where each element of the vector corresponds to a different model prediction. A model prediction is defined as a quantity which can be computed by supplying model M_i with parameter set θ_j (e.g., a predicted value at a certain time point, a difference between predictions or an area under some predicted curve). For OED purposes, these should be quantities that could potentially be measured. The set of these predicted values coming from model M_i shall be referred to as Ω_{M_i}. Inserting these quantities, the kNN estimate of the JSD becomes:

D_{JS} = \frac{1}{2 N_{M_1}} \sum_{i=1}^{N_{M_1}} \log_2 Q_{1,2}(i) + \frac{1}{2 N_{M_2}} \sum_{i=1}^{N_{M_2}} \log_2 Q_{2,1}(i)
(10)

with Q_{a,b} given by

Q_{a,b}(i) = \frac{2\, N_{M_b}\, r_k\!\left(\mathbf{y}_i^{M_a}, \Omega_{M_b}\right)^d}{N_{M_b}\, r_k\!\left(\mathbf{y}_i^{M_a}, \Omega_{M_b}\right)^d + \left(N_{M_a} - 1\right) r_k\!\left(\mathbf{y}_i^{M_a}, \Omega_{M_a} \setminus \mathbf{y}_i^{M_a}\right)^d}
(11)

Here d denotes the number of elements in y_j^{M_i} (the number of predictions included), and r_k(x_i, Ω_{M_j}) corresponds to the Euclidean distance from x_i to its k-th nearest neighbor in Ω_{M_j}. The backslash indicates exclusion of an element from the set. Using this equation, the JSD can be computed straightforwardly for all candidate combinations of experiments, and the experiments can be ranked according to how well they would discriminate between the models. A larger value of the JSD indicates a more informative experiment. The last step involves sampling several combinations of measurements and determining the set of experiments with the greatest JSD. The proposed methodology is depicted in Figure 4.
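A direct transcription of Eqs. (10) and (11) is shown below. It uses a brute-force nearest-neighbor search for clarity rather than the k-d trees mentioned later; the function names are ours, and Y1 and Y2 are assumed to hold the prediction samples Ω_{M_1} and Ω_{M_2} row-wise.

```matlab
% Sketch of the kNN estimate of the JSD (Eqs. 10-11). Y1 (N1 x d) and
% Y2 (N2 x d) contain posterior predictive draws as rows; the d columns are
% the experiments considered jointly.
function Djs = jsd_knn_sketch(Y1, Y2, k)
    [N1, d] = size(Y1);  N2 = size(Y2, 1);

    r11 = knn_dist(Y1, Y1, k, true);    % within-sample distances, self excluded
    r12 = knn_dist(Y1, Y2, k, false);   % cross-sample distances
    r22 = knn_dist(Y2, Y2, k, true);
    r21 = knn_dist(Y2, Y1, k, false);

    Q12 = 2*N2*r12.^d ./ (N2*r12.^d + (N1 - 1)*r11.^d);   % Eq. (11), a = 1, b = 2
    Q21 = 2*N1*r21.^d ./ (N1*r21.^d + (N2 - 1)*r22.^d);   % Eq. (11), a = 2, b = 1
    Djs = mean(log2(Q12))/2 + mean(log2(Q21))/2;          % Eq. (10)
end

function r = knn_dist(A, B, k, excludeSelf)
    % Euclidean distance from each row of A to its k-th nearest neighbor in B
    n = size(A, 1);  r = zeros(n, 1);
    for i = 1:n
        D = sort(sqrt(sum(bsxfun(@minus, B, A(i, :)).^2, 2)));
        if excludeSelf
            r(i) = D(k + 1);   % skip the zero distance of the point to itself
        else
            r(i) = D(k);
        end
    end
end
```

Note that the unit-ball volume v_d of Eq. (9) cancels in the density ratio, which is why it does not appear in Eqs. (10) and (11).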

Figure 4

Different steps of the proposed methodology. Step 1: Different model topologies have to be formulated in a form that can be simulated. Step 2: Posterior distributions of the parameters have to be sampled. In the panel, the joint posterior probability distributions of a few model parameters are depicted, with their marginals on the diagonal. Step 3: Use a sample from the posterior parameter distribution to make predictions of quantities that can be measured (a sample of the Posterior Predictive Distribution). Step 4: Compute the JSD between the predictive distributions of the different models. Step 5: Perform the optimal experiment and compute the new Bayes factors. Optionally repeat until the hypotheses are sufficiently refined.

Testing the method: numerical experiments

To demonstrate the method, a series of simulation studies is performed. Since we know which model generated the data, the JSD can be compared to the Bayes factor in favor of the true model. After generating an initial data set using the true model, PPDs are sampled for each of the competing models. As the design variable, we consider the timing of a new measurement. Hence, we look at differences between the predictive distributions belonging to the different models at different time points. We use a sample of simulated observables at specific time points to compute JSD estimates between the different models and thereby test whether the JSD estimate can be used to compare different potential experiments. The new experimental data is subsequently included, and the JSD is compared to the change in Bayes factor in favor of the correct model. Note that this new Bayes factor depends on the experimental outcome, so this approach results in a distribution of predicted Bayes factors. A large change in Bayes factor indicates a useful experiment.

Analytic models

The method is applied to a number of linear regression models, i.e., models of the form:

y(t) = \sum_{i=1}^{L} \theta_i B_i(t) + \varepsilon
(12)

where θ represents a parameter vector and B a design matrix with basis functions B_i(t). Given that σ, the standard deviation of the Gaussian observation error ε, is known and the prior distribution over the parameters is Gaussian with standard deviation ξ, the mean and covariance matrix of the posterior distribution can be computed analytically (see Additional file 1: section S4). Using linear models avoids the difficult numerical integration commonly required to compute the Bayes factor and makes it possible to perform overarching Monte Carlo studies of how these Bayes factors change upon including new experimental data. The analytical expressions make it possible to compare the JSD to distributions of the actual Bayes factors used for model selection.
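A sketch of these closed-form expressions is given below: the Gaussian posterior over θ and the log marginal likelihood of a linear model y = Bθ + ε with known noise standard deviation σ and prior standard deviation ξ. The function name is ours; the formulas are the standard linear-Gaussian results referred to above.

```matlab
% Sketch: analytic posterior and log marginal likelihood for y = B*theta + eps
% with eps ~ N(0, sigma^2 I) and a zero-mean Gaussian prior theta ~ N(0, xi^2 I).
% B is the D x L design matrix (rows = time points, columns = basis functions).
function [mu_post, S_post, logml] = linear_evidence_sketch(B, y, sigma, xi)
    [D, L] = size(B);

    % Gaussian posterior over the parameters
    S_post  = inv(B'*B/sigma^2 + eye(L)/xi^2);   % posterior covariance
    mu_post = S_post*(B'*y)/sigma^2;             % posterior mean

    % log marginal likelihood: y ~ N(0, xi^2*B*B' + sigma^2*I)
    C     = xi^2*(B*B') + sigma^2*eye(D);
    Lc    = chol(C, 'lower');                    % Cholesky for a stable log-determinant
    alpha = Lc \ y;
    logml = -0.5*(alpha'*alpha) - sum(log(diag(Lc))) - 0.5*D*log(2*pi);
end
```

The log10 Bayes factor of Eq. (5) between two such models is then simply the difference of their log marginal likelihoods divided by log(10).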

Non-linear biochemical networks

To further test the methodology, a series of artificial models based on motifs often observed in signaling systems [58, 59] was implemented (see Additional file 1: section S5 for the model equations). Artificial data was simulated for M_1. Subsequently, inference was performed for all four competing topologies. The models differed in the origin and point of action of the feedback mechanism (see Figure 5).

Figure 5

Non-linear biochemical network hypotheses. Models used to test the method. Here p refers to a phosphorylated species. The dashed lines indicate the different hypotheses regarding the negative feedback mechanisms in each of the models. Here feedback 1 corresponds to the true data generating model. Data of Bp and Dp was used for inference.

Each of the artificial models was able to describe the measured data to an acceptable degree. We used a Gamma distribution with α = 1 and β = 3 for the prior distributions of the parameters. This prior is relatively non-informative (allowing a large range of parameter values), while not being so vague that the simplest model is always preferred (Lindley's paradox [40]). Data was obtained using M_1. The observables were Bp, of which three replicates were measured, and Dp, of which two replicates were measured. These were measured at t = [0, 2, 5, 10, 20, 40, 60, 100]. All replicates were simulated by adding Gaussian white noise with a standard deviation of 0.03. The parameter values corresponding to the true system were obtained by running Monte Carlo simulations until a visible overshoot above the noise level was observed. Parameter inference was performed using population MCMC with the noise σ as an inferred parameter. As design variables, we consider both the choice of which observable(s) to measure and the time point(s) of the measurement.
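The data-generation step described above can be summarized by the following sketch. Here simulate_model and theta_true are placeholders for the ODE simulation of M_1 (returning the observables Bp and Dp at the requested time points) and the chosen true parameter values; they are not defined in this snippet.

```matlab
% Sketch of the artificial data setup: simulate the true model M1 at the
% stated time points and add Gaussian noise with standard deviation 0.03.
% 'simulate_model' and 'theta_true' are placeholders for the model-specific
% simulator and the chosen true parameter values.
tmeas = [0 2 5 10 20 40 60 100];
sigma = 0.03;
[Bp, Dp] = simulate_model(theta_true, tmeas);                  % hypothetical simulator for M1
dataBp = repmat(Bp(:), 1, 3) + sigma*randn(numel(tmeas), 3);   % three replicates of Bp
dataDp = repmat(Dp(:), 1, 2) + sigma*randn(numel(tmeas), 2);   % two replicates of Dp
```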

Computational implementation

All algorithms were implemented in MATLAB (MathWorks, Natick, MA). Numerical integration of the differential equations was performed with compiled MEX files using numerical integrators from the SUNDIALS CVode package (Lawrence Livermore National Laboratory, Livermore, CA). Absolute and relative tolerances were set to 10^−8 and 10^−9, respectively. MCMC was performed using a population MCMC approach with N_T = 40 chains and a temperature schedule given by T_n = (n/N_T)^4 [52]. This also permitted computation of the Bayes factors between the non-linear models by means of thermodynamic integration. The Gaussian proposal distribution for the MCMC was based on an approximation to the Hessian computed using a Jacobian obtained by simulating the sensitivity equations. After convergence, the chain was thinned to 10000 samples. Since the number of experiments designed simultaneously (and therefore the dimensionality of the prediction vectors) was reasonably small (N_samples ≫ 2^d), the kNN search was performed using k-d trees [60]. The figures in this paper were generated using k = 10.
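For reference, the sketch below shows how a thermodynamic integration estimate of log p(y^D | M) can be assembled from such a population MCMC run under the stated temperature schedule. It assumes meanLogLik contains the average log-likelihood of the samples in each tempered chain, ordered by temperature; the function name is ours.

```matlab
% Sketch: thermodynamic integration estimate of the log marginal likelihood
% from a population MCMC run with NT tempered chains. 'meanLogLik' holds the
% per-chain average log-likelihood, ordered from the coldest chain (smallest T)
% to the posterior chain (T = 1).
function logZ = thermo_integration_sketch(meanLogLik, NT)
    t    = ((1:NT)/NT).^4;              % temperature schedule T_n = (n/N_T)^4
    logZ = trapz(t, meanLogLik(:)');    % trapezoidal rule over the power-posterior identity
end
```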

Results and discussion

Analytic models

A series of experiments was performed using linear regression models. To demonstrate the method, consider the following four competing models, where M_3 is used to generate the data:

y_{M_1} = \theta_1 t
(13)
y_{M_2} = \theta_1 t + \theta_2 t^2
(14)
y_{M_3} = \theta_1 t + \theta_2 t^2 + \theta_3 \sin\!\left(\tfrac{1}{5} t^3\right)
(15)
y_{M_4} = \theta_1 t + \theta_2 t^2 + \theta_3 \sin\!\left(\tfrac{1}{5} t^3\right) + \theta_4 \sin(2t)\, t
(16)

The presence of sine waves in M_3 and M_4 elicits particularly noticeable patterns in the optimal experiment design matrices. D equidistantly sampled time points were generated as data (including additive Gaussian noise with standard deviation σ). To make sure that the model selection was unsuccessful a priori, these were sampled in a region where the models roughly predict the same behavior. Initially, the Bayes factors were log10(B_31) = 2.0439 (decisive), log10(B_32) = −0.0554 (pointing to the wrong model) and log10(B_34) = 0.4658 ('not worth more than a bare mention' [51]). PPDs were generated for each of the models and used to compute credible predictive intervals that enclose 95% of the predictive density. Bayes factors were computed for each of the models. Since the aim of the design is to successfully select between the models after performing new experiments, the change in Bayes factor in favor of the true underlying model was computed. Because the experimental outcome is not known a priori, a distribution of Bayes factors is predicted:

\Delta(B_{ab}) := E\!\left[ \log_{10} \frac{p(\mathbf{y}^D, \mathbf{y}_n^D \mid M_a)}{p(\mathbf{y}^D, \mathbf{y}_n^D \mid M_b)} - \log_{10} \frac{p(\mathbf{y}^D \mid M_a)}{p(\mathbf{y}^D \mid M_b)} \right] .
(17)

The expectation is taken with respect to new realizations of the data y_n^D. Note that such an overarching estimation would be computationally intractable for a non-linear model. New experiments can be simulated in two ways: either by using the correct model with the true parameter values and adding measurement noise, or by taking samples from the posterior predictive distribution of the correct model. In practice, the 'true' parameter values are not known, and the predictive distribution of the measurement based on the posterior samples provides the best obtainable estimate given the current parameter uncertainty. The change in Bayes factor (in favor of the correct model) was compared to the Jensen Shannon divergence between the competing models. Large predicted changes indicate that the experiment would result in a successful selection. As for the JSD, a large value indicates a large distance between the joint predictive distributions, marking the measurement as useful. See Figure 6 for an example of the analysis results. As shown in the different panels, different models parameterized on the same data result in different posterior predictive distributions (dotted lines in the top row). When comparing models 1 and 3 (the true model), we can see that differences in predictions occur mostly beyond the time range previously measured. Whereas model 1 predicts a straight line, the true underlying system deviates from a single line. Consider two new measurements. From the differences in PPDs, it is clear that measuring beyond the region where data is available would lead to a decisive choice against model 1, as corroborated by the large Bayes factor updates shown in the left plot on the middle row (B_13). It can also be seen that the PPDs differ more for negative time than for positive time; therefore, the region that is decisive is larger for negative time. The JSDs follow this same pattern. The PPD for model 2 differs less from the PPD the true model would have generated. For the simulations coming from model 2, the values for positive and negative time are correlated, whereas for model 3 these prediction values are negatively correlated. Consequently, performing one measurement at negative time and one at positive time would lead to the largest discriminatory power.
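For the linear regression case, the predicted change in Bayes factor of Eq. (17) can be estimated by Monte Carlo as sketched below, reusing the linear_evidence_sketch helper from the Methods section. Here basisA, basisB and ppd_draw are hypothetical model-specific functions (design-matrix constructors for models a and b, and a sampler of new noisy observations at the candidate time points).

```matlab
% Sketch of Eq. (17): average the change in log10 Bayes factor over candidate
% observations drawn from the PPD of the reference model. y and t hold the
% current data (column vectors), t_new the candidate measurement time(s).
function dB = delta_bayes_sketch(basisA, basisB, ppd_draw, y, t, t_new, sigma, xi, nrep)
    [~, ~, logA0] = linear_evidence_sketch(basisA(t), y, sigma, xi);  % current evidence, model a
    [~, ~, logB0] = linear_evidence_sketch(basisB(t), y, sigma, xi);  % current evidence, model b
    dB = 0;
    for r = 1:nrep
        ynew = ppd_draw(t_new);                        % simulated new measurement(s)
        yaug = [y; ynew(:)];  taug = [t; t_new(:)];
        [~, ~, logA1] = linear_evidence_sketch(basisA(taug), yaug, sigma, xi);
        [~, ~, logB1] = linear_evidence_sketch(basisB(taug), yaug, sigma, xi);
        dB = dB + ((logA1 - logB1) - (logA0 - logB0))/(nrep*log(10));  % average log10 change
    end
end
```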

Figure 6

Comparing the Jensen Shannon divergence to the Bayes factor updates for regression models. Top row: the solid gray line indicates the 'true' system response for the true parameters, which is the same in each graph. The solid black line indicates the data generated by model three, which was used to determine the posterior predictive distributions of the four models. Dashed lines indicate the inferred credible intervals of the Posterior Predictive Distribution for the model corresponding to that column. Second row: mean of the predicted change in Bayes factor in favor of model three after incorporating two additional data points from the posterior predictive distribution of the true model (average of 100 repetitions). The coordinate along each axis reflects the time point t at which the measurement is performed. Bottom row: Jensen Shannon divergence between the Posterior Predictive Distributions of M_3 and the model that the column corresponds to. Here M_3 corresponds to the model which generated the data. Note that the graphs for model three are black by definition.

The JSD agrees well with the actual Bayes factor updates, as shown in the third row of Figure 6. Interestingly, all the designs based on the JSD would effectively discriminate between the true model and its competitors, without having to specify a true model a priori. Subsequently, a Monte Carlo study was performed in which a large number of random models were generated and compared. Plotting the updated Bayes factor after a new experiment against the corresponding JSD typically reveals a monotonic relationship that underlines its usefulness as a design criterion (see Figure 7 for two typical examples). These analyses were performed for a large number of randomly generated linear models. The Spearman correlation coefficient between the JSD and the expected Bayes factor averaged 0.91 over the experiments we performed (see Additional file 1: section S6).

Figure 7

Comparing the Jensen Shannon Divergence to the updated Bayes factor. Inference was performed based on data generated from the true model. Bayes factor updates pointing to the correct model were calculated given a new measurement which was based on the inferred posterior predictive distribution of the true model. Top row: Predictive distribution of updated Bayes factor in favor of the correct model with associated 67% credible intervals (dashed) for two models. Second row: Jensen Shannon divergence between the relevant Posterior Predictive Distributions. Bottom row: Relation between the updated Bayes factors and the JSD. Note that the relation is nearly monotonic indicating its usefulness as a predictor of experimental efficacy.

Nonlinear models

In Figure 8, we show the different predictions after performing model inference. Two sets of PPDs were simulated for two experimental conditions. These sets mimic two different concentrations of a signaling molecule and were implemented by setting the stimulus u to either 1 or 2. To test the effect of measuring multiple predictions, divergence estimates were computed for a large number of different combinations of two measurements. These results are shown in Figure 9. Each subplot corresponds to a different model comparison. The axis of each subplot is divided into ten sections corresponding to different predictions. Within each section, the axis represents time. The color value indicates the JSD, where a large value indicates strong separation and therefore a good measurement. Note the bright squares corresponding to the concentration of BpCp in each of the models. These high efficacies are not surprising considering that the PPDs show large differences between the models for these concentrations (see Figure 8). Also noticeable is that many of the experiments on the same predictions reveal dark diagonals within each tile. Measuring the same concentration twice typically adds fewer predictive constraints than measuring at two different time points, unless the second measurement is performed using a different concentration of signaling molecule (note how the diagonal lights up for the combination of measuring BpCp in conditions 1 and 2 when selecting between models 3 and 4). Interestingly, measuring certain states during the overshoot is highly effective (Bp and Dp for any comparison), while the overshoot is less informative for others (Bp under stimulus 1 and 2 for discriminating between M_1 and M_2). All in all, the information contained in such a matrix is very valuable when it comes to selecting from a small list of experiments. For example, we can also see that, considering the current predictive distributions, models 2 and 4 can barely be distinguished. This implies that in order to actually distinguish between these two, a different experiment is required. Such a new experiment could, again, be evaluated by generating a new competing set of PPDs.

Figure 8

Posterior predictive distributions of the non-linear ODE models. The first five predictions in the left panels correspond to the same experimental condition as during the original inference (stimulus 1) while the second five predictions correspond to a different stimulus (stimulus 2). Note that the differences between the different distributions are barely visible. Data is denoted in circles.

Figure 9

Jensen Shannon divergences for each model comparison in the nonlinear case. Each axis represents a single measurement. Each tile corresponds to a combination of two state variables where the space within each tile corresponds to the actual time point at which the state variable is measured. The first five predictions correspond to the same experimental condition as during the original inference (stimulus 1) while the second five predictions correspond to a different stimulus (stimulus 2). The circles denote the different experiments that were performed.

To test the results, in silico experiments were performed by simulating new data from the true model and determining the change in Bayes factor upon including this data. Bayes factors were estimated using thermodynamic integration (see Additional file 1: section S2). The calculation of each set of four marginal likelihoods took about 6 days of wall-clock time on an Intel i7 CPU (2.93 GHz) with MATLAB R2010a. To validate the method, experiments were selected where differences between the models are expected. The following experiments were performed:

  • 1. Steady state Cp and BpCp concentration

  • 2. Bp and Dp during the peak in the second condition (u=2)

  • 3. Steady state Cp

Experiment 1 should differentiate between M_1 and M_3 (D_13 ≈ 0.49), but not between M_1 and M_2 or M_1 and M_4. Experiment 2 should give discriminatory power for all models (D_12 ≈ 0.48, D_13 ≈ 0.54, D_14 ≈ 0.61). Experiment 3 should not provide any additional discriminatory power at all. The results of these analyses are shown in Table 1. As predicted, experiment 1 leads only to an increase in discriminatory power between models M_1 and M_3. Experiment 2 improves the discriminatory power between all the models, while experiment 3 even reveals a decrease in discriminatory power between models 1 and 2. Also noteworthy is the large variance observed for experiment 3, which is likely related to the large variance in the steady state predictions of Cp. Again, the predictions based on the JSD are well in line with the Bayes factors obtained.

Table 1

Conclusions

This paper describes a method for performing experiment design with the aim of differentiating between competing hypotheses. We show by means of a simulation study on analytically tractable models that the JSD is approximately monotonically related to the expected change in Bayes factor in favor of the model that generated the data (considering the current uncertainty in its parameters). This monotonic relation is useful because it implies that the JSD can be used as a predictor of the change in Bayes factor. Applicability to non-linear models of biochemical reaction networks was demonstrated using models based on motifs previously observed in signaling networks [58, 59], for which experiments were designed to distinguish between different feedback mechanisms.

Though forecasting a predictive distribution of Bayes factors has been suggested [61], the implicit penalization of model complexity could have adverse consequences. The experiment design could suggest a measurement where the probability densities of two models overlap. When this happens, both models can describe the data equally well, which leads to an implicit penalization of the more complex model (since it allows for more varied predictions due to its added freedom). This penalization can then be followed by subsequent selection of the simpler model. Though a decisive selection occurs, such an experiment would not provide additional insight. In [61], this is mitigated by determining the evidence in favor of a more complex model. Moreover, computing the predictive distributions of Bayes factors required for this approach is computationally intractable for non-linear models that are not nested. By focusing on differences in predictive distributions, both of these problems are mitigated, making it possible to pinpoint where the different models predict different behavior. Aside from their usefulness in model selection, such predictive differences can also be attributed to the different mechanisms present in the different models. This allows follow-up studies to investigate whether these differences reflect modeling artifacts or true system behavior.

A complicating factor in this method is the computational burden. The largest challenge is obtaining a sample from the posterior parameter distribution, as running MCMC on high dimensional problems can be difficult. Fortunately, recent advances in both MCMC [19, 62] and approximate sampling techniques [39, 48, 63, 64] allow sampling the parameter distributions of increasingly complex models [14, 34–38]. The bottleneck in computing the JSD resides in searching for the k-th nearest neighbor, a subproblem that occurs in many settings and for which computationally faster solutions exist [65, 66]. An attractive aspect of this methodology is that it is possible to design multiple experiments at once. However, the density estimates typically become less accurate as the number of designed experiments increases (see Additional file 1: section S8). Therefore, we recommend starting with a low number of experiments (two or three) and gradually adding experiments while the JSD remains low. Density estimation can also be problematic when the predictions vary greatly in their dispersion. When considering non-negative quantities such as concentrations, log-transforming the predictions may alleviate such problems. Finally, the number of potential combinations of experiments increases exponentially with the number of experiments designed, which rapidly becomes infeasible for large numbers of experiments. However, it is not necessary to fill the entire experimental matrix, and techniques such as Sequential Monte Carlo sampling could be considered as an alternative to probe this space more effectively. We refer the reader to Additional file 1: section S7 for a proof-of-principle implementation of such a sampler.

One additional point of debate is the weighting of each of the models in the mixture distribution used to compute the JSD. It could be argued that it would be more sensible to weight models according to their model probabilities, determined from the integrated likelihoods of the data that is already available. The reason for not doing so is twofold. Firstly, the computational burden this adds to the experimental design procedure is significant. More importantly, however, the implicit weighting in favor of parsimony could strongly affect the design by removing models that are considered unnecessarily complex at this stage of the analysis. When designing new experiments, the aim is to obtain measurements that allow for optimal discrimination between the predictive distributions under the different hypotheses. This makes it sensible to consider the models equally probable a priori.

The method has several advantages that are particularly useful for modeling biochemical networks. Because the method is based on sampling from the posterior parameter probability distribution, it is particularly suitable when insufficient data is available to justify Gaussian parameter probability distributions or model linearizations. Additionally, it allows incorporation of prior knowledge in the form of prior parameter probability distributions. This is useful when the available data contains insufficient constraints to result in a well-defined posterior parameter distribution. Because the design criterion is based on predictive distributions, and such distributions can be computed for a wide range of model quantities, the approach is very flexible. In biochemical research, in vivo measurements are often difficult to perform, and practical limitations of the various measurement technologies play an important role. In many cases, measurements on separate components cannot be performed and measurements yield indirect, derived quantities. Fortunately, in the current framework such measurements can be used directly, since distributions of such experiments can be predicted.

Moreover, the impact of specific combinations of experiments can be assessed by including them in the design simultaneously, which reveals specific combinations of measurements that are particularly useful. This way, informative experiments can be distinguished from non-informative ones, and the experimental efforts can be targeted to discriminate between competing hypotheses.

Availability

Source code is available at: http://cbio.bmt.tue.nl/sysbio/software/pua.html.