Regularization of non-homogeneous dynamic Bayesian networks with global information-coupling based on hierarchical Bayesian models
Abstract
To relax the homogeneity assumption of classical dynamic Bayesian networks (DBNs), various recent studies have combined DBNs with multiple changepoint processes. The underlying assumption is that the parameters associated with time series segments delimited by multiple changepoints are a priori independent. Under weak regularity conditions, the parameters can be integrated out in the likelihood, leading to a closed-form expression of the marginal likelihood. However, the assumption of prior independence is unrealistic in many real-world applications, where the segment-specific regulatory relationships among the interdependent quantities tend to undergo gradual evolutionary adaptations. We therefore propose a Bayesian coupling scheme to introduce systematic information sharing among the segment-specific interaction parameters. We investigate the effect this model improvement has on the network reconstruction accuracy in a reverse engineering context, where the objective is to learn the structure of a gene regulatory network from temporal gene expression profiles. The objective of the present paper is to expand and improve an earlier conference paper in six important aspects. Firstly, we offer a more comprehensive and self-contained exposition of the methodology. Secondly, we extend the model by introducing an extra layer to the model hierarchy, which allows for information-sharing among the network nodes, and we compare various coupling schemes for the noise variance hyperparameters. Thirdly, we introduce a novel collapsed Gibbs sampling step, which replaces a less efficient uncollapsed Gibbs sampling step of the original MCMC algorithm. Fourthly, we show how collapsing and blocking techniques can be used for developing a novel advanced MCMC algorithm with significantly improved convergence and mixing. Fifthly, we systematically investigate the influence of the (hyper-)hyperparameters of the proposed model. 
Sixthly, we empirically compare the proposed global information coupling scheme with an alternative paradigm based on sequential information sharing.
Keywords
Non-homogeneous dynamic Bayesian networks; Gene regulatory networks; Bayesian regularization; Bayesian multiple changepoint processes; Reversible jump Markov chain Monte Carlo
1 Introduction
There is considerable interest in structure learning of dynamic Bayesian networks (DBNs), with a variety of applications in computational systems biology. However, the standard assumption underlying DBNs—that time-series have been generated from a homogeneous Markov process—is too restrictive in many applications and can potentially lead to artifacts and erroneous conclusions. While there have been various efforts to relax the homogeneity assumption for undirected graphical models (Talih and Hengartner 2005; Xuan and Murphy 2007), relaxing this restriction in DBNs is a more recent research topic (Lèbre 2007; Robinson and Hartemink 2009, 2010; Ahmed and Xing 2009; Kolar et al. 2009; Lèbre et al. 2010; Dondelinger et al. 2010, 2012; Husmeier et al. 2010; Grzegorczyk and Husmeier 2011). Various authors have proposed relaxing the homogeneity assumption by complementing the traditional homogeneous DBN with a Bayesian multiple changepoint process (Lèbre 2007; Robinson and Hartemink 2009, 2010; Lèbre et al. 2010; Dondelinger et al. 2010, 2012; Husmeier et al. 2010; Grzegorczyk and Husmeier 2011). Each time series segment defined by two demarcating changepoints is associated with separate node-specific DBN parameters, and in this way the conditional probability distributions are allowed to vary from segment to segment. An attractive feature of this approach is that under certain regularity conditions, most notably parameter independence and conjugacy of the prior, the parameters can be integrated out in closed form in the likelihood. The inference task thus reduces to sampling the network structure as well as the number and location of changepoints from the posterior distribution, which can be effected with reversible jump Markov chain Monte Carlo (RJMCMC) (Green 1995), e.g., as in Lèbre et al. (2010) or Robinson and Hartemink (2010), or with dynamic programming (Fearnhead 2006), as in Grzegorczyk and Husmeier (2011).
In many real-world applications, the assumption of parameter independence is questionable, though. Consider the cellular processes during an organism’s development (morphogenesis) or its adaptation to changing environmental conditions. The assumption of a homogeneous process with constant parameters is over-restrictive in that it fails to allow for the non-stationary nature of the processes. However, complete parameter independence is over-flexible in that it ignores the evolutionary aspect of adaptation processes, where the majority of segment-specific regulatory relationships among the interdependent quantities tend to undergo minor and gradual adaptations. Given a regulatory network at some time interval in an organism’s life cycle, it is unrealistic to assume that at the adjacent time intervals, nature has reinvented different regulatory circuits from scratch. Instead, we would assume that the knowledge of the interaction strengths at other time intervals will improve the inference of the interaction strengths associated with the given time interval, especially for sparse data. In what follows, we will describe how this idea can be implemented in the model, and which adaptations are required for the inference scheme.
There are various articles from the signal processing community that are related to our work. Our hierarchical Bayesian model structure is similar to the one proposed in Punskaya et al. (2002). However, in Punskaya et al. (2002) information is only shared among different parameter vectors via a common scalar scale hyperparameter, which does not provide the sort of more explicit information sharing motivated by our discussion above. Like the model in Punskaya et al. (2002), our model is based on a switching piecewise homogeneous autoregressive process, whereas the models in Andrieu et al. (2003), Moulines et al. (2005), and Wang et al. (2011) are based on continuously time-varying autoregressive processes. Like our paper, Moulines et al. (2005) and Wang et al. (2011) introduce information sharing between consecutive regression parameter vectors; this is only achieved indirectly in Andrieu et al. (2003) via a nonlinear transformation into the space of complex-valued poles. Moulines et al. (2005) is a theoretical non-Bayesian paper on error bounds under a Lipschitz condition. A closer relative to our paper is the method of Wang et al. (2011), whose objective is online parameter estimation via particle filtering, with applications e.g. in tracking. This is a different scenario from most systems biology applications, where an interaction structure is typically learnt off-line after completion of the experiments. Unlike Wang et al. (2011), our work thus follows other applications of DBNs in systems biology (Lèbre et al. 2010; Robinson and Hartemink 2009, 2010; Dondelinger et al. 2010, 2012; Husmeier et al. 2010; Grzegorczyk and Husmeier 2011) and aims to infer the model structure by marginalizing out the parameters in closed form. To paraphrase this: while inference in Wang et al. (2011) is based on filtering, inference in our work is based on smoothing.
Overview of time-varying dynamic Bayesian network models that have recently been proposed in the literature. Detailed explanations are given in the text
| | Hard coupled network(s) | Weakly coupled networks | Weakly coupled networks | Uncoupled networks | Weakly coupled parameters |
---|---|---|---|---|---|
Literature reference(s) | Grzegorczyk and Husmeier (2011) | Dondelinger et al. (2010) or Robinson and Hartemink (2011) | Dondelinger et al. (2012) | Lèbre et al. (2010) | Proposed here |
Network structures flexible? | No | Yes | Yes | Yes | No |
Network coupling scheme: | network is kept fixed | networks are sequentially coupled | networks are globally coupled | networks are not coupled | network is kept fixed |
Network parameters flexible? | Yes | Yes | Yes | Yes | Yes |
Network parameters coupled? | No | No | No | No | Yes |
In a previous journal paper, we proposed a model for sequential information sharing with respect to the interaction parameters (Grzegorczyk and Husmeier 2012a). In a previous conference article, we proposed a model for global information sharing with respect to the interaction parameters (Grzegorczyk and Husmeier 2012b). The objective of the present work is sixfold. Firstly, due to a strict page limit, the presentation of the methodology in Grzegorczyk and Husmeier (2012b) is very terse, and we here offer a more comprehensive and self-contained exposition. In particular, in Grzegorczyk and Husmeier (2012b) we only briefly outlined the Gibbs sampling scheme for inference. Here we provide all technical details including a graphical representation of the novel model and pseudo-code for the inference algorithm. Secondly, neither the sequentially (Grzegorczyk and Husmeier 2012a) nor the globally (Grzegorczyk and Husmeier 2012b) coupled model allows for information-sharing among the nodes in the network. Here, we extend the model from Grzegorczyk and Husmeier (2012b) by introducing an extra (level-3) layer to the hierarchy of the proposed model. While the hyperparameters of each node were modeled independently in the original models, the extended model hierarchically couples the node-specific noise variances and the node-specific coupling strengths between the segment-specific interaction parameters. Moreover, in our earlier works (Grzegorczyk and Husmeier 2012a, 2012b) we focused on node-specific variance hyperparameters which are shared by the node-specific time intervals. Here, we present nine different coupling schemes for the noise variance hyperparameters and we empirically compare three of them. Thirdly, we introduce a novel collapsed Gibbs sampling step, which replaces a less efficient uncollapsed Gibbs sampling step of the original MCMC algorithms.
Fourthly and most importantly, we show how this novel collapsed Gibbs sampling step as well as blocking techniques can be used for developing a novel advanced MCMC algorithm. We empirically show that the advanced MCMC algorithm performs significantly better than the original MCMC sampling scheme from Grzegorczyk and Husmeier (2012b) in terms of convergence and mixing. In this context we also consider scenarios where the original MCMC sampling scheme fails to converge so that the advanced MCMC sampling scheme also reaches a better network reconstruction accuracy. Fifthly, neither in Grzegorczyk and Husmeier (2012a) nor in Grzegorczyk and Husmeier (2012b) did we investigate the robustness of the proposed model with respect to a variation of the fixed (hyper-)hyperparameters, and we focused our attention on one single hyperparameter setting, which was taken from Lèbre et al. (2010). Here we systematically vary the (hyper-)hyperparameters of those (hyper-)priors that are important for the noise variances and coupling strengths among segments and we investigate their influence on the performance. Sixthly, we conduct a comparative evaluation between the proposed global information coupling scheme and the alternative paradigm based on sequential information sharing (Grzegorczyk and Husmeier 2012a), and we discuss reasons for the potential fundamental improvement achieved with the new approach.
2 Mathematical details
2.1 Bayesian linear regression
2.2 Application to dynamic Bayesian networks
2.2.1 Fixed changepoints
Overview of the coupling schemes (S1)–(S9) for the noise variance hyperparameters. No coupling: The noise variance hyperparameters are d-separated, i.e., they have separate level-2 hyperparameters which are fixed. Weak coupling: The noise variance hyperparameters are not d-separated, i.e., they share a set of common level-2 hyperparameters which are flexible. Hard coupling: There are common noise variance hyperparameters (with fixed level-2 hyperparameters)
Segments (h=1,…,K_{g}) ↓ / Nodes (g=1,…,N) → | No coupling | Weak coupling | Hard coupling |
---|---|---|---|
No coupling | (S1) \(\sigma_{g,h}^{-2}\sim \operatorname {Gam}(A_{\sigma,g,h},B_{\sigma,g,h})\); A_{σ,g,h} and B_{σ,g,h} fixed | (S2) \(\sigma_{g,h}^{-2}\sim \operatorname {Gam}(A_{\sigma,h},B_{\sigma,h})\); A_{σ,h} and/or B_{σ,h} flexible, i.e. \(\{\sigma_{g,h}^{2}\}_{g}\) coupled ∀h | (S3) \(\sigma_{g,h}^{2}=\sigma_{h}^{2}\), \(\sigma_{h}^{-2}\sim \operatorname {Gam}(A_{\sigma,h},B_{\sigma,h})\); A_{σ,h} and B_{σ,h} fixed |
Weak coupling | (S4) \(\sigma_{g,h}^{-2}\sim \operatorname {Gam}(A_{\sigma,g},B_{\sigma,g})\); A_{σ,g} and/or B_{σ,g} flexible, i.e. \(\{\sigma_{g,h}^{2}\}_{h}\) coupled ∀g | (S5) \(\sigma_{g,h}^{-2}\sim \operatorname {Gam}(A_{\sigma},B_{\sigma})\); A_{σ} and/or B_{σ} flexible, i.e. \(\{\sigma_{g,h}^{2}\}_{g,h}\) coupled | (S6) \(\sigma_{g,h}^{2}=\sigma_{h}^{2}\), \(\sigma_{h}^{-2}\sim \operatorname {Gam}(A_{\sigma},B_{\sigma})\); A_{σ} and/or B_{σ} flexible, i.e. \(\{\sigma_{h}^{2}\}_{h}\) coupled |
Hard coupling | (S7) \(\sigma_{g,h}^{2}=\sigma_{g}^{2}\), \(\sigma_{g}^{-2}\sim \operatorname {Gam}(A_{\sigma,g},B_{\sigma,g})\); A_{σ,g} and B_{σ,g} fixed | (S8) \(\sigma_{g,h}^{2}=\sigma_{g}^{2}\), \(\sigma_{g}^{-2}\sim \operatorname {Gam}(A_{\sigma},B_{\sigma})\); A_{σ} and/or B_{σ} flexible, i.e. \(\{\sigma_{g}^{2}\}_{g}\) coupled | (S9) \(\sigma_{g,h}^{2}=\sigma^{2}\), \(\sigma^{-2}\sim \operatorname {Gam}(A_{\sigma},B_{\sigma})\); A_{σ} and B_{σ} fixed |
A systematic comparative evaluation of the coupling schemes (S1)–(S9) from Table 2 is confounded by the dependence of the performance of these methods on the choice of the level-2 hyperparameters and the level-3 hyperpriors. We therefore decided to select scheme (S8) for the following four reasons. First, for our applications to gene regulatory networks we would expect the differences among nodes (genes) to be more substantial than the differences among (time) segments for the same node (gene), which suggests a natural hierarchy of the strength of the coupling. Second, in explorative simulations, which we carried out for our earlier conference paper (Grzegorczyk and Husmeier 2012b), we obtained slightly better results with the “no coupling for the nodes, hard coupling for the segments” scheme (S7) than with the “fully flexible approach” (S1), which suggests that segment-specific noise variance hyperparameters lead to over-flexibility. Third, with coupling scheme (S8) the signal-to-noise hyperparameters, δ_{g}, as well as the noise variance hyperparameters, \(\sigma_{g}^{2}\), are both gene- but not segment-specific. Thus, both types of hyperparameters can consistently (symmetrically) be weakly coupled for the nodes. Fourth and most importantly, in an explorative pre-study for this paper we implemented the NH-DBN models with schemes (S8), (S4), and (S5), and for synthetic data we empirically found that coupling scheme (S8) performs consistently better than the coupling schemes (S4) and (S5).^{1}^{,}^{2}
Table of (hyper-)parameters and symbols, which have been introduced
Symbol | Explanation |
---|---|
g | The g-th network node (g=1,…,N) |
K_{g} | The number of segments for node g |
h | The h-th time segment (h=1,…,K_{g}) |
\(\mathcal{M}\) | The network structure, \(\mathcal{M}=\{\pi_{1},\ldots,\pi_{N}\}\) |
\(\sigma_{g}^{2}\) | The noise variance hyperparameter for node g; see (16) |
δ_{g} | The signal-to-noise hyperparameter for node g; see (17). \(\delta_{g}^{-1}\) is the “coupling strength” in the coupled NH-DBN |
π_{g} | The parent node set of node g |
\(\mathcal{F}\) | The fan-in restriction: \(|\pi_{g}|\leq\mathcal{F}\) for all nodes g |
\(\mbox {{\boldmath $\tau $}}_{g}\) | The set of changepoints, \(\mbox {{\boldmath $\tau $}}_{g}=\{\tau_{g,1},\ldots,\tau_{g,K_{g}-1}\}\), for node g |
m_{g} | The global interaction hyperparameter vector for node g |
w_{g,h} | The interaction parameter vector for the h-th segment of node g |
y_{g,h} | The target values of node g in segment h |
\(\mathbf {X}_{\pi _{g},h}\) | The design matrix for segment h of node g |
\(\mathbf {y}_{g,\mbox {{\boldmath $\tau $}}_{g}}\) | The set of target values, \(\{\mathbf {y}_{g,h}\}_{h=1,\ldots,K_{g}}\), implied by \(\mbox {{\boldmath $\tau $}}_{g}\) |
\(\mathbf {w}_{g,\mbox {{\boldmath $\tau $}}_{g}}\) | The set of interaction parameter vectors, \(\{\mathbf {w}_{g,h}\}_{h=1,\ldots,K_{g}}\), implied by \(\mbox {{\boldmath $\tau $}}_{g}\) |
\(\mathbf {X}_{\pi _{g},\mbox {{\boldmath $\tau $}}_{g}}\) | The set of design matrices, \(\{\mathbf {X}_{\pi _{g},h}\}_{h=1,\ldots,K_{g}}\), implied by \(\mbox {{\boldmath $\tau $}}_{g}\) |
p and k | The hyperparameters of the negative binomial prior for the distance between changepoints, implying the changepoint sets, \(\mbox {{\boldmath $\tau $}}_{g}\); see Sect. 2.2.2 |
m_{†}, Σ_{†} | The level-2 hyperparameters of the Gaussian prior for m_{g}, see (12) |
A_{σ}, B_{σ} | The level-2 hyperparameters of the Gamma prior for \(\sigma_{g}^{-2}\), see (30) |
A_{δ}, B_{δ} | The level-2 hyperparameters of the Gamma prior for \(\delta_{g}^{-1}\), see (31) |
α_{σ}, β_{σ} | The level-3 hyperparameters of the Gamma prior for B_{σ}, see (32) |
α_{δ}, β_{δ} | The level-3 hyperparameters of the Gamma prior for B_{δ}, see (33) |
2.2.2 Variable changepoints
So far, we have assumed that the node-specific changepoints \(\mbox {{\boldmath $\tau $}}_{g}\) are fixed, but it is straightforward to make them variable. To this end, we need to decide on a prior distribution. Two alternative forms have been compared in Fearnhead (2006). The first approach, adopted in Lèbre et al. (2010), is based on a truncated Poisson prior on the number of changepoints (K_{g}−1), and an explicit specification of \(P(\mbox {{\boldmath $\tau $}}_{g}|(K_{g}-1))\), e.g. the uniform distribution. The second alternative, pursued in Grzegorczyk and Husmeier (2011) and used in the present work, is based on a point process, where the distribution of the distance between two successive points is a negative binomial distribution.
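To make the point process prior more tangible, the following Python sketch draws a changepoint set by accumulating negative binomial distances. It rests on assumptions not confirmed by the text above: the distance is taken to be the number of Bernoulli(p) trials up to and including the k-th success (so that k=1 gives a geometric distribution and p=0 yields no changepoints), and the first changepoint is treated like all others. The exact probability mass function used in this work may differ, and the function names and the series length `T` are purely illustrative.

```python
import math
import random

def sample_distance(k, p, rng=random):
    """Number of Bernoulli(p) trials needed to reach the k-th success (assumed form)."""
    if p <= 0.0:
        return math.inf                  # no success ever occurs: no further changepoint
    total = 0
    for _ in range(k):
        trials = 1
        while rng.random() >= p:         # failure: keep drawing
            trials += 1
        total += trials
    return total

def sample_changepoints(T, k, p, rng=random):
    """Accumulate distances until the end of a series of length T is reached."""
    changepoints = []
    position = sample_distance(k, p, rng)
    while position < T:
        changepoints.append(position)
        position += sample_distance(k, p, rng)
    return changepoints

if __name__ == "__main__":
    print(sample_changepoints(T=41, k=1, p=0.05, rng=random.Random(0)))
```

Under this assumed parameterization, smaller values of p favour longer segments and hence fewer changepoints.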
2.2.3 Hierarchical Bayesian model and MCMC inference scheme
The other prior distributions have been discussed in the previous sections. Sampling from the joint posterior distribution follows a Gibbs-sampling-like strategy, in which variables are sampled from their respective conditional distributions given the other variables in their Markov blankets. Whenever possible, we sample from the closed-form distributions and use collapsing, i.e. we integrate (some) variables from the Markov blankets out analytically. Where closed-form distributions are not available, we resort to RJMCMC steps. The overall sampling scheme is hence of the type RJMCMC within partially collapsed Gibbs.
The conditional distributions of the parent sets π_{g}, which define the network structure, and the changepoint sets \(\mbox {{\boldmath $\tau $}}_{g}\), are not of closed form. Sampling of \(\mbox {{\boldmath $\tau $}}_{g}\) from the proper conditional distribution (conditional on the variables in its Markov blanket) can be effected with the dynamic programming scheme described in Grzegorczyk and Husmeier (2011), at computational complexity quadratic in the time series length. Sampling of the parent configurations π_{g} from the respective conditional distribution is also feasible, by exhaustive enumeration of all valid parent configurations (subject to the fan-in restriction, \(\mathcal{F}\)) and normalization of their local posterior probability potentials. In principle, it is therefore possible to set up an overall Gibbs sampler that does not require any Metropolis-Hastings-(Green) moves (Green 1995). However, the computational complexity of Gibbs sampling steps for π_{g} and \(\mbox {{\boldmath $\tau $}}_{g}\) is substantially higher than that of all other sampling steps. These disproportionate computational costs create a bottleneck: the number of sampling steps for the other variables is limited by the number of feasible dynamic programming and complete enumeration steps. An alternative approach is to give up on sampling π_{g} and \(\mbox {{\boldmath $\tau $}}_{g}\) directly from their conditional distributions, and to use a Metropolis-Hastings-Green RJMCMC scheme instead. This leaves the computational complexity of all individual sampling steps roughly balanced, and is the approach we adopted for the present work.
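The exhaustive enumeration step mentioned above (the alternative that was not adopted here) can be sketched as follows. This is an illustration only: `log_score` is a hypothetical placeholder for the logarithm of the local posterior probability potential of a parent set, and the function names are not taken from the original implementation.

```python
import itertools
import math
import random

def enumerate_parent_sets(node, n_nodes, fan_in, allow_self=False):
    """All parent sets of `node` with at most `fan_in` parents."""
    candidates = [j for j in range(n_nodes) if allow_self or j != node]
    return [frozenset(combo)
            for size in range(fan_in + 1)
            for combo in itertools.combinations(candidates, size)]

def sample_parent_set(parent_sets, log_score, rng=random):
    """Draw a parent set with probability proportional to exp(log_score)."""
    logs = [log_score(ps) for ps in parent_sets]
    top = max(logs)                               # shift for numerical stability
    weights = [math.exp(v - top) for v in logs]
    total = sum(weights)
    probs = [w / total for w in weights]          # normalized potentials
    choice = rng.choices(parent_sets, weights=probs, k=1)[0]
    return choice, probs

if __name__ == "__main__":
    sets = enumerate_parent_sets(node=0, n_nodes=4, fan_in=2)
    # Toy potential (assumption): penalize larger parent sets.
    chosen, probs = sample_parent_set(sets, lambda ps: -float(len(ps)))
    print(len(sets), sorted(chosen))
```

The number of candidate sets grows polynomially in the number of nodes with degree equal to the fan-in, which is why each such Gibbs step is costly compared with the other updates.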
From (41) the network structure, \(\mathcal{M}\), can be sampled with the “improved structure MCMC sampling scheme” proposed in Grzegorczyk and Husmeier (2011). From (42) the changepoint sets, \(\{\mbox {{\boldmath $\tau $}}_{g}\}_{g}\) (g=1,…,N), can be sampled with reversible jump Markov chain Monte Carlo (RJMCMC) (Green 1995), as in Lèbre et al. (2010) and Robinson and Hartemink (2010).
The original MCMC simulation consists of three successive parts: (i) the network structure update part, (ii) the changepoint sets update part, and (iii) the update of the remaining (hyper-)parameters. In each single MCMC iteration, i=1,2,3,…, the three update parts are successively performed.
We note that this MCMC scheme subsumes MCMC inference for the uncoupled NH-DBN as a special case, in which the hyperparameter vectors are kept fixed at \(\mathbf{m}_{g}=\mathbf{0}\).
In Sect. 2.2.4 we will briefly outline how collapsing and blocking techniques can be employed to improve this RJMCMC within partially collapsed Gibbs sampling scheme from Grzegorczyk and Husmeier (2012b). The technical details have been relegated to the appendix, where a complete description and pseudo code of the advanced MCMC sampling algorithm can be found.
2.2.4 Advanced MCMC inference scheme: collapsing and blocking
The second improvement is related to blocking, as widely applied in Gibbs sampling (Liang et al. 2010). Blocking is a technique by which correlated variables are not sampled separately, but are merged into blocks that are sampled together, conditional on their respective joint Markov blanket. Convergence problems of the original MCMC sampler, discussed in more detail in Sect. 5, resulted from correlations between the variables in layer 6: between the hyperparameters m_{g} and the parent configuration π_{g}, and between the hyperparameters m_{g} and the changepoint configuration \(\mbox {{\boldmath $\tau $}}_{g}\). In our improved MCMC scheme, we form two blocks, grouping m_{g} with π_{g}, and grouping m_{g} with \(\mbox {{\boldmath $\tau $}}_{g}\). Rather than sampling m_{g} on its own, m_{g} is always sampled jointly with the parent configuration π_{g}, and with the changepoint configuration \(\mbox {{\boldmath $\tau $}}_{g}\). While the conceptualization of this idea is simple and intuitive, the mathematical implementation is involved, due to the need to ensure that the sampling scheme satisfies the equations of detailed balance and converges to the proper posterior distribution. The mathematical details have therefore been relegated to the appendix, where a complete description of the algorithm can be found.
3 Data
3.1 Simulated data from the RAF pathway
For our simulation study we implement both dynamic and additive noise, but our focus is on additive white noise with the objective to keep the signal-to-noise ratio (SNR) constant such that it can be controlled and specified.^{8} Additive white noise can be employed without noise inflation. Having generated a time series \(\mathcal{D}\), as described above, we add white noise in a gene-wise manner. For each node, g, we compute the standard deviation, s_{g}, of its last 40 observations, \(\mathcal{D}_{g,2},\ldots,\mathcal {D}_{g,41}\), and we add iid Gaussian noise with zero mean and standard deviation SNR^{−1}⋅s_{g} to each individual observation, where SNR is the pre-defined signal-to-noise ratio level. That is, we replace \(\mathcal{D}_{g,t}\) (t=2,…,41) by \(\mathcal{D}_{g,t} + v_{g,t}\), where v_{g,2},…,v_{g,41} are realizations of iid \(\mathcal {N}(0,(\mathrm{SNR}^{-1}\cdot s_{g})^{2})\) Gaussian variables. We distinguish three signal-to-noise ratios: SNR=10 (weak noise), SNR=3 (moderate noise), and SNR=1 (strong noise).
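The gene-wise noise injection can be sketched in a few lines of Python. This is a minimal illustration rather than the original simulation code: the function name is hypothetical, and the use of the sample (n−1) standard deviation for s_{g} is an assumption, since the text does not say which estimator was used.

```python
import random
import statistics

def add_white_noise(series, snr, rng=random):
    """Return a noisy copy of one node's series D_{g,1},...,D_{g,41}.

    The first observation is left untouched; iid N(0, (s_g / snr)^2) noise
    is added to the remaining observations.
    """
    tail = series[1:]                       # D_{g,2},...,D_{g,41}
    s_g = statistics.stdev(tail)            # sample standard deviation (assumption)
    sigma = s_g / snr                       # SNR^{-1} * s_g
    noisy_tail = [x + rng.gauss(0.0, sigma) for x in tail]
    return [series[0]] + noisy_tail

if __name__ == "__main__":
    rng = random.Random(0)
    clean = [float(t % 7) for t in range(41)]   # toy series, not real data
    for snr in (10, 3, 1):                      # weak, moderate, strong noise
        noisy = add_white_noise(clean, snr, rng)
        print(snr, round(noisy[1] - clean[1], 3))
```

Because the noise standard deviation is tied to s_{g}, the SNR is the same for every node regardless of the scale of its expression values.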
3.2 Synthetic biology in Saccharomyces cerevisiae
Because of the temporal structure (switch of the carbon source in the middle of the experiment) the merged time series represents a scenario in which both coupling paradigms (global and sequential) can be applied. The Saccharomyces cerevisiae time series is therefore well suited to conduct a comparative evaluation between the proposed global coupling model and the sequential one proposed in Grzegorczyk and Husmeier (2012a).
3.3 Circadian rhythms in Arabidopsis thaliana
Gene expression time series segments for Arabidopsis thaliana. The table contains an overview of the experimental conditions under which each of the gene expression experiments was carried out. We note that there is no natural (temporal) ordering of the four experiments, i.e., the arrangement of the four time series in the table is interchangeable
| | Experiment 1 | Experiment 2 | Experiment 3 | Experiment 4 |
---|---|---|---|---|
Source | Mockler et al. (2007) | Edwards et al. (2006) | Grzegorczyk et al. (2008) | Grzegorczyk et al. (2008) |
Time points | 12 | 13 | 13 | 13 |
Time interval | 4 h | 4 h | 2 h | 2 h |
Pre-experimental entrainment | 12h:12h light:dark cycle | 12h:12h light:dark cycle | 10h:10h light:dark cycle | 14h:14h light:dark cycle |
Measurements | Constant light | Constant light | Constant light | Constant light |
Laboratory | Kay Lab | Millar Lab | Millar Lab | Millar Lab |
4 Simulation setting
4.1 The objectives of our empirical studies
- In Sect. 5.1 we employ synthetic data from the RAF pathway and we aim to monitor the network reconstruction accuracy on a series of increasingly strong violations of the prior assumption inherent in (11)–(12). To this end, we generate synthetic data, as explained in Sect. 3.1, and we reverse-engineer the RAF pathway in Fig. 3. We do not allow for self-feedback loops in the NH-DBN models, i.e., we impose the constraints g∉π_{g} (g=1,…,N). In this first study we assume the segmentations (changepoint sets) to be known and we systematically cross-compare the network reconstruction accuracy of the uncoupled and the coupled NH-DBN model for various hyperparameter settings. We also compare the performance of both MCMC sampling schemes: the original and the advanced MCMC sampler, and we include a comparison with a conventional homogeneous DBN. See Fig. 5 and Table 5 for an overview.
Overview of the four methods under comparison. The conventional dynamic Bayesian network (DBN) model is homogeneous and assumes that the interaction parameters are constant and do not change over time. The non-homogeneous DBN (NH-DBN) models allow for changepoints that divide the time series into segments and for each segment there are segment-specific interaction parameters. Unlike the uncoupled NH-DBN model the coupled NH-DBN model allows for global information sharing (i.e. coupling) between the segment-specific interaction parameters. The coupled NH-DBN model can be inferred with two different MCMC sampling schemes. See Fig. 5 for a graphical representation of the relationships between the four methods
| | “Conventional” DBN | Uncoupled NH-DBN | Coupled NH-DBN original MCMC | Coupled NH-DBN advanced MCMC |
---|---|---|---|---|
Literature reference | Extension of standard textbooks | Extension of Lèbre et al. (2010) | Extension of Grzegorczyk and Husmeier (2012b) | Extension of Grzegorczyk and Husmeier (2012b) |
Model definition | See Fig. 2 with m_{g}=0 and \(\mbox {{\boldmath $\tau $}}_{g}=\emptyset\) fixed | See Fig. 2 with m_{g}=0 fixed | See Fig. 2 | See Fig. 2 |
Non-homogeneous model? | no | yes | yes | yes |
Global information coupling? | – | no | yes | yes |
MCMC inference | For a brief explanation see Sect. 4.2 | For a brief explanation see Sect. 2.2.3 | Original MCMC adapted from Grzegorczyk and Husmeier (2012b) | |
- In Sect. 5.2 we employ gene expression time series from Saccharomyces cerevisiae (see Sect. 3.2) to extend our comparative evaluation by a real-world application. As in the first study we evaluate the network reconstruction accuracy for different hyperparameter settings, we cross-compare the performance of the two MCMC sampling schemes, and we impose the constraints g∉π_{g} (g=1,…,N). But unlike in the first study we assume the segmentations (changepoint sets) to be unknown. The node-specific changepoint sets \(\mbox {{\boldmath $\tau $}}_{g}\) (g=1,…,N) have to be inferred from the data and the network reconstruction accuracy can be monitored in dependence on the inferred segmentations. In Sect. 5.2.2 we extend our cross-method comparison and empirically compare the proposed globally coupled NH-DBN with a sequentially coupled NH-DBN model, presented in Grzegorczyk and Husmeier (2012a), with respect to the network reconstruction accuracy.
- In Sect. 5.3 we analyze gene expression time series from Arabidopsis thaliana (see Sect. 3.3). For the Arabidopsis thaliana data a proper evaluation in terms of the network reconstruction accuracy is infeasible owing to the absence of a proper gold standard. Several authors pursue an evaluation without a gold standard by arguing for the biological plausibility of subsets of inferred interactions. However, such an approach inevitably suffers from a certain selection bias and is subject to subjective interpretation. Our primary focus is therefore on quantifying the strength of the information coupling between the time series segments and the influence this coupling has on the regulatory network reconstruction. We compute and compare the correlations between the segment-specific interaction parameter vectors for the uncoupled and for the coupled NH-DBN. For comparing the correlations of the two NH-DBN models we require an invariant segmentation. Since there are four individual time series, which have been measured under different external conditions, as indicated in Table 4, a natural choice is to consider each of the four individual time series as a separate segment. In this third application we do not rule out self-feedback loops, i.e., we allow for g∈π_{g} (g=1,…,N), since, from a biological perspective, self-feedback loops cannot be excluded for the underlying gene regulatory network.
4.2 Hyperparameter settings for the coupled NH-DBN model and the competing methods
The gene- and segment-specific interaction parameter vectors w_{g,h} are assumed to be multivariate Gaussian distributed according to (11), and in the absence of any genuine prior knowledge we set C_{g,h}=I.
In our first empirical study in Sect. 5.1 we also compare the performance of the two NH-DBN models with the conventional homogeneous DBN, which is a special case of our model with an empty non-adaptable changepoint set.
For the analysis of the Saccharomyces cerevisiae gene expression time series in Sect. 5.2 we follow an unsupervised approach and assume that the changepoints segmenting the time series are unknown. To infer different segmentations we employ different hyperparameters of the point process prior on the changepoint sets. In the point process prior, described in Sect. 2.2.2, the prior distribution for the number of time points between two successive changepoints is a negative binomial distribution with hyperparameters k and p. In the probability mass function of the negative binomial distribution, given in (36), we fix k=1 and vary the hyperparameter p over a wide range of values: p∈{0,0.001,0.005,0.01,0.02,0.03,0.04,0.1,0.2,0.3,0.4}.
In our last empirical study in Sect. 5.2 we compare the performance of the two NH-DBN models with a sequentially coupled NH-DBN model, proposed in Grzegorczyk and Husmeier (2012a). For this study we re-use the hyperparameter values from Grzegorczyk and Husmeier (2012a). A brief description of the sequentially coupled NH-DBN can be found in Sect. 4 of Online Resource 2.
4.3 MCMC simulation lengths, convergence diagnostics and criteria for the network reconstruction accuracy
To assess convergence and mixing we applied standard convergence diagnostics, based on trace plots (Giudici and Castelo 2003) and the potential scale reduction factor (PSRF; Gelman and Rubin 1992), and found that the PSRFs of all individual edges were below 1.1 for simulation lengths of 10,000 MCMC steps when the advanced MCMC sampling scheme is used. More details, in particular on how we defined a PSRF for an individual network edge, can be found in Sect. 3 of Online Resource 1.
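As a rough illustration of the diagnostic, the sketch below computes the textbook Gelman-Rubin statistic from several independent chains of edge indicators (1 if the edge is present in the sampled network, 0 otherwise). The edge-specific definition actually used is given in Online Resource 1 and is not reproduced here, so this generic formula, the function name `psrf` and the handling of zero within-chain variance are assumptions.

```python
from math import sqrt
from statistics import mean, variance


def psrf(chains):
    """Potential scale reduction factor for equally long chains.

    chains: list of H sequences, each holding the sampled values
    (here: 0/1 edge indicators) of one independent MCMC run.
    """
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean(variance(c) for c in chains)   # W: mean within-chain variance
    between_over_n = variance(chain_means)       # B/n: variance of chain means
    if within == 0.0:
        # All chains are constant: agreement gives 1, disagreement diverges.
        return 1.0 if between_over_n == 0.0 else float("inf")
    var_plus = (n - 1) / n * within + between_over_n
    return sqrt(var_plus / within)


if __name__ == "__main__":
    # Two hypothetical chains of edge indicators for a single edge.
    runs = [[0.0, 1.0, 1.0, 0.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0, 0.0, 1.0]]
    print("PSRF below 1.1:", psrf(runs) < 1.1)
```

Chains that are stuck in different states produce a very large value, which is the kind of behavior the threshold of 1.1 is meant to flag.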
If the true network is known, we evaluate the network reconstruction accuracy in terms of the area under the receiver operating characteristic curve (AUC-ROC) and the area under the precision-recall curve (AUC-PR). Details on these two criteria can be found in Sect. 3 of Online Resource 1.
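The two criteria can be sketched as follows for a list of marginal edge posterior probabilities and the corresponding true edge labels. The exact computation used in this paper is described in Online Resource 1; here the AUC-ROC is computed via the equivalent pairwise-ranking statistic, and average precision is used as a simple stand-in for the area under the precision-recall curve. Both choices, and the function names, are assumptions.

```python
def auc_roc(scores, labels):
    """Probability that a true edge is ranked above a non-edge (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean precision at the ranks of the true edges.

    Assumes at least one true edge; tied scores keep their input order.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


if __name__ == "__main__":
    # Hypothetical edge scores and gold-standard labels for four candidate edges.
    edge_scores = [0.9, 0.8, 0.3, 0.1]
    true_edges = [1, 0, 1, 0]
    print(auc_roc(edge_scores, true_edges), average_precision(edge_scores, true_edges))
```

A random ranking gives an AUC-ROC of about 0.5, whereas the random baseline for the precision-recall criterion depends on the fraction of true edges.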
5 Results
5.1 Results on simulated data from the RAF pathway
We take the RAF network from Sachs et al. (2005), see Fig. 3, and generate synthetic non-homogeneous time series from a multiple changepoint linear regression model, as explained in Sect. 3.1. Our objective is to monitor the network reconstruction accuracy on a series of increasingly strong violations of the prior assumption inherent in (11)–(12).
5.1.1 Comparative evaluation between three DBN models for fixed level-2 and level-3 hyperparameters and flexible SNR
In a first step we select the level-3 hyperparameters such that the level-2 hyperparameters are equal in prior expectation to those imposed in earlier studies for simpler versions of these NH-DBN models without level-3 hyperpriors (see, e.g., Grzegorczyk and Husmeier 2012b).^{14} We cross-compare the performance of the conventional homogeneous DBN, the uncoupled NH-DBN akin to Lèbre et al. (2010), and the proposed coupled NH-DBN; see Fig. 5 and Table 5 in Sect. 4.
Since the network reconstruction accuracy is close to random expectation for the high noise level (SNR=1) and almost identical for the low (SNR=10) and the moderate (SNR=3) noise levels, we focus on the moderate noise level (SNR=3) in the following subsections.
5.1.2 Comparison of three different coupling schemes for the noise variance hyperparameters
Nine coupling schemes (S1)–(S9) for the noise variance hyperparameters, \(\sigma_{g,h}^{2}\), were briefly outlined in Table 2 in Sect. 2.2.1. Throughout this paper we focus on coupling scheme (S8): “weak coupling for nodes, hard coupling for segments”, but in this subsection we briefly compare this scheme with two alternative schemes, namely the (S4) approach: “no coupling for nodes, weak coupling for segments” and the (S5) approach: “weak coupling for both nodes and segments”. For this study we re-use the hyperprior from Sect. 5.1.1 for the signal-to-noise hyperparameters, δ_{g} (g=1,…,N), and we vary the level-3 hyperparameters for the noise variance hyperparameters, \(\sigma_{g}^{2}\) or \(\sigma_{g,h}^{2}\), respectively.^{15} The technical details and figures of the empirical results have been relegated to Sect. 3 of Online Resource 2. Here we just briefly summarize our findings for the RAF pathway data with SNR=3. In a comparative evaluation of the three approaches (S4), (S5), and (S8) for the proposed coupled NH-DBN model we found that the coupled NH-DBN consistently yields the best network reconstruction accuracy when coupling scheme (S8) is employed; see Figs. 7–8 in Sect. 3 of Online Resource 2. Moreover, for each of the three coupling schemes (S4), (S5), and (S8) we found that the proposed coupled NH-DBN model compares favorably to the uncoupled NH-DBN model akin to Lèbre et al. (2010); see Figs. 9–10 in Sect. 3 of Online Resource 2. In particular, exactly the same trend can be observed for (S4), (S5) and (S8): except for the strongest amplitude of the perturbation (ϵ=1), the performance improvement of the proposed coupled NH-DBN over the uncoupled NH-DBN is significant, and the relative AUC-ROC and AUC-PR differences increase as the amplitude, ϵ, decreases. Our empirical findings thus suggest that the merits of the proposed coupled NH-DBN model do not depend on the coupling scheme for the noise variance hyperparameters.
5.1.3 Robustness with respect to the level-2 hyperparameters
In the third step we focus on cross-comparing the uncoupled and the coupled NH-DBN model and we investigate whether the trends from Sect. 5.1.1 can also be found for other hyperparameter settings. For this analysis we return to the simpler NH-DBN models without level-3 hyperpriors (Grzegorczyk and Husmeier 2012b). That is, we directly fix the level-2 hyperparameters in (30)–(31), and we re-analyze the synthetic RAF network data with SNR=3 with the two NH-DBN models.^{16} Figures of the empirical results have been relegated to Sect. 1 of Online Resource 2 and can be summarized as follows. Consistent with the results from Sect. 5.1.1, the proposed coupled NH-DBN increasingly outperforms the uncoupled NH-DBN as the amplitude of the perturbation ϵ of the parameter vectors decreases (see Figs. 1–2 in Sect. 1 of Online Resource 2). Our data analysis not only shows that the relative differences in the network reconstruction accuracy are in favor of the proposed coupled NH-DBN but also reveals that the network reconstruction accuracy, measured in terms of mean AUC-ROC scores, is robust with respect to the choices of the level-2 hyperparameters. As shown in Fig. 3 of Online Resource 2, the proposed coupled NH-DBN yields almost identical AUC-ROC scores for each of the 12 level-2 hyperparameter settings.
5.1.4 Robustness with respect to the level-3 hyperparameters
5.1.5 Posterior distribution of the signal-to-noise hyperparameter in dependence on the level-3 hyperparameters
We want to find the reason why the coupled NH-DBN does not perform better than the uncoupled NH-DBN for weak priors on B_{δ} (see Figs. 7–8). To this end, we explore the posterior distribution of the signal-to-noise hyperparameters, δ_{g}. Since our findings in Sect. 5.1.4 suggest that the two models appear to be robust with respect to a variation of the level-3 hyperprior on B_{σ}, we employ the weakest (most diffuse) prior for B_{σ}, \(B_{\sigma }\sim \operatorname {Gam}(0.01,2)\).
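The notion of a weak or strong level-3 hyperprior can be made concrete with a small sketch that evaluates the prior mean and variance of the settings listed in footnote 17. It assumes the shape-rate parameterization of the Gamma distribution, which is consistent with the variance formula α/β² given in footnote 13; the function name `gamma_mean_var` is hypothetical.

```python
def gamma_mean_var(alpha: float, beta: float):
    """Mean and variance of Gam(alpha, beta) with shape alpha and rate beta."""
    return alpha / beta, alpha / beta ** 2


# Level-3 settings quoted in footnote 17.
SIGMA_SETTINGS = [(1, 200), (0.1, 20), (0.01, 2)]
DELTA_SETTINGS = [(200, 1000), (20, 100), (2, 10), (0.2, 1)]

if __name__ == "__main__":
    for name, settings in (("B_sigma", SIGMA_SETTINGS), ("B_delta", DELTA_SETTINGS)):
        for alpha, beta in settings:
            m, v = gamma_mean_var(alpha, beta)
            print(f"{name} ~ Gam({alpha},{beta}): mean={m:.4g}, variance={v:.4g}")
```

All settings share the same prior mean, so the priors differ only in how strongly they concentrate around it; the setting with the largest variance is the weakest one.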
Histograms of the posterior distribution of log(δ_{g}) for the uncoupled NH-DBN with four different level-3 hyperpriors on B_{δ} can be found in Online Resource 2 (see Fig. 4). The level-3 hyperparameters have a moderate effect on the posterior variance, i.e., for the weaker priors the posterior distributions are slightly more strongly peaked. The amplitude of the perturbation, ϵ, seems to have no effect on the posterior distribution of δ_{g}. This latter finding is not surprising, since the uncoupled NH-DBN learns the interaction parameters independently for each segment, and it thus does not matter whether the segment-specific interaction parameter vectors are similar or not. For the uncoupled NH-DBN the posterior distribution of δ_{g} depends on the amplitudes of the interaction parameter vectors only, and independently of the amplitude of the perturbations, ϵ, the amplitudes of the interaction parameter vectors are always equal to 1 in this particular application.
As a complementary analysis, Fig. 5 in Online Resource 2 shows overlaid trace plots of the signal-to-noise hyperparameters during the sampling phase (i.e., from iteration 5k to iteration 10k (with k=1,000)), from which the histograms in Fig. 9 have been extracted. The graphs indicate that the extreme signal-to-noise hyperparameter value, log(δ_{g})≪0, observed for weak priors on B_{δ}, is an attractor state, i.e., a state that the MCMC trajectory can converge to, but never leave. We note that the occurrence of such inconsistent absorbing states in Bayesian hierarchical models as a consequence of weak priors was briefly mentioned in Andrieu and Doucet (1999), p. 2673. We will discuss this point in more detail in Sect. 5.1.7.
5.1.6 Comparison of the two MCMC sampling schemes for the coupled NH-DBN model
In this subsection we cross-compare the performance of the original MCMC sampling scheme from Grzegorczyk and Husmeier (2012b) and the advanced MCMC sampling scheme, proposed here (see Sect. 2.2.4); see Fig. 5 for an overview. To this end, we re-analyze the RAF pathway data with SNR=3 with the original MCMC sampling scheme. We have already seen in Sect. 5.1.4 that weak priors for B_{δ} lead to attractor states with extreme values for the signal-to-noise hyperparameters, δ_{g}. We suggest that these absorbing attractor states might also be responsible for the low network reconstruction accuracy (AUC-ROC values) of the original MCMC sampling in the bottom rows of Fig. 7. For each amplitude of the perturbation, ϵ∈{0,0.125,0.25,0.5,1}, we therefore randomly selected five synthetic RAF pathway data sets, i.e., 25 individual data sets in total, and for each individual data set we assessed convergence of the three NH-DBN methods from Fig. 5 and Table 5. We consider a strong prior and a weak prior on B_{δ}.^{19} With each of the three NH-DBN methods and each of the two priors on B_{δ} we performed H=5 independent MCMC simulations for each of the 25 individual data sets. We assessed convergence and mixing by computing the potential scale reduction factors (PSRFs) from the marginal posterior probabilities of the network edges, as described in detail in Sect. 3 of Online Resource 1.
5.1.7 Discussions of the results for the RAF pathway data
In this subsection we provide a theoretical explanation of two empirical findings. First, we explain why weak (vague) level-3 hyperpriors on B_{δ} are disadvantageous for the proposed coupled NH-DBN. Second, we explain why the advanced MCMC sampling scheme converges substantially better than the original MCMC sampling scheme from Grzegorczyk and Husmeier (2012b).
The disadvantage of weak (diffuse) priors on B_{δ}
In Sect. 5.1.4 we found that the network reconstruction accuracy of the coupled NH-DBN model tends to be superior to that of the uncoupled NH-DBN model unless we use a weak prior on B_{δ} and a medium amplitude of the perturbation, ϵ=0.5; see e.g. Fig. 8. The reason for this behavior becomes clear from the existence of an absorbing state with very low signal-to-noise value, log(δ_{g})≪0, which was already discussed in Sect. 5.1.4 and is illustrated in the two bottom rows of Fig. 9. For this absorbing state, the prior and posterior distributions of the segment-specific interaction parameters, w_{g,h}, become highly peaked around the global hyperparameter vector, m_{g}; see (11) and (27).^{21} Mathematically, w_{g,h} converges in distribution to m_{g} as δ_{g}→0: w_{g,h}→m_{g} (h=1,…,K_{g}) for δ_{g}→0, and the coupled NH-DBN reduces to a conventional homogeneous DBN. We can thus distinguish three regimes for the perturbation amplitude, ϵ. For zero (ϵ=0) or very small perturbations (0<ϵ≪1), the data are adequately modeled with a homogeneous DBN, and by reducing to this model, the coupled NH-DBN outperforms the uncoupled one. For intermediate amplitudes of the perturbation, ϵ=0.5, the data are not adequately modeled by a homogeneous DBN, the attractor state is inconsistent with the data, and by reducing to the homogeneous DBN, the coupled NH-DBN is outperformed by the uncoupled one. For large perturbation amplitudes, ϵ=1, the attractor state is avoided, and the coupled NH-DBN no longer reduces to the homogeneous one. However, due to the large perturbation there is not much benefit in using any information sharing among segments, and the coupled and uncoupled NH-DBN show approximately equal performance.
As seen from the top rows of Fig. 8, effective information coupling for quasi-homogeneous data can be accomplished with less extreme values of δ_{g} than those of the absorbing state, while entrapment in the absorbing state is detrimental to the performance in the medium perturbation regime around ϵ≈0.5. For that reason, it is advisable to prevent such entrapment. Our results, shown in Fig. 9, suggest that this can be effected by the use of a sufficiently strong (informative, concentrated) prior on B_{δ}.
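The collapse of the segment-specific parameters onto m_{g} for δ_{g}→0 can be illustrated numerically. The prior in (11) is not reproduced in this section; the sketch below assumes it has the form of a Gaussian centered at m_{g} with covariance δ_{g}σ_{g}²I (using C_{g,h}=I as in Sect. 4.2). This form, the function names and all numerical values are assumptions made for illustration only.

```python
import random
from math import sqrt


def sample_segment_weights(m, delta, sigma2, n_segments, rng):
    """Draw w_{g,h} ~ N(m, delta * sigma2 * I) for h = 1, ..., n_segments."""
    sd = sqrt(delta * sigma2)
    return [[rng.gauss(mu, sd) for mu in m] for _ in range(n_segments)]


def max_abs_deviation(weights, m):
    """Largest absolute deviation of any sampled component from m."""
    return max(abs(w - mu) for vec in weights for w, mu in zip(vec, m))


if __name__ == "__main__":
    rng = random.Random(0)
    m_g = [1.0, -0.5, 0.0]  # hypothetical global hyperparameter vector
    for delta in (1.0, 1e-2, 1e-4, 1e-8):
        w = sample_segment_weights(m_g, delta, 1.0, 4, rng)
        print(f"delta={delta:g}: max |w - m| = {max_abs_deviation(w, m_g):.2e}")
```

As delta shrinks, all segments receive practically the same parameter vector, which corresponds to the quasi-homogeneous regime described above.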
The advantage of the advanced MCMC sampling scheme
In Sect. 5.1.6 we found that the advanced MCMC sampling scheme, proposed here, converges substantially better than the original MCMC sampling scheme from Grzegorczyk and Husmeier (2012b); see Fig. 11. The convergence improvement that can be reached with the advanced MCMC sampling scheme can be explained as follows. We assume that the Markov chain has reached a parent node set \(\pi_{g}^{(i)}\), the global interaction hyperparameter vector \(\mathbf{m}_{g}^{(i)}\), and the signal-to-noise hyperparameter, \(\delta_{g}^{(i)}\). Adding a new parent node to the current parent set, \(\pi_{g}^{(i)}\), yields a new parent set \(\pi_{g}^{(\diamond)}\), and the corresponding new global interaction hyperparameter vector, \(\mathbf{m}_{g}^{(\diamond)}\), requires a new component for the new parent node. Unlike the original MCMC sampling scheme, which only samples the new component of \(\mathbf {m}_{g}^{(\diamond)}\) according to its prior distribution (see (12)), the advanced MCMC sampling scheme re-samples the whole global hyperparameter vector, \(\mathbf {m}_{g}^{(\diamond)}\), conditional on the new parent set, \(\pi_{g}^{(\diamond )}\), according to its posterior distribution in (46). That is, the segment-specific interaction parameters for the new parent set are centered around the new vector, \(\mathbf {m}_{g}^{(\diamond)}\), which either contains an a priori sampled entry (original MCMC) or is an a posteriori sample (advanced MCMC). Hence, unlike the original MCMC sampling scheme, the advanced MCMC sampling scheme guarantees that the distributions of the segment-specific interaction parameters are centered around an a posteriori sample \(\mathbf {m}_{g}^{(\diamond)}\), and thus ensures that the marginal likelihoods and the acceptance probabilities are higher.^{22} In particular, as discussed above, weak priors on B_{δ} can lead to attractor states with extremely low values for the signal-to-noise hyperparameters, \(\delta_{g}^{(i)}\).
For \(\delta_{g}^{(i)}\rightarrow0\) the posterior distributions of the segment-specific interaction parameters, w_{g,h}, are not only centered but peaked^{23} around the global hyperparameter vector, \(\mathbf{m}_{g}^{(\diamond)}\); see (27), and the marginal likelihoods (acceptance probabilities) for the original MCMC sampling scheme, for which \(\mathbf {m}_{g}^{(\diamond)}\) contains an a priori sampled entry, can become very low.
5.2 Gene regulation in Saccharomyces cerevisiae
5.2.1 Performance of the coupled NH-DBN model
In this subsection we compare the three NH-DBN methods (see Fig. 5 and Table 5) on the gene expression profiles from Saccharomyces cerevisiae, described in Sect. 3.2. Here we also know the true regulatory network, shown in Fig. 4, so that we can objectively cross-compare the network reconstruction accuracy on real biological data. Unlike our earlier data analysis in Sect. 5.1 we now follow an unsupervised approach and assume the segmentations (changepoint sets) to be unknown. That is, the changepoint sets have to be inferred from the data. To obtain different data segmentations we run MCMC simulations (with 10k iterations each) for various hyperparameters of the point process prior on the changepoint locations. As described in Sect. 2.2.2, the distance between changepoints is assumed to follow a negative binomial distribution, and we use the hyperparameters k=1 and p∈{0,0.001,0.01,0.02,0.03,0.04,0.1,0.2,0.3,0.4} in (36).
For the synthetic RAF pathway data we found in Sect. 5.1 that the three methods are robust with respect to a variation of the level-3-hyperparameters for the hyperprior on B_{σ}, and we therefore use the weakest prior on B_{σ}.^{24} For the level-3 hyperparameters on B_{δ} we again choose four different settings.^{25}
For the two weak priors on B_{δ} in the bottom row of Fig. 12 the network reconstruction accuracy (measured in terms of AUC-ROC scores) for all three methods is substantially worse than for the stronger priors. Although the coupled NH-DBN model still performs better than the uncoupled NH-DBN model, it appears that its performance does not depend on the average number of changepoints. That is, independently of the inferred average number of changepoints \(\overline {K}\) the mean AUC-ROC values of the coupled NH-DBN model are not better than the AUC-ROC values of a conventional homogeneous DBN without changepoints (\(\overline {K}=0\)). Consistent with the findings reported for the synthetic RAF pathway data in Sect. 3.1, it can also be seen from the bottom row in Fig. 12 that the advanced MCMC sampling performs (at least slightly) better than the original MCMC sampling scheme for the two weak priors on B_{δ}.
Overall, our findings for the Saccharomyces cerevisiae time series data are very similar to those observed for the synthetic RAF pathway data in Sect. 5.1. The coupled NH-DBN yields a significantly higher network reconstruction accuracy than the uncoupled NH-DBN. The advanced MCMC sampling performs (here: at least slightly) better than the original MCMC sampling scheme. The results are robust with respect to a variation of the level-3 hyperparameters, unless the prior on B_{δ} is too weak (diffuse) and yields attractor regions in the configuration space of the Markov chain.
5.2.2 Comparison with a sequentially coupled NH-DBN
Because of the temporal structure (switch of the carbon source in the middle of the experiment), the Saccharomyces cerevisiae time series is well suited to conduct a comparative evaluation of the network reconstruction accuracy between the proposed globally coupled NH-DBN and the sequentially coupled NH-DBN (Grzegorczyk and Husmeier 2012a). Unlike the globally coupled NH-DBN, the sequentially coupled NH-DBN model is based on the assumption that the interaction parameters at any time segment are similar to those at the previous time segment, i.e., there is coupling between adjacent time segments only. A brief mathematical description of the sequentially coupled NH-DBN and the empirical results of our cross-method comparison have been relegated to Sect. 4 of Online Resource 2. Our findings (see Figs. 11–12 in Online Resource 2) suggest that the globally coupled NH-DBN performs significantly better than the sequentially coupled NH-DBN model (Grzegorczyk and Husmeier 2012a) with respect to two figures of merit: First, it yields significantly higher maximal AUC scores (AUC-ROC and AUC-PR) than the sequentially coupled NH-DBN.^{28} Second, the degradation of the AUC scores for more changepoints is less pronounced for the globally coupled NH-DBN, indicating increased robustness with respect to a variation of the prior assumptions on the segmentation and mitigating the effect of over-fitting as a consequence of potential model over-flexibility.
A possible explanation for this improvement in performance can be gleaned from (2) in Online Resource 2. The information coupling for the model proposed in Grzegorczyk and Husmeier (2012a) is of the form of a Bayesian filter, and (2) in Online Resource 2 corresponds to a diffusion process. Time series generated from this model are intrinsically unstable, i.e., non-stationary with monotonically increasing variance. This is inconsistent with the actual data observed, and is avoided by the model proposed in the present work. A second advantage in performance is related to the way the uncoupled model is obtained as a limiting case of the coupled one. For the model proposed in the present work this is effected by a peaked distribution of m_{g} in (43) and (46), respectively, so that m_{g} effectively becomes fixed. As seen from Fig. 2, a fixed value of m_{g} implies d-separation between the w_{g,h}’s, i.e., the absence of coupling. Note that this effectively reduces to a hierarchical Bayesian model with one fewer layer of hyperparameters, and does not cause any problems with instability. For the sequentially coupled model proposed in Grzegorczyk and Husmeier (2012a), on the other hand, the strength of coupling decreases with increasing values for λ_{g} in (1)–(2) in Online Resource 2, but increasing λ_{g} also implies an ever increasing degree of instability. Hence, a principled shortcoming of the model proposed in Grzegorczyk and Husmeier (2012a) is a systematic dependence between coupling strength and instability, and this problem is averted by the globally coupled model proposed in the present work.
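The instability argument can be made tangible with a scalar sketch. Equation (2) of Online Resource 2 is not reproduced here, so the sequential coupling is assumed to be a Gaussian random walk over segments, w_h = w_{h−1} + e_h, while the global coupling is assumed to draw every w_h around a common m. These forms, the function names and the numerical values are assumptions for illustration.

```python
import random
from statistics import pvariance


def sequential_prior_variance(h, var_first, step_var):
    """Prior variance of w_h under a random walk: grows linearly in h."""
    return var_first + (h - 1) * step_var


def global_prior_variance(var_m, spread_var):
    """Prior variance of w_h under global coupling: the same for every h."""
    return var_m + spread_var


def simulate_sequential(n_segments, var_first, step_var, n_runs, rng):
    """Empirical variance per segment from simulated random-walk paths."""
    paths = []
    for _ in range(n_runs):
        w = rng.gauss(0.0, var_first ** 0.5)
        path = [w]
        for _ in range(n_segments - 1):
            w = w + rng.gauss(0.0, step_var ** 0.5)
            path.append(w)
        paths.append(path)
    return [pvariance([p[h] for p in paths]) for h in range(n_segments)]


if __name__ == "__main__":
    empirical = simulate_sequential(10, 1.0, 0.5, 2000, random.Random(0))
    for h, v in enumerate(empirical, start=1):
        print(h, round(v, 2), sequential_prior_variance(h, 1.0, 0.5),
              global_prior_variance(1.0, 0.5))
```

In this sketch a larger step variance (weaker sequential coupling) makes the variance grow faster, which mirrors the dependence between coupling strength and instability noted above.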
5.3 Gene regulation in Arabidopsis thaliana
In this subsection we apply the proposed coupled NH-DBN model with the advanced MCMC sampling scheme from Sect. 2.2.4 (with 10k MCMC iterations) to the gene expression time series from Arabidopsis thaliana, described in Sect. 3.3. To focus on the relevant task, the regulatory network reconstruction, we kept the changepoints fixed at their known true values. However, it can be seen from Fig. 6 in Sect. 2 of Online Resource 2 that the three changepoints between the four time series in Table 4 can also be inferred from the data. Table 1 in Online Resource 2 provides correlation coefficients of the marginal edge posterior probabilities extracted from the supervised approach (with fixed changepoints) and the unsupervised approaches (with changepoint inference); see Sect. 2 of Online Resource 2 for more details.
6 Conclusion
Modeling non-homogeneous dynamic Bayesian networks (NH-DBNs) with a multiple changepoint process is popular due to the fact that conditional on the changepoints, the marginal likelihood can be computed in closed form. To our knowledge, all previous studies, including Lèbre (2007), Robinson and Hartemink (2009, 2010), Lèbre et al. (2010), Dondelinger et al. (2010, 2012), Husmeier et al. (2010), and Grzegorczyk and Husmeier (2011) compute the marginal likelihood under the assumption of parameter independence and the same independent parameter prior distributions for all time series segments. These approaches ignore the fact that many systems, e.g. regulatory networks and signaling pathways in the cell, adapt to changing internal and external conditions gradually. To allow for information sharing among separate time series segments we have proposed a novel regularized NH-DBN with a coupling mechanism in the sense that a priori the interaction parameters associated with separate time series segments are encouraged to be similar. Our empirical assessment on simulated data has revealed that the proposed method leads to an improvement in the network reconstruction accuracy. For time series from real time (RT) polymerase chain reaction (PCR) experiments in Saccharomyces cerevisiae, we have demonstrated that the novel NH-DBN also yields a better network reconstruction accuracy than the uncoupled NH-DBN, and that it leads to increased robustness with respect to a variation of the prior assumptions about the temporal heterogeneity. We have quantified the effect of the regularization for gene expression time series from Arabidopsis thaliana.
With the present paper we have expanded and improved an earlier conference paper (Grzegorczyk and Husmeier 2012b) in six important aspects. Firstly, due to a strict page limit, the presentation of the methodology in Grzegorczyk and Husmeier (2012b) is very terse, and we have offered a more comprehensive and self-contained exposition (see, e.g., Fig. 2, Table 3). Secondly, we have extended the NH-DBN model from Grzegorczyk and Husmeier (2012b) by introducing an extra (level-3) layer to the hierarchy of the proposed model, which allows for information-sharing among the nodes in the network. As is common with Bayesian hierarchical models, the proposed model depends on various hyperparameters. While the hyperparameters of each node were modeled independently in the original model, the extended model hierarchically couples the node-specific noise variances and the node-specific coupling strengths between the segment-specific interaction parameters (see (30)–(33) in Sect. 2.2.1). We have also presented nine different hierarchical coupling schemes for the noise variance hyperparameters (see Table 2). Thirdly, we have introduced a novel collapsed Gibbs sampling step (see (46) in Sect. 2.2.4; the derivation is provided in Sect. 2 of Online Resource 1), which replaces a less efficient uncollapsed Gibbs sampling step of the original MCMC algorithm (see (43) in Sect. 2.2.3). Fourthly and most importantly, we have shown how the novel collapsed Gibbs sampling step and blocking techniques can be exploited for developing a novel advanced MCMC algorithm (see Sect. 2.2.4). We have empirically demonstrated that the advanced MCMC algorithm performs significantly better than the original MCMC sampling scheme from Grzegorczyk and Husmeier (2012b) in terms of convergence and mixing (see, e.g., Fig. 11 in Sect. 5.1), and thus in practice often also yields a higher network reconstruction accuracy (see, e.g., Fig. 7 in Sect. 5.1 or Fig. 12 in Sect. 5.2.1).
Fifthly, in the data analysis we have systematically varied the (hyper-)hyperparameters of those (hyper-)priors that are important for the noise variances and coupling strengths among segments and we have investigated their influence on the performance. Our empirical findings indicate that vague level-3 hyperpriors may lead to extreme attractor states in the MCMC configuration space, as a consequence of which the coupled NH-DBN effectively reduces to a conventional DBN. Our study has provided clear graphical diagnostic tools that allow the user to identify this problem (see Figs. 9, 13, and 14(a)). Also, for sufficiently non-diffuse hyperpriors, this problem can be avoided altogether: our study has indicated that the proposed model is robust with respect to a variation of the level-3 hyperparameters, as long as diffuse hyperpriors are avoided. Sixthly, in Sect. 5.2.2 we have shown that the proposed globally coupled NH-DBN outperforms the sequentially coupled NH-DBN, proposed in Grzegorczyk and Husmeier (2012a), on expression time series from a synthetic biology study in which a synthetically designed Saccharomyces cerevisiae strain is exposed to a change of nutrients in its environment. The better performance seems to result from two methodological improvements, which are related to the avoidance of intrinsic instability and a more natural way of how the coupling scheme includes the uncoupled model as a limiting case (see Sect. 5.2.2).
Footnotes
- 1.
The most important results of our pre-study have been relegated to Sect. 3 of Online Resource 2, and we refer to these results in Sect. 5.1.
- 2.
Since we are modeling gene regulatory processes with NH-DBN models which have node-specific changepoints, the three coupling schemes (S2), (S3), and (S6) from Table 2 are not suitable. Node-specific changepoints imply that there is a separate segmentation for each gene. Consequently, there are gene-specific h-th segments which may represent different or even disjunct time intervals of the gene regulatory process.
- 3.
A priori we have: \(\mathit{CV}(\sigma_{g}^{-2}) :=\frac{E[\sigma_{g}^{-2}]}{\sqrt{\operatorname {Var}(\sigma_{g}^{-2})}}=\sqrt{A_{\sigma}}\) and \(\mathit{CV}(\delta_{g}^{-1}):=\frac{E[\delta_{g}^{-1}]}{\sqrt{\operatorname {Var}(\delta_{g}^{-1})}}=\sqrt{A_{\delta}}\).
- 4.
Note that the negative binomial distribution can be seen as a discrete version of the Gamma distribution.
- 5.
Given a time series of length T we have \(\tilde{n}=T-2\) possible changepoint locations. In a DBN with lag 1 the first time point must be removed, since no preceding time point is available. The last time point is no candidate for a changepoint either, since there are no observations after time point T which could be allocated to a new segment.
- 6.
If we impose an upper limit on the numbers of changepoints per node, K_{g}−1 a priori follows a truncated binomial distribution.
- 7.
- 8.
Dynamic noise systematically increases the variances of the signals for subsequent time points. From (50) it can be seen that adding (dynamic) noise (via u_{g,t}) at time point t increases the expected variance of the variables at time point t, \(\mathcal{D}_{g,t}\), which serve as signals for the next time point t+1. That is, strong dynamic noise injections increase the variances of the variables in \(\mathcal{D}_{g,t}\) and the signal-to-noise ratio gets weaker over time.
- 9.
We used RMA rather than GCRMA for reasons discussed in Lim et al. (2007).
- 10.
- 11.
With this setting of the hyperparameters, A_{σ}=0.005 and E[B_{σ}]=0.005, we follow Lèbre et al. (2010) and Grzegorczyk and Husmeier (2012b). In Grzegorczyk and Husmeier (2012b) we set \(A_{\sigma}=B_{\sigma}=\frac{\nu }{2}\) with ν=0.01. Note that we also briefly investigate the robustness with respect to the level-2 hyperparameters. In a study in Sect. 5.1 we employ fixed level-2 hyperparameters: (A_{σ},B_{σ})∈{(0.0005,0.0005),(0.005,0.005),(0.05,0.05)}.
- 12.
This setting (A_{δ}=2 and E[B_{δ}]=0.2) is motivated by earlier studies (Lèbre et al. 2010; Grzegorczyk and Husmeier 2012b). In Grzegorczyk and Husmeier (2012b) we set A_{δ}=2 and B_{δ}=0.2. Note that we also briefly investigate the robustness with respect to these level-2 hyperparameters; in a study in Sect. 5.1 we employ four pairs of fixed level-2 hyperparameters: (A_{δ},B_{δ})∈{(2,2),(2,0.2),(0.2,2),(0.2,0.2)}.
- 13.
\(\operatorname {Var}[B_{\delta}]=\frac{\alpha_{\delta}}{\beta_{\delta}^{2}}\in\{ 0.0002,0.002,0.02,0.2\}\).
- 14.
- 15.
- 16.
- 17.
As in Grzegorczyk and Husmeier (2012b) we set A_{σ}=0.005 and A_{δ}=2 in (30)–(31), and we consider 12 combinations of the level-3 hyperparameters: (α_{σ},β_{σ})∈{(1,200),(0.1,20),(0.01,2)} and (α_{δ},β_{δ})∈{(200,1000),(20,100),(2,10),(0.2,1)}. Note that all settings a priori ensure: E[B_{σ}]=0.005 and E[B_{δ}]=0.2 (as in Grzegorczyk and Husmeier (2012b)), while the “strengths” (variances) of the priors vary; see Sect. 4 for details.
- 18.
For small amplitudes of the perturbations, (ϵ≈0), the segment-specific interaction parameter vectors are similar. The relationships between nodes can then be adequately approximated by a homogeneous DBN.
- 19.
\(B_{\delta}\sim \operatorname {Gam}(20,100)\) and \(B_{\delta}\sim \operatorname {Gam}(0.2,1)\) in (33).
- 20.
Note that for each ϵ the five individual data sets led to very similar results.
- 21.
- 22.
For the parent flip move the original MCMC sampling scheme also yields lower acceptance probabilities than the advanced MCMC sampling scheme: If the flip move proposes to replace a “suboptimal” parent node with a “more suitable” new parent node, i.e., to move from \([\pi_{g}^{(i)},\mathbf {m}_{g}^{(i)}]\) to \([\pi_{g}^{(\diamond)},\mathbf {m}_{g}^{(\diamond)}]\), then the component of the suboptimal parent node in \(\mathbf {m}_{g}^{(i)}\) was sampled according to its posterior distribution earlier in the MCMC simulation. The original MCMC sampler, which samples the component of the new parent node in \(\mathbf {m}_{g}^{(\diamond)}\) from its prior distribution, yields a lower acceptance probability than the advanced MCMC sampler, which re-samples \(\mathbf {m}_{g}^{(\diamond)}\) conditional on \(\pi_{g}^{(\diamond)}\) from its posterior distribution (see (46)).
- 23.
- 24.
- 25.
- 26.
For each gene, the mean of the posterior distribution of the number of changepoints was determined, and these values were averaged over all genes to obtain the average number of changepoints per gene, \(\overline {K}\).
- 27.
We have: \(\mathbf {w}_{g,h}^{(i)}\rightarrow \mathbf {m}_{g}^{(i)}\) (\(h=1,\ldots,K_{g}^{(i)}\)) for \(\delta_{g}^{(i)}\rightarrow0\), and this (quasi-)homogeneity also explains why the AUC-ROC scores for the coupled NH-DBN in the bottom row of Fig. 12 do not depend on the average number of changepoints, \(\overline {K}\).
- 28.
Recall that the highest AUC scores are reached for about one changepoint per gene (\(\overline {K}\approx1\)), reflecting the carbon source switch; see Sect. 3.2 for details.
- 29.
- 30.
If the changepoints are known, as assumed in Sect. 2.2.1, we keep them fixed throughout the whole MCMC simulation, i.e., we set \(\mbox {{\boldmath $\tau $}}_{g}^{(i)}=\mbox {{\boldmath $\tau $}}_{g}\) for each g and for all MCMC iterations i.
- 31.
The parent-node flip move was introduced in Grzegorczyk and Husmeier (2011). It randomly chooses a parent node, \(u\in\pi_{g}^{(i)}\), from the current parent node set, \(\pi_{g}^{(i)}\), randomly chooses a node, \(v\notin\pi_{g}^{(i)}\), that is currently not a parent of node g, and replaces the current parent node u with the new parent node v.
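The proposal step of the parent-node flip move can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `propose_parent_flip`, its arguments, and the node labels are hypothetical, and both nodes are assumed to be drawn uniformly at random. Only the structure proposal is shown; the handling of the parameter vector \(\mathbf {m}_{g}\) and the acceptance step are omitted.

```python
import random


def propose_parent_flip(parents, nodes, rng):
    """Propose a new parent set by exchanging one parent for one non-parent.

    parents: set of current parent nodes of the target node g (hypothetical labels)
    nodes:   iterable of all nodes admissible as parents of g (assumption:
             the caller decides whether g itself is admissible)
    rng:     a random.Random instance, passed in for reproducibility
    Returns (u, v, proposal), or None if the move is not possible.
    """
    current = sorted(parents)                     # candidates for removal
    outside = sorted(set(nodes) - set(parents))   # candidates for addition
    if not current or not outside:
        return None                               # nothing to flip
    u = rng.choice(current)                       # parent to be removed
    v = rng.choice(outside)                       # new parent to be added
    proposal = (set(parents) - {u}) | {v}         # same size as the old set
    return u, v, proposal


rng = random.Random(0)
result = propose_parent_flip({"A", "B"}, ["A", "B", "C", "D"], rng)
empty_result = propose_parent_flip(set(), ["A", "B", "C", "D"], rng)
```

Because the move leaves the size of the parent set unchanged, a uniform draw of u and v has the same probability in the forward and the reverse direction under the assumptions above.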
Notes
Acknowledgements
Marco Grzegorczyk is supported by the German Research Foundation (DFG), research grant GR3853/1-1. The work described in this article was partly carried out under the “Timet” project, funded by an EU FP7 grant.
References
- Ahmed, A., & Xing, E. (2009). Recovering time-varying networks of dependencies in social and biological studies. Proceedings of the National Academy of Sciences, 106, 11878–11883.
- Alabadi, D., Oyama, T., Yanovsky, M., Harmon, F., Mas, P., & Kay, S. (2001). Reciprocal regulation between TOC1 and LHY/CCA1 within the Arabidopsis circadian clock. Science, 293, 880–883.
- Albert, R. (2005). Scale-free networks in cell biology. Journal of Cell Science, 118, 4947–4957.
- Andrieu, C., Davy, M., & Doucet, A. (2003). Efficient particle filtering for jump Markov systems. Application to time-varying autoregressions. IEEE Transactions on Signal Processing, 51, 1762–1770.
- Andrieu, C., & Doucet, A. (1999). Joint Bayesian model selection and estimation of noisy sinusoids via reversible jump MCMC. IEEE Transactions on Signal Processing, 47, 2667–2676.
- Bishop, C. M. (2006). Pattern recognition and machine learning. Singapore: Springer.
- Cantone, I., Marucci, L., Iorio, F., Ricci, M., Belcastro, V., Bansal, M., Santini, S., di Bernardo, M., di Bernardo, D., & Cosma, M. (2009). A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell, 137, 172–181.
- McClung, C. R. (2006). Plant circadian rhythms. The Plant Cell, 18, 792–803.
- Dondelinger, F., Lèbre, S., & Husmeier, D. (2010). Heterogeneous continuous dynamic Bayesian networks with flexible structure and inter-time segment information sharing. In J. Furnkranz & T. Joachims (Eds.), Proceedings of the international conference on machine learning (ICML), Madison, WI, USA (pp. 303–310).
- Dondelinger, F., Lèbre, S., & Husmeier, D. (2012). Non-homogeneous dynamic Bayesian networks with Bayesian regularization for inferring gene regulatory networks with gradually time-varying structure. Machine Learning. doi:10.1007/s10994-012-5311-x.
- Edwards, K., Anderson, P., Hall, A., Salathia, N., Locke, J., Lynn, J., Straume, M., Smith, J., & Millar, A. (2006). Flowering locus C mediates natural variation in the high-temperature response of the Arabidopsis circadian clock. The Plant Cell, 18, 639–650.
- Fearnhead, P. (2006). Exact and efficient Bayesian inference for multiple changepoint problems. Statistics and Computing, 16, 203–213.
- Friedman, N., & Koller, D. (2003). Being Bayesian about network structure. Machine Learning, 50, 95–126.
- Gelman, A., & Rubin, D. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–472.
- Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2004). Bayesian data analysis (2nd ed.). London: Chapman and Hall/CRC.
- Giudici, P., & Castelo, R. (2003). Improving Markov chain Monte Carlo model search for data mining. Machine Learning, 50, 127–158.
- Green, P. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711–732.
- Grzegorczyk, M., & Husmeier, D. (2011). Non-homogeneous dynamic Bayesian networks for continuous data. Machine Learning, 83, 355–419.
- Grzegorczyk, M., & Husmeier, D. (2012a). A non-homogeneous dynamic Bayesian network with sequentially coupled interaction parameters for applications in systems and synthetic biology. Statistical Applications in Genetics and Molecular Biology, 11, 7.
- Grzegorczyk, M., & Husmeier, D. (2012b). Bayesian regularization of non-homogeneous dynamic Bayesian networks by globally coupling interaction parameters. In N. Lawrence & M. Girolami (Eds.), JMLR: W&CP: Vol. 22. Proceedings of the 15th international conference on artificial intelligence and statistics (AISTATS) (pp. 467–476).
- Grzegorczyk, M., Husmeier, D., Edwards, K., Ghazal, P., & Millar, A. (2008). Modelling non-stationary gene regulatory processes with a non-homogeneous Bayesian network and the allocation sampler. Bioinformatics, 24, 2071–2078.
- Hill, M. (2012). Sparse graphical models for cancer signalling. PhD thesis, Warwick University.
- Husmeier, D., Dondelinger, F., & Lèbre, S. (2010). Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks. In J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, & A. Culotta (Eds.), Proceedings of the 24th annual conference on neural information processing systems (NIPS) (pp. 901–909). Curran Associates.
- Johnson, C., Elliott, J., & Foster, R. (2003). Entrainment of circadian programs. Chronobiology International, 20, 741–774.
- Kikis, E., Khanna, R., & Quail, P. (2005). ELF4 is a phytochrome-regulated component of a negative-feedback loop involving the central oscillator components CCA1 and LHY. Plant Journal, 44, 300–313.
- Kolar, M., Song, L., & Xing, E. (2009). Sparsistent learning of varying-coefficient models with structural changes. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), Advances in neural information processing systems (NIPS) (Vol. 22, pp. 1006–1014).
- Lèbre, S. (2007). Stochastic process analysis for genomics and dynamic Bayesian networks inference. PhD thesis, Université d’Evry-Val-d’Essonne, France.
- Lèbre, S., Becq, J., Devaux, F., Lelandais, G., & Stumpf, M. (2010). Statistical inference of the time-varying structure of gene-regulation networks. BMC Systems Biology, 4.
- Liang, F., Liu, C., & Carroll, R. (2010). Wiley series in computational statistics. Advanced Markov chain Monte Carlo methods: learning from past samples. Cornwall: Wiley.
- Lim, W., Wang, K., Lefebvre, C., & Califano, A. (2007). Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks. Bioinformatics, 23, i282–i288.
- Lindley, D. (1962). Discussion on the article by Stein. Journal of the Royal Statistical Society. Series B. Methodological, 24, 265–296.
- Locke, J., Southern, M., Kozma-Bognar, L., Hibberd, V., Brown, P., Turner, M., & Millar, A. (2005). Extension of a genetic network model by iterative experimentation and mathematical analysis. Molecular Systems Biology, 1 (online).
- Mockler, T. C., Michael, T. P., Priest, H. D., Shen, R., Sullivan, C. M., Givan, S. A., McEntee, C., Kay, S. A., & Chory, J. (2007). The diurnal project: diurnal and circadian expression profiling, model-based pattern matching and promoter analysis. Cold Spring Harbor Symposia on Quantitative Biology, 72, 353–363.
- Moulines, E., Priouret, P., & Roueff, F. (2005). On recursive estimation for time varying autoregressive processes. The Annals of Statistics, 33, 2610–2654.
- Punskaya, E., Andrieu, C., Doucet, A., & Fitzgerald, W. (2002). Bayesian curve fitting using MCMC with applications to signal segmentation. IEEE Transactions on Signal Processing, 50, 747–758.
- Robinson, J., & Hartemink, A. (2009). Non-stationary dynamic Bayesian networks. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems (NIPS) (Vol. 21, pp. 1369–1376). San Mateo: Morgan Kaufmann.
- Robinson, J., & Hartemink, A. (2010). Learning non-stationary dynamic Bayesian networks. Journal of Machine Learning Research, 11, 3647–3680.
- Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D., & Nolan, G. (2005). Protein-signaling networks derived from multiparameter single-cell data. Science, 308, 523–529.
- Stein, C. (1955). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proc. of the third Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 197–206). Berkeley: Berkeley University Press.
- Talih, M., & Hengartner, N. (2005). Structural learning with time-varying components: tracking the cross-section of financial time series. Journal of the Royal Statistical Society. Series B. Methodological, 67, 321–341.
- Wang, S., Cui, L., Cheng, S., Zhai, S., Yeary, M., & Wu, Q. (2011). Noise adaptive LDPC decoding using particle filtering. IEEE Transactions on Communications, 59, 913–916.
- Xuan, X. (2007). Bayesian inference on change point problems. Master’s thesis, The Faculty of Graduate Studies (Computer Science), The University of British Columbia, Vancouver.
- Xuan, X., & Murphy, K. (2007). Modeling changing dependency structure in multivariate time series. In Z. Ghahramani (Ed.), Proceedings of the 24th annual international conference on machine learning (ICML 2007) (pp. 1055–1062). Madison: Omnipress.