Regularization of non-homogeneous dynamic Bayesian networks with global information coupling based on hierarchical Bayesian models
Abstract
To relax the homogeneity assumption of classical dynamic Bayesian networks (DBNs), various recent studies have combined DBNs with multiple changepoint processes. The underlying assumption is that the parameters associated with time series segments delimited by multiple changepoints are a priori independent. Under weak regularity conditions, the parameters can be integrated out in the likelihood, leading to a closed-form expression of the marginal likelihood. However, the assumption of prior independence is unrealistic in many real-world applications, where the segment-specific regulatory relationships among the interdependent quantities tend to undergo gradual evolutionary adaptations. We therefore propose a Bayesian coupling scheme to introduce systematic information sharing among the segment-specific interaction parameters. We investigate the effect this model improvement has on the network reconstruction accuracy in a reverse engineering context, where the objective is to learn the structure of a gene regulatory network from temporal gene expression profiles. The objective of the present paper is to expand and improve an earlier conference paper in six important aspects. Firstly, we offer a more comprehensive and self-contained exposition of the methodology. Secondly, we extend the model by introducing an extra layer to the model hierarchy, which allows for information sharing among the network nodes, and we compare various coupling schemes for the noise variance hyperparameters. Thirdly, we introduce a novel collapsed Gibbs sampling step, which replaces a less efficient uncollapsed Gibbs sampling step of the original MCMC algorithm. Fourthly, we show how collapsing and blocking techniques can be used for developing a novel advanced MCMC algorithm with significantly improved convergence and mixing. Fifthly, we systematically investigate the influence of the (hyper-)hyperparameters of the proposed model. Sixthly, we empirically compare the proposed global information coupling scheme with an alternative paradigm based on sequential information sharing.
Keywords
Non-homogeneous dynamic Bayesian networks · Gene regulatory networks · Bayesian regularization · Bayesian multiple changepoint processes · Reversible jump Markov chain Monte Carlo

1 Introduction
There is considerable interest in structure learning of dynamic Bayesian networks (DBNs), with a variety of applications in computational systems biology. However, the standard assumption underlying DBNs—that time series have been generated from a homogeneous Markov process—is too restrictive in many applications and can potentially lead to artifacts and erroneous conclusions. While there have been various efforts to relax the homogeneity assumption for undirected graphical models (Talih and Hengartner 2005; Xuan and Murphy 2007), relaxing this restriction in DBNs is a more recent research topic (Lèbre 2007; Robinson and Hartemink 2009, 2010; Ahmed and Xing 2009; Kolar et al. 2009; Lèbre et al. 2010; Dondelinger et al. 2010, 2012; Husmeier et al. 2010; Grzegorczyk and Husmeier 2011). Various authors have proposed relaxing the homogeneity assumption by complementing the traditional homogeneous DBN with a Bayesian multiple changepoint process (Lèbre 2007; Robinson and Hartemink 2009, 2010; Lèbre et al. 2010; Dondelinger et al. 2010, 2012; Husmeier et al. 2010; Grzegorczyk and Husmeier 2011). Each time series segment defined by two demarcating changepoints is associated with separate node-specific DBN parameters, and in this way the conditional probability distributions are allowed to vary from segment to segment. An attractive feature of this approach is that under certain regularity conditions, most notably parameter independence and conjugacy of the prior, the parameters can be integrated out in closed form in the likelihood. The inference task thus reduces to sampling the network structure as well as the number and location of changepoints from the posterior distribution, which can be effected with reversible jump Markov chain Monte Carlo (RJMCMC) (Green 1995), e.g., as in Lèbre et al. (2010) or Robinson and Hartemink (2010), or with dynamic programming (Fearnhead 2006), as in Grzegorczyk and Husmeier (2011).
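To make the integration step concrete, the following sketch computes the log marginal likelihood of a single segment for a linear-Gaussian model with conjugate priors: a zero-mean Gaussian prior on the regression parameters with signal-to-noise factor delta, and an inverse-gamma prior on the noise variance, so that both the parameters and the noise variance are integrated out analytically. This is a generic illustration of the principle, not the paper's exact parameterization; the hyperparameter names a, b and delta are illustrative.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_likelihood(y, X, delta=1.0, a=2.0, b=0.2):
    """Log marginal likelihood of one segment of the model
    y = X w + eps, eps ~ N(0, sigma^2 I), with conjugate priors
    w | sigma^2 ~ N(0, delta sigma^2 I) and sigma^{-2} ~ Gam(a, b);
    both w and sigma^2 are integrated out analytically."""
    n = len(y)
    K = np.eye(n) + delta * (X @ X.T)        # marginal covariance, up to sigma^2
    _, logdet_K = np.linalg.slogdet(K)
    quad = y @ np.linalg.solve(K, y)         # y^T K^{-1} y
    return (gammaln(a + n / 2) - gammaln(a) + a * np.log(b)
            - (n / 2) * np.log(2 * np.pi) - 0.5 * logdet_K
            - (a + n / 2) * np.log(b + 0.5 * quad))
```

The resulting marginal is a multivariate Student-t distribution with 2a degrees of freedom and scale matrix (b/a)(I + delta·XXᵀ); candidate structures and changepoint configurations can then be scored by summing such closed-form segment scores.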
In many real-world applications, the assumption of parameter independence is questionable, though. Consider the cellular processes during an organism's development (morphogenesis) or its adaptation to changing environmental conditions. The assumption of a homogeneous process with constant parameters is over-restrictive in that it fails to allow for the non-stationary nature of the processes. However, complete parameter independence is over-flexible in that it ignores the evolutionary aspect of adaptation processes, where the majority of segment-specific regulatory relationships among the interdependent quantities tend to undergo minor and gradual adaptations. Given a regulatory network at some time interval in an organism's life cycle, it is unrealistic to assume that at the adjacent time intervals, nature has reinvented different regulatory circuits from scratch. Instead, we would assume that the knowledge of the interaction strengths at other time intervals will improve the inference of the interaction strengths associated with the given time interval, especially for sparse data. In what follows, we will describe how this idea can be implemented in the model, and which adaptations are required for the inference scheme.
There are various articles from the signal processing community that are related to our work. Our hierarchical Bayesian model structure is similar to the one proposed in Punskaya et al. (2002). However, in Punskaya et al. (2002) information is only shared among different parameter vectors via a common scalar scale hyperparameter, which does not provide the sort of more explicit information sharing motivated by our discussion above. Like the model in Punskaya et al. (2002), our model is based on a switching piecewise homogeneous autoregressive process, whereas the models in Andrieu et al. (2003), Moulines et al. (2005), and Wang et al. (2011) are based on continuously time-varying autoregressive processes. Like our paper, Moulines et al. (2005) and Wang et al. (2011) introduce information sharing between consecutive regression parameter vectors; this is only achieved indirectly in Andrieu et al. (2003) via a nonlinear transformation into the space of complex-valued poles. Moulines et al. (2005) is a theoretical non-Bayesian paper on error bounds under a Lipschitz condition. A closer relative to our paper is the method of Wang et al. (2011), whose objective is online parameter estimation via particle filtering, with applications e.g. in tracking. This is a different scenario from most systems biology applications, where an interaction structure is typically learnt offline after completion of the experiments. Unlike Wang et al. (2011), our work thus follows other applications of DBNs in systems biology (Lèbre et al. 2010; Robinson and Hartemink 2009, 2010; Dondelinger et al. 2010, 2012; Husmeier et al. 2010; Grzegorczyk and Husmeier 2011) and aims to infer the model structure by marginalizing out the parameters in closed form. To paraphrase this: while inference in Wang et al. (2011) is based on filtering, inference in our work is based on smoothing.
Table 1 Overview of time-varying dynamic Bayesian network models that have recently been proposed in the literature. Detailed explanations are given in the text

|  | Hard coupled network(s) | Weakly coupled networks | Weakly coupled networks | Uncoupled networks | Weakly coupled parameters |
|---|---|---|---|---|---|
| Literature reference(s) | Grzegorczyk and Husmeier (2011) | Dondelinger et al. (2010) or Robinson and Hartemink (2010) | Dondelinger et al. (2012) | Lèbre et al. (2010) | Proposed here |
| Network structures flexible? | No | Yes | Yes | Yes | No |
| Network coupling scheme | Network is kept fixed | Networks are sequentially coupled | Networks are globally coupled | Networks are not coupled | Network is kept fixed |
| Network parameters flexible? | Yes | Yes | Yes | Yes | Yes |
| Network parameters coupled? | No | No | No | No | Yes |
In a previous journal paper, we have proposed a model for sequential information sharing with respect to the interaction parameters (Grzegorczyk and Husmeier 2012a). In a previous conference article, we have proposed a model for global information sharing with respect to the interaction parameters (Grzegorczyk and Husmeier 2012b). The objective of the present work is sixfold. Firstly, due to a strict page limit, the presentation of the methodology in Grzegorczyk and Husmeier (2012b) is very terse, and we here offer a more comprehensive and self-contained exposition. In particular, in Grzegorczyk and Husmeier (2012b) we only briefly outlined the Gibbs sampling scheme for inference. Here we provide all technical details, including a graphical representation of the novel model and pseudocode for the inference algorithm. Secondly, neither the sequentially (Grzegorczyk and Husmeier 2012a) nor the globally (Grzegorczyk and Husmeier 2012b) coupled model allows for information sharing among the nodes in the network. Here, we extend the model from Grzegorczyk and Husmeier (2012b) by introducing an extra (level-3) layer to the hierarchy of the proposed model. While the hyperparameters of each node were modeled independently in the original models, the extended model hierarchically couples the node-specific noise variances and the node-specific coupling strengths between the segment-specific interaction parameters. Moreover, in our earlier works (Grzegorczyk and Husmeier 2012a, 2012b) we focused on node-specific variance hyperparameters which are shared by the node-specific time intervals. Here, we present nine different coupling schemes for the noise variance hyperparameters and empirically compare three of them. Thirdly, we introduce a novel collapsed Gibbs sampling step, which replaces a less efficient uncollapsed Gibbs sampling step of the original MCMC algorithms.
Fourthly and most importantly, we show how this novel collapsed Gibbs sampling step as well as blocking techniques can be used for developing a novel advanced MCMC algorithm. We empirically show that the advanced MCMC algorithm performs significantly better than the original MCMC sampling scheme from Grzegorczyk and Husmeier (2012b) in terms of convergence and mixing. In this context we also consider scenarios where the original MCMC sampling scheme fails to converge, so that the advanced MCMC sampling scheme also reaches a better network reconstruction accuracy. Fifthly, neither in Grzegorczyk and Husmeier (2012a) nor in Grzegorczyk and Husmeier (2012b) did we investigate the robustness of the proposed model with respect to a variation of the fixed (hyper-)hyperparameters, and we focused our attention on a single hyperparameter setting, which was taken from Lèbre et al. (2010). Here we systematically vary the (hyper-)hyperparameters of those (hyper-)priors that are important for the noise variances and coupling strengths among segments, and we investigate their influence on the performance. Sixthly, we conduct a comparative evaluation between the proposed global information coupling scheme and the alternative paradigm based on sequential information sharing (Grzegorczyk and Husmeier 2012a), and we discuss reasons for the potential fundamental improvement achieved with the new approach.
2 Mathematical details
2.1 Bayesian linear regression
2.2 Application to dynamic Bayesian networks
2.2.1 Fixed changepoints
Table 2 Overview of the coupling schemes (S1)–(S9) for the noise variance hyperparameters. No coupling: the noise variance hyperparameters are d-separated, i.e., they have separate level-2 hyperparameters, which are fixed. Weak coupling: the noise variance hyperparameters are not d-separated, i.e., they share a set of common level-2 hyperparameters, which are flexible. Hard coupling: there are common noise variance hyperparameters (with fixed level-2 hyperparameters)

| Segments h=1,…,K_g \ Nodes g=1,…,N | No coupling | Weak coupling | Hard coupling |
|---|---|---|---|
| No coupling | (S1) \(\sigma_{g,h}^{2}\sim \operatorname{Gam}(A_{\sigma,g,h},B_{\sigma,g,h})\); A_{σ,g,h} and B_{σ,g,h} fixed | (S2) \(\sigma_{g,h}^{2}\sim \operatorname{Gam}(A_{\sigma,h},B_{\sigma,h})\); A_{σ,h} and/or B_{σ,h} flexible, i.e. \(\{\sigma_{g,h}^{2}\}_{g}\) coupled ∀h | (S3) \(\sigma_{g,h}^{2}=\sigma_{h}^{2}\), \(\sigma_{h}^{2}\sim \operatorname{Gam}(A_{\sigma,h},B_{\sigma,h})\); A_{σ,h} and B_{σ,h} fixed |
| Weak coupling | (S4) \(\sigma_{g,h}^{2}\sim \operatorname{Gam}(A_{\sigma,g},B_{\sigma,g})\); A_{σ,g} and/or B_{σ,g} flexible, i.e. \(\{\sigma_{g,h}^{2}\}_{h}\) coupled ∀g | (S5) \(\sigma_{g,h}^{2}\sim \operatorname{Gam}(A_{\sigma},B_{\sigma})\); A_{σ} and/or B_{σ} flexible, i.e. \(\{\sigma_{g,h}^{2}\}_{g,h}\) coupled | (S6) \(\sigma_{g,h}^{2}=\sigma_{h}^{2}\), \(\sigma_{h}^{2}\sim \operatorname{Gam}(A_{\sigma},B_{\sigma})\); A_{σ} and/or B_{σ} flexible, i.e. \(\{\sigma_{h}^{2}\}_{h}\) coupled |
| Hard coupling | (S7) \(\sigma_{g,h}^{2}=\sigma_{g}^{2}\), \(\sigma_{g}^{2}\sim \operatorname{Gam}(A_{\sigma,g},B_{\sigma,g})\); A_{σ,g} and B_{σ,g} fixed | (S8) \(\sigma_{g,h}^{2}=\sigma_{g}^{2}\), \(\sigma_{g}^{2}\sim \operatorname{Gam}(A_{\sigma},B_{\sigma})\); A_{σ} and/or B_{σ} flexible, i.e. \(\{\sigma_{g}^{2}\}_{g}\) coupled | (S9) \(\sigma_{g,h}^{2}=\sigma^{2}\), \(\sigma^{2}\sim \operatorname{Gam}(A_{\sigma},B_{\sigma})\); A_{σ} and B_{σ} fixed |
A systematic comparative evaluation of the coupling schemes (S1)–(S9) from Table 2 is confounded by the dependence of the performance of these methods on the choice of the level-2 hyperparameters and the level-3 hyperpriors. We therefore decided to select scheme (S8), based on the following four facts. First, for our applications to gene regulatory networks we would expect the differences among nodes (genes) to be more substantial than the differences among (time) segments for the same node (gene), which suggests a natural hierarchy of the strength of the coupling. Second, in explorative simulations, which we carried out for our earlier conference paper (Grzegorczyk and Husmeier 2012b), we obtained slightly better results with the "no coupling for the nodes, hard coupling for the segments" scheme (S7) than with the "fully flexible" scheme (S1), which suggests that segment-specific noise variance hyperparameters lead to over-flexibility. Third, with coupling scheme (S8) the signal-to-noise hyperparameters, δ_g, as well as the noise variance hyperparameters, \(\sigma_{g}^{2}\), are both gene- but not segment-specific. Thus, both types of hyperparameters can consistently (symmetrically) be weakly coupled for the nodes. Fourth and most importantly, in an explorative pre-study for this paper we implemented the NH-DBN models with schemes (S8), (S4), and (S5), and for synthetic data we empirically found that coupling scheme (S8) performs consistently better than coupling schemes (S4) and (S5).^{1,2}
Table 3 Overview of the (hyper)parameters and symbols that have been introduced

| Symbol | Explanation |
|---|---|
| g | The g-th network node (g=1,…,N) |
| K_g | The number of segments for node g |
| h | The h-th time segment (h=1,…,K_g) |
| \(\mathcal{M}\) | The network structure, \(\mathcal{M}=\{\pi_{1},\ldots,\pi_{N}\}\) |
| \(\sigma_{g}^{2}\) | The noise variance hyperparameter for node g; see (16) |
| δ_g | The signal-to-noise hyperparameter for node g; see (17); \(\delta_{g}^{-1}\) is the "coupling strength" in the coupled NH-DBN |
| π_g | The parent node set of node g |
| \(\mathcal{F}\) | The fan-in restriction: \(|\pi_{g}|\leq\mathcal{F}\) for all nodes g |
| \(\boldsymbol{\tau}_{g}\) | The set of changepoints, \(\boldsymbol{\tau}_{g}=\{\tau_{g,1},\ldots,\tau_{g,K_{g}-1}\}\), for node g |
| \(\mathbf{m}_{g}\) | The global interaction hyperparameter vector for node g |
| \(\mathbf{w}_{g,h}\) | The interaction parameter vector for the h-th segment of node g |
| \(\mathbf{y}_{g,h}\) | The target values of node g in segment h |
| \(\mathbf{X}_{\pi_{g},h}\) | The design matrix for segment h of node g |
| \(\mathbf{y}_{g,\boldsymbol{\tau}_{g}}\) | The set of target values, \(\{\mathbf{y}_{g,h}\}_{h=1,\ldots,K_{g}}\), implied by \(\boldsymbol{\tau}_{g}\) |
| \(\mathbf{w}_{g,\boldsymbol{\tau}_{g}}\) | The set of interaction parameter vectors, \(\{\mathbf{w}_{g,h}\}_{h=1,\ldots,K_{g}}\), implied by \(\boldsymbol{\tau}_{g}\) |
| \(\mathbf{X}_{\pi_{g},\boldsymbol{\tau}_{g}}\) | The set of design matrices, \(\{\mathbf{X}_{\pi_{g},h}\}_{h=1,\ldots,K_{g}}\), implied by \(\boldsymbol{\tau}_{g}\) |
| p and k | The hyperparameters of the negative binomial prior for the distance between changepoints, implying the changepoint sets \(\boldsymbol{\tau}_{g}\); see Sect. 2.2.2 |
| \(\mathbf{m}_{\dagger}\), \(\Sigma_{\dagger}\) | The level-2 hyperparameters of the Gaussian prior for \(\mathbf{m}_{g}\); see (12) |
| \(A_{\sigma}\), \(B_{\sigma}\) | The level-2 hyperparameters of the Gamma prior for \(\sigma_{g}^{2}\); see (30) |
| \(A_{\delta}\), \(B_{\delta}\) | The level-2 hyperparameters of the Gamma prior for \(\delta_{g}^{-1}\); see (31) |
| \(\alpha_{\sigma}\), \(\beta_{\sigma}\) | The level-3 hyperparameters of the Gamma prior for \(B_{\sigma}\); see (32) |
| \(\alpha_{\delta}\), \(\beta_{\delta}\) | The level-3 hyperparameters of the Gamma prior for \(B_{\delta}\); see (33) |
2.2.2 Variable changepoints
So far, we have assumed that the node-specific changepoints \(\boldsymbol{\tau}_{g}\) are fixed, but it is straightforward to make them variable. To this end, we need to decide on a prior distribution. Two alternative forms have been compared in Fearnhead (2006). The first approach, adopted in Lèbre et al. (2010), is based on a truncated Poisson prior on the number of changepoints (K_g−1), and an explicit specification of \(P(\boldsymbol{\tau}_{g}\mid K_{g}-1)\), e.g. the uniform distribution. The second alternative, pursued in Grzegorczyk and Husmeier (2011) and used in the present work, is based on a point process, where the distribution of the distance between two successive points is a negative binomial distribution.
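As an illustration of this point-process prior, the following sketch draws a node-specific changepoint set by sampling the distances between successive changepoints from a negative binomial distribution until the end of the time series is reached. The parameterization (NumPy's "number of failures before the k-th success", shifted by one so that distances are at least one time point) is an assumption for illustration; the paper's exact convention is given in (36).

```python
import numpy as np

def sample_changepoints(T, k=1, p=0.05, rng=None):
    """Draw a changepoint set from the point-process prior: the distances
    between successive changepoints are i.i.d. negative binomial with
    hyperparameters (k, p). Returns sorted positions strictly inside (1, T).
    The +1 offset (distances >= 1 time point) is an illustrative convention."""
    rng = rng or np.random.default_rng()
    taus, pos = [], 0
    while True:
        pos += 1 + rng.negative_binomial(k, p)  # distance to the next changepoint
        if pos >= T:                            # point process runs off the series
            return taus
        taus.append(pos)
```

Smaller values of p yield larger expected distances and hence fewer changepoints, which is how the hyperparameter p later controls the inferred segmentations.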
2.2.3 Hierarchical Bayesian model and MCMC inference scheme
The other prior distributions have been discussed in the previous sections. Sampling from the joint posterior distribution follows a Gibbs-sampling-like strategy, in which variables are sampled from their respective conditional distributions given the other variables in their Markov blankets. Whenever possible, we sample from the closed-form distributions and use collapsing, i.e. we integrate (some) variables from the Markov blankets out analytically. Where closed-form distributions are not available, we resort to RJMCMC steps. The overall sampling scheme is hence of the type RJMCMC within partially collapsed Gibbs.
The conditional distributions of the parent sets π_g, which define the network structure, and the changepoint sets \(\boldsymbol{\tau}_{g}\), are not of closed form. Sampling of \(\boldsymbol{\tau}_{g}\) from the proper conditional distribution (conditional on the variables in its Markov blanket) can be effected with the dynamic programming scheme described in Grzegorczyk and Husmeier (2011), at computational complexity quadratic in the time series length. Sampling of the parent configurations π_g from the respective conditional distribution is also feasible, by exhaustive enumeration of all valid parent configurations (subject to the fan-in restriction, \(\mathcal{F}\)) and normalization of their local posterior probability potentials. In principle, it is therefore possible to set up an overall Gibbs sampler that does not require any Metropolis-Hastings (Green) moves (Green 1995). However, the computational complexity of the Gibbs sampling steps for π_g and \(\boldsymbol{\tau}_{g}\) is substantially higher than that of all other sampling steps. These disproportionate computational costs create a bottleneck: the number of sampling steps for the other variables is limited by the number of feasible dynamic programming and complete-enumeration steps. An alternative approach is to give up on the desire to sample π_g and \(\boldsymbol{\tau}_{g}\) from the conditional distribution directly, and to use a Metropolis-Hastings-Green RJMCMC scheme instead. This leaves the computational complexity of all individual sampling steps roughly balanced, and it is the approach we adopted for the present work.
From (41) the network structure, \(\mathcal{M}\), can be sampled with the "improved structure MCMC sampling scheme" proposed in Grzegorczyk and Husmeier (2011). From (42) the changepoint sets, \(\{\boldsymbol{\tau}_{g}\}_{g}\) (g=1,…,N), can be sampled with reversible jump Markov chain Monte Carlo (RJMCMC) (Green 1995), as in Lèbre et al. (2010) and Robinson and Hartemink (2010).
The original MCMC simulation consists of three successive parts: (i) the network structure update part, (ii) the changepoint sets update part, and (iii) the update of the remaining (hyper)parameters. In each single MCMC iteration, i=1,2,3,…, the three update parts are successively performed.
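Schematically, one iteration of the original sampler can be sketched as follows; the three update callables stand in for (i) the structure moves on the parent sets, (ii) the RJMCMC changepoint moves, and (iii) the Gibbs updates of the remaining (hyper)parameters. The function names and the state representation are illustrative, not part of the paper's implementation.

```python
def mcmc_sweep(state, update_network, update_changepoints, update_hyperparams):
    """One iteration of the original sampler: the three update parts are
    performed in succession, each conditioning on the current state."""
    state = update_network(state)        # (i) structure moves on {pi_g}
    state = update_changepoints(state)   # (ii) RJMCMC moves on {tau_g}
    state = update_hyperparams(state)    # (iii) Gibbs steps for the (hyper)parameters
    return state

def run_mcmc(state, n_iter, updates):
    """Run n_iter sweeps and collect the sampled states."""
    samples = []
    for _ in range(n_iter):
        state = mcmc_sweep(state, *updates)
        samples.append(state)
    return samples
```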
We note that this MCMC scheme subsumes MCMC inference for the uncoupled NH-DBN as a special case, in which the hyperparameter vectors are kept fixed at \(\mathbf{m}_{g}=\mathbf{0}\).
In Sect. 2.2.4 we will briefly outline how collapsing and blocking techniques can be employed to improve this RJMCMC within partially collapsed Gibbs sampling scheme from Grzegorczyk and Husmeier (2012b). The technical details have been relegated to the appendix, where a complete description and pseudocode of the advanced MCMC sampling algorithm can be found.
2.2.4 Advanced MCMC inference scheme: collapsing and blocking
The second improvement is related to blocking, as widely applied in Gibbs sampling (Liang et al. 2010). Blocking is a technique by which correlated variables are not sampled separately, but are merged into blocks that are sampled together, conditional on their respective joint Markov blanket. Convergence problems of the original MCMC sampler, discussed in more detail in Sect. 5, resulted from correlations between the variables in layer 6: between the hyperparameters \(\mathbf{m}_{g}\) and the parent configuration π_g, and between the hyperparameters \(\mathbf{m}_{g}\) and the changepoint configuration \(\boldsymbol{\tau}_{g}\). In our improved MCMC scheme, we form two blocks, grouping \(\mathbf{m}_{g}\) with π_g, and grouping \(\mathbf{m}_{g}\) with \(\boldsymbol{\tau}_{g}\). Rather than sampling \(\mathbf{m}_{g}\) on its own, \(\mathbf{m}_{g}\) is always sampled jointly with the parent configuration π_g, and with the changepoint configuration \(\boldsymbol{\tau}_{g}\). While the conceptualization of this idea is simple and intuitive, the mathematical implementation is involved, due to the need to ensure that the sampling scheme satisfies the equations of detailed balance and converges to the proper posterior distribution. The mathematical details have therefore been relegated to the appendix, where a complete description of the algorithm can be found.
3 Data
3.1 Simulated data from the RAF pathway
For our simulation study we implement both dynamic and additive noise, but our focus is on additive white noise, with the objective of keeping the signal-to-noise ratio (SNR) constant so that it can be controlled and specified.^{8} Additive white noise can be employed without noise inflation. Having generated a time series \(\mathcal{D}\), as described above, we add white noise in a gene-wise manner. For each node, g, we compute the standard deviation, s_g, of its last 40 observations, \(\mathcal{D}_{g,2},\ldots,\mathcal{D}_{g,41}\), and we add iid Gaussian noise with zero mean and standard deviation SNR^{−1}⋅s_g to each individual observation. That is, we replace \(\mathcal{D}_{g,t}\) (t=2,…,41) by \(\mathcal{D}_{g,t} + v_{g,t}\), where v_{g,2},…,v_{g,41} are realizations of iid \(\mathcal{N}(0,(\mathrm{SNR}^{-1}\cdot s_{g})^{2})\) Gaussian variables. We distinguish three signal-to-noise ratios: SNR=10 (weak noise), SNR=3 (moderate noise), and SNR=1 (strong noise).
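The gene-wise noise injection described above can be sketched as follows; the array layout (nodes in rows, time points in columns, with the first time point excluded from the noise addition, mirroring the 41-point setting above) is an illustrative assumption.

```python
import numpy as np

def add_white_noise(D, snr, rng=None):
    """Add iid Gaussian noise gene-wise: for each node g the noise standard
    deviation is s_g / snr, where s_g is the standard deviation of the
    observations D[g, 1:] (the last 40 of 41 time points in the paper's
    setting). D has shape (N, T); a noisy copy is returned."""
    rng = rng or np.random.default_rng()
    D_noisy = D.astype(float).copy()
    for g in range(D.shape[0]):
        s_g = np.std(D[g, 1:])                       # gene-wise signal scale
        D_noisy[g, 1:] += rng.normal(0.0, s_g / snr, size=D.shape[1] - 1)
    return D_noisy                                   # first time point unchanged
```

For example, snr=10 corresponds to the weak-noise setting, snr=1 to the strong-noise setting.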
3.2 Synthetic biology in Saccharomyces cerevisiae
Because of the temporal structure (switch of the carbon source in the middle of the experiment) the merged time series represents a scenario in which both coupling paradigms (global and sequential) can be applied. The Saccharomyces cerevisiae time series is therefore well suited to conduct a comparative evaluation between the proposed global coupling model and the sequential one proposed in Grzegorczyk and Husmeier (2012a).
3.3 Circadian rhythms in Arabidopsis thaliana
Table 4 Gene expression time series segments for Arabidopsis thaliana. The table contains an overview of the experimental conditions under which each of the gene expression experiments was carried out. We note that there is no natural (temporal) ordering of the four experiments, i.e., the arrangement of the four time series in the table is interchangeable

|  | Experiment 1 | Experiment 2 | Experiment 3 | Experiment 4 |
|---|---|---|---|---|
| Source | Mockler et al. (2007) | Edwards et al. (2006) | Grzegorczyk et al. (2008) | Grzegorczyk et al. (2008) |
| Time points | 12 | 13 | 13 | 13 |
| Time interval | 4 h | 4 h | 2 h | 2 h |
| Pre-experimental entrainment | 12h:12h light:dark cycle | 12h:12h light:dark cycle | 10h:10h light:dark cycle | 14h:14h light:dark cycle |
| Measurements | Constant light | Constant light | Constant light | Constant light |
| Laboratory | Kay Lab | Millar Lab | Millar Lab | Millar Lab |
4 Simulation setting
4.1 The objectives of our empirical studies
In Sect. 5.1 we employ synthetic data from the RAF pathway and aim to monitor the network reconstruction accuracy under a series of increasingly strong violations of the prior assumption inherent in (11)–(12). To this end, we generate synthetic data, as explained in Sect. 3.1, and we reverse-engineer the RAF pathway in Fig. 3. We do not allow for self-feedback loops in the NH-DBN models, i.e., we impose the constraints g∉π_g (g=1,…,N). In this first study we assume the segmentations (changepoint sets) to be known, and we systematically cross-compare the network reconstruction accuracy of the uncoupled and the coupled NH-DBN model for various hyperparameter settings. We also compare the performance of both MCMC sampling schemes, the original and the advanced MCMC sampler, and we include a comparison with a conventional homogeneous DBN. See Fig. 5 and Table 5 for an overview.
Table 5 Overview of the four methods under comparison. The conventional dynamic Bayesian network (DBN) model is homogeneous and assumes that the interaction parameters are constant and do not change over time. The non-homogeneous DBN (NH-DBN) models allow for changepoints that divide the time series into segments, and for each segment there are segment-specific interaction parameters. Unlike the uncoupled NH-DBN model, the coupled NH-DBN model allows for global information sharing (i.e. coupling) between the segment-specific interaction parameters. The coupled NH-DBN model can be inferred with two different MCMC sampling schemes. See Fig. 5 for a graphical representation of the relationships between the four methods

|  | "Conventional" DBN | Uncoupled NH-DBN | Coupled NH-DBN, original MCMC | Coupled NH-DBN, advanced MCMC |
|---|---|---|---|---|
| Literature reference | Extension of standard textbooks | Extension of Lèbre et al. (2010) | Extension of Grzegorczyk and Husmeier (2012b) | Extension of Grzegorczyk and Husmeier (2012b) |
| Model definition | See Fig. 2 with \(\mathbf{m}_{g}=\mathbf{0}\) and \(\boldsymbol{\tau}_{g}=\emptyset\) fixed | See Fig. 2 with \(\mathbf{m}_{g}=\mathbf{0}\) fixed | See Fig. 2 | See Fig. 2 |
| Non-homogeneous model? | no | yes | yes | yes |
| Global information coupling? | – | no | yes | yes |
| MCMC inference | For a brief explanation see Sect. 4.2 | For a brief explanation see Sect. 2.2.3 | Original MCMC adapted from Grzegorczyk and Husmeier (2012b) | Advanced MCMC proposed in Sect. 2.2.4 |
In Sect. 5.2 we employ gene expression time series from Saccharomyces cerevisiae (see Sect. 3.2) to extend our comparative evaluation to a real-world application. As in the first study, we evaluate the network reconstruction accuracy for different hyperparameter settings, we cross-compare the performance of the two MCMC sampling schemes, and we impose the constraints g∉π_g (g=1,…,N). Unlike in the first study, however, we assume the segmentations (changepoint sets) to be unknown. The node-specific changepoint sets \(\boldsymbol{\tau}_{g}\) (g=1,…,N) have to be inferred from the data, and the network reconstruction accuracy can be monitored as a function of the inferred segmentations. In Sect. 5.2.2 we extend our cross-method comparison and empirically compare the proposed globally coupled NH-DBN with the sequentially coupled NH-DBN model presented in Grzegorczyk and Husmeier (2012a), with respect to the network reconstruction accuracy.

In Sect. 5.3 we analyze gene expression time series from Arabidopsis thaliana (see Sect. 3.3). For the Arabidopsis thaliana data, a proper evaluation in terms of the network reconstruction accuracy is infeasible owing to the absence of a proper gold standard. Several authors pursue an evaluation without a gold standard by arguing for the biological plausibility of subsets of inferred interactions. However, such an approach inevitably suffers from a certain selection bias and is somewhat subject to subjective interpretation. Our primary focus is therefore on quantifying the strength of the information coupling between the time series segments and the influence this coupling has on the regulatory network reconstruction. We compute and compare the correlations between the segment-specific interaction parameter vectors for the uncoupled and for the coupled NH-DBN. For comparing the correlations of the two NH-DBN models we require an invariant segmentation. Since there are four individual time series, which have been measured under different external conditions, as indicated in Table 4, a natural choice is to consider each of the four individual time series as a separate segment. In this third application we do not rule out self-feedback loops, i.e., we allow for g∈π_g (g=1,…,N), since—from a biological perspective—self-feedback loops cannot be excluded for the underlying gene regulatory network.
4.2 Hyperparameter settings for the coupled NHDBN model and the competing methods
The gene- and segment-specific interaction parameter vectors w_{g,h} are assumed to be multivariate Gaussian distributed according to (11), and in the absence of any genuine prior knowledge we set C_{g,h}=I.
In our first empirical study in Sect. 5.1 we also compare the performance of the two NH-DBN models with the conventional homogeneous DBN, which is a special case of our model with an empty, non-adaptable changepoint set.
For the analysis of the Saccharomyces cerevisiae gene expression time series in Sect. 5.2 we follow an unsupervised approach and assume that the changepoints segmenting the time series are unknown. To infer different segmentations we employ different hyperparameters of the point process prior on the changepoint sets. In the point process prior, described in Sect. 2.2.2, the prior distribution for the number of time points between two successive changepoints is a negative binomial distribution with hyperparameters k and p. In the probability mass function of the negative binomial distribution, given in (36), we fix k=1 and vary the hyperparameter p over a wide range of values: p∈{0,0.001,0.005,0.01,0.02,0.03,0.04,0.1,0.2,0.3,0.4}.
In our last empirical study in Sect. 5.2 we compare the performance of the two NH-DBN models with the sequentially coupled NH-DBN model proposed in Grzegorczyk and Husmeier (2012a). For this study we reuse the hyperparameter values from Grzegorczyk and Husmeier (2012a). A brief description of the sequentially coupled NH-DBN can be found in Sect. 4 of Online Resource 2.
4.3 MCMC simulation lengths, convergence diagnostics, and criteria for the network reconstruction accuracy
To assess convergence and mixing we applied standard convergence diagnostics based on trace plots (Giudici and Castelo 2003) and the potential scale reduction factor (Gelman and Rubin 1992), and found that the PSRFs of all individual edges were below 1.1 for simulation lengths of 10,000 MCMC steps when the advanced MCMC sampling scheme was used. More details, and in particular details on how we defined a PSRF for an individual network edge, can be found in Sect. 3 of Online Resource 1.
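The following is a minimal sketch of the standard potential scale reduction factor for a single scalar quantity (such as the marginal posterior probability of one network edge), computed from several independent chains of equal length; the paper's exact edge-wise PSRF definition is the one in Online Resource 1, which this generic estimator only approximates.

```python
from statistics import mean, variance

def psrf(chains):
    """Gelman-Rubin potential scale reduction factor from several chains
    of equal length n; values close to 1 indicate approximate convergence."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    W = mean(variance(c) for c in chains)   # mean within-chain variance
    B = n * variance(chain_means)           # between-chain variance
    var_hat = (n - 1) / n * W + B / n       # pooled variance estimate
    return (var_hat / W) ** 0.5
```

With the convergence threshold used above, one would require psrf(...) < 1.1 for every individual edge.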
If the true network is known, we evaluate the network reconstruction accuracy in terms of the areas under the receiver operating characteristic curve (AUCROC) and the areas under the precision-recall curve (AUCPR). Details on these two criteria can be found in Sect. 3 of Online Resource 1.
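For illustration, the AUCROC score can be obtained directly from the edge scores (e.g., marginal edge posterior probabilities) and the gold-standard adjacency via the rank-sum identity; this generic sketch is not the paper's implementation, and ties are counted as one half.

```python
def auc_roc(scores, labels):
    """AUCROC as the probability that a randomly chosen true edge
    (label 1) scores higher than a randomly chosen non-edge (label 0)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

A score of 0.5 corresponds to random expectation, and 1.0 to perfect edge ranking.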
5 Results
5.1 Results on simulated data from the RAF pathway
We take the RAF network from Sachs et al. (2005), see Fig. 3, and generate synthetic non-homogeneous time series from a multiple changepoint linear regression model, as explained in Sect. 3.1. Our objective is to monitor the network reconstruction accuracy under a series of increasingly strong violations of the prior assumption inherent in (11)–(12).
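A rough sketch of this data-generating process for a single node: within each segment, the regression weights equal a shared base vector plus a random perturbation of amplitude eps, so that eps=0 recovers a homogeneous DBN. All names are illustrative, and the paper's actual construction (Sect. 3.1) additionally normalizes the parameter-vector amplitudes to 1 and controls the noise level via the SNR.

```python
import random

def simulate_node(base_w, parent_series, changepoints, eps, sigma, seed=0):
    """Simulate one node of a non-homogeneous time series: each segment
    gets its own weight vector, equal to base_w plus a Gaussian
    perturbation of amplitude eps, plus observation noise of std sigma."""
    rng = random.Random(seed)
    T = len(parent_series[0])
    boundaries = [0] + list(changepoints) + [T]
    y = []
    for h in range(len(boundaries) - 1):
        # segment-specific weights: shared base plus perturbation of size eps
        w = [b + eps * rng.gauss(0.0, 1.0) for b in base_w]
        for t in range(boundaries[h], boundaries[h + 1]):
            signal = sum(wi * x[t] for wi, x in zip(w, parent_series))
            y.append(signal + rng.gauss(0.0, sigma))
    return y
```

Increasing eps makes the segment-specific parameter vectors progressively more dissimilar, which is exactly the violation of (11)–(12) studied below.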
5.1.1 Comparative evaluation of three DBN models for fixed level-2 and level-3 hyperparameters and flexible SNR
In a first step we select the level-3 hyperparameters such that the level-2 hyperparameters are equal in prior expectation to those imposed in earlier studies for simpler versions of these NHDBN models without level-3 hyperpriors (see, e.g., Grzegorczyk and Husmeier 2012b).^{14} We cross-compare the performance of the conventional homogeneous DBN, the uncoupled NHDBN akin to Lèbre et al. (2010), and the proposed coupled NHDBN; see Fig. 5 and Table 5 in Sect. 4.
Since the network reconstruction accuracy is close to random expectation for the high noise level (SNR=1) and almost identical for the low (SNR=10) and the moderate (SNR=3) noise level, we focus our attention on the latter in the following subsections.
5.1.2 Comparison of three different coupling schemes for the noise variance hyperparameters
Nine coupling schemes (S1)–(S9) for the noise variance hyperparameters, \(\sigma_{g,h}^{2}\), were briefly outlined in Table 2 in Sect. 2.2.1. Throughout this paper we focus on coupling scheme (S8), “weak coupling for nodes, hard coupling for segments”, but in this subsection we briefly compare this scheme with two alternative schemes, namely the (S4) approach, “no coupling for nodes, weak coupling for segments”, and the (S5) approach, “weak coupling for both nodes and segments”. For this study we reuse the hyperprior from Sect. 5.1.1 for the signal-to-noise hyperparameters, δ _{ g } (g=1,…,N), and we vary the level-3 hyperparameters for the noise variance hyperparameters, \(\sigma_{g}^{2}\) or \(\sigma_{g,h}^{2}\), respectively.^{15} The technical details and figures of the empirical results have been relegated to Sect. 3 of Online Resource 2. Here we briefly summarize our findings for the RAF pathway data with SNR=3: In a comparative evaluation of the three approaches (S4), (S5), and (S8) for the proposed coupled NHDBN model, we found that the coupled NHDBN consistently yields the best network reconstruction accuracy when coupling scheme (S8) is employed; see Figs. 7–8 in Sect. 3 of Online Resource 2. Moreover, for each of the three coupling schemes (S4), (S5), and (S8) we found that the proposed coupled NHDBN model compares favorably to the uncoupled NHDBN model akin to Lèbre et al. (2010); see Figs. 9–10 in Sect. 3 of Online Resource 2. In particular, exactly the same trend can be observed for (S4), (S5), and (S8): except for the strongest amplitude of the perturbation (ε=1), the performance improvement of the proposed coupled NHDBN over the uncoupled NHDBN is significant, and the relative AUCROC and AUCPR differences increase as the amplitude, ε, decreases. Our empirical findings thus suggest that the merits of the proposed coupled NHDBN model do not depend on the coupling scheme for the noise variance hyperparameters.
5.1.3 Robustness with respect to the level2 hyperparameters
In a third step we focus on cross-comparing the uncoupled and the coupled NHDBN model, and we investigate whether the trends from Sect. 5.1.1 can also be found for other hyperparameter settings. For this analysis we return to the simpler NHDBN models without level-3 hyperpriors (Grzegorczyk and Husmeier 2012b). That is, we directly fix the level-2 hyperparameters in (30)–(31), and we re-analyze the synthetic RAF network data with SNR=3 with the two NHDBN models.^{16} Figures of the empirical results have been relegated to Sect. 1 of Online Resource 2; the results can be summarized as follows. Consistent with the results from Sect. 5.1.1, the proposed coupled NHDBN increasingly outperforms the uncoupled NHDBN as the amplitude of the perturbation ε of the parameter vectors decreases (see Figs. 1–2 in Sect. 1 of Online Resource 2). Our data analysis not only shows that the relative differences in the network reconstruction accuracy are in favor of the proposed coupled NHDBN but also reveals that the network reconstruction accuracy, measured in terms of mean AUCROC scores, is robust with respect to the choice of the level-2 hyperparameters. As shown in Fig. 3 of Online Resource 2, the proposed coupled NHDBN yields almost identical AUCROC scores for each of the 12 level-2 hyperparameter settings.
5.1.4 Robustness with respect to the level-3 hyperparameters
5.1.5 Posterior distribution of the signal-to-noise hyperparameter in dependence on the level-3 hyperparameters
We now investigate why the coupled NHDBN does not perform better than the uncoupled NHDBN for weak priors on B _{ δ } (see Figs. 7–8). To this end, we explore the posterior distribution of the signal-to-noise hyperparameters, δ _{ g }. Since our findings in Sect. 5.1.4 suggest that the two models are robust with respect to a variation of the level-3 hyperprior on B _{ σ }, we employ the weakest (most diffuse) prior for B _{ σ }, \(B_{\sigma }\sim \operatorname {Gam}(0.01,2)\).
Histograms of the posterior distribution of log(δ _{ g }) for the uncoupled NHDBN with four different level-3 hyperpriors on B _{ δ } can be found in Online Resource 2 (see Fig. 4). The level-3 hyperparameters have a moderate effect on the posterior variance, i.e., for the weaker priors the posterior distributions are slightly more strongly peaked. The amplitude of the perturbation, ϵ, appears to have no effect on the posterior distribution of δ _{ g }. This latter finding is not surprising: since the uncoupled NHDBN learns the interaction parameters independently for each segment, it does not matter whether the segment-specific interaction parameter vectors are similar or not. For the uncoupled NHDBN the posterior distribution of δ _{ g } depends only on the amplitudes of the interaction parameter vectors, and independently of the amplitude of the perturbations, ϵ, the amplitudes of the interaction parameter vectors are always equal to 1 in this particular application.
As a complementary analysis, Fig. 5 in Online Resource 2 shows overlaid trace plots of the signal-to-noise hyperparameters during the sampling phase (i.e., from iteration 5,000 to iteration 10,000), from which the histograms in Fig. 9 have been extracted. The graphs indicate that the extreme signal-to-noise hyperparameter value, log(δ _{ g })≪0, observed for weak priors on B _{ δ }, is an attractor state, i.e., a state that the MCMC trajectory can converge to but never leave. We note that the occurrence of such inconsistent absorbing states in Bayesian hierarchical models as a consequence of weak priors was briefly mentioned in Andrieu and Doucet (1999), p. 2673. We will discuss this point in more detail in Sect. 5.1.7.
5.1.6 Comparison of the two MCMC sampling schemes for the coupled NHDBN model
In this subsection we cross-compare the performance of the original MCMC sampling scheme from Grzegorczyk and Husmeier (2012b) and the advanced MCMC sampling scheme proposed here (see Sect. 2.2.4); see Fig. 5 for an overview. To this end, we re-analyze the RAF pathway data with SNR=3 with the original MCMC sampling scheme. We have already seen in Sect. 5.1.4 that weak priors for B _{ δ } lead to attractor states with extreme values for the signal-to-noise hyperparameters, δ _{ g }. We suggest that these absorbing attractor states might also be responsible for the low network reconstruction accuracy (AUCROC values) of the original MCMC sampling in the bottom rows of Fig. 7. For each amplitude of the perturbation, ϵ∈{0,0.125,0.25,0.5,1}, we therefore randomly selected five synthetic RAF pathway data sets, i.e., 25 individual data sets in total, and for each individual data set we assessed convergence of the three NHDBN methods from Fig. 5 and Table 5. We consider a strong prior and a weak prior on B _{ δ }.^{19} With each of the three NHDBN methods and each of the two priors on B _{ δ } we performed H=5 independent MCMC simulations for each of the 25 individual data sets. We assessed convergence and mixing by computing the potential scale reduction factors (PSRFs) from the marginal posterior probabilities of the network edges, as described in detail in Sect. 3 of Online Resource 1.
5.1.7 Discussion of the results for the RAF pathway data
In this subsection we provide a theoretical explanation of two empirical findings. First, we explain why weak (vague) level-3 hyperpriors on B _{ δ } are disadvantageous for the proposed coupled NHDBN. Second, we explain why the advanced MCMC sampling scheme converges substantially better than the original MCMC sampling scheme from Grzegorczyk and Husmeier (2012b).
The disadvantage of weak (diffuse) priors on B_{δ}
In Sect. 5.1.4 we found that the network reconstruction accuracy of the coupled NHDBN model tends to be superior to that of the uncoupled NHDBN model unless we use a weak prior on B _{ δ } and a medium amplitude of the perturbation, ϵ=0.5; see, e.g., Fig. 8. The reason for this behavior becomes clear from the existence of an absorbing state with a very low signal-to-noise value, log(δ _{ g })≪0, which was already discussed in Sect. 5.1.4 and is illustrated in the two bottom rows of Fig. 9. In this absorbing state, the prior and posterior distributions of the segment-specific interaction parameters, w _{ g,h }, become highly peaked around the global hyperparameter vector, m _{ g }; see (11) and (27).^{21} Mathematically, w _{ g,h } converges in distribution to m _{ g } as δ _{ g }→0: w _{ g,h }→m _{ g } (h=1,…,K _{ g }) for δ _{ g }→0, and the coupled NHDBN reduces to a conventional homogeneous DBN. We can thus distinguish three regimes for the perturbation amplitude, ϵ. For zero (ϵ=0) or very small perturbations (0<ϵ≪1), the data are adequately modeled with a homogeneous DBN, and by reducing to this model, the coupled NHDBN outperforms the uncoupled one. For intermediate amplitudes of the perturbation, ϵ=0.5, the data are not adequately modeled by a homogeneous DBN, the attractor state is inconsistent with the data, and by reducing to the homogeneous DBN, the coupled NHDBN is outperformed by the uncoupled one. For large perturbation amplitudes, ϵ=1, the attractor state is avoided, and the coupled NHDBN no longer reduces to the homogeneous one. However, due to the large perturbation there is not much benefit in any information sharing among segments, and the coupled and uncoupled NHDBN show approximately equal performance.
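The limiting behavior w _{ g,h }→m _{ g } for δ _{ g }→0 can be checked numerically with a small simulation (taking C _{ g,h }=I and unit noise variance for simplicity; the helper below is illustrative and not part of the model code):

```python
import random

def max_deviation(m, delta, n=1000, seed=0):
    """Draw n samples of each component of w ~ N(m, delta * I) and return
    the largest absolute deviation from m; it shrinks like sqrt(delta)."""
    rng = random.Random(seed)
    dev = 0.0
    for _ in range(n):
        for mu in m:
            dev = max(dev, abs(rng.gauss(mu, delta ** 0.5) - mu))
    return dev
```

For small delta the sampled vectors are numerically indistinguishable from m, which is the collapse onto the homogeneous DBN described above.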
As seen from the top rows of Fig. 8, effective information coupling for quasi-homogeneous data can be accomplished with less extreme values of δ _{ g } than those of the absorbing state, while entrapment in the absorbing state is detrimental to the performance in the medium perturbation regime around ϵ≈0.5. For that reason, it is advisable to prevent such entrapment. Our results, shown in Fig. 9, suggest that this can be effected by the use of a sufficiently strong (informative, concentrated) prior on B _{ δ }.
The advantage of the advanced MCMC sampling scheme
In Sect. 5.1.6 we found that the advanced MCMC sampling scheme proposed here converges substantially better than the original MCMC sampling scheme from Grzegorczyk and Husmeier (2012b); see Fig. 11. The convergence improvement that can be reached with the advanced MCMC sampling scheme can be explained as follows. Suppose the Markov chain has reached a parent node set \(\pi_{g}^{(i)}\), the global interaction hyperparameter vector \(\mathbf{m}_{g}^{(i)}\), and the signal-to-noise hyperparameter \(\delta_{g}^{(i)}\). Adding a new parent node to the current parent set, \(\pi_{g}^{(i)}\), yields a new parent set \(\pi_{g}^{(\diamond)}\), and the corresponding new global interaction hyperparameter vector, \(\mathbf{m}_{g}^{(\diamond)}\), requires a new component for the new parent node. Unlike the original MCMC sampling scheme, which samples only the new component of \(\mathbf {m}_{g}^{(\diamond)}\) and does so according to its prior distribution (see (12)), the advanced MCMC sampling scheme resamples the whole global hyperparameter vector, \(\mathbf {m}_{g}^{(\diamond)}\), conditional on the new parent set, \(\pi_{g}^{(\diamond )}\), according to its posterior distribution in (46). That is, the segment-specific interaction parameters for the new parent set are centered around the new vector, \(\mathbf {m}_{g}^{(\diamond)}\), which either contains an a priori sampled entry (original MCMC) or is an a posteriori sample (advanced MCMC). Hence, unlike the original MCMC sampling scheme, the advanced MCMC sampling scheme guarantees that the distributions of the segment-specific interaction parameters are centered around an a posteriori sample \(\mathbf {m}_{g}^{(\diamond)}\), and thus ensures that the marginal likelihoods and the acceptance probabilities are higher.^{22} In particular, as discussed above, weak priors on B _{ δ } can lead to attractor states with extremely low values for the signal-to-noise hyperparameters, \(\delta_{g}^{(i)}\).
For \(\delta_{g}^{(i)}\rightarrow0\) the posterior distributions of the segment-specific interaction parameters, w _{ g,h }, are not only centered but peaked^{23} around the global hyperparameter vector, \(\mathbf{m}_{g}^{(\diamond)}\); see (27). The marginal likelihoods (and acceptance probabilities) for the original MCMC sampling scheme, for which \(\mathbf {m}_{g}^{(\diamond)}\) contains an a priori sampled entry, can then become very low.
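The effect of resampling m _{ g } from its full conditional can be illustrated with the standard conjugate Gaussian update; this sketch corresponds to the simpler uncollapsed step (cf. (43)) with all covariance matrices taken as scaled identities, not to the collapsed step (46), which additionally integrates out the segment-specific parameters.

```python
def posterior_of_m(ws, delta_sigma2, prior_var=1.0):
    """Per-component posterior mean and variance of the global vector m_g,
    given K segment-specific vectors w_{g,h} ~ N(m_g, delta*sigma^2 * I)
    and a N(0, prior_var * I) prior on m_g (diagonal-covariance sketch)."""
    K, d = len(ws), len(ws[0])
    precision = K / delta_sigma2 + 1.0 / prior_var  # posterior precision
    post_var = 1.0 / precision
    post_mean = [post_var * sum(w[j] for w in ws) / delta_sigma2
                 for j in range(d)]
    return post_mean, post_var
```

For small delta_sigma2 the posterior of m _{ g } concentrates tightly around the segment-specific vectors, which is exactly the regime in which a value sampled from the prior (original scheme) is almost surely far from where the likelihood is peaked.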
5.2 Gene regulation in Saccharomyces cerevisiae
5.2.1 Performance of the coupled NHDBN model
In this subsection we compare the three NHDBN methods (see Fig. 5 and Table 5) on the gene expression profiles from Saccharomyces cerevisiae, described in Sect. 3.2. Here the true regulatory network is also known, shown in Fig. 4, so that we can objectively cross-compare the network reconstruction accuracy on real biological data. Unlike in our earlier data analysis in Sect. 5.1, we now follow an unsupervised approach and assume the segmentations (changepoint sets) to be unknown. That is, the changepoint sets have to be inferred from the data. To obtain different data segmentations we run MCMC simulations (with 10,000 iterations each) for various hyperparameters of the point process prior on the changepoint locations. As described in Sect. 2.2.2, the distance between changepoints is assumed to follow a negative binomial distribution, and we use the hyperparameters k=1 and p∈{0,0.001,0.01,0.02,0.03,0.04,0.1,0.2,0.3,0.4} in (36).
For the synthetic RAF pathway data we found in Sect. 5.1 that the three methods are robust with respect to a variation of the level-3 hyperparameters for the hyperprior on B _{ σ }, and we therefore use the weakest prior on B _{ σ }.^{24} For the level-3 hyperparameters on B _{ δ } we again choose four different settings.^{25}
For the two weak priors on B _{ δ } in the bottom row of Fig. 12 the network reconstruction accuracy (measured in terms of AUCROC scores) of all three methods is substantially worse than for the stronger priors. Although the coupled NHDBN model still performs better than the uncoupled NHDBN model, its performance appears not to depend on the average number of changepoints. That is, independently of the inferred average number of changepoints \(\overline {K}\), the mean AUCROC values of the coupled NHDBN model are not better than the AUCROC values of a conventional homogeneous DBN without changepoints (\(\overline {K}=0\)). Consistent with the findings reported for the synthetic RAF pathway data in Sect. 5.1, it can also be seen from the bottom row in Fig. 12 that the advanced MCMC sampling performs (at least slightly) better than the original MCMC sampling scheme for the two weak priors on B _{ δ }.
Overall, our findings for the Saccharomyces cerevisiae time series data are very similar to those observed for the synthetic RAF pathway data in Sect. 5.1. The coupled NHDBN yields a significantly higher network reconstruction accuracy than the uncoupled NHDBN. The advanced MCMC sampling performs (here at least slightly) better than the original MCMC sampling scheme. The results are robust with respect to a variation of the level-3 hyperparameters, unless the prior on B _{ δ } is too weak (diffuse) and yields attractor regions in the configuration space of the Markov chain.
5.2.2 Comparison with a sequentially coupled NHDBN
Because of its temporal structure (a switch of the carbon source in the middle of the experiment), the Saccharomyces cerevisiae time series is well suited to a comparative evaluation of the network reconstruction accuracy between the proposed globally coupled NHDBN and the sequentially coupled NHDBN (Grzegorczyk and Husmeier 2012a). Unlike the globally coupled NHDBN, the sequentially coupled NHDBN model is based on the assumption that the interaction parameters in any time segment are similar to those in the previous time segment, i.e., there is coupling between adjacent time segments only. A brief mathematical description of the sequentially coupled NHDBN and the empirical results of our cross-method comparison have been relegated to Sect. 4 of Online Resource 2. Our findings (see Figs. 11–12 in Online Resource 2) suggest that the globally coupled NHDBN performs significantly better than the sequentially coupled NHDBN model (Grzegorczyk and Husmeier 2012a) with respect to two figures of merit. First, it yields significantly higher maximal AUC scores (AUCROC and AUCPR) than the sequentially coupled NHDBN.^{28} Second, the degradation of the AUC scores for more changepoints is less pronounced for the globally coupled NHDBN, indicating increased robustness with respect to a variation of the prior assumptions on the segmentation and mitigating the overfitting that potential model over-flexibility can cause.
A possible explanation for this improvement in performance can be gleaned from (2) in Online Resource 2. The information coupling for the model proposed in Grzegorczyk and Husmeier (2012a) has the form of a Bayesian filter, and (2) in Online Resource 2 corresponds to a diffusion process. Time series generated from this model are intrinsically unstable, i.e., non-stationary with monotonically increasing variance. This is at odds with the actual data observed, and it is avoided by the model proposed in the present work. A second performance advantage is related to the way the uncoupled model is obtained as a limiting case of the coupled one. For the model proposed in the present work this is effected by a peaked distribution of m _{ g } in (43) and (46), respectively, so that m _{ g } effectively becomes fixed. As seen from Fig. 2, a fixed value of m _{ g } implies d-separation between the w _{ g,h }’s, i.e., the absence of coupling. Note that this effectively reduces to a hierarchical Bayesian model with one fewer layer of hyperparameters, and does not cause any problems with instability. For the sequentially coupled model proposed in Grzegorczyk and Husmeier (2012a), on the other hand, the strength of coupling decreases with increasing values of λ _{ g } in (1)–(2) in Online Resource 2, which, however, also implies an ever-increasing degree of instability. Hence, a principled shortcoming of the model proposed in Grzegorczyk and Husmeier (2012a) is a systematic dependence between coupling strength and instability, and this problem is averted by the globally coupled model proposed in the present work.
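The instability argument can be verified with a toy simulation of the sequential coupling, caricatured here as a Gaussian random walk w _{ h }=w _{ h-1 }+N(0, step_var) (cf. (2) in Online Resource 2; all names are illustrative):

```python
import random
from statistics import pvariance

def diffusion_spread(n_paths, n_segments, step_var=1.0, seed=0):
    """Simulate independent realizations of a sequentially coupled
    parameter and return its empirical variance at each segment index;
    the spread grows with the segment index, i.e. the process is
    non-stationary."""
    rng = random.Random(seed)
    paths = [[0.0] for _ in range(n_paths)]
    for _ in range(n_segments - 1):
        for p in paths:
            p.append(p[-1] + rng.gauss(0.0, step_var ** 0.5))
    return [pvariance([p[h] for p in paths]) for h in range(n_segments)]
```

In contrast, the globally coupled model keeps every segment tethered to the same m _{ g }, so the corresponding spread stays bounded regardless of the number of segments.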
5.3 Gene regulation in Arabidopsis thaliana
In this subsection we apply the proposed coupled NHDBN model with the advanced MCMC sampling scheme from Sect. 2.2.4 (with 10k MCMC iterations) to the gene expression time series from Arabidopsis thaliana, described in Sect. 3.3. To focus on the relevant task, the regulatory network reconstruction, we kept the changepoints fixed at their known true values. However, it can be seen from Fig. 6 in Sect. 2 of Online Resource 2 that the three changepoints between the four time series in Table 4 can also be inferred from the data. Table 1 in Online Resource 2 provides correlation coefficients of the marginal edge posterior probabilities extracted from the supervised approach (with fixed changepoints) and the unsupervised approaches (with changepoint inference); see Sect. 2 of Online Resource 2 for more details.
6 Conclusion
Modeling non-homogeneous dynamic Bayesian networks (NHDBNs) with a multiple changepoint process is popular because, conditional on the changepoints, the marginal likelihood can be computed in closed form. To our knowledge, all previous studies, including Lèbre (2007), Robinson and Hartemink (2009, 2010), Lèbre et al. (2010), Dondelinger et al. (2010, 2012), Husmeier et al. (2010), and Grzegorczyk and Husmeier (2011), compute the marginal likelihood under the assumption of parameter independence and the same independent parameter prior distributions for all time series segments. These approaches ignore the fact that many systems, e.g. regulatory networks and signaling pathways in the cell, adapt to changing internal and external conditions gradually. To allow for information sharing among separate time series segments, we have proposed a novel regularized NHDBN with a coupling mechanism by which, a priori, the interaction parameters associated with separate time series segments are encouraged to be similar. Our empirical assessment on simulated data has revealed that the proposed method leads to an improvement in the network reconstruction accuracy. For time series from real-time polymerase chain reaction (RT-PCR) experiments in Saccharomyces cerevisiae, we have demonstrated that the novel NHDBN also yields a better network reconstruction accuracy than the uncoupled NHDBN, and that it leads to increased robustness with respect to a variation of the prior assumptions about the temporal heterogeneity. We have quantified the effect of the regularization for gene expression time series from Arabidopsis thaliana.
With the present paper we have expanded and improved an earlier conference paper (Grzegorczyk and Husmeier 2012b) in six important aspects. Firstly, owing to a strict page limit, the presentation of the methodology in Grzegorczyk and Husmeier (2012b) is very terse, and we have offered a more comprehensive and self-contained exposition (see, e.g., Fig. 2, Table 3). Secondly, we have extended the NHDBN model from Grzegorczyk and Husmeier (2012b) by introducing an extra (level-3) layer to the hierarchy of the proposed model, which allows for information sharing among the nodes in the network. As is common with Bayesian hierarchical models, the proposed model depends on various hyperparameters. While the hyperparameters of each node were modeled independently in the original model, the extended model hierarchically couples the node-specific noise variances and the node-specific coupling strengths between the segment-specific interaction parameters (see (30)–(33) in Sect. 2.2.1). We have also presented nine different hierarchical coupling schemes for the noise variance hyperparameters (see Table 2). Thirdly, we have introduced a novel collapsed Gibbs sampling step (see (46) in Sect. 2.2.4; the derivation is provided in Sect. 2 of Online Resource 1), which replaces a less efficient uncollapsed Gibbs sampling step of the original MCMC algorithm (see (43) in Sect. 2.2.3). Fourthly, and most importantly, we have shown how the novel collapsed Gibbs sampling step and blocking techniques can be exploited to develop a novel advanced MCMC algorithm (see Sect. 2.2.4). We have empirically demonstrated that the advanced MCMC algorithm performs significantly better than the original MCMC sampling scheme from Grzegorczyk and Husmeier (2012b) in terms of convergence and mixing (see, e.g., Fig. 11 in Sect. 5.1), and thus in practice often also yields a higher network reconstruction accuracy (see, e.g., Fig. 7 in Sect. 5.1 or Fig. 12 in Sect. 5.2.1).
Fifthly, in the data analysis we have systematically varied the (hyper-)hyperparameters of those (hyper-)priors that are important for the noise variances and the coupling strengths among segments, and we have investigated their influence on the performance. Our empirical findings indicate that vague level-3 hyperpriors may lead to extreme attractor states in the MCMC configuration space, as a consequence of which the coupled NHDBN effectively reduces to a conventional DBN. Our study has provided clear graphical diagnostic tools that allow the user to identify this problem (see Figs. 9, 13, and 14(a)). Moreover, for sufficiently non-diffuse hyperpriors, this problem can be avoided altogether: our study has indicated that the proposed model is robust with respect to a variation of the level-3 hyperparameters, as long as diffuse hyperpriors are avoided. Sixthly, in Sect. 5.2.2 we have shown that the proposed globally coupled NHDBN outperforms the sequentially coupled NHDBN, proposed in Grzegorczyk and Husmeier (2012a), on expression time series from a synthetic biology study in which a synthetically designed Saccharomyces cerevisiae strain is exposed to a change of nutrients in its environment. The better performance seems to result from two methodological improvements, which are related to the avoidance of intrinsic instability and a more natural way in which the coupling scheme includes the uncoupled model as a limiting case (see Sect. 5.2.2).
Footnotes
 1.
The most important results of our prestudy have been relegated to Sect. 3 of Online Resource 2, and we refer to these results in Sect. 5.1.
 2.
Since we are modeling gene regulatory processes with NHDBN models which have node-specific changepoints, the three coupling schemes (S2), (S3), and (S6) from Table 2 are not suitable. Node-specific changepoints imply that there is a separate segmentation for each gene. Consequently, there are gene-specific h-th segments which may represent different or even disjoint time intervals of the gene regulatory process.
 3.
A priori we have: \(\mathit{CV}(\sigma_{g}^{-2}) :=\frac{E[\sigma_{g}^{-2}]}{\sqrt{\operatorname {Var}(\sigma_{g}^{-2})}}=\sqrt{A_{\sigma}}\) and \(\mathit{CV}(\delta_{g}^{-1}):=\frac{E[\delta_{g}^{-1}]}{\sqrt{\operatorname {Var}(\delta_{g}^{-1})}}=\sqrt{A_{\delta}}\).
 4.
Note that the negative binomial distribution can be seen as a discrete version of the Gamma distribution.
 5.
Given a time series of length T we have \(\tilde{n}=T-2\) possible changepoint locations. In a DBN with lag 1 the first time point must be removed, since no preceding time point is available. The last time point is not a candidate for a changepoint either, since there are no observations after time point T which could be allocated to a new segment.
 6.
If we impose an upper limit on the numbers of changepoints per node, K _{ g }−1 a priori follows a truncated binomial distribution.
 7.
 8.
Dynamic noise systematically increases the variances of the signals for subsequent time points. From (50) it can be seen that adding (dynamic) noise (via u _{ g,t }) at time point t increases the expected variance of the variables at time point t, \(\mathcal{D}_{g,t}\), which serve as signals for the next time point t+1. That is, strong dynamic noise injections increase the variances of the variables in \(\mathcal{D}_{g,t}\), and the signal-to-noise ratio gets weaker over time.
 9.
We used RMA rather than GCRMA for reasons discussed in Lim et al. (2007).
 10.
 11.
With this setting of the hyperparameters, A _{ σ }=0.005 and E[B _{ σ }]=0.005, we follow Lèbre et al. (2010) and Grzegorczyk and Husmeier (2012b). In Grzegorczyk and Husmeier (2012b) we set \(A_{\sigma}=B_{\sigma}=\frac{\nu }{2}\) with ν=0.01. Note that we also briefly investigate the robustness with respect to the level2 hyperparameters. In a study in Sect. 5.1 we employ fixed level2 hyperparameters: (A _{ σ },B _{ σ })∈{(0.0005,0.0005),(0.005,0.005),(0.05,0.05)}.
 12.
This setting (A _{ δ }=2 and E[B _{ δ }]=0.2) is motivated by earlier studies (Lèbre et al. 2010; Grzegorczyk and Husmeier 2012b). In Grzegorczyk and Husmeier (2012b) we set A _{ δ }=2 and B _{ δ }=0.2. Note that we also briefly investigate the robustness with respect to these level2 hyperparameters; in a study in Sect. 5.1 we employ four pairs of fixed level2 hyperparameters: (A _{ δ },B _{ δ })∈{(2,2),(2,0.2),(0.2,2),(0.2,0.2)}.
 13.
\(\operatorname {Var}[B_{\delta}]=\frac{\alpha_{\delta}}{\beta_{\delta}^{2}}\in\{ 0.0002,0.002,0.02,0.2\}\).
 14.
 15.
 16.
 17.
As in Grzegorczyk and Husmeier (2012b) we set A _{ σ }=0.005 and A _{ δ }=2 in (30)–(31), and we consider 12 combinations of the level3 hyperparameters: (α _{ σ },β _{ σ })∈{(1,200),(0.1,20),(0.01,2)} and (α _{ δ },β _{ δ })∈{(200,1000),(20,100),(2,10),(0.2,1)}. Note that all settings a priori ensure: E[B _{ σ }]=0.005 and E[B _{ δ }]=0.2 (as in Grzegorczyk and Husmeier (2012b)), while the “strengths” (variances) of the priors vary; see Sect. 4 for details.
 18.
For small amplitudes of the perturbations, (ϵ≈0), the segmentspecific interaction parameter vectors are similar. The relationships between nodes can then be adequately approximated by a homogeneous DBN.
 19.
\(B_{\delta}\sim \operatorname {Gam}(20,100)\) and \(B_{\delta}\sim \operatorname {Gam}(0.2,1)\) in (33).
 20.
Note that for each ϵ the five individual data sets led to very similar results.
 21.
 22.
For the parent flip move the original MCMC sampling scheme also yields lower acceptance probabilities than the advanced MCMC sampling scheme: If the flip move proposes to replace a “suboptimal” parent node with a “more suitable” new parent node, i.e., to move from \([\pi_{g}^{(i)},\mathbf {m}_{g}^{(i)}]\) to \([\pi_{g}^{(\diamond)},\mathbf {m}_{g}^{(\diamond)}]\), then the component of the suboptimal parent node in \(\mathbf {m}_{g}^{(i)}\) was sampled according to its posterior distribution earlier in the MCMC simulation. The original MCMC sampler, which samples the component of the new parent node in \(\mathbf {m}_{g}^{(\diamond)}\) from its prior distribution, yields a lower acceptance probability than the advanced MCMC sampler, which resamples \(\mathbf {m}_{g}^{(\diamond)}\) conditional on \(\pi_{g}^{(\diamond)}\) from its posterior distribution (see (46)).
 23.
 24.
 25.
 26.
For each gene, the mean of the posterior distribution of the number of changepoints was determined, and these values were averaged over all genes to obtain the average number of changepoints per gene, \(\overline {K}\).
 27.
We have: \(\mathbf {w}_{g,h}^{(i)}\rightarrow \mathbf {m}_{g}^{(i)}\) (\(h=1,\ldots,K_{g}^{(i)}\)) for \(\delta_{g}^{(i)}\rightarrow0\), and this (quasi)homogeneity also explains why the AUCROC scores for the coupled NHDBN in the bottom row of Fig. 12 do not depend on the average number of changepoints, \(\overline {K}\).
 28.
Recall that the highest AUC scores are reached for about one changepoint per gene (\(\overline {K}\approx1\)), reflecting the carbon source switch; see Sect. 3.2 for details.
 29.
30. If the changepoints are known, as assumed in Sect. 2.2.1, we keep them fixed throughout the whole MCMC simulation, i.e., we set \(\boldsymbol{\tau}_{g}^{(i)}=\boldsymbol{\tau}_{g}\) for each g and for all MCMC iterations i.
31. The parent-node flip move was introduced in Grzegorczyk and Husmeier (2011): it randomly chooses a parent node \(u\in\pi_{g}^{(i)}\) from the current parent node set \(\pi_{g}^{(i)}\), randomly chooses a node \(v\notin\pi_{g}^{(i)}\) that is currently not a parent of node g, and substitutes the new parent node v for the current parent node u.
Acknowledgements
Marco Grzegorczyk is supported by the German Research Foundation (DFG), research grant GR3853/11. The work described in this article was partly carried out under the “Timet” project, funded by an EU FP7 grant.
Supplementary material
References
Ahmed, A., & Xing, E. (2009). Recovering time-varying networks of dependencies in social and biological studies. Proceedings of the National Academy of Sciences, 106, 11878–11883.
Alabadi, D., Oyama, T., Yanovsky, M., Harmon, F., Mas, P., & Kay, S. (2001). Reciprocal regulation between TOC1 and LHY/CCA1 within the Arabidopsis circadian clock. Science, 293, 880–883.
Albert, R. (2005). Scale-free networks in cell biology. Journal of Cell Science, 118, 4947–4957.
Andrieu, C., Davy, M., & Doucet, A. (2003). Efficient particle filtering for jump Markov systems. Application to time-varying autoregressions. IEEE Transactions on Signal Processing, 51, 1762–1770.
Andrieu, C., & Doucet, A. (1999). Joint Bayesian model selection and estimation of noisy sinusoids via reversible jump MCMC. IEEE Transactions on Signal Processing, 47, 2667–2676.
Bishop, C. M. (2006). Pattern recognition and machine learning. Singapore: Springer.
Cantone, I., Marucci, L., Iorio, F., Ricci, M., Belcastro, V., Bansal, M., Santini, S., di Bernardo, M., di Bernardo, D., & Cosma, M. (2009). A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell, 137, 172–181.
McClung, C. R. (2006). Plant circadian rhythms. The Plant Cell, 18, 792–803.
Dondelinger, F., Lèbre, S., & Husmeier, D. (2010). Heterogeneous continuous dynamic Bayesian networks with flexible structure and inter-time segment information sharing. In J. Fürnkranz & T. Joachims (Eds.), Proceedings of the international conference on machine learning (ICML), Madison, WI, USA (pp. 303–310).
Dondelinger, F., Lèbre, S., & Husmeier, D. (2012). Non-homogeneous dynamic Bayesian networks with Bayesian regularization for inferring gene regulatory networks with gradually time-varying structure. Machine Learning. doi:10.1007/s10994-012-5311-x.
Edwards, K., Anderson, P., Hall, A., Salathia, N., Locke, J., Lynn, J., Straume, M., Smith, J., & Millar, A. (2006). Flowering locus C mediates natural variation in the high-temperature response of the Arabidopsis circadian clock. The Plant Cell, 18, 639–650.
Fearnhead, P. (2006). Exact and efficient Bayesian inference for multiple changepoint problems. Statistics and Computing, 16, 203–213.
Friedman, N., & Koller, D. (2003). Being Bayesian about network structure. Machine Learning, 50, 95–126.
Gelman, A., & Rubin, D. (1992). Inference from iterative simulation using multiple sequences. Statistical Science, 7, 457–472.
Gelman, A., Carlin, J., Stern, H., & Rubin, D. (2004). Bayesian data analysis (2nd ed.). London: Chapman and Hall/CRC.
Giudici, P., & Castelo, R. (2003). Improving Markov chain Monte Carlo model search for data mining. Machine Learning, 50, 127–158.
Green, P. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82, 711–732.
Grzegorczyk, M., & Husmeier, D. (2011). Non-homogeneous dynamic Bayesian networks for continuous data. Machine Learning, 83, 355–419.
Grzegorczyk, M., & Husmeier, D. (2012a). A non-homogeneous dynamic Bayesian network with sequentially coupled interaction parameters for applications in systems and synthetic biology. Statistical Applications in Genetics and Molecular Biology, 11, 7.
Grzegorczyk, M., & Husmeier, D. (2012b). Bayesian regularization of non-homogeneous dynamic Bayesian networks by globally coupling interaction parameters. In N. Lawrence & M. Girolami (Eds.), JMLR: W&CP: Vol. 22. Proceedings of the 15th international conference on artificial intelligence and statistics (AISTATS) (pp. 467–476).
Grzegorczyk, M., Husmeier, D., Edwards, K., Ghazal, P., & Millar, A. (2008). Modelling non-stationary gene regulatory processes with a non-homogeneous Bayesian network and the allocation sampler. Bioinformatics, 24, 2071–2078.
Hill, M. (2012). Sparse graphical models for cancer signalling. PhD thesis, Warwick University.
Husmeier, D., Dondelinger, F., & Lèbre, S. (2010). Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks. In J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, & A. Culotta (Eds.), Proceedings of the 24th annual conference on neural information processing systems (NIPS) (pp. 901–909). Curran Associates.
Johnson, C., Elliott, J., & Foster, R. (2003). Entrainment of circadian programs. Chronobiology International, 20, 741–774.
Kikis, E., Khanna, R., & Quail, P. (2005). ELF4 is a phytochrome-regulated component of a negative-feedback loop involving the central oscillator components CCA1 and LHY. Plant Journal, 44, 300–313.
Kolar, M., Song, L., & Xing, E. (2009). Sparsistent learning of varying-coefficient models with structural changes. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), Advances in neural information processing systems (NIPS) (Vol. 22, pp. 1006–1014).
Lèbre, S. (2007). Stochastic process analysis for genomics and dynamic Bayesian networks inference. PhD thesis, Université d'Évry-Val-d'Essonne, France.
Lèbre, S., Becq, J., Devaux, F., Lelandais, G., & Stumpf, M. (2010). Statistical inference of the time-varying structure of gene-regulation networks. BMC Systems Biology, 4.
Liang, F., Liu, C., & Carroll, R. (2010). Wiley series in computational statistics. Advanced Markov chain Monte Carlo methods: learning from past samples. Cornwall: Wiley.
Lim, W., Wang, K., Lefebvre, C., & Califano, A. (2007). Comparative analysis of microarray normalization procedures: effects on reverse engineering gene networks. Bioinformatics, 23, i282–i288.
Lindley, D. (1962). Discussion on the article by Stein. Journal of the Royal Statistical Society. Series B. Methodological, 24, 265–296.
Locke, J., Southern, M., Kozma-Bognar, L., Hibberd, V., Brown, P., Turner, M., & Millar, A. (2005). Extension of a genetic network model by iterative experimentation and mathematical analysis. Molecular Systems Biology, 1 (online).
Mockler, T. C., Michael, T. P., Priest, H. D., Shen, R., Sullivan, C. M., Givan, S. A., McEntee, C., Kay, S. A., & Chory, J. (2007). The diurnal project: diurnal and circadian expression profiling, model-based pattern matching and promoter analysis. Cold Spring Harbor Symposia on Quantitative Biology, 72, 353–363.
Moulines, E., Priouret, P., & Roueff, F. (2005). On recursive estimation for time varying autoregressive processes. The Annals of Statistics, 33, 2610–2654.
Punskaya, E., Andrieu, C., Doucet, A., & Fitzgerald, W. (2002). Bayesian curve fitting using MCMC with applications to signal segmentation. IEEE Transactions on Signal Processing, 50, 747–758.
Robinson, J., & Hartemink, A. (2009). Non-stationary dynamic Bayesian networks. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), Advances in neural information processing systems (NIPS) (Vol. 21, pp. 1369–1376). San Mateo: Morgan Kaufmann.
Robinson, J., & Hartemink, A. (2010). Learning non-stationary dynamic Bayesian networks. Journal of Machine Learning Research, 11, 3647–3680.
Sachs, K., Perez, O., Pe'er, D., Lauffenburger, D., & Nolan, G. (2005). Protein-signaling networks derived from multiparameter single-cell data. Science, 308, 523–529.
Stein, C. (1955). Inadmissibility of the usual estimator for the mean of a multivariate normal distribution. In Proc. of the third Berkeley symposium on mathematical statistics and probability (Vol. 1, pp. 197–206). Berkeley: Berkeley University Press.
Talih, M., & Hengartner, N. (2005). Structural learning with time-varying components: tracking the cross-section of financial time series. Journal of the Royal Statistical Society. Series B. Methodological, 67, 321–341.
Wang, S., Cui, L., Cheng, S., Zhai, S., Yeary, M., & Wu, Q. (2011). Noise adaptive LDPC decoding using particle filtering. IEEE Transactions on Communications, 59, 913–916.
Xuan, X. (2007). Bayesian inference on change point problems. Master's thesis, The Faculty of Graduate Studies (Computer Science), The University of British Columbia, Vancouver.
Xuan, X., & Murphy, K. (2007). Modeling changing dependency structure in multivariate time series. In Z. Ghahramani (Ed.), Proceedings of the 24th annual international conference on machine learning (ICML 2007) (pp. 1055–1062). Madison: Omnipress.