# Non-homogeneous dynamic Bayesian networks with Bayesian regularization for inferring gene regulatory networks with gradually time-varying structure

## Abstract

The proper functioning of any living cell relies on complex networks of gene regulation. These regulatory interactions are not static but respond to changes in the environment and evolve during the life cycle of an organism. A challenging objective in computational systems biology is to infer these time-varying gene regulatory networks from typically short time series of transcriptional profiles. While homogeneous models, like conventional dynamic Bayesian networks, lack the flexibility to succeed in this task, fully flexible models suffer from inflated inference uncertainty due to the limited amount of available data. In the present paper we explore a semi-flexible model based on a piecewise homogeneous dynamic Bayesian network regularized by gene-specific inter-segment information sharing. We explore different choices of prior distribution and information coupling and evaluate their performance on synthetic data. We apply our method to gene expression time series obtained during the life cycle of *Drosophila melanogaster*, and compare the predicted segmentation with other state-of-the-art techniques. We conclude our evaluation with an application to synthetic biology, where the objective is to predict an in vivo regulatory network of five genes in *Saccharomyces cerevisiae* subjected to a changing environment.

## Keywords

Dynamic Bayesian networks · Hierarchical Bayesian models · Multiple changepoint processes · Reversible jump Markov chain Monte Carlo · Gene expression time series · Systems and synthetic biology

## 1 Introduction

One of the challenging problems in the field of systems biology is the inference of gene regulatory networks from high-throughput transcriptomic profiles, as obtained e.g. with microarrays or next generation sequencing. While protein interactions can be measured directly with various high-throughput assays (e.g. yeast-2-hybrid or phage display), gene regulatory interactions involve several intermediate steps related to the formation, activation and complex formation of transcription factors (e.g. via phosphorylation or dimerization). These processes are not observable at the transcriptional level. For that reason the inference of interactions has to be based on indirect noisy measurements of mRNA concentrations (a proxy for gene activity), rendering the problem of regulatory network reconstruction more difficult than for proteins. Various statistical techniques aim to perform network inference on this data, and the reconstructed regulation networks can reveal how the genes and the proteins they code for interact. However, many of the regulatory interactions in the cell vary in time. During the development and growth of an organism, some genes and pathways are more active during the early stages, but show practically no activity during the later stages, or vice-versa. *Drosophila melanogaster*, for instance, goes through several developmental stages, from embryo to larva to pupa to adult. Genes involved in wing muscle development would naturally fulfill different roles during the embryonal phase, when no wings are present, than they do in the adult fly, when the wings have fully developed. Another instance in which the gene regulatory network varies in time is in reaction to an environmental trigger, such as the type of growth substrate. Such a trigger can enhance or prevent the interactions of certain genes, which in turn can have repercussions for the whole gene network.

We are therefore presented with the problem of inferring a regulatory network from a series of discrete measurements or observations in time, where the structure of the network is subject to potential change. Moreover, we may not always know at which stage structural changes are likely to occur, as the underlying processes may be time-delayed or dependent on unobservable external factors. To extend conventional reverse engineering methods, which only aim to infer a single immutable regulatory network, our work builds on recent research in combining dynamic Bayesian networks (DBNs) with multiple changepoint processes (Robinson and Hartemink 2009, 2010; Grzegorczyk and Husmeier 2009, 2011; Lèbre 2007; Lèbre et al. 2010; Kolar et al. 2009). Below, we will briefly review the state of the art and the shortcomings of existing methods that we aim to address.

The standard assumption underlying DBNs is that time-series have been generated from a homogeneous Markov process. This assumption is too restrictive, as discussed above, and can potentially lead to erroneous conclusions. While there have been various efforts to relax the homogeneity assumption for undirected graphical models (Talih and Hengartner 2005; Xuan and Murphy 2007), relaxing this restriction in DBNs is a more recent research topic (Robinson and Hartemink 2009, 2010; Grzegorczyk and Husmeier 2009, 2011; Ahmed and Xing 2009; Lèbre 2007; Lèbre et al. 2010; Kolar et al. 2009). At present, none of the proposed methods is without its limitations, leaving room for further methodological innovation. The method proposed in Ahmed and Xing (2009) and Kolar et al. (2009) is non-Bayesian. This requires certain regularization parameters to be optimized “externally”, by applying information criteria (like AIC or BIC), cross-validation or bootstrapping. The first approach is suboptimal, the latter approaches are computationally expensive.^{1} In the present paper we therefore follow the Bayesian paradigm, as in Robinson and Hartemink (2009, 2010), Grzegorczyk and Husmeier (2009, 2011), Lèbre (2007) and Lèbre et al. (2010). These approaches also have their limitations. The method proposed in Grzegorczyk and Husmeier (2009, 2011) assumes a fixed network structure and only allows the interaction parameters to vary with time. This assumption is too rigid when looking at processes where changes in the overall regulatory network structure are expected, e.g. in morphogenesis or embryogenesis. The method proposed in Robinson and Hartemink (2009, 2010) requires a discretization of the data, which incurs an inevitable information loss. These limitations are addressed in Lèbre (2007) and Lèbre et al. (2010), where the authors propose a method for continuous data that allows network structures associated with different nodes to change with time in different ways. 
However, this high flexibility causes potential problems when applied to time series with a low number of measurements, as typically available from systems biology, leading to overfitting or inflated inference uncertainty.

The objective of the present paper is to propose a novel model that addresses the methodological shortcomings of the three Bayesian methods mentioned above, and to demonstrate its viability by application to gene expression time series from *Drosophila melanogaster* and *Saccharomyces cerevisiae*. Unlike Robinson and Hartemink (2009, 2010), our model is continuous and therefore avoids the information loss inherent in a discretization of the data. We further improve on the model in Robinson and Hartemink (2009, 2010) by allowing for different penalties for changing edges and non-edges in the network, and by allowing different nodes in the networks to have different penalty terms. Unlike Grzegorczyk and Husmeier (2009, 2011), our model allows the network structure to change among segments, leading to greater model flexibility. As an improvement on Lèbre (2007) and Lèbre et al. (2010), our model introduces information sharing among time series segments, which provides an essential regularization effect. We have applied the model to reconstruct two regulatory networks: a network of genes involved in wing muscle development during the life cycle of *Drosophila melanogaster* (Arbeitman et al. 2002), and an engineered network from synthetic biology, consisting of five genes in *Saccharomyces cerevisiae* (Cantone et al. 2009).

The present paper follows on from two earlier conference papers of ours (Dondelinger et al. 2010; Husmeier et al. 2010). In Dondelinger et al. (2010), we compared two different information coupling paradigms: global information coupling and sequential information coupling. Global information coupling is appropriate when there is no natural sequential order of the time series segments, such as for segments derived from different experimental conditions. Sequential information sharing, which we investigated in more detail in Husmeier et al. (2010) and in the present paper, is appropriate for modelling a temporal developmental process, such as those related to morphogenesis, where changes to the network structure happen sequentially.

The present paper extends Dondelinger et al. (2010) and Husmeier et al. (2010) in several respects. Firstly, restricted by a strict page limit, our earlier papers were rather terse. The present paper provides a more comprehensive exposition of the methodology, which is self-contained. Secondly, we have explored different versions of information coupling (hard versus soft) and functional forms of the prior (exponential versus binomial). In Husmeier et al. (2010), not all combinations of strength versus functional form were investigated, and we have completed these combinations in our present work. Thirdly, we have improved the MCMC scheme. In our earlier work, a standard Metropolis-Hastings-Green (RJMCMC) sampler was employed. In the present work we have identified several scenarios where this sampler is bound to fail, and we propose a new type of MCMC proposal move. We show that these new moves avoid the convergence problems encountered with the original sampler, leading to a substantial improvement in mixing. Fourthly, the Bayesian hierarchical models that we propose depend on various hyperparameters. As opposed to our earlier work, we have investigated the influence of the higher level hyperparameters. To this end, we have first carried out a set of simulation studies for the proposed model. To substantiate our findings, we have then additionally carried out semi-analytical investigations for a simplified scenario, in which the computation of the marginal likelihood is tractable (see Sect. 5.2). Fifthly, we have rerun all our earlier simulations to understand the effect of model choice, unconfounded by MCMC mixing problems, and we have improved the interpretation of the results for the real-world problems.

We note that while we were extending our earlier work (Husmeier et al. 2010), a related paper was published: Wang et al. (2011). Although the two approaches are methodologically similar, they differ in application and inference. The objective of Wang et al. (2011) is online parameter estimation via particle filtering, with applications e.g. in tracking. This is a different scenario from most systems biology applications, where an interaction structure is typically learnt off-line after completion of a series of high-throughput experiments. Unlike Wang et al. (2011), our work thus follows other applications of DBNs in systems biology (Robinson and Hartemink 2009, 2010; Grzegorczyk and Husmeier 2009, 2011; Lèbre 2007; Lèbre et al. 2010; Kolar et al. 2009) and aims to infer the model structure by marginalizing out the parameters in closed form. In other words: while inference in Wang et al. (2011) is based on a filter, inference in our work is based on a smoother.

Our paper is organized as follows. Section 2 reviews the non-homogeneous DBN on which our work is based. Section 3 describes the methodological innovation of Bayesian regularization via information coupling. Section 4 describes the implementation of our method and the setup of the simulation studies. Section 5 discusses results obtained on synthetic data, with an investigation of the influence of the hyperparameters. Section 6 describes and interprets two real-world applications, related to morphogenesis in *Drosophila melanogaster* and synthetic biology in *Saccharomyces cerevisiae*. The paper concludes in Sect. 7 with a general discussion and summary.

## 2 Background: non-homogeneous DBNs

This section summarizes the autoregressive time-varying DBN proposed in Lèbre (2007) and Lèbre et al. (2010). A similar model was proposed in Punskaya et al. (2002). The idea is to combine the Bayesian regression model of Andrieu and Doucet (1999) with multiple changepoint processes and pursue Bayesian inference with reversible jump Markov chain Monte Carlo (RJMCMC) (Green 1995). We call this method TVDBN (Time-Varying Dynamic Bayesian Network).

The model is based on the first-order Markov assumption. This assumption is not critical, though, and a generalization to higher orders, as pursued in Punskaya et al. (2002), is straightforward. The value that a node in the graph takes on at time *t* is determined by the values that the node’s parents (i.e. potential regulators, see below) take on at the previous time point, *t*−1. More specifically, the conditional probability of the observation associated with a node at a given time point is a conditional Gaussian distribution, where the conditional mean is a linear weighted sum of the parent values at the previous time point, and the interaction parameters and parent sets depend on the time series segment. The latter dependence adds extra flexibility to the model and thereby relaxes the homogeneity assumption. The interaction parameters, the variance parameters, the number of potential parents, the location of changepoints demarcating the time series segments, and the number of changepoints are given (conjugate) prior distributions in a hierarchical Bayesian model. For inference, all these quantities are sampled from the posterior distribution with RJMCMC. Note that a complete specification of all node-parent configurations determines the structure of a regulatory network: each node receives incoming directed edges from each node in its parent set. In what follows, we will refer to nodes as genes and to the network as a gene regulatory network. The method is not restricted to molecular systems biology, though.

### 2.1 Graph

Let *p* be the number of observed genes, and let \(\boldsymbol{x}=(x_{i}(t))_{1\leq i\leq p,\,1\leq t\leq N}\) be the expression values measured at *N* time points. \(\boldsymbol{G}^{h}\) represents a directed graph, i.e. the network defined by a set of directed edges among the *p* genes. \(\boldsymbol{G}^{h}_{i}\) is the subnetwork associated with target gene *i*, determined by the set of its parents, i.e. the nodes with a directed edge feeding into gene *i*; these are the potential regulators of the target gene. The meaning of the superscript *h* is explained in the next section.

### 2.2 Multiple changepoint process

The set of regulatory relationships among the genes, defined by \(\boldsymbol{G}^{h}\), may vary across time, which we model with a multiple changepoint process. For each target gene *i*, an unknown number \(k_{i}\) of changepoints define \(k_{i}+1\) non-overlapping segments. Segment \(h=1,\ldots,k_{i}+1\) starts at changepoint \(\xi_{i}^{h-1}\) and stops before \(\xi_{i}^{h}\), where \(\boldsymbol{\xi}_{i}=(\xi_{i}^{0},\ldots, \xi_{i}^{h-1}, \xi_{i}^{h},\ldots, \xi_{i}^{k_{i}+1})\) with \(\xi_{i}^{h-1}<\xi_{i}^{h}\). To delimit the bounds, two pseudo-changepoints are introduced: \(\xi_{i}^{0}=2\) and \(\xi_{i}^{k_{i}+1}=N+1\). Thus vector \(\boldsymbol{\xi}_{i}\) has length \(|\boldsymbol{\xi}_{i}|=k_{i}+2\). The set of changepoints is denoted by \(\boldsymbol{\xi}=(\boldsymbol{\xi}_{i})_{1\leq i\leq p}\). This changepoint process induces a partition of the time series, \(\boldsymbol{x}_{i}^{h} = (x_{i}(t))_{\xi_{i}^{h-1} \leq t < \xi_{i}^{h}}\), with different network structures \(\boldsymbol{G}_{i}^{h}\) associated with the different segments \(h\in\{1,\ldots,k_{i}+1\}\). Identifiability is satisfied by ordering the changepoints based on their position in the time series. We define \(\boldsymbol{G}_{i}= \{\boldsymbol{G}_{i}^{h}\}_{1 \leq h \leq k_{i}+ 1}\) and \(\boldsymbol{G}=\{\boldsymbol{G}_{i}\}_{1\leq i\leq p}\).
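
The segmentation rule can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names `segment_bounds` and `split_series`, and the convention that `x_i[t - 1]` stores \(x_{i}(t)\), are assumptions made for the example.

```python
def segment_bounds(changepoints, n_timepoints):
    """Return [(start, stop), ...] so that segment h covers start <= t < stop.

    `changepoints` holds only the k_i inner changepoints; the two
    pseudo-changepoints xi^0 = 2 and xi^(k_i+1) = N + 1 are added here.
    """
    inner = sorted(changepoints)
    if len(set(inner)) != len(inner):
        raise ValueError("changepoints must be distinct")
    # Inner changepoints must lie strictly between 2 and N + 1,
    # which leaves the N - 2 admissible positions 3, ..., N.
    if any(cp <= 2 or cp > n_timepoints for cp in inner):
        raise ValueError("changepoints must lie in 3..N")
    xi = [2] + inner + [n_timepoints + 1]
    return list(zip(xi[:-1], xi[1:]))


def split_series(x_i, changepoints):
    """Split the series of one gene into its k_i + 1 segments.

    x_i[t - 1] is assumed to hold x_i(t) for t = 1, ..., N.
    """
    bounds = segment_bounds(changepoints, len(x_i))
    return [[x_i[t - 1] for t in range(start, stop)] for start, stop in bounds]
```

With *N* = 10 and inner changepoints at 5 and 8, the sketch yields the three segments \(2 \leq t < 5\), \(5 \leq t < 8\) and \(8 \leq t < 11\); the first time point has no predecessor and is therefore not a regression target.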

### 2.3 Regression model

For each gene *i*, the random variable \(X_{i}(t)\) refers to the expression of gene *i* at time *t*. Within any segment *h*, the expression of gene *i* depends on the *p* gene expression values measured at the previous time point through a regression model defined by (a) a set of \(s_{i}^{h}\) parents denoted by \(\boldsymbol{G}_{i}^{h}=\{j_{1},\ldots, j_{s_{i}^{h}}\} \subseteq \{1, \ldots, p\}\), \(\vert\boldsymbol{G}_{i}^{h} \vert=s_{i}^{h}\), and (b) a set of parameters \((\boldsymbol{a}_{i}^{h}, \sigma_{i}^{h})\) where \(\boldsymbol{a}_{i}^{h}=(a_{ij}^{h})_{0 \leq j \leq p}\), \(a_{ij}^{h} \in\mathbb{R}\) and \(\sigma_{i}^{h} >0\). For all \(j\neq 0\), \(a_{ij}^{h}=0\) if \(j \notin\boldsymbol{G}_{i}^{h}\). For each gene *i*, for each time point *t* in segment *h* (\(\xi_{i}^{h-1} \leq t < \xi_{i}^{h}\)), the random variable \(X_{i}(t)\) depends on the *p* variables \(\{X_{j}(t-1)\}_{1\leq j\leq p}\) according to

\[
X_{i}(t) = a_{i0}^{h} + \sum_{j \in \boldsymbol{G}_{i}^{h}} a_{ij}^{h}\, X_{j}(t-1) + \varepsilon_{i}^{h}(t), \tag{1}
\]

where the noise \(\varepsilon_{i}^{h}(t)\) is assumed to be Gaussian with mean 0 and variance \((\sigma_{i}^{h})^{2}\), \(\varepsilon_{i}^{h}(t) \sim N(0,(\sigma_{i}^{h})^{2})\). We define \(\boldsymbol{a}_{i}=(\boldsymbol{a}_{i}^{h})_{1 \leq h\leq k_{i}+1}\), \(\boldsymbol{a}=(\boldsymbol{a}_{i})_{1\leq i\leq p}\), \(\boldsymbol{\sigma}^{\boldsymbol{2}}_{i}= ((\sigma_{i}^{h})^{2})_{1 \leq h\leq k_{i}+1}\) and \(\boldsymbol{\sigma}^{\boldsymbol{2}}= (\boldsymbol{\sigma}^{\boldsymbol{2}}_{i})_{1 \leq i \leq p}\).
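
As a small illustration of this regression model, the following Python sketch evaluates the conditional mean of \(X_{i}(t)\) within one segment and draws one noisy observation. The dictionary-based data layout and the names `conditional_mean` and `sample_expression` are assumptions for the example, not part of the original method description.

```python
import random


def conditional_mean(a_ih, parents, x_prev):
    """Intercept plus weighted sum of the parents' values at time t - 1.

    a_ih    : dict mapping 0 (intercept) and each parent index j to a coefficient
    parents : iterable with the parent set G_i^h of gene i in segment h
    x_prev  : dict mapping gene index j to x_j(t - 1)
    Genes outside the parent set contribute nothing (their coefficient is 0).
    """
    return a_ih[0] + sum(a_ih[j] * x_prev[j] for j in parents)


def sample_expression(a_ih, parents, sigma_ih, x_prev, rng):
    """Draw X_i(t) as the conditional mean plus Gaussian noise with s.d. sigma_ih."""
    return conditional_mean(a_ih, parents, x_prev) + rng.gauss(0.0, sigma_ih)
```

For example, with intercept 0.5, parents 2 and 3 with coefficients 1.0 and -2.0, and previous values \(x_{2}(t-1)=1.5\), \(x_{3}(t-1)=0.25\), the conditional mean is 0.5 + 1.5 - 0.5 = 1.5, regardless of the value of the non-parent gene 1.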

### 2.4 Prior

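The \(k_{i}+1\) segments are delimited by \(k_{i}\) changepoints, where \(k_{i}\) is distributed a priori as a truncated Poisson random variable with mean \(\lambda\) and maximum \(\overline{k}= N-2\):

\[
P(k_{i} \vert \lambda) \propto \frac{\lambda^{k_{i}}}{k_{i}!}\, \mathbf{1}_{\{k_{i} \leq \overline{k}\}}. \tag{2}
\]

We write \(\boldsymbol{k}=(k_{1},\ldots,k_{p})\). Conditional on \(k_{i}\) changepoints, the changepoint position vector \(\boldsymbol{\xi}_{i}=(\xi_{i}^{0}, \xi_{i}^{1},\ldots, \xi_{i}^{k_{i}+1})\) takes non-overlapping integer values, which we take to be uniformly distributed a priori. There are \((N-2)\) possible positions for the \(k_{i}\) changepoints, thus vector \(\boldsymbol{\xi}_{i}\) has prior density:

\[
P(\boldsymbol{\xi}_{i} \vert k_{i}) = 1 \Big/ \binom{N-2}{k_{i}}. \tag{3}
\]

For each gene *i*, for each segment *h*, the number \(s_{i}^{h}\) of parents for node *i* follows a truncated Poisson distribution with mean \(\Lambda\) and maximum \(\overline{s}=5\):

\[
P(s_{i}^{h} \vert \Lambda) \propto \frac{\Lambda^{s_{i}^{h}}}{s_{i}^{h}!}\, \mathbf{1}_{\{s_{i}^{h} \leq \overline{s}\}}. \tag{4}
\]

Conditional on \(s_{i}^{h}\), the prior for the parent set \(\boldsymbol{G}_{i}^{h}\) is a uniform distribution over all parent sets with cardinality \(s_{i}^{h}\),

\[
P\big(\boldsymbol{G}_{i}^{h} \,\big\vert\, \vert\boldsymbol{G}_{i}^{h}\vert = s_{i}^{h}\big) = 1 \Big/ \binom{p}{s_{i}^{h}}. \tag{5}
\]

Combining Eqs. (4) and (5) gives the prior on the network structure of a segment:

\[
P(\boldsymbol{G}_{i}^{h} \vert \Lambda) = P(s_{i}^{h} \vert \Lambda)\, P\big(\boldsymbol{G}_{i}^{h} \,\big\vert\, \vert\boldsymbol{G}_{i}^{h}\vert = s_{i}^{h}\big), \qquad s_{i}^{h} = \vert\boldsymbol{G}_{i}^{h}\vert. \tag{6}
\]

Conditional on the parent set \(\boldsymbol{G}_{i}^{h}\) of size \(s_{i}^{h}\), the \(s_{i}^{h}+1\) regression coefficients (the intercept \(a_{i0}^{h}\) and the coefficients \(a_{ij}^{h}\) with \(j \in \boldsymbol{G}_{i}^{h}\)) are given a zero-mean multivariate Gaussian prior with covariance matrix \((\sigma_{i}^{h})^{2}\, \delta^{2} (\boldsymbol{D}^{\top}\boldsymbol{D})^{-1}\). Here \(\boldsymbol{D}\) is the design matrix for segment *h*, whose first column is a vector of ones (for the intercept in Eq. (1)) and whose \((j+1)\)th column contains the observed values \((x_{j}(t))_{\xi_{i}^{h-1}-1 \leq t < \xi_{i}^{h}-1}\) for each factor gene *j* in \(\boldsymbol{G}_{i}^{h}\). This so-called g-prior was also used in Andrieu and Doucet (1999) and is motivated in Zellner (1986). Finally, the conjugate prior for the variance \((\sigma_{i}^{h})^{2}\) is the inverse gamma distribution, \(P((\sigma_{i}^{h})^{2})= \mathcal{IG}(\upsilon_{0},\gamma_{0})\). Following Lèbre (2007) and Lèbre et al. (2010), we set the hyper-hyperparameters for shape, \(\upsilon_{0}=0.5\), and scale, \(\gamma_{0}=0.05\), to fixed values that give a vague distribution. The terms \(\lambda\) and \(\Lambda\) can be interpreted as the expected number of changepoints and parents, respectively, and \(\delta^{2}\) is the expected signal-to-noise ratio. These hyperparameters are drawn from vague conjugate hyperpriors, which are in the (inverse) gamma distribution family.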
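
The discrete structural priors described in this section (truncated Poisson on counts, uniform on positions and on parent sets of a given size) are simple enough to check numerically. The sketch below is a hedged illustration using only the Python standard library; the function names are hypothetical, and `lam` denotes the Poisson parameter before truncation.

```python
import math


def truncated_poisson_pmf(k, lam, k_max):
    """Normalised P(k) proportional to lam**k / k! on 0 <= k <= k_max, else 0."""
    if k < 0 or k > k_max:
        return 0.0
    weights = [lam ** j / math.factorial(j) for j in range(k_max + 1)]
    return weights[k] / sum(weights)


def changepoint_position_prior(k, n_timepoints):
    """Uniform prior over the C(N - 2, k) ways of placing k changepoints."""
    return 1.0 / math.comb(n_timepoints - 2, k)


def parent_set_prior(parents, p, lam_parents, s_max=5):
    """Prior of one parent set: truncated Poisson on its size, then uniform
    over all parent sets of that size among p candidate genes."""
    s = len(parents)
    return truncated_poisson_pmf(s, lam_parents, s_max) / math.comb(p, s)
```

Summing `parent_set_prior` over all parent sets with at most `s_max` members gives 1, which is a quick sanity check that the two-stage construction defines a proper distribution over structures.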

### 2.5 Posterior

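Equation (1) implies a Gaussian likelihood \(P(\boldsymbol{x}_{i}^{h} \vert \boldsymbol{G}_{i}^{h}, \boldsymbol{a}_{i}^{h}, \sigma_{i}^{h})\) for the observations of gene *i* in segment *h*. From Bayes’ theorem, the posterior is given by the following equation, where all prior distributions have been defined above:

\[
P(\boldsymbol{k}, \boldsymbol{\xi}, \boldsymbol{G}, \boldsymbol{a}, \boldsymbol{\sigma}^{\boldsymbol{2}}, \lambda, \Lambda, \delta^{2} \vert \boldsymbol{x}) \propto P(\delta^{2}) P(\lambda) P(\Lambda) \prod_{i=1}^{p} P(k_{i} \vert \lambda) P(\boldsymbol{\xi}_{i} \vert k_{i}) \prod_{h=1}^{k_{i}+1} P(\boldsymbol{G}_{i}^{h} \vert \Lambda)\, P((\sigma_{i}^{h})^{2})\, P(\boldsymbol{a}_{i}^{h} \vert \boldsymbol{G}_{i}^{h}, \sigma_{i}^{h}, \delta^{2})\, P(\boldsymbol{x}_{i}^{h} \vert \boldsymbol{G}_{i}^{h}, \boldsymbol{a}_{i}^{h}, \sigma_{i}^{h}). \tag{11}
\]

The marginalization over the parameters \(\boldsymbol{a}\) and \(\boldsymbol{\sigma}^{\boldsymbol{2}}\) in the posterior distribution of Eq. (11) is analytically tractable: for each gene *i*, the joint distribution for \(k_{i}\), \(\boldsymbol{\xi}_{i}\), \(\boldsymbol{G}_{i}\), \(\boldsymbol{a}_{i}\), \(\boldsymbol{\sigma}^{\boldsymbol{2}}_{i}\), \(\boldsymbol{x}_{i}\) conditional on hyperparameters \(\lambda\), \(\Lambda\), \(\delta^{2}\) is integrated over the parameters \(\boldsymbol{a}_{i}\) (normal distribution) and \(\boldsymbol{\sigma}^{\boldsymbol{2}}_{i}\) (inverse gamma distribution). Solving this integral (for details see Lèbre et al. 2010) yields a closed-form expression (Eq. (12)), in which \(C_{\lambda}\) and \(C_{\Lambda}\) are the normalization constants required by the truncation of the Poisson distributions (2) and (4). The expression involves the matrices \(\boldsymbol{P}_{i}^{h}\) and \(\boldsymbol{M}_{i}^{h}\) (Eq. (15)), with \(\boldsymbol{I}\) referring to the identity matrix of size length \((\boldsymbol{x}_{i}^{h})\). The number of changepoints \(\boldsymbol{k}\) and their location, \(\boldsymbol{\xi}\), the network structure \(\boldsymbol{G}\) and the hyperparameters \(\lambda\), \(\Lambda\) and \(\delta^{2}\) can be sampled from the posterior distribution \(P(\boldsymbol{k}, \boldsymbol{\xi}, \boldsymbol{G}, \lambda, \Lambda, \delta^{2} \vert \boldsymbol{x})\) with a reversible jump MCMC (Green 1995) scheme detailed in the next subsection.

### 2.6 RJMCMC scheme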

The RJMCMC scheme uses four types of move: birth of a new changepoint (*B*); death (removal) of an existing changepoint (*D*); shift of a changepoint to a different time-point (*S*); and update of the network topology within the segments (*N*). These moves occur with probabilities \(b_{k_{i}}\) for *B*, \(d_{k_{i}}\) for *D*, \(u_{k_{i}}\) for *S* and \(v_{k_{i}}\) for *N*, depending only on the current number of changepoints \(k_{i}\) and satisfying \(b_{k_{i}}+d_{k_{i}}+u_{k_{i}}+v_{k_{i}}=1\). The changepoint birth and death moves represent changes from, respectively, \(k_{i}\) to \(k_{i}+1\) changepoints and \(k_{i}\) to \(k_{i}-1\) changepoints. In order to preserve the restriction on the number of changepoints, some probabilities are set to 0: \(d_{0}=u_{0}=0\) and \(b_{\overline{k}}=0\). Otherwise, following Green (1995), these probabilities are chosen as follows,

\[
b_{k_{i}} = c \min\left\{1, \frac{P(k_{i}+1 \vert \lambda)}{P(k_{i} \vert \lambda)}\right\}, \qquad d_{k_{i}+1} = c \min\left\{1, \frac{P(k_{i} \vert \lambda)}{P(k_{i}+1 \vert \lambda)}\right\},
\]

where \(P(k_{i} \vert \lambda)\) is the prior distribution for the number of changepoints defined in Eq. (2) and the constant *c* is chosen to be smaller than 1/4 so that network structure updates and changepoint position shifts are proposed more frequently than births and deaths of changepoints. This improves mixing and convergence with respect to changepoint positions and network structures within the different segments. Shifting of a changepoint is proposed with probability \(u_{k_{i}}=(1-b_{k_{i}}-d_{k_{i}+1})/3\), and updating of the network structure within each segment is proposed with probability \(v_{k_{i}} = 1-(b_{k_{i}}+d_{k_{i}}+u_{k_{i}})\).
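
The move probabilities can be illustrated with a short Python sketch. It assumes the standard construction of Green (1995), \(b_{k} = c\min\{1, P(k+1)/P(k)\}\) and \(d_{k} = c\min\{1, P(k-1)/P(k)\}\), with the truncated Poisson prior on the number of changepoints. Two further choices are assumptions of this sketch: the value `c = 0.2`, and the use of the current-state death probability \(d_{k}\) in the shift probability (the text writes \(d_{k_{i}+1}\) there).

```python
import math


def poisson_weight(k, lam, k_max):
    """Unnormalised truncated Poisson prior weight of k changepoints."""
    return lam ** k / math.factorial(k) if 0 <= k <= k_max else 0.0


def move_probabilities(k, lam, k_max, c=0.2):
    """Probabilities of the birth (B), death (D), shift (S) and network (N) moves."""
    w_k = poisson_weight(k, lam, k_max)
    w_up = poisson_weight(k + 1, lam, k_max)    # zero when k == k_max
    w_down = poisson_weight(k - 1, lam, k_max)  # zero when k == 0
    b = c * min(1.0, w_up / w_k)                # hence b = 0 at the upper bound
    d = c * min(1.0, w_down / w_k)              # hence d = 0 with no changepoints
    u = (1.0 - b - d) / 3.0 if k > 0 else 0.0   # nothing to shift when k == 0
    v = 1.0 - (b + d + u)                       # remaining mass: structure update
    return {"B": b, "D": d, "S": u, "N": v}
```

With this construction \(b_{k} P(k) = d_{k+1} P(k+1)\), so the prior ratio and the move-type probabilities cancel in the birth and death acceptance ratios, and with *c* < 1/4 the structure update is always the most frequently proposed move.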

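A proposed move is accepted with probability \(A=\min\{1,R\}\) where the acceptance ratio *R* reads as follows:

\[
R = (\mathrm{likelihood\ ratio}) \times (\mathrm{prior\ ratio}) \times (\mathrm{proposal\ ratio}) \times (\mathrm{Jacobian}). \tag{18}
\]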

Let \(\boldsymbol{\xi}_{i}\) be the current changepoint vector containing \(k_{i}\) changepoints. For a changepoint birth move, a new changepoint position \(\xi^{\star}\) is sampled uniformly from the available positions. The new changepoint is within an existing segment \(h^{\star}\) of the target gene *i*, \(\xi_{i}^{h^{\star}-1} < \xi^{\star} < \xi_{i}^{h^{\star}}\). Let us denote by \(h^{\star}_{L}\) and \(h^{\star}_{R}\) the segments to the left and to the right of the new changepoint respectively and by \(\boldsymbol{x}_{i}^{h^{\star}}=(\boldsymbol{x}_{i}^{h^{\star}_{L}},\boldsymbol{x}_{i}^{h^{\star}_{R}})\) the observed values for gene *i* in those segments. One of \(h^{\star}_{L}\) and \(h^{\star}_{R}\) is chosen with equal probability. That segment retains the current network topology \(\boldsymbol{G}_{i}^{h^{\star}}\) of segment \(h^{\star}\), and an entirely new topology is sampled from the prior defined in Eq. (6) for the other segment. Let us denote by \(s^{\star}\) the number of edges of the new topology. The Jacobian is equal to 1 and the prior ratio is computed from the probability of choosing a new changepoint position and a new network structure for the new segment. Then the birth of the proposed changepoint is accepted with probability \(A(\boldsymbol{\xi}_{i}^{+} \vert\boldsymbol{\xi}_{i})=\min\{1,R(\boldsymbol{\xi}_{i}^{+} \vert \boldsymbol{\xi}_{i})\}\), with \(R(\boldsymbol{\xi}_{i}^{+} \vert \boldsymbol{\xi}_{i})\) given by Eq. (19). For details see Lèbre et al. (2010). Here \(\boldsymbol{\xi}_{i}^{+}\) refers to the proposed changepoint vector after adding the new changepoint \(\xi^{\star}\) to the current vector \(\boldsymbol{\xi}_{i}\), and for all *h* in \(\{1,\ldots,k_{i}+1\}\), \(\varGamma_{h} =\varGamma\big( \frac{\upsilon_{0} + \xi_{i}^{h}-\xi_{i}^{h-1}}{2} \big)\); all other quantities are defined in Sect. 2.5.
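
The mechanics of the birth proposal (not its acceptance ratio) can be sketched as follows. This is an illustrative Python sketch under assumptions: `xi` is the full changepoint vector including both pseudo-changepoints, `topologies` stores one parent set per segment, and `sample_topology` is a hypothetical callable standing in for a draw from the structure prior.

```python
import bisect
import random


def propose_birth(xi, topologies, sample_topology, rng):
    """Propose a changepoint birth; return (new_xi, new_topologies) or None.

    xi              : sorted list [2, ..., N + 1] including pseudo-changepoints
    topologies      : list of parent sets, one per segment (len(xi) - 1 entries)
    sample_topology : callable(rng) returning a parent set drawn from the prior
    """
    taken = set(xi)
    available = [t for t in range(xi[0] + 1, xi[-1]) if t not in taken]
    if not available:
        return None  # no free position, so a birth cannot be proposed
    new_cp = rng.choice(available)               # uniform over free positions
    h = bisect.bisect_left(xi, new_cp) - 1       # 0-based index of the split segment
    kept = topologies[h]                         # topology of the segment being split
    fresh = sample_topology(rng)                 # new topology for the other half
    # With equal probability the left or the right half keeps the old topology.
    left, right = (kept, fresh) if rng.random() < 0.5 else (fresh, kept)
    new_xi = xi[:h + 1] + [new_cp] + xi[h + 1:]
    new_topologies = topologies[:h] + [left, right] + topologies[h + 1:]
    return new_xi, new_topologies
```

The proposed state would then be accepted or rejected with the birth acceptance probability described above, which additionally requires the marginal likelihood of the affected segments.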

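For a changepoint death move, an existing changepoint is removed from the current vector \(\boldsymbol{\xi}_{i}\), giving the proposed vector \(\boldsymbol{\xi}_{i}^{-}\). The acceptance ratio of the changepoint death move is equal to the inverse of the changepoint birth acceptance ratio \(R(\boldsymbol{\xi}_{i} \vert \boldsymbol{\xi}_{i}^{-})\) for proposing a change from \(\boldsymbol{\xi}_{i}^{-}\) to \(\boldsymbol{\xi}_{i}\), given in Eq. (19). Therefore the acceptance probability of a changepoint death move is

\[
A(\boldsymbol{\xi}_{i}^{-} \vert \boldsymbol{\xi}_{i}) = \min\left\{1, \frac{1}{R(\boldsymbol{\xi}_{i} \vert \boldsymbol{\xi}_{i}^{-})}\right\}.
\]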

A changepoint shift move is accepted with probability \(\min\{1,R\}\) where \(R= (\mathrm{posterior\ ratio}) \times(\mathrm{proposal\ ratio})\). The new changepoint vector \(\boldsymbol{\tilde{\xi}}_{i}\) is obtained by replacing \(\xi_{i}^{h}\) with \(\tilde{\xi}_{i}^{h}\) such that \(\vert\xi_{i}^{h}-\tilde{\xi}_{i}^{h} \vert=1\). The posterior ratio is obtained from Eq. (12). Let us denote by \(\mathcal{Q}(\boldsymbol{\tilde{\xi}}_{i} \vert\boldsymbol{\xi}_{i})\) the probability of shifting changepoint \(\xi_{i}^{h}\) to \(\tilde{\xi}_{i}^{h}\) in the current changepoint vector \(\boldsymbol{\xi}_{i}\) (and reciprocally for \(\mathcal{Q}(\boldsymbol{\xi}_{i} \vert\boldsymbol{\tilde{\xi}}_{i})\)); then the changepoint shift is accepted with probability \(A(\boldsymbol{\tilde{\xi}}_{i} \vert\boldsymbol{\xi}_{i})=\min\{1, R(\boldsymbol{\tilde{\xi}}_{i} \vert\boldsymbol{\xi}_{i})\}\). In the expression for \(R(\boldsymbol{\tilde{\xi}}_{i} \vert\boldsymbol{\xi}_{i})\), \(\boldsymbol{\tilde{x}}_{i}^{h}\) and \(\boldsymbol{\tilde{x}}_{i}^{h+1}\) refer to the expression levels for gene *i* observed in segments *h* and *h*+1 of the new changepoint vector \(\boldsymbol{\tilde{\xi}}_{i}\), \(\boldsymbol{\tilde{P}}_{i}^{h}\) and \(\boldsymbol{\tilde{P}}_{i}^{h+1}\) are the projection matrices built from \(\boldsymbol{\tilde{x}}_{i}^{h}\) and \(\boldsymbol{\tilde{x}}_{i}^{h+1}\) as defined in Eq. (15), and all other quantities are as defined in Sect. 2.5. See Lèbre et al. (2010) for the derivation of this expression.

For the update of the network topology within a segment *h*, the birth or death of an edge is proposed with probabilities \(b_{s}\) and \(d_{s}\) that depend on the current number of parents; at the boundaries, \(b_{0}=1\), \(d_{0}=0\), \(b_{\overline{s}}=0\) and \(d_{\overline{s}}=1\). The acceptance ratio \(R(\tilde{\boldsymbol{G}}_{i}^{h} \vert \boldsymbol{G}_{i}^{h})\) for the new set of \(\tilde{s}_{i}^{h}\) parents \(\tilde{\boldsymbol{G}}_{i}^{h}\) (which corresponds to \(\boldsymbol{G}_{i}^{h}\) with a parent added or removed) is computed according to Eq. (18). Using Eqs. (4) and (5), the edge birth prior ratio is obtained in closed form (Eq. (23)), with \(\delta(x,y)\) being the Kronecker delta function. \(\mathcal{Q}^{+}(\tilde{\boldsymbol{G}}_{i}^{h}|\boldsymbol{G}_{i}^{h}) = 1/(p-|\tilde{\boldsymbol{G}}_{i}^{h}|)\) is the proposal probability of an edge birth move, and \(\mathcal{Q}^{-}(\tilde{\boldsymbol{G}}_{i}^{h}|\boldsymbol{G}_{i}^{h}) = 1/|\tilde{\boldsymbol{G}}_{i}^{h}|\) is the proposal probability of an edge death move. The Jacobian equals 1. Then, using Eq. (14) for the likelihood ratio, the Metropolis-Hastings acceptance ratio for an edge move follows from Eq. (18).

The sampling scheme for updating the hyperparameters \(\delta^{2}\), \(\lambda\) and \(\Lambda\) is described in Lèbre (2007) and Lèbre et al. (2010). Together the four moves *B*, *D*, *S* and *N* allow the generation of samples from probability distributions defined on unions of spaces of different dimensions for both the number of changepoints \(k_{i}\) and the number of parents \(s_{i}^{h}\) within each segment *h* for gene *i*.

## 3 Model improvement: information coupling between segments

Allowing the network structure to change between segments leads to a highly flexible model. However, this approach faces a conceptual and a practical problem. The *practical* problem is potential model over-flexibility. If subsequent changepoints are close together, network structures have to be inferred from short time series segments. This will almost inevitably lead to overfitting (in a maximum likelihood context) or inflated inference uncertainty (in a Bayesian context). The *conceptual* problem is the underlying assumption that structures associated with different segments are a priori independent. While this may be true in some circumstances (e.g. if a drug treatment leads to a drastic, rather than gradual, change), in most cases this assumption is not realistic. For instance, for the evolution of a gene regulatory network during embryogenesis, we would assume that the network evolves gradually and that networks associated with adjacent time intervals are a priori similar.

### 3.1 Hard versus soft information coupling of nodes

As noted above, we propose to share information about the network structure among the different time series segments that result from the changepoint process. The strength of these couplings is governed by the hyperparameters associated with the information sharing prior. We represent these hyperparameters collectively by \(\boldsymbol{\Theta}\). However, another level of coupling is possible, coupling genes (nodes in the network) rather than time series segments.

Recall from Sect. 2 that each node in the network is associated with a random variable \(X_{i}(t)\) that represents the gene expression level of gene *i* at time *t*. Under the regression model in Eq. (1), the regulators for gene *i* are independent of the structure of the rest of the network. Once we bring in information sharing, however, there is a set of hyperparameters that could conceivably be shared among different nodes; namely \(\boldsymbol{\Theta}\). We address this by proposing two different ways of sharing \(\boldsymbol{\Theta}\): hard coupling, where the information sharing prior has the same hyperparameters \(\boldsymbol{\Theta}\) for all nodes (with hyperprior having level-2 hyperparameters \(\boldsymbol{\Psi}\)); and soft coupling, where the information sharing prior has node-specific hyperparameters \(\boldsymbol{\Theta}_{i}\), with common level-2 hyperparameters \(\boldsymbol{\Psi}\). In both cases we have a prior on \(\boldsymbol{\Psi}\) with level-3 hyperparameters \(\boldsymbol{\Omega}\). See Figs. 1 and 2 for an illustration of hard versus soft information coupling of nodes.

In the following sub-sections, we will describe the different information sharing schemes in more detail.

### 3.2 Hard information coupling based on an exponential prior

Denote by \(K_{i} := k_{i}+1\) the total number of partitions in the time series associated with node *i*, and recall that each time series segment \(\boldsymbol{x}_{i}^{h}\) is associated with a separate subnetwork \(\boldsymbol{G}_{i}^{h}\), \(1\leq h\leq K_{i}\). We modify the prior from Eq. (6) by imposing a prior distribution \(P(\boldsymbol{G}_{i}^{h}|\boldsymbol{G}_{i}^{h-1},\beta)\) on the structures, and the joint probability distribution factorizes according to a Markovian dependence:

\[
P(\boldsymbol{G}_{i}^{1},\ldots,\boldsymbol{G}_{i}^{K_{i}} \vert \beta) = P(\boldsymbol{G}_{i}^{1}) \prod_{h=2}^{K_{i}} P(\boldsymbol{G}_{i}^{h} \vert \boldsymbol{G}_{i}^{h-1},\beta).
\]

Similar to Werhli and Husmeier (2008) we define

\[
P(\boldsymbol{G}_{i}^{h} \vert \boldsymbol{G}_{i}^{h-1},\beta) = \frac{\exp\big(-\beta\, \vert\boldsymbol{G}_{i}^{h}-\boldsymbol{G}_{i}^{h-1}\vert\big)}{Z(\beta,\boldsymbol{G}_{i}^{h-1})} \tag{30}
\]

for \(h\geq 2\), where \(\beta\) is a hyperparameter that defines the strength of the coupling between \(\boldsymbol{G}_{i}^{h}\) and \(\boldsymbol{G}_{i}^{h-1}\), and |.| denotes the Hamming distance. For *h*=1, \(P(\boldsymbol{G}_{i}^{h})\) is given by (6). The denominator \(Z(\beta,\boldsymbol{G}_{i}^{h-1})\) in (30) is a normalizing constant, also known as the partition function: \( Z(\beta, \boldsymbol{G}_{i}^{h-1}) = \sum\nolimits_{\boldsymbol{G}_{i}^{h}\in\mathbb{G}}e^{-\beta |\boldsymbol{G}_{i}^{h}-\boldsymbol{G}_{i}^{h-1}|} \), where \(\mathbb{G}\) is the set of all valid subnetwork structures. If we ignore any fan-in restriction that might have been imposed a priori (via \(\overline{s}\) in Eq. (4)), then the expression for the partition function can be simplified: \(Z(\beta, \boldsymbol{G}_{i}^{h-1}) \approx \prod_{j=1}^{p} Z_{j}(\beta, e_{ij}^{h-1})\), where \(e_{ij}^{h}\) is a binary variable indicating the presence or absence of a directed edge from node *j* to node *i* in time series segment *h*, and \( Z_{j}(\beta, e_{ij}^{h-1}) = \sum\nolimits_{e_{ij}^{h}=0}^{1} e^{-\beta|e_{ij}^{h}-e^{h-1}_{ij}|} = 1 + e^{-\beta} \). Note that this expression no longer depends on \(\boldsymbol{G}_{i}^{h-1}\), and hence

\[
Z(\beta) = \big(1 + e^{-\beta}\big)^{p}.
\]
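
The factorization of the partition function can be verified by brute force for a small number of genes. The Python sketch below is an illustration under stated assumptions: parent sets are represented as sets of gene indices 1..p, the Hamming distance is taken as the size of their symmetric difference, and all function names are hypothetical. Without a fan-in restriction the sum over all \(2^{p}\) parent sets should equal \((1+e^{-\beta})^{p}\); with a restriction the two differ, which is why the simplified form is only an approximation in that case.

```python
import math
from itertools import combinations


def hamming(g_a, g_b):
    """Number of edges present in exactly one of the two parent sets."""
    return len(set(g_a) ^ set(g_b))


def partition_function_bruteforce(beta, g_prev, p, max_parents=None):
    """Sum exp(-beta * Hamming distance) over all admissible parent sets."""
    limit = p if max_parents is None else max_parents
    total = 0.0
    for size in range(limit + 1):
        for parents in combinations(range(1, p + 1), size):
            total += math.exp(-beta * hamming(parents, g_prev))
    return total


def partition_function_closed_form(beta, p):
    """Product over the p potential edges of (1 + exp(-beta))."""
    return (1.0 + math.exp(-beta)) ** p


def coupling_prior(g_curr, g_prev, beta, p):
    """Exponential coupling prior P(G^h | G^(h-1), beta), ignoring fan-in limits."""
    return math.exp(-beta * hamming(g_curr, g_prev)) / partition_function_closed_form(beta, p)
```

For \(\beta = 0\) the prior is uniform over all parent sets, and larger values of \(\beta\) concentrate the mass on structures close to that of the preceding segment.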

When proposing a new network structure \(\boldsymbol{G}_{i}^{h} \rightarrow \tilde{\boldsymbol{G}}_{i}^{h}\) for segment *h*, the prior probability ratio in Eq. (23) has to be replaced by \(\frac{P(\boldsymbol{G}_{i}^{h+1}|\tilde{\boldsymbol{G}}_{i}^{h},\beta)P(\tilde{\boldsymbol{G}}_{i}^{h}|\boldsymbol{G}_{i}^{h-1},\beta)}{P(\boldsymbol{G}_{i}^{h+1}|\boldsymbol{G}_{i}^{h},\beta)P(\boldsymbol{G}_{i}^{h}|\boldsymbol{G}_{i}^{h-1},\beta)}\), which changes the acceptance probability of the structure update accordingly.

*β*from the posterior distribution. For a proposal move \(\beta\rightarrow\tilde{\beta}\) with symmetric proposal probability \(\mathcal{Q}(\tilde{\beta}|\beta) = \mathcal{Q}(\beta|\tilde{\beta}) \) we get the following acceptance probability:

*P*(

*β*) was chosen as the uniform distribution on the interval [0,20].
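The simplification of the partition function can be checked numerically. The following Python sketch evaluates the exponential coupling prior for a single target node with *p* candidate parents and confirms, by brute-force enumeration, that it sums to one when no fan-in restriction is imposed. The function names and the encoding of a structure as a tuple of 0/1 edge indicators are assumptions made for this illustration; they are not taken from the original R implementation.

```python
import itertools
import math


def hamming(g_a, g_b):
    """Number of candidate parents whose edge status differs."""
    return sum(e_a != e_b for e_a, e_b in zip(g_a, g_b))


def log_partition(beta, p):
    """log Z(beta) = p * log(1 + exp(-beta)), ignoring fan-in restrictions."""
    return p * math.log1p(math.exp(-beta))


def log_coupling_prior(g_curr, g_prev, beta):
    """log P(G^h | G^{h-1}, beta) for one target node (assumed encoding)."""
    return -beta * hamming(g_curr, g_prev) - log_partition(beta, len(g_curr))


# Brute-force check: the prior sums to one over all 2^p structures.
g_prev = (1, 0, 0, 1)
beta = 1.5
total = sum(
    math.exp(log_coupling_prior(g, g_prev, beta))
    for g in itertools.product((0, 1), repeat=len(g_prev))
)
print(total)
```

Working on the log scale avoids underflow for large *β*, which is the regime of strong coupling.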

### 3.3 Soft information coupling based on an exponential prior

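We can relax the hard coupling of Sect. 3.2 by making the hyperparameter *β*, which defines the prior coupling strength between structures associated with adjacent segments, node-dependent: \(\beta \rightarrow \beta_{i}\), and by placing a two-level hierarchical prior on the \(\beta_{i}\). At the first level, the hyperparameters are given a common gamma prior:

\[ P(\beta_{i}|\kappa,\rho) = \frac{\beta_{i}^{\kappa-1} e^{-\beta_{i}/\rho}}{\rho^{\kappa}\,\Gamma(\kappa)} \]

with shape parameter *κ*>0 and scale parameter *ρ*>0. Recall that the gamma distribution has mean *μ*=*κρ* and variance *σ*^{2}=*κρ*^{2}. We fix the scale parameter at *ρ*=0.1. The shape parameter *κ* is given a vague exponential prior with parameter \(\lambda_{\kappa}=10\) to reflect our prior ignorance. This choice of prior has the following motivation. The coupling strength between the substructures is defined by the coefficient of variation \(\sigma/\mu= 1/\sqrt{\kappa}\), with smaller coefficients corresponding to stronger coupling strengths, and a zero coefficient (*κ*→∞) reducing to the hard coupling scheme discussed in the previous section. By inferring the shape parameter *κ* from the data, starting from a vague yet proper prior distribution, we determine whether the coupling strength should be strong or weak.

When proposing a new network structure \(\boldsymbol{G}_{i}^{h} \rightarrow\tilde{\boldsymbol{G}_{i}^{h}}\) for segment *h*, the prior probability ratio in Eq. (23) has to be replaced by the ratio

\[ \frac{P(\boldsymbol{G}_{i}^{h+1}|\tilde{\boldsymbol{G}_{i}^{h}},\beta_{i})P(\tilde{\boldsymbol{G}_{i}^{h}}|\boldsymbol{G}_{i}^{h-1},\beta_{i})}{P(\boldsymbol{G}_{i}^{h+1}|\boldsymbol{G}_{i}^{h},\beta_{i})P(\boldsymbol{G}_{i}^{h}|\boldsymbol{G}_{i}^{h-1},\beta_{i})} \]

leading to the equivalent of the acceptance probability in Eq. (28):

\[ A(\tilde{\boldsymbol{G}_{i}^{h}}|\boldsymbol{G}_{i}^{h}) = \min\left\{ \frac{P(\boldsymbol{x}_{i}^{h}|\tilde{\boldsymbol{G}_{i}^{h}})}{P(\boldsymbol{x}_{i}^{h}|\boldsymbol{G}_{i}^{h})} \, \frac{P(\boldsymbol{G}_{i}^{h+1}|\tilde{\boldsymbol{G}_{i}^{h}},\beta_{i})P(\tilde{\boldsymbol{G}_{i}^{h}}|\boldsymbol{G}_{i}^{h-1},\beta_{i})}{P(\boldsymbol{G}_{i}^{h+1}|\boldsymbol{G}_{i}^{h},\beta_{i})P(\boldsymbol{G}_{i}^{h}|\boldsymbol{G}_{i}^{h-1},\beta_{i})} \, \frac{\mathcal{Q}(\boldsymbol{G}_{i}^{h}|\tilde{\boldsymbol{G}_{i}^{h}})}{\mathcal{Q}(\tilde{\boldsymbol{G}_{i}^{h}}|\boldsymbol{G}_{i}^{h})}, 1 \right\} \]

An additional MCMC move samples the shape parameter *κ* of the level-2 hyperprior. Drawing a new shape parameter \(\tilde{\kappa}\) from a symmetric proposal distribution \(\mathcal{Q}(\tilde{\kappa}|\kappa)\), the acceptance probability is given by

\[ A(\tilde{\kappa}|\kappa) = \min\left\{ \frac{P(\tilde{\kappa})}{P(\kappa)} \prod_{i=1}^{p} \frac{P(\beta_{i}|\tilde{\kappa},\rho)}{P(\beta_{i}|\kappa,\rho)}, 1 \right\} \]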
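To make the role of the shape parameter concrete, the short Python sketch below tabulates the mean, variance and coefficient of variation of the gamma prior for a few values of *κ* at the fixed scale *ρ*=0.1. The function name `gamma_summary` is hypothetical and the chosen values of *κ* are arbitrary illustrations.

```python
import math


def gamma_summary(kappa, rho=0.1):
    """Mean, variance and coefficient of variation of Gamma(shape, scale)."""
    mean = kappa * rho
    variance = kappa * rho ** 2
    cv = math.sqrt(variance) / mean   # equals 1 / sqrt(kappa)
    return mean, variance, cv


for kappa in (1.0, 4.0, 100.0):
    mean, variance, cv = gamma_summary(kappa)
    print(f"kappa={kappa:6.1f}  mean={mean:.3f}  var={variance:.4f}  cv={cv:.3f}")
```

The coefficient of variation shrinks as *κ* grows, so the node-specific hyperparameters are tied more tightly to their common mean.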

### 3.4 Hard information coupling based on a binomial prior

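The binomial prior has two hyperparameters, *a* and *b*, which control the agreement between adjacent segments among interactions and among non-interactions, respectively. The conjugate prior for the hyperparameters *a*, *b* is a beta distribution, \(P(a,b|\alpha,\overline{\alpha},\gamma,\overline{\gamma}) \propto a^{(\alpha-1)} (1-a)^{(\overline{\alpha}-1)} b^{(\gamma-1)} (1-b)^{(\overline{\gamma}-1)}\).

When proposing a new network structure for node *i* and segment *h*, \(\boldsymbol{G}_{i}^{h} \rightarrow\tilde{\boldsymbol{G}_{i}^{h}}\), the structures \(\boldsymbol{G}_{i}^{h}\) and \(\tilde{\boldsymbol{G}_{i}^{h}}\) enter the prior probability ratio in Eq. (23) via the expression \(P(\{\boldsymbol{G}_{i}^{h}\}|\alpha,\overline{\alpha},\gamma,\overline{\gamma})\). The prior probability ratio becomes

\[ \frac{P(\{\boldsymbol{G}_{i}^{1}, \ldots, \tilde{\boldsymbol{G}_{i}^{h}}, \ldots, \boldsymbol{G}_{i}^{K_{i}}\}_{i=1}^{p}|\alpha,\overline{\alpha},\gamma,\overline{\gamma})}{P(\{\boldsymbol{G}_{i}^{1}, \ldots, \boldsymbol{G}_{i}^{h}, \ldots, \boldsymbol{G}_{i}^{K_{i}}\}_{i=1}^{p}|\alpha,\overline{\alpha},\gamma,\overline{\gamma})} \]

leading to the acceptance probability

\[ A(\tilde{\boldsymbol{G}_{i}^{h}}|\boldsymbol{G}_{i}^{h}) = \min\left\{ \frac{P(\boldsymbol{x}_{i}^{h}|\tilde{\boldsymbol{G}_{i}^{h}})}{P(\boldsymbol{x}_{i}^{h}|\boldsymbol{G}_{i}^{h})} \, \frac{P(\{\boldsymbol{G}_{i}^{1}, \ldots, \tilde{\boldsymbol{G}_{i}^{h}}, \ldots, \boldsymbol{G}_{i}^{K_{i}}\}_{i=1}^{p}|\alpha,\overline{\alpha},\gamma,\overline{\gamma})}{P(\{\boldsymbol{G}_{i}^{1}, \ldots, \boldsymbol{G}_{i}^{h}, \ldots, \boldsymbol{G}_{i}^{K_{i}}\}_{i=1}^{p}|\alpha,\overline{\alpha},\gamma,\overline{\gamma})} \, \frac{\mathcal{Q}(\boldsymbol{G}_{i}^{h}|\tilde{\boldsymbol{G}_{i}^{h}})}{\mathcal{Q}(\tilde{\boldsymbol{G}_{i}^{h}}|\boldsymbol{G}_{i}^{h})}, 1 \right\} \]

This equation is equivalent to Eq. (28), with the prior probabilities in Eq. (23) replaced by those in Eq. (46). Note that \(P(\boldsymbol{x}_{i}^{h}|\boldsymbol{G}_{i}^{h})\) is short for \(P(\boldsymbol{x}_{i}^{h} \vert\boldsymbol{G}_{i}^{h}, \delta^{2})\), which is defined in Eq. (14), and that the proposal ratio \(\frac{\mathcal{Q}(\boldsymbol{G}_{i}^{h}|\tilde{\boldsymbol{G}_{i}^{h}})}{\mathcal{Q}(\tilde{\boldsymbol{G}_{i}^{h}}|\boldsymbol{G}_{i}^{h})}\) is defined in Eqs. (24) and (25). From Fig. 1, it becomes clear that as a consequence of integrating out the hyperparameters, all network structures become interdependent, and information about the structures is contained in the sufficient statistics \(N_{1}^{1}, N_{1}^{0}, N_{0}^{1}, N_{0}^{0}\).

A new proposal move for the level-2 hyperparameters is added to the existing RJMCMC scheme of Sect. 2.6. New values for the level-2 hyperparameter *α* are proposed from a uniform distribution over the support of *P*(*α*). For a move *α*→\(\tilde{\alpha}\), the acceptance probability is:

\[ A(\tilde{\alpha}|\alpha) = \min\left\{ \frac{P(\{\boldsymbol{G}_{i}^{h}\}|\tilde{\alpha},\overline{\alpha},\gamma,\overline{\gamma})\, P(\tilde{\alpha})}{P(\{\boldsymbol{G}_{i}^{h}\}|\alpha,\overline{\alpha},\gamma,\overline{\gamma})\, P(\alpha)}, 1 \right\} \]

and similarly for \(\overline{\alpha}\), *γ* and \(\overline{\gamma}\).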

### 3.5 Soft information coupling based on a binomial prior

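We can relax the hard coupling of Sect. 3.4 by introducing node-specific hyperparameters \(a_{i}\), \(b_{i}\) that are softly coupled via a common level-2 hyperprior, \(P(a_{i},b_{i}|\alpha,\overline{\alpha},\gamma,\overline{\gamma}) \propto a_{i}^{(\alpha-1)} (1-a_{i})^{(\overline{\alpha}-1)} b_{i}^{(\gamma-1)} (1-b_{i})^{(\overline{\gamma}-1)}\), as illustrated in Fig. 2. This amounts to replacing *a*, *b* by \(a_{i}\), \(b_{i}\) in the prior of Sect. 3.4, from which we get, as an equivalent to (46) and using the definition \(N_{k}^{l}[i]=\sum_{h=2}^{K_{i}}N_{k}^{l}[h, i]\), the marginal prior \(P(\boldsymbol{G}_{i}^{1}, \ldots, \boldsymbol{G}_{i}^{K_{i}}|\alpha,\overline{\alpha},\gamma,\overline{\gamma})\) of the structures associated with node *i*.

As in Sect. 3.4, we extend the RJMCMC scheme from Sect. 2.6 so that when proposing a new network structure, \(\boldsymbol{G}_{i}^{h} \rightarrow\tilde{\boldsymbol{G}_{i}^{h}}\), the prior probability ratio in Eq. (23) is replaced by

\[ \frac{P(\boldsymbol{G}_{i}^{1}, \ldots, \tilde{\boldsymbol{G}_{i}^{h}}, \ldots, \boldsymbol{G}_{i}^{K_{i}}|\alpha,\overline{\alpha},\gamma,\overline{\gamma})}{P(\boldsymbol{G}_{i}^{1}, \ldots, \boldsymbol{G}_{i}^{h}, \ldots, \boldsymbol{G}_{i}^{K_{i}}|\alpha,\overline{\alpha},\gamma,\overline{\gamma})} \]

leading to the equivalent of the acceptance probability in Eq. (28):

\[ A(\tilde{\boldsymbol{G}_{i}^{h}}|\boldsymbol{G}_{i}^{h}) = \min\left\{ \frac{P(\boldsymbol{x}_{i}^{h}|\tilde{\boldsymbol{G}_{i}^{h}})}{P(\boldsymbol{x}_{i}^{h}|\boldsymbol{G}_{i}^{h})} \, \frac{P(\boldsymbol{G}_{i}^{1}, \ldots, \tilde{\boldsymbol{G}_{i}^{h}}, \ldots, \boldsymbol{G}_{i}^{K_{i}}|\alpha,\overline{\alpha},\gamma,\overline{\gamma})}{P(\boldsymbol{G}_{i}^{1}, \ldots, \boldsymbol{G}_{i}^{h}, \ldots, \boldsymbol{G}_{i}^{K_{i}}|\alpha,\overline{\alpha},\gamma,\overline{\gamma})} \, \frac{\mathcal{Q}(\boldsymbol{G}_{i}^{h}|\tilde{\boldsymbol{G}_{i}^{h}})}{\mathcal{Q}(\tilde{\boldsymbol{G}_{i}^{h}}|\boldsymbol{G}_{i}^{h})}, 1 \right\} \]

For a move *α*→\(\tilde{\alpha}\), where the prior and proposal probabilities are the same as in Sect. 3.4, the acceptance probability becomes:

\[ A(\tilde{\alpha}|\alpha) = \min\left\{ \frac{P(\tilde{\alpha})}{P(\alpha)} \prod_{i=1}^{p} \frac{P(\boldsymbol{G}_{i}^{1}, \ldots, \boldsymbol{G}_{i}^{K_{i}}|\tilde{\alpha},\overline{\alpha},\gamma,\overline{\gamma})}{P(\boldsymbol{G}_{i}^{1}, \ldots, \boldsymbol{G}_{i}^{K_{i}}|\alpha,\overline{\alpha},\gamma,\overline{\gamma})}, 1 \right\} \]

and similarly for \(\overline{\alpha}\), *γ* and \(\overline{\gamma}\).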

### 3.6 Improved MCMC scheme

The various information sharing priors that we have introduced in the previous Sects 3.2, 3.3, 3.4, 3.5 share the characteristic that they encourage the networks of all segments to be similar to each other.^{2} When applying the MCMC scheme from Lèbre et al. (2010), summarized in Sect. 2.6, adapted to our prior as discussed above, this can lead to the following curious effect. On simulated data where the network structure is the same for all segments we found that the network reconstruction accuracy deteriorated when we increased the coupling strength between the structures. The results will be presented below, in Sect. 5 and Fig. 4. These findings appear counter-intuitive, given that increasing the coupling strength brings the prior more in line with the truth (the perfect prior would have infinitely strong coupling). However, it is easily seen that increasing the coupling strength adversely affects the mixing of the Markov chains. Consider a set of identical network structures which, at an initial stage of the MCMC simulations, are all poor at explaining the data. We now visit a segment and propose a modification of the network structure associated with it. This modification introduces a mismatch between the structures and is, hence, discouraged by the prior. For strong coupling this discouragement might outweigh the gain in the likelihood that would result from a better structure. The structures thus remain identical, which in turn will tend to increase the coupling strength. The MCMC simulation thus gets trapped in a suboptimal state of the configuration space (local optimum).

To address this problem, we introduce multi-segment proposal moves, which we describe for a single target node (*i*). However, they can be generalized for inference over the whole network by simply picking a target node at random. Given a node, the proposal move consists of two steps: (1) Pick one of *p* possible parents for the target node *i*. (2) For each segment *h* of the \(K_{i}\) segments, flip the edge status (changing an edge to a non-edge or vice-versa) between the parent node and the target node with probability *q*. In our simulations, we set \(q = \frac{1}{2}\) so that flipping the edge status and conserving it are equally likely outcomes. It is straightforward to adapt this parameter during the burn-in phase. This means that the probability of proposing a new set of structures \(\tilde{\boldsymbol{G}_{i}}\) given the set of network structures \(\boldsymbol{G}_{i}\) using the multi-segment move is:

\[ \mathcal{Q}(\tilde{\boldsymbol{G}_{i}}|\boldsymbol{G}_{i}) = \frac{1}{p}\, q^{d} (1-q)^{K_{i}-d} \]

where \(d\) is the number of segments in which the edge status differs between \(\boldsymbol{G}_{i}\) and the proposed set \(\tilde{\boldsymbol{G}_{i}}\). Let \(R_{likelihood}( \tilde{\boldsymbol{G}_{i}^{h}}|\boldsymbol{G}_{i}^{h}) = \frac{P(\boldsymbol{x}_{i}^{h} \vert\tilde{\boldsymbol{G}}_{i}^{h}, \delta^{2})}{P(\boldsymbol{x}_{i}^{h} \vert\boldsymbol{G}_{i}^{h}, \delta^{2})}\) be the likelihood ratio of the proposed and original network structures for segment *h* and target node *i*, where the likelihood \(P(\boldsymbol{x}_{i}^{h} \vert\boldsymbol{G}_{i}^{h}, \delta^{2})\) is defined in Eq. (14) of Sect. 2.6. Note that the changes introduced by multi-segment moves are equivalent to a sequence of add and remove edge moves applied to individual segments, so that this ratio remains unchanged. Then the acceptance ratio for multi-segment moves can be expressed as:

\[ A(\tilde{\boldsymbol{G}_{i}}|\boldsymbol{G}_{i}) = \min\left\{ 1, \; \prod_{h=1}^{K_{i}} R_{likelihood}(\tilde{\boldsymbol{G}_{i}^{h}}|\boldsymbol{G}_{i}^{h}) \; \frac{P(\tilde{\boldsymbol{G}_{i}})}{P(\boldsymbol{G}_{i})} \; R_{proposal}(\tilde{\boldsymbol{G}_{i}}|\boldsymbol{G}_{i}) \right\} \]

where the prior \(P(\boldsymbol{G}_{i})\) depends on our choice of prior. If segments are independent, then \(P(\boldsymbol{G}_{i}) = \prod_{h=1}^{K_{i}} {P(\boldsymbol{G}_{i}^{h})}\), where \(P(\boldsymbol{G}_{i}^{h})\) is the prior from Eq. (6), with a Poisson distribution on the number of parents. If we want to use information sharing between segments, then the prior for segment *h* depends on segment *h*−1, so that \(P(\boldsymbol{G}_{i}) = P(\boldsymbol{G}_{i}^{1}) \prod_{h=2}^{K_{i}} P(\boldsymbol{G}_{i}^{h}|\boldsymbol{G}_{i}^{h-1})\), where \(P(\boldsymbol{G}_{i}^{h}|\boldsymbol{G}_{i}^{h-1})\) could be any of the information sharing priors introduced in Sect. 3. Finally, \(R_{proposal}(\tilde{\boldsymbol{G}_{i}}|\boldsymbol{G}_{i})\) is the Hastings ratio:

\[ R_{proposal}(\tilde{\boldsymbol{G}_{i}}|\boldsymbol{G}_{i}) = \frac{\mathcal{Q}(\boldsymbol{G}_{i}|\tilde{\boldsymbol{G}_{i}})}{\mathcal{Q}(\tilde{\boldsymbol{G}_{i}}|\boldsymbol{G}_{i})} \]

Since the proposal probability depends only on the number of differences between \(\tilde{\boldsymbol{G}_{i}}\) and \(\boldsymbol{G}_{i}\), the multi-segment moves are symmetric, and we obtain that \(R_{proposal}(\tilde{\boldsymbol{G}_{i}}|\boldsymbol{G}_{i}) = 1\).
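The two-step multi-segment move can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions rather than the original R code: structures are encoded as lists of 0/1 edge indicators (one list per segment), the parent is picked uniformly at random, and the function names are hypothetical. The likelihood and prior ratios are passed in on the log scale, since their computation depends on Eq. (14) and on the chosen prior.

```python
import random


def propose_multi_segment(structures, q=0.5, rng=random):
    """Propose new parent sets for one target node across all its segments.

    structures[h][j] is 1 if candidate parent j has an edge to the target
    node in segment h, and 0 otherwise (assumed encoding).
    """
    p = len(structures[0])
    parent = rng.randrange(p)                # step (1): pick one of p parents
    proposal = [list(g) for g in structures]
    flipped = []
    for h, g in enumerate(proposal):         # step (2): flip with probability q
        if rng.random() < q:
            g[parent] = 1 - g[parent]
            flipped.append(h)
    return proposal, parent, flipped


def proposal_probability(n_flipped, n_segments, p, q=0.5):
    """Probability of one specific proposal: (1/p) q^d (1-q)^(K-d)."""
    return (1.0 / p) * q ** n_flipped * (1.0 - q) ** (n_segments - n_flipped)


def log_acceptance(log_likelihood_ratios, log_prior_ratio):
    """log min{1, prod_h R_likelihood * prior ratio}; the Hastings ratio is 1."""
    return min(0.0, sum(log_likelihood_ratios) + log_prior_ratio)


rng = random.Random(1)
current = [[1, 0, 0], [1, 0, 0], [1, 1, 0], [0, 1, 0]]
proposal, parent, flipped = propose_multi_segment(current, q=0.5, rng=rng)
print(parent, flipped, proposal)
```

Because the reverse move must flip the same edge in the same segments, the forward and reverse proposal probabilities coincide for any value of *q*, which is why no proposal term appears in `log_acceptance`.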

We have explored an alternative proposal scheme consisting of two moves: (1) a move proposing network structures where an edge has been set identical in all segments, and (2) the move described above, which corresponds to a random perturbation of an edge. However, we found that including the first kind of proposal move adversely affected mixing and convergence in simulations where the true network structure presented differences among segments. These network structures are less likely to be proposed when both moves are included. Details can be found in Dondelinger (2012).

## 4 Implementation and simulations

We have implemented our model in R, based on code from Lèbre (2007) and Lèbre et al. (2010). The network structure, the changepoints and the hyperparameters are sampled from the posterior distribution using RJMCMC as described in Sects. 2.6 and 3.6. We ran the MCMC chains until we were satisfied that convergence was reached. Then we sampled 1000 network and changepoint configurations in intervals of 200 RJMCMC steps. By marginalization and under the assumption of convergence, this represents a sample from the posterior distribution in Eq. (12). By further marginalization, we get the posterior probabilities of all gene regulatory interactions, which defines a ranking of the interactions in terms of posterior confidence. We use the potential scale reduction factor (PSRF) (Gelman and Rubin 1992), computed from the within-chain and between-chain variances of marginal edge posterior probabilities, as a convergence diagnostic. The usual threshold for sufficient convergence lies at PSRF ≤1.1. In our simulations, we extended the burn-in phase until a value of PSRF ≤1.05 was reached.
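As an illustration of the convergence diagnostic, the following Python sketch computes a PSRF for one scalar quantity, for example the sampled presence (1) or absence (0) of a single edge, from several chains of equal length. It uses the basic variant, the square root of the pooled variance estimate over the within-chain variance, without further correction factors; whether the original analysis used exactly this variant is an assumption. The function name `psrf` is hypothetical.

```python
import math
import statistics


def psrf(chains):
    """Potential scale reduction factor for equal-length chains of a scalar."""
    n = len(chains[0])
    chain_means = [statistics.mean(c) for c in chains]
    within = statistics.mean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    if within == 0.0:
        # All chains are constant: converged only if they agree.
        return 1.0 if between == 0.0 else math.inf
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


agreeing = [[0, 1, 0, 1], [0, 1, 0, 1]]
disagreeing = [[0, 0, 0, 1], [1, 1, 1, 0]]
print(psrf(agreeing), psrf(disagreeing))
```

In practice the statistic would be evaluated for every potential edge, and the burn-in extended until the largest value falls below the chosen threshold.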

For the study on simulated data, and the synthetic biology data, the true interaction network is known. Therefore, varying the threshold on this ranking allows us to construct the Receiver Operating Characteristic (ROC) curve (plotting the sensitivity or recall^{3} against the complementary specificity^{4}) and the precision-recall (PR) curve (plotting the precision^{5} against the recall), and to assess the network reconstruction accuracy in terms of the areas under these graphs (AUROC and AUPRC, respectively); see Davis and Goadrich (2006). These two measures are widely used in the systems biology literature to quantify the overall network reconstruction accuracy (Prill et al. 2010), with larger values indicating a better prediction performance overall.
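The two summary scores can be computed directly from the ranked list of edges. The Python sketch below is a minimal illustration with hypothetical function names: it computes the AUROC through its pairwise-ranking interpretation and approximates the AUPRC by the average precision, which is a common simplification and not the interpolation scheme discussed by Davis and Goadrich (2006). It assumes at least one true edge and one true non-edge.

```python
def auroc(scores, labels):
    """Probability that a true edge is ranked above a non-edge (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((sp > sn) + 0.5 * (sp == sn) for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true edge."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, is_true) in enumerate(ranked, start=1):
        if is_true:
            hits += 1
            total += hits / rank
    return total / hits


posterior = [0.9, 0.8, 0.7, 0.6]   # marginal posterior edge probabilities
truth = [1, 0, 1, 0]               # 1 = edge present in the true network
print(auroc(posterior, truth), average_precision(posterior, truth))
```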

## 5 Evaluation on simulated data

### 5.1 Comparative evaluation of network reconstruction and hyperparameter inference

The purpose of the simulation study is two-fold. Firstly, we want to carry out a comparative evaluation of the proposed Bayesian regularization schemes for a controlled scenario in which the true network structure is known. Secondly, we want to assess the Bayesian inference scheme and test the viability of the proposed MCMC samplers. To focus on the task of network reconstruction, we keep the changepoints fixed at their true values. The inference of the changepoints will be investigated later, on the real gene expression time series (see Fig. 12).

For the simulated networks, a Poisson distribution with mean \(\lambda_{parents}=3\) was used to determine the number of parents for each node. We simulated changes in the network structure by producing 4 different network segments, where a Poisson distribution with mean \(\lambda_{changes}\in\{0.25,0.5,1\}\) was used to determine the number of changes per node. The changes were then applied uniformly at random to edges and non-edges in the previous segment. For each segment *h*, we generated a time series of length 15 using a linear regression model:

\[ \boldsymbol{x}(t) = \boldsymbol{W}^{h}\boldsymbol{x}(t-1) + \boldsymbol{\epsilon} \]

where **x**(*t*) is the 10×1 vector of observations at time *t* and \(\boldsymbol{W}^{h}= \{w_{ij}^{h}\}\) is the 10×10 matrix of segment-specific regression weights for each edge. We chose the regression weights such that \(w_{ij}^{h}= 0\) if there is no edge between node *i* and node *j* in the network structure for segment *h*, and \(w_{ij}^{h} \sim N(0, 1)\) otherwise. We added Gaussian observation noise \(\epsilon_{i} \sim N(0,1)\) independently for each observation of node *i*.
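The data-generating procedure can be sketched as follows in Python, using only the standard library. Several details are assumptions made for this illustration and are not stated in the text: the regression is taken to be first-order in time, self-loops are allowed, the initial state is drawn from a standard normal distribution, each segment continues from the last state of the previous one, and all function names are hypothetical.

```python
import math
import random


def poisson(lam, rng):
    """Draw a Poisson variate by multiplying uniforms (Knuth's method)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def random_structure(n, lam_parents, rng):
    """adj[i][j] = 1 if node j is a parent of node i."""
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in rng.sample(range(n), min(poisson(lam_parents, rng), n)):
            adj[i][j] = 1
    return adj


def perturb(adj, lam_changes, rng):
    """Flip a Poisson-distributed number of edge indicators per node."""
    n = len(adj)
    new = [row[:] for row in adj]
    for i in range(n):
        for j in rng.sample(range(n), min(poisson(lam_changes, rng), n)):
            new[i][j] = 1 - new[i][j]
    return new


def simulate_segment(adj, length, rng, x0=None):
    """Simulate x(t) = W x(t-1) + noise with N(0,1) weights on the edges."""
    n = len(adj)
    w = [[rng.gauss(0, 1) if adj[i][j] else 0.0 for j in range(n)]
         for i in range(n)]
    x = x0 or [rng.gauss(0, 1) for _ in range(n)]
    series = []
    for _ in range(length):
        x = [sum(w[i][j] * x[j] for j in range(n)) + rng.gauss(0, 1)
             for i in range(n)]
        series.append(x)
    return w, series


rng = random.Random(0)
adj = random_structure(10, 3.0, rng)
data, state = [], None
for h in range(4):                      # 4 segments of length 15
    if h > 0:
        adj = perturb(adj, 0.5, rng)
    _, series = simulate_segment(adj, 15, rng, x0=state)
    data.extend(series)
    state = series[-1]
print(len(data), len(data[0]))
```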

When the network structure is the same in all segments, the largest value of the hyperparameter *β* should lead to the best network reconstruction accuracy, as this corresponds to the tightest tying between adjacent structures. However, repeating the MCMC simulations initially did not confirm this conjecture; see Figs. 4(c) and 4(d). As discussed in Sect. 3.6, the observed mismatch was a consequence of poor mixing and convergence for large hyperparameter values, which is endemic to the naive extension of the MCMC sampler from Lèbre et al. (2010). Repeating the simulations with the novel MCMC scheme proposed in Sect. 3.6 leads to the graphs of Figs. 4(a) and 4(b). Here, the network reconstruction accuracy no longer deteriorates with increasing hyperparameters, indicating that the mixing and convergence problems have been averted.

While the posterior distribution of the hyperparameter *β* of the exponential prior does indeed tend to higher values, the situation is different for the hyperparameters *a* and *b* of the binomial prior. The top panels in Fig. 5 show the network reconstruction accuracy in terms of AUROC and AUPRC scores for several fixed values of the hyperparameters *a* and *b*. As expected, the peak performance is reached for the highest values, as no mismatch between the structures implies that tight coupling is consistent with the data. The centre panels of Fig. 5 show the posterior distribution of the hyperparameters that was obtained with the conventional MCMC proposal scheme adapted from Lèbre et al. (2010) and described in Sect. 2.6. There is an obvious mismatch between the high posterior probability region and the region of hyperparameters that optimize the network reconstruction. This provides more evidence that the sampler adapted for segment coupling from Lèbre et al. (2010) suffers from mixing and convergence problems. The bottom panels of Fig. 5 show the marginal posterior distributions of the hyperparameters inferred in the MCMC simulations with the novel multi-segment proposal move introduced in Sect. 3.6. Unlike in the centre panels of Fig. 5, and as a consequence of the different proposal scheme, the high posterior probability region now concurs with the region of maximum network reconstruction accuracy. This agreement suggests that the novel MCMC sampler leads to a significant improvement in mixing and convergence, in corroboration of our conjecture in Sect. 3.6.

Next, we turn our attention to varying network structures. We varied the percentage of edges that change from segment to segment between 2.5 % and 10 %.^{6} A significant improvement in the network reconstruction accuracy can be achieved over the unregularized method, as shown in the bottom panels of Fig. 3. However, the magnitude of the improvement in the scores decreases as the number of changes between adjacent segments increases. This is plausible: as we introduce more structural changes between adjacent networks, we would expect to gain less benefit from information sharing. We note that the degradation in performance seems to be stronger for the exponential prior than for the binomial prior.

When the network structures vary between segments, we would expect the optimal hyperparameters to take intermediate values: 0<*β*<∞ and 0<*a*, *b*<1. This has in fact been borne out in our simulations. Figure 6 shows the network reconstruction accuracy in terms of AUROC and AUPRC scores for different values of the hyperparameters *a*, *b*. The best network reconstruction accuracy is obtained when *b*, which governs consistency among non-interactions, is high (≥0.9), while *a*, which controls agreement among interactions, is reduced to a range around its uninformative setting *a*≈0.5. The bottom panel of Fig. 6 shows that the inferred posterior distribution is consistent with these ranges, and that the Bayesian inference scheme thus optimizes the network reconstruction accuracy. A slightly different picture emerges for the exponential prior, though. Figures 7(a)–7(b) show the AUROC and AUPRC scores for different values of *β*, indicating a clear peak in the network reconstruction accuracy for finite 0<*β*<∞. This peak does not coincide with the high posterior probability range of *β*, as shown in Fig. 7(c). Only when increasing the data set size by a factor of 4 does the Bayesian inference scheme succeed in optimizing the network reconstruction accuracy in the sense that the high posterior probability region now coincides with the range of the highest AUROC/AUPRC scores. The obvious question to ask is whether this trend is another artifact of poor MCMC convergence/mixing. To answer this question we have devised a simplified model for which the posterior distribution can be computed in closed form. Our analysis, which we present in Sect. 5.2, reproduces the results from this simulation study, suggesting that the suboptimal performance of the Bayesian inference scheme is intrinsic to the chosen form of the prior.^{7}

Returning to the binomial prior, we finally investigated the influence of the level-2 hyperparameters \(\alpha, \overline{\alpha}, \gamma\), and \(\overline{\gamma}\). Recall that owing to the conjugacy of the prior, these values can be interpreted as fictitious prior observation counts. Our initial idea was to keep the mismatch hyperparameters fixed at \(\overline{\alpha}=\overline{\gamma}=1\), while putting a vague uniform distribution over the set {1,2,…,100} as a prior on the match hyperparameters *α* and *γ*. The rationale behind this choice is that the regularization scheme is intended to encourage similarity rather than dissimilarity between adjacent network structures. However, repeating the MCMC simulations for different values of the level-2 hyperparameters revealed that the setting \(\overline{\alpha}=\overline{\gamma}=1\) is too restrictive and that the network reconstruction accuracy can be improved by relaxing this constraint (see Fig. 8).

The findings of our simulation study can be summarized as follows. A naive extension of the MCMC sampler of Lèbre et al. (2010), as described in Sect. 3.6, leads to a poor network reconstruction accuracy for high values of the hyperparameters; this problem can be resolved with the novel proposal scheme introduced in Sect. 3.6. With this new proposal scheme, information sharing with the binomial prior leads to a significant improvement in the network reconstruction accuracy in all cases, while information sharing with the exponential prior leads to a significant improvement when the true network structures are sufficiently similar. A detailed analysis of hyperparameter inference shows that the Bayesian inference scheme is consistent for the binomial prior in the sense that the high posterior probability region of the hyperparameters concurs with the one that optimizes the network reconstruction accuracy. For the exponential prior, this consistency is only given when the data set size is sufficiently large; otherwise a more restrictive hyperprior (i.e. prior on *β*) is needed. On the other hand, a restrictive setting for the level-2 hyperparameters of the binomial prior is counter-productive, and better network reconstruction scores are obtained with a non-informative hyperprior.

### 5.2 Closed-form inference for the exponential prior

The mismatch observed for the exponential prior in Sect. 5.1 is not an artifact of the MCMC scheme but a property of the prior *per se*. As a demonstration, we reproduce the observation from Fig. 7 with a simpler model for which a closed-form expression of the posterior distribution of the hyperparameter can be derived. We consider the scenario depicted in Fig. 9, where edges of a hypothetical network can be divided into different categories, depending on whether or not they are true, supported by the data, or included in the prior network. An overview of the notation is presented in Table 2. With the simplifying assumption of posterior independence of the edges, the likelihood is given by Eq. (57), where \(n_{S}\) counts the number of elements in set *S* for network **G**, and the symbols denoting the sets have been defined in Table 2. Assuming a uniform prior on *β*, the posterior distribution of the hyperparameter is given by Eq. (58), where **G**^{0} represents our prior knowledge. Inserting (57) into (58), with Eq. (31) for *Z*(*β*) and under the assumption of a uniform prior on *β*, yields a closed-form expression for the posterior distribution of *β*.

[…] *β*≈1. The effect of the data set size is emulated by varying the settings of the parameters entering the likelihood. For small values of *A* and *A*^{∗}, corresponding to small data sets, the posterior probability increases monotonically in *β*, and the Bayesian inference scheme intrinsically fails to find the range of hyperparameters that optimizes the network reconstruction accuracy. When we increase the data set size, this mismatch disappears, and the two regions concur. These findings are consistent with those presented in Fig. 7 and suggest that the observed mismatch is a genuine inference feature rather than an MCMC artifact.

We next ask for which values of *A* and *A*^{∗} the posterior distribution shows a peak for a finite value of *β*. Analytically, this corresponds to finding values for *A* and *A*^{∗} such that the equation \(\frac{dP(\beta |\boldsymbol{x})}{d\beta} = 0\) has a solution. Unfortunately, it is non-trivial to determine the existence of a solution analytically; we have therefore resorted to numerically calculating \(\frac{dP(\beta|\boldsymbol{x})}{d\beta}\) for *β*=20. At *β*=0, we have \(\frac{dP(\beta|\boldsymbol{x})}{d\beta} > 0\); therefore, if \(\frac{dP(\beta|\boldsymbol{x})}{d\beta} < 0\) at *β*=20, this indicates that the distribution has a peak on the interval \(\beta \in [0,20]\). On the other hand, under the assumption of unimodality, \(\frac{dP(\beta|\boldsymbol{x})}{d\beta} > 0\) at *β*=20 indicates that the marginal posterior probability of *β* increases monotonically with *β*. The results of this analysis are shown in Fig. 11, which shows a clear phase shift towards distributions with a peak as *A* and *A*^{∗} increase.

What does this analysis entail for the general applicability of the exponential prior? It is clear that when the data set size is too small, the marginal posterior distribution of *β* will be biased towards high values. The exact definition of “too small” will crucially depend on the nature of the data set. Given that we have shown in Sect. 5.1 that the binomial prior avoids this weakness and outperforms the exponential prior in terms of network reconstruction accuracy, we recommend that the binomial prior be used in preference to the exponential prior.

## 6 Real-world applications

### 6.1 Morphogenesis in *Drosophila melanogaster*

*Drosophila melanogaster* undergoes four major stages of morphogenesis: embryo, larva, pupa and adult. Arbeitman et al. (2002) obtained a gene expression time series covering all four stages. We have applied our methods to a subset of this gene expression time series consisting of eleven genes involved in wing muscle development. First, we investigated whether the changepoints inferred by our methods correspond to the known transitions between stages. Figure 12(a) shows the posterior probabilities of inferred changepoints for any gene using TVDBN-0 (unregularized by information sharing, see Table 1), while Figs. 12(c)–12(d) show the posterior probabilities for the information sharing methods. We compared this performance to the method proposed in Ahmed and Xing (2009), using the authors’ software package TESLA (Fig. 12(b)). In addition, Robinson and Hartemink (2009) used a discrete non-homogeneous DBN to analyse the same data set, and a plot corresponding to Fig. 12(b) can be found in their paper.

Table 1. List of different information sharing (IS) priors for the TVDBN (Time-Varying Dynamic Bayesian Network), the equation where they were defined, and the most common hyperparameter settings that were used, or hyperparameter ranges if they are inferred. Only the highest level hyperparameters in the Bayesian hierarchy are shown

| Name | Prior | Section | Equation | Hyperparameters |
|---|---|---|---|---|
| TVDBN-0 | Poisson (No IS) | | (6) | |
| TVDBN-Exp-hard | Exponential Hard IS | 3.2 | (30) | |
| TVDBN-Exp-soft | Exponential Soft IS | 3.3 | | |
| TVDBN-Bino-hard | Binomial Hard IS | 3.4 | | \(\alpha,\overline{\alpha},\gamma,\overline{\gamma}\in \{1,2,\ldots,100\}\) |
| TVDBN-Bino-soft | Binomial Soft IS | 3.5 | | \(\alpha,\overline{\alpha},\gamma,\overline{\gamma}\in \{1,2,\ldots,100\}\) |

Table 2. Likelihood and prior scores for the edges contained in the sets defined in Fig. 9. The product of the prior and the likelihood defines the rank of the edge; the truth indicator is shown in the second column

| Set | True edge | Supported by the data | Supported by the prior | Likelihood | Prior | Number of edges |
|---|---|---|---|---|---|---|
| | yes | yes | no | | | |
| | yes | yes | yes | | 1 | |
| | yes | no | yes | 1 | 1 | \(N_{LB^{*}}\) |
| | yes | no | no | 1 | | \(N_{L^{*}}\) |
| | no | no | yes | 1 | 1 | |
| | no | yes | yes | | 1 | \(N_{B^{*}}\) |
| | no | yes | no | | | \(N_{F^{*}}\) |
| | no | no | no | 1 | | |

An analysis of the results suggests that our non-homogeneous DBN methods are generally more successful than TESLA. We recover changepoints for all three transitions (embryo → larva, larva → pupa, and pupa → adult). As shown in Fig. 12(b), the last transition, pupa → adult, is less clearly detected with TESLA, and it is completely absent in Robinson and Hartemink (2009). Furthermore, TESLA and our method both detect additional changepoints during the embryo stage, which are missing in Robinson and Hartemink (2009). It is not implausible that additional transitions at the gene regulatory network level should occur within one morphogenic phase. One would expect that a complex gene regulatory network is unlikely to transition into a new phase all at once, and some pathways might have to undergo activational changes earlier in preparation for the morphogenic transition. However, a failure to detect a known transition represents a shortcoming of a method, and so we can say that in this aspect, our model appears to outperform the two alternative approaches.

Among the inferred interactions, we recover one between *eve* and *twi*. This interaction is also reported in Guo et al. (2007) and Zhao et al. (2006), while Robinson and Hartemink (2009) seem to have missed it. We also recover a cluster of interactions among the genes *myo61f*, *msp300*, *mhc*, *prm*, *mlc1* and *up* during all morphogenic phases. This result is not implausible, as all genes (except *up*) belong to the myosin family. However, unlike Robinson and Hartemink (2009), we find that *actn* also participates as a regulator in this cluster. There is some indication of this in Zhao et al. (2006), where *actn* is found to regulate *prm*. We have further validated our reconstructed networks using genetic and protein interactions recorded in the FLIGHT database (Sims et al. 2006). We found that a number of the inferred interactions over all segments correspond to interactions that have been reported in the literature. Some of these result from indirect interactions, where the intermediate gene is missing in the data. Table 3 gives an overview of the identified interactions with references to the biological literature.

Table 3. Reconstructed interactions in the *Drosophila melanogaster* wing muscle development network that have been validated using the FLIGHT database (Sims et al. 2006)

| Interaction | References | Interaction type | Notes |
|---|---|---|---|
| | Homyk and Emerson (1988); Nongthomba et al. (2003); Montana and Littleton (2004) | Protein | Via missing gene |
| | | Protein | Via missing gene |
| | Parkhurst and Ish-Horowicz (1991) | Protein | Via missing gene |
| | Homyk and Emerson (1988); Nongthomba et al. (2003); Montana and Littleton (2004) | Protein | Direct interaction |
| | Formstecher et al. (2005) | Gene | Via missing gene |
| | Sanchez et al. (1999) | Gene | Direct interaction |
| | Formstecher et al. (2005) | Gene | Via missing gene |
| | | Gene | Via missing gene |
| | | Protein and Gene | Via missing gene |

### 6.2 Synthetic biology in *Saccharomyces cerevisiae*

Synthetic biology provides gene expression data measured *in vivo* from cells with a known gene regulatory network structure, which makes it possible to objectively assess the network reconstruction accuracy. Our work is based on Cantone et al. (2009), where the authors constructed a synthetic regulatory network with 5 genes in *Saccharomyces cerevisiae* (yeast). They then measured gene expression time series with RT-PCR for 16 and 21 time points under two experimental conditions, related to the carbon source: galactose (“switch on”) and glucose (“switch off”). The authors applied two established state-of-the-art methods from computational systems biology to reconstruct the known underlying network from these time series. One is based on ordinary differential equations, ODEs (TSNI); the other is based on conventional DBNs (Banjo); see Cantone et al. (2009) for details. Both methods are optimization-based and only output a single network. By comparison with the known network, the authors calculated the precision (proportion of predicted regulatory interactions in the network that are correct) and recall (proportion of true interactions that are predicted) scores. Figure 14 shows the true networks, the reconstructed networks for TSNI and Banjo, as well as the reconstructed networks using TVDBN-Bino-hard, where we have applied a threshold of 0.75 to the inferred marginal posterior probabilities of the gene interactions to obtain absence/presence values for the edges.^{8}
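The thresholding step and the two scores can be illustrated with a few lines of Python. The gene names, posterior probabilities and true edges below are made up for the example, the function name is hypothetical, and treating a posterior probability equal to the threshold as an edge is an assumption.

```python
def precision_recall(posteriors, true_edges, threshold=0.75):
    """Precision and recall of the edges whose posterior reaches the threshold."""
    predicted = {edge for edge, prob in posteriors.items() if prob >= threshold}
    true_positives = len(predicted & true_edges)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true_edges) if true_edges else 0.0
    return precision, recall


# Hypothetical marginal posterior probabilities for directed edges.
posteriors = {("g1", "g2"): 0.90, ("g2", "g3"): 0.80, ("g3", "g1"): 0.30}
true_edges = {("g1", "g2"), ("g3", "g1")}
print(precision_recall(posteriors, true_edges))
```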

In our study, we merged the time series from the two experimental conditions under exclusion of the boundary point,^{9} and applied the non-homogeneous DBNs from Table 1. Figures 12(e) and 12(f) show the inferred marginal posterior probabilities of potential changepoints. The salient changepoint is at the boundary between the “switch on” (galactose) and “switch off” (glucose) phases, confirming that the true changepoint is consistently identified. However, in the absence of information sharing, we observe additional spurious changepoints. These changepoints are successfully suppressed with the proposed Bayesian information-coupling schemes, with the binomial prior having a slightly stronger regularizing effect than the exponential one.

For the exponential prior, higher values of *β*, which correspond to stronger coupling, result in a better performance (Fig. 16(a)). Figure 16(b) shows the effect of different values for *κ* in Eq. (37). There is no discernible trend, which suggests that the strength of the coupling scheme does not matter much for this application, and that when moving closer to the hard coupling scheme (higher *κ* while keeping the mean *μ* of the gamma distribution fixed), the network reconstruction performance does not change significantly. The results obtained with the binomial prior demonstrate that, for this application, encouraging agreement related to the presence of interactions is more important than agreement related to the absence of interactions (Fig. 16(c)). Figure 16(d) confirms that our sampled hyperparameters *a* and *b* are in the correct range for optimal network reconstruction.

## 7 Discussion

In the present paper we have addressed some of the challenges encountered in systems biology when attempting to reconstruct gene regulatory networks from gene expression time series. We have looked at the case where the network structure may change over time due to developmental or environmental causes. To deal with this situation, we have developed a non-homogeneous DBN, which has various advantages over existing schemes: it does not require the data to be discretized (as opposed to Robinson and Hartemink 2009, 2010); it allows the network structure to change with time (as opposed to Grzegorczyk and Husmeier 2009, 2011); it includes four different regularization schemes based on inter-time segment information sharing (as opposed to Lèbre 2007; Lèbre et al. 2010); and it allows all hyperparameters to be inferred from the data via a consistent Bayesian inference scheme (as opposed to Ahmed and Xing 2009).

- (1) We allow for different penalties between edges and non-edges. The method in Robinson and Hartemink (2009, 2010) simply penalizes the number of different edges, i.e. the Hamming distance, between two adjacent structures. This corresponds to the approach taken for the exponential prior in Sects. 3.2 and 3.3. The inclusion of an extra edge leads to the same penalty as the deletion of an existing edge. This might not always be appropriate. Removing a rate-limiting reaction step of a critical signalling pathway is a more substantial change than including some redundant bypass pathway. Our two models based on the binomial prior (Sects. 3.4 and 3.5) allow for that by introducing different prior penalties for the deviation between edges and for the deviation between non-edges. In Sect. 5.1 we have experimentally shown that an information sharing approach based on different penalties for edges and non-edges can outperform the simpler approach when the number of changes among segments is small, but non-zero.

- (2) We allow for different nodes of the network to have different penalty terms. The model in Robinson and Hartemink (2009, 2010) has a single hyperparameter for penalizing differences between structures: \(\lambda_s\). This might not be appropriate if different subnetworks are conserved to a different degree. For instance, we would assume that molecular network substructures related to generic functionality, e.g. to maintain an essential baseline metabolism, are conserved to a greater extent than more peripheral pathways. By introducing node-dependent hyperparameters, the priors described in Sects. 3.3 and 3.5 generalize the approach in Robinson and Hartemink (2009, 2010) by allowing different parts of the network to be conserved during the temporal process to a different extent.
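The contrast between a symmetric Hamming-distance penalty and separate penalties for edges and non-edges can be made concrete with a small sketch. The functional forms below are simplified assumptions for illustration and are not copied from the equations in Sect. 3: the exponential-type score is left unnormalized, and the binomial-type score treats *a* as the probability that an existing edge is retained and *b* as the probability that an absent edge stays absent. All function names and numerical values are hypothetical.

```python
import math


def hamming_distance(prev_edges, curr_edges):
    """Number of potential edges whose state differs between two segments."""
    return sum(p != c for p, c in zip(prev_edges, curr_edges))


def log_exponential_score(prev_edges, curr_edges, beta):
    """Unnormalized log score: every changed edge costs beta, whether the
    change is an addition or a deletion."""
    return -beta * hamming_distance(prev_edges, curr_edges)


def log_binomial_score(prev_edges, curr_edges, a, b):
    """Log score with separate probabilities for retaining an edge (a)
    and retaining a non-edge (b); requires 0 < a < 1 and 0 < b < 1."""
    total = 0.0
    for p, c in zip(prev_edges, curr_edges):
        if p:
            total += math.log(a if c else 1.0 - a)
        else:
            total += math.log(b if not c else 1.0 - b)
    return total


# Edge indicators for the potential parents of one node (hypothetical).
previous = [1, 1, 0, 0]
edge_deleted = [1, 0, 0, 0]
edge_added = [1, 1, 1, 0]

exp_deleted = log_exponential_score(previous, edge_deleted, beta=1.0)
exp_added = log_exponential_score(previous, edge_added, beta=1.0)
bin_deleted = log_binomial_score(previous, edge_deleted, a=0.9, b=0.6)
bin_added = log_binomial_score(previous, edge_added, a=0.9, b=0.6)
```

With these illustrative values the exponential-type score cannot distinguish the deletion from the addition, whereas the binomial-type score penalizes the deletion more heavily. Passing a different `beta`, or a different pair `(a, b)`, for each node would correspond to the node-dependent hyperparameters of item (2).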

A further difference to Robinson and Hartemink (2009, 2010) merits some additional discussion. In our model, the changepoints are node-dependent. This gives us extra model flexibility, which is biologically motivated: on infection of an organism by a pathogen, genes involved in defence pathways are likely to be up-regulated, while others are not. Hence, it is plausible that different genes respond to changes in the environment differently, and this is directly incorporated in our model. In Robinson and Hartemink (2010), node-specific changepoints can be obtained indirectly: the calculation of the sufficient statistics for computing the marginal likelihood depends on the intervals during which each parent set is active. The marginal likelihood is recomputed for epochs, where an epoch is the union of consecutive time intervals during which a node-dependent substructure does not change. Since these unions of sets can be different for different nodes, the model does allow different changepoint sets to be associated with different nodes. However, there is a considerable price to pay for that: a changepoint in Robinson and Hartemink (2010) is intrinsically associated with a structure change, whereas in our model, a changepoint can be related to either a structure or a parameter change, or both. This gives us extra model flexibility, which is important for systems biology: when adapting to environmental change, several molecular interactions in signalling pathways may be up- or down-regulated, rather than switched on or off altogether.
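The notion of an epoch used in this comparison, namely a union of consecutive time intervals over which a node's substructure does not change, can be sketched as follows. This is only an illustration of the definition, not the procedure of Robinson and Hartemink (2010); the function name, the boundary values and the gene labels are hypothetical.

```python
def epochs(boundaries, parent_sets):
    """Merge consecutive segments of one node that share the same parent set.

    boundaries:  segment boundaries [t_0, t_1, ..., t_K] for K segments
    parent_sets: K parent sets, one per segment
    Returns a list of (start, end, parent_set) tuples, one per epoch.
    """
    merged = []
    for start, end, parents in zip(boundaries[:-1], boundaries[1:], parent_sets):
        parents = frozenset(parents)
        if merged and merged[-1][2] == parents:
            # Same substructure as the previous segment: extend the epoch.
            merged[-1] = (merged[-1][0], end, parents)
        else:
            merged.append((start, end, parents))
    return merged


# Three segments; the parent set only changes at the second boundary.
node_epochs = epochs([0, 5, 10, 15], [{"geneA"}, {"geneA"}, {"geneA", "geneB"}])
```

Here the boundary at time 5 does not start a new epoch because the parent set is unchanged, which illustrates why a changepoint in such a scheme is tied to a structure change.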

An evaluation on simulated data has demonstrated that the proposed Bayesian regularization and information sharing schemes lead to an improved performance over Lèbre (2007) and Lèbre et al. (2010). We have carried out a comparative evaluation of four different information coupling schemes: a binomial versus an exponential prior, and hard versus soft information coupling. This comparison has revealed that the binomial prior allows for more consistent inference of the right level of information sharing, while the exponential prior tends to enforce overly-strong information sharing. The difference between hard and soft information coupling seems negligible in the scenarios we investigated. A detailed investigation of the hyperparameter inference has allowed us to improve the MCMC sampler for better convergence, and to explore the limitations of the exponential information sharing prior.

The application of our method to gene expression time series taken during the life cycle of *Drosophila melanogaster* has revealed better agreement with known morphogenic transitions than the methods of Robinson and Hartemink (2009, 2010) and Ahmed and Xing (2009), and we have been able to identify several gene and protein interactions that are known from the literature. In an application to data from a topical study in synthetic biology (Cantone et al. 2009), our methods have outperformed two established network reconstruction methods from computational systems biology, and information sharing has allowed us to reconstruct the true underlying gene network with higher overall precision and recall than would have been possible without it.

We have investigated the performance of our methods on datasets which arise from gene regulatory networks with temporal changes in the structure of the network. There are several special cases of this situation which merit further discussion. The simplest case occurs when the changes of the underlying process are limited to parameter changes, and the true structure of the network remains constant. We have shown in Sect. 5.1 that our methods can deal with this situation effectively thanks to information sharing among segments. A more complicated case could involve a recurring event that causes certain gene interactions to switch on or off, leading to repeated network structures. For example, in a circadian clock system such as those studied by Locke et al. (2006) and Pokhilko et al. (2010), the absence of sunlight might deactivate the interaction between two genes in the network, causing its structure to change from A to B.^{10} If gene expression levels are measured both during the day and at night for three days, then we will observe a sequence like ABABAB. While our methods can in principle represent repeated segments, the multiple changepoint process was not designed with this in mind. A better model for repeated segments might be a hidden Markov model (HMM), where each hidden state corresponds to a network structure, and transitions between states correspond to changes in the structure, in the same vein as applied to changing tree structures in phylogeny (Husmeier and McGuire 2003). The disadvantage of using HMMs is that they impose a geometric distribution on the segment lengths, and in that respect our changepoint process is more flexible. To have the same flexibility with HMMs, model extensions along the lines of hierarchical HMMs or HMMs with waiting times could be pursued, as known from speech processing, but this would come at significantly increased computational costs. Hence, this approach only appears to make sense if there is strong prior indication that repetitions occur.
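The remark about segment lengths can be made explicit. In a standard first-order HMM, let \(p\) denote the self-transition probability of a given hidden state (a symbol introduced here for illustration only). The number \(L\) of consecutive time points spent in that state then satisfies

\[
P(L = l) = p^{\,l-1}(1-p), \qquad l = 1, 2, \ldots, \qquad \mathbb{E}[L] = \frac{1}{1-p}.
\]

The probability therefore decays monotonically in \(l\), and the most probable segment length is always \(l = 1\), whatever the value of \(p\). This is the restriction on segment lengths referred to above.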

An interesting topic for future work is to investigate other functional forms of the information sharing mechanism. In our work, we have investigated four different models, based on an exponential versus binomial distribution, with or without gene-specific hyperparameters. It has recently come to our attention that Wang et al. (2011) have experimented with a different approach, which effectively combines our exponential prior with an additional factor that encourages network sparsity. Sparsity in our model is encouraged by the truncated Poisson prior of Eq. (4), as explained in the paragraph under Eq. (30). It would be interesting to explore the effect of the additional factor used in Eq. (7) of Wang et al. (2011) in the context of gene network reconstruction.

Reconstructing gene regulatory networks from transcriptional profiles remains a challenging problem, which a flurry of ongoing methodological developments in the computational systems biology community are trying to address. We believe that our paper adds a valuable contribution to this field, by presenting a consistent and flexible Bayesian model for the case where the network structures change over time.

## Footnotes

1. See Larget and Simon (1999) for a demonstration of the higher computational costs of bootstrapping over Bayesian approaches based on MCMC.
2. Note that the binomial information sharing prior (Sects. 3.4 and 3.5) can in principle encourage either similarity or dissimilarity depending on the hyperparameters *a* and *b*. As discussed in Sect. 5, we had originally envisaged setting the level-2 hyperparameters \(\overline{\alpha}\) and \(\overline{\gamma}\) equal to 1 to enforce similarity, but Fig. 8 demonstrates that this constraint is too restrictive.
3. The *sensitivity* or *recall* denotes the fraction of true interactions that have been recovered.
4. The *specificity* denotes the fraction of spurious interactions that have been successfully avoided.
5. The *precision* is the fraction of predicted interactions that are correct.
6. Because our simulation was set up so that we had on average 3 regulatory interactions per node, this corresponds to a change of between 8.25 % and 33 % of the original interactions.
7. We note that the results for the exponential prior seem to be at odds with those reported in Husmeier et al. (2010). The reason is that in Husmeier et al. (2010) we had selected, by a fluke, a more restrictive prior on the hyperparameter: *β* ∈ [0, 5]. As our discussion in Sect. 5.2 shows, this setting boosts the network reconstruction performance.
8. Note that while our TVDBN methods are in principle capable of inferring the type of interaction (activation or inhibition) by sampling regression weights, we have not investigated this for the purpose of this paper. Therefore in Fig. 14, the arrows in the networks reconstructed using TVDBN-Bino-hard only record the presence or absence of an interaction, and not its type.
9. When merging two time series \((x_1, \ldots, x_m)\) and \((y_1, \ldots, y_n)\), only the pairs \(x_i \rightarrow x_j\) and \(y_i \rightarrow y_j\) are presented to the DBN, while the pair \(x_m \rightarrow y_1\) is excluded due to the obvious discontinuity.
10. Note that our definition of a deactivated gene interaction includes interactions that no longer occur because one of the interacting genes is no longer expressed.

## Notes

### Acknowledgements

Most of the work was carried out while Dirk Husmeier was employed at Biomathematics and Statistics Scotland, and the work was supported by the Scottish Government’s Rural and Environment Science and Analytical Services Division (RESAS). This work was partly funded by EU FP7 grant “Timet”. Frank Dondelinger’s PhD research is partly funded by the Engineering and Physical Sciences Research Council (EPSRC).

## References

- Ahmed, A., & Xing, E. P. (2009). Recovering time-varying networks of dependencies in social and biological studies. *Proceedings of the National Academy of Sciences*, *106*, 11878–11883.
- Andrianantoandro, E., Basu, S., Karig, D., & Weiss, R. (2006). Synthetic biology: new engineering rules for an emerging discipline. *Molecular Systems Biology*, *2*(1), E1–E14.
- Andrieu, C., & Doucet, A. (1999). Joint Bayesian model selection and estimation of noisy sinusoids via reversible jump MCMC. *IEEE Transactions on Signal Processing*, *47*(10), 2667–2676.
- Arbeitman, M., Furlong, E., Imam, F., Johnson, E., Null, B., Baker, B., Krasnow, M., Scott, M., Davis, R., & White, K. (2002). Gene expression during the life cycle of *Drosophila melanogaster*. *Science*, *297*(5590), 2270–2275.
- Cantone, I., Marucci, L., Iorio, F., Ricci, M. A., Belcastro, V., Bansal, M., Santini, S., di Bernardo, M., di Bernardo, D., & Cosma, M. P. (2009). A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. *Cell*, *137*(1), 172–181.
- Davis, J., & Goadrich, M. (2006). The relationship between precision-recall and ROC curves. In *Proceedings of the 23rd international conference on machine learning* (p. 240). New York: ACM.
- Dondelinger, F. (2012). *A machine learning approach to reconstructing signalling pathways and interaction networks in biology*. PhD thesis, University of Edinburgh (in preparation).
- Dondelinger, F., Lebre, S., & Husmeier, D. (2010). Heterogeneous continuous dynamic Bayesian networks with flexible structure and inter-time segment information sharing. In *Proceedings of the 27th international conference on machine learning (ICML)*.
- Formstecher, E., Aresta, S., Collura, V., Hamburger, A., Meil, A., Trehin, A., Reverdy, C., Betin, V., Maire, S., Brun, C., et al. (2005). Protein interaction mapping: a Drosophila case study. *Genome Research*, *15*(3), 376.
- Gelman, A., & Rubin, D. (1992). Inference from iterative simulation using multiple sequences. *Statistical Science*, *7*(4), 457–472.
- Green, P. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. *Biometrika*, *82*, 711–732.
- Grzegorczyk, M., & Husmeier, D. (2009). Non-stationary continuous dynamic Bayesian networks. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), *Advances in neural information processing systems (NIPS)* (Vol. 22, pp. 682–690).
- Grzegorczyk, M., & Husmeier, D. (2011). Non-homogeneous dynamic Bayesian networks for continuous data. *Machine Learning*, *83*, 355–419.
- Guo, F., Hanneke, S., Fu, W., & Xing, E. (2007). Recovering temporally rewiring networks: a model-based approach. In *Proceedings of the 24th international conference on machine learning* (p. 328). New York: ACM.
- Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. *Biometrika*, *57*, 97–109.
- Homyk, T. Jr, & Emerson, C. Jr (1988). Functional interactions between unlinked muscle genes within haploinsufficient regions of the Drosophila genome. *Genetics*, *119*(1), 105.
- Husmeier, D., & McGuire, G. (2003). Detecting recombination in 4-taxa DNA sequence alignments with Bayesian hidden Markov models and Markov chain Monte Carlo. *Molecular Biology and Evolution*, *20*(3), 315–337.
- Husmeier, D., Dondelinger, F., & Lèbre, S. (2010). Inter-time segment information sharing for non-homogeneous dynamic Bayesian networks. In J. Lafferty (Ed.), *Proceedings of the twenty-fourth annual conference on neural information processing systems (NIPS)* (Vol. 23, pp. 901–909). New York: Curran Associates.
- Kolar, M., Song, L., & Xing, E. (2009). Sparsistent learning of varying-coefficient models with structural changes. In Y. Bengio, D. Schuurmans, J. Lafferty, C. K. I. Williams, & A. Culotta (Eds.), *Advances in neural information processing systems (NIPS)* (Vol. 22, pp. 1006–1014).
- Larget, B., & Simon, D. L. (1999). Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees. *Molecular Biology and Evolution*, *16*(6), 750–759.
- Lèbre, S. (2007). *Stochastic process analysis for genomics and dynamic Bayesian networks inference*. PhD thesis, Université d’Evry-Val-d’Essonne, France.
- Lèbre, S., Becq, J., Devaux, F., Lelandais, G., & Stumpf, M. (2010). Statistical inference of the time-varying structure of gene-regulation networks. *BMC Systems Biology*, *4*, 130.
- Locke, J., Kozma-Bognár, L., Gould, P., Fehér, B., Kevei, E., Nagy, F., Turner, M., Hall, A., & Millar, A. (2006). Experimental validation of a predicted feedback loop in the multi-oscillator clock of *Arabidopsis thaliana*. *Molecular Systems Biology*, *2*(1), 59.
- Montana, E., & Littleton, J. (2004). Characterization of a hypercontraction-induced myopathy in Drosophila caused by mutations in *mhc*. *The Journal of Cell Biology*, *164*(7), 1045.
- Nongthomba, U., Cummins, M., Clark, S., Vigoreaux, J., & Sparrow, J. (2003). Suppression of muscle hypercontraction by mutations in the myosin heavy chain gene of *Drosophila melanogaster*. *Genetics*, *164*(1), 209.
- Parkhurst, S., & Ish-Horowicz, D. (1991). *WIMP*, a dominant maternal-effect mutation, reduces transcription of a specific subset of segmentation genes in Drosophila. *Genes & Development*, *5*(3), 341.
- Pokhilko, A., Hodge, S., Stratford, K., Knox, K., Edwards, K., Thomson, A., Mizuno, T., & Millar, A. (2010). Data assimilation constrains new connections and components in a complex, eukaryotic circadian clock model. *Molecular Systems Biology*, *6*(1), 416.
- Prill, R. J., Marbach, D., Saez-Rodriguez, J., Sorger, P. K., Alexopoulos, L. G., Xue, X., Clarke, N. D., Altan-Bonnet, G., & Stolovitzky, G. (2010). Towards a rigorous assessment of systems biology models: the DREAM3 challenges. *PLoS ONE*, *5*(2), e9202.
- Punskaya, E., Andrieu, C., Doucet, A., & Fitzgerald, W. (2002). Bayesian curve fitting using MCMC with applications to signal segmentation. *IEEE Transactions on Signal Processing*, *50*(3), 747–758.
- Robinson, J. W., & Hartemink, A. J. (2009). Non-stationary dynamic Bayesian networks. In D. Koller, D. Schuurmans, Y. Bengio, & L. Bottou (Eds.), *Advances in neural information processing systems (NIPS)* (Vol. 21, pp. 1369–1376). San Mateo: Morgan Kaufmann.
- Robinson, J., & Hartemink, A. (2010). Learning non-stationary dynamic Bayesian networks. *Journal of Machine Learning Research*, *11*, 3647–3680.
- Sanchez, C., Lachaize, C., Janody, F., Bellon, B., Roeder, L., Euzenat, J., Rechenmann, F., & Jacq, B. (1999). Grasping at molecular interactions and genetic networks in *Drosophila melanogaster* using FlyNets, an internet database. *Nucleic Acids Research*, *27*(1), 89.
- Sims, D., Bursteinas, B., Gao, Q., Zvelebil, M., & Baum, B. (2006). FLIGHT: database and tools for the integration and cross-correlation of large-scale RNAi phenotypic datasets. *Nucleic Acids Research*, *34*(suppl 1), D479.
- Talih, M., & Hengartner, N. (2005). Structural learning with time-varying components: tracking the cross-section of financial time series. *Journal of the Royal Statistical Society B*, *67*(3), 321–341.
- Wang, Z., Kuruoglu, E., Yang, X., Xu, Y., & Huang, T. (2011). Time varying dynamic Bayesian network for non-stationary events modeling and online inference. *IEEE Transactions on Signal Processing*, *4*(59), 1553.
- Werhli, A. V., & Husmeier, D. (2008). Gene regulatory network reconstruction by Bayesian integration of prior knowledge and/or different experimental conditions. *Journal of Bioinformatics and Computational Biology*, *6*(3), 543–572.
- Xuan, X., & Murphy, K. (2007). Modeling changing dependency structure in multivariate time series. In Z. Ghahramani (Ed.), *Proceedings of the 24th annual international conference on machine learning (ICML 2007)* (pp. 1055–1062). New York: Omnipress.
- Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions. In P. Goel & A. Zellner (Eds.), *Bayesian inference and decision techniques* (pp. 233–243). Amsterdam: Elsevier.
- Zhao, W., Serpedin, E., & Dougherty, E. (2006). Inferring gene regulatory networks from time series data using the minimum description length principle. *Bioinformatics*, *22*(17), 2129.