# Dynamic causal modelling for EEG and MEG

DOI: 10.1007/s11571-008-9038-0

- Cite this article as:
- Kiebel, S.J., Garrido, M.I., Moran, R.J. et al. Cognitive Neurodynamics (2008) 2: 121. doi:10.1007/s11571-008-9038-0

## Abstract

Dynamic Causal Modelling (DCM) is an approach first introduced for the analysis of functional magnetic resonance imaging (fMRI) to quantify effective connectivity between brain areas. Recently, this framework has been extended and established in the magneto/electroencephalography (M/EEG) domain. DCM for M/EEG entails the inversion of a full spatiotemporal model of evoked responses, over multiple conditions. This model rests on a biophysical and neurobiological generative model for electrophysiological data. A generative model is a prescription of how data are generated. The inversion of a DCM provides conditional densities on the model parameters and, indeed, on the model itself. These densities enable one to answer key questions about the underlying system. A DCM comprises two parts: one part describes the dynamics within and among neuronal sources, and the second describes how source dynamics generate data in the sensors, using the lead-field. The parameters of this spatiotemporal model are estimated using a single (iterative) Bayesian procedure. In this paper, we will motivate and describe the current DCM framework. Two examples show how the approach can be applied to M/EEG experiments.

### Keywords

Magnetoencephalography, Electroencephalography, Dynamic system, Connectivity, Bayesian

## Introduction

In the following, we first highlight several new methodological developments, which we believe are important for new approaches to M/EEG analysis. This introductory motivation is intended to be general. The key components will be reprised using concrete examples of Dynamic Causal Modelling (DCM) later.

The analysis of MEG and EEG data can be approached from various angles. To the uninitiated and expert researcher alike, the diversity of methods can be breathtaking. However, most researchers avoid switching among methods, because exploring the methods landscape can incur a high cost. Instead, most M/EEG laboratories have adopted a dual strategy. Firstly, experiments are analyzed using some ‘safe’ strategy based on a kernel of robust and widely accepted methods; this is usually the method of choice for analysis and publication (see, for example, Picton et al. 2000). Secondly, new methods are evaluated when they look interesting and if there are enough resources for doing so. This strategy might be considered an ideal mixture of exploitation and exploration. But why is this approach so endemic in the M/EEG field? Consider fMRI analysis: after an initial decade of methods exploration, most groups now agree on the main methodological issues. For example, in a recent special issue of the journal Human Brain Mapping, a comparison of a dozen different approaches highlighted the use of common analysis strategies (Poline et al. 2006).

In M/EEG there does not seem to be the same level of consensus. There are many analysis schemes available, which look at different components of the signal: analysis of evoked responses (Kiebel and Friston 2004; van Wassenhove et al. 2005), of single trials using multivariate techniques like independent component analysis, analysis of oscillations using time-frequency power or coherence analysis (Friston et al. 2006; Gross et al. 2001; Liljestrom et al. 2005; Makeig et al. 2002). These analyses can proceed in sensor or brain-space based on source reconstruction using equivalent current dipoles (ECDs) or imaging solutions (Baillet and Garnero 1997; Daunizeau et al. 2006, 2007; Jun et al. 2005; Mattout et al. 2006). Variations and mixtures of all these approaches exist. There might be a simple reason for this diversity of methods: M/EEG data contain much more information about underlying neuronal dynamics than the fMRI signal (Daunizeau et al. 2007). The underlying dynamics, while still largely a mystery, confer great potential on M/EEG. Classical M/EEG analysis methods usually try to reduce the amount of temporal detail, for example by averaging over temporal windows and channels. This is an appropriate strategy for extracting behaviourally relevant features from the data (Rugg and Curran 2007). However, averaging, in particular in sensor-space, also moves the analysis away from the brain enforcing inferences about summary measures (Makeig et al. 2002). There is not only uncertainty about how data should be analyzed but also about how these signals are generated and what they tell us about the underlying system. Therefore, a key topic in M/EEG methods research is to identify models that describe the mapping from the underlying neuronal system to the observed M/EEG response, which incorporate known or assumed constraints (David and Friston 2003; Sotero et al. 2007). 
Using biophysically and neuronally informed forward models means we can use the M/EEG to make explicit statements about the underlying biophysical and neuronal parameters (David et al. 2006).

In the past years, there have been three important developments in M/EEG analysis. The first is that standard computers are now powerful enough to perform sophisticated analyses in a routine fashion (David et al. 2006). These analyses would have been impractical ten years ago, even for low-density EEG measurements. Secondly, the way methods researchers describe their M/EEG models has changed dramatically in the last decade. Recent descriptions tend to specify the critical assumptions underlying the model, followed by the inversion technique. This is useful because models for M/EEG can be complex; specifying the model explicitly also makes a statement about how one believes data were generated (Daunizeau and Friston 2007). This makes model development more effective and transparent because fully specified models can be compared to other models.

The third substantial advance is the advent of Empirical or hierarchical Bayesian approaches to M/EEG model inversion. Bayesian approaches are important, because they allow for the introduction of constraints that ensure robust parameter estimation, for example (Auranen et al. 2007; Nummenmaa et al. 2007; Penny et al. 2007; Zumer et al. 2007). This is vital once the model is complex enough to generate ambiguities (conditional dependencies) among groups of parameters. One could avoid correlations among parameter estimates by avoiding complex models. However, this would preclude further research into the mechanisms behind the M/EEG. An empirical Bayesian formulation allows the data to resolve these ambiguities and uncertainties. The traditional argument against the use of Bayesian methods is that the priors introduce ‘artificial’ or ‘biased’ information not solicited by the data. Essentially, the claim is that the priors enforce solutions, which are desired by the researcher. This argument can be discounted for three reasons: (i) In Empirical Bayes the weight afforded by the priors is determined by the data, not the analyst. (ii) Bayesian analysis provides the posterior distribution, which encodes uncertainty about the parameters, after observing the data. If the posterior is similar to the prior, then the data do not contain sufficient information to enable qualitative inference. This can be tested explicitly using the model evidence; the fact that a parameter cannot be resolved is usually informative in itself. (iii) Usually, Bayesian analysis explores a selection of models, followed by model comparison (Garrido et al. 2007). For example, one can invert a model derived from one’s favourite cognitive neuroscience theory, along with other alternative models. The best model can then be found by comparing model evidences (see below) using standard decision criteria (Penny et al. 2004).

In summary, we argue that the combination of these developments allows for models that are sophisticated enough to capture the full richness of the data. The Bayesian approach is central to this new class of models; without it, it is not possible to constrain complex models or deal with inherent correlations among parameter estimates. Bayesian model comparison provides the tool for selecting the best among competing models, which is central to the scientific process.

Dynamic Causal Modelling provides a generative spatiotemporal model for M/EEG responses. The idea central to DCM is that M/EEG data are the response of a dynamic input–output system to experimental inputs. It is assumed that the sensory inputs are processed by a network of discrete but interacting neuronal sources. For each source, we use a neural mass model, which describes responses of neuronal subpopulations. Each population has its own (intrinsic) dynamics governed by the neural mass equations, but also receives extrinsic input, either directly as sensory input or from other sources. The whole set of sources and their interactions are fully specified by a set of first-order differential equations that are formally related to other neural mass models used in computational models of M/EEG (Breakspear et al. 2006; Rodrigues et al. 2006). We assume that the depolarization of pyramidal cell populations gives rise to observed M/EEG data; one specifies how these depolarizations are expressed in the sensors through a conventional lead-field. The full, spatiotemporal model takes the form of a nonlinear state-space model with hidden states modelling (unobserved) neuronal dynamics, while the observation (lead-field) equation is instantaneous and linear in the states. In other words, the model consists of a temporal and spatial part with temporal (e.g., connectivity between two sources) and spatial parameters (e.g., lead-field parameters, like ECD locations). In the next section, we describe the DCM equations and demonstrate how the ensuing model is inverted using Bayesian techniques. We illustrate inference using evoked responses from a multi-subject data set. We also introduce a recent addition to the DCM framework, which can be used to make inferences about M/EEG steady-state responses. We conclude with a discussion about current DCM algorithms and point to some promising future developments.

## Dynamic Causal Modelling—theory

Intuitively, the DCM scheme regards an experiment as a designed perturbation of neuronal dynamics that are promulgated and distributed throughout a system of coupled anatomical sources to produce region-specific responses. This system is modelled using a dynamic input–state–output system with multiple inputs and outputs. Responses are evoked by deterministic inputs that correspond to experimental manipulations (i.e., presentation of stimuli). Experimental factors (i.e., stimulus attributes or context) can also change the parameters or causal architecture of the system producing these responses. The state variables cover both the neuronal activities and other neurophysiological or biophysical variables needed to form the outputs. Outputs are those components of neuronal responses that can be detected by MEG/EEG sensors. In our model, these components are depolarizations of a ‘neural mass’ of pyramidal cells. DCM starts with a reasonably realistic neuronal model of interacting cortical regions. This model is then supplemented with a spatial forward model of how neuronal activity is transformed into measured responses, here, M/EEG scalp-averaged responses. This enables the parameters of the neuronal model (e.g., effective connectivity) to be estimated from observed data. For M/EEG data, this spatial model is a forward model of electromagnetic measurements that accounts for volume conduction effects (Mosher et al. 1999).

### Hierarchical MEG/EEG neural mass model

DCMs for M/EEG adopt a neural mass model (David and Friston 2003) to explain source activity in terms of the ensemble dynamics of interacting inhibitory and excitatory subpopulations of neurons, based on the model of Jansen and Rit (1995). This model emulates the activity of a source using three neural subpopulations, each assigned to one of three cortical layers; an excitatory subpopulation in the granular layer, an inhibitory subpopulation in the supra-granular layer and a population of deep pyramidal cells in the infra-granular layer. The excitatory pyramidal cells receive excitatory and inhibitory input from local interneurons (via intrinsic connections, confined to the cortical sheet), and send excitatory outputs to remote cortical sources via extrinsic connections. See also Grimbert and Faugeras (2006) for a bifurcation analysis of this model.

In David et al. (2005), we developed a hierarchical cortical model to study the influence of forward, backward and lateral connections on evoked responses. This model embodies directed extrinsic connections among a number of sources, each based on the Jansen model (Jansen and Rit 1995), using the connectivity rules described in Felleman and Van Essen (1991). Using these rules, it is straightforward to construct any hierarchical cortico-cortical network model of cortical sources. Under simplifying assumptions, directed connections can be classified as: (i) Bottom-up or forward connections that originate in the infragranular layers and terminate in the granular layer. (ii) Top-down or backward connections that connect infragranular to agranular layers. (iii) Lateral connections that originate in infragranular layers and target all layers. These long-range or extrinsic cortico-cortical connections are excitatory and are mediated through the axonal processes of pyramidal cells. For simplicity, we do not consider thalamic connections, but model thalamic afferents as a function encoding subcortical input (see below).

The DCM is specified in terms of its state equations and an observer (output) equation

\[ \dot{x} = f(x,u,\theta), \qquad h = g(x,\theta) \tag{1} \]

where *x* are the neuronal states of cortical sources, *u* are exogenous inputs, and *h* is the system’s response. *θ* are quantities that parameterize the state and observer equations (see also below under ‘Prior expectation’). The state-equations are ordinary first-order differential equations and are derived from the behaviour of the three neuronal subpopulations, which operate as linear damped oscillators. The integration of the differential equations pertaining to each subpopulation can be expressed as a convolution of the exogenous input to produce the response (David and Friston 2003). This convolution transforms the average density of pre-synaptic inputs into an average postsynaptic membrane potential, where the convolution kernel is given by

\[ p(t)_{e} = \begin{cases} \dfrac{H_{e}}{\tau_{e}}\, t \exp(-t/\tau_{e}) & t \ge 0 \\ 0 & t < 0 \end{cases} \tag{2} \]

where the subscript “*e*” stands for “excitatory”. Similarly, the subscript “*i*” is used for inhibitory synapses. *H* controls the maximum postsynaptic potential, and *τ* represents a lumped rate constant. An operator *S* transforms the potential of each subpopulation into firing rate, which is the exogenous input to other subpopulations. This operator is assumed to be an instantaneous sigmoid nonlinearity of the form

\[ S(x) = \frac{1}{1 + \exp(-\rho_{1}(x - \rho_{2}))} - \frac{1}{1 + \exp(\rho_{1}\rho_{2})} \tag{3} \]

where the free parameters *ρ*_{1} and *ρ*_{2} determine its slope and translation. Interactions among the subpopulations depend on internal coupling constants, *γ*_{1,2,3,4}, which control the strength of intrinsic connections and reflect the total number of synapses expressed by each subpopulation (see Fig. 1). The integration of this model, to form predicted responses, rests on formulating these two operators (Eqs. 2, 3) in terms of a set of differential equations as described in David and Friston (2003). These equations, for all sources, can be integrated using the matrix exponential of the system’s Jacobian as described in the appendices of David et al. (2006). Critically, the integration scheme allows for conduction delays on the connections, which are free parameters of the model. A DCM, at the network level, obtains by coupling sources with extrinsic forward, backward and lateral connections as described above.
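The two operators can be sketched numerically. The following Python snippet is a minimal illustration, not the SPM implementation. It assumes that the synaptic kernel is the alpha function (H/τ)·t·exp(−t/τ) for t ≥ 0 and that the sigmoid is a logistic function shifted so that S(0) = 0; the function names are hypothetical, and the parameter values are simply the prior means listed in Table 1.

```python
import numpy as np

def synaptic_kernel(t, H, tau):
    """Assumed alpha-function kernel: (H / tau) * t * exp(-t / tau) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    return np.where(t >= 0.0, (H / tau) * t * np.exp(-t / tau), 0.0)

def sigmoid(x, rho1, rho2):
    """Assumed logistic sigmoid, shifted so that S(0) = 0.
    rho1 controls the slope, rho2 the translation."""
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.exp(-rho1 * (x - rho2))) - 1.0 / (1.0 + np.exp(rho1 * rho2))

# Illustrative values: prior means from Table 1 (time in ms).
H_e, tau_e = 4.0, 8.0
rho1, rho2 = 2.0 / 3.0, 1.0 / 3.0

t = np.linspace(0.0, 100.0, 10001)      # peri-stimulus time grid, 0.01 ms step
k = synaptic_kernel(t, H_e, tau_e)
t_peak = t[np.argmax(k)]                # analytically, the kernel peaks at t = tau
firing = sigmoid(np.linspace(-5.0, 5.0, 101), rho1, rho2)
```

Under these assumptions the kernel peaks at t = τ with height H/e, which is why *H* scales the maximum postsynaptic potential and *τ* sets the time scale of the response.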

### Event-related input and event-related response-specific effects

To model event-related responses, the network receives inputs from the environment via input connections. These connections are exactly the same as forward connections and deliver inputs *u* to the spiny stellate cells in layer 4 of specified sources. In the present context, the inputs *u* model afferent activity relayed by subcortical structures and are modelled with two components. The first is a gamma density function (truncated to peri-stimulus time). This models an event-related burst of input that is delayed with respect to stimulus onset and dispersed by subcortical synapses and axonal conduction. Being a density function, this component integrates to one over peri-stimulus time. The second component is a discrete cosine set modelling systematic fluctuations in input, as a function of peri-stimulus time. In our implementation, peri-stimulus time is treated as a state variable, allowing the input to be computed explicitly during integration. Critically, the event-related input is exactly the same for all ERPs. The effects of experimental factors are mediated through event-related response (ERR)-specific changes in connection strengths. See Fig. 1 for a summary of the resulting differential equations.
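The two input components can be illustrated with a short Python sketch. The shape/rate parameterization of the gamma density, the parameter values, and the function names are assumptions made for illustration only; the cosine set follows the form cos(2π(i − 1)t) given in Table 1, with time in seconds.

```python
import math
import numpy as np

def gamma_density(t, shape, rate):
    """Gamma probability density on t > 0 (zero elsewhere)."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    pos = t > 0
    log_pdf = (shape * math.log(rate) + (shape - 1.0) * np.log(t[pos])
               - rate * t[pos] - math.lgamma(shape))
    out[pos] = np.exp(log_pdf)
    return out

def event_input(t, shape, rate, cos_weights):
    """Gamma burst plus a weighted low-order cosine set over peri-stimulus time."""
    t = np.asarray(t, dtype=float)
    burst = gamma_density(t, shape, rate)
    # enumerate starts at 0, matching the (i - 1) index of the cosine set.
    drift = sum(w * np.cos(2.0 * np.pi * i * t) for i, w in enumerate(cos_weights))
    return burst + drift

dt = 1e-4
t = np.arange(0.0, 1.0, dt)                                # 1 s of peri-stimulus time
burst_only = event_input(t, shape=4.0, rate=64.0, cos_weights=[])
u = event_input(t, shape=4.0, rate=64.0, cos_weights=[0.0, 0.1])
area = float(np.sum(burst_only) * dt)                      # numerical integral of the burst
```

With these hypothetical values the burst peaks at (shape − 1)/rate ≈ 47 ms after stimulus onset and integrates to approximately one, as required of a density.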

Experimental effects on extrinsic connections are expressed as

\[ A^{F}_{ijk} = A^{F}_{ij} B_{ijk}, \qquad A^{B}_{ijk} = A^{B}_{ij} B_{ijk}, \qquad A^{L}_{ijk} = A^{L}_{ij} B_{ijk} \tag{4} \]

where *A*_{ij} encodes the strength of a connection to the *i*th source from the *j*th and *B*_{ijk} encodes its gain for the *k*th ERP. The superscripts (*F*, *B*, or *L*) indicate the type of connection, i.e., forward, backward or lateral (see also Fig. 1). By convention, we set the gain of the first ERP to unity, so that the gains of subsequent ERPs are relative to the first. The reason we model extrinsic modulations in terms of gain (a multiplicative factor), as opposed to additive effects, is that by construction, connections should always be positive. This is assured provided both the connection and its gain are positive. In this context, a (positive) gain of less than one represents a decrease in connection strength.

Experimental effects can also act within a source, through a gain on the maximum amplitude *H*_{e} of the synaptic kernel (Eq. 2). A gain greater than one effectively increases the maximum response that can be elicited from a source. For the *i*th source:

\[ H^{(i)}_{e,k} = H^{(i)}_{e} B_{iik} \tag{5} \]

### The spatial forward model

The depolarization of the pyramidal subpopulation of the *i*th source \( x^{(i)}_{0} \) is detected remotely on the scalp surface in M/EEG. The relationship between scalp data *h* and source activity is assumed to be linear and instantaneous

\[ h = g(x,\theta) = L(\theta^{L})\,x_{0} \tag{6} \]

where *L* is a lead-field matrix (i.e., spatial forward model), which accounts for passive conduction of the electromagnetic field (Mosher et al. 1999). Here, we assume that the spatial expression of each source is caused by one ECD. Of course, one can use different source models, e.g. extended patches on the cortical surface (see Section “Discussion”). The head model for the dipoles is based on four concentric spheres, each with homogeneous and isotropic conductivity. The four spheres approximate the brain, cerebrospinal fluid, skull, and scalp. The parameters of the model are the radii and conductivities for each layer. Here, we use radii of 71, 72, 79, and 85 mm, with conductivities of 0.33, 1.0, 0.0042, and 0.33 S/m, respectively. The potential at the sensors requires an evaluation of an infinite series, which can be approximated using fast algorithms (Mosher et al. 1999; Zhang 1995). The lead-field of each ECD is then a function of three location and three orientation or moment parameters *θ*^{L} = {*θ*^{pos}, *θ*^{mom}}. For the ECD forward model, we used a Matlab (Mathworks) routine that is freely available as part of the FieldTrip package (http://www2.ru.nl/fcdonders/fieldtrip/), under the GNU general public license.
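The linear, instantaneous observer equation amounts to a single matrix product. The sketch below uses a random matrix as a stand-in for the lead field and random source time courses; both are hypothetical placeholders, not the output of a dipole forward model.

```python
import numpy as np

rng = np.random.default_rng(0)
n_channels, n_sources, n_times = 32, 3, 200

# Hypothetical lead field: one column (scalp topography) per source.
L = rng.standard_normal((n_channels, n_sources))
# Hypothetical pyramidal-cell depolarizations: one row per source.
x0 = rng.standard_normal((n_sources, n_times))

# Predicted sensor data (channels x time). Each time sample of the sensors
# depends only on the source states at that same time sample.
h = L @ x0
```

In a DCM the columns of `L` would instead be computed from the dipole location and moment parameters, so that changing those parameters changes the scalp topography of each source.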

### Dimension reduction

For computational expediency, the sensor data are projected onto a subspace spanned by their principal eigenvectors *E*

\[ y \leftarrow yE, \qquad L \leftarrow E^{T}L, \qquad \varepsilon \leftarrow \varepsilon E \tag{7} \]

where *ε* is the observation error (see below). The eigenvectors are computed using principal component analysis or singular value decomposition (SVD). Because this projection is orthonormal, the independence of the projected errors is preserved, and the form of the error covariance components assumed by the observation model remains unchanged. In this paper, we reduce the sensor data such that the retained modes capture at least 99% of the variability of the data.
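A minimal sketch of this projection, using NumPy's SVD on simulated data, is given below. It assumes the data are arranged as a time × channels matrix and measures "variability" as the total sum of squares; the function name and the simulated data are hypothetical.

```python
import numpy as np

def principal_modes(y, var_fraction=0.99):
    """Return orthonormal spatial modes E (channels x modes) that capture at
    least `var_fraction` of the total sum of squares of y (time x channels)."""
    # The right singular vectors of y span channel space.
    _, s, vt = np.linalg.svd(y, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    n_modes = int(np.searchsorted(explained, var_fraction) + 1)
    return vt[:n_modes].T

rng = np.random.default_rng(1)
n_times, n_channels, n_sources = 300, 64, 4
topographies = rng.standard_normal((n_sources, n_channels))   # hypothetical lead fields
sources = rng.standard_normal((n_times, n_sources))           # hypothetical source activity
y = sources @ topographies + 0.01 * rng.standard_normal((n_times, n_channels))

E = principal_modes(y)
y_reduced = y @ E                                              # time x modes
retained = float(np.sum(y_reduced**2) / np.sum(y**2))
```

Because the columns of `E` are orthonormal, white sensor noise stays white after projection, which is the property the observation model relies on.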

### The observation or likelihood model

Integrating the state equations and passing the resulting states through the observer equation corresponds to a generalized convolution of the inputs that generates a predicted response *h(θ)* (Eq. 6). This generalized convolution gives an observation model for the vectorized data *y* and the associated likelihood

\[ \begin{aligned} y &= \mathrm{vec}\big(h(\theta) + X\theta^{X}\big) + \varepsilon \\ p(y|\theta,\lambda) &= N\big(\mathrm{vec}(h(\theta) + X\theta^{X}),\ \mathrm{diag}(\lambda) \otimes V\big) \end{aligned} \tag{8} \]

Measurement noise *ε* is assumed to be zero-mean Gaussian and independent over channels, i.e., Cov(vec(*ε*)) = diag(*λ*) ⊗ *V*, where *λ* is an unknown vector of channel-specific variances. *V* represents the error’s temporal autocorrelation matrix, which we assume is the identity matrix. This is tenable because we down-sample the data to about 8 ms. Low-frequency noise or drift components are modelled by *X*, which is a block diagonal matrix with a low-order discrete cosine set for each evoked response and channel. The order of this set can be determined by Bayesian model selection (see below). This model is fitted to data by tuning the free parameters *θ* to minimize the discrepancy between predicted and observed MEG/EEG time series under model complexity constraints (more formally, the parameters minimize the Variational Free Energy; see below). In addition to minimizing prediction error, the parameters are constrained by a prior specification of the range they are likely to lie in (Friston et al. 2003). These constraints, which take the form of a prior density *p*(*θ*), are combined with the likelihood, *p*(*y*|*θ*), to form a posterior density *p*(*θ*|*y*) ∝ *p*(*y*|*θ*) *p*(*θ*) according to Bayes’ rule. It is this posterior or conditional density we want to estimate. Gaussian assumptions about the errors in Eq. 8 enable us to compute the likelihood from the prediction error. The only outstanding quantities we require are the priors, which are described next.
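To make the link between prediction error and likelihood concrete, the following sketch evaluates the Gaussian log-likelihood under the stated error covariance diag(*λ*) ⊗ *V* with *V* equal to the identity. It assumes that data and prediction are time × channels matrices and that the drift terms have already been absorbed into the prediction; the function name and the simulated values are hypothetical.

```python
import numpy as np

def log_likelihood(y, pred, lam):
    """Gaussian log-likelihood for residuals (time x channels) whose vectorized
    covariance is diag(lam) kron I: white noise with channel-specific variance."""
    r = y - pred
    n_times = y.shape[0]
    quad = np.sum(r**2, axis=0) / lam              # squared error per channel, scaled
    logdet = n_times * np.log(2.0 * np.pi * lam)   # normalization per channel
    return float(-0.5 * np.sum(quad + logdet))

rng = np.random.default_rng(3)
n_times, n_channels = 50, 4
lam = np.array([0.5, 1.0, 2.0, 4.0])               # hypothetical channel variances
pred = rng.standard_normal((n_times, n_channels))  # stand-in for the model prediction
y = pred + rng.standard_normal((n_times, n_channels)) * np.sqrt(lam)

ll = log_likelihood(y, pred, lam)
```

Exploiting the Kronecker structure avoids ever forming the full (time · channels)-dimensional covariance matrix, which matters for realistic data sizes.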

### Prior expectation

Under Gaussian assumptions, the prior distribution *p*(*θ*_{i}) of the *i*th parameter is defined by its mean and variance. The mean corresponds to the prior expectation. The variance reflects the amount of prior information about the parameter. A tight distribution (small variance) corresponds to precise prior knowledge. The parameters of the state-equation can be divided into six subsets: (i) extrinsic connection parameters, which specify the coupling strengths among sources, (ii) intrinsic connection parameters, which reflect our knowledge about canonical micro-circuitry within a source, (iii) conduction delays, (iv) synaptic and sigmoid parameters controlling the dynamics within a source, (v) input parameters, which control the subcortical delay and dispersion of event-related responses, and, importantly, (vi) intrinsic and extrinsic gain parameters. Table 1 lists the priors for these parameters; see also David et al. (2006) for details. Note that we fixed the values of intrinsic coupling parameters as described in Jansen and Rit (1995). Inter-laminar conduction delays are usually fixed at 2 ms and inter-regional delays have a prior expectation of 16 ms.

Table 1 Prior densities of parameters (for connections to the *i*th source from the *j*th, in the *k*th evoked response)

| Parameter group | Parameterization and prior density |
| --- | --- |
| Extrinsic coupling parameters | \( \begin{array}{lll} A^{F}_{ijk} = A^{F}_{ij} B_{ijk} & A^{F}_{ij} = 32\exp(\theta^{F}_{ij}) & \theta^{F}_{ij} \sim N(0,\frac{1}{2}) \\ A^{B}_{ijk} = A^{B}_{ij} B_{ijk} & A^{B}_{ij} = 16\exp(\theta^{B}_{ij}) & \theta^{B}_{ij} \sim N(0,\frac{1}{2}) \\ A^{L}_{ijk} = A^{L}_{ij} B_{ijk} & A^{L}_{ij} = 4\exp(\theta^{L}_{ij}) & \theta^{L}_{ij} \sim N(0,\frac{1}{2}) \\ & B_{ijk} = \exp(\theta^{B}_{ijk}) & \theta^{B}_{ijk} \sim N(0,\frac{1}{2}) \\ & C_{i} = \exp(\theta^{C}_{i}) & \theta^{C}_{i} \sim N(0,\frac{1}{2}) \end{array} \) |
| Intrinsic coupling parameters | \( \begin{array}{cccc} \gamma_{1} = 128 & \gamma_{2} = \frac{4}{5}\gamma_{1} & \gamma_{3} = \frac{1}{4}\gamma_{1} & \gamma_{4} = \frac{1}{4}\gamma_{1} \end{array} \) |
| Conduction delays (ms) | \( \begin{array}{ccc} \Delta_{ii} = 2 & \Delta_{ij} = 16\exp(\theta^{\Delta}_{ij}) & \theta^{\Delta}_{ij} \sim N(0,\frac{1}{16}) \end{array} \) |
| Synaptic parameters (ms) | \( \begin{array}{lll} & T^{(i)}_{e} = 8\exp(\theta^{T}_{i}) & \theta^{T}_{i} \sim N(0,\frac{1}{8}) \\ H^{(i)}_{e,k} = B_{iik} H^{(i)}_{e} & H^{(i)}_{e} = 4\exp(\theta^{H}_{i}) & \theta^{H}_{i} \sim N(0,\frac{1}{8}) \\ & T_{i} = 16 & H_{i} = 32 \end{array} \) |
| Sigmoid parameters | \( \begin{array}{ll} \rho^{(i)}_{1} = \frac{2}{3}\exp(\theta^{\rho_{1}}_{i}) & \theta^{\rho_{1}}_{i} \sim N(0,\frac{1}{8}) \\ \rho^{(i)}_{2} = \frac{1}{3}\exp(\theta^{\rho_{2}}_{i}) & \theta^{\rho_{2}}_{i} \sim N(0,\frac{1}{8}) \end{array} \) |
| Input parameters (s) | \( \begin{array}{ll} u(t) = b(t,\eta_{1},\eta_{2}) + \sum_{i} \theta^{c}_{i}\cos(2\pi(i-1)t) & \theta^{c}_{i} \sim N(0,1) \\ \eta_{1} = \exp(\theta^{\eta}_{1}) & \theta^{\eta}_{1} \sim N(0,\frac{1}{16}) \\ \eta_{2} = 16\exp(\theta^{\eta}_{2}) & \theta^{\eta}_{2} \sim N(0,\frac{1}{16}) \end{array} \) |
| Spatial (ECD) parameters (mm) | \( \begin{array}{l} \theta^{pos}_{i} \sim N(L^{pos}_{i}, 32I_{3}) \\ \theta^{mom}_{i} \sim N(0, 8I_{3}) \end{array} \) |
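Most entries in Table 1 share one pattern: a positive quantity is written as a fixed scale times the exponential of a zero-mean Gaussian parameter. The following sketch draws samples from three such priors to show that positivity is guaranteed and that the scale is the prior median. The helper name, the sample size, and the random seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_lognormal(scale, prior_var, size):
    """Draw scale * exp(theta) with theta ~ N(0, prior_var)."""
    theta = rng.normal(0.0, np.sqrt(prior_var), size=size)
    return scale * np.exp(theta)

n = 10000
forward = sample_lognormal(32.0, 0.5, n)          # forward coupling, 32 exp(theta)
delay = sample_lognormal(16.0, 1.0 / 16.0, n)     # extrinsic conduction delay (ms)
gain = sample_lognormal(1.0, 0.5, n)              # condition-specific gain, exp(theta)

# Condition-specific coupling: baseline strength times its gain.
modulated = forward * gain
```

Because both factors are positive by construction, the modulated connection strength is positive as well, and a gain below one corresponds to a negative value of the underlying Gaussian parameter.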

### Inference and model comparison

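For a given DCM, say model *m*, parameter estimation corresponds to approximating the moments of the posterior distribution given by Bayes’ rule

\[ p(\theta|y,m) = \frac{p(y|\theta,m)\,p(\theta|m)}{p(y|m)} \tag{9} \]

The posterior moments (mean *η* and covariance Σ) are updated iteratively using Variational Bayes under a fixed-form Laplace (i.e., Gaussian) approximation to the conditional density *q*(*θ*) = *N*(*η*, Σ). This can be regarded as an Expectation-Maximization (EM) algorithm that employs a local linear approximation of Eq. 8 about the current conditional expectation. The **E**-step conforms to a Fisher-scoring scheme (Fahrmeir and Tutz 1994) that performs a descent on the variational free energy *F*(*q*, *λ*, *m*) with respect to the conditional moments. In the **M**-Step, the error variances *λ* are updated in exactly the same way. The estimation scheme can be summarized as follows:

\[ \begin{aligned} & \text{Repeat until convergence:} \\ & \quad \textbf{E}\text{-step:} \quad q \leftarrow \arg\min_{q} F(q,\lambda,m) \\ & \quad \textbf{M}\text{-step:} \quad \lambda \leftarrow \arg\min_{\lambda} F(q,\lambda,m) \\ & F(q,\lambda,m) = \big\langle \ln q(\theta) - \ln p(y|\theta,\lambda,m) - \ln p(\theta|m) \big\rangle_{q} = D\big(q \,||\, p(\theta|y,\lambda,m)\big) - \ln p(y|\lambda,m) \end{aligned} \tag{10} \]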

Note that the free energy is simply a function of the log-likelihood and the log-prior for a particular DCM and *q*(*θ*). The expression \( {\left\langle \cdot \right\rangle }_{q} \) denotes the expectation under the density *q*. *q*(*θ*) is the approximation to the posterior density *p*(*θ*|*y,λ,m*) we require. The **E**-step updates the moments of *q*(*θ*) (these are the variational parameters *η* and Σ) by minimizing the variational free energy. The free energy is the Kullback–Leibler divergence (denoted by \( D( \cdot || \cdot ) \)), between the real and approximate conditional density minus the log-likelihood. This means that the conditional moments or variational parameters maximize the marginal log-likelihood, while minimizing the discrepancy between the true and approximate conditional density. Because the divergence does not depend on the covariance parameters, minimizing the free energy in the **M**-step is equivalent to finding the maximum likelihood estimates of the covariance parameters. This scheme is identical to that employed by DCM for functional magnetic resonance imaging (Friston et al. 2003). Source code for this routine can be found in the Statistical Parametric Mapping software package (see Software note below), in the function ‘spm_nlsi_N.m’.

Bayesian inference proceeds using the conditional or posterior density estimated by iterating Eq. 10. Usually this involves specifying a parameter or compound of parameters as a contrast, *c*^{T}*η*. Inferences about this contrast are made using its conditional covariance, *c*^{T}Σ*c*. For example, one can compute the probability that any contrast is greater than zero or some meaningful threshold, given the data. This inference is conditioned on the particular model specified. In other words, given the data and model, inference is based on the probability that a particular contrast is bigger than a specified threshold. In some situations one may want to compare different models. This entails Bayesian model comparison.
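Inference on a contrast reduces to evaluating a univariate Gaussian tail probability. The sketch below is a minimal illustration with made-up posterior moments; the function name and all numerical values are hypothetical.

```python
import math
import numpy as np

def contrast_probability(c, eta, Sigma, threshold=0.0):
    """P(c' theta > threshold) under the Gaussian posterior N(eta, Sigma)."""
    c = np.asarray(c, dtype=float)
    mean = float(c @ eta)                 # conditional mean of the contrast
    var = float(c @ Sigma @ c)            # conditional variance of the contrast
    z = (mean - threshold) / math.sqrt(var)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical posterior over two log-gain parameters.
eta = np.array([0.4, -0.1])
Sigma = np.diag([0.04, 0.09])

p_first = contrast_probability([1.0, 0.0], eta, Sigma)    # is the first gain > 0?
p_diff = contrast_probability([1.0, -1.0], eta, Sigma)    # is the first larger than the second?
```

In this example the first parameter lies two posterior standard deviations above zero, so `p_first` is about 0.977. The same function handles compound contrasts such as differences between parameters, where the off-diagonal terms of Σ would also contribute.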

We approximate the log-evidence for model *m*, under a normal approximation (Friston et al. 2003), by

\[ \ln p(y|m) \approx \ln p(y|\lambda,m) \tag{11} \]

This is simply the maximum value of the objective function attained by EM (see the **M**-Step in Eq. 10). The most likely model is the one with the largest log-evidence. This enables Bayesian model selection. Model comparison rests on the likelihood ratio *B*_{ij} (i.e., Bayes Factor) of the evidence or relative log-evidence for two models. For models *i* and *j*

\[ \ln B_{ij} = \ln p(y|m=i) - \ln p(y|m=j) \tag{12} \]

Conventionally, strong evidence in favour of one model requires the difference in log-evidence to be three or more (Penny et al. 2004). This threshold criterion plays a similar role as a *p*-value of 0.05 = 1/20 in classical statistics (used to reject the null hypothesis in favour of the alternative model). A difference in log-evidence of greater than three (i.e., a Bayes factor more than exp(3) ∼ 20) indicates that the data provide strong evidence in favour of one model over the other. This is a standard way to assess the differences in log-evidence quantitatively.
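The decision rule can be written in a few lines. In the sketch below the log-evidence values are invented for illustration, and the conversion to a posterior model probability assumes that both models are equally likely a priori; the function names are hypothetical.

```python
import math

def log_bayes_factor(log_evidence_i, log_evidence_j):
    """Difference in log-evidence between model i and model j."""
    return log_evidence_i - log_evidence_j

def posterior_model_probability(log_bf):
    """Posterior probability of model i, assuming equal prior model probabilities."""
    return 1.0 / (1.0 + math.exp(-log_bf))

# Hypothetical log-evidences for two competing DCMs.
log_ev_a, log_ev_b = -1200.0, -1204.5

log_bf = log_bayes_factor(log_ev_a, log_ev_b)
strong = log_bf >= 3.0                        # conventional threshold for strong evidence
p_at_threshold = posterior_model_probability(3.0)
```

At the threshold of three, the posterior probability of the better model is about 0.95, which is the sense in which this criterion parallels a *p*-value of 0.05.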

### Models for steady-state responses

The description up to this point has assumed that we are dealing with evoked responses, which can be located precisely in time. As we have shown above, given stimulus timing information, one can model the measured M/EEG response as the output of a system whose dynamics are prescribed by neural mass models. In theory, one can also estimate the input itself, e.g., its onset and duration. This is a hard, nonlinear problem, and in most experiments the input is controlled by design. However, there are experiments for which one does not know the input function. For example, in sleep research, one might want to model networks that receive internally generated input functions. Similarly, experiments that measure electrophysiological data (M/EEG, local field potentials) over long times, without any designed inputs, must assume the exact input function to be unknown. In such cases, one can posit that the data have been induced by input with an assumed statistical distribution. Linear system models offer particularly amenable analysis strategies for exploring state-space models. The neural mass model equations above are an example of such a state-space model.

For steady-state responses the system can, in essence, be understood as a filter with an accompanying transfer function

\[ H(s) = \frac{(s - \zeta_{1})(s - \zeta_{2})(s - \zeta_{3}) \cdots}{(s - \lambda_{1})(s - \lambda_{2})(s - \lambda_{3}) \cdots} \tag{13} \]

Here *s* represents real and imaginary frequency components, *ζ*_{i} represent the system “zeros”, where the frequency response is zero, and *λ*_{i} the system “poles”, where the frequency response goes to infinity. This function describes how any spectral input is shaped to produce spectral output. This presumes time invariance in the inputs and response, and describes the dynamics compactly in the frequency domain, *s*.
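The role of zeros and poles can be illustrated by evaluating a rational transfer function along the imaginary axis, s = 2πi·f. The sketch below assumes the standard pole-zero product form with an optional scalar gain; the pole locations are hypothetical and were chosen to give a resonance near 10 Hz.

```python
import numpy as np

def transfer_function(s, zeros, poles, gain=1.0):
    """Evaluate gain * prod(s - zero) / prod(s - pole) at complex frequencies s."""
    s = np.asarray(s, dtype=complex)
    num = np.ones_like(s)
    for z in zeros:
        num = num * (s - z)
    den = np.ones_like(s)
    for p in poles:
        den = den * (s - p)
    return gain * num / den

freqs = np.arange(1.0, 40.0, 0.1)                       # Hz
# Hypothetical complex-conjugate pole pair: decay rate 10 /s, resonance near 10 Hz.
poles = [-10.0 + 2j * np.pi * 10.0, -10.0 - 2j * np.pi * 10.0]

H = transfer_function(2j * np.pi * freqs, zeros=[], poles=poles)
f_peak = freqs[np.argmax(np.abs(H))]                    # frequency of maximal gain
```

Moving the poles closer to the imaginary axis sharpens the spectral peak, whereas a zero placed on the axis would suppress the response at that frequency entirely.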

With DCM, we can use this strategy for steady-state paradigms by assuming a white noise (flat spectral) input. Assuming the system operates in a steady-state around its fixed point, we can linearize the nonlinear differential equations to describe the system response in the frequency domain. Note that by response we now mean the spectral output that is shaped by the system transfer function. This linearization allows us to establish a mapping from the system parameters to the predicted frequency spectrum (Moran et al. 2007). Importantly, as with multiple evoked responses, this enables us to model differences between two or more spectra, acquired under different conditions, as consequences of specific parameter changes. These parameters might be intrinsic or extrinsic connections, but can also be, for example, the excitatory rate constants *τ*_{e} (Eq. 2), which have a marked influence on the frequency spectrum (Moran et al. 2007). The basic idea is to manipulate the (real) system (e.g. by experimental changes in the level of a neurotransmitter), model this change in terms of changes in specific DCM parameters, and then test the implicit hypothesis using Bayesian inference, i.e., model comparison and posterior probabilities. This strategy has been applied in (Moran et al. in press) using just one source, where we showed that changes in glutamate levels as measured by microdialysis can be modelled by a change in DCM parameters. We will come back to this work below.
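The mapping from parameters to a predicted spectrum can be sketched for the simplest possible case: a single linear second-order synaptic response driven by white noise. This is an illustrative reduction, not the linearized neural mass model of Moran et al. It assumes the synaptic kernel of Eq. 2 is the alpha function (H/τ)·t·exp(−t/τ), which is the impulse response of v'' = (H/τ)u − (2/τ)v' − v/τ². The function name is hypothetical, and the values of H and τ are the prior means from Table 1, with τ converted to seconds.

```python
import numpy as np

def spectral_response(J, B, C, freqs_hz):
    """Squared gain |C (sI - J)^{-1} B|^2 at s = 2*pi*i*f: the output spectrum
    of the linear system dx/dt = J x + B u, y = C x for unit white-noise input."""
    n = J.shape[0]
    out = np.empty(len(freqs_hz))
    for k, f in enumerate(freqs_hz):
        s = 2j * np.pi * f
        H = C @ np.linalg.solve(s * np.eye(n) - J, B)
        out[k] = np.abs(H) ** 2
    return out

tau, H_e = 0.008, 4.0                        # rate constant (s) and maximum amplitude
J = np.array([[0.0, 1.0],
              [-1.0 / tau**2, -2.0 / tau]])  # Jacobian for the states (v, dv/dt)
B = np.array([0.0, H_e / tau])               # input enters the second state
C = np.array([1.0, 0.0])                     # the potential v is observed

freqs = np.linspace(0.0, 100.0, 201)
spectrum = spectral_response(J, B, C, freqs)
```

Changing `tau` moves the corner frequency of this low-pass response, which gives a simple picture of why the excitatory rate constant has a marked influence on the predicted spectrum.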

## Illustrative examples

### Mismatch negativity

In this section, we illustrate the use of DCM for ERP/ERFs by analysing data acquired under a mismatch negativity (MMN) paradigm. Critically, DCM allows us to test hypotheses about the changes in connectivity between sources. In this example study, we will test a specific hypothesis (see below) about the MMN generation and compare various models over twelve subjects. The results shown here are a part of a series of papers that consider the MMN and its underlying mechanisms in detail (Garrido et al. 2007).

Novel sounds, or oddballs, embedded in a stream of repeated sounds, or standards, produce a distinct response that can be recorded non-invasively with MEG and EEG. The MMN is the negative component of the waveform obtained by subtracting the event-related response to a standard from the response to an oddball, or deviant. This response to sudden changes in the acoustic environment peaks at about 100–200 ms from change onset (Sams et al. 1985) and exhibits an enhanced negativity that is distributed over auditory and frontal areas, with prominence in frontal regions.
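The definition of the MMN as a difference wave can be made concrete with a small sketch. The synthetic waveforms, the function name `mismatch_negativity` and the 5 ms time grid are assumptions for illustration only; no data from the study are used.

```python
import numpy as np

def mismatch_negativity(erp_deviant, erp_standard, times_ms, window=(100, 200)):
    """Return the difference wave (deviant minus standard) together with the
    latency and amplitude of its most negative point inside `window` (ms)."""
    diff = erp_deviant - erp_standard
    mask = (times_ms >= window[0]) & (times_ms <= window[1])
    idx = np.argmin(diff[mask])
    return diff, times_ms[mask][idx], diff[mask][idx]

# Synthetic single-channel ERPs on a 5 ms grid (hypothetical shapes, in microvolts)
times_ms = np.arange(-100, 400, 5)
standard = -2.0 * np.exp(-0.5 * ((times_ms - 100) / 25.0) ** 2)
extra_negativity = -3.0 * np.exp(-0.5 * ((times_ms - 150) / 30.0) ** 2)
deviant = standard + extra_negativity

diff, peak_time, peak_amp = mismatch_negativity(deviant, standard, times_ms)
```

Here the difference wave recovers the extra negativity added to the deviant response, with its minimum inside the 100–200 ms window.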

The MMN is believed to be an index of automatic change detection reflecting a pre-attentive sensory memory mechanism (Tiitinen et al. 1994). There have been several compelling mechanistic accounts of how the MMN might arise. The most common interpretation is that the MMN can be regarded as a marker for error detection, caused by a break in a learned regularity, or familiar auditory context. The early work by Näätänen and colleagues suggested that the MMN results from a comparison between the auditory input and a memory trace of previous sounds. In agreement with this theory, others (Naatanen and Winkler 1999; Sussman and Winkler 2001; Winkler et al. 1996) have postulated that the MMN would reflect on-line modifications of the auditory system, or updates of the perceptual model, during incorporation of a newly encountered stimulus into the model—*the model-adjustment hypothesis*. Hence, the MMN would be a specific response to stimulus change and not to stimulus alone. This hypothesis has been supported by Escera et al. (2003) who provided evidence that the prefrontal cortex is involved in a top-down modulation of a deviance detection system in the temporal cortex. In the light of the Näätänen model, it has been claimed that the MMN is caused by two underlying functional processes, a sensory memory mechanism related to temporal generators and an automatic attention-switching process related to the frontal generators (Giard et al. 1990). Accordingly, it has been shown that the temporal and frontal MMN sources have distinct behaviours over time (Rinne et al. 2000) and that these sources interact with each other (Jemel et al. 2002). Thus the MMN could be generated by a temporofrontal network (Doeller et al. 2003; Opitz et al. 2002), as revealed by M/EEG and fMRI studies. 
This work has linked the early component (in the range of about 100–140 ms) to a sensory, or non-comparator, account of the MMN, elaborated in the temporal cortex, and a later component (in the range of about 140–200 ms) to a cognitive part of the MMN, involving the frontal cortex (Maess et al. 2007).

Using DCM, we modelled the MMN generators with a temporo-frontal network comprising bilateral sources over the primary and secondary auditory and frontal cortex. Following the model-adjustment hypothesis, we assume that the early and late components of the MMN can be explained by an interaction of temporal and frontal sources or network nodes. The MMN itself is defined as the difference between the responses to the oddball and the standard stimuli. Here, we modelled both evoked responses and explained the MMN, i.e., the differences between the two ERPs, by a modulation of DCM parameters. There are two kinds of parameters that seem appropriate to induce the difference between oddballs and standards: (i) modulation of extrinsic connectivity between sources, and (ii) modulation of intrinsic parameters in each source. Modulation of intrinsic parameters would correspond to a mechanism that is more akin to an *adaptation* hypothesis, i.e., the MMN is generated by local adaptation of populations. This is the hypothesis considered by Jaaskelainen et al. (2004), who report evidence that the MMN is explained by differential adaptation of two pairs of bilateral temporal sources. In a recent paper (Garrido et al. in press), we have compared models derived from both hypotheses: (i) the model-adjustment hypothesis and (ii) the adaptation hypothesis. Here, we will restrict ourselves to demonstrating inference based on DCMs derived from the model-adjustment hypothesis only, which involves a fronto-temporal network.

#### Experimental design

We studied a group of 13 healthy volunteers aged 24–35 (5 female). Each subject gave signed informed consent before the study, which proceeded under local ethical committee guidelines. Subjects sat on a comfortable chair in front of a desk in a dimly illuminated room. Electroencephalographic activity was measured during an auditory ‘oddball’ paradigm, in which subjects heard “standard” (1,000 Hz) and “deviant” (2,000 Hz) tones, occurring 80% (480 trials) and 20% (120 trials) of the time, respectively, in a pseudo-random sequence. The stimuli were presented binaurally via headphones for 15 min every 2 s. The duration of each tone was 70 ms with 5 ms rise and fall times. The subjects were instructed not to move, to keep their eyes closed and to count the deviant tones.

EEG was recorded with a Biosemi system with 128 scalp electrodes. Data were recorded at a sampling rate of 512 Hz. Vertical and horizontal eye movements were monitored using electro-oculogram (EOG) electrodes. The data were epoched offline, with a peri-stimulus window of −100 to 400 ms, down-sampled to 200 Hz, band-pass filtered between 0.5 and 40 Hz and re-referenced to the average of the right and left ear lobes. Trials in which the absolute amplitude of the signal exceeded 100 μV were excluded. Two subjects were eliminated from further analysis because an excessive number of their trials contained artefacts. In the remaining subjects, an average of 18% of trials were excluded.
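The amplitude-based rejection step can be sketched as follows. The array layout (trials × channels × samples), the function name `reject_epochs` and the synthetic data are assumptions for illustration; the sketch uses a small random array rather than the recorded EEG, and takes 100 samples per epoch as an approximation of a −100 to 400 ms window at 200 Hz.

```python
import numpy as np

def reject_epochs(epochs_uv, threshold_uv=100.0):
    """Drop trials whose absolute amplitude exceeds `threshold_uv` on any
    channel at any sample. `epochs_uv` has shape (trials, channels, samples)."""
    peak = np.max(np.abs(epochs_uv), axis=(1, 2))   # largest |amplitude| per trial
    keep = peak <= threshold_uv
    return epochs_uv[keep], keep

# Synthetic epochs in microvolts: 50 trials, 8 channels, 100 samples
rng = np.random.default_rng(0)
epochs = rng.normal(0.0, 10.0, size=(50, 8, 100))
epochs[:5, 0, 50] = 500.0                           # inject artefacts into 5 trials

clean, keep = reject_epochs(epochs)
fraction_rejected = 1.0 - keep.mean()
```

The fraction of rejected trials per subject computed this way is the kind of quantity one would inspect before deciding whether a subject has too few usable trials.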

#### Specification of dynamic causal model

Prior coordinates for the locations of the ECDs in Montreal Neurological Institute (MNI) space [mm]

| Source | Coordinates (x, y, z) |
|---|---|
| Left primary auditory cortex (lA1) | −42, −22, 7 |
| Right primary auditory cortex (rA1) | 46, −14, 8 |
| Left superior temporal gyrus (lSTG) | −61, −32, 8 |
| Right superior temporal gyrus (rSTG) | 59, −25, 8 |
| Right inferior frontal gyrus (rIFG) | 46, 20, 8 |

#### Results

### Steady-state responses

… (*H*_{e}) of synaptic kernels and an increase in the coupling parameters (*γ*_{1}, *γ*_{2}, *γ*_{3}) in the isolated group, relative to the social control group.

Microdialysis measures of extracellular glutamate neurotransmitter levels from two groups (social and isolated) of Wistar rats

| Group | Glutamate |
|---|---|
| Social | 4.2 ± 1.4 μM (100%) |
| Isolated | 1.5 ± 0.8 μM (36%) |

Empirical LFP data were acquired with a Data Science International radio-telemetric system, collecting LFPs over a 24 h period from the prefrontal cortex of six social and six isolation-reared animals. These animals were moving freely in their home cage and not exposed to external stimuli. The data analyzed here were an average spectral response over a 10-min period. Pre-processing involved a Fast Fourier Transform of the data, using the same frequencies as above.

… *H*_{i}, *τ*_{i}, *γ*_{4}, *γ*_{5} to zero. The model could then account for differences in spectral response, between the two groups, using the excitatory parameters *H*_{e}, *τ*_{e}, *γ*_{1}, *γ*_{2}, *γ*_{3} and *ρ*_{2}. Population differences between their MAP estimates were significant in the case of *H*_{e} and *ρ*_{2} (*p* < 0.05). Group parameter means and their respective *p*-values are illustrated in Fig. 7.

The picture from LFP modelling corresponds to the microdialysis predictions on two levels. First, the MAP estimates suggest a sensitization of post-synaptic responses, with increases in *H*_{e} (and excitatory intrinsic connections) in the isolated animals. Second, an overall decrease in firing rate for that group, indicated by the increase in *ρ*_{2}, points to a low excitatory field. This parameter is a proxy for neuronal adaptation and highlights a greater adaptation in the low-glutamate “schizophrenic” rat group. This is consistent with reduced levels of extracellular neurotransmitter. For a more detailed discussion of these results see Moran et al. (in press).

## Discussion

Dynamic Causal Modelling for M/EEG entails the inversion of informed spatiotemporal models of observed responses. The idea is to model condition-specific responses over channels and peri-stimulus time with the same model, where the differences among conditions are explained by changes in only a few key parameters. The face and predictive validity of DCM have been established, which makes it a potentially useful tool for group studies (David et al. 2006; Garrido et al. 2007; Kiebel et al. 2006). In principle, the same approach can be applied to the analysis of single trials, where one would use a parametric modulation of parameters to model the effects of trial-to-trial changes in an experimental variable (e.g., reaction time or forgotten vs. remembered). Furthermore, we have described how DCM can be extended to cover source-reconstructed M/EEG or LFP steady-state responses under simple assumptions about the statistical distribution of the input.

One can also view DCM for evoked responses as a source reconstruction device using biophysically informed temporal constraints. This is because DCM has two components: a neural-mass model of the interactions among a small number of dipole sources and a classical electromagnetic forward model that links these sources to extra-cranial measurements. Inverting the DCM implicitly optimises the location and orientation of the sources. This is in contrast to traditional ECD fitting approaches, where dipoles are fitted sequentially to the data, using user-selected periods and/or channels of the data. Classical approaches have to proceed in this way, because there is usually too much spatial and temporal dependency among the sources to identify the parameters precisely. With our approach, we place temporal constraints on the model that are consistent with the way that signals are generated biophysically. As we have shown, these allow simultaneous fitting of multiple dipoles to the data.

We used the ECD model because it is analytic, fast to compute and a quasi-standard when reconstructing evoked responses. However, the ECD model is just one candidate for spatial forward models. Given the lead-field, one can use any spatial model in the observation equation (Eq. 6). A further example would be some linear distributed approach (Baillet et al. 2001; Phillips et al. 2005; Daunizeau et al. 2006), where a ‘patch’ of dipoles, confined to the cortical surface, would act as the spatial expression of one area. With DCM, one could also use different forward models for different areas (hybrid models). For example, one could employ the ECD model for early responses while using a distributed forward model for higher areas.

More generally, we anticipate that Bayesian model comparison will become a ubiquitous tool in M/EEG. This is because further development of M/EEG models and their fusion with other imaging modalities requires more complex models embodying useful constraints. The appropriateness of such models for any given data cannot necessarily be intuited, but can be assessed formally using Bayesian model comparison. The key is to compute the model evidence *p*(*y*|*m*) (Eq. 12), for example using a variational approach (see above) or as described in Sato et al. (2004), or by employing sampling approaches like the Markov chain Monte Carlo (MCMC) techniques as in Auranen et al. (2007) and Jun et al. (2005). In principle, one can compare models based on different concepts or, indeed, inversion schemes, for a given data set *y*. For example, one can easily compare different types of source reconstruction (ECD versus source imaging) with DCM. This cannot be done with classical, non-Bayesian approaches, for which model comparisons are only feasible under certain constraints (‘nested models’), precluding comparisons among qualitatively different models. Although other approximations to the model evidence exist, e.g. the Akaike Information Criterion, they are not generally useful with informative priors (Beal 2003).
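Once (approximate) log evidences are available, comparing models reduces to simple arithmetic. The sketch below converts log evidences into posterior model probabilities, assuming a flat prior over models, and pools subjects by summing their log evidences, which assumes that subjects are independent and share the same model. The numbers and the function name `posterior_model_probabilities` are hypothetical and are not results from this paper.

```python
import numpy as np

def posterior_model_probabilities(log_evidence):
    """p(m|y) under a flat prior over models, computed from ln p(y|m)."""
    log_evidence = np.asarray(log_evidence, dtype=float)
    shifted = log_evidence - log_evidence.max()   # avoid underflow in exp
    p = np.exp(shifted)
    return p / p.sum()

# Hypothetical log evidences: rows are subjects, columns are models
log_ev = np.array([[-1203.1, -1200.2, -1198.0],
                   [ -987.4,  -985.0,  -984.1],
                   [-1110.9, -1108.3, -1109.0]])

group_log_ev = log_ev.sum(axis=0)                 # pooled over independent subjects
probs = posterior_model_probabilities(group_log_ev)
best_model = int(np.argmax(probs))
log_bayes_factor = group_log_ev[2] - group_log_ev[1]   # third versus second model
```

Because only differences in log evidence matter, the comparison does not require the models to be nested; any two models of the same data *y* can be scored against each other in this way.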

Currently, the DCM framework is deterministic, i.e., it allows for observation noise in the sensors but does not consider noise at the level of the neuronal dynamics. Addressing this is part of ongoing work; several groups are developing variational techniques that invert stochastic DCMs based on stochastic differential equations with both nonlinear evolution and observation functions (cf. Eq. 1).

## Software note

All procedures described in this note have been implemented as Matlab (MathWorks) code. The source code is freely available in the DCM and neural model toolboxes of the Statistical Parametric Mapping package (SPM5) under http://www.fil.ion.ucl.ac.uk/spm/.

## Acknowledgments

This work was supported by the Wellcome Trust. MIG holds a FCT doctoral scholarship, Ministry of Science, Portugal. RJM was supported by the Irish Research Council for Science Engineering and Technology.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.