
1 Introduction

The brain can be described as a network of interconnected information processing nodes. Each node, corresponding to a brain region, is specialized for a specific type of information. It is the interaction of the different nodes via the connections of the network that allows the brain to process and appropriately respond to external stimuli. Consider a subject who is asked to reach for and grasp an object when a visual cue is presented. First, visual information is transferred from the retina to the primary visual cortex through the optic nerve, optic tract, and optic radiation. The reach and grasp components of the task are then expected to be handled separately by subregions of the parietal and premotor cortex [1]. Finally, the primary motor cortex is involved to initiate the reaching movement. Individually, the visual, parietal, premotor, and motor cortices cannot perform the desired task. It is their interaction, or more specifically their exchange of information, that allows the subject to reach and grasp the object. The ability to image and observe the interaction of the different regions of the brain would therefore be invaluable to our understanding of the brain.

In this work, we present a new model of information flow in the brain built on the previous work of [2]. First, we build a Bayesian network which captures the states of the cortical regions of the brain and of their connections. The network is subject specific as it is built from the structural connectivity obtained via diffusion magnetic resonance imaging (MRI). Next, we introduce the electroencephalography (EEG) measurements as evidence into this network. Finally, using the principle of maximum entropy on the mean (MEM), we infer the posterior probability that cortical regions and connections are active. By projecting the activation probabilities of connections back onto the streamlines obtained from diffusion MRI, we generate an animated representation of information flow. Thus, by combining functional information obtained from EEG with anatomical information obtained from diffusion MRI, we infer the information flow in the white matter of the brain.

2 Theory

We model the cortical activity using a distributed dipole model. The activation of a small area of the cortical surface is modeled by a dipole n that is perpendicular to the surface and whose intensity at a time t is given by \(x_{n,t} \in {\text {I}\!\text {R}}\). We distribute thousands of these dipoles on the cortical surface and combine their intensities into a vector \(\varvec{x} \in {\text {I}\!\text {R}}^{N_dN_t}\) where \(N_d\) and \(N_t\) are the number of dipoles and time samples, respectively. The forward problem can then be modeled as \(\varvec{m} = \varvec{G}\varvec{x} + \varvec{\epsilon }\) where \(\varvec{m} \in {\text {I}\!\text {R}}^{N_sN_t}\) is the vector of EEG measurements with \(N_s\) the number of sensors. The matrix \(\varvec{G}\) is the lead field that projects the dipole intensities onto the sensors. It can be computed from the anatomical MR images of the subject [5] and is therefore a known quantity. The observations are corrupted by the additive noise term \(\varvec{\epsilon }\). Note that we consider all time samples in a single, very large problem, which makes the forward operator \(\varvec{G}\) block diagonal.
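As a concrete illustration, the sketch below (toy sizes and a random lead field, standing in for the subject-specific \(\varvec{G}\) of Sect. 3.2) builds the stacked, block diagonal forward operator and simulates measurements:

```python
# Minimal sketch of the forward model m = Gx + eps with all time samples
# stacked in one vector; sizes and the random lead field are illustrative.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(42)
n_sensors, n_dipoles, n_times = 8, 500, 10         # toy sizes

G_t = rng.standard_normal((n_sensors, n_dipoles))  # lead field at one sample
G = block_diag(*([G_t] * n_times))                 # (N_s N_t) x (N_d N_t)
x = rng.standard_normal(n_dipoles * n_times)       # stacked dipole intensities
eps = 0.01 * rng.standard_normal(n_sensors * n_times)
m = G @ x + eps                                    # stacked EEG measurements
assert m.shape == (n_sensors * n_times,)
```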

2.1 Connectivity Informed Maximum Entropy on the Mean

The vector of dipole intensities \(\varvec{x}\) represents the cortical activity for a given time window. Estimating these intensities from the measurements \(\varvec{m}\) is an ill-posed problem: there are many more intensities to estimate than available measurements. In [2], Amblard et al. propose to group the dipoles into \(N_S\) cortical regions, which we refer to as clusters. Each cluster is assigned a state variable which dictates the possible intensities of dipoles within the cluster. Let \(S_k\) represent the state of the \(k^\text {th}\) cluster, that is, whether it is active or not, and let \(\varvec{S} = [S_1, S_2, ..., S_{N_S}]\) be the cluster state vector. Note that like \(\varvec{m}\) and \(\varvec{x}\), the vector \(\varvec{S}\) incorporates a time component. That is, if the cortical surface is segmented into 60 regions and the data window contains 10 time points, \(\varvec{S}\) contains 600 state variables. Let \(d\mu (\varvec{x}, \varvec{S}) = \mu (\varvec{x}, \varvec{S})d\varvec{x}\) be a reference law that represents the a priori information on the dipole intensities and cluster states. We assume the source amplitudes \(\varvec{x}_k\) of the \(k^\text {th}\) cluster depend only on the state \(S_k\) of the cluster to get

$$\begin{aligned} d\mu (\varvec{x}, \varvec{S}) = \pi (\varvec{S})\prod _{k=1}^{N_S}{d\mu (\varvec{x}_k|S_k)} \end{aligned}$$
(1)

where \(\pi (\varvec{S})\) is the joint probability law of the cluster states. This distribution, along with the assumption of independent cluster states, was used by [2] to constrain the inverse problem and estimate the dipole intensities.
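The space-time layout of \(\varvec{S}\) can be made concrete in a few lines of code; the flat indexing below is one possible convention, not one prescribed by the model:

```python
# Toy illustration of the space-time state vector S: with 60 cortical
# regions and 10 time samples, S holds 600 binary state variables.
import numpy as np

n_regions, n_times = 60, 10
S = np.zeros((n_regions, n_times), dtype=bool)  # S[k, t]: cluster k active at t
S_flat = S.reshape(-1)                          # the vector S of the text
assert S_flat.size == 600
```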

The output of a tractography pipeline is a set of streamlines which represent the white matter fiber bundles. When streamlines begin in one cluster and end in another, we say that those two clusters are connected. A connection between two clusters is characterized by its length, which is the average length of the streamlines that join those clusters. We treat these connections as wires or pipes which allow communication between cortical regions. The state of a cluster at a time \(t_0\) can influence the state of a different cluster at a time \(t_\varDelta \) if they are connected. The delay \(\varDelta \) is determined by the length of the connection and the speed at which information flows along it. For example, if a cluster is known to be active at a time \(t_0\), it increases the likelihood that its connected counterpart is active at a time \(t_\varDelta \). This spatio-temporal regularization constitutes an additional prior and is therefore included in the model by modifying the distribution (1). The connections are added as a new set of variables which represent connection states. Much like the states of the clusters affect the dipoles, the states of the connections affect the clusters. That is, the state of a connection \(C_i\) gives us information about the states of the clusters it connects. Like clusters, we assume connections are either active or inactive. Letting \(\varvec{C}\) be the connection state vector, the updated prior information on sources, clusters, and connections is then given by

$$\begin{aligned} d\mu (\varvec{x}, \varvec{S}, \varvec{C}) = \varphi (\varvec{C})\pi (\varvec{S}|\varvec{C})\prod _{k=1}^{N_S}{d\mu (\varvec{x}_k|S_k)} \end{aligned}$$
(2)

where \(\varphi (\varvec{C})\) is the joint probability law of the connection states. The distribution (2) is too general: it allows the state of a cluster to be affected by a connection that does not reach it. We therefore assume that the state of a cluster depends only on the connections that reach it. If we let \(\varvec{C}_{\gamma (k)}\) be the connections that reach the \(k^\text {th}\) cluster, the joint distribution of the cluster states given the connection states is

$$\begin{aligned} \pi (\varvec{S}|\varvec{C}) = \prod _{k=1}^{N_S}{\pi (S_k|\varvec{C}_{\gamma (k)})}. \end{aligned}$$
(3)

Finally, we assume that the state of a connection is independent of the other connections which, along with (3), yields

$$\begin{aligned} d\mu (\varvec{x}, \varvec{S}, \varvec{C}) = \prod _{i=1}^{N_C}{\varphi (C_i)} \prod _{k=1}^{N_S}{\pi (S_k|\varvec{C}_{\gamma (k)}) d\mu (\varvec{x}_k|S_k)}. \end{aligned}$$
(4)

The distribution (4) contains all of the available prior information. It describes our prior knowledge of the source intensities, cluster states, and connection states given no EEG measurements. The next step is to insert the observed EEG measurements as evidence and thus be able to infer the posterior probability of any given source, cluster, and connection configuration.
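To make the hierarchy of (4) concrete, the following sketch draws one sample by ancestral sampling, using the parameter values later given in Sect. 3.3; the connectivity pattern and sizes are illustrative assumptions:

```python
# Hedged sketch: ancestral sampling from the factorized prior of Eq. (4).
# Priors follow Sect. 3.3 (0.9 probability of being inactive; dipole
# variances 1e-4 / 1e-7); everything else is illustrative.
import numpy as np

rng = np.random.default_rng(1)
n_connections, n_clusters, dipoles_per_cluster = 3, 4, 5
gamma = [[0], [0, 1], [1, 2], [2]]      # connections reaching each cluster

C = rng.random(n_connections) < 0.1     # phi(C_i): active with probability 0.1
S = np.empty(n_clusters, dtype=bool)
for k in range(n_clusters):
    if C[gamma[k]].any():               # an active connection forces activity
        S[k] = True
    else:
        S[k] = rng.random() < 0.1       # otherwise inactive with probability 0.9

# dmu(x_k | S_k): zero-mean Gaussian, variance 1e-4 (active) or 1e-7 (inactive)
std = np.sqrt(np.where(S, 1e-4, 1e-7))
x = std[:, None] * rng.standard_normal((n_clusters, dipoles_per_cluster))
```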

Let the dipole intensities \(\varvec{x}\) have a probability law \(dp(\varvec{x})\), distinct from the reference law \(d\mu (\varvec{x})\) described previously. The objective of the MEM algorithm is to find the probability law \(dp^*(\varvec{x}) = p^*(\varvec{x})d\varvec{x}\) that satisfies

$$\begin{aligned} \varvec{m} = \varvec{Gx}^* \text{ with } \varvec{x}^* = \int {\varvec{x} p^*(\varvec{x})} d\varvec{x} \end{aligned}$$
(5)

while minimizing the distance to the reference law measured using the Kullback-Leibler divergence \(D_\text {KL}(p(\varvec{x})|d\mu (\varvec{x}))\). In other words, the solution is the probability law \(dp^*(\varvec{x})\) closest to the reference law \(d\mu (\varvec{x})\) that explains the observations on average. By introducing Lagrange multipliers, minimizing \(D_\text {KL}(p(\varvec{x})|d\mu (\varvec{x}))\) subject to (5) can be written as

$$\begin{aligned} \underset{p(\varvec{x}), \varvec{\lambda }, \lambda _0}{ \text{ minimize } } D_\text {KL}(p(\varvec{x})|d\mu (\varvec{x})) + \varvec{\lambda }^T(\varvec{m} - \varvec{G}\int {\varvec{x} p(\varvec{x})} d\varvec{x}) + \lambda _0 (1 - \int {dp(\varvec{x})}). \end{aligned}$$
(6)

The first term is the entropy of the model, the second is the data fit, and the third ensures \(p(\varvec{x})\) is a probability distribution. Eq. (6) can be thought of as finding the optimal compromise between the uncertainty (i.e. entropy) of the model and the observations. If we assume the noise is Gaussian with zero mean, the unique optimal Lagrange multipliers \(\varvec{\lambda }^*\) can be determined by solving [2]

$$\begin{aligned} \varvec{\lambda }^* = \underset{\varvec{\lambda }}{{{\mathrm{arg\,min}}}} \ \ln Z(\varvec{\lambda }) - (\varvec{\lambda }^T \varvec{m} - \varvec{\lambda }^T {\varSigma }_\epsilon ^2 \varvec{\lambda }) \end{aligned}$$
(7)

where \({\varSigma }_\epsilon ^2\) is the covariance of the additive noise and where \(Z(\varvec{\lambda })\) is the partition function given by

$$\begin{aligned} Z(\varvec{\lambda }) = \int {\exp {(\varvec{\lambda }^T \varvec{Gx})} d\mu (\varvec{x})}. \end{aligned}$$
(8)
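The partition function appears because the law minimizing (6) takes the classical MEM Gibbs form, stated here for completeness:

$$\begin{aligned} dp^*(\varvec{x}) = \frac{\exp {(\varvec{\lambda }^{*T} \varvec{G}\varvec{x})}}{Z(\varvec{\lambda }^*)} d\mu (\varvec{x}), \end{aligned}$$

so that both solving (7) and recovering \(dp^*(\varvec{x})\) reduce to computations involving \(Z(\varvec{\lambda })\).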

Note that (7) does not depend on \(p(\varvec{x})\) or \(\lambda _0\), which can be determined from the unique minimizer \(\varvec{\lambda }^*\). Finding the optimal Lagrange multipliers from (7) requires evaluating \(Z(\varvec{\lambda })\) which in turn requires the reference law \(d\mu (\varvec{x})\). We propose to obtain this reference law by marginalizing \(\varvec{S}\) and \(\varvec{C}\) out of (4), that is

$$\begin{aligned} d\mu (\varvec{x}) = \sum _{\left\{ \varvec{C}\right\} } \prod _{i=1}^{N_C}{\varphi (C_i)} \sum _{\left\{ \varvec{S}\right\} } \prod _{k=1}^{N_S}{\pi (S_k|\varvec{C}_{\gamma (k)}) d\mu (\varvec{x}_k|S_k)} \end{aligned}$$
(9)

where the sum over \(\left\{ \varvec{C}\right\} \) indicates a sum over all possible realizations of \(\varvec{C}\). The optimal Lagrange multipliers \(\varvec{\lambda }^*\) are identified by solving (7), which requires evaluating \(Z(\varvec{\lambda })\) repeatedly. The sum over all possible states of \(\varvec{C}\) which appears in (9) is prohibitive because it generates \(2^{N_C}\) terms and quickly becomes intractable. This problem can be mitigated by noting that evaluating \(Z(\varvec{\lambda })\) amounts to computing an unnormalized marginal over the variables in \(\varvec{C}\) and \(\varvec{S}\), as in a Bayesian network. Because the state of a cluster depends only on a few connections, not all terms of the sum need to be evaluated explicitly. Using a suitable variable elimination order, which depends on the specific connections between clusters, the sum of products can be rearranged to compute \(Z(\varvec{\lambda })\) efficiently.
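As a toy illustration of (7)-(9), the sketch below evaluates \(Z(\varvec{\lambda })\) by brute-force enumeration over all states and minimizes the dual objective numerically. The sizes, priors, connectivity, and the use of scipy are assumptions for illustration; a real implementation would replace the enumeration with the variable elimination described above:

```python
# Toy numeric sketch of Eqs. (7)-(9). Z(lambda) is evaluated by brute-force
# enumeration over all 2^{N_C} x 2^{N_S} states, exactly the cost that
# variable elimination avoids on real problems.
import itertools
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

rng = np.random.default_rng(0)
n_sensors, n_clusters, n_connections = 4, 3, 2
G = rng.standard_normal((n_sensors, n_clusters))  # one dipole per cluster
m = rng.standard_normal(n_sensors)                # toy measurements
noise_cov = 1e-2 * np.eye(n_sensors)              # Sigma_epsilon^2
sigma2 = {False: 1e-7, True: 1e-4}                # dipole variance per state
p_con_active = 0.1                                # phi(C_i = active)
gamma = [[0], [0, 1], [1]]                        # connections reaching cluster k

def pi_s_given_c(s_active, c_sub):
    # An active connection forces its clusters active (Sect. 3.3);
    # otherwise the cluster is inactive with probability 0.9.
    if any(c_sub):
        return 1.0 if s_active else 0.0
    return 0.1 if s_active else 0.9

def log_Z(lam):
    a = G.T @ lam
    log_terms = []
    for C in itertools.product((False, True), repeat=n_connections):
        p_c = np.prod([p_con_active if c else 1 - p_con_active for c in C])
        for S in itertools.product((False, True), repeat=n_clusters):
            p_s = np.prod([pi_s_given_c(S[k], [C[i] for i in gamma[k]])
                           for k in range(n_clusters)])
            if p_c * p_s == 0.0:
                continue  # impossible configuration
            # E[exp(a x)] = exp(sigma^2 a^2 / 2) for a zero-mean Gaussian
            log_gauss = sum(0.5 * sigma2[S[k]] * a[k] ** 2
                            for k in range(n_clusters))
            log_terms.append(np.log(p_c * p_s) + log_gauss)
    return logsumexp(log_terms)

def dual(lam):
    # Objective of Eq. (7)
    return log_Z(lam) - (lam @ m - lam @ noise_cov @ lam)

lam_star = minimize(dual, np.zeros(n_sensors)).x
print("optimal Lagrange multipliers:", lam_star)
```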

3 Methods

3.1 Synthetic Data

To validate our algorithm, synthetic EEG data was generated using a simplified model of the visual system. The model consists of two clusters of 10 sources representing the left and right primary visual cortices. In addition, the model contains four clusters representing the lateral geniculate nuclei (LGN) and the eyes. These four clusters each have a single invisible source, i.e. their corresponding columns of \(\varvec{G}\) contain only zeros. Bidirectional connections are present between the LGN and between the visual cortices. Unidirectional connections link the eyes to the LGN and the LGN to the visual cortices. To simulate signals, information is propagated from the eyes to the LGN and to the visual cortex by selecting random connections. When a cluster is activated by one of its connections, the corresponding sources are activated. The synthetic EEG signals are then recorded by two sensors located near the visual cortices. Gaussian noise is added to the synthetic EEG measurements to obtain the desired signal-to-noise ratio (SNR), defined as the variance of the signal over the variance of the noise.
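For instance, noise at a prescribed SNR, as defined above, can be added with a short helper (the function name is illustrative):

```python
import numpy as np

def add_noise(signal, snr, rng=np.random.default_rng()):
    """Add zero-mean Gaussian noise so that var(signal) / var(noise) = snr."""
    noise_std = np.sqrt(signal.var() / snr)
    return signal + rng.normal(0.0, noise_std, signal.shape)
```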

To quantify how the SNR affects our algorithm’s ability to recover information flow patterns, synthetic signals were generated with an SNR varying between 1 and 20. These signals were then used to recover the posterior probability that each connection was active. A recovery was deemed successful if all connections used to simulate the data obtained a posterior probability above 0.5 while all other posterior probabilities were below 0.5.
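This criterion is simple to express in code; the helper below is an illustrative restatement, not the authors' implementation:

```python
import numpy as np

def recovery_successful(posterior, true_connections, threshold=0.5):
    """posterior: posterior activation probability per connection;
    true_connections: boolean mask of connections used in the simulation.
    Success requires every true connection above the threshold and every
    other connection below it."""
    active = np.asarray(posterior) > threshold
    return np.array_equal(active, np.asarray(true_connections, dtype=bool))
```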

3.2 Experimental Data Acquisition and Preprocessing

Magnetic resonance images and EEG signals were acquired on a healthy volunteer. The MR images included diffusion-weighted images acquired using a SPLICE sequence (b-value 1000 s/mm\(^2\), 64 directions) and a T1-weighted image. The SPLICE sequence was selected over the more common EPI sequence because it does not generate distortions near the eyes, allowing us to track the optic nerve and optic tract. Fiber orientation distribution functions were computed using constrained spherical deconvolution [6] implemented in dipy [4]. Fiber tracking was performed using anatomically constrained particle filter tractography [7], also implemented in dipy. The surface of the cortex was extracted and segmented from the T1-weighted image using FreeSurfer and then downsampled to 8000 vertices. The forward operator \(\varvec{G}\) was computed using OpenMEEG [5, 8].
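A sketch of this diffusion pipeline using dipy's documented CSD and particle filter tractography API is shown below. File names, parameter values, and tissue probability maps are placeholders, and the exact options used here may differ (recent dipy releases also rename some arguments, e.g. sh_order to sh_order_max):

```python
# Hedged sketch of the diffusion pipeline of Sect. 3.2, following dipy's
# public CSD and PFT examples; inputs and parameters are placeholders.
import numpy as np
from dipy.core.gradients import gradient_table
from dipy.data import default_sphere
from dipy.direction import ProbabilisticDirectionGetter
from dipy.io.image import load_nifti
from dipy.reconst.csdeconv import (ConstrainedSphericalDeconvModel,
                                   auto_response_ssst)
from dipy.tracking import utils
from dipy.tracking.local_tracking import ParticleFilteringTracking
from dipy.tracking.stopping_criterion import CmcStoppingCriterion
from dipy.tracking.streamline import Streamlines

data, affine = load_nifti("dwi.nii.gz")
gtab = gradient_table(np.loadtxt("bvals"), np.loadtxt("bvecs"))

# Fiber orientation distribution functions via CSD [6]
response, _ = auto_response_ssst(gtab, data, roi_radii=10, fa_thr=0.7)
csd_fit = ConstrainedSphericalDeconvModel(gtab, response, sh_order=8).fit(data)
dg = ProbabilisticDirectionGetter.from_shcoeff(
    csd_fit.shm_coeff, max_angle=20.0, sphere=default_sphere)

# Anatomically constrained PFT [7] needs tissue probability maps
pve_csf, _ = load_nifti("pve_csf.nii.gz")
pve_gm, _ = load_nifti("pve_gm.nii.gz")
pve_wm, _ = load_nifti("pve_wm.nii.gz")
voxel_size = np.average(np.abs(np.diag(affine))[:3])  # assumes axis-aligned affine
stopping = CmcStoppingCriterion.from_pve(
    pve_wm, pve_gm, pve_csf, step_size=0.2, average_voxel_size=voxel_size)

seeds = utils.seeds_from_mask(pve_wm > 0.5, affine, density=2)
streamlines = Streamlines(ParticleFilteringTracking(
    dg, stopping, seeds, affine, max_cross=1, step_size=0.2, maxlen=1000,
    pft_back_tracking_dist=2, pft_front_tracking_dist=1,
    particle_count=15, return_all=False))
```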

After the MR acquisition, the subject was fitted with a 64 channel actiCAP (Brain Products GmbH) EEG cap. The subject was seated in front of a computer screen in a dark quiet room and was asked to fixate on a red circle appearing at the center of the screen. The visual stimuli consisted of a circle appearing on the left or right side of the screen. One run consisted of 50 left and 50 right visual cues randomly interleaved. A total of 10 runs were performed, 5 with each stimulus. The EEG signals were separated into 150 ms epochs starting at the stimulus onset. Epochs with blink artifacts were excluded from further analysis. All remaining epochs for each stimulus were averaged and downsampled to 100 Hz.

3.3 Model Parameters

The model presented above relies on several parameters. First, the information velocity must be specified to compute the delay associated with each connection. Here, we assume a constant velocity of 6 m/s [9] for all connections of the brain. Next, the parameters of the prior distribution \(d\mu (\varvec{x}, \varvec{S}, \varvec{C})\) must be specified. Connections and clusters are initialized with a 0.9 probability of being inactive. However, when a connection is active, the clusters it connects must also be active. Finally, the intensity of the sources follows a zero-mean Gaussian distribution whose variance is \(10^{-4}\) if the cluster is active and \(10^{-7}\) if it is inactive.
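Under these assumptions, converting a connection's length into a delay in samples is straightforward; the helper below assumes the 6 m/s velocity above and the 100 Hz sampling rate of Sect. 3.2 (the function name is illustrative):

```python
def delay_in_samples(length_mm, velocity_m_per_s=6.0, fs_hz=100.0):
    """Delay of a connection, in time samples, from its average length."""
    delay_s = (length_mm / 1000.0) / velocity_m_per_s
    return round(delay_s * fs_hz)

# e.g. a 120 mm connection: 0.12 m / 6 m/s = 20 ms -> 2 samples at 100 Hz
assert delay_in_samples(120.0) == 2
```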

Fig. 1. Rate of successful information flow recovery at different SNR.

4 Results

Figure 1 illustrates the rate of successful information flow recovery averaged over 100 realizations for each SNR. The rate of recovery rises sharply between an SNR of 5 and 10. An example of a successful recovery is illustrated in Fig. 2. The left part of the figure presents the posterior probabilities projected back onto the model and animated. Information can be seen flowing through the different clusters. The right part of the figure presents an overview of the same results in the form of a flow diagram. Each row of the graph represents a cortical region and the circles correspond to individual time points. Information flow in connections is illustrated by lines connecting cortical regions at different time points. In both cases, green indicates a high probability that a region or connection is active whereas white indicates a low probability.

Fig. 2. Reconstruction of the information flow from the left eye to the right LGN, to the right visual cortex, and to the left visual cortex (\(\text{ SNR }=5\)). The left of the figure is a video of the information flow projected back onto the model. The right of the figure presents the same information in the form of an information flow diagram.

Reconstruction of the information flow following a left and a right visual stimulus is illustrated in Fig. 3. For the left visual stimulus, information flows through the optic nerve and optic chiasm to reach the right LGN. The information then travels through the optic radiation to reach the occipital lobe. Notably, the right LGN and right optic radiation have a higher posterior probability than their left counterparts, consistent with the expected visual activation. Similar results are obtained for the right visual stimulus; however, the right and left visual pathways are activated with similar posterior probabilities.

Fig. 3. Video of the reconstructed information flow in the visual pathway for the 120 ms following a left (a) or right (b) visual stimulus.

5 Conclusion

By combining structural connectivity information from diffusion MRI and functional information from EEG, our algorithm is able to infer information flow in the white matter of the brain. We evaluated the rate of successful recovery using synthetic data and presented preliminary in vivo results of information flow in the visual pathway. In this study, a constant information velocity of 6 m/s was assumed for all connections of the brain. However, our model is not specific to this assumption and can accommodate different information velocities for different connections. In the future, it may be possible to tune the information velocity using additional microstructure information such as axon diameter or myelination. Our method represents a promising new approach to visualizing brain dynamics and connectomics across white matter pathways.