# Identifying distributed and overlapping clusters of hemodynamic synchrony in fMRI data sets

## Abstract

Natural sensory stimuli elicit complex brain responses that manifest in fMRI as widely distributed and overlapping clusters of hemodynamic responses. We propose a statistical signal processing method for finding synchronous hemodynamic activity that directly or transiently reflects information about the experimental condition. When applied to fMRI data, the method searches for voxels with activation patterns exhibiting high coherence and simultaneously high variance across brain scans. The crux of the method is functional principal component analysis (fPCA) of activation patterns stored in a two-dimensional data matrix, with rows and columns representing voxels and scans, respectively. Without external information, fPCA is performed directly on this data matrix. Otherwise, the data matrix is first transformed to highlight a specific source of variation, enabling fully or partially supervised fPCA with a single parameter determining the degree of supervision. We evaluated our method on a public benchmark of fMRI scans of subjects viewing natural movies. Our method turns out to be very suitable for flexibly uncovering distributed and overlapping hemodynamic patterns that distinguish well between experimental conditions or cognitive states.

## Keywords

Natural perception, Brain activity, fMRI, Functional data analysis, Semi-supervised models

## 1 Introduction

Methods for functional magnetic resonance imaging (fMRI) analysis can be broadly divided into model-based analysis and data-driven analysis. The difference between the two is not absolute but rather indicates the point of departure. Model-based methods, such as the common general linear model [5, 12, 14, 27, 37], assume an explicit temporal hemodynamic model based upon the experimental condition. These methods have proven to be useful for spatial localization of covariate-related brain responses. The a priori model, however, is limited in dealing with hemodynamic variations across subjects, brain regions, and even cortical layers [1, 16]. As an alternative, data-driven methods group brain responses by temporal similarity [2, 7, 30, 24] or distinguish brain response from various noise sources by data decomposition [4, 11, 27]. These methods are powerful in revealing multivariate patterns of brain activity independent of experimental conditions. The interpretation of such patterns, however, is often problematic due to the presence of many confounding sources of brain activity. Hence, data-driven and model-based methods each resolve the fMRI data analysis problem only partially.

A new class of methods [17, 22, 23] combines the simplicity of model-based methods with the flexibility of data-driven methods. These methods take advantage of similarities in hemodynamic patterns among subjects. Each subject’s hemodynamic time course is voxel-wise correlated with every other subject’s hemodynamic time courses. Intersubject correlation matrices are then constructed for all voxels to measure hemodynamic consistency given a specific task. In a post-processing step, voxels with similar temporal patterns are clustered for further examination. Intersubject similarity-based methods work well for the identification of brain activity for such tasks as the auditory oddball task [22]. They also work well for uncovering new brain areas responding to complex visual stimuli [17]. The versatility of these methods, however, is limited by the exclusion of valuable information from external sources. It is therefore natural, as we are pursuing here, to incorporate information about experimental conditions in the data analysis without compromising the flexibility of similarity-based methods.

We propose a method that leniently uses information about the experimental condition to discover synchrony in hemodynamics. The method searches for voxels whose activation pattern exhibits high coherence and simultaneously high variance across brain scans. The crux of the method is functional principal component analysis of activation patterns stored in a two-dimensional data matrix with rows and columns representing voxels and scans, respectively. There are three modes of operation. Without external information, principal component analysis is performed on the original data matrix. Otherwise, the data matrix is first transformed to highlight specific sources of variation using stimulus data, group labels or any other coded information. The transformed data matrix is subsequently subjected to fully or partially supervised principal component analysis, with a single parameter determining the degree of supervision. Principal component analysis is performed on the rows of the data matrix in an incremental way. At each step, rows with low principal component scores are removed from the data matrix, resulting in nested voxel clusters with synchronous activity patterns. Optimal voxel clusters are subsequently determined from Gap statistics.

The underlying principle of our method comes from the popular gene shaving method (see [18]), which has been widely used in bioinformatics to find biologically relevant patterns of variation across genes, samples, and outcome measurements. Our motivation for extending the gene shaving method to fMRI data analysis is the inability of conventional fMRI data analysis to unravel the complex brain activity that natural sensory stimuli elicit [20]. Such complex brain activity often manifests in fMRI as spatially widely distributed and overlapping clusters of hemodynamic responses [19]. This type of nested cluster is the target of the method we propose here. Specifically, our fMRI data analysis method aims to detect distributed and overlapping voxel clusters with synchronous hemodynamic responses, when onsets and identities of their underlying processes are either fully known or unknown.

The difference between gene shaving and our method is that the former operates on discrete measurements (gene expression) while our method operates on signal data from EEG, fMRI or any other modality. The external source of variation may be signal data too. Here, we specifically focus on hemodynamics in fMRI data, calling our method *voxel sieving* as it incrementally separates out voxels with asynchronous activation patterns. We evaluate voxel sieving on simulated fMRI data and on an international fMRI test benchmark involving natural movie stimuli. We explore the correspondence between voxel cluster detections and known functional specialization. In addition, we compare our method’s ability to decode cognitive states with that of other state-of-the-art multivariate fMRI data analysis methods.

## 2 Materials

Stimulus and brain response data have been obtained from a publicly available benchmark for testing and comparing brain activity interpretation methods (see [32] for more detail and references). The benchmark has been extensively used in an international brain reading competition, providing the possibility to objectively compare our method’s performance with that of others.

### 2.1 Data

The brain response data involve fMRI data associated with passive viewing of Home Improvement sitcom movies for approximately 20 min. This TV video provided long shots and repeated use of a small number of actors in a small number of sets, which allows common elements to recur. Also, the materials (character types, settings, events, objects) are typical of what the subjects would be expected to have experience with [32]. The 20-min movies contained five interruptions where no video was present but only a white fixation cross on a black background. Three subjects watched the same three movies while undergoing functional brain imaging. Neuroimage data were collected on a Siemens Allegra 3T scanner. The structural neuroimage data were acquired with 1 mm spatial resolution. The functional scans produced volumes with approximately *V* = 36,000 brain voxels, each approximately 3.28 mm × 3.28 mm × 3.5 mm, with one volume produced every 1.75 s. These scans were preprocessed (motion correction, slice time correction, linear trend removal) and spatially normalized (non-linear registration) to the Montreal Neurological Institute brain atlas [26].

After fMRI scanning, the three subjects watched the three movies again to rate 30 movie features at time intervals corresponding to the fMRI scan rate. The extensive behavioral time vector ratings included the coding of categories such as *faces, motion*, and *emotional states* at multiple levels of hierarchy (i.e. *faces* versus individual *actors*). All three subjects generated ratings for each feature in each movie by moving a slider that controls a line on a screen showing the current value of the slider. Each rating was done on a 4-point scale. For the feature *faces*, for example, 0 indicates no faces, 1 faces somewhere in the picture, 2 faces at between 25 and 50% of the image, and 3 faces seen at more than 50% of the image. Each vector-valued rating pattern was subsequently convolved with a double-gamma hemodynamic response function to define the stimulus signal. A complete description of features and generation of feature vectors can be found in [32].
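The convolution of a rating vector with a double-gamma hemodynamic response function can be sketched as follows. This is only an illustration: the gamma shape parameters (6 and 16), the undershoot ratio (1/6), the 32 s kernel length and the toy *faces* rating are assumptions, and the parameters used for the benchmark may differ.

```python
import math

TR = 1.75  # seconds per volume, as stated for this data set


def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t / scale) / (math.gamma(shape) * scale ** shape)


def double_gamma_hrf(t, peak_shape=6.0, under_shape=16.0, under_ratio=1.0 / 6.0):
    """Assumed double-gamma shape: positive response minus a scaled undershoot."""
    return gamma_pdf(t, peak_shape) - under_ratio * gamma_pdf(t, under_shape)


def convolve_rating(rating, tr=TR, kernel_seconds=32.0):
    """Causal convolution of a rating sampled every `tr` seconds with the HRF."""
    n_kernel = int(kernel_seconds / tr) + 1
    kernel = [double_gamma_hrf(k * tr) for k in range(n_kernel)]
    out = []
    for i in range(len(rating)):
        acc = 0.0
        for k in range(min(i + 1, n_kernel)):
            acc += rating[i - k] * kernel[k]
        out.append(acc * tr)  # rectangle-rule approximation of the integral
    return out


# Hypothetical 'faces' rating on the 0-3 scale: faces visible in volumes 5..14.
rating = [0] * 5 + [2] * 10 + [0] * 25
stimulus = convolve_rating(rating)
```

The resulting stimulus signal is delayed and smoothed relative to the rating, which is the hemodynamic lag that the scan segmentation has to take into account.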

We use data associated with movies 1 and 2, as the data associated with movie 3 have not been made public so that they can serve for objective on-site evaluation. As we are interested in finding continuous hemodynamics caused by the content of the movies, we exclude parts of the data corresponding to video presentations of a white fixation cross on a black background. Taking into account the hemodynamic lag, we divide each fMRI scan and each subject rating into six parts corresponding to the segments during which the movie was shown. The six fMRI parts differ somewhat in number of volumes: part 1 consists of 91 volumes and the other parts of 90, 115, 108, 116, and 112 volumes, respectively. For a single movie, this results in 18 fMRI scans (3 subjects × 6 movie parts) and 18 real-valued and subject-dependent movie ratings.

We denote these four-dimensional fMRI scans by *I* _{ s }(**x**, *t*), where *s* = 1,…,*S* indicates the sample scan, **x** ∈ ℜ^{3} is 3D discrete spatial position, and *t* is time point. The real-valued ratings for sample *s* are denoted by the vector **g** _{ s }, containing *S* real values corresponding to the strength of a movie feature at the time scan *I* _{ s }(**x**, *t*) was acquired.

### 2.2 Data representation

We use the following notation: bold face upper case indicates a matrix of functions, e.g. **F**(*t*), or scalars, e.g. **F**; bold face lower case indicates a vector of functions, e.g. **f**(*t*), or scalars, e.g. **f**; and regular lower case indicates a function or a scalar.

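We represent a discrete voxel time course \(\mathbf{f} = [f_1,\ldots,f_T]\) by a linear combination of basis functions,

\[ f(t) = \sum_{m=1}^{M} \omega_m B_m(t), \]

where \(B_m(t)\) is the *m*th basis function, \(\omega_m\) the weight of that basis function, and *M* the number of basis functions. In our case B-splines are used to represent the non-periodic voxel activation data in a continuous manner. The functional representation of all *v* = 1,…,*V* voxel time courses of *I*(**x**, *t*) forms a vector **f***(*t*) of functions

\[ \mathbf{f}^{*}(t) = [f^{*}_1(t), \ldots, f^{*}_V(t)]. \]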
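To make the basis expansion concrete, the sketch below evaluates B-spline basis functions with the Cox-de Boor recursion and forms the weighted sum over them. The spline order (4, i.e. cubic), the clamped uniform knot placement and the time range are assumptions made for illustration; only the use of B-splines and the number of coefficients (20 for the simulated data) come from the text.

```python
def bspline_basis(m, k, t, knots):
    """Cox-de Boor recursion: m-th B-spline of order k (degree k - 1) at t."""
    if k == 1:
        return 1.0 if knots[m] <= t < knots[m + 1] else 0.0
    left = right = 0.0
    d1 = knots[m + k - 1] - knots[m]
    d2 = knots[m + k] - knots[m + 1]
    if d1 > 0:
        left = (t - knots[m]) / d1 * bspline_basis(m, k - 1, t, knots)
    if d2 > 0:
        right = (knots[m + k] - t) / d2 * bspline_basis(m + 1, k - 1, t, knots)
    return left + right


def clamped_knots(n_basis, order, t_min, t_max):
    """Clamped uniform knot vector yielding `n_basis` basis functions."""
    n_interior = n_basis - order
    step = (t_max - t_min) / (n_interior + 1)
    interior = [t_min + (i + 1) * step for i in range(n_interior)]
    return [t_min] * order + interior + [t_max] * order


def evaluate(weights, t, knots, order=4):
    """f(t) as the weighted sum of basis functions; valid for t_min <= t < t_max."""
    return sum(w * bspline_basis(m, order, t, knots) for m, w in enumerate(weights))


M, ORDER = 20, 4                       # 20 coefficients; cubic order is assumed
knots = clamped_knots(M, ORDER, 0.0, 100.0)
basis_sum = sum(bspline_basis(m, ORDER, 37.5, knots) for m in range(M))
constant_fit = evaluate([2.0] * M, 37.5, knots)
```

Because the basis functions sum to one inside the knot range, equal weights reproduce a constant function; fitting real time courses would additionally require estimating the weights, for example by penalized least squares.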

The voxels are hierarchically clustered in space: at hierarchical level *l*, they are partitioned into \(K = 2^{l}\) clusters. Clusters at the highest level *l* = 0 are created by clustering with *K* = 1, at level *l* = 1 by clustering with *K* = 2, at the next level we take *K* = 4, and so on. The number of clusters at the lowest level is equal to the number of voxels *V* the atlas contains. Assuming this number is a power of two, this results in a total of 2*V* − 1 clusters. By imposing a range on the levels, for example, considering higher levels of hierarchy only (\({\mathcal{L}} = \{0, \ldots ,L\}\) with \(L < \log_2 V\)), the number of clusters to be analyzed can be limited and sensitivity to noise reduced. Clusters at all levels are indexed by *c* = 1,…,*C* with \( C = \sum_{l \in {\mathcal{L}}} 2^l.\)
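The cluster counts can be checked with a few lines of arithmetic. The sketch below assumes, as the text does, that the number of voxels is a power of two; the value of 4,096 voxels is chosen only for illustration.

```python
def n_supervoxels(levels):
    """Number of clusters C when level l contributes 2**l clusters."""
    return sum(2 ** l for l in levels)


V = 4096                               # illustrative voxel count, a power of two
max_level = V.bit_length() - 1         # log2(V) for a power of two
total_all_levels = n_supervoxels(range(0, max_level + 1))
total_selected = n_supervoxels([9, 10, 11, 12])
```

With all levels included the total equals 2*V* − 1, while restricting the hierarchy to levels 9 to 12 yields 7,680 clusters.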

We represent each fMRI scan *I*(**x**, *t*) by the vector of average voxel time courses \(\mathbf{f}(t) = [f_1(t), \ldots, f_C(t)]\), with

\[ f_c(t) = \frac{1}{|{\mathcal{V}}_c|} \sum_{v \in {\mathcal{V}}_c} f^{*}_v(t), \]

where \({\mathcal{V}}_c\) is the set of voxels in cluster *c* and \(|{\mathcal{V}}_c|\) denotes the number of elements in that set. We refer to \(f_c(t)\) as a *supervoxel*. Supervoxels have a regularizing effect. They reduce the multiple comparison problem and alleviate the need for spatial clustering of activated voxels as required in most voxel-wise methods.

For the *S* fMRI scans we define a *C* × *S* data matrix

\[ \mathbf{F}(t) = \begin{bmatrix} f_{11}(t) & \cdots & f_{1S}(t) \\ \vdots & \ddots & \vdots \\ f_{C1}(t) & \cdots & f_{CS}(t) \end{bmatrix}. \]

The rows of **F**(*t*) correspond to supervoxels, the columns to fMRI scans \(I_s(\mathbf{x}, t)\), and the element \(f_{cs}(t)\) is the *c*th supervoxel of scan *s*. For example, when only supervoxels at hierarchical levels \({\mathcal{L}}= \{ 9,10,11,12 \}\) are considered for the *S* = 18 fMRI scans from the free movie viewing study, this will result in a 7,680 × 18 data matrix **F**(*t*). Each row of **F**(*t*) is centered to have zero mean.
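A minimal sketch of how supervoxels and the centered data matrix could be assembled is given below. It works on sampled time courses rather than on spline functions, and it interprets row centering as subtracting, at each time point, the mean over the scans of that row; both choices, as well as the toy scans and cluster memberships, are assumptions for illustration.

```python
def supervoxel(time_courses, voxel_ids):
    """Average the time courses of the voxels in one spatial cluster."""
    n_t = len(time_courses[0])
    return [sum(time_courses[v][t] for v in voxel_ids) / len(voxel_ids)
            for t in range(n_t)]


def data_matrix(scans, clusters):
    """F[c][s] is supervoxel c of scan s; each row is centered across scans."""
    F = [[supervoxel(scan, ids) for scan in scans] for ids in clusters]
    for row in F:
        n_t = len(row[0])
        mean = [sum(f[t] for f in row) / len(row) for t in range(n_t)]
        for s in range(len(row)):
            row[s] = [row[s][t] - mean[t] for t in range(n_t)]
    return F


# Toy input: 2 scans, 4 voxels, 3 time points; one coarse and two fine clusters.
scans = [
    [[1, 2, 3], [3, 2, 1], [0, 0, 0], [2, 2, 2]],
    [[2, 2, 2], [4, 4, 4], [1, 1, 1], [1, 1, 1]],
]
clusters = [[0, 1, 2, 3], [0, 1], [2, 3]]
F = data_matrix(scans, clusters)
```

Here the coarse cluster and the two fine clusters all become rows of the same matrix, which is how supervoxels from several hierarchical levels can compete in one analysis.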

## 3 Methods

The main computational parts of the voxel sieving method are shown in Fig. 1e. Each of these components will be described in more detail in the following subsections.

### 3.1 Unsupervised voxel sieving

Unsupervised voxel sieving operates directly on **F**(*t*) (see Fig. 1c). It aims at identifying voxels with synchronous activity patterns independent of experimental conditions.

#### 3.1.1 Principal component analysis

The goal is to find rows of **F**(*t*) with both high column variance and high coherence between supervoxels (see Fig. 1d). A good way to accomplish this is to perform functional principal component analysis [31, 35] of **F**(*t*) and to use principal component scores to identify rows of **F**(*t*) that have highly correlated variation. The central concept for the univariate functional data set \(\mathbf{f}(t) = [f_1(t), \ldots, f_C(t)]\) is taking the linear combination

\[ f_{cq} = \int \alpha_q(t)\, f_c(t)\, dt, \tag{5} \]

where \(f_{cq}\) is the principal component score value of voxel time course \(f_c(t)\) in dimension *q*. Principal components \(\alpha_q(t)\), *q* = 1,…,*Q*, are sought one after the other by maximizing the mean squared score

\[ \frac{1}{C} \sum_{c=1}^{C} f_{cq}^2 = \frac{1}{C} \sum_{c=1}^{C} \left( \int \alpha_q(t)\, f_c(t)\, dt \right)^2, \]

where \(\alpha_q(t)\) is subject to the following orthonormal constraints:

\[ \int \alpha_q(t)^2\, dt = 1, \qquad \int \alpha_q(t)\, \alpha_p(t)\, dt = 0 \quad \text{for } p < q. \]

Mapping \(f_c(t)\) onto the subspace spanned by the *Q* first principal component functions results in the vector of principal component scores \(\mathbf{f}_c = [f_{c1},\ldots,f_{cQ}]\). This mapping is very similar to local linear discriminant analysis of fMRI data (e.g. in [9, 28]). In this work, we only consider the main mode of variation, i.e. we set *Q* = 1.

As **F**(*t*) is multivariate, we need to perform multivariate functional principal component analysis (see [31]). The principal component in this case is defined by an *S*-vector of weight functions \({\boldsymbol{\alpha}}_q(t) = [\alpha^1_q(t), \ldots, \alpha^S_q(t)]\), with \(\alpha^s_q(t)\) denoting the variation for sample *s*. The inner product on the space of vector functions is defined as the sum of the inner products of the *S* components. Hence, Eq. 5 becomes

\[ f_{cq} = \sum_{s=1}^{S} \int \alpha^s_q(t)\, f_{cs}(t)\, dt. \]

In practice, we concatenate the *S* functions in each row of **F**(*t*) to form a composite function. Subsequently, we perform univariate functional principal component analysis. This results in the principal component score vector \(\mathbf{f} = [f_1,\ldots,f_C]\), which is subjected to sieving.

#### 3.1.2 Principal component sieving

Principal component sieving starts with the full data matrix **F**(*t*). The sieving procedure removes a fraction δ of the supervoxels, i.e. rows, of **F**(*t*) with the lowest absolute principal component scores, in order to arrive at a reduced data matrix **F** ^{1}(*t*). The sieving parameter δ controls the granularity of sieving. When it has a low value, small clusters of voxels with strongly synchronous activity can be detected (at the cost of computation). In contrast, larger voxel clusters with less hemodynamic synchrony emerge when the value of δ is high.

We call the set of supervoxels that survives the first sieving sequence the *supercluster* \({\mathcal{V}}_1\) (note that a supercluster is a set of supervoxels, whereas \({\mathcal{V}}_c\) in Sect. 2.2 denotes the set of voxels in cluster *c*). Then, functional principal component sieving is repeated on the reduced data matrix **F** ^{1}(*t*) to yield a new smaller supercluster. This process is repeated until the data matrix cannot be sieved anymore. Hence, voxel sieving results in nested superclusters \({\mathcal{V}}_1 \supset {\mathcal{V}}_2 \supset \cdots \supset {\mathcal{V}}_J\), with *J* being the total number of sieving sequences. We denote the working matrix associated with the supercluster at sieving sequence *j* by **F** ^{ j }(*t*), *j* = 1,…,*J*.
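The sieving loop can be illustrated with a small sketch. It treats each row as an already concatenated and sampled composite function, assumes the rows are centered, and uses plain power iteration as a stand-in for functional principal component analysis; the function names, the stopping rule (stop when one row remains) and the toy data are assumptions for illustration.

```python
import math
import random


def first_pc_scores(X, n_iter=100, seed=0):
    """Scores of the rows of X on the leading principal direction (Q = 1)."""
    rng = random.Random(seed)
    n_cols = len(X[0])
    a = [rng.gauss(0.0, 1.0) for _ in range(n_cols)]
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, a)) for row in X]
        a = [sum(s * row[j] for s, row in zip(scores, X)) for j in range(n_cols)]
        norm = math.sqrt(sum(w * w for w in a))
        a = [w / norm for w in a]
    return [sum(x * w for x, w in zip(row, a)) for row in X]


def sieve(X, delta=0.2):
    """Nested row-index sets V_1, V_2, ... from repeatedly dropping the
    fraction `delta` (at least one row) with the smallest absolute scores."""
    active = list(range(len(X)))
    nested = []
    while len(active) > 1:
        scores = first_pc_scores([X[i] for i in active])
        n_drop = max(1, int(delta * len(active)))
        order = sorted(range(len(active)), key=lambda k: abs(scores[k]))
        active = [active[k] for k in sorted(order[n_drop:])]
        nested.append(list(active))
    return nested


pattern = [1.0, -1.0, 2.0, -2.0, 1.0, -1.0]
X = [
    [3.0 * p for p in pattern],          # strongly synchronous
    [-3.0 * p for p in pattern],         # synchronous with opposite sign
    [2.5 * p for p in pattern],          # synchronous, slightly weaker
    [0.1, -0.2, 0.1, 0.0, -0.1, 0.1],    # low-variance rows
    [0.0, 0.1, -0.1, 0.1, 0.0, -0.1],
    [0.2, 0.0, -0.1, -0.1, 0.1, -0.1],
]
nested = sieve(X, delta=0.2)
```

Because absolute scores are used, the anti-correlated row survives together with the positively correlated ones, while the low-variance rows are sieved away first.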

#### 3.1.3 Cluster size determination

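As in gene shaving [18], the quality of a supercluster is measured by relating the between variance \(V_B(t)\) to the total variance \(V_T(t)\), defined as

\[ V_B(t) = \frac{1}{S} \sum_{s=1}^{S} \left( \bar{f}^j_s(t) - \bar{f}^j(t) \right)^2, \qquad V_T(t) = \frac{1}{C_j S} \sum_{c=1}^{C_j} \sum_{s=1}^{S} \left( f^j_{cs}(t) - \bar{f}^j(t) \right)^2, \]

where \(C_j\) is the number of rows of **F** ^{ j }(*t*), \(f^j_{cs}(t)\) is an element of **F** ^{ j }(*t*), \(\bar{f}^j(t)\) is the overall mean of **F** ^{ j }(*t*), and \(\bar{f}^j_s(t)\) is the *s*th column mean of **F** ^{ j }(*t*). A large value of \(R = \int \sqrt{V_B(t)/V_T(t)}\, dt\) implies a tight cluster of coherent supervoxels.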

Let **F** ^{ j }(*t*) be the data matrix corresponding to sieving sequence *j* and \(R_j\) its R-measure. To determine whether \(R_j\) is larger than expected by chance if the rows and columns of the data were independent, we permute the elements within each row of **F** ^{ j }(*t*). We perform *P* such permutations to obtain equally many R-measures \(R^{*p}_j\), *p* = 1,…,*P*. The Gap function is then defined by the difference between the real R-measure and the average R-measure of the randomized data:

\[ \mathrm{Gap}(j) = R_j - \frac{1}{P} \sum_{p=1}^{P} R^{*p}_j. \]

The sieving sequence *j* that maximizes Gap(*j*) determines the optimal supercluster of **F**(*t*).
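The permutation-based Gap computation can be sketched for a matrix of scalars, i.e. for a single time point without the integral over time. The variance definitions follow the gene shaving convention (between variance of the column means over total variance) and are an assumption here, as are the function names and the toy cluster.

```python
import math
import random


def r_measure(X):
    """sqrt(between variance / total variance) of a rows-by-columns matrix."""
    n_rows, n_cols = len(X), len(X[0])
    col_means = [sum(row[s] for row in X) / n_rows for s in range(n_cols)]
    grand = sum(col_means) / n_cols
    v_between = sum((m - grand) ** 2 for m in col_means) / n_cols
    v_total = sum((x - grand) ** 2 for row in X for x in row) / (n_rows * n_cols)
    return math.sqrt(v_between / v_total) if v_total > 0 else 0.0


def gap(X, n_perm=5, seed=0):
    """R of the data minus the mean R after permuting values within each row."""
    rng = random.Random(seed)
    r_real = r_measure(X)
    r_perm = []
    for _ in range(n_perm):
        shuffled = [rng.sample(row, len(row)) for row in X]
        r_perm.append(r_measure(shuffled))
    return r_real - sum(r_perm) / n_perm


# A perfectly coherent toy cluster: six identical rows.
coherent = [[3.0, -1.0, 0.0, -2.0] for _ in range(6)]
r_coherent = r_measure(coherent)
gap_coherent = gap(coherent, n_perm=5)
```

Evaluating `gap` for every working matrix of the sieving sequence and taking the maximum would then select the supercluster size.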

#### 3.1.4 Data orthogonalization

To find further superclusters, we orthogonalize each row of **F**(*t*) with respect to the column average \(\bar{{\mathbf{f}}}(t)\) of the supercluster found in the previous step. This is equivalent to regressing each row of **F**(*t*) on \(\bar{{\mathbf{f}}}(t)\) and replacing the rows with the regression residuals. As we are dealing with functional data, we use a point-wise multivariate functional linear model to orthogonalize **F**(*t*). This reduces to solving

\[ \mathbf{f}_c(t)^T = \bar{\mathbf{f}}(t)\, \beta(t) + {\boldsymbol{\epsilon}}(t), \]

where \(\mathbf{f}_c(t)\), *c* = 1,…,*C*, is a row vector of **F**(*t*), β(*t*) is the regression function and \({\boldsymbol{\epsilon}}(t)= [\epsilon_1(t),\ldots,\epsilon_S(t)]^T\) is the vector of residual functions. Under the assumption that the residual functions \({\boldsymbol{\epsilon}}(t)\) are independent and normally distributed with zero mean, the regression function is estimated by least squares minimization such that

\[ \hat{\beta}(t) = \arg\min_{\beta} \int \left\| \mathbf{f}_c(t)^T - \bar{\mathbf{f}}(t)\, \beta(t) \right\|^2 dt + \kappa \int \left[ \beta''(t) \right]^2 dt, \]

where the second term, weighted by a smoothing parameter κ ≥ 0, regularizes the second derivative of β(*t*). The estimated regression function provides the best estimate of \(\mathbf{f}_c(t)\) in least squares sense:

\[ \hat{\mathbf{f}}_c(t)^T = \bar{\mathbf{f}}(t)\, \hat{\beta}(t). \]

The orthogonalized data matrix **F***(*t*) has rows \(\mathbf{f}_c(t) - \hat{\mathbf{f}}_c(t)\). The search for the next supercluster starts with centering of the rows of **F***(*t*). Then all steps described in Sect. 3.1 are repeated on the new centered data matrix. This iterative procedure continues until a predefined number of superclusters has been identified. As the number of meaningful superclusters cannot be known a priori, the search for new superclusters may be stopped based on the quality of estimating voxel time courses by a linear combination of supercluster averages: when adding new superclusters no longer increases the percentage of variance explained, this can be taken as a stop condition.
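For a single time point, orthogonalization reduces to ordinary least squares regression of each row on the supercluster average. The sketch below shows this scalar case; it omits the roughness penalty and the intercept (rows are assumed centered), and the function name and toy matrix are hypothetical.

```python
def orthogonalize(X, cluster_rows):
    """Replace every row of X by its residual after regression on the
    column average of the rows listed in `cluster_rows`."""
    n_cols = len(X[0])
    f_bar = [sum(X[c][s] for c in cluster_rows) / len(cluster_rows)
             for s in range(n_cols)]
    denom = sum(v * v for v in f_bar)
    residuals = []
    for row in X:
        # Least squares slope of the row on the supercluster average.
        beta = sum(x * v for x, v in zip(row, f_bar)) / denom if denom > 0 else 0.0
        residuals.append([x - beta * v for x, v in zip(row, f_bar)])
    return residuals, f_bar


X = [
    [2.0, -2.0, 1.0, -1.0],    # member of the supercluster
    [4.0, -4.0, 2.0, -2.0],    # member of the supercluster
    [1.0, 1.0, -1.0, -1.0],    # unrelated row
]
X_star, f_bar = orthogonalize(X, cluster_rows=[0, 1])
dots = [sum(x * v for x, v in zip(row, f_bar)) for row in X_star]
```

Rows that follow the supercluster average are reduced to (near) zero, whereas the unrelated row is left unchanged and remains available for the next supercluster.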

Note that because orthogonalization is done with respect to the average time course of a supercluster, supervoxels in different clusters can be highly correlated with one another. Moreover, one supervoxel can belong to multiple superclusters, i.e. supervoxels removed in a previous sieving step may be part of the supercluster of the next step.

### 3.2 Supervised voxel sieving

The method discussed so far has not used external information about the columns of **F**(*t*) to ‘supervise’ the sieving of rows. External information such as cognitive states, subject information or stimulus patterns may be crucial in uncovering hidden hemodynamic synchrony. Here, we generalize voxel sieving to incorporate different types of external covariates such as continuously valued stimulus data or discrete class labels for the purpose of steering the discovery of hidden hemodynamic synchrony.

For continuously valued covariates, such as the movie ratings, we first obtain a functional representation \(g_s(t)\) by fitting a B-spline to the vector-valued movie rating \(\mathbf{g}_s\). For the task at hand, we subsequently map the supervoxels onto a subspace spanned by the movie rating data using the *S* × *S* projection matrix

\[ \mathbf{P}^1(t) = \mathbf{g}(t)\, \mathbf{g}^{+}(t), \tag{17} \]

where \(\mathbf{g}^{+}(t)\) is the generalized Moore-Penrose pseudo inverse of \(\mathbf{g}(t) = [g_1(t), \ldots, g_S(t)]^T\). Then, given data matrix **F**(*t*) and projection matrix **P**^{1}, we map the supervoxels:

\[ \mathbf{F}^{**}(t) = \mathbf{F}(t)\, \mathbf{P}^1(t). \]

Voxel sieving is subsequently performed on **F****(*t*) rather than on **F**(*t*). Note that when the task at hand is to predict the stimulus from brain activity data (e.g. for brain reading tasks), we can reverse the roles of the predictor and the predictand, treating voxel activity data as the predictor and the stimulus as the response.
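For a single time point, the projection onto the covariate can be written out explicitly: for a column vector **g**, the Moore-Penrose pseudo inverse is **g**^T / (**g**^T **g**). The sketch below builds the resulting projection matrix and applies it to a small matrix of scalars; the rating values and the data rows are hypothetical.

```python
def projection_matrix(g):
    """P = g g^+ for a nonzero column vector g, with g^+ = g^T / (g^T g)."""
    norm2 = sum(v * v for v in g)
    return [[gi * gj / norm2 for gj in g] for gi in g]


def matmul(A, B):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


g = [0.0, 1.0, 2.0, 1.0]            # hypothetical rating values for S = 4 scans
P1 = projection_matrix(g)
F = [
    [1.0, 2.0, 4.0, 2.0],           # varies with the rating
    [3.0, -1.0, 0.0, 1.0],          # uncorrelated with the rating
]
F_proj = matmul(F, P1)
P1_squared = matmul(P1, P1)
```

The first row keeps only its component along the rating, and the second row is mapped to zero, so rows unrelated to the covariate no longer contribute variance to the subsequent sieving.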

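If the external covariate consists of discrete class labels for the columns of **F**(*t*), then an \(S \times |{\mathcal{L}}|\) matrix of scalars can be defined that maps the columns of **F**(*t*) onto \(|{\mathcal{L}}|\) columns containing the class averages for each row (here \({\mathcal{L}}\) denotes the set of class labels). In the example of the three subjects watching six movie parts, we may, for example, want to identify synchronous brain activity across subjects using the 18 × 3 projection matrix

\[ \mathbf{P}^2 = [p_{sk}], \qquad p_{sk} = \begin{cases} 1/6 & \text{if scan } s \text{ belongs to subject } k, \\ 0 & \text{otherwise.} \end{cases} \tag{19} \]

Multiplying **F**(*t*) by **P**^{2} results in an alternative *C* × 3 working matrix with the three columns now corresponding to the three subjects. The data analysis steps described in Sect. 3.1 are subsequently executed to identify across-subject hemodynamic synchronization.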
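A class-averaging matrix of this kind can be generated directly from the scan labels. In the sketch below the ordering of the 18 scans (all parts of subject 1 first, then subject 2, then subject 3) and the toy data row are assumptions; the text does not specify the column order.

```python
def class_average_matrix(labels):
    """S x K matrix whose k-th column averages the scans carrying label k."""
    classes = sorted(set(labels))
    counts = {k: labels.count(k) for k in classes}
    return [[1.0 / counts[k] if lab == k else 0.0 for k in classes]
            for lab in labels]


# 3 subjects x 6 movie parts; subject-major scan ordering is assumed.
labels = [subject for subject in (1, 2, 3) for _ in range(6)]
P2 = class_average_matrix(labels)

row = [float(s) for s in range(18)]        # one toy row of the data matrix
subject_means = [sum(x * P2[s][k] for s, x in enumerate(row)) for k in range(3)]
```

Each of the 18 scan values is thereby replaced by three per-subject averages, so variance across the resulting columns reflects differences between subjects only.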

Hence, incorporation of different types of external covariates in the voxel sieving procedure is achieved by performing a suitable data projection operation prior to the data analysis procedure of Sect. 3.1.

### 3.3 Partially supervised voxel sieving

Given data matrix **F**(*t*) and projection matrix **P**, partially supervised data analysis is facilitated through

\[ \mathbf{F}^{**}(t) = \mathbf{F}(t)\, \mathbf{P}^{*}, \]

where **P*** is a weighted combination of the projection matrix **P** and identity matrix **I**:

\[ \mathbf{P}^{*} = (1 - \lambda)\, \mathbf{P} + \lambda\, \mathbf{I}, \qquad 0 \le \lambda \le 1. \]

For *λ* = 1, the data are projected onto themselves, which leads to unsupervised sieving. For *λ* = 0, the data are projected by **P** only and thus analysis reduces to supervised sieving. Values between 0 and 1 enable partial supervision. Note that **P** and **I** become matrices of functions when the external covariate itself is functional.
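The effect of the supervision weight can be sketched for a square projection matrix of scalars. The convex combination used below is an assumed form that matches the two limiting cases stated in the text (*λ* = 1 gives the identity, *λ* = 0 gives **P**); the averaging matrix used as **P** is purely illustrative.

```python
def blend(P, lam):
    """P* = (1 - lam) * P + lam * I for a square matrix P and 0 <= lam <= 1."""
    n = len(P)
    return [[(1.0 - lam) * P[i][j] + (lam if i == j else 0.0) for j in range(n)]
            for i in range(n)]


# Illustrative projection: average over three scans.
P = [[1.0 / 3.0] * 3 for _ in range(3)]
unsupervised = blend(P, 1.0)     # identity: data are left as they are
supervised = blend(P, 0.0)       # P only
partial = blend(P, 0.5)          # halfway between the two
```

Intermediate values keep part of the original variation of each row while emphasizing the component explained by the covariate.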

## 4 Experiments and results

We use voxel sieving to uncover distributed and overlapping patterns of fMRI activity predictive of sources underlying these patterns. Our experiments aim at exploring how well this can be achieved. All experiments are performed on a functional data representation of the fMRI data. An important motivation for using B-splines, rather than temporal smoothing with an HRF kernel, is minimization of bias. To what extent a predefined kernel smoother gives an acceptable level of bias can only be determined empirically. We choose to determine the smoother in a more objective manner by calculating smoothing splines for our time courses with roughness of derivatives as a penalty [31]. We subsequently determine the minimum number of basis functions producing very similar smoothing results, to get an efficient yet accurate data representation. Note that this generally imposes some restriction on variation in fMRI scan length, repetition times, etc. The fMRI scans in our experiments are reasonably uniform in terms of number of volumes and hence can all be approximated with the same number of basis functions.

### 4.1 Simulated fMRI data

As an initial test we apply our method to artificial fMRI data. Following [6, 8], we simulate fMRI data using three types of sources: task-related, transiently task-related, and function-related. The task-related source corresponds with an activation paradigm. It is periodic and slowly changing. The transiently task-related source closely matches the task-related source but has an activation that is more pronounced at parts of each task cycle. The function-related source is characterized by random fluctuations. The three sources are super-Gaussian in nature; they are localized. We disregard source variations across large image areas such as motion-related sources, assuming these have been accounted for in the preprocessing step.

We simulate three fMRI data sets (*S* = 3). Each of the three fMRI data sets consists of 64 × 64 voxels and 100 time points. Approximately 22% of these 4,096 voxels have a task-related source. These voxels are clustered at three spatially distributed locations. Another 15% have a transiently task-related source, distributed over two equally large clusters. The fraction of voxels with a mixture of the aforementioned sources is 7%. Finally, a random source is assigned to 5% of the voxels. We add Gaussian noise to the constructed data sets at the following signal-to-noise ratios (SNR): 2, 1.5, 1, 0.5, 0.25. The SNR measure we use is the standard deviation over all sources divided by the standard deviation over all noise sources. Figure 2 summarizes the sources.
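Adding noise at a prescribed SNR, defined as the ratio of signal to noise standard deviation, can be sketched as follows. The block-shaped source, the function name and the random seed are assumptions for illustration; the noise is drawn with the target standard deviation, so the realized ratio matches the requested SNR only in expectation.

```python
import random
import statistics


def add_noise(signal, snr, seed=0):
    """Add zero-mean Gaussian noise with standard deviation std(signal) / snr."""
    rng = random.Random(seed)
    sigma = statistics.pstdev(signal) / snr
    return [x + rng.gauss(0.0, sigma) for x in signal], sigma


# Hypothetical periodic task-related source with 100 time points.
source = [1.0 if (t // 10) % 2 == 0 else 0.0 for t in range(100)]
noisy, sigma = add_noise(source, snr=0.5)
```

At an SNR of 0.5 the noise standard deviation is twice that of the source, which gives an impression of how strongly the task-related pattern is masked at the lower end of the tested range.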

We fit a 20-coefficient B-spline to the discrete voxel time courses to obtain functional data. The voxel time courses are hierarchically clustered in space. The highest level used in hierarchical clustering was *l* = 5. It produces 2^{5} = 32 voxel clusters with on average 128 voxels. We excluded higher levels because we expect these will not be informative. At the lowest level the 2^{12} = 4,096 individual voxels themselves are considered. The supervision weight λ is set to 0 (fully supervised) or 1 (fully unsupervised). Data randomization to separate real from random clusters is done on the basis of *P* = 3 permutations.

We use *Precision* and *Recall* to measure performance. *Precision* indicates the fraction of relevant voxels in the two detected superclusters, while *Recall* is the number of relevant voxels in the two detected superclusters divided by the number of relevant voxels in the entire fMRI volume. The harmonic mean combines these two measures into a single one [34]:

\[ F_{score} = \frac{2 \cdot \mathit{Precision} \cdot \mathit{Recall}}{\mathit{Precision} + \mathit{Recall}}. \]

The table below lists the \(F_{score}\) for unsupervised (*λ* = 1) and supervised (*λ* = 0) analysis of simulated fMRI data for various signal-to-noise ratios and values of sieving parameter δ. We first discuss two noteworthy observations that hold for both supervised and unsupervised sieving. First, lower values of δ, yielding fine-grained voxel clusters, lead to better detection performance. This is to be expected as the relevant voxels are clustered in relatively small parts of the 3D space. In real fMRI scans, where sources may be spread over the entire space, larger values of δ may perform better (as we will discuss next). Second, the decay of detection performance with decrease of SNR is lower for larger values of the sieving parameter. This can be explained by the fact that coarse sieving results in larger voxel clusters that tend to average out noise more vigorously.

\(F_{score}\) of detection for unsupervised \(\mid\) supervised sieving

| δ | SNR = 2.0 | SNR = 1.5 | SNR = 1.0 | SNR = 0.5 | SNR = 0.25 |
|---|---|---|---|---|---|
| 0.1 | 0.78 \(\mid\) 0.81 | 0.79 \(\mid\) 0.72 | 0.59 \(\mid\) 0.67 | 0.44 \(\mid\) 0.40 | 0.27 \(\mid\) 0.25 |
| 0.3 | 0.65 \(\mid\) 0.68 | 0.52 \(\mid\) 0.55 | 0.48 \(\mid\) 0.47 | 0.35 \(\mid\) 0.29 | 0.28 \(\mid\) 0.19 |
| 0.5 | 0.50 \(\mid\) 0.51 | 0.41 \(\mid\) 0.44 | 0.36 \(\mid\) 0.35 | 0.27 \(\mid\) 0.17 | 0.23 \(\mid\) 0.11 |
| 0.7 | 0.33 \(\mid\) 0.32 | 0.23 \(\mid\) 0.20 | 0.19 \(\mid\) 0.16 | 0.17 \(\mid\) 0.09 | 0.11 \(\mid\) 0.05 |
| 0.9 | 0.20 \(\mid\) 0.18 | 0.18 \(\mid\) 0.12 | 0.13 \(\mid\) 0.10 | 0.09 \(\mid\) 0.08 | 0.09 \(\mid\) 0.06 |
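The detection score can be computed from sets of voxel indices as in the following sketch. The function name and the example index ranges are hypothetical; the harmonic mean of *Precision* and *Recall* is the standard F-measure.

```python
def f_score(detected, relevant):
    """Return (precision, recall, F) for detected versus relevant voxel sets."""
    detected, relevant = set(detected), set(relevant)
    hits = len(detected & relevant)
    if hits == 0:
        return 0.0, 0.0, 0.0
    precision = hits / len(detected)
    recall = hits / len(relevant)
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy example: 8 detected voxels, 10 relevant voxels, 4 in common.
precision, recall, f = f_score(detected=range(0, 8), relevant=range(4, 14))
```

In this toy case half of the detected voxels are relevant and 40% of the relevant voxels are found, giving a score of about 0.44.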

In comparison to unsupervised sieving, supervised sieving performs better at high SNR when sieving is done in a fine-grained fashion (low δ values). A close inspection of detected superclusters reveals the following recurring pattern. In supervised analysis, the first supercluster is large relative to the second and is almost entirely composed of task-related voxels. The second supercluster is small and includes voxels with a mix of task-related and transiently task-related time courses. As a result, *Precision* is very high. Conversely, in unsupervised analysis, the first and second superclusters are relatively large and comparable in size. Almost all voxels in the first supercluster are task-related. The second supercluster also contains a considerable number of irrelevant voxel clusters. This leads to lower *Precision* and higher *Recall*, compared with supervised analysis. On average, this reduces detection performance for unsupervised analysis. When the signal-to-noise ratio decreases and becomes more realistic, however, unsupervised sieving outperforms supervised sieving, particularly for coarse sieving (higher values of δ). Overall, these results indicate that voxel sieving is capable of identifying localized synchrony in hemodynamics at multiple levels of granularity, using covariate information in a flexible manner as a pilot.

### 4.2 Real fMRI data

Our experiments with real data involve fMRI data acquired during a free movie viewing study involving Home Improvement sitcoms. First, we aim to explore the spatial nature of detected voxel clusters under a variety of source-specific conditions. Second, we test the ability of these voxel clusters to predict natural sensory stimuli, i.e. to do brain reading. Hence, we test whether we localize brain regions containing information about the external sources, rather than testing for brain regions that activate with the external sources.

Functional data for the real fMRI data sets have been obtained by fitting a 30-coefficient B-spline to the discrete data points, both for voxel activation and stimulation data. The highest level used in hierarchical clustering was *l* = 9 (see Fig. 1a). It produces 2^{9} = 512 voxel clusters with an average size of 70 voxels, while the lowest level of *l* = 12 produces 2^{12} = 4,096 voxel clusters with an average size of 9 voxels. In Sect. 4.2.2 we discuss how we selected values of *l* = 9,…,12 to limit the search space to *C* = 2^{9} + 2^{10} + 2^{11} + 2^{12} = 7,680 supervoxels and speed up the search. Hence, the *C* × *S* data matrix **F**(*t*) consists of *C* = 7,680 supervoxels for *S* = 18 scan samples. The sieving parameter δ as described above was set to 0.2. This setup requires 24 computation hours on a standard desktop computer. The parameter λ for controlling supervision was varied within [0, 1] depending on the experiment. Data randomization to separate real from random clusters was done on the basis of *P* = 5 permutations.

We first describe the application of voxel sieving to identify, in an unsupervised manner, brain areas reacting in synchrony across brain scans. Then we elaborate on supervised voxel sieving for finding across-subject hemodynamic synchrony. An interparticipant correlation map is created to compare our findings with those of an intersubject similarity-based method [22].

#### 4.2.1 Interscan synchronization

Unsupervised analysis of the fMRI data implies *λ* = 1. In this case, the projection matrix **P** = **I**. Voxel sieving thus performs a data-driven search for voxel activity patterns with high across-scan variance and high across-voxel coherence. The resulting voxels highlight parts of the brain that act in synchrony during natural movie viewing.

From the Gap statistics at each sieving sequence, it follows that for the first supercluster, the largest Gap occurs when only two supervoxels remain, consisting of 19 voxels. The second supercluster has 27 supervoxels with a total of 299 voxels. All supervoxels are at levels *l* = 11 or *l* = 12. Note that these levels are automatically selected from the available levels by our method. We examined the spatial distribution of individual voxels over known functional areas. The pie chart in Fig. 3 shows that the voxels in the first supercluster are mostly localized in functional areas for motor and action, while voxels in the second cluster are distributed over a wide range of functional areas. We speculate that during passive movie viewing, hemodynamic synchrony is strongly present at brain areas for motor and action.

#### 4.2.2 Intersubject synchronization

Supervised analysis of across-subject synchrony implies setting *λ* = 0 and consequently activating **P**^{2} in Eq. 19. Projection of the data matrix **F**(*t*) by **P**^{2} and incrementally sieving away supervoxels identifies the voxels highlighted in the first row of Fig. 4. The first supercluster in *red* contains three supervoxels. Almost half of the 25 individual voxels are located in the temporal lobe, where audio processing takes place. The second cluster in *blue* contains 54 supervoxels (578 voxels). Again all supervoxels are at levels *l* = 11 or *l* = 12. Across-subject synchronization is identified at multiple areas across the entire brain. Notice that very specific brain areas are visible with a very strong synchrony rather than a widespread cortical activation pattern as reported in a similar natural movie viewing study [17]. These specific results are typical of voxel sieving and provide additional insight into correlates of natural movie viewing.

### 4.3 Localization and prediction

We now consider the task of localization of covariate-related brain responses. We analyze the fMRI data under full and partial supervision. Then we concentrate on predicting external covariates on the basis of fMRI data.

#### 4.3.1 Localization

The projection matrix **P**^{1}(*t*) in Eq. 17 forms the basis for localization of covariate-related brain responses. In full supervision mode, i.e. *λ* = 0, sieving is performed on the matrix **F**(*t*)**P**^{1}(*t*). This is equivalent to standard regression analysis. The aim is to find rows of **F**(*t*) with column means that best regress on the external covariates. However, rather than performing regression voxel-wise or volume-wise, it is here performed on clusters of voxels. This has the benefit of making it possible to find multiple specific voxel clusters that are independently related to the stimulus. Furthermore, supervoxels eliminate the need for spatial regularization.
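To make the cluster-wise regression idea concrete, the sketch below scores each cluster by how much of its mean time course is captured when projected onto a single covariate. It is a discretized, single-covariate simplification and not the functional projection of Eq. 17; the helper names and the toy data are hypothetical.

```python
def center(values):
    """Subtract the mean from a sequence of numbers."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def cluster_mean(rows):
    """Average the time courses (rows) of the voxels in one cluster."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def explained_fraction(signal, covariate):
    """Fraction of signal variance captured by projection onto the covariate.

    Both inputs are centered first, so the result equals the squared
    Pearson correlation between them.
    """
    y, x = center(signal), center(covariate)
    xx = sum(a * a for a in x)
    yy = sum(a * a for a in y)
    if xx == 0 or yy == 0:
        return 0.0
    xy = sum(a * b for a, b in zip(x, y))
    return (xy * xy) / (xx * yy)


# Hypothetical toy data: one covariate and two clusters of two voxels each.
covariate = [0, 1, 0, 1, 0, 1]
clusters = {
    "A": [[1, 3, 1, 3, 1, 3], [0, 2, 0, 2, 0, 2]],  # follows the covariate
    "B": [[1, 1, 2, 2, 3, 3], [2, 2, 3, 3, 4, 4]],  # slow drift only
}
scores = {name: explained_fraction(cluster_mean(rows), covariate)
          for name, rows in clusters.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Scoring cluster means rather than individual voxels is what allows several distinct clusters to be related to the stimulus independently of each other.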

For the *face* stimulus, the largest difference between the real and the randomized explained variance for the first supercluster occurs at the last sieving sequence, corresponding to 2 supervoxels. The second supercluster contains 12 supervoxels. The majority of the individual voxels of both superclusters are located in the left fusiform area, which is known to be involved in face processing [15]. The other identified functional areas associated with the *face* stimulus are temporal inferior lobe, left cerebellum, and left lingual. Almost all of these functional areas are involved in language processing; it is conceivable that these areas activate when perceiving human faces. Note that there is considerable spatial overlap between the two superclusters. The first supercluster is in fact a subset of the second, possibly indicating functional specialization.

With partial supervision at *λ* = 0.5, the supervision criterion is less rigid, providing more room for identifying transient brain activity related to the *face* stimulus. The first supercluster contains 34 supervoxels. The second supercluster has 27 supervoxels, mostly at higher levels of hierarchy (*l* ∈ [10, 11]). The individual voxels are found at a broader range of spatial and functional areas. Most voxels are found in the following areas: fusiform, temporal inferior lobe, left cerebellum, and left lingual. This suggests that, next to voxels that are directly related to the stimulus, many more are transiently related.

#### 4.3.2 Prediction

Evaluation of detected brain responses to naturalistic stimuli, as in our case, is difficult because of the lack of appropriate reference material. One way of dealing with this challenge is to invert the task from correlating external covariates with fMRI data to predicting these covariates from the fMRI data. This makes evaluation of detected brain responses more objective [21]. Here, we use partially supervised voxel sieving to uncover voxels that are predictive of the *face* stimulus in our movie data. We concentrate on the *face* stimulus because of the large body of reference material [15].

For various values of *λ* we identify two clusters that we subsequently use as predictors in a functional linear model (see [31] for more detail), with the stimulus as dependent variable and the cluster averages as independent variables, i.e. predictors. In the training phase the best model is selected: a model with one or two predictors. The trained model is then applied in the testing phase on independent data to predict a feature. We use movie 1 data for training and movie 2 data for prediction, and vice versa. The Pearson correlation coefficient between the manual feature rating functions and the automatically predicted feature functions was used as an evaluation measure.
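The train-on-one-movie, test-on-the-other protocol can be sketched as follows. Note the substitution: instead of the functional linear model with one or two predictors, the sketch fits an ordinary one-predictor least-squares line to sampled values. All names and data are hypothetical.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def fit(x, y):
    """Least-squares intercept and slope for y ~ a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sxy = sum((v - mx) * (w - my) for v, w in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cross_movie_score(train, test):
    """Fit on one movie, predict the other, score with Pearson's r."""
    a, b = fit(train["cluster"], train["rating"])
    predicted = [a + b * v for v in test["cluster"]]
    return pearson(predicted, test["rating"])


# Hypothetical cluster averages and manual feature ratings per movie.
movie1 = {"cluster": [0.1, 0.9, 0.2, 0.8, 0.1], "rating": [0, 1, 0, 1, 0]}
movie2 = {"cluster": [0.8, 0.2, 0.9, 0.1, 0.7], "rating": [1, 0, 1, 0, 1]}

scores = [cross_movie_score(movie1, movie2),
          cross_movie_score(movie2, movie1)]
print(sum(scores) / len(scores))
```

Averaging the two directions gives a single score per feature, which is how the cross-correlation values reported below can be read.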

The highest cross correlation value of 0.62 is obtained for the feature *faces* at *λ* = 0.75. This indicates that brain activity patterns that are transiently related to the stimulus are relevant for prediction. The second row of Fig. 8 shows the voxels that have been used for prediction of this feature, with the first supercluster containing 5 supervoxels and the second supercluster 12. As expected, most voxels are localized in brain areas related directly or indirectly to face processing.

The first row of Fig. 8 also shows the distribution of cluster resolutions that were used for prediction. Most of the identified voxel clusters are at the lowest hierarchical level, i.e. have a cluster size of approximately nine voxels. Some features such as *environmental sounds*, however, also benefit from supervoxels at higher levels of hierarchy, suggesting that some features are processed more globally than others. We note that we restricted our supervoxels to only four hierarchical levels, as these levels performed best in a prediction experiment where we started with supervoxels at the lowest level (*l* = 12, *C* = 4,096) and stepwise included higher levels. Prediction performance for all features, and for supervision weight *λ* set to 0, 0.25, 0.5, 0.75, and 1, increased steadily up to level *l* = 9. Beyond this level, performance first remained stable and then declined. Hence, at least for the prediction task, a multiresolution approach pays off.

We compared voxel sieving performance with that of the three winning entries of the 2006 EBC Brain Reading competition. These entries used recurrent neural networks, ridge regression, and dynamic Gaussian Markov random field modeling on the same test data benchmark, yielding across-feature average cross correlations of 0.49, 0.49, and 0.47, respectively. For the voxel sieving method, the across-feature average cross correlation is 0.44. This is competitive considering that our predictions are based on a reduced data set, while the reported results of the winning entries are based on the full data set; the smaller training set is likely to have had a negative impact on the prediction results. Note that in the 2006 competition our entry, an initial version of the voxel sieving method, ranked first in the actor category [32]. We were able to accurately predict which actor the subjects were seeing purely based on fMRI scans [10].

#### 4.3.3 Consistency

We repeated the localization (*λ* = 0) and prediction experiments (*λ* ∈ {0, 0.25, 0.5, 0.75, 1}) three times, each time using only fMRI data of a single subject instead of all three. We measured the overlap in supervoxels across subjects in terms of the number of supervoxels that were in the superclusters of all three subjects relative to the total number of supervoxels. We examined consistency separately for the superclusters. Figure 9a shows the results of the consistency analysis.

When we consider only supervoxels in the first supercluster, almost 22% of the supervoxels from subjects 1, 2, and 3 overlap in the localization task. For the second supercluster the overlap is higher: 28%. We attribute this difference in consistency between the first and second superclusters to the number and spatial size of supervoxels, which tend to be larger for the second supercluster. In the prediction task, we computed consistency separately for supervision weights 0, 0.25, 0.5, 0.75, and 1 and subsequently averaged these. For larger values of the supervision weight, the detected supervoxels are generally few, spatially confined, and variable across subjects, adversely affecting consistency of voxel detections across subjects. The overlap drops to 21% and 18% for the first and second superclusters, respectively. This, however, does not necessarily imply that a source-specific search for hemodynamic synchrony yields more consistency than unbiased probing. It might be that consistency emerges with across-subject analysis, as reflected in the prediction results based on fMRI data of individual subjects (Fig. 9b). Considering the large number of supervoxels (*C* = 7,680), the obtained results indicate a reasonable consistency of voxel detections across subjects.
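The overlap measure can be illustrated with set operations. The sketch assumes that "the total number of supervoxels" in the denominator means all distinct supervoxels detected in at least one subject; other readings are possible. The function name and the supervoxel identifiers are hypothetical.

```python
def consistency(per_subject):
    """Share of supervoxels that appear in every subject's supercluster.

    The denominator is the number of distinct supervoxels found in any
    subject (an assumption about the normalization).
    """
    sets = [set(s) for s in per_subject]
    if not sets:
        return 0.0
    union = set().union(*sets)
    if not union:
        return 0.0
    common = set.intersection(*sets)
    return len(common) / len(union)


# Hypothetical supervoxel identifiers detected for three subjects.
subject1 = {3, 17, 42, 99}
subject2 = {17, 42, 256}
subject3 = {5, 17, 42}

overlap = consistency([subject1, subject2, subject3])
print(round(100 * overlap, 1))
```

Here two of the six distinct supervoxels are shared by all three subjects, giving an overlap of about 33%.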

## 5 Discussion

We have introduced a statistical signal analysis method for identification of distributed and overlapping synchronous hemodynamic patterns that are directly or transiently linked to their underlying sources. The method is applicable to brain activity data from any modality and to covariates of any form. We focused on fMRI data from a free natural movie viewing study, as these data generally contain complex distributed and overlapping synchronous hemodynamic patterns. Our experiments showed that voxel sieving is very effective in uncovering both anticipated (visual and auditory regions) and unexpected cortical areas involved in face processing (such as motor and action regions). The viability of voxel sieving to find established or discoverable relations also holds for the other movie stimuli in our data set. There is generally a meaningful relation between cognitive concepts from the movie stimuli and synchronously active brain areas as identified by voxel sieving. The performance of voxel sieving in fMRI-based prediction of the movie stimuli strongly supports the significance of the exposed brain areas.

Voxel sieving can be conceived of as a superset of many existing fMRI data analysis methods. When a single cluster is searched for in a fully unsupervised mode (*λ* = 1) without sieving (*δ* = 1), our method reduces to functional principal component analysis [35]. Independent component analysis [4, 11] is approximated when multiple independent clusters are searched using *λ* = 1 and *δ* = 1. In a fully supervised mode (*λ* = 0) using projection matrix **P**^{1}(*t*), fMRI data are analyzed in a manner analogous to standard regression analysis [13]. Other projection matrices can be used, for example, for discriminating activity between subjects, groups or conditions (similar to [3, 29, 36]). By varying the values of sieving parameter *δ* and levels of hierarchy *l*, the method enables voxel-wise, cluster-wise, and volume-wise data analysis. In addition, as voxel sieving relies heavily on data averaging and dimension reduction on data sets from multiple subjects or multiple conditions, it is reasonably robust against multiple testing problems. Hence, voxel sieving is generic in that it can be used for an ensemble of data analysis approaches and tasks.

More importantly, voxel sieving has the capability to uncover patterns in brain activity data that are hard to capture with existing fMRI data analysis techniques. Our method generally identifies multiple specific cortical clusters across the brain. We attribute the specificity of the results to the ability of our method to find voxel activity patterns with both high coherence and high variance, while other similar methods [17, 22, 23] focus on coherence only. Another distinguishing feature of voxel sieving is that identified voxel clusters are independent of each other. Rather than seeking voxel clusters with similar temporal properties, the method tends to search for distinct cluster characteristics. As a consequence, once a specific synchrony is captured in one cluster, the same structure will no longer be captured in subsequent clusters. Overlapping voxel clusters, however, are allowed if such voxels induce clusters that uncover distinct brain processes. These aspects of our method are important and cannot be captured by fitting predefined models to voxels or by globally grouping voxels into classes, clusters or components.

In this study, we have experienced, as others have before, that estimating the number of clusters and finding the optimal cluster size is a difficult task, as there is no clear definition of a ‘cluster’. Simulation studies have demonstrated that the Gap estimate is good for identifying well-separated clusters [18]. However, when data are not clearly separated into groups, suboptimal clusters can be identified. In this case, a more flexible procedure is needed for the determination of the best cluster. One alternative is to select a cluster with a larger size than the optimal cluster and a Gap statistic within a small percentage of the maximal Gap statistic. We have not investigated whether our brain activity data suffer from suboptimal voxel clusters and how alternative procedures affect the performance of voxel sieving.
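The alternative selection rule mentioned above can be written down in a few lines. This is a sketch of one possible reading, which we have not evaluated: among all candidate clusters whose Gap lies within a fraction `tolerance` of the maximal Gap, take the largest one. The function name, the default tolerance, and the numbers are hypothetical.

```python
def relaxed_gap_choice(sizes, gaps, tolerance=0.05):
    """Index of the largest cluster whose Gap is within `tolerance`
    (a fraction of the maximal Gap) of the maximal Gap."""
    if len(sizes) != len(gaps) or not gaps:
        raise ValueError("inputs must be non-empty and of equal length")
    top = max(gaps)
    threshold = top - tolerance * abs(top)
    eligible = [i for i, g in enumerate(gaps) if g >= threshold]
    return max(eligible, key=lambda i: sizes[i])


# Hypothetical cluster sizes and Gap values along one sieving sequence.
sizes = [100, 50, 20, 5, 2]
gaps = [0.02, 0.05, 0.15, 0.34, 0.35]

strict = relaxed_gap_choice(sizes, gaps, tolerance=0.0)
relaxed = relaxed_gap_choice(sizes, gaps, tolerance=0.05)
print(sizes[strict], sizes[relaxed])
```

With zero tolerance the rule reduces to the plain maximal-Gap choice; with a 5% tolerance the slightly larger cluster is preferred because its Gap is almost as high.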

## 6 Conclusion

Our statistical signal analysis method identifies hemodynamic synchrony that distinguishes well between experimental conditions or cognitive states. Two important properties of this method are that it allows one to conveniently specify (1) external sources of variation associated with brain activity and (2) the degree of supervision during the data analysis process. In the absence of prior or external information about brain scans, the method operates in a data-driven manner. When meta-information about brain activity is present, the method uses this for fully or partially supervised data analysis. This flexibility, together with the ability to identify multiple, potentially overlapping, brain areas independently of each other and in a multivariate way, makes the method appropriate for finding very specific brain responses, even to complex stimuli. We have shown this in the context of a free movie viewing fMRI study, where flexible probing of functional characteristics exposed spatially localized synchronous brain activity at anticipated and less expected brain regions. The significance of these findings is supported by the performance of our method on an international test benchmark for fMRI-based movie stimuli prediction. Hence, we conclude that the ability of our method to capture distributed and overlapping hemodynamic responses in a flexible and effective way suitably complements existing statistical signal processing methods in neuroscience.

## Notes

### Acknowledgments

We acknowledge Prof. dr. Victor Lamme, Dr. Steven Scholte and Dr. Hilde Huizenga (Department of Psychology, University of Amsterdam) for the valuable discussions. Support for this work has been generously provided by the national projects MultimediaN and VL-E, both funded by the Dutch BSIK Program.

### Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

## References

- 1. Aguirre GK, Zarahn E, D’Esposito M (1998) The variability of human, BOLD hemodynamic responses. NeuroImage 8(4):360–369
- 2. Arfanakis K, Cordes D, Haughton VM, Moritz CH, Quigley MA, Meyerand ME (2000) Combining independent component analysis and correlation analysis to probe interregional connectivity in fMRI task activation datasets. Magn Reson Imaging 18(8):921–930
- 3. Beckmann C, Jenkinson M, Smith SM (2003) General multilevel linear modeling for group analysis in fMRI. NeuroImage 20:1052–1063
- 4. Beckmann C, Smith SM (2004) Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Trans Med Imaging 23:137–152
- 5. Boynton GM, Engel SA, Glover GH, Heeger DJ (1996) Linear systems analysis of functional magnetic resonance imaging in human V1. J Neurosci 16(13):4207–4221
- 6. Calhoun V, Adali T (2006) Unmixing fMRI with independent component analysis. IEEE Eng Med Biol Mag 25(2):79–90
- 7. Chuang K-H, Chiu M-J, Lin C-C, Huang K-M, Chiang P-J, Chen J-H (1999) Model free functional MRI analysis using Kohonen clustering neural network. IEEE Trans Med Imaging 18(12):1117–1128
- 8. Correa N, Li Y-O, Adali T, Calhoun V (2005) Comparison of blind source separation algorithms for fMRI using a new MATLAB toolbox: GIFT. In: Proceedings of IEEE international conference on acoustics, speech, signal processing, vol. 5
- 9. Cox DD, Savoy RL (2003) Functional magnetic resonance imaging (fMRI) “brain reading”: detecting and classifying distributed patterns of fMRI activity in human visual cortex. NeuroImage 19(2 Pt 1):261–270
- 10. Editorial (2006) What’s on your mind. Nat Neurosci 7(8):523–534
- 11. Esposito F, Scarabino T, Hyvarinen A, Himberg J, Formisano E, Comani S, Tedeschi G, Goebel R, Seifritz E, Di Salle F (2005) Independent component analysis of fMRI group studies by self-organizing clustering. NeuroImage 25(1):193–205
- 12. Friston KJ, Holmes A, Worsley K, Poline JB, Frith C, Frackowiak R (1995) Statistical parametric maps in functional imaging: a general linear approach. Hum Brain Mapp 1:189–210
- 13. Friston KJ, Holmes AP, Worsley KJ, Poline JB, Frith C, Frackowiak RSJ (1995) Statistical parametric maps in functional imaging: a general linear approach. Hum Brain Mapp 2:189–210
- 14. Friston KJ, Josephs O, Rees G, Turner R (1998) Nonlinear event-related responses in fMRI. Magn Reson Med 39(1):41–52
- 15. Grill-Spector K, Knouf N, Kanwisher N (2004) The fusiform face area subserves face perception, not generic within-category identification. Nat Neurosci 7(5):555–562
- 16. Handwerker DA, Ollinger JM, D’Esposito M (2004) Variation of BOLD hemodynamic response function across subjects and brain regions and their effects on statistical analysis. NeuroImage 21(4):1639–1651
- 17. Hasson U, Nir Y, Levy I, Fuhrmann G, Malach R (2004) Intersubject synchronization of cortical activity during natural vision. Science 303(5664):1634–1640
- 18. Hastie T, Tibshirani R, Eisen MB, Alizadeh A, Levy R, Staudt L, Chan W, Botstein B, Brown P (2000) Gene shaving as a method for identifying distinct sets of genes with similar expression patterns. Genome Biol 1(2) (Epub)
- 19. Haxby JV, Gobbini MI, Furey ML, Ishai A, Schouten JL, Pietrini P (2001) Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science 293:2425–2430
- 20. Haynes J, Rees G (2005) Predicting the orientation of invisible stimuli from activity in human primary visual cortex. Nat Neurosci 8:686–691
- 21. Haynes J, Rees G (2006) Decoding mental states from brain activity in humans. Nature Neuroscience 7(7):523–534
- 22. Hejnar MP, Kiehl KA, Calhoun VD (2006) Interparticipant correlations: a model free fMRI analysis technique. Hum Brain Mapp
- 23. Kim D, Pearlson GD, Kiehl KA, Bedrick E, Demirci O, Calhoun VD (2008) A method for multi-group inter-participant correlation: abnormal synchrony in patients with schizophrenia during auditory target detection. NeuroImage 39(2):1129–1141
- 24. Liu Y, Gao JH, Liu HL, Fox PT (2000) The temporal response of the brain after eating revealed by functional MRI. Nature 405(6790):1058–1062
- 25. Marzouk YM, Ghoniem AF (2005) K-means clustering for optimal partitioning and dynamic load balancing of parallel hierarchical n-body simulations. J Comput Phys 207(2):493–528
- 26. Mazziotta JC, Toga AW, Evans A, Fox P, Lancaster J (1995) A probabilistic atlas of the human brain: theory and rationale for its development. NeuroImage 2:89–101
- 27. McKeown MJ, Makeig S, Brown CG, Jung TP, Kindermann SS, Bell AJ, Sejnowski TJ (1998) Analysis of fMRI data by blind separation into independent spatial components. Hum Brain Mapp 6(3):160–188
- 28. McKeown MJ, Li J, Huang X, Lewis MM, Rhee S, Young Truong KN, Wang ZJ (2007) Local linear discriminant analysis (LLDA) for group and region of interest (ROI)-based fMRI analysis. NeuroImage 37(3):855–65
- 29. Mourão-Miranda J, Bokde ALW, Born C, Hampel H, Stetter M (2005) Classifying brain states and determining the discriminating activation patterns: support vector machine on functional MRI data. NeuroImage
- 30. Ngan S, Hu X (1999) Analysis of functional magnetic resonance imaging data using self-organizing mapping with spatial connectivity. Magn Reson Med 41(5):939–946
- 31. Ramsay J, Silverman B (1997) Functional data analysis. Springer, Berlin
- 32. Schneider W, Bartels A, Formisano E, Haxby J, Goebel R, Mitchell T, Nichols T, Siegle G (2006) Competition: inferring experience based cognition from fMRI. In: Proceedings organization of human brain mapping, Florence, Italy
- 33. Tibshirani R, Walther G, Hastie T (2001) Estimating the number of clusters via the gap statistic. J R Stat Soc B 63(2)
- 34. van Rijsbergen CJ (1979) Information retrieval. Butterworths, Toronto
- 35. Viviani R, Grohn G, Spitzer M (2005) Functional principal component analysis of fMRI data. Hum Brain Mapp 24:109–129
- 36. Wang Z, Childress AR, Wang J, Detre JA (2007) Support vector machine learning-based fMRI data group analysis. NeuroImage
- 37. Worsley KJ, Poline JB, Friston KJ, Evans AC (1997) Characterizing the response of PET and fMRI data using multivariate linear models. NeuroImage 6(4):305–319