1 Introduction

In general, resting-state functional connectivity (FC) is a set of pairwise connectivity measurements, each of which describes the strength of co-activity between two regions in the human brain. In many group comparison studies, FC obtained from resting-state fMRI shows observable abnormal patterns in patient cohorts, which helps to understand different disease mechanisms. In clinical practice, FC is regarded as an important biomarker for disease diagnosis and monitoring in various clinical applications such as Alzheimer’s disease [1] and autism [2].

In current functional brain network studies, Pearson’s correlation on BOLD (Blood Oxygen Level Dependent) signals is widely used to measure the strength of FC between two brain regions [2, 3]. It is worth noting that such a correlation-based connectivity measure is calculated exclusively from the observed BOLD signals and is then kept fixed in the subsequent data analysis. However, the BOLD signal usually has a very poor signal-to-noise ratio and is mixed with substantial non-neural noise and artefacts. Therefore, it is hard for current state-of-the-art methods to determine a good threshold on the FC measure that can effectively distinguish real from spurious connections.

For simplicity, many FC characterization methods assume that connectivity patterns in the brain do not change over the course of a resting-state fMRI scan. There is a growing consensus in the neuroimaging field, however, that the spontaneous fluctuations and correlations of signals between two distinct brain regions change in correspondence with cognitive states, even in a task-free environment [4]. Thus, dynamic FC patterns have recently been investigated, mainly using the sliding window technique [4, 11–13]. However, it is very difficult to synchronize the estimated dynamic patterns with the real fluctuations of cognitive state, even using advanced machine learning techniques such as clustering [5] and hidden Markov models [6]. For example, both methods have to fix the number of states (clusters) in advance; a value that works well on the training data may not generalize to unseen testing subjects.

To address the above issues, we propose a novel data-driven solution to reveal consistent spatial-temporal FC patterns from resting-state fMRI images. Our contribution is twofold. First, we present a robust learning-based method to optimize FC from the BOLD signals in a fixed sliding window. To avoid the unreliable calculation of FC based on signal correlations alone, a high-level feature representation is needed to guide the optimization of FC. Specifically, we apply singular value decomposition (SVD) to the tentatively estimated FC matrix and regard the top-ranked eigenvectors as the high-level network features which characterize the principal connection patterns across all brain regions. Thus, we can optimize functional connections for each brain region based on not only the observed region-to-region signal correlations but also the similarity between high-level principal connection patterns. In turn, the refined FC can lead to a more reasonable estimation of the principal connection patterns. Since the brain network is intrinsically economic and sparse, a sparsity constraint is used to control the number of connections during the joint estimation of principal connection patterns and the optimization of FC. Second, we further extend the above FC optimization framework from one sliding window (capturing the static FC patterns) to a set of overlapping sliding windows (capturing the dynamic FC patterns), as shown in the middle of Fig. 1. The key idea is that we arrange the FCs along time into a tensor structure (pink cube in Fig. 1) and employ an additional low-rank constraint to penalize oscillatory changes of FC in the temporal domain.

Fig. 1. The advantage of our learning-based spatial-temporal FC optimization method (bottom) over the conventional method (top), which calculates the FC based on signal correlations. As shown by the trajectory of FC at the Amygdala on the right, the dynamic FC optimized by our learning-based method is more reasonable than that of the conventional correlation-based method.

In this paper, we apply our learning-based method to find the spatial-temporal functional connectivity patterns for identifying childhood autism spectrum disorders (ASD). Compared with conventional approaches which simply calculate FC based on signal correlations, more accurate classification results have been achieved in classifying normal control (NC) and ASD subjects by using our learned spatial-temporal FC patterns.

2 Method

2.1 Construct Robust Functional Connectivity

Let \( {\mathbf{x}}_{i} \in {\Re }^{W \times 1} \) denote the mean BOLD signal calculated in brain region \( O_{i} \) (\( i = 1, \ldots ,N \)), where W is the length of the time course within the sliding window. Conventionally, an \( N \times N \) connectivity matrix \( {\mathbf{S}} \) is used to measure the FCs in the whole brain, where each element \( s_{ij} \) quantitatively measures the strength of FC between regions \( O_{i} \) and \( O_{j} \) (\( i \ne j \)). For convenience, we use \( {\mathbf{s}}_{i} \in {\Re }^{N \times 1} \) to denote the i-th column of the connectivity matrix \( {\mathbf{S}} \), which characterizes the connections of region \( O_{i} \) w.r.t. the other brain regions. Since the signal-to-noise ratio of the observed \( {\mathbf{x}}_{i} \) is low, a high-level feature is needed to guide the estimation of the connectivity matrix \( {\mathbf{S}} \). To achieve this, we apply singular value decomposition to \( {\mathbf{S}} \) and regard the matrix of top-ranked eigenvectors \( \varvec{F}_{{K \times \varvec{N}}} = \left[ {{\mathbf{f}}_{i} } \right]_{i = 1, \ldots ,N} \) as the high-level network features, where each \( {\mathbf{f}}_{i} \in {\Re }^{K \times 1} \) denotes the principal connection pattern on region \( O_{i} \). Thus, instead of calculating the connectivity \( s_{ij} \) based only on the correlation \( c\left( {{\mathbf{x}}_{i} ,{\mathbf{x}}_{j} } \right) \) between the observed BOLD signals \( {\mathbf{x}}_{i} \) and \( {\mathbf{x}}_{j} \), we require that the optimal connectivity \( s_{ij} \) should (1) be in consensus with the correlation of low-level signals between \( {\mathbf{x}}_{i} \) and \( {\mathbf{x}}_{j} \); and (2) be in line with the similarity of high-level principal connection patterns between \( {\mathbf{f}}_{i} \) and \( {\mathbf{f}}_{j} \). To that end, the objective function is defined as:

$$ \begin{aligned} & { \arg }\;{ \hbox{min} }_{{s_{i,j} ,{\mathbf{f}}_{i} }} \sum\nolimits_{i = 1}^{N} {\left[ {\sum\nolimits_{j = 1}^{N} {\left( {\left\| {1 - c\left( {{\mathbf{x}}_{i} ,{\mathbf{x}}_{j} } \right)} \right\|_{2}^{2} s_{ij} + \left\| {{\mathbf{f}}_{i} - {\mathbf{f}}_{j} } \right\|_{2}^{2} s_{ij} } \right) + r_{1} \left\| {{\mathbf{s}}_{i} } \right\|_{1} + r_{2} \left\| {{\mathbf{s}}_{i} } \right\|_{2}^{2} } } \right]} \\ & \quad \quad \quad \quad \quad \quad \quad \quad \quad \quad s.t. \;\forall i,\; {\mathbf{s}}_{i} > 0, \\ \end{aligned} $$
(1)

where \( r_{1} \) is the scalar controlling the strength of connection sparsity for each connection pattern \( {\mathbf{s}}_{i} \). For robustness, the \( L_{2} \) norm is also applied to \( {\mathbf{s}}_{i} \), weighted by the scalar \( r_{2} \). Since the estimates of \( s_{ij} \) and \( {\mathbf{f}}_{i} \) are coupled, we propose the following procedure to solve for \( s_{ij} \) and \( {\mathbf{f}}_{i} \) alternately:

  1. Initialize the connectivity matrix by letting \( s_{ij} = c({\mathbf{x}}_{i} ,{\mathbf{x}}_{j} ) \).

  2. Given \( {\mathbf{S}} \), obtain the principal connection pattern \( {\mathbf{f}}_{i} \) for each region \( O_{i} \) by applying eigenvalue decomposition to \( {\mathbf{S}} \), which is possible since \( {\mathbf{S}} \) is symmetric, and then selecting the top K eigenvectors.

  3. Fixing \( {\mathbf{f}}_{i} \), divide the estimation of \( s_{ij} \) in Eq. (1) into two sub-tasks: (a) Estimate \( s_{ij} \) without the sparsity constraint. Since the objective function without the \( L_{1} \) norm can be reformulated into a quadratic form, we can use the Karush-Kuhn-Tucker (KKT) conditions [7] to optimize \( s_{ij} \). (b) Make the connection pattern \( {\mathbf{s}}_{i} \) sparse. The objective function requires the optimized connection pattern \( {\mathbf{s}}_{i} \) to be not only sparse but also close to the solution in step 3(a). The standard Alternating Direction Method of Multipliers (ADMM) [7, 8, 14] can be used to solve this sub-task.

  4. Go to step 2 until convergence.
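Steps 1 and 2 above, together with the evaluation of the objective in Eq. (1), can be sketched as follows. This is only an illustrative sketch in Python with NumPy, not the implementation used in our experiments: the function names are hypothetical, "top K" is assumed to mean the K largest eigenvalues, and the constrained update of step 3 (KKT and ADMM) is deliberately not reproduced.

```python
import numpy as np

def correlation_fc(X):
    """Step 1: X is (N, W), one mean BOLD signal per row; returns the N x N Pearson matrix."""
    return np.corrcoef(X)

def principal_patterns(S, K):
    """Step 2: returns F with shape (K, N); column i is the pattern f_i of region i."""
    S = 0.5 * (S + S.T)                    # guard against tiny numerical asymmetries
    eigvals, eigvecs = np.linalg.eigh(S)   # eigenvalues come back in ascending order
    return eigvecs[:, -K:][:, ::-1].T      # assumption: "top K" = K largest eigenvalues

def pattern_distance(F):
    """Squared distances ||f_i - f_j||^2 between the columns f_i of F (K x N)."""
    diff = F.T[:, None, :] - F.T[None, :, :]
    return (diff ** 2).sum(axis=-1)

def objective(S, C, F, r1, r2):
    """Value of Eq. (1) for a candidate S, given correlations C and patterns F."""
    cost = (1.0 - C) ** 2 + pattern_distance(F)
    return float((cost * S).sum() + r1 * np.abs(S).sum() + r2 * (S ** 2).sum())

# Toy data (hypothetical sizes): N = 6 regions, W = 40 time points.
rng = np.random.default_rng(0)
X = rng.standard_normal((6, 40))
C = correlation_fc(X)                # step 1: initial S equals C
F = principal_patterns(C, K=3)       # step 2
D = pattern_distance(F)
value = objective(C, C, F, r1=0.1, r2=0.1)
```

In a full implementation, `objective` would be monitored across iterations to decide when to stop looping between steps 2 and 3.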

A typical optimized connectivity matrix \( {\hat{\mathbf{S}}} \) is shown in the pink cube in Fig. 1. Compared to the connectivity matrix obtained by the conventional method based on signal correlation, our learned connectivity matrix is much sparser, and it becomes much easier to construct the brain network since many spurious connections have been removed by the sparsity constraint during optimization.

2.2 Characterize Dynamic Functional Connectivity

Next, we extend our learning-based FC estimation method to address the problem of dynamic connectivity in fMRI data. Here, we follow the sliding window technique to obtain T overlapping sliding windows which cover the whole time course. Since we can optimize the connectivity matrix \( {\mathbf{S}}_{t} \) for each sliding window, we employ a tensor \( {\mathbb{S}} = \{ {\mathbf{S}}_{t} |t = 1, \ldots ,T\} \in {\Re }^{N \times N \times T} \) to describe the dynamic connectivity. Similarly, we construct the tensor \( {\mathbb{C}} = \{ {\mathbf{C}}_{t} |t = 1, \ldots ,T\} \in {\Re }^{N \times N \times T} \) regarding the dynamic correlation of BOLD signals and the tensor \( {\mathbb{F}} = \{ {\mathbf{F}}_{t} |t = 1, \ldots ,T\} \in {\Re }^{N \times N \times T} \) regarding the dynamic similarity of principal connection patterns, where \( {\mathbf{C}}_{t} = \left[ {c_{ij}^{t} } \right]_{i,j = 1, \ldots ,N} (c_{ij}^{t} = 1 - c\left( {{\mathbf{x}}_{i}^{t} ,{\mathbf{x}}_{j}^{t} } \right)) \) and \( {\mathbf{F}}_{t} = \left[ {f_{ij}^{t} } \right]_{i,j = 1, \ldots ,N} (f_{ij}^{t} = \left\| {{\mathbf{f}}_{i}^{t} - {\mathbf{f}}_{j}^{t} } \right\|_{2}^{2} ) \) are the \( N \times N \) matrices in the t-th sliding window. Then, we extend the objective function in Eq. (1) to the spatial-temporal domain using tensor analysis:

$$ \begin{aligned} { \arg }\,{ \hbox{min} }_{{\mathbb{S}}} {\mathbb{C}} \otimes {\mathbb{S}} + {\mathbb{F}} \otimes {\mathbb{S}} & + \alpha \left\| {{\mathbb{S}}_{\left( 1 \right)} } \right\|_{*} + r_{1} \left\| {{\mathbb{S}}_{\left( 2 \right)} } \right\|_{1} + r_{2} \left\| {{\mathbb{S}}_{\left( 2 \right)} } \right\|_{F}^{2} \\ & s.t.\; \forall i,t,\; {\mathbf{s}}_{i}^{t} > 0, \\ \end{aligned} $$
(2)

where \( {\mathbb{C}} \otimes {\mathbb{S}} = \sum\nolimits_{t = 1}^{T} {\left( {{\mathbf{C}}_{t} } \right)^{T} {\mathbf{S}}_{t} } \) and \( {\mathbb{F}} \otimes {\mathbb{S}} = \sum\nolimits_{t = 1}^{T} {\left( {{\mathbf{F}}_{t} } \right)^{T} {\mathbf{S}}_{t} } \). We use \( {\mathbb{S}}_{(k)} \) to denote the unfolding of a general tensor \( {\mathbb{S}} \) along the k-th mode. In our method, we have \( {\mathbb{S}}_{\left( 1 \right)} \in {\Re }^{{N^{2} \times T}} \) and \( {\mathbb{S}}_{\left( 2 \right)} \in {\Re }^{NT \times N} \). Since the brain in resting state generally traverses a small number of discrete states during a short period of time [4], we require the change of the connectivity matrix \( {\mathbf{S}}_{t} \) to be smooth along time. Thus, it is reasonable to apply a low-rank constraint on \( {\mathbb{S}}_{\left( 1 \right)} \), such that minimizing \( \left\| {{\mathbb{S}}_{\left( 1 \right)} } \right\|_{*} \) (the nuclear norm of \( {\mathbb{S}}_{\left( 1 \right)} \)) suppresses overly rapid FC changes in the temporal domain. The \( L_{1} \)-norm is applied to \( {\mathbb{S}}_{\left( 2 \right)} \) since the brain network within each sliding window is sparse.
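To make the tensor notation concrete, the following sketch builds a windowed correlation tensor and the two unfoldings with the shapes stated above, and evaluates the three regularization terms of Eq. (2). It is a minimal illustration in Python with NumPy under stated assumptions: the function names are hypothetical, the \( L_{1} \) norm is taken element-wise, and the row ordering inside each unfolding is one possible convention (the text only fixes the shapes).

```python
import numpy as np

def sliding_window_correlations(X, W, shift=1):
    """X is (N, L). Returns an (N, N, T) tensor of windowed Pearson correlations."""
    N, L = X.shape
    starts = range(0, L - W + 1, shift)
    return np.stack([np.corrcoef(X[:, s:s + W]) for s in starts], axis=2)

def unfold_time(S):
    """S_(1) in the text: shape (N^2, T), one vectorized S_t per column."""
    N, _, T = S.shape
    return S.reshape(N * N, T)

def unfold_stack(S):
    """S_(2) in the text: shape (N*T, N), the matrices S_t stacked vertically."""
    N, _, T = S.shape
    return np.moveaxis(S, 2, 0).reshape(T * N, N)

def penalties(S, alpha, r1, r2):
    """The nuclear-norm, L1 and squared-Frobenius terms of Eq. (2)."""
    low_rank = alpha * np.linalg.norm(unfold_time(S), ord="nuc")
    sparse = r1 * np.abs(unfold_stack(S)).sum()      # assumption: element-wise L1
    ridge = r2 * np.linalg.norm(unfold_stack(S), ord="fro") ** 2
    return low_rank, sparse, ridge

# Toy data (hypothetical sizes): N = 4 regions, L = 30 time points, window W = 10.
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 30))
tensor = sliding_window_correlations(X, W=10)   # T = 30 - 10 + 1 = 21 windows
S1, S2 = unfold_time(tensor), unfold_stack(tensor)

# A tensor that does not change over time has a rank-one S_(1).
static = np.repeat(tensor[:, :, :1], 21, axis=2)
```

For the time-constant tensor `static`, the nuclear norm of \( {\mathbb{S}}_{\left( 1 \right)} \) equals its Frobenius norm, the smallest value possible for that amount of energy, which is why the nuclear-norm term favors connectivity that varies little from window to window.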

Optimization.

In order to make the optimization of Eq. (2) tractable, we introduce two dummy variables \( {\mathbf{Z}}_{1} \) and \( {\mathbf{Z}}_{2} \) so that we can solve this problem using ADMM [7, 8]:

$$ \begin{aligned} & { \arg }\,{ \hbox{min} }_{{{\mathbb{S}},{\mathbf{Z}}_{1} ,{\mathbf{Z}}_{2} }} \sum\nolimits_{t = 1}^{T} {\left[ {\left( {{\mathbf{C}}_{t} } \right)^{T} {\mathbf{S}}_{t} + \left( {{\mathbf{F}}_{t} } \right)^{T} {\mathbf{S}}_{t} + r_{2} \left\| {{\mathbf{S}}_{t} } \right\|_{F}^{2} } \right] + \alpha \left\| {{\mathbf{Z}}_{1} } \right\|_{*} + r_{1} \left\| {{\mathbf{Z}}_{2} } \right\|_{1} } \\ & \quad \quad \quad \quad \quad \quad \quad s.t. \;\forall i,t, \;{\mathbf{s}}_{i}^{t} > 0,\; {\mathbb{S}}_{\left( 1 \right)} = {\mathbf{Z}}_{1} ,\; {\mathbb{S}}_{\left( 2 \right)} = {\mathbf{Z}}_{2} . \\ \end{aligned} $$
(3)

Using Lagrangian multipliers, we can remove the equality constraints in Eq. (3) and reformulate Eq. (3) into:

$$ \begin{aligned} & { \arg }\,{ \hbox{min} }_{{{\mathbb{S}},{\mathbf{Z}}_{1} ,{\mathbf{Z}}_{2} }} \sum\nolimits_{t = 1}^{T} {\left[ {\left( {{\mathbf{C}}_{t} } \right)^{T} {\mathbf{S}}_{t} + \left( {{\mathbf{F}}_{t} } \right)^{T} {\mathbf{S}}_{t} + r_{2} \left\| {{\mathbf{S}}_{t} } \right\|_{F}^{2} } \right] + \alpha \left\| {{\mathbf{Z}}_{1} } \right\|_{*} + r_{1} \left\| {{\mathbf{Z}}_{2} } \right\|_{1} } \\ & + \frac{{\mu_{1} }}{2}\left\| {{\mathbb{S}}_{\left( 1 \right)} - {\mathbf{Z}}_{1} } \right\| _{F}^{2} + {\varvec{\Lambda}}_{1}^{T} \left( {{\mathbb{S}}_{\left( 1 \right)} - {\mathbf{Z}}_{1} } \right) + \frac{{\mu_{2} }}{2}\left\| {{\mathbb{S}}_{\left( 2 \right)} - {\mathbf{Z}}_{2} } \right\| _{F}^{2} + {\varvec{\Lambda}}_{2}^{T} \left( {{\mathbb{S}}_{\left( 2 \right)} - {\mathbf{Z}}_{2} } \right), \\ \end{aligned} $$
(4)

where \( {\varvec{\Lambda}}_{1} \in {\Re }^{{N^{2} \times T}} \) and \( {\varvec{\Lambda}}_{2} \in {\Re }^{NT \times N} \) are the Lagrangian multiplier matrices, with the same sizes as \( {\mathbb{S}}_{\left( 1 \right)} \) and \( {\mathbb{S}}_{\left( 2 \right)} \), and \( \mu_{1} \) and \( \mu_{2} \) are the penalty parameters. We solve Eq. (4) by alternately optimizing \( {\mathbb{S}} \), \( {\mathbf{Z}}_{1} \) and \( {\mathbf{Z}}_{2} \) until convergence. The dynamic connectivity matrices \( {\mathbf{S}}_{t} \) can be optimized by following the Karush-Kuhn-Tucker (KKT) method in [9]. The standard soft-thresholding (shrinkage) method [7] can be used to solve for \( {\mathbf{Z}}_{1} \) and \( {\mathbf{Z}}_{2} \).
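The shrinkage updates for \( {\mathbf{Z}}_{1} \) and \( {\mathbf{Z}}_{2} \) can be sketched as below. With \( {\mathbb{S}} \) fixed, completing the square in Eq. (4) shows that \( {\mathbf{Z}}_{2} \) is the element-wise soft-thresholding of \( {\mathbb{S}}_{\left( 2 \right)} + {\varvec{\Lambda}}_{2} /\mu_{2} \) with threshold \( r_{1} /\mu_{2} \), and \( {\mathbf{Z}}_{1} \) is obtained by soft-thresholding the singular values of \( {\mathbb{S}}_{\left( 1 \right)} + {\varvec{\Lambda}}_{1} /\mu_{1} \) with threshold \( \alpha /\mu_{1} \). The Python/NumPy code is a minimal sketch of these standard operators with hypothetical function names; the update of \( {\mathbb{S}} \) itself is not shown.

```python
import numpy as np

def soft_threshold(X, tau):
    """Element-wise shrinkage: proximal operator of tau * ||.||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def singular_value_threshold(X, tau):
    """Shrink the singular values of X: proximal operator of tau * ||.||_*."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U * soft_threshold(s, tau)) @ Vt

def update_Z1(S1, Lambda1, alpha, mu1):
    """Minimizer of Eq. (4) over Z_1 with all other variables fixed."""
    return singular_value_threshold(S1 + Lambda1 / mu1, alpha / mu1)

def update_Z2(S2, Lambda2, r1, mu2):
    """Minimizer of Eq. (4) over Z_2 with all other variables fixed."""
    return soft_threshold(S2 + Lambda2 / mu2, r1 / mu2)

# Small worked example with hand-checkable values.
shrunk = soft_threshold(np.array([3.0, -1.0, 0.5]), 1.0)
low_rank = update_Z1(np.diag([3.0, 1.0]), np.zeros((2, 2)), alpha=2.0, mu1=1.0)
```

In the worked example, the entries 3, -1 and 0.5 shrink to 2, 0 and 0, and the diagonal matrix with singular values 3 and 1 becomes a rank-one matrix with singular value 1, which illustrates how the two terms promote sparsity and low rank, respectively.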

2.3 Identifying ASD Subject with the Learned Static/Dynamic FC Patterns

The conventional method first calculates the connectivity matrix \( {\mathbf{S}} \) within a certain window and then extracts N-dimensional node-wise features which describe the connectivity efficiency at each ROI (Region of Interest) [10]. After that, a classic SVM (linear kernel and L2 penalty, https://www.csie.ntu.edu.tw/~cjlin/libsvm/) is trained to identify individual ASD subjects. We follow the same approach, except that we extract the node-wise features from our learned static/dynamic connectivity matrices.

3 Experiment

Image Preprocessing.

We conducted experiments on resting-state fMRI images from both the NYU and UM sites in the Autism Brain Imaging Data Exchange (ABIDE) database, in order to demonstrate the generality of our method. Specifically, 45 NC and 45 ASD subjects were selected from the NYU site, and 74 NC and 57 ASD subjects were selected from the UM site. The subjects from the NYU site and the UM site were scanned for six and ten minutes during resting state, respectively, producing 180 and 300 time points at a repetition time (TR) of 2 s. We processed all these data using the Data Processing Assistant for Resting-State fMRI (DPARSF) software. Specifically, we removed the first 20 and the last 20 time points for robustness. After that, we applied slice timing and motion correction to the fMRI images. Then, we registered individual subjects to the standard space, applied the AAL template with 116 ROIs to the subject image domain, and computed the mean BOLD signal in each ROI. From these signals, the conventional method calculates the \( 116 \times 116 \) connectivity matrix \( {\mathbf{S}} \) based on the correlation of mean BOLD signals between any pair of distinct brain regions.
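The last step, averaging the BOLD signal inside each ROI and correlating the resulting signals, can be illustrated as follows. This Python/NumPy sketch is not part of the DPARSF pipeline; it assumes the preprocessed data are already available as a 4D array and that the atlas is an integer label volume on the same grid (0 for background, 1 to 116 for the AAL ROIs). The function name and the toy sizes are hypothetical.

```python
import numpy as np

def roi_mean_signals(bold, labels, n_rois):
    """bold: (X, Y, Z, L) volume series; labels: (X, Y, Z) integer atlas.

    Returns an (n_rois, L) array with the mean signal of each ROI.
    An ROI without voxels keeps an all-zero row (its correlations are undefined).
    """
    L = bold.shape[-1]
    voxels = bold.reshape(-1, L)      # one row per voxel
    flat = labels.ravel()
    signals = np.zeros((n_rois, L))
    for roi in range(1, n_rois + 1):
        mask = flat == roi
        if mask.any():
            signals[roi - 1] = voxels[mask].mean(axis=0)
    return signals

# Toy example (hypothetical sizes): a 2 x 2 x 2 volume, 50 time points, 3 ROIs.
rng = np.random.default_rng(0)
bold = rng.standard_normal((2, 2, 2, 50))
labels = np.array([[[1, 1], [2, 2]],
                   [[3, 3], [0, 0]]])
signals = roi_mean_signals(bold, labels, n_rois=3)
fc = np.corrcoef(signals)            # conventional correlation-based FC matrix
```

With the AAL template, `n_rois` would be 116 and `fc` would be the \( 116 \times 116 \) matrix described above.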

Experiment Setup.

A ten-fold cross-validation strategy is used in our experiments. We randomly partition the subjects in the NC and ASD groups into 10 non-overlapping sets of approximately equal size. For each subject, we apply our learning-based method to optimize the static/dynamic functional connectivity matrices and extract the node-wise network features. Then, we use one fold for testing and the remaining folds for training the SVM. The training subjects are further divided into 5 subsets for an inner 5-fold cross-validation to learn the optimal parameters. The optimal range for the number of principal components is [6, 8], and it takes about 10 s on average to process one subject.
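The nested partitioning described above can be sketched in plain Python as follows. The sketch only produces subject indices for the outer 10 folds and the inner 5 folds; feature extraction and SVM training are omitted. Splitting each group (NC, ASD) separately so that every fold keeps a similar class balance is an assumption of this sketch, and the function names and seeds are hypothetical.

```python
import random

def stratified_folds(labels, n_folds, seed=0):
    """Split subject indices into n_folds disjoint folds, shuffling each class separately."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    offset = 0
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal subjects round-robin; the offset keeps overall fold sizes balanced.
        for k, i in enumerate(idx):
            folds[(k + offset) % n_folds].append(i)
        offset += len(idx)
    return folds

def nested_splits(labels, outer=10, inner=5, seed=0):
    """Yield (train, test, inner_folds) for each outer fold."""
    for test in stratified_folds(labels, outer, seed):
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        train_labels = [labels[i] for i in train]
        inner_folds = [[train[j] for j in fold]
                       for fold in stratified_folds(train_labels, inner, seed + 1)]
        yield train, test, inner_folds

# NYU-sized example: 45 NC (label 0) and 45 ASD (label 1) subjects.
labels = [0] * 45 + [1] * 45
folds = stratified_folds(labels, 10)
splits = list(nested_splits(labels))
```

Each inner fold is drawn only from the training subjects of its outer fold, so parameter selection never sees the held-out test subjects.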

Evaluation of Learned Static/Dynamic FC Patterns in NC/ASD Classification.

We first manually set the sliding window size, which ranges from 20% to 100% of the entire time course. In optimizing the dynamic FC pattern, we set the shift of the sliding window to 1 TR in order to fully capture the dynamics of FC. The NC/ASD classification results on the UM and NYU datasets are shown in Tables 1 and 2, respectively. In these two tables, ACC represents accuracy and AUC represents the area under the ROC curve. The SVM using the learned dynamic FC patterns outperforms both the conventional correlation-based FC patterns and the learned static FC patterns: it achieves an increase of almost 8% over the correlation-based FC patterns and almost 3% over the learned static FC patterns.
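As a quick check of the resulting problem sizes, the number of sliding windows T per subject follows from the scan length, the TR, the 20 discarded time points at each end, and the window size. The sketch below is plain Python with hypothetical function names; it assumes that the window fraction is taken relative to the trimmed time course and rounded to the nearest time point, which the text does not specify.

```python
def n_time_points(scan_seconds, tr_seconds, trim_start=20, trim_end=20):
    """Time points left after discarding the first and last volumes."""
    return int(scan_seconds / tr_seconds) - trim_start - trim_end

def n_windows(n_points, window_fraction, shift=1):
    """Window width in time points and the number of windows T for a given shift."""
    width = max(1, round(window_fraction * n_points))
    return width, (n_points - width) // shift + 1

nyu = n_time_points(6 * 60, 2)    # 180 acquired time points, 140 kept
um = n_time_points(10 * 60, 2)    # 300 acquired time points, 260 kept

nyu_small = n_windows(nyu, 0.2)   # 28-point window, 113 windows
um_small = n_windows(um, 0.2)     # 52-point window, 209 windows
um_full = n_windows(um, 1.0)      # a single window: the static case
```

Under these assumptions, a 100% window reduces the dynamic model to the static one (T = 1), while a 20% window with a 1-TR shift gives more than one hundred heavily overlapping windows per subject.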

Table 1. Accuracy of identifying ASD subjects on UM dataset w.r.t. sliding window size.
Table 2. Accuracy of identifying ASD subjects on NYU dataset w.r.t. sliding window size.

Visualization of Dynamic Functional Connectivity Patterns.

The learned dynamic functional connectivity matrices \( \left\{ {{\mathbf{S}}_{t} } \right\} \) for one ASD subject (in blue box) and one NC subject (in red box) are displayed in the top of Fig. 2. For comparison, we also show the corresponding connectivity matrices independently calculated based on the correlation of BOLD signals in the bottom of Fig. 2. It is apparent that our learned dynamic connectivity matrices are much sparser than their counterparts obtained by the conventional method. Thus, it becomes much easier to construct a functional brain network using a threshold-based approach. In order to evaluate the dynamic functional transitions, we visualize the ROIs connected to the Amygdala along time by examining the estimated connectivity matrices, since the Amygdala is a critical sub-cortical structure related to autism. As shown by the red masks in Fig. 2, the transitions of ROIs connected to the Amygdala revealed by our learned dynamic FC patterns are much more consistent than those revealed by conventional correlation-based FC patterns, where the transitions of connected ROIs are unrealistically fast and random. Specifically, we take the connection vector \( {\mathbf{s}}_{i} \) (i denotes the index of the Amygdala here) in each sliding window, sequentially arrange these vectors into a matrix, and visualize the transition of \( {\mathbf{s}}_{i} \) along time in the right of Fig. 1. From the trajectory of each element in \( {\mathbf{s}}_{i} \), it is clear that the dynamic FC optimized by our learning-based method is more reasonable than that of the conventional method.

Fig. 2. Visualization of dynamic functional connectivity matrices by our learning-based method (top) and the conventional correlation-based method (bottom) for an ASD subject (a) and an NC subject (b).

Validation of FC Pattern.

We also checked the learned FC patterns and found that the connection patterns for vision function regions, such as the Lingual gyrus, Cuneus, and Parahippocampal gyrus, and for motion function regions, such as the Putamen and Globus Pallidus, are consistently stable across all subjects. We also found that the learned FC pattern is similar to the correlation-based FC pattern; however, its change along time is much smoother, which is consistent with current neuroscience findings.

4 Conclusion

In this work, we propose a novel learning-based method to discover both static and dynamic connectivity patterns from resting-state fMRI data. For static FC estimation, our method optimizes the functional connectivity based on not only the correlation of low-level BOLD signals but also the similarity of high-level principal components from the link-to-link connectivity patterns. To address the problem of dynamic functional connectivity, we arrange connectivity matrices along time into a tensor structure and apply a sparsity constraint to suppress spurious functional connections and a low-rank constraint to avoid unrealistically fast state transitions along time. We use our method to obtain dynamic connectivity patterns and apply them to identify ASD subjects at the individual level, where the classification method using our learned dynamic connectivity patterns improves the ASD identification accuracy by almost 8% over the conventional correlation-based framework.