1 Introduction

Accurate diagnosis of Alzheimer’s disease (AD) and its prodromal stage, mild cognitive impairment (MCI), is a challenging task that is very important for early treatment and a possible delay of disease progression. A large number of pattern analysis methods have been proposed for identifying disease-related imaging markers from advanced medical imaging techniques, e.g., functional magnetic resonance imaging (fMRI). Compared with other imaging techniques, fMRI provides a non-invasive way to quantify the functional interactions of the cerebrum, offering insight into the basic mechanisms and cognitive processes of the human brain [1]. These interaction patterns among brain regions are usually characterized as connectivity networks (CNs) and used for brain disease analysis and diagnosis with graph/network-based methods, helping us better understand the pathological underpinnings of neurological disorders. Hence, functional CNs derived from resting-state fMRI (rs-fMRI) have been widely applied to the automated diagnosis of AD/MCI [2].

Studies on functional CNs currently focus on two aspects: (1) traditional CNs and (2) dynamic CNs (DCNs). The former usually assume, implicitly, that functional connectivity is constant (i.e., temporally stationary) throughout the rs-fMRI recording period, thereby neglecting the dynamics of CNs. The latter focus on the temporal changes of functional connectivities between specific brain regions. Numerous studies have indicated that changes of functional connectivity over time may be related to cognitive and vigilance states [3] and are critical for better understanding the pathological underpinnings of brain diseases [4]; in particular, AD has been found to be associated with such changes [5]. However, these studies usually construct CNs by simply calculating the Pearson correlation coefficients (PCCs) between time series of brain regions, and then extract low-level measures (e.g., clustering coefficients) from the constructed CNs as features to train a learning model (e.g., a support vector machine, SVM). This practice has three limitations. First, in network construction, valuable observation information (e.g., the specific contributions of different time points) is neglected; intuitively, different time points should contribute differently to characterizing the interaction between brain regions. Second, high-level (i.e., high-order) network properties that could further improve performance are neglected in the feature learning step. In addition, since network construction, feature learning and classification are performed separately, the resulting learning model can be sub-optimal, thus decreasing classification performance.

Fig. 1.

Architecture of the proposed wck-CNN framework for DCN construction and analysis using fMRI data. There are four convolutional layers, i.e., Con1: connectivity construction layer, Con2: regional feature layer, Con3: brain-network feature layer and Con4: temporal feature layer, and two fully connected layers (i.e., FC1 and FC2) with 64 and 32 units, respectively. The kernel sizes in the four convolutional layers are \(1\times L_1\), \(N\times L_2\), \(N\times L_3\) and \(1\times L_4\) (with corresponding kernel numbers \(M_1\), \(M_2\), \(M_3\) and \(M_4\)), respectively. \(T_1\), \(T_2\) and \(T_3\) denote the total numbers of kernel operations along the temporal dimension in the Con1, Con2 and Con3 layers, respectively. T is the length of the time series of each ROI, and N is the number of ROIs.

To address these problems, and motivated by recent successful applications of convolutional neural networks (CNNs) in natural image analysis, in this paper we first define a weighted correlation kernel (called wc-kernel) for calculating the correlation between brain regions, using learned weights to characterize the contributions of different time points. Compared with the PCC method, the proposed wc-kernel can capture the specific contributions of different time points, thus conveying richer interaction information among brain regions. Furthermore, we propose a wc-kernel based CNN (called wck-CNN) framework for defining/extracting hierarchical (i.e., from low-order to high-order) functional connectivities for disease diagnosis using fMRI data. To the best of our knowledge, our proposed method is among the first attempts to define a correlation kernel in a CNN for characterizing the interactions among brain regions, and to explore a unified CNN framework for DCN construction and analysis using fMRI data. Figure 1 shows the architecture of the proposed wck-CNN framework. Specifically, we first define a layer that builds DCNs using the defined wc-kernels. Multiple DCNs can be constructed using multiple wc-kernels, with each DCN reflecting the changes of CNs over time, thus conveying richer dynamic information about the brain network. Then, we build three further layers to extract local (brain-region specific), global (network specific) and temporal high-order properties from the constructed low-order functional connectivities as features for classification. Results on 174 subjects (a total of 563 scans) with rs-fMRI data from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database demonstrate the efficacy of our method.

2 Method

2.1 Subjects and Image Preprocessing

We use a total of 174 subjects with rs-fMRI data from the ADNI database, including 48 NCs (28 female (F)/20 male (M), aged \(76.0\pm 6.8\) years), 50 early MCI (eMCI) (30 F/20 M, aged \(72.4\pm 7.1\) years), 45 late MCI (lMCI) (18 F/27 M, aged \(72.3\pm 8.1\) years) and 31 AD (15 F/16 M, aged \(73.2\pm 7.3\) years). In total, there are 563 scans covering nine possible acquisition time points (i.e., baseline, 6, 12, 24, 36, 48, 60, 72 and 84 months), with 154, 165, 145 and 99 scans for the NC, eMCI, lMCI and AD groups, respectively. There are 147 subjects with baseline scans, while the other 27 subjects have no baseline scan. The in-plane image resolution is 2.29–3.31 mm, the slice thickness is 3.31 mm, TE (echo time) is 30 ms, and TR (repetition time) is 2.2–3.1 s. Each rs-fMRI scan consists of 140 volumes.

Image pre-processing is performed for all rs-fMRI data using a standard pipeline in the FSL FEAT software package (http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FEAT), including removal of the first 3 volumes, slice timing correction, motion correction, band-pass filtering, and regression of white matter, CSF and motion parameters. Subjects with large head motion (i.e., larger than 2.0 mm or \(2^{\circ }\)) are discarded, since head motion has substantial effects on functional CN measures [6]. Structural skull stripping is performed using FSL, and the skull-stripped structural image is used to register the fMRI data to the Montreal Neurological Institute (MNI) space. The fMRI data are then spatially smoothed using a 6 mm Gaussian kernel. Subjects with more than 2.5 min of large frame-wise displacement (\({>}0.5\)) are excluded from this study. The BOLD signals are band-pass filtered (\(0.015\le f \le 0.15\) Hz). The mean time series are extracted from each of the 116 regions of interest (ROIs) defined by the automated anatomical labeling (AAL) template [7]. The time point signal from each ROI i is normalized using the following scheme:

$$\begin{aligned} g(z)=(z-\mu _i)/\sigma _i \end{aligned}$$
(1)

where z is a time point signal from ROI i, and \(\mu _i\) and \(\sigma _i\) are the mean and standard deviation of the time series of ROI i, respectively.
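As a minimal illustration, Eq. 1 can be implemented in NumPy as follows (a sketch assuming the ROI time series are stored as an \(N\times T\) array; not the authors' code):

import numpy as np

def normalize_roi_series(X):
    """Apply Eq. (1): z-normalize each ROI's time series.

    X: array of shape (N, T), i.e., N ROIs with T time points each.
    """
    mu = X.mean(axis=1, keepdims=True)      # per-ROI mean
    sigma = X.std(axis=1, keepdims=True)    # per-ROI standard deviation
    return (X - mu) / sigma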

2.2 Proposed Weighted Correlation Kernel

To capture the specific contributions of different time points, we define a weighted correlation kernel for calculating the correlation between brain regions, i.e.,

$$\begin{aligned} k({\mathbf{x}_i,\mathbf{x}_j})=\sum _{l=1}^{L_1}{\mathbf{w}^l\mathbf{x}^l_i\mathbf{x}^l_j} \end{aligned}$$
(2)

where \(\mathbf{x}_i\) is the normalized (using Eq. 1) time series of ROI i, \(\mathbf{x}^l_i\) is its \(l^{th}\) time point, \(\mathbf{w}=[\mathbf{w}^1,\mathbf{w}^2,\dots ,\mathbf{w}^{L_1}]\) is a learned weight vector, and the kernel size is \(1\times L_1\).

According to the definition in Eq. 2, the wc-kernel calculates the correlation between the time series of a pair of ROIs using a weight \(\mathbf{w}^l\) to characterize the specific contribution of each time point. It thus conveys richer interaction information between brain regions than the PCC method, which assigns the same contribution to all time points (i.e., all weights in \(\mathbf{w}\) equal to 1). The defined wc-kernel is therefore a weighted extension of the PCC.
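For illustration, Eq. 2 can be sketched in NumPy as follows (not the authors' code); with uniform weights, the kernel value coincides with the Pearson correlation of the already-normalized segments:

import numpy as np

def normalize(z):
    return (z - z.mean()) / z.std()

def wc_kernel(x_i, x_j, w):
    """Weighted correlation (Eq. 2) between two normalized segments of length L1."""
    return np.sum(w * x_i * x_j)

L1 = 70
x_i = normalize(np.random.randn(L1))
x_j = normalize(np.random.randn(L1))

# With uniform weights 1/L1, the wc-kernel recovers the Pearson correlation exactly.
uniform = np.full(L1, 1.0 / L1)
assert np.isclose(wc_kernel(x_i, x_j, uniform), np.corrcoef(x_i, x_j)[0, 1])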

2.3 Architecture of the Proposed Wc-Kernel Based CNN

As shown in Fig. 1, the proposed wck-CNN framework includes four convolutional layers (i.e., Con1, Con2, Con3 and Con4) and two fully connected layers (i.e., FC1 and FC2). Each layer uses a rectified linear unit (ReLU) as the activation function, and each fully connected layer is followed by dropout with a rate of 0.50. The input of the model is the time series of all ROIs, and the output (via soft-max) is the probability of the subject belonging to each of the four categories (i.e., NC, eMCI, lMCI and AD). Here, \(M_1\), \(M_2\), \(M_3\) and \(M_4\) denote the numbers of kernels in the four convolutional layers, respectively. Next, we present the details of the four convolutional layers.

Con1: Connectivity Construction Layer

We define a connectivity construction layer for CN construction using the defined wc-kernels, with the time series of the ROIs as input. The output of this layer for a given wc-kernel is a matrix \(\mathbf{C}\in R^{N^2\times T_1}\) given by

$$\begin{aligned} \mathbf{C}_{(i-1)N+j,t} =k({\mathbf{S}^t_i,\mathbf{S}^t_j}) \end{aligned}$$
(3)

where k is the wc-kernel defined in Eq. 2, \(\mathbf{S}_i\in R^{T}\) denotes the whole time series of the \(i^{th}\) ROI, \(\mathbf{S}^t_i\) denotes the corresponding segment of that time series for the \(t^{th}\) sliding operation of the kernel along the temporal dimension (i.e., along the time series of the ROIs), T is the length of the time series, \(T_1\) is the total number of sliding operations along the temporal dimension, and N is the number of ROIs.

In the connectivity construction layer, the convolution along the spatial dimension (corresponding to each pair of ROIs) computes the functional connectivity between ROIs, reflecting their interactions. Thus, each column of \(\mathbf{C}\) denotes a CN. The convolution along the temporal dimension computes different functional connectivities of the same pair of ROIs within different segments of the time series (similar to the sliding-window method in conventional DCN construction), reflecting the changes of functional connectivity over time. Thus, the matrix \(\mathbf{C}\) denotes a DCN, reflecting the dynamics of CNs. Finally, the output of this layer with \(M_1\) wc-kernels is a 3D tensor containing \(M_1\) DCNs, conveying richer dynamic information about the brain networks.
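The computation of this layer can be sketched in NumPy as follows (an illustrative re-implementation with hypothetical names, not the authors' code):

import numpy as np

def connectivity_construction(X, weights, stride=2):
    """Con1 sketch: build M1 dynamic connectivity networks from ROI time series.

    X       : (N, T) normalized ROI time series.
    weights : (M1, L1) array, one learned weight vector per wc-kernel.
    Returns : (M1, N * N, T1) tensor, i.e., M1 DCNs of size N^2 x T1.
    """
    M1, L1 = weights.shape
    N, T = X.shape
    starts = range(0, T - L1 + 1, stride)            # T1 sliding positions
    C = np.empty((M1, N * N, len(starts)))
    for m in range(M1):
        for t, s in enumerate(starts):
            seg = X[:, s:s + L1]                     # (N, L1) temporal segment
            weighted = seg * weights[m]              # weight each time point
            C[m, :, t] = (weighted @ seg.T).ravel()  # all pairwise wc-kernel values
    return C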

Con2: Regional Feature Layer

Following the connectivity construction layer, we build a regional feature layer to learn local (i.e., brain-region specific) high-order features from the DCNs produced by the connectivity construction layer. Specifically, we use kernels of size \(N\times L_2\) and set the stride along the (spatial, temporal) dimensions to (N, 1). Since the \(N^2\) rows of each DCN are arranged so that N consecutive rows collect all connectivities of one ROI, a stride of N along the spatial dimension yields exactly one output per ROI. The convolution along the spatial dimension is therefore a feature mapping for each ROI, computing a weighted combination of the functional connectivities linked to that ROI across \(L_2 ({>}1)\) neighboring time points (i.e., CNs). The convolution along the temporal dimension corresponds to different feature mappings of the same ROI over time, reflecting the temporal variability of that ROI. Note that the features learned in this layer are high-order, since they are computed from series of functional connectivities of a specific ROI across multiple CNs, thus characterizing the temporal properties of that ROI's functional connectivity series.

Con3: Brain-Network Feature Layer

Following the regional feature layer, we build a brain-network feature layer to learn global (i.e., brain-network specific) high-order features of the whole CN from the brain-region specific features. Specifically, we use kernels of size \(N \times L_3\) and set the stride along both dimensions to (1, 1); since the spatial dimension of the input now has exactly N entries (one per ROI), each kernel produces a single output per temporal position. The convolution along the spatial dimension is therefore a feature mapping for the whole CN, computing a weighted combination of all brain-region specific features across \(L_3\) (\({>}1\)) neighboring time points. The convolution along the temporal dimension corresponds to different mappings of the whole CN over time, reflecting the temporal variability of the whole brain network. Similar to the regional feature layer, the features learned in this layer are also high-order.

Con4: Temporal Feature Layer

To reduce the feature dimensionality, we further build a temporal feature layer to learn high-level temporal features. Specifically, we use kernels of size \(1\times L_4\), set the stride along both dimensions to (1, 1), and perform an average-pooling (AP) operation after the convolution to map all outputs of a kernel into a single feature. Thus, the output of this layer for a learned kernel can be used as a measure of the temporal variability of the whole CN.
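Putting the four convolutional layers together, the following PyTorch sketch shows one possible implementation of the wck-CNN under the shape conventions described above (the class name, initialization and tensor layout are our own assumptions, not the authors' released code):

import torch
import torch.nn as nn
import torch.nn.functional as F

class WckCNN(nn.Module):
    """Sketch of the wck-CNN; default hyper-parameters follow Sect. 3."""

    def __init__(self, N=116, M1=16, M2=32, M3=64, M4=64,
                 L1=70, L2=2, L3=2, L4=8, n_classes=4):
        super().__init__()
        self.L1, self.stride1 = L1, 2
        # Con1: one learnable weight vector of length L1 per wc-kernel
        self.wc_weights = nn.Parameter(0.01 * torch.randn(M1, L1))
        # Con2: regional features -- kernel N x L2, stride (N, 1) over the N^2 x T1 DCN
        self.con2 = nn.Conv2d(M1, M2, kernel_size=(N, L2), stride=(N, 1))
        # Con3: brain-network features -- kernel N x L3 over the N x T2 feature map
        self.con3 = nn.Conv2d(M2, M3, kernel_size=(N, L3), stride=1)
        # Con4: temporal features -- kernel 1 x L4, followed by average pooling
        self.con4 = nn.Conv2d(M3, M4, kernel_size=(1, L4), stride=1)
        self.fc1, self.fc2 = nn.Linear(M4, 64), nn.Linear(64, 32)
        self.out = nn.Linear(32, n_classes)
        self.drop = nn.Dropout(0.5)

    def con1(self, x):
        """Connectivity construction layer; x is a (batch, N, T) tensor of normalized time series."""
        B, N, T = x.shape
        segs = x.unfold(dimension=2, size=self.L1, step=self.stride1)       # (B, N, T1, L1)
        weighted = segs.unsqueeze(1) * self.wc_weights.view(1, -1, 1, 1, self.L1)
        # pairwise wc-kernel values: C[b, m, t, i, j] = sum_l w_m[l] * x_i[l] * x_j[l]
        C = torch.einsum('bmitl,bjtl->bmtij', weighted, segs)
        T1 = C.shape[2]
        return C.permute(0, 1, 3, 4, 2).reshape(B, -1, N * N, T1)           # (B, M1, N*N, T1)

    def forward(self, x):
        h = F.relu(self.con1(x))              # (B, M1, N*N, T1): M1 DCNs
        h = F.relu(self.con2(h))              # (B, M2, N, T2): regional features
        h = F.relu(self.con3(h))              # (B, M3, 1, T3): brain-network features
        h = F.relu(self.con4(h))              # (B, M4, 1, T3-L4+1): temporal features
        h = h.mean(dim=(2, 3))                # average pooling -> (B, M4)
        h = self.drop(F.relu(self.fc1(h)))
        h = self.drop(F.relu(self.fc2(h)))
        return self.out(h)                    # class logits; soft-max is applied in the loss

For example, with N = 116 ROIs and T = 137 volumes (140 acquired minus the 3 removed during preprocessing), a batch of shape (B, 116, 137) yields T1 = 34 sliding positions in Con1 and class logits of shape (B, 4).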

3 Experiments

Experimental Settings:

We perform a multi-class task, i.e., NC vs. eMCI vs. lMCI vs. AD classification, using 5-fold cross-validation. Specifically, the set of 147 subjects with a baseline scan is partitioned into five (roughly) equally sized subsets. One subset is selected as the testing data, and the remaining four subsets, together with the 27 subjects without a baseline scan, form the training subjects. Note that, to enhance the generalization of the model, all scans of each training subject are used as training data, with each scan treated as an independent sample carrying the same class label. We evaluate the performance by computing the overall accuracy over the four categories as well as the accuracy for each category. In the experiments, we set the parameters \(M_1=16\), \(M_2=32\), \(M_3=64\), \(M_4=64\), \(L_1=70\), \(L_2=2\), \(L_3=2\), \(L_4=8\). In the connectivity construction layer, we set the stride along the temporal dimension to 2. Note that the other scans (except the baseline) of the testing subjects are not used for either training or testing.
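The data-splitting protocol can be sketched as follows (with hypothetical stand-in bookkeeping structures; not the authors' code):

from sklearn.model_selection import KFold

# Hypothetical stand-ins for the real bookkeeping: a class label per subject and a list
# of scans per subject; the first scan of a baseline subject is assumed to be its baseline scan.
subject_label = {f"subj{i:03d}": i % 4 for i in range(174)}
subject_scans = {s: [f"{s}_scan{k}" for k in range(3)] for s in subject_label}
baseline_ids = sorted(subject_label)[:147]   # 147 subjects with a baseline scan
extra_ids = sorted(subject_label)[147:]      # 27 subjects without a baseline scan

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(baseline_ids):
    train_subjects = [baseline_ids[i] for i in train_idx] + extra_ids
    # every scan of a training subject is an independent sample with that subject's label
    train_set = [(scan, subject_label[s]) for s in train_subjects for scan in subject_scans[s]]
    # only the baseline scan of each held-out subject is used for testing
    test_set = [(subject_scans[baseline_ids[i]][0], subject_label[baseline_ids[i]])
                for i in test_idx]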

We first compare the proposed method with two traditional learning methods: (1) a baseline method (denoted as BL) and (2) an SVM method with local clustering coefficients (denoted as SVM). In both methods, the CN of each subject is first built by computing the PCC between the whole time series of each pair of ROIs; the connectivity strengths and the local clustering coefficients are then extracted from the constructed CNs as features, respectively. A t-test with a threshold of \(p<0.05\) is used for feature selection, followed by a linear SVM with default parameters for classification. Here, a one-versus-all strategy is used for the multi-class task.
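This competing pipeline can be sketched for a single binary comparison as follows (illustrative only; using absolute PCC values as edge weights is our assumption, since the paper does not specify how negative correlations are handled):

import numpy as np
import networkx as nx
from scipy.stats import ttest_ind
from sklearn.svm import SVC

def clustering_features(ts):
    """Local clustering coefficients of a PCC-based CN for one subject.

    ts: (N, T) ROI time series. Edge weights are absolute PCC values (assumption).
    """
    pcc = np.corrcoef(ts)
    G = nx.from_numpy_array(np.abs(pcc) - np.eye(len(pcc)))   # zero the diagonal
    cc = nx.clustering(G, weight="weight")
    return np.array([cc[i] for i in range(len(pcc))])

def select_and_classify(X_train, y_train, X_test, alpha=0.05):
    """t-test feature selection (p < alpha) followed by a linear SVM (binary case)."""
    _, p = ttest_ind(X_train[y_train == 0], X_train[y_train == 1], axis=0)
    keep = p < alpha
    clf = SVC(kernel="linear").fit(X_train[:, keep], y_train)
    return clf.predict(X_test[:, keep])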

To further evaluate the contributions of the proposed method, we compare wck-CNN with three of its variants: (1) a CNN using traditional CNs (denoted as CNN), (2) a CNN using DCNs (denoted as DCN-CNN), and (3) the wck-CNN framework without high-order feature information (denoted as wck-CNN-1). The CNN and DCN-CNN methods do not include the proposed wc-kernel based network construction layer; instead, they use traditional CNs and DCNs, respectively, as input to the CNN. Here, the DCNs are constructed using the overlapping sliding-window method with a window length of 70 and a translation step of 2. In the wck-CNN-1 method, no high-order features are extracted in the regional feature layer and the brain-network feature layer, i.e., \(L_2\) and \(L_3\) are set to 1.

Table 1. Performance of all methods in NC vs. eMCI vs. lMCI vs. AD classification.

Results:

Experimental results of all methods are summarized in Table 1. Our proposed method achieves an overall accuracy of \(57.0\%\) for the four classes, while the best overall accuracy of the competing methods is \(50.0\%\) (regarding wck-CNN-1 as a variant of our method rather than a competitor), suggesting the effectiveness of the proposed wck-CNN. In addition, we can make four interesting observations from Table 1. First, compared with the traditional learning methods (i.e., BL and SVM), the CNN-based methods (i.e., CNN, DCN-CNN, wck-CNN-1 and wck-CNN) achieve much higher performance, indicating that CNNs can capture the underlying properties of brain networks and can thus be better applied to brain network analysis. Second, compared with the traditional CN-based methods (i.e., BL, SVM and CNN), the DCN-based methods (i.e., DCN-CNN, wck-CNN-1 and wck-CNN) achieve higher accuracies, suggesting that the dynamics of CNs provide useful clues for better understanding the underpinnings of brain disease pathology, which is consistent with existing studies [4]. Third, the wc-kernel based methods (i.e., wck-CNN-1 and wck-CNN) perform better than the conventional DCN-based method (i.e., DCN-CNN), further indicating the effectiveness of the defined wc-kernel in conveying interaction information between brain regions. Finally, wck-CNN achieves higher performance than wck-CNN-1, demonstrating the advantage of exploiting high-order information from brain networks.

Fig. 2.

The group difference of functional connectivity between the AD and NC groups in the (dynamic) CNs constructed by different methods. Here, p-values larger than 0.05 are set to 1 (denoted as yellow points); wck1, ..., wck16 correspond to the DCNs constructed by the proposed method with 16 different wc-kernels, respectively. CN and DCN correspond to traditional CNs and DCNs constructed by the overlapping sliding-window method, respectively.

Connectivity Analysis:

Furthermore, we investigate the DCNs constructed by the proposed wc-kernel based method. Specifically, we construct the DCNs (i.e., the output of the connectivity construction layer) for all subjects using the model learned in the first cross-validation fold, obtaining 16 DCNs per subject (one for each of the 16 wc-kernels). For simplicity, we compute the average network of each DCN of each subject. Then, for each wc-kernel, we compute the group difference of functional connectivity in the average network using a standard t-test. Figure 2 shows the results (denoted as wck1 to wck16) between the AD and NC groups. For comparison, Fig. 2 also reports the group differences of functional connectivity in the traditional CNs and in the DCNs constructed by the overlapping sliding-window method, respectively. Here, we threshold the obtained p-values (i.e., p-values larger than 0.05 are set to 1) for clarity.
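This edge-wise comparison can be sketched as follows (illustrative only; the array names are hypothetical):

import numpy as np
from scipy.stats import ttest_ind

def edgewise_group_difference(ad_networks, nc_networks, alpha=0.05):
    """Edge-wise two-sample t-test between two groups of averaged networks.

    ad_networks, nc_networks: arrays of shape (n_subjects, N, N), one averaged
    connectivity network per subject. Returns an (N, N) p-value map in which
    non-significant edges (p > alpha) are set to 1, as in Fig. 2.
    """
    _, p = ttest_ind(ad_networks, nc_networks, axis=0)
    return np.where(p > alpha, 1.0, p)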

From Fig. 2, we can make three interesting observations about most of the proposed wc-kernel based DCNs, in comparison with the traditional CNs and DCNs. First, they contain more discriminative functional connectivities (i.e., connectivities with p-values less than 0.05), indicating that these DCNs are more discriminative. Second, they show more distinct patterns; for example, the discriminative functional connectivities concentrate on connections with specific regions, including the lateral surface, parietal lobe, limbic lobe and sub-cortical gray nuclei, which have been widely reported in existing studies [8]. Finally, there are few discriminative functional connectivities among brain regions within the cerebellum, but a few discriminative functional connectivities between the cerebellum and the cerebrum, indicating that the cerebellum might be associated with AD and may provide useful information for AD prognosis [9].

4 Conclusion

In this paper, we define a novel wc-kernel for characterizing the rich interaction information among brain regions, and propose a unified wck-CNN framework for DCN construction and analysis using fMRI data. Results on 174 subjects with a total of 563 scans from the ADNI database demonstrate that our proposed method not only improves the classification performance compared with state-of-the-art methods, but also provides insights into the interactions of brain activity and their changes in AD.