1 Introduction

A hyper-connectivity brain network is a network in which each edge can connect more than two brain regions, and it is naturally represented as a hyper-graph. Hyper-connectivity networks, based on either structural or functional interactions among brain regions, have been used for brain disease diagnosis [1]. Functional and structural interactions can be extracted from functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI), respectively [2]. However, a conventional hyper-network, which is constructed from single-modality data only, ignores the potentially complementary information conveyed by other modalities. Integrating complementary information from different modalities has been shown to provide a more comprehensive representation of the structural and functional organization of the brain [3, 4]. Inspired by this observation, a classification framework based on multimodal brain networks constructed from resting-state fMRI (rs-fMRI) and DTI has been proposed to improve the classification of mild cognitive impairment (MCI) [5].

In this paper, we proposed the first multimodal hyper-connectivity network modelling method that simultaneously considers the information from rs-fMRI and DTI data during the network construction. Specifically, the multimodal hyper-connectivity network was constructed using a star expansion method [6] based on the anatomically weighted functional distance between pairs of brain regions. The anatomically weighted functional distance, which is defined as the strength of the anatomically weighted functional connectivity (awFC) [7], was computed using the complementary information conveyed by the rs-fMRI and DTI data. We then extracted network features from the constructed hyper-connectivity network, and selected the most discriminative features using a manifold regularized multi-task feature selection method (M2TFS) [1]. Finally, we applied a support vector machine (SVM) on the selected features for MCI classification. Promising classification results demonstrated the superiority of the proposed multimodal hyper-connectivity network over the single-modal hyper-connectivity networks which were constructed either from rs-fMRI or DTI data.

2 Materials and Methodology

2.1 Dataset

Ten MCI patients (5M/5F) and 17 normal controls (8M/9F) were included in this study. Informed consent was obtained from all participants, and the experimental protocols were approved by the institutional ethics board. The mean ages of the MCI and control groups were 74.2 ± 8.6 and 72.1 ± 8.2 years, respectively. All subjects were scanned using a 3.0-Tesla scanner to acquire the rs-fMRI and DTI data. The acquisition parameters for rs-fMRI were as follows: repetition time (TR) = 2000 ms, echo time (TE) = 32 ms, flip angle = 77°, acquisition matrix = 64 × 64, voxel size = 4 mm. One hundred fifty fMRI volumes were acquired. During the scan, which lasted 5 min, all subjects were instructed to keep their eyes open and stare at a fixation cross in the middle of the screen. The acquisition parameters for DTI were as follows: b = 0 and 1000 s/mm², flip angle = 90°, TR/TE = 17000/78 ms, imaging matrix = 128 × 128, FOV = 256 × 256 mm², voxel thickness = 2 mm, and 72 continuous slices.

2.2 Data Preprocessing

Resting-state fMRI images were preprocessed using the Statistical Parametric Mapping software package (SPM8). Specifically, the first 10 fMRI volumes were removed before parcellating the brain space into 116 regions-of-interest (ROIs) based on the automated anatomical labeling (AAL) [8] template. We averaged the fMRI time series over all voxels in each ROI to compute the mean fMRI time series. Prior to constructing the hyper-connectivity network, a temporal band-pass filter with frequency interval (\( 0.025\, \le \,f\, \le \,0.100\,{\text{Hz}} \)) was applied to the mean time series of each individual ROI to reduce the effects of physiological and measurement noise. Following a previous study, global signal regression was not performed because its use in rs-fMRI preprocessing remains controversial [9].

Similar to the fMRI preprocessing, DTI images were aligned to the AAL template space using a deformable DTI registration algorithm (F-TIMER) [10] before parcellating the brain space into 116 ROIs. Whole-brain streamline fiber tractography was then applied to each image using ExploreDTI [11] with a minimal seed point fractional anisotropy (FA) of 0.45, a stopping FA of 0.25, a minimal fiber length of 20 mm, and a maximal fiber length of 400 mm.

2.3 Methods

Anatomically Weighted Functional Distance.

We proposed a novel multimodal hyper-connectivity network modelling method that simultaneously utilizes the information from rs-fMRI and DTI data. Our method is based on the anatomically weighted functional distance which reflects the evidence for the underlying DTI data to supplement the fMRI data as defined below [7]

$$ awFD_{ij} = \left( {1 - \frac{{\pi_{ij} }}{\uplambda}} \right)FD_{ij} $$
(1)

where \( \pi_{ij} \in [0, 1) \) is the strength of DTI-based structural connectivity between the brain regions \( i \) and \( j \), \( \uplambda \in \varvec{\varOmega} = [1, \infty) \) is an unknown parameter that potentially attenuates the anatomical weighting, and \( FD_{ij} \) is the functional distance between the fMRI profiles. Equation (1) explicitly incorporates the brain anatomy to guide a more accurate inference of the functional connectivity between two brain regions. Following the premise that a structural connection is neither a sufficient nor a necessary condition for a functional connection [7], the parameter \( \uplambda \) was introduced in Eq. (1) to regulate the contribution of the structural connection, especially for the case where no fibers connect two regions. The functional distance between the fMRI profiles of ROIs \( i \) and \( j \) at lag-\( o \) is defined as [7]

$$ FD_{ij} = 1 - \mathop {\hbox{min} }\limits_{o \in O} FD_{ij} \left( o \right) = 1 - \mathop {\hbox{min} }\limits_{o \in O} \left\{ {\frac{{\mathop \sum \nolimits_{t = 1}^{ T - o} \left[ {x_{i} (t + o) - \bar{x}_{i} } \right]\left[ {x_{j} (t) - \bar{x}_{j} } \right]}}{{\widehat{\sigma }_{i} \widehat{\sigma }_{j} }}} \right\} $$
(2)

where \( x_{i} (t) \) denotes the fMRI time series of ROI \( i \) at time \( t \), \( T \) is the total number of rs-fMRI volumes, \( \hat{\sigma }_{i} \) and \( \hat{\sigma }_{j} \) denote the sample standard deviations of \( x_{i} \) and \( x_{j} \), and \( \bar{x}_{i} \) and \( \bar{x}_{j} \) denote the sample means of \( x_{i} \) and \( x_{j} \), respectively. For ease of explanation, we considered only positive correlations. In view of the potential differences in the hemodynamic responses of resting-state neuronal activity between different brain regions, we estimated the functional distance at several lags \( o \in O = [-3, 3] \) and took the minimum lag-\( o \) distance [7].
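The lag search described above can be illustrated with a minimal sketch. It rests on two assumptions that are not stated in the text: the lagged cross-product is normalised by \( T\hat{\sigma}_{i}\hat{\sigma}_{j} \) (with population standard deviations) so that lag 0 reduces to the Pearson correlation, and the "minimum lag-\( o \) distance" is read as the smallest value of \( 1 - r_{ij}(o) \) over the candidate lags. The function names and the toy sinusoidal signals are hypothetical; the 140 samples correspond to the 150 acquired volumes minus the 10 discarded ones.

```python
import math

def lagged_correlation(xi, xj, lag):
    """Correlation between xi(t + lag) and xj(t), normalised by T * sd_i * sd_j."""
    T = len(xi)
    mi, mj = sum(xi) / T, sum(xj) / T
    sd_i = math.sqrt(sum((a - mi) ** 2 for a in xi) / T)
    sd_j = math.sqrt(sum((b - mj) ** 2 for b in xj) / T)
    # Keep only the time points t for which both t and t + lag are valid indices.
    ts = range(max(0, -lag), min(T, T - lag))
    cov = sum((xi[t + lag] - mi) * (xj[t] - mj) for t in ts)
    return cov / (T * sd_i * sd_j)

def functional_distance(xi, xj, lags=range(-3, 4)):
    """Smallest lagged distance 1 - r(o) over the candidate lags o (assumed reading)."""
    return min(1.0 - lagged_correlation(xi, xj, o) for o in lags)

# Toy signals: xi is xj delayed by two samples (period 20, seven full periods).
xj = [math.sin(2 * math.pi * t / 20) for t in range(140)]
xi = xj[-2:] + xj[:-2]          # xi(t) = xj(t - 2)

d_lag0 = 1.0 - lagged_correlation(xi, xj, 0)   # distance without lag search
d_best = functional_distance(xi, xj)           # distance after searching lags -3..3
```

In this toy case the lag search recovers the two-sample delay, so `d_best` is much smaller than `d_lag0`; this mirrors the motivation given above of allowing for regional differences in the hemodynamic response.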

The structural distance, which represents the strength of the DTI-based structural connectivity between pairs of ROIs, is defined as [7]

$$ SD_{ij} = (1 - \frac{{\pi_{ij} }}{\uplambda}) $$
(3)

where \( \pi_{ij} \), which is the average on-fiber FA, denotes the strength of structural connection between ROIs \( i \) and \( j \), and \( \uplambda \) denotes an unknown parameter that potentially reduces the effect of structural data. The indirect structural connections were allowed by defining \( \pi_{ij} = { \hbox{max} }[\pi_{ij} ,{ \hbox{max} }_{l} (\pi_{il} ,\pi_{lj} )] \) [7]. The optimal \( \uplambda \) was determined empirically through minimizing the impact of false positive structural connectivity [7].
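As a minimal sketch, Eqs. (1) and (3) can be combined as \( awFD_{ij} = SD_{ij} \cdot FD_{ij} \). The code below assumes that the matrix \( \pi \) has already been computed (the indirect-connection step is not reproduced here) and uses hypothetical function names and a hypothetical 3-ROI example.

```python
def structural_distance(pi, lam):
    """Eq. (3): SD_ij = 1 - pi_ij / lambda, element-wise on a square matrix."""
    if lam < 1.0:
        raise ValueError("lambda must lie in [1, inf)")
    return [[1.0 - p / lam for p in row] for row in pi]

def anatomically_weighted_fd(fd, pi, lam):
    """Eq. (1): awFD_ij = SD_ij * FD_ij, element-wise."""
    sd = structural_distance(pi, lam)
    return [[s * f for s, f in zip(sd_row, fd_row)]
            for sd_row, fd_row in zip(sd, fd)]

# Hypothetical values: ROIs 0 and 1 share a strong tract, ROI 2 has no fibers.
pi = [[0.0, 0.8, 0.0],
      [0.8, 0.0, 0.0],
      [0.0, 0.0, 0.0]]
fd = [[0.0, 0.6, 0.6],
      [0.6, 0.0, 0.6],
      [0.6, 0.6, 0.0]]

awfd_full = anatomically_weighted_fd(fd, pi, lam=1.0)    # full anatomical weighting
awfd_damped = anatomically_weighted_fd(fd, pi, lam=4.0)  # attenuated weighting
```

With \( \uplambda = 1 \) the structurally connected pair is pulled much closer (0.6 becomes 0.12), whereas with \( \uplambda = 4 \) the same evidence shortens the distance only to 0.48. Pairs without fibers keep their functional distance for any \( \uplambda \).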

Hyper-graph Construction.

We employed a multimodal hyper-graph construction method based on the anatomically weighted functional distance. Let \( V \) be the vertex set and \( E \) the hyper-edge set of a hyper-graph \( G \). For the \( n \)-th subject with \( P \) ROIs, a hyper-graph \( G_{n} = \left( {V_{n} ,E_{n} } \right) \) with \( P \) vertices can be constructed with each of its vertices representing an ROI. We employed a star expansion method [6] to generate hyper-edges among vertices. Specifically, for each distance matrix, a vertex was first selected as the centroid vertex and a hyper-edge was then constructed by linking the centroid vertex to its nearest neighbors within a distance of \( {\upvarphi }\bar{d} \) [6]. Here, \( \bar{d} \) is the average anatomically weighted distance between regions and \( {\upvarphi } \), which was set to 0.78 via grid search on the training data, is a hyper-parameter controlling the sparsity of the hyper-network. Note that the constructed hyper-edges were unweighted.
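The construction step can be sketched as follows. This is only an illustration under stated assumptions: \( \bar{d} \) is taken as the mean over all off-diagonal entries of the distance matrix, every vertex serves once as a centroid, and the function name and the 4-ROI distance matrix are hypothetical.

```python
def star_expansion(dist, phi):
    """Build one unweighted hyper-edge per centroid vertex from a distance matrix."""
    n = len(dist)
    off_diag = [dist[i][j] for i in range(n) for j in range(n) if i != j]
    d_bar = sum(off_diag) / len(off_diag)   # assumed: mean over all ROI pairs
    radius = phi * d_bar
    edges = []
    for v in range(n):
        # The hyper-edge holds the centroid and every ROI within the radius.
        members = {v} | {u for u in range(n) if u != v and dist[v][u] <= radius}
        edges.append(frozenset(members))
    return edges

# Hypothetical distances: ROIs 0, 1, 2 are mutually close, ROI 3 is far from all.
dist = [[0.0, 0.2, 0.3, 0.9],
        [0.2, 0.0, 0.25, 0.8],
        [0.3, 0.25, 0.0, 0.85],
        [0.9, 0.8, 0.85, 0.0]]

hyper_edges = star_expansion(dist, phi=0.78)
```

Here \( \bar{d} = 0.55 \), so the radius is 0.429: the centroids 0, 1 and 2 each yield the hyper-edge {0, 1, 2}, while ROI 3 has no neighbor within the radius. A smaller \( {\upvarphi } \) shrinks the radius and therefore produces a sparser hyper-network.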

Feature Extraction and Selection.

Topological properties derived from a hyper-connectivity network provide quantitative measures to effectively study the differences in terms of brain organization between MCI subjects and normal controls (NC). In this study, we extracted three different types of clustering coefficients from the constructed multimodal hyper-connectivity network. Given a multimodal hyper-network \( G = \left( {V,E} \right) \), let \( M\left( v \right) \) be the hyper-edges adjacent to the vertex \( v \), i.e., \( M\left( v \right) = \left\{ {e \in E:v \in e} \right\} \), and \( N\left( v \right) \) the neighboring vertices to \( v \), i.e., \( N\left( v \right) = \left\{ {u \in V:\exists e \in E, u,v \in e} \right\} \). Then, three different types of clustering coefficients [1] can be computed on the vertex \( v \) as

$$ {\text{HCC}}^{1} \left( v \right) = \frac{{2\mathop \sum \nolimits_{u,q \in N(v)} I\left( {u,q,\neg v} \right)}}{{\left| {N\left( v \right)} \right|\left( {\left| {N\left( v \right)} \right| - 1} \right)}} $$
(4)
$$ {\text{HCC}}^{2} \left( v \right) = \frac{{2\mathop \sum \nolimits_{u,q \in N(v)} I'\left( {u,q,v} \right)}}{{\left| {N\left( v \right)} \right|\left( {\left| {N\left( v \right)} \right| - 1} \right)}} $$
(5)
$$ {\text{HCC}}^{3} \left( v \right) = \frac{{2\mathop \sum \nolimits_{e \in M(v)} \left( {\left| e \right| - 1} \right) - \left| {N\left( v \right)} \right|}}{{\left| {N\left( v \right)} \right|\left( {\left| {M\left( v \right)} \right| - 1} \right)}} $$
(6)

where \( u,q,v \in V \) and \( e \in E \), \( I\left( {u,q,\neg v} \right) = 1 \) if there exists \( e \in E \) such that \( u,q \in e \) but \( v \notin e \), and 0 otherwise. \( I^{'} \left( {u,q,v} \right) = 1 \) if there exists \( e \in E \) such that \( u,q,v \in e \), and 0 otherwise. The three types of clustering coefficient features represent the topological properties of the multimodal hyper-connectivity network from three different perspectives. Specifically, \( {\text{HCC}}^{1} \) denotes the number of neighboring nodes that have connections not facilitated by node \( v \). In contrast, \( {\text{HCC}}^{2} \) denotes the number of neighboring nodes with connections facilitated by node \( v \), given that these nodes may share some brain functions with each other and node \( v \). \( {\text{HCC}}^{3} \) denotes the amount of overlap among adjacent hyper-edges of node \( v \). We jointly selected features from these three types of clustering coefficients using a manifold regularized multi-task feature selection method (M2TFS) defined as [1]

$$ \min_{W} \frac{1}{2}\sum\nolimits_{c = 1}^{C} {\left\| {Y - Z^{c} w^{c} } \right\|_{2}^{2} } + \beta \sum\nolimits_{c = 1}^{C} {\left( {Z^{c} w^{c} } \right)^{T} L^{c} \left( {Z^{c} w^{c} } \right)} + \gamma \left\| W \right\|_{2,1} $$
(7)
$$ S^{c} \left( {n,m} \right) = \exp \left( { - \left\| {z_{n}^{c} - z_{m}^{c} } \right\|^{2} /h} \right) $$
(8)

where \( Z^{c} = \left[ {z_{1}^{c} , \cdots ,z_{n}^{c} , \cdots ,z_{N}^{c} } \right]^{T} \in R^{N \times P} \) denotes a set of features from a total of \( N \) training subjects, each with \( P \) regions, and \( z_{n}^{c} = \left[ {{\text{HCC}}^{c} \left( {v_{i} } \right)} \right]_{i = 1:P} \in R^{P} \) is the vector of clustering coefficients from the \( n \)-th training subject on task \( c \) (in our case, a task represents selecting features from one type of clustering coefficients), \( Y = \left[ {y_{1} , \cdots ,y_{n} , \cdots ,y_{N} } \right]^{T} \in R^{N} \) is the response vector for those N training subjects, where \( y_{n} \) is the class label for the \( n \)-th training subject. \( L^{c} = D^{c} - S^{c} \) is the combinatorial Laplacian matrix on task \( c. \) \( S^{c} \) is a matrix that describes the similarity on the \( c \)-th task across training subjects, where \( D^{c} \) is a diagonal matrix defined as \( D^{c} \left( {n,n} \right) = \mathop \sum \limits_{m = 1}^{N} S^{c} \left( {n,m} \right) \). \( W = \left[ {w^{1} ,w^{2} , \cdots ,w^{C} } \right] \in R^{P \times C} \) is a weight matrix with \( C \) being the total number of tasks (i.e., \( C = 3 \)), and \( \left\| W \right\|_{2,1} = \sum\nolimits_{i = 1}^{P} {\left\| {w_{i} } \right\|}_{2} \) is the group sparsity regularizer that encourages features from different tasks to be jointly selected. Here, \( w_{i} \) is the \( i \)-th row vector of \( W \). \( \beta \) and \( \gamma \) are the corresponding regularization coefficients. \( h \) is a free parameter to be tuned empirically. The values of \( h, \beta \) and \( \gamma \) can be determined via inner cross-validation on the training subjects.
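To make the terms of Eq. (7) concrete, the following sketch evaluates the objective for a given weight matrix; it does not minimise it. All names and data are hypothetical (4 subjects, 2 ROIs, 3 tasks, labels coded as ±1), and the label coding in particular is an assumption of this sketch.

```python
import math

def similarity(Z, h):
    """Eq. (8): S(n, m) = exp(-||z_n - z_m||^2 / h) between subjects (rows of Z)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(zn, zm)) / h) for zm in Z]
            for zn in Z]

def laplacian(S):
    """Combinatorial Laplacian L = D - S, with D the diagonal matrix of row sums."""
    n = len(S)
    return [[(sum(S[i]) if i == j else 0.0) - S[i][j] for j in range(n)]
            for i in range(n)]

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (one row per ROI)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def m2tfs_objective(Zs, Y, W, beta, gamma, h):
    """Value of Eq. (7); Zs[c] is the N x P feature matrix of task c, W is P x C."""
    total = 0.0
    for c, Z in enumerate(Zs):
        w_c = [row[c] for row in W]                        # column c of W
        pred = [sum(z * w for z, w in zip(zn, w_c)) for zn in Z]
        L = laplacian(similarity(Z, h))
        fit = sum((y - p) ** 2 for y, p in zip(Y, pred))   # ||Y - Z^c w^c||^2
        smooth = sum(pred[n] * L[n][m] * pred[m]           # (Zw)^T L (Zw)
                     for n in range(len(pred)) for m in range(len(pred)))
        total += 0.5 * fit + beta * smooth
    return total + gamma * l21_norm(W)

# Hypothetical data: the same toy features are reused for all three tasks.
Zs = [[[0.1, 0.5], [0.2, 0.4], [0.9, 0.1], [0.8, 0.2]] for _ in range(3)]
Y = [1.0, 1.0, -1.0, -1.0]
W_zero = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
W_one_roi = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]   # only the second ROI is selected

L = laplacian(similarity(Zs[0], h=1.0))
obj_zero = m2tfs_objective(Zs, Y, W_zero, beta=0.1, gamma=0.1, h=1.0)
obj_one_roi = m2tfs_objective(Zs, Y, W_one_roi, beta=0.1, gamma=0.1, h=1.0)
```

Because the \( \left\| W \right\|_{2,1} \) term sums row norms, a row of \( W \) that is entirely zero removes the corresponding ROI from all three tasks at once, which is what makes the selection joint. Minimising Eq. (7) additionally requires a solver that can handle this non-smooth term, for example a proximal gradient method, which is not shown here.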

Classification.

We employed a multi-kernel SVM to fuse three types of clustering coefficient features for MCI classification. Specifically, let \( f_{n}^{c} \) be the selected features from the \( c \)-th task of the \( n \)-th subject. We computed a linear kernel based on the features selected by the M2TFS method for each type of clustering coefficients and then fused them via a multi-kernel technique given as follows:

$$ k\left( {f_{n} ,f_{m} } \right) = \sum\nolimits_{c = 1}^{C} {\mu^{c} k^{c} \left( {f_{n}^{c} ,f_{m}^{c} } \right)} $$
(9)

where \( k^{c} \left( {f_{n}^{c} ,f_{m}^{c} } \right) \) denotes the linear kernel function between the \( n \)-th and \( m \)-th subjects for the \( c \)-th set of selected clustering coefficients, and \( \mu^{c} \) is a non-negative weight coefficient with \( \mathop \sum \limits_{c = 1}^{C} \mu^{c} = 1 \). A coarse-grid search was used to optimize \( \mu^{c} \) through a nested cross-validation on the training subjects.
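A minimal sketch of the kernel fusion in Eq. (9) is given below. The function names, the three toy feature matrices, and the weights are hypothetical and serve only to show how the fused Gram matrix is assembled.

```python
def linear_kernel(F):
    """Gram matrix of inner products between subjects (rows of F)."""
    return [[sum(a * b for a, b in zip(fn, fm)) for fm in F] for fn in F]

def fuse_kernels(kernels, mu):
    """Eq. (9): weighted sum of per-task kernels, weights non-negative and summing to 1."""
    if any(w < 0 for w in mu) or abs(sum(mu) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(mu, kernels)) for j in range(n)]
            for i in range(n)]

# Hypothetical selected features for 3 subjects, one matrix per clustering coefficient.
F1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
F2 = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
F3 = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

kernels = [linear_kernel(F) for F in (F1, F2, F3)]
K = fuse_kernels(kernels, mu=[0.5, 0.3, 0.2])
```

A fused Gram matrix of this kind can be passed to any SVM implementation that accepts a precomputed kernel, so the three feature types are combined without concatenating them into a single vector.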

3 Experiment Results

Due to the limited sample size, we employed a nested leave-one-out cross-validation (LOOCV) scheme to evaluate the performance and generalization power of our proposed method. In the inner LOOCV loop, the training data was used to optimize the parameters \( h \), \( \beta \) and \( \gamma \) that identify a set of the most discriminative features for classification. To determine the weights \( \mu^{c} \) for integrating multiple kernels, we used a grid search with the range [0, 1] at a step size of 0.1.
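The structure of this evaluation can be sketched as follows. The code only enumerates the candidate kernel weights and the nested index splits for the 27 subjects; it does not train any classifier, and the function names are hypothetical. Enumerating integer multiples of the step size is a design choice of this sketch that avoids floating-point drift when checking that the weights sum to 1.

```python
from itertools import product

def simplex_grid(n_kernels=3, steps=10):
    """All weight vectors on a 1/steps grid that are non-negative and sum to 1."""
    return [tuple(k / steps for k in combo)
            for combo in product(range(steps + 1), repeat=n_kernels)
            if sum(combo) == steps]

def nested_loocv_splits(n_subjects):
    """Yield (test, inner_val, inner_train); inner folds never contain the test subject."""
    for test in range(n_subjects):
        outer_train = [s for s in range(n_subjects) if s != test]
        for val in outer_train:
            yield test, val, [s for s in outer_train if s != val]

weights = simplex_grid()                 # candidate (mu^1, mu^2, mu^3) at step 0.1
splits = list(nested_loocv_splits(27))   # 10 MCI + 17 NC subjects
```

With a step size of 0.1 and three kernels there are 66 admissible weight vectors, and the nested scheme produces 27 × 26 = 702 inner folds, each with 25 training subjects. The held-out test subject is excluded from every inner fold, so parameter tuning never sees the subject used to report accuracy.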

The proposed method was compared with three single-modal models: hyper-networks derived from fMRI data alone, hyper-networks derived from DTI data alone, and hyper-networks constructed from fMRI using sparse representation (fMRI-SR) [1]. For the fMRI-SR model, the regularization parameter that determines the sparsity level of the hyper-networks took the values [0.1, 0.2, …, 0.9]. As shown in Table 1, the proposed method yielded an accuracy of 96.3%, which is 7.4% higher than that of the second-best model, the DTI-based hyper-network. The fMRI-based hyper-network model performed the worst, with an accuracy of 74.1%. The area under the receiver operating characteristic curve (AUC) was used to evaluate the generalization performance; the proposed method achieved an AUC of 0.98, indicating excellent generalization performance.

Table 1. Classification performance for four comparison approaches.

As shown in Table 2, 11 discriminative features were selected in every LOOCV fold. The corresponding brain regions included regions located in the frontal lobes (e.g., left inferior frontal gyrus (triangular) [12] and left rectus gyrus [13]), the temporal lobes (e.g., left temporal pole and middle temporal gyrus [14]), the cerebellum, and other regions including the hippocampus [14] and occipital gyrus [14]. Our findings are consistent with previous findings that (1) atrophy of regions in the temporal and frontal lobes occurs in early Alzheimer’s disease (AD) [15], and (2) glial accumulation of redox-active iron in the cerebellum is significant in preclinical AD patients [16]. Figure 1 graphically illustrates the significant differences in terms of hypergraph structure between MCI and NC [1]. For example, in Fig. 1(b), the right hippocampus (HIP.R) was connected to the left hippocampus (HIP.L), left thalamus (THA.L), right thalamus (THA.R), right parahippocampal gyrus (PHG.R), right lenticular nucleus (pallidum) (PAL.R) and right cerebellum 3 (CRBL3.R) in MCI, while it was connected to the left hippocampus (HIP.L), left thalamus (THA.L), right thalamus (THA.R), right parahippocampal gyrus (PHG.R), right temporal pole (superior) (TPOsup.R) and right cerebellum 6 (CRBL6.R) in NC. As the hippocampus is strongly associated with memory performance, this pattern of alteration in functional connectivity involving the hippocampus may provide clues to the underpinnings of cognitive deficits in MCI.

Table 2. The most discriminative ROIs that were selected during MCI classification.
Fig. 1. The average degree of hyper-edges for NC and MCI for the 4 brain regions listed in Table 2. Each sub-figure represents a hyper-edge between the corresponding brain region (indicated by the red node) and other nodes. The average degree of hyper-edges for a node is computed from the top d ROIs with the highest occurrence number among all subjects.

4 Conclusion

In this paper, we proposed a novel multimodal hyper-network modelling method for improving the diagnostic accuracy of MCI. The proposed hyper-connectivity network encodes complementary information from multiple modalities to provide a more comprehensive representation of the structural and functional organization of the brain. We demonstrated the superiority of our proposed method via MCI classification. Compared to the single-modal methods, our proposed method achieved a higher classification accuracy and better generalization performance. In the future, we will evaluate the performance of the proposed method on larger datasets.