1 Introduction

Medical imaging technologies have already been applied to the diagnosis of brain diseases such as Alzheimer’s disease (AD) and Fronto-Temporal Dementia (FTD), because they exploit physical phenomena to create visual images of both the internal tissues and the external structure of the human body in a noninvasive manner [4, 5].

Among the many clinical imaging methods, resting-state functional magnetic resonance imaging (rs-fMRI) provides rapid identification of functional areas for different patient groups [26]. It is therefore an important tool for studying spontaneous functional brain activity in the resting state [27, 28]. Specifically, brain Functional Connectivity Networks (FCNs) have become a powerful means of measuring and mapping brain activity. They are first constructed from Blood-Oxygen-Level Dependent (BOLD) signals, and diverse machine learning methods are then used to characterize the patterns of brain functional activity with classifiers [17, 22]. Thus, existing methods for rs-fMRI data comprise three steps, i.e., FCN creation, feature learning and disorder diagnosis.

FCN creation applies different kinds of relationships to brain regions in order to describe statistical properties of neural activity or to represent the degree of correlation between two brain regions [32]. Common methods of FCN creation include linear ones (e.g., partial coherence/correlation or Pearson correlation) and non-linear ones (e.g., mutual information) [44]. Feature learning focuses on finding informative features, i.e., deep representations and semantic information, in the FCNs of each subject [33]. Disorder diagnosis employs classifiers, such as the Support Vector Machine (SVM) and the decision tree, to verify the performance of the learned subset of features [16, 18]. However, traditional machine learning methods conduct feature learning and disorder diagnosis separately. As a result, the selected features might not yield the best classification performance, and the subset of features that gives the best classification result may not be the optimal one, so each of the two steps leaves the other sub-optimal [51].
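As an illustration of the linear FCN construction mentioned above, the following is a minimal sketch that builds a Pearson-correlation FCN from BOLD time series. The function name `pearson_fcn`, the array layout (time points in rows, ROIs in columns), the zeroed diagonal and the synthetic data are assumptions made for this example, not details taken from the cited methods.

```python
import numpy as np

def pearson_fcn(bold):
    """Return an ROI-by-ROI Pearson correlation matrix.

    bold: array of shape (timepoints, rois), one BOLD time series per column.
    """
    bold = np.asarray(bold, dtype=float)
    # np.corrcoef treats rows as variables, so ROIs are passed as rows.
    fcn = np.corrcoef(bold.T)
    # Self-connections carry no information here, so the diagonal is zeroed
    # (an illustrative choice).
    np.fill_diagonal(fcn, 0.0)
    return fcn

# Synthetic stand-in for real data: 120 time points, 5 ROIs.
rng = np.random.default_rng(0)
bold = rng.standard_normal((120, 5))
fcn = pearson_fcn(bold)
```

The resulting matrix is symmetric, with entries in [-1, 1], and can serve as the weighted adjacency matrix of one subject's brain graph.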

Recently, deep learning methods have been widely used for disorder diagnosis because they can deal with high-level features [40]. In general, convolutional neural networks (CNNs) can only handle data with a grid structure, whereas non-grid data are the mainstream representation in the real world. Moreover, CNNs do not consider the correlations between samples (i.e., brain regions or subjects), although these correlations are beneficial to feature learning. Graph neural networks (GNNs) have therefore become an effective alternative, since representative features are learned from both the original data and the neighbouring nodes [24, 31]. Additionally, GNNs can combine feature information and structural information during feature learning to improve model performance [1, 25]. For rs-fMRI data classification, GNN methods can be divided into two groups: the individual graph model, which considers the unique information of each subject, and the population graph model, which exploits the common information across subjects [17]. However, neither of them considers local and global information comprehensively. Hence, joint frameworks that combine the individual graph model with the population graph model have become an important direction [50]. For example, Zhou et al. proposed an individual graph model that searches for important features in the functional connectivity network of each subject, together with a population graph model that exploits representative features through nodes (i.e., subjects) and edges (i.e., phenotypic information) [49]. Although the graph guides the convolution operations in the GNN model and plays an important role in feature extraction, two issues remain.
First, in the above methods, graph construction is independent of the classification task, and a fixed graph is applied to the entire network, so it does not capture the underlying structure of the node features in different layers well, which reduces model performance. Second, these methods use the local features of each subject to reconstruct global features across subjects, which yields a sub-optimal use of local features for global feature learning. Meanwhile, the initial global features across subjects also provide important information that is not considered.

To solve the aforementioned issues, we develop a hierarchical graph learning framework with convolutional networks that considers both local regions of the brain and global information about subjects. Specifically, one graph model, called the individual graph, is built on individual brain function networks, and the other, called the population graph, is designed on the entire population network. The former learns a node representation of each brain region via the GCN and uses a graph to obtain the graph-level features of each subject. The latter then updates the embedding of each node by aggregating the representations of its neighbours and itself, and is used to process the correlations between subjects. The main contributions of this paper are summarized as follows.

  • We develop a unified framework that is capable of processing both local regions of the brain and the global information of subjects, and that can effectively learn high-level embeddings of brain network representations at both the node level and the graph level.

  • The graph in this paper is learned and dynamically updated in the GNN for each subject, which yields better graph representations. The optimal graph representation is also obtained for the population graph model.

  • Extensive experiments have been conducted on four real-life clinical applications, and the results indicate that learning network embeddings from correlations between population networks and individual brain networks can improve predictive performance.

The rest of this paper is organized as follows. Related work is reviewed in Section 2. The details of the proposed method are discussed in Section 3. The experiments and the ablation study are described in Sections 4 and 5, respectively. Lastly, a conclusion is given in Section 6.

2 Related work

2.1 Brain connectivity analysis

The three main types of brain connectivity analysis are seed point-based analysis (SPBA), independent component analysis (ICA), and graph theory.

2.1.1 Seed point-based analysis (SPBA)

Seed point-based analysis is essentially a model-based approach in which a seed point or region of interest (ROI) is selected and its linear correlation with all other voxels throughout the brain is computed, resulting in a seed-based functional connectivity (FC) map [34]. The simplicity and interpretability of the technique make it a suitable method for studying rs-fMRI FC. However, because the technique depends entirely on user-specified ROIs, it is difficult to detect functional connectivity across the whole brain with this method [35].
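The seed-based procedure described above can be sketched as follows. This is a minimal example under stated assumptions: the function name `seed_connectivity_map`, the use of the mean time series of the seed voxels, and the synthetic data are all hypothetical choices for illustration.

```python
import numpy as np

def seed_connectivity_map(voxels, seed_idx):
    """Correlate the mean seed time series with every voxel.

    voxels: array of shape (timepoints, n_voxels).
    seed_idx: indices of the voxels that form the seed ROI.
    """
    voxels = np.asarray(voxels, dtype=float)
    seed = voxels[:, seed_idx].mean(axis=1)
    # Z-score over time (population std), so that the averaged product
    # of z-scores equals the Pearson correlation.
    vz = (voxels - voxels.mean(axis=0)) / voxels.std(axis=0)
    sz = (seed - seed.mean()) / seed.std()
    return vz.T @ sz / voxels.shape[0]

# Synthetic stand-in: 100 time points, 6 voxels, voxel 0 as the seed.
rng = np.random.default_rng(0)
voxels = rng.standard_normal((100, 6))
fc_map = seed_connectivity_map(voxels, seed_idx=[0])
```

Each entry of `fc_map` is the correlation between the seed and one voxel, which makes the dependence on the chosen seed explicit: a different `seed_idx` yields a different map.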

2.1.2 Independent component analysis (ICA)

The human brain consists of an extensive network of neurons that produce both low-frequency and high-frequency fluctuations [28]. Resting-state fMRI relies on spontaneous low-frequency fluctuations (less than 0.1 Hz) from anatomical regions of a network that are spatially separated from each other but functionally connected and in constant communication [37]. The rs-fMRI signals extracted from subjects are composite signals that contain the signals of interest together with artefacts. ICA uses mathematical algorithms to decompose the signals from all brain voxels into temporally and spatially independent components, which helps to extract different rs-fMRI networks efficiently [13].

2.1.3 Graph theory

Graph theory in human neuroscience aims to build mathematical models of the function of complicated networks in the human brain [9]. These neural networks have associations between various regions and sub-regions of the brain, and the dynamic connections between the networks form a larger single neural network [12]. A graph-theoretic method focuses on the relationship between nodes and their edges, which can be expressed as \(\mathcal {G} = ({\textbf{V}}, {\textbf{E}})\), where \({\textbf{V}}\) is the set of nodes and \({\textbf{E}}\) is the set of edges that connect those nodes. In brain FC analysis, different graph-theoretic metrics characterise different aspects of connectivity [14]. These include the average path length, the clustering coefficient, the degree of a node, centrality measures, and the level of modularity.
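Two of the metrics listed above, the node degree and the local clustering coefficient, can be computed directly from a binary adjacency matrix. The sketch below assumes an undirected, unweighted graph without self-loops; the function name `degree_and_clustering` and the toy graph are hypothetical.

```python
import numpy as np

def degree_and_clustering(adj):
    """Node degree and local clustering coefficient of an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    degree = adj.sum(axis=1)
    # diag(A^3) counts closed walks of length 3, i.e. twice the number of
    # triangles through each node.
    triangles = np.diag(adj @ adj @ adj) / 2.0
    # Number of neighbour pairs that could form a triangle with the node.
    possible = degree * (degree - 1) / 2.0
    clustering = np.divide(
        triangles, possible, out=np.zeros_like(triangles), where=possible > 0
    )
    return degree, clustering

# Toy graph: a triangle (nodes 0, 1, 2) plus node 3 attached to node 2.
adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
degree, clustering = degree_and_clustering(adj)
# degree is [2, 2, 3, 1]; clustering is [1, 1, 1/3, 0]
```

Nodes 0 and 1 lie only in the triangle and therefore have a clustering coefficient of 1, whereas node 2 has three neighbours of which only one pair is connected.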

A summary of brain connectivity analysis with SPBA, ICA, and graph theory, together with recent and informative literature, can be found in Table 1.

Table 1 The summary of brain connectivity analysis

Seed-based analysis finds a single correlation between the seed region and the rest of the brain, and independent component analysis searches for voxel-to-voxel interactions across several different networks in the brain [36]. In comparison, graph theory focuses on the topological properties of the seed points in the brain or in the neural network associated with a particular function. Segregation and integration are the two principles by which neural networks are described, because the brain operates in this way. Functional integration views the brain as a large network of interactions in which different neural networks collaborate on specific functions, whereas segregation refers to the connections within the individual networks of the brain. Therefore, graph theory is a useful technique for inspecting the integration and segregation of brain neural networks.

2.2 Graph neural networks

By combining graph propagation operations with deep learning algorithms, graph neural networks allow both structural information and vertex attribute information to be involved in learning. They have shown good results and interpretability in applications such as vertex classification, graph classification and disease prediction, and have become a widely used method for graph analysis.

2.2.1 Individual graph model on GNN

Given a set of graphs \(\mathcal {G}_i = (\textbf{X}_i, \textbf{A}_i)\), individual graph models usually feed all \(\mathcal {G}_i\) into the same GNN model in sequence [31]. For example, Jiang et al. [22] proposed a hierarchical graph convolutional network to learn feature embeddings of graphs while considering network topological information and associations between subjects. Zhou et al. [49] proposed a graph convolutional network that processes both subject-level and region-level information in the brain; it learns the individual features of each subject and then classifies each subject with a classifier. However, such models overlook two points: (i) the same element (i.e., the correlation between two brain regions) is correlated across subjects that share a similar label (e.g., patient or healthy control); and (ii) the features across subjects are also important to the final classification task.

2.2.2 Population graph model on GNN

Given a set of BOLD signals, the population graph model calculates the FCNs of all subjects and extracts the upper triangular part of each FCN to obtain a feature matrix. Traditional methods then apply feature extraction or feature selection models to find the important features for the classification task, or design variant CNN models to diagnose disorders on brain region images [17, 39, 46]. Farouk et al. [10] argued that methods based on deep neural networks could produce more accurate feature representations than traditional methods using shallow learning. Zhang et al. proposed a residual CNN model to diagnose Alzheimer’s disease in an end-to-end manner by considering global, local and spatial features [46]. However, the existing node classification methods conduct feature selection and classification separately, and as a result they may not consider the correlations between brain regions, which are important to model construction and brain region selection.
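The upper-triangle extraction described above can be sketched as follows. The function name `upper_triangle_features`, the exclusion of the diagonal (`k=1`) and the synthetic FCNs are assumptions made for this example.

```python
import numpy as np

def upper_triangle_features(fcns):
    """Vectorise the strict upper triangle of each subject's FCN.

    fcns: array of shape (subjects, rois, rois).
    Returns an array of shape (subjects, rois * (rois - 1) / 2).
    """
    fcns = np.asarray(fcns, dtype=float)
    # k=1 skips the diagonal, which holds self-connections.
    rows, cols = np.triu_indices(fcns.shape[1], k=1)
    return fcns[:, rows, cols]

# Synthetic stand-in: 3 subjects with symmetric 4-by-4 FCNs.
rng = np.random.default_rng(0)
raw = rng.standard_normal((3, 4, 4))
fcns = (raw + raw.transpose(0, 2, 1)) / 2.0
features = upper_triangle_features(fcns)   # shape (3, 6)
```

Because an FCN is symmetric, the upper triangle holds every distinct connection once, so each row of `features` is a compact per-subject feature vector.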

3 Proposed method

In this section, we first review the graph convolutional network and then introduce our proposed method in detail.

3.1 Graph convolutional network

The most important aspect of deep learning is feature learning, which can automatically discover potential high-level information in high-dimensional neuroimaging data. It aims to obtain hierarchical feature information in a hierarchical network, which removes the need to design features manually [6]. Although CNNs are used in many tasks, they can only operate on regular Euclidean data (e.g., images and text) [11]. In reality, much data comes from non-Euclidean domains, such as social networks, traffic networks and chemical molecules, and needs to be analysed effectively. Such data are better represented by graphs [31]. Therefore, we adopt the spectral approach, which defines graph convolution through a well-defined localized operator on graphs.

Fig. 1 The framework of the proposed method for brain disease prediction

The spectral method treats the node attributes as signals on the graph and performs the convolution directly on the spectrum of the graph (i.e., the eigenvalues of the graph Laplacian). The spectral convolution with the filter \(g_\theta = \mathrm{diag}\left( \theta \right)\) in the Fourier domain is as follows:

$$\begin{aligned} \begin{array}{l} g_\theta *{\textbf{x}} = {\textbf{U}}g_\theta \left( \varvec{\Lambda } \right) {{\textbf{U}}^T}{\textbf{x}}, \end{array} \end{aligned}$$
(1)

where \({\textbf{x}} \in {{\mathbb {R}}^N}\) is the graph signal, with one scalar value for each vertex of the graph, \(\theta \in {{\mathbb {R}}^N}\) are the filter parameters, and \(*\) denotes the graph convolution operation. \({\textbf{U}}\) and \(\varvec{\Lambda }\) represent the eigenvectors and eigenvalues of the normalized graph Laplacian \({\textbf{L}} = {{\textbf{D}}^{ - 1/2}}\left( {{\textbf{D}} - {\textbf{A}}} \right) {{\textbf{D}}^{ - 1/2}}\), respectively, where \({\textbf{D}}\) is the diagonal degree matrix and \({\textbf{A}}\) is the adjacency matrix.
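For illustration, Eq. (1) can be evaluated numerically with an explicit eigendecomposition. The following is a minimal sketch, assuming a small undirected graph; the function names `normalized_laplacian` and `spectral_conv` and the toy inputs are hypothetical.

```python
import numpy as np

def normalized_laplacian(adj):
    """L = D^{-1/2} (D - A) D^{-1/2} for an undirected graph."""
    adj = np.asarray(adj, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5
    return d_inv_sqrt[:, None] * (np.diag(deg) - adj) * d_inv_sqrt[None, :]

def spectral_conv(adj, x, theta):
    """Apply the filter diag(theta) in the graph Fourier domain (Eq. 1)."""
    lap = normalized_laplacian(adj)
    # eigh is used because the normalized Laplacian is symmetric.
    eigvals, U = np.linalg.eigh(lap)
    # U^T x: graph Fourier transform; theta scales each frequency;
    # U maps the result back to the vertex domain.
    return U @ (theta * (U.T @ x)), eigvals

# Toy path graph with three nodes and an all-pass filter.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
x = np.array([1.0, -2.0, 0.5])
theta = np.ones(3)
y, eigvals = spectral_conv(adj, x, theta)   # y equals x for this filter
```

With all filter parameters set to one, the operation reduces to the identity, which is a quick sanity check. The full eigendecomposition is the step whose cost grows quickly with the number of nodes.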

However, the computational complexity of the eigendecomposition is too high for large-scale data. Thus, Defferrard et al. [6] proposed approximating the spectral filter with Chebyshev polynomials as follows:

$$\begin{aligned} \begin{array}{l} g_\theta *\textbf{x}=\sum \nolimits _{p=0}^{P}{\theta _{p}^{'}{{T}_{p}}\left( \textbf{L} \right) \textbf{x}}, \end{array} \end{aligned}$$
(2)

where \({T_p}\) and \({\theta _p^{'}}\) are the Chebyshev polynomials and their coefficients, respectively. Kipf et al. [25] further simplified the Chebyshev graph convolution as:

$$\begin{aligned} \begin{array}{l} g_\theta *{\textbf{x}} = \theta \left( {{\textbf{I}} + {{\textbf{D}}^{ - 1/2}}{\textbf{A}}{{\textbf{D}}^{ - 1/2}}} \right) {\textbf{x}} \end{array} \end{aligned}$$
(3)

This simplification is obtained by keeping only the first-order Chebyshev polynomial and setting the maximum eigenvalue to two, where \({\textbf{I}}\) denotes the identity matrix. Moreover, let \({\tilde{\textbf{A}}} = {\textbf{A}} + {\textbf{I}}\) and let \({\tilde{\textbf{D}}}\) be the diagonal degree matrix of \(\tilde{\textbf{A}}\), whose diagonal elements are the column sums of \(\tilde{\textbf{A}}\). For a signal with \(C\) input channels and \(F\) spectral filters, the convolution is given by:

$$\begin{aligned} \begin{array}{l} {{\textbf{H}}^{\left( {l + 1} \right) }} = \mathrm{{ReLU}}\left( {{{{\tilde{\textbf{D}}}}^{ - 1/2}}{\tilde{\textbf{A}}}{{{\tilde{\textbf{D}}}}^{ - 1/2}}{{\textbf{H}}^{\left( l \right) }}{\varvec{\Theta }}} \right) , \end{array} \end{aligned}$$
(4)

where \({\textbf{X}} \in {{\mathbb {R}}^{N \times C}}\) is the input signal, \({\varvec{\Theta }} \in {{\mathbb {R}}^{C \times F}}\) is a filter parameter matrix, \({{\textbf{H}}^{\left( l \right) }}\) is the feature matrix of the l-th layer with \({{\textbf{H}}^{\left( 0 \right) }} = {\textbf{X}}\), and \({\text {ReLU}}\left( \cdot \right)\) is a non-linear activation function. The final output layer \({\textbf{Z}}\) is defined as follows:

$$\begin{aligned} \begin{array}{l} {\textbf{Z}} = {\text {softmax}}\left( \hat{\textbf{A}}\, \mathrm{{ReLU}}\left( \hat{\textbf{A}}{\textbf{X}}{{\varvec{\Theta }}^{\left( 0 \right) }} \right) {{\varvec{\Theta }}^{\left( 1 \right) }} \right) , \end{array} \end{aligned}$$
(5)

where \(\hat{\textbf{A}} = {{\tilde{\textbf{D}}}^{ - 1/2}}{\tilde{\textbf{A}}}{{\tilde{\textbf{D}}}^{ - 1/2}}\). Many other graph convolutional network models are also based on spectral methods. For example, Defferrard et al. [6] proposed Chebyshev graph convolution, which fits the convolution kernel with Chebyshev polynomials, and Bruna et al. [2] defined graph convolution in the Fourier domain based on the eigenvalue decomposition of the graph Laplacian matrix.
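The two-layer forward pass of Eqs. (4) and (5) can be sketched in a few lines. This is a minimal NumPy illustration rather than the PyTorch implementation used in the experiments; the function names, the random weights and the toy graph are assumptions made for the example.

```python
import numpy as np

def normalize_adj(adj):
    """A_hat = D~^{-1/2} (A + I) D~^{-1/2} (renormalization with self-loops)."""
    a_tilde = adj + np.eye(adj.shape[0])
    d_inv_sqrt = a_tilde.sum(axis=1) ** -0.5
    return d_inv_sqrt[:, None] * a_tilde * d_inv_sqrt[None, :]

def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gcn_forward(adj, x, theta0, theta1):
    """Two-layer GCN: softmax(A_hat ReLU(A_hat X Theta0) Theta1)."""
    a_hat = normalize_adj(adj)
    hidden = np.maximum(a_hat @ x @ theta0, 0.0)   # Eq. (4) with ReLU
    return softmax(a_hat @ hidden @ theta1)        # Eq. (5)

# Toy setting: 4 nodes, 3 input channels, 5 hidden units, 2 classes.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0], [1, 0, 1, 1], [0, 1, 0, 1], [0, 1, 1, 0]])
x = rng.standard_normal((4, 3))
theta0 = rng.standard_normal((3, 5))
theta1 = rng.standard_normal((5, 2))
Z = gcn_forward(adj, x, theta0, theta1)   # one class distribution per node
```

Each row of `Z` sums to one, and no eigendecomposition is needed: every layer only multiplies by the fixed, sparse-structured matrix `a_hat`.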

The diagnosis of functional brain networks is a classic graph classification problem: it takes the brain network as input and predicts the corresponding label (i.e., clinical state). Therefore, we propose a new graph neural network framework that processes both local regions of the brain and global information about subjects. The framework consists of two parts: a graph that models individual brain function networks, called the individual graph, and a graph applied to the whole population network, called the population graph. Figure 1 presents the overall framework of the proposed method.

3.2 The individual graph model

In the individual graph model, multiple layers of the graph convolutional network are stacked. After the convolution layer, the average pooling operator generates a coarsened graph globally, which summarises sub-graph information while exploiting the sub-graph structure [22]. In addition, the pooling layer reduces the size of the representation and hence the total number of parameters of the graph convolutional network, which helps to avoid overfitting. The output layer reduces the node representations of each graph to a single graph representation.

For the n-th subject, we define a graph \({G_n} = \left\{ {{{\textbf{X}}_n}, {{\textbf{A}}_n}} \right\}\), where \({{\textbf{X}}_n} = \left\{ {d_n^1,...,d_n^n} \right\}\) contains the node features of the ROIs and \({{\textbf{A}}_n} \in {{\mathbb {R}}^{n \times n}}\) is the adjacency matrix that represents the network connectivity of the n-th subject. Note that the subscript n indexes the subject, whereas the superscript and the matrix dimension n denote the number of ROIs. Each graph is obtained by the K-nearest-neighbour method, and each node embedding in \({{\textbf{X}}_n}\) is learned in the GCN training phase. Thus, in the proposed individual graph model, each brain region takes into account information from neighbouring brain regions of the same subject [49].

We use a two-layer GCN and the individual graph model is defined as:

$$\begin{aligned} \begin{array}{l} {{\textbf{H}}_n} = \sigma \left( {{{\textbf{A}}_n}\sigma \left( {{{\textbf{T}}_n}({{\textbf{A}}_n}{{\textbf{X}}_n}{{\textbf{W}}_1})} \right) {{\textbf{W}}_2}} \right) , \\ \textbf{A}_n(i,j) = \frac{exp(ReLU(\textbf{p}^T [\textbf{X}_n(i)\textbf{W}, \textbf{X}_n(j)\textbf{W}]))}{\sum \nolimits _{k \in \mathcal {N}(i)} exp(ReLU(\textbf{p}^T [\textbf{X}_n(i) \textbf{W}, \textbf{X}_n(k) \textbf{W}]))}, \end{array} \end{aligned}$$
(6)

where \({\textbf{W}_1}\) and \({\textbf{W}_2}\) are the weight matrices of the two convolutional layers, \(\sigma\) denotes the activation function, and \({\textbf{H}}_n\) is the learned feature matrix. \({\textbf{T}}_n\) dynamically selects useful brain regions for each subject. Each element of \(\textbf{A}_n\) is denoted by \(\textbf{A}_n(i,j)\), \(\textbf{p}\) is a learned weight vector, and \(\mathcal {N}(i)\) denotes the set of nearest neighbours of node i.
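The learned adjacency in the second line of Eq. (6) can be illustrated with the sketch below. Several details are assumptions made only for this example: neighbours are found by Euclidean distance between node features, a node is not its own neighbour, \([\cdot , \cdot ]\) is read as concatenation, and the weights `w` and `p` are random instead of learned. The region selection \({\textbf{T}}_n\) is not modelled here.

```python
import numpy as np

def knn_neighbors(x, k):
    """Indices of the k nearest nodes (Euclidean distance, self excluded)."""
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :k]

def attention_adjacency(x, w, p, k):
    """A(i, j) = softmax over j in N(i) of ReLU(p^T [x_i W, x_j W])."""
    n = x.shape[0]
    h = x @ w                                   # projected node features
    neighbors = knn_neighbors(x, k)
    adj = np.zeros((n, n))
    for i in range(n):
        # Concatenate node i with each of its k neighbours: shape (k, 2d).
        pairs = np.concatenate(
            [np.repeat(h[i][None, :], k, axis=0), h[neighbors[i]]], axis=1
        )
        scores = np.maximum(pairs @ p, 0.0)     # ReLU(p^T [., .])
        weights = np.exp(scores - scores.max()) # shift does not change the ratio
        adj[i, neighbors[i]] = weights / weights.sum()
    return adj

# Toy setting: 6 brain regions, 4 features, projection to 3 dimensions, k = 2.
rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))
w = rng.standard_normal((4, 3))
p = rng.standard_normal(6)
adj = attention_adjacency(x, w, p, k=2)
```

Each row of the resulting matrix has exactly k non-zero entries that sum to one, so the graph depends on the current features and weights rather than being fixed in advance.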

3.3 The population graph model

The purpose of the population graph model is to find representative features in both the graph matrix and the feature matrix. Once we obtain a set of feature matrices {\(\textbf{H}_1\),...,\(\textbf{H}_n\),...,\(\textbf{H}_N\)} and graph representations {\(\textbf{A}_1\),...,\(\textbf{A}_n\),...,\(\textbf{A}_N\)} from the individual graph model, we exploit the informative features in them. Hence, we have

$$\begin{aligned} \begin{array}{l} \textbf{H} = \mathrm{MLP}(\textbf{H}_1,...,\textbf{H}_N),\\ \textbf{S} = \mathrm{Mean}(\textbf{A}_1,...,\textbf{A}_N), \end{array} \end{aligned}$$
(7)

where the fused feature matrix \(\textbf{H}\) and the fused graph matrix \(\textbf{S}\) capture the neighbouring information across subjects and the correlation information across brain regions of different subjects, respectively.

Then, we employ a two-layer GCN to search for important features and obtain

$$\begin{aligned} \begin{array}{l} {\textbf{F}} = \sigma \left( {{\textbf{S}}\sigma \left( {{{\textbf {SH}}}{{\varvec{\Theta }}_1}} \right) {{\varvec{\Theta }}_2}} \right) , \end{array} \end{aligned}$$
(8)

where \({\varvec{\Theta }}_1\) and \({\varvec{\Theta }}_2\) are the weight matrices, and \(\textbf{F}\) contains the features learned across brain regions and across subjects at the same time.

3.4 The unified model

After obtaining the features \({\textbf{F}}\), the final diagnostic features can be obtained by

$$\begin{aligned} \begin{array}{l} {{\textbf{F}}^*} = \sigma \left( {\mathrm{{MLP}}\left\{ {{\textbf{F}}} \right\} } \right) , \end{array} \end{aligned}$$
(9)

where \(\sigma (\cdot )\) denotes the activation function which is Softmax for multi-class classification or Sigmoid for binary classification.

Moreover, the cross-entropy loss is employed for the final classification task as follows:

$$\begin{aligned} {\text {Loss}} = - \sum \limits _{i \in {Y}} {\sum \limits _{j = 1}^c {{{\textbf{Y}}_{ij}}\ln {{\textbf{F}}^*_{ij}}} }, \end{aligned}$$
(10)

where Y is the set of labelled nodes, c is the number of classes, and \({\textbf{Y}}\) represents the true labels.
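As a worked example of Eq. (10), the loss can be evaluated on a handful of nodes as follows. The function name `masked_cross_entropy`, the clipping constant and the toy probabilities are assumptions for illustration; only the labelled nodes contribute to the sum.

```python
import numpy as np

def masked_cross_entropy(probs, onehot, labeled_idx, eps=1e-12):
    """Cross-entropy summed over labelled nodes and classes (Eq. 10)."""
    # Clip to avoid log(0) when a predicted probability is exactly zero.
    p = np.clip(probs[labeled_idx], eps, 1.0)
    return float(-(onehot[labeled_idx] * np.log(p)).sum())

# Three nodes, two classes; only nodes 0 and 1 are labelled.
probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
onehot = np.array([[1, 0], [0, 1], [1, 0]])
loss = masked_cross_entropy(probs, onehot, labeled_idx=[0, 1])
# loss = -(ln 0.9 + ln 0.8)
```

The third node is ignored because it is outside the labelled set, which is how the same output matrix can be used for both training and test nodes.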

For the individual graph model and the population graph model, the computational complexity is O(n) + O(2n) and O(N), respectively. The total computational complexity is therefore T*(N*O(3n)+O(N)), where T is the number of epochs, N is the number of subjects, and n is the number of brain regions.

4 Experiments

4.1 Dataset

To assess the validity of the proposed model, we conducted experiments on four datasets, namely Fronto-Temporal Dementia (FTD), Obsessive-Compulsive Disorder (OCD), the Alzheimer’s Disease Neuroimaging Initiative (ADNI), and the Autism Brain Imaging Data Exchange (ABIDE).

  • The FTD dataset contains 95 FTD subjects and 86 age-matched healthy control subjects from the source (Footnote 1).

  • The OCD dataset has 20 healthy control subjects and 62 OCD subjects from the hospital [7].

  • The ADNI dataset includes 59 Alzheimer’s disease subjects and 48 healthy control subjects (Footnote 2).

  • The ABIDE dataset contains 1029 subjects with functional magnetic resonance imaging data from the ABIDE-I and ABIDE-II datasets, including 485 ASD patients and 544 healthy control subjects (Footnote 3).

For clarity, a summary of the datasets can be found in Table 2. For details of the data acquisition process, please refer to [29] and [33].

Table 2 The summary of all datasets

4.2 Setting

We ran all experiments on a server with 8 NVIDIA GeForce RTX 3090 GPUs and implemented them in PyTorch. We obtained the author-provided code for all comparison methods and followed the parameter settings recommended in the associated literature so that every comparison method performs as well as possible on each dataset. In addition, all methods, including the comparison methods and our proposed method, use the same initial graph, training/test partitioning, network dimensions and training procedure.

In all experiments, the data are split into training and test sets by 5-fold cross-validation, and the experiments are repeated 5 times with random seeds. The average results and the corresponding standard deviations (std) are reported for all methods. We randomly selected 30% of the entire dataset as labelled samples in the training set. For training with the Adam optimizer, the maximum number of epochs is set to 500, and the initial learning rate and weight decay are set to 0.01 and 0.0005, respectively [19]. Four metrics are used to evaluate the diagnostic results of all methods: accuracy (ACC), specificity (SPE), sensitivity (SEN) and area under the receiver operating characteristic curve (AUC).
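Three of the four metrics follow directly from the confusion counts of a binary diagnosis, as the short sketch below shows. The function name `diagnostic_metrics`, the convention that label 1 denotes a patient, and the toy labels are assumptions for this example; AUC is omitted because it requires continuous prediction scores rather than hard labels.

```python
def diagnostic_metrics(y_true, y_pred):
    """ACC, SEN and SPE for binary labels (1 = patient, 0 = healthy control)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "ACC": (tp + tn) / len(pairs),
        # Sensitivity: fraction of patients that are detected.
        "SEN": tp / (tp + fn) if (tp + fn) else 0.0,
        # Specificity: fraction of healthy controls that are recognised.
        "SPE": tn / (tn + fp) if (tn + fp) else 0.0,
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = diagnostic_metrics(y_true, y_pred)
# metrics == {"ACC": 0.75, "SEN": 0.75, "SPE": 0.75}
```

In a cross-validated setting, these values would be computed on each test fold and then averaged, with the standard deviation reported alongside the mean.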

4.3 Comparison methods

We employed seven comparison methods, which are described as follows:

  • High-order Functional Connectivity (HFC) uses higher-level dynamic interactions between brain regions for the diagnosis of early mild cognitive impairment [47].

  • Strength and Similarity Group Sparse Representation (SSGSR) integrates low-level and high-level functional connectivity to accurately guide the modelling of the brain network function [48].

  • Graph Convolutional Networks (GCN) generates a new node representation through aggregated node information by utilizing the edge information of connected nodes [25].

  • Deep Iterative and Adaptive Learning (DIAL) is an end-to-end graph learning framework for learning graph structure and graph embedding simultaneously. It iteratively updates the new graph [3].

  • Simplifying Graph Convolutional Networks (SGC) removes the non-linearities between GCN layers and collapses the resulting function into a single linear transformation to reduce the complexity of the network [42].

  • Interpretable Brain Graph Neural Network (BrainGNN) is a graph neural network framework to analyse the functional magnetic resonance images and discover the neurological biomarkers [30].

  • Hierarchical Graph Convolution Network (HiGCN) designs an intra-subject GCN to explore the informative feature vector and an inter-subject GCN to obtain the important feature for disease diagnosis [20].

The comparison methods fall into four categories. The shallow learning-based methods (i.e., HFC and SSGSR) consider shallow features for disease diagnosis. The single population graph models (i.e., GCN, DIAL, and SGC) consider the feature matrix and the graph representation across subjects to learn important features for the classification task. The single individual graph approach (i.e., BrainGNN) considers the local information of each subject and combines all local features for the final diagnosis task. The unified model (i.e., HiGCN) first runs an individual graph module to obtain local information, and then uses the combined features and a fixed graph representation to obtain important features for the classification task.

Table 3 All evaluation metrics (%) of different methods on the dataset FTD
Table 4 All evaluation metrics (%) of different methods on the dataset OCD

4.4 Experimental results

Tables 3, 4, 5 and 6 show the disease diagnostic performance of the different methods on four real neurological disease datasets. Across all tables, the proposed method outperforms the comparison methods, followed by BrainGNN, HiGCN, DIAL, SSGSR, GCN, SGC, and HFC in terms of the four evaluation metrics. For example, averaged over all evaluation metrics, the proposed method improved by 1.22%, 1.89%, 1.47%, and 0.89% over the best comparison method (i.e., BrainGNN) on the datasets FTD, OCD, ADNI, and ABIDE, respectively. Likewise, the proposed method improved on average by 1.77%, 1.86%, 2.05%, and 0.98% over the best shallow learning method (i.e., SSGSR) with regard to ACC, SEN, SPE, and AUC, respectively, on the four datasets. The main reason is that the proposed method searches both the local information of each subject and the global information across subjects, and learns the common graph representation across subjects at the same time. Moreover, the proposed method exploits deep learning-based feature representations, which achieve better classification results than shallow learning-based features.

Table 5 All evaluation metrics (%) of different methods on the dataset ADNI
Table 6 All evaluation metrics (%) of different methods on the dataset ABIDE
Table 7 Ablation analysis of our method on four datasets (FTD, OCD, ADNI, ABIDE)

5 Ablation study

The ablation study was conducted to demonstrate the necessity and the effectiveness of the components of the proposed method. It covers four aspects, i.e., the population graph model, the dynamic graph representation, the individual graph model, and the brain regions selected by learnable weights.

Fig. 2 Visualization of the top selected brain regions by our proposed method on FTD, OCD, ADNI, and ABIDE

5.1 Effectiveness of the population graph model

We first remove the population graph model (namely w/o PopulationGM) from our framework to demonstrate its effectiveness. As shown in Table 7, w/o PopulationGM performs worse than our method on all datasets at different label ratios. A possible reason is that the population graph model is essential for learning individual features for each subject and for improving the performance of personalized diagnosis.

5.2 Effectiveness of the dynamic graph representation

We remove the dynamic graph representation (namely w/o DynamicGM) from our framework to demonstrate its effectiveness. As shown in Table 7, our method improved over w/o DynamicGM by 12.14%, 16.68%, 15.55%, and 4.68% across all label rates on the datasets FTD, OCD, ADNI, and ABIDE, respectively. A possible reason is that the updated graph representation of each subject can exploit the important brain regions in the population graph model, which is essential for accessing the graph structure dynamically.

5.3 Effectiveness of the individual graph model

Table 7 shows the classification performance of our proposed method and the comparison methods with different label rates (i.e., 10%, 20% and 30%). When the individual graph model is removed (namely w/o IndividualGM), the proposed method is superior to w/o IndividualGM, with results improved by 4.89% across all label rates on all datasets. The reason might be that the individual graph model can extract relationships among subjects and successfully captures important information for brain disease diagnosis.

5.4 Effectiveness of selected specific brain regions

To further verify the importance of selecting specific brain regions \({\textbf{T}}\) (SelectSBA), we conducted experiments in which this component was removed from our framework (namely w/o SelectSBA) on all datasets. As shown in Table 7, the results of our proposed method are better than those of w/o SelectSBA on the disease classification task, which indicates that selecting specific brain regions \(\mathbf{{T}}\) plays a vital role in improving classification performance.

The top brain regions selected by the proposed method on all datasets are visualized in Fig. 2. The proposed method selected 12, 13, 14, and 25 brain regions on the datasets FTD, OCD, ADNI, and ABIDE, respectively.

6 Conclusion

In this article, we have proposed a new hierarchical graph learning framework with convolutional networks for brain disease diagnosis, which processes both local brain regions and subject information as a whole. The framework consists of two parts: the individual graph model, which learns the node representation of each brain region through a GCN and uses a graph to capture the graph-level features of each subject, and the population graph model, which further updates the embedding of each node by aggregating the representations of its neighbours and itself. Experimental results show that our proposed method surpasses the comparison algorithms on four real datasets. In future work, we will improve our model in three directions. Regarding the data, the characteristics of rs-fMRI are not yet considered comprehensively, so the time series information and the correlations between time series will be considered in the future model. Regarding the model structure, each subject has private information and common information related to different diseases, so our future model will optimize its structure and search for the private and common information across subjects and across similar diseases. Finally, we will try different kinds of graph representations and features to improve the accuracy of brain disease prediction, and focus on the interpretability of each brain region of each subject and of each subject at the same time.