Introduction

Dementia is a leading cause of disability in people over 65 years old worldwide1,2. Alzheimer’s disease (AD) is the principal cause of dementia, characterized by memory loss and cognitive decline driven by the deposition of amyloid-β protein3,4. According to AD International5, at least 50 million people worldwide are affected by AD, and this number is expected to triple to 152 million by 20506. Over the last decade, there has been growing interest in the diagnosis of AD and its preclinical state, mild cognitive impairment (MCI), a transitional stage between cognitively normal (CN) aging and AD. Structural magnetic resonance imaging (sMRI) is a non-invasive neuroimaging technology for measuring neural damage and disease progression that has been widely used in the computer-aided diagnosis of AD and MCI.

In recent years, deep learning methods have shown great capacity for many computer vision tasks7,8. A typical deep learning model, the convolutional neural network (CNN), has been widely used in the neuroimaging community, especially for AD classification9. Neuroimaging studies usually have a limited amount of data. To address this problem, many studies on AD classification10,11,12 sliced 3D brain volumes into two-dimensional (2D) images, adopted a classical 2D CNN pre-trained on natural images as a starting point, and fine-tuned the network through transfer learning. Nanni et al.13 applied pre-trained AlexNet, GoogleNet, ResNet50, ResNet101, and InceptionV3 models to sMRI and obtained areas under the receiver operating characteristic curve (AUC) of 90.8%, 89.6%, 89.8%, 89.9%, and 88.8%, respectively, when comparing AD and CN. In our previous work14, we improved the performance of 2D CNNs by utilizing multi-slice and multi-model integration. However, these 2D-based approaches ignore the spatial information between slices, making it impossible to fully exploit the 3D context.

Compared with a 2D CNN, a 3D CNN can utilize richer spatial 3D contextual information and generate more discriminative features, and it has been applied to staging the AD spectrum. Kong and colleagues15 first trained a 3D sparse autoencoder to learn filters on randomly chosen 3D patches of the sMRI and then used those pre-trained kernels as the first convolutional layer of a 3D CNN. Li et al.16 proposed a hybrid convolutional and recurrent neural network combining 3D DenseNets and bidirectional gated recurrent units (BGRU) for AD diagnosis based on hippocampus volumes: the 3D DenseNet learned local features from image patches, while the BGRU captured high-level correlation features between the left and right hippocampus. In Li et al.17’s work, three 3D VGG-like CNNs captured features of 3D hippocampal shape and asymmetry; a cascaded 2D CNN learned high-level correlation features between the two hippocampi, and the features learned by the asymmetry channel and the 2D CNNs were combined with a fully connected layer. Liu et al.18 constructed a multi-task deep CNN model for jointly learning hippocampus segmentation and AD classification, combining features from a 3D U-Net and a DenseNet for the classification. Huang et al.19 proposed a hybrid 3D VGG + support vector machine (SVM) model in which the CNN extracted features and the SVM produced classification results from them; the model consisted of three branches, each performing a binary classification, and the three branches were fused for a ternary classification.

Convolutional layers have trainable parameters whose number is independent of image size. However, the number of trainable parameters in the subsequent fully connected layers depends on the size of the feature map of the last convolutional layer. In 2D CNNs this is rarely an issue, because convolutional filters compress 2D images into small latent representations. In 3D CNNs, however, the latent representation of the last convolutional layer grows with the extra dimension, inflating the weight matrices of the first fully connected layers and making training more difficult. Although 3D CNNs avoid the problem of discontinuity across slices, extending 2D CNNs to 3D faces significant challenges, such as high memory and computational costs, the curse of dimensionality for high-dimensional data, and overfitting. Because of their larger number of parameters, 3D CNNs may require a larger training dataset than their 2D counterparts, so preventing overfitting caused by data scarcity during training is very important. A possible solution is cross-domain transfer learning. Liu et al.20 aggregated training data from several medical challenges with diverse modalities, target organs, and pathologies and trained a 3D segmentation network; the residual CNN pre-trained on the hippocampal segmentation task was then transferred to AD versus CN classification. However, this strategy has not yet yielded the expected results because of the large differences among the modalities, target organs, and pathologies of medical images.
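To make the parameter growth concrete, the toy calculation below uses hypothetical feature-map shapes (not taken from this paper) to show how the first fully connected layer's weight count explodes when a last convolutional feature map gains a third spatial dimension:

```python
# Hypothetical shapes, for illustration only: the first dense layer's weight
# count is (flattened feature-map size) x (number of dense units).
import numpy as np

fc_units = 256
feat_2d = (8, 8, 128)      # last conv feature map of a 2D CNN: H x W x C
feat_3d = (8, 8, 8, 128)   # last conv feature map of a 3D CNN: D x H x W x C

print(np.prod(feat_2d) * fc_units)   # 2,097,152 weights (~2.1 M)
print(np.prod(feat_3d) * fc_units)   # 16,777,216 weights (~16.8 M), 8x more
```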

A generative adversarial network (GAN)21 is an unsupervised deep learning model based on the idea of a zero-sum game. It comprises two competing networks: a generator (G) and a discriminator (D). The adversarial game pushes G to generate convincing samples and D to develop a stronger feature extraction capability, until the two networks eventually reach a Nash equilibrium. By defining adversarial objectives between G and D, the GAN lets D learn the common features of the training images through adversarial learning and feature matching. This offers an attractive remedy for overfitting in 3D CNNs: first use the D network as a common feature extractor, then reuse it as the starting point for supervised training. The main contributions of this work are outlined as follows:

  1. We extended 2D Deep Convolutional GANs (DCGAN) into a 3D DCGAN and integrated the residual block concept into DCGAN to improve its feature extraction ability.

  2. A three-round learning strategy (unsupervised adversarial learning to pre-train a classifier, followed by two rounds of transfer learning to fine-tune it) is proposed to alleviate the overfitting of 3D CNNs caused by small samples and to improve classification performance in AD staging. To the best of our knowledge, this is the first application of a 3D DCGAN to AD classification.

  3. Grad-CAM visualizations are integrated into the system and provide useful explanations of its predictions.

The rest of the paper is organized as follows: Section "Material and methods" introduces the data preprocessing procedures, the three-round training procedure, and a diagnosis model based on a 3D DCGAN. Section "Experiment and results" presents the experimental settings and experimental results of this research. Section "Discussion" discusses the overall results and possibilities for future work, and Section "Conclusions" concludes the study.

Material and methods

Dataset

Data used in the preparation of this article were obtained from the ADNI. The ADNI was launched in 2003 as a public–private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether MRI, positron emission tomography, other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of MCI and AD22,23. All relevant tests and methods were performed in accordance with relevant guidelines and regulations, and this study was approved by the ADNI Publications Committee.

1.5 T T1-weighted baseline sMRI scans from 798 participants of the ADNI-1 cohort were included in this study (187 AD, 382 MCI, and 229 CN). The Mini-Mental State Examination (MMSE) scores differed significantly between the AD, MCI, and CN groups. Table 1 shows the demographic characteristics of the three groups of participants. MCI subjects were further divided into two subgroups, progressive MCI (pMCI) and stable MCI (sMCI), according to 24-month follow-up cognitive assessments. Some MCI patients were excluded from further analysis due to incomplete follow-up, unknown conversion status, or reversion. Thus, of the 382 MCI subjects, 138 pMCI and 181 sMCI participants remained in the study of MCI-to-AD conversion prediction. The entire data set was randomly split into training, validation, and test sets in a ratio of 7:1:2, as sketched below.

Table 1 Demographic information of the subjects in ADNI-1.
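A minimal sketch of the 7:1:2 split; the stratification by diagnosis and the fixed random seed are our assumptions, not details stated in the paper:

```python
# Split subjects 7:1:2 into train/validation/test sets.
from sklearn.model_selection import train_test_split

def split_7_1_2(subjects, labels, seed=0):
    # 70% train, then the remaining 30% split 1:2 into validation and test.
    x_tr, x_rest, y_tr, y_rest = train_test_split(
        subjects, labels, test_size=0.3, stratify=labels, random_state=seed)
    x_val, x_te, y_val, y_te = train_test_split(
        x_rest, y_rest, test_size=2/3, stratify=y_rest, random_state=seed)
    return (x_tr, y_tr), (x_val, y_val), (x_te, y_te)
```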

Image pre-processing

The anatomical MRI scans were reconstructed from Digital Imaging and Communications in Medicine (DICOM) files and converted to the Neuroimaging Informatics Technology Initiative (NIfTI) format using dcm2niigui (distributed with MRIcron). Image pre-processing was performed with the Computational Anatomy Toolbox 12 (CAT12, http://www.neuro.uni-jena.de/cat/) for SPM12 (http://www.fil.ion.ucl.ac.uk/spm/software/spm). T1 images were first corrected for bias-field inhomogeneity, then registered using an initial affine transformation followed by non-linear deformation. The normalized images were segmented into gray matter (GM), white matter, and cerebrospinal fluid tissue classes and modulated. Before further analysis, the extracted GM density maps (GMDM) were smoothed with a 2.0 mm full width at half maximum (FWHM) Gaussian isotropic kernel. The preprocessed GMDM had dimensions of 121 × 145 × 121 with an isotropic resolution of 1.5 mm. Finally, due to GPU memory limitations, the GMDM were cropped and padded to 128 × 128 × 128 voxels and downsampled to 64 × 64 × 64 voxels with an isotropic resolution of 3.0 mm.
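The final resizing step could look like the following minimal sketch; center cropping/padding and linear interpolation are our assumptions, as the paper does not specify the implementation:

```python
# Pad/crop the 121 x 145 x 121 GM density map to 128^3, then downsample
# by a factor of 2 to 64^3 (3.0 mm isotropic voxels).
import numpy as np
from scipy.ndimage import zoom

def crop_pad_to(vol, target=(128, 128, 128)):
    out = np.zeros(target, dtype=vol.dtype)
    src, dst = [], []
    for s, t in zip(vol.shape, target):
        if s >= t:                            # crop this axis centrally
            off = (s - t) // 2
            src.append(slice(off, off + t)); dst.append(slice(0, t))
        else:                                 # pad this axis centrally
            off = (t - s) // 2
            src.append(slice(0, s)); dst.append(slice(off, off + s))
    out[tuple(dst)] = vol[tuple(src)]
    return out

def resize_gmdm(gmdm):                        # gmdm: (121, 145, 121) array
    vol = crop_pad_to(gmdm)                   # -> (128, 128, 128)
    return zoom(vol, 0.5, order=1)            # -> (64, 64, 64)
```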

3D DCGAN architecture

An important characteristic of GANs is unsupervised representation extraction from unlabeled data. DCGAN24 is a milestone improvement over the original GAN that builds the GAN structure with CNNs. In this work, we propose a 3D version of the DCGAN in which D uses four residual blocks to improve feature representations at low memory cost. In our DCGAN, the G synthesizes T1-weighted sMRI of the whole brain, and the D discriminates between genuine and synthetic sMRI. The two networks compete constantly; this adversarial process helps both learn the statistical distribution of the unlabeled neuroimaging data. Once the network reached Nash equilibrium, a two-round transfer learning strategy was applied: the first round for AD classification, and the second for the other binary tasks. The flowchart of the 3D DCGAN is shown in Fig. 1.

Figure 1. Flowchart of the proposed 3D DCGAN.

G takes as input a latent vector of size 100 drawn from a standard Gaussian distribution and, through a series of up-sampling blocks, yields a synthetic sMRI that fits the distribution of genuine sMRI. The D network discriminates whether its input is a genuine sMRI or a synthetic one generated by G. For the architecture of D, we constructed a 3D ResNet-like CNN with shortcut connections, which speed up training and counteract the vanishing gradient problem. D consists of a convolution block, four residual blocks, and an output block. The residual blocks in D come in two architectures: residual block1 and block3 are standard residual blocks, while residual block2 and block4 are bottleneck blocks. The bottleneck structure reduces computation by adding a 1 × 1 × 1 convolution layer to the standard residual module to shrink the number of feature channels. A dropout layer in the output block alleviates overfitting. The detailed architectures of G and D are shown in Fig. 2 and Tables 2 and 3; a sketch of the two residual block variants follows the tables.

Figure 2. The network architecture of (a) up-sampling block, (b) convolution block, (c) residual block1, (d) residual block2, (e) output block, and (f) classification block.

Table 2 The network architecture of G.
Table 3 The network architecture of D.
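The following is a minimal Keras sketch of the two residual block variants described above; the activation, normalization, and channel counts are our assumptions, not the authors' exact configuration:

```python
# Two 3D residual block variants: standard (block1/block3 style) and
# bottleneck (block2/block4 style). Both assume the input already has
# `filters` channels so the identity shortcut can be added directly.
from tensorflow.keras import layers

def residual_block_3d(x, filters):
    """Standard block: two 3x3x3 convolutions plus an identity shortcut."""
    shortcut = x
    y = layers.Conv3D(filters, 3, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.LeakyReLU(0.2)(y)
    y = layers.Conv3D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.Add()([shortcut, y])           # shortcut connection
    return layers.LeakyReLU(0.2)(y)

def bottleneck_block_3d(x, filters, reduction=4):
    """Bottleneck variant: 1x1x1 convolutions shrink and restore the
    channel count around the 3x3x3 convolution, cutting computation."""
    shortcut = x
    y = layers.Conv3D(filters // reduction, 1)(x)   # reduce channels
    y = layers.LeakyReLU(0.2)(y)
    y = layers.Conv3D(filters // reduction, 3, padding="same")(y)
    y = layers.LeakyReLU(0.2)(y)
    y = layers.Conv3D(filters, 1)(y)                # restore channels
    y = layers.Add()([shortcut, y])
    return layers.LeakyReLU(0.2)(y)
```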

Transfer learning

As shown in Fig. 1, the proposed 3D DCGAN was trained on the whole training set, including AD, MCI, and CN subjects. The 3D CNN classifier (D-classifier) shares the same convolutional architecture as D before the output layer, so it can reuse the representations learned during 3D DCGAN training. To achieve satisfactory classification performance in AD diagnosis, transfer learning was adopted: the output layer of the pre-trained D was replaced with a classification block (three fully connected layers). During the second round, the parameters in the convolution block and the first two residual blocks were kept unchanged, the weights of the remaining residual blocks were fine-tuned, and the fully connected layers were trained from scratch. The architecture of the D-classifier for the AD versus CN classification task is shown in Fig. 3a.

Figure 3. The network architecture of the D-classifier.

To exploit the supplementary knowledge learned from the AD versus CN classification task, a third round was adopted, in which the D-classifier from the AD versus CN classification was transferred to the classifications of AD versus MCI, MCI versus CN, and MCI conversion prediction. During the third round, the convolution block and the first residual block were frozen, the remaining residual blocks were fine-tuned, and the fully connected layers were trained from scratch. The architecture of the D-classifiers for the MCI-related binary classification tasks is shown in Fig. 3b; a sketch of the freezing scheme follows.
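A minimal sketch of the freezing scheme across the two supervised rounds; the layer names (e.g., "res_block2") are hypothetical placeholders, not the authors' identifiers:

```python
# Replace D's output block with a three-layer classification head and
# freeze the named blocks before fine-tuning.
from tensorflow.keras import layers, models

def build_d_classifier(pretrained_d, feature_layer, frozen_prefixes):
    features = pretrained_d.get_layer(feature_layer).output
    x = layers.GlobalAveragePooling3D()(features)
    x = layers.Dense(256, activation="relu")(x)
    x = layers.Dense(64, activation="relu")(x)
    out = layers.Dense(1, activation="sigmoid")(x)
    clf = models.Model(pretrained_d.input, out)
    for layer in clf.layers:
        if any(layer.name.startswith(p) for p in frozen_prefixes):
            layer.trainable = False            # skip weight updates
    return clf

# Round 2 (AD vs CN): freeze the conv block and residual blocks 1-2.
# clf = build_d_classifier(D, "res_block4",
#                          ["conv_block", "res_block1", "res_block2"])
# Round 3 (MCI tasks): reuse the AD-vs-CN weights, freezing only the
# conv block and residual block 1 before fine-tuning.
```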

Mapping disease regions

Making results logical and explainable is crucial in many CNN applications in medical imaging. Gradient-weighted class activation mapping (Grad-CAM) is a powerful tool for providing explainable visualizations of CNNs and can construct a visual explanation of any CNN's decision-making process. It uses the gradient information of the target class flowing into the last convolutional layer of the second residual block1, which contains the spatial information indicating the discriminative regions for classification. The class activation map concentrates on the discriminative image regions that the model used for classification. To illustrate the regions of interest for the CNN and carry out quantitative analysis, we generated a heatmap for each class (real neuroimaging, AD, and CN) by averaging the heatmaps of samples with prediction probabilities larger than 0.7.
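A minimal 3D Grad-CAM sketch, assuming a Keras model with a sigmoid output; the target layer name passed in is a hypothetical placeholder:

```python
# Grad-CAM for a 3D CNN: channel weights are the gradients of the class
# score, global-average-pooled over the three spatial dimensions.
import numpy as np
import tensorflow as tf

def grad_cam_3d(model, volume, target_layer, class_idx=0):
    grad_model = tf.keras.models.Model(
        model.input, [model.get_layer(target_layer).output, model.output])
    with tf.GradientTape() as tape:
        fmaps, preds = grad_model(volume[np.newaxis, ...])
        score = preds[:, class_idx]
    grads = tape.gradient(score, fmaps)
    weights = tf.reduce_mean(grads, axis=(1, 2, 3))    # GAP over x, y, z
    cam = tf.einsum("bxyzc,bc->bxyz", fmaps, weights)  # weighted feature sum
    cam = tf.nn.relu(cam)[0]                           # keep positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()

# Usage (hypothetical layer name):
# heatmap = grad_cam_3d(clf, gm_volume, "res_block2_conv2")
```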

Performance metrics

To evaluate the performance of our classifiers, the following metrics were calculated: accuracy, sensitivity, precision, and AUC. These metrics are formulated as follows:

$$Accuracy=\frac{TP+TN}{TP+TN+FN+FP}$$
(1)
$$Sensitivity=\frac{TP}{TP+FN}$$
(2)
$$Precision=\frac{TP}{TP+FP}$$
(3)

where TP, FP, TN, and FN are the numbers of true positives, false positives, true negatives, and false negatives, respectively. AUC is calculated based on the area under the receiver operating characteristic curve.
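For reference, Eqs. (1)-(3) translate directly into code; AUC is computed here with scikit-learn, which we assume to be available:

```python
# Accuracy, sensitivity, and precision from confusion counts (Eqs. 1-3);
# AUC from continuous scores rather than counts.
from sklearn.metrics import roc_auc_score

def binary_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fn + fp)   # Eq. (1)
    sensitivity = tp / (tp + fn)                 # Eq. (2)
    precision = tp / (tp + fp)                   # Eq. (3)
    return accuracy, sensitivity, precision

# auc = roc_auc_score(y_true, y_score)  # area under the ROC curve
```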

Ethical approval

The Ethics committees/institutional review boards that approved the ADNI study are: Albany Medical Center Committee on Research Involving Human Subjects Institutional Review Board, Boston University Medical Campus and Boston Medical Center Institutional Review Board, Butler Hospital Institutional Review Board, Cleveland Clinic Institutional Review Board, Columbia University Medical Center Institutional Review Board, Duke University Health System Institutional Review Board, Emory Institutional Review Board, Georgetown University Institutional Review Board, Health Sciences Institutional Review Board, Houston Methodist Institutional Review Board, Howard University Office of Regulatory Research Compliance, Icahn School of Medicine at Mount Sinai Program for the Protection of Human Subjects, Indiana University Institutional Review Board, Institutional Review Board of Baylor College of Medicine, Jewish General Hospital Research Ethics Board, Johns Hopkins Medicine Institutional Review Board, Lifespan—Rhode Island Hospital Institutional Review Board, Mayo Clinic Institutional Review Board, Mount Sinai Medical Center Institutional Review Board, Nathan Kline Institute for Psychiatric Research & Rockland Psychiatric Center Institutional Review Board, New York University Langone Medical Center School of Medicine Institutional Review Board, Northwestern University Institutional Review Board, Oregon Health and Science University Institutional Review Board, Partners Human Research Committee, Research Ethics Board Sunnybrook Health Sciences Centre, Roper St. Francis Healthcare Institutional Review Board, Rush University Medical Center Institutional Review Board, St. Joseph’s Phoenix Institutional Review Board, Stanford Institutional Review Board, The Ohio State University Institutional Review Board, University Hospitals Cleveland Medical Center Institutional Review Board, University of Alabama Office of the IRB, University of British Columbia Research Ethics Board, University of California Davis Institutional Review Board Administration, University of California Los Angeles Office of the Human Research Protection Program, University of California San Diego Human Research Protections Program, University of California San Francisco Human Research Protection Program, University of Iowa Institutional Review Board, University of Kansas Medical Center Human Subjects Committee, University of Kentucky Medical Institutional Review Board, University of Michigan Medical School Institutional Review Board, University of Pennsylvania Institutional Review Board, University of Pittsburgh Institutional Review Board, University of Rochester Research Subjects Review Board, University of South Florida Institutional Review Board, University of Southern California Institutional Review Board, UT Southwestern Institution Review Board, VA Long Beach Healthcare System Institutional Review Board, Vanderbilt University Medical Center Institutional Review Board, Wake Forest School of Medicine Institutional Review Board, Washington University School of Medicine Institutional Review Board, Western Institutional Review Board, Western University Health Sciences Research Ethics Board, and Yale University Institutional Review Board.

Consent to participate

Informed consent was obtained from all participants included in the study.

Experiment and results

Experiment implementation

All models in this work were implemented in Python 3.7.9 with Keras in TensorFlow 2.4 on a workstation with an Intel Xeon W-2223 CPU with 16 GB of RAM and an NVIDIA GeForce RTX 3090 GPU with 24 GB of memory.

In the DCGAN model, binary cross-entropy was used as the loss function. In the training phase, the batch size was set to 16, and the parameters were estimated with the Adam optimizer, using initial learning rates of 2 × 10⁻³ and 2 × 10⁻⁴ for G and D, respectively. As training epochs increased, the validation losses of both G and D converged to roughly constant values, and D’s accuracy hovered around 50%, indicating that the DCGAN had approached Nash equilibrium. The DCGAN was trained for 1000 epochs. Figure 4 shows the loss and accuracy of G and D on the validation set. We then adopted transfer learning for binary classification: the 3D D-classifier was trained with the Adam optimizer at an initial learning rate of 1 × 10⁻³, iteratively fine-tuning the weights via error back-propagation. The classifier’s loss was a weighted binary cross-entropy, with weights determined by the ratio of samples in the classes; a sketch of one adversarial training step follows Fig. 4.

Figure 4. Training process of the proposed 3D DCGAN: (a) the loss of D, (b) the loss of G, and (c) the output of D.
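A minimal sketch of one adversarial training step under the stated settings (binary cross-entropy, Adam with the learning rates above); this is not the authors' exact code and assumes D outputs a sigmoid probability:

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()   # D outputs probabilities
g_opt = tf.keras.optimizers.Adam(2e-3)       # G learning rate (paper)
d_opt = tf.keras.optimizers.Adam(2e-4)       # D learning rate (paper)

@tf.function
def train_step(G, D, real_volumes, latent_dim=100):
    z = tf.random.normal([tf.shape(real_volumes)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake = G(z, training=True)
        p_real = D(real_volumes, training=True)
        p_fake = D(fake, training=True)
        # D learns to score genuine sMRI as 1 and synthetic sMRI as 0.
        d_loss = (bce(tf.ones_like(p_real), p_real)
                  + bce(tf.zeros_like(p_fake), p_fake))
        # G learns to make D score its synthetic volumes as genuine.
        g_loss = bce(tf.ones_like(p_fake), p_fake)
    d_opt.apply_gradients(
        zip(d_tape.gradient(d_loss, D.trainable_variables),
            D.trainable_variables))
    g_opt.apply_gradients(
        zip(g_tape.gradient(g_loss, G.trainable_variables),
            G.trainable_variables))
    return g_loss, d_loss
```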

Impact of the frozen layers

Transfer learning lets us skip weight recalculation for frozen layers; because only part of the convolutional parameters are updated, fine-tuning is cheaper than learning from scratch. We therefore examined the performance of transfer learning while varying the number of frozen layers. Generally, the first blocks of the pre-trained D capture low-level image features such as boundaries, edges, and shapes, while subsequent layers capture features common to sMRIs. These common features are not fully effective for a specific task such as AD diagnosis; fine-tuning a given convolutional block of the D-classifier specializes it for the classification task at hand. In our experiments, we determined the optimal configuration by trial and error. The best performance was achieved when the blocks up to residual block2 were frozen and the remaining layers were fine-tuned (Table 4).

Table 4 Classification performance of the pre-trained D-classifier with different frozen blocks.

Impact of model dimensions

A 3D CNN can use all the information in a 3D sMRI, whereas 2D slices expose only part of it. However, the number of parameters in the fully connected layers is significantly greater in 3D CNNs than in 2D CNNs, and the likelihood of overfitting increases, so it is important to assess whether a 2D or 3D model is more appropriate for classifying AD. We therefore compared 2D and 3D DCGAN models. Table 5 shows the overall performance of the proposed method. The proposed 3D DCGAN-based model outperforms our previously proposed multi-slice 2D DCGAN-based classifier14, which obtained accuracies of 90.4%, 74.6%, 69.1%, and 66.7% for AD versus CN, AD versus MCI, MCI versus CN, and pMCI versus sMCI, respectively; the two studies shared the same populations and preprocessing steps. The better performance stems from the 3D convolution operations, which better exploit the richer spatial 3D contextual information in sMRI.

Table 5 Results of the proposed method in four binary classification tasks.

Impact of training strategy

Increasing the convolution dimension from two to three increases the number of parameters, and the more parameters a 3D CNN must learn, the larger the training set required to avoid overfitting. To get beyond dataset constraints, improvements in training strategy are required. Table 6 reports the classification accuracy of three 3D CNN architectures: two models (a VGG-like CNN and a D-classifier-like CNN) trained from scratch and one model (the D-classifier) trained with our three-round procedure. The classifier trained with the three-round procedure outperformed the 3D CNNs trained from scratch. Interestingly, the simpler architecture (VGG-like CNN) achieved better test results than the more complex one (the residual-network-based D-classifier-like CNN), as it is less prone to overfitting when trained from scratch. During DCGAN training, D focuses on image discrimination and guides G, which focuses on image generation, to create images with visual and statistical features similar to the training set; both networks thereby learn deep representations of the high-dimensional distribution of sMRI data. Combining unsupervised sMRI feature learning with feature transfer can thus boost classification performance on small to medium-sized training samples.

Table 6 Classification performance of three classifiers.

Impact of sample size

We also investigated the impact of sample size on classification performance. Keeping the same model design and the same test set, we reduced the training data to two-thirds or one-half of its original size and tested whether the smaller training set led to overfitting (Table 7). Halving the training sample had a relatively small impact on the 3D CNNs trained from scratch, with accuracy drops of 4.2% and 1.4% for the VGG-like and D-classifier-like 3D CNNs, respectively. In contrast, the reduction had a marked negative effect on the D-classifier, with half of the training data decreasing its accuracy by 9.2%. Nevertheless, the D-classifier outperforms the other models regardless of sample size, and its advantage grows as the amount of data increases. Taken together, these results suggest that the proposed learning procedure for the D-classifier is more beneficial for training a robust model when the sample size is small.

Table 7 Classification performance of three models’ comparison with varying sample size for AD versus CN.

Model's generalizability

Using the ADNI-2 dataset, we further assessed the model's generalizability. Owing to major scanner updates, ADNI-1 (1.5 T SPGR) and ADNI-2 (3 T MPRAGE) can be regarded as separate studies: at ADNI-2 sites with 3 T MRI scanners, sMRI images were acquired with a 3D MP-RAGE T1-weighted sequence. The test set consisted of baseline scans from 105 CN individuals and 187 AD patients. On this ADNI-2 test set, the model achieved accuracy, sensitivity, and precision of 85.6%, 87.1%, and 90.1%, respectively, suggesting that the proposed model has the potential to be used in broader studies.

Model's interpretability

To better understand the model and identify the brain regions driving classification, the 3D Grad-CAM approach was applied in this work. We generated average relevance heatmaps for each class after back-propagating the weights of the trained models. Heatmaps of intermediate slices from the coronal, axial, and sagittal planes are shown in Fig. 5. Figure 5a depicts the average heatmap of the DCGAN training set participants, and Fig. 5b,c show the average heatmaps of the AD patients and CN participants from the test set in the AD versus CN classification. As Fig. 5 shows, the heatmaps of real neuroimages obtained by the D model during unsupervised learning largely overlap, in their activation regions, with the heatmaps obtained by the D-classifier in the AD versus CN task. This suggests that the unsupervised pre-training of the DCGAN gives the AD classifier a good starting point, allowing the model to converge more quickly and resist overfitting better than training from scratch. In addition, there are differences between the heatmaps of the AD and CN groups: the AD heatmaps focus more on brain areas known to be associated with AD, such as the inferior and middle temporal gyri and the hippocampus.

Figure 5. Visualization of features recognized by the trained CNN: (a) the average heatmap of the DCGAN training set participants in the D model; (b) the average heatmap of the AD patients from the test set in the D-classifier; and (c) the average heatmap of the CN participants from the test set in the D-classifier.

Discussion

When dealing with 3D sMRI, the simplest approach is to treat a volume as a sequence of 2D slices and apply a pre-trained classic 2D CNN to them. However, this leads to discontinuous predictions across slices. An effective way to avoid this is to expand the CNN from 2D to 3D, but unlike 2D CNNs, which can be pre-trained on ImageNet, 3D CNNs lack a comparable large-scale pre-training source, and sMRI datasets are normally too small to train a 3D CNN from scratch. To address this challenge, we train the 3D CNN with a three-round learning procedure: unsupervised sMRI feature extraction followed by two rounds of transfer learning. The experimental results above show that pre-training the model with unsupervised adversarial learning and then fine-tuning it for the target problem provides an effective solution.

Significant research effort has gone in recent years into building predictive models of AD dementia from sMRI data using CNNs. Table 8 compares the proposed classifier with some state-of-the-art approaches for staging the AD spectrum reported in the literature, including one 2D CNN-based method and thirteen 3D CNN-based approaches (patch-level, ROI-level, and subject-level). It is worth noting that data leakage is a serious problem in AD research9: some published papers have used additional information that should not be available for AD diagnosis25, and articles that may have suffered from data leakage were excluded from the comparison. Because the studies used varying numbers of subjects, the results in Table 8 are not fully comparable; overfitting is more likely when the sample size is small. A rough comparison of the proposed approach with those studies yields two observations. First, our approach was trained on a relatively small sample, which carries a relatively higher risk of overfitting; nonetheless, three-round learning in a 3D CNN delivered performance comparable to those cutting-edge CNNs, demonstrating the effectiveness of the training procedure. Second, the proposed CNN performed very well in differentiating AD from CN, well in differentiating AD from MCI and MCI from CN, but relatively poorly in predicting MCI conversion to AD within 24 months. Compared with AD classification, MCI conversion prediction is more challenging, since structural atrophy caused by MCI may be subtle. The DCGAN model generated images with characteristic features of sMRI, such as the anatomical configuration of GM in the GMDM, mimicking the morphology and anatomy of the brain; however, owing to the difficulty of 3D training, the synthetic sMRIs typically have blurred anatomical boundaries and artificial texture. Unsupervised learning in DCGAN can only capture characteristics shared across sMRIs, so more effort should be put into GAN architecture design to capture the characteristics of subtle atrophy.

Table 8 Comparison of classification performance of state-of-the-art studies based on baseline sMRI data of ADNI.

The subject number represents the total number of subjects used in training, validation, and testing.

Our findings should be viewed in light of several limitations. First, DCGAN extracts informative features from sMRI in an unsupervised manner and can synthesize sMRI from those features; given a sufficiently large sample, a GAN may be able to learn the complete distribution of sMRI. During DCGAN training, however, we used only images from the ADNI-1 dataset, which cannot take full advantage of the GAN’s unsupervised mechanism. In the future, additional neuroimaging datasets34,35,36, even ones without AD subjects, could be added to DCGAN training, offering correspondingly better sMRI feature representation. Second, plain CNNs are employed for both the G and D models, and extending them could help the DCGAN capture more relevant features from sMRI. Many advances have been proposed in CNN architectures: residual blocks increase network depth, inception blocks extract multi-scale features, dense blocks improve information flow, and self-attention structures support semantic feature extraction. Replacing the regular convolution layers in G and D with such blocks or structures may improve the network’s ability to extract subtle anatomical features. However, DCGAN training depends on a dynamic balance between G and D: the stronger D becomes, the more severely G’s gradients vanish, and convergence of the cost functions may become unstable. Improved GAN designs, such as least squares GAN (LSGAN)37, Wasserstein GAN (WGAN)38, and energy-based GAN (EBGAN)39, could be adopted to improve performance and avoid vanishing gradients and mode collapse.

Conclusions

AD is recognized as an irreversible degenerative disease. Recently, deep learning methods, especially 3D CNNs, have been used for AD classification in neuroimaging with some success, but insufficient sample size and high-dimensional feature representations remain the main challenges for 3D CNNs40. In this paper, we proposed a three-round learning procedure for a DCGAN-based classifier to address the overfitting susceptibility of 3D models. To work with 3D sMRI, we extended the original 2D DCGAN to 3D by redesigning the architecture, improving the structure of D with residual blocks to mitigate vanishing gradients. After unsupervised training of the 3D DCGAN, D serves as an extractor of common sMRI features, and 3D Grad-CAM shows that it provides a good starting point for AD classification. The two-stage supervised transfer learning strategy accelerated learning and extracted more meaningful classification features for AD staging, with the AD-group heatmaps focusing on brain regions known to be associated with AD. The experimental results show that 3D CNN performance suffers when training samples are limited; when the training set is small, the model obtained with the three-round learning procedure has an advantage over models trained from scratch, and this advantage becomes more pronounced as the amount of data grows. The three-round learning strategy thus alleviates, to some extent, the overfitting problem in 3D model training. The designed model performs comparably to existing state-of-the-art approaches on small to medium-sized datasets, demonstrating its effectiveness.