Abstract
Early detection of Alzheimer's disease (AD) is critical because of its rising prevalence. AI-aided AD diagnosis has been studied for decades, and most such systems rely on deep learning with convolutional neural networks (CNNs). However, a few concerns remain: (a) little attention is paid to spatial features; (b) scale-invariant feature modelling is lacking; and (c) the convolutional spatial attention block (C-SAB) available in the literature exploits only a limited set of cues from its input features to obtain a spatial attention map, which needs to be enhanced. The proposed model addresses these issues in two ways, built on a backbone of multiple layers of depth-separable CNN. First, we propose an improved spatial convolution attention block (I-SAB) to generate an enhanced spatial attention map for the multilayer features of the backbone. The I-SAB, a modified version of the C-SAB, generates a spatial attention map by combining multiple cues from the input feature maps. This map is forwarded to multiple layers of depth-separable CNN for further feature extraction, and a skip connection is employed to produce the enhanced spatial attention map. Second, we combine the multilayer spatial attention features into scale-invariant spatial attention features that can address scale issues in MRI images. We report extensive experiments and ablation studies on two open-source datasets, OASIS and AD-Dataset. The proposed model outperforms existing state-of-the-art methods, with 99.75% and 96.20% accuracy on OASIS and AD-Dataset, respectively. A domain adaptation test on the OASIS dataset obtained 83.25% accuracy.
1 Introduction
Alzheimer's disease (AD) is a devastating, non-transmissible neurological condition marked by the loss of brain tissue and nerve cells. This cell death is attributed to tau proteins, which stabilise the microtubules; their disruption results in improper guidance or blockage of molecules and essential nutrients to the axons and causes dementia. Dementia impairs a person's thinking, behaviour, and social abilities. There are four stages of dementia caused by AD: non-demented, very mild, mild, and moderate. When AD progresses from very mild to moderate, its impact on the heart muscles or the respiratory system worsens severely and can even lead to death. The incidence of AD has increased significantly worldwide over the past few decades. Recently, the WHO [1] reported that AD accounts for 60% to 70% of the 55 million identified dementia cases. Several studies have been carried out to diagnose AD using different biomarkers such as blood samples, MRI scans, CT, or PET reports. Although there is currently no cure for AD, early-stage treatment with the right drug can lessen the disease's severity. Therefore, it is crucial to detect AD early to improve patients' quality of life.
Most techniques currently used to diagnose AD rely on time-consuming manual evaluation by medical professionals. Computer vision systems based on machine learning are also being developed to assist doctors in the diagnosis process. Many recent studies use machine learning methods, particularly deep learning, to offer a variety of alternatives for AD identification from MRI scans. Most state-of-the-art models are either transfer-learning-based [2,3,4,5,6,7] or non-transfer-learning-based [8,9,10,11,12,13,14,15,16,17,18]. VGG-16, AlexNet, ResNet, and GoogLeNet are among the models most frequently used in transfer-learning techniques for AD detection. In non-transfer-learning approaches, on the other hand, authors have built a number of models on deep learning architectures such as CNN, autoencoder, GAN, RNN, and LSTM. The main objective of all these techniques is to exploit high-level spatial features to improve the accuracy of AD detection. However, they pay no attention to where the important information lies within each of the high-level spatial feature maps extracted by the deep models. Although this drawback can be mitigated with a Convolution Spatial Attention Block (C-SAB), the C-SAB itself yields a limited spatial attention map (SAM): it relies only on max- and average-pooled values to obtain the SAM, which can be improved by adding other features. In addition, techniques that exploit scale-invariant features for AD detection from MRI scans are lacking. Thus, the shortcomings can be summarised as:
-
Lack of attention to high or low-level spatial features.
-
The existing spatial block attention mechanism suffers from limited representation as far as the generation of SAM is concerned.
-
The literature also lacks scale-invariant feature extraction for the diagnosis of AD.
Therefore, this study presents a deep learning solution for AD diagnosis that addresses the aforementioned shortcomings through multiscale feature modelling using an improved spatial attention-guided depth-separable CNN. The proposed model improves the Convolution Spatial Attention Block (C-SAB) by introducing an Improved Spatial Attention Block (I-SAB). The I-SAB is designed to enhance the spatial attention map by extracting several feature cues from the input and employing a stack of depth-separable CNN layers and a skip connection. The proposed I-SAB is plugged into different layers of a multilayer depth-separable CNN to obtain multiscale enhanced spatial attention maps. These maps are used to obtain spatially guided multiscale features. All such features are fused and concatenated to form scale-invariant features, which are fed into a multilayer perceptron that classifies the stages of dementia caused by AD. The contributions can be summarised as follows:
-
An I-SAB is designed to enhance the spatial attention map.
-
Multilayers of depth separable CNN are designed to exploit multiscale features.
-
The I-SABs are plugged into different layers of the backbone to extract enhanced spatial attention-guided multiscale features, which are used to predict AD using a multilayer neural network.
The remainder of the article is structured as follows: Sect. 2 discusses the literature review; Sect. 3 describes the proposed method and model; Sect. 4 illustrates dataset information and performance measures; Sect. 5 discusses experimental setup and results analysis; Sect. 6 illustrates the ablation study; and Sect. 7 presents the conclusion.
2 Literature Study
Machine learning approaches are used overwhelmingly for AD detection. Current research trends show the development of various deep learning models not only for AD detection but also in other research domains such as text classification using NLP [19], object detection [20], breast cancer detection [21], and so on. These models can be broadly categorised into two groups: transfer learning-based models and non-transfer learning-based models. The taxonomy of AD detection is illustrated in Fig. 1 and discussed in the following subsections.
2.1 Transfer Learning-Based AD Detection
Many existing transfer learning-based techniques use the weights of pre-trained baseline models such as VGG16, ResNet, GoogLeNet, AlexNet, Inception, and so on. To note a few, Shahwar et al. [5] extracted 512 feature vectors from ResNet34 and fed them to a quantum variational circuit for AD detection. In the study by Naz et al. [3], the authors used the frozen features of 11 pretrained baseline models for the detection of AD and found that VGG-16 performs better than the other models. Ghazal et al. [7] utilised AlexNet to exploit object-level features for determining the existence of AD. Deepa et al. [2] applied an arithmetic optimisation algorithm to a fully connected neural network for AD detection; the inputs to the neural network are the features of the pre-trained VGG-16, which yielded better detection accuracy. Knowledge transfer works best when the deep models were trained in a similar research domain, because their feature maps are then similar to those of the target task. The aforementioned approaches, however, aim to extract features from MRI images with pre-trained models that were trained on samples other than medical images, which can produce unreliable results.
To overcome such limitations, fine-tuning these models is of the utmost importance. Chui et al. [6] focused on extracting transfer learning-based features by fine-tuning the hyperparameters of a CNN-based network for AD detection; however, the detection accuracy still needs to be improved. Shamrat et al. [4] fine-tuned baseline pretrained architectures such as ResNet50, MobileNetV2, VGG16, AlexNet, and Inception V3 and found that Inception V3, with an accuracy of 96.32%, performs better than the other baseline models. However, the lack of enhanced attention-guided feature modelling and the limited performance remain major drawbacks of these baseline models.
2.2 Non-Transfer Learning-Based AD Detection
These models use a variety of architectures built with deep learning methods such as generative models, LSTM (RNN), and CNN. Owing to their remarkable ability to extract spatial semantic characteristics from MRI images, CNN-based architectures have been researched extensively. Murugan et al. [9] proposed a CNN topology with multiple layers and reported significantly higher accuracy on the balanced dataset. Orouskhani et al. [18] proposed a unified model using three deep networks, with an integrated VGG-16 structure as its backbone. Houria et al. [14] designed a stack of convolution layers for AD prediction. Bandyopadhyay et al. [11] designed an artificial neural network with multiple layers of perceptrons for AD detection; however, its detection accuracy needs to be improved. Recently, Lahmiri et al. [16] proposed a hybrid model that uses a CNN for feature extraction followed by a KNN with Bayesian optimisation for the classification of AD stages; however, the validation dataset is limited in sample size. In contrast, Abbas et al. [8] addressed the need to locate discriminant landmark areas for AD diagnosis and improved performance through a proposed Jacobian Domain CNN. Hajamohideen et al. [13] used a deep CNN with a Siamese foundation to detect AD and optimised the network with the triplet loss function. Sequential models have also been designed for AD detection. El-sappagh et al. [15] proposed an LSTM-based two-stage deep learning algorithm that greatly improved the detection accuracy. Lee et al. [10] suggested a multi-modal deep learning architecture employing a recurrent neural network to predict AD from several biomarkers.
These sequential models are intended for time-series or sequential data, whereas the available datasets consist of static MRI brain images.
Besides multi-layer CNN structures, autoencoders have been used in the literature to enhance accuracy by exploiting fine-grained abstract information. Ansingkar et al. [17] utilised a capsule encoder network and optimised it using a hybrid equilibrium method for the diagnosis of AD. Shi et al. [12] proposed a multiple loss-based autoencoder that is constrained by a GAN. Among the aforementioned deep learning techniques, the CNN-based approach has been the most widely used because of its capacity to exploit fine-grained contextual features. However, currently available CNN-based models perform poorly because they do not efficiently take advantage of enhanced spatial attention-guided multiscale features. The proposed model fills this gap and is described in the following section.
3 Proposed Method and Model
Multiscale feature modelling and feature attention mechanisms play an important role in solving difficult, non-linear classification problems by exploiting fine-grained features. One way to extract multiscale features is to fuse the multi-layer features of a CNN architecture, whereas feature attention can be realised with a spatial attention block. However, the following concerns arise:
-
Exploiting multiscale features using a CNN increases the computational complexity because of the additional convolution operations.
-
Using multiscale features without specifying "where the important features are" degrades performance.
-
The current spatial attention block mechanism needs to be changed to improve the quality of spatial attention feature maps.
The proposed model addresses the above concerns. The remainder of this section describes the proposed method and model under the following headings:
-
Network overview.
-
Efficient feature modelling using multilayers of depth-wise separable CNN.
-
Backgrounds of spatial attention mechanisms.
-
Improved spatial attention mechanism.
-
Improved spatial attention guided multiscale feature modelling.
-
Classification of AD stages and optimisation.
3.1 Network Overview
Figure 2 illustrates the design of the proposed model in detail. The entire architecture is built on multiple layers of depth-wise separable CNN (M-DSC). The M-DSC acts as the backbone of the model and takes MRI scans as input. The backbone consists of 10 depth-wise separable convolution (DSC) layers and 4 2D max-pooling layers. All pooling layers have a kernel shape of \(\left(2\times 2\right)\), and each is placed after a pair of DSC layers. The kernel shapes of the DSC layers, from the first to the tenth, are \(\left(7\times 7\right)\), \(\left(4\times 4\right)\), \(\left(3\times 3\right)\), \(\left(3\times 3\right)\), \(\left(3\times 3\right)\), \(\left(3\times 3\right)\), \(\left(3\times 3\right)\), \(\left(3\times 3\right)\), \(\left(2\times 2\right)\), and \(\left(2\times 2\right)\). The depth multiplier is set to 3 for the first DSC layer and 2 for the remaining ones.
The feature maps of every even-numbered DSC layer are fed into a proposed Improved Spatial Attention Block (I-SAB), as shown in Fig. 2. Five I-SABs are plugged into the backbone, one for each even-numbered DSC layer. The internal structure of the I-SAB is illustrated in Fig. 4.
The I-SAB contains three parallel pooling layers: max, average, and min pooling. The outputs of these pooling layers are concatenated and then merged by a Conv2D layer with sigmoid activation. The activated feature map is passed to a Feature Map Enhancement Module (FMEM) to obtain an enhanced spatial attention map. The FMEM includes two sigmoid-activated DSC layers, each with a kernel size of \(\left(2\times 2\right)\) and a depth multiplier of 2.
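The layer ordering described above can be sketched as a plain Python list. This is a minimal sketch rather than the authors' code: the function name `build_layer_plan` and the dictionary keys are hypothetical, and it assumes that the four pooling layers follow the first four pairs of DSC layers, which the text does not state explicitly.

```python
# Sketch of the backbone layout (illustrative only, not the authors' code).
# ASSUMPTION: the four 2x2 max-pooling layers follow the first four DSC pairs.
DSC_KERNELS = [7, 4, 3, 3, 3, 3, 3, 3, 2, 2]  # kernel sizes of DSC layers 1..10
NUM_POOLS = 4                                  # number of 2x2 max-pooling layers


def build_layer_plan(kernels=DSC_KERNELS, num_pools=NUM_POOLS):
    """Return an ordered list of layer descriptions for the backbone."""
    plan = []
    pools_used = 0
    for index, kernel in enumerate(kernels, start=1):
        plan.append({
            "type": "dsc",
            "index": index,
            "kernel": (kernel, kernel),
            # Depth multiplier is 3 for the first DSC layer and 2 for the rest.
            "depth_multiplier": 3 if index == 1 else 2,
            # Every even-numbered DSC layer feeds an I-SAB.
            "feeds_isab": index % 2 == 0,
        })
        # Place a pooling layer after a pair of DSC layers while pools remain.
        if index % 2 == 0 and pools_used < num_pools:
            plan.append({"type": "maxpool", "kernel": (2, 2)})
            pools_used += 1
    return plan


plan = build_layer_plan()
isab_taps = [layer["index"] for layer in plan if layer.get("feeds_isab")]
print(isab_taps)  # indices of the DSC layers whose outputs go to an I-SAB
```

The list makes the five attention tap points explicit and keeps the pooling placement, the one detail assumed here, in a single place where it is easy to change.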
The input of the FMEM is elementwise multiplied with the sigmoidal output of the second DSC layer via a skip connection (as illustrated in Fig. 4), and then a merging layer (sigmoidal Conv2D with kernel \(\left(1\times 1\right)\)) is applied. This merging layer's output is referred to as an "enhanced spatial attention map." The enhanced spatial attention map of each I-SAB is multiplied by the input of the I-SAB to get enhanced spatially attentive features, which are fused by a depth-wise separable CNN with kernel \(\left(1\times 1\right)\), followed by a ReLU activation layer.
All the fused features from the five I-SABs are flattened and concatenated to obtain improved multiscale features that are scale-invariant in nature. These features are densely connected to 256 ReLU neurons, followed by an output layer of 4 neurons, one each for very-mild demented (VMD (class-0)), mild-demented (MD (class-1)), moderate-demented (MoD (class-2)), and non-demented (ND (class-3)).
3.2 Efficient Feature Modelling Using Depth-wise Separable CNN
A specific type of convolutional layer used in the development of Convolutional Neural Networks (CNNs) is the depth-wise separable convolution (DSC) layer. The primary goal of the creation of such a layer is to increase the effectiveness of the standard CNN while reducing the number of parameters necessary to perform convolution operations.
The traditional CNN architecture carries out the convolution operation by sliding a kernel, also referred to as a filter, over the input volume and computing a dot product between the kernel and the spatially aligned region of the input. Mathematically, this convolution operation can be written as follows:
$$F_{i + 1} = AF\left( {F_{i} \otimes k_{i} } \right)$$(1)
Here \({F}_{i}\) is the input feature at the ith layer of a convolutional neural network, with \(\left[H\times W\right]\) and \(C\) as its spatial and channel dimensions, respectively. It contains real-valued numbers, thus \({F}_{i}\in {\mathbb{R}}^{H\times W\times C}\). In Eq. 1, the kernels at the ith layer are represented as \({k}_{i}\), comprising \(N\) kernels of size \(\left[h\times w\right]\), thus \({k}_{i}\in {\mathbb{R}}^{h\times w\times N}\). The function \(AF\left(.\right)\) denotes the activation function, and the symbol ⨂ represents the convolution operation. \({F}_{i+1}\) represents the output of the convolution operation, which is the input to the \({\left(i+1\right)}^{th}\) layer, and \({F}_{i+1}\in {\mathbb{R}}^{H\times W\times N}\).
A DSC layer, on the other hand, splits the ordinary convolution operation into depth-wise and point-wise convolutions. Here's how it works:
-
Depth-wise convolution: Here, the convolutional kernel is applied separately to each channel of the input volume. Instead of using a single kernel to compute the dot product across all channels, a separate kernel is used for each channel. This means that if the input volume (at the ith layer) has C input channels, there are C separate convolutional kernels \({k}_{{i}_{1}}, {k}_{{i}_{2}}, \dots , {k}_{{i}_{C}}\), where each \({k}_{{i}_{p}}\in {\mathbb{R}}^{h\times w\times 1}\). As the input feature at the ith layer, \({F}_{i}\), is composed of C channels, let each channel feature be represented by \({F}_{{i}_{p}}\), where p ranges from 1 to C. Let the features obtained by the depth-wise convolution be denoted by \({DF}_{i}\), which can be written as:
$$DF_{i} = Concate\;\left[ {F_{{i_{1} }} \otimes k_{{i_{1} }} ,F_{{i_{2} }} \otimes k_{{i_{2} }} , \ldots ,F_{{i_{C} }} \otimes k_{{i_{C} }} } \right] \in {\mathbb{R}}^{H \times W \times C}$$(2)
Here, the \({\text{Concate}}\) operation denotes the concatenation of the results of the channel-wise convolution operations. Note that the concatenation is performed along the channel dimension.
-
Point-wise convolution: After the depth-wise convolution, a 1 × 1 convolution (also known as point-wise convolution) is applied to the output of the depth-wise convolution, i.e., \({{\text{DF}}}_{{\text{i}}}\). The 1 × 1 convolution operates on the output channels of the depth-wise convolution, combining information across channels. Let the point-wise convolution be represented by the point kernels \({{\text{pk}}}_{{{\text{i}}}_{{\text{p}}}}\in {\mathbb{R}}^{1\times 1\times {\text{C}}}\), where p ranges from 1 to M, i.e., there are \({\text{M}}\) point-wise convolutions. Thus, the output feature of the ith layer is obtained as:
$${\text{F}}_{i + 1} = {\text{Concate}}\;\left[ {{\text{DF}}_{{\text{i}}} \otimes {\text{pk}}_{{{\text{i}}_{1} }} ,{\text{DF}}_{{\text{i}}} \otimes {\text{pk}}_{{{\text{i}}_{2} }} \ldots ,{\text{DF}}_{{\text{i}}} \otimes {\text{pk}}_{{{\text{i}}_{{\text{M}}} }} } \right] \in {\mathbb{R}}^{{{\text{H}} \times {\text{W}} \times {\text{M}}}}$$(3)
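Equations 2 and 3 can be traced with a small, dependency-free sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, zero padding with stride 1 is assumed so that the spatial size stays \(H\times W\), the depth multiplier is 1, and the activation function is omitted.

```python
def depthwise_conv(feature, kernels):
    """Eq. 2: convolve channel p of `feature` with its own kernel kernels[p].

    feature: nested list of shape [H][W][C]; kernels: C kernels of shape [h][w].
    ASSUMPTION: zero padding and stride 1, so the output is again H x W x C.
    As in common CNN libraries, the kernel is applied without flipping.
    """
    H, W, C = len(feature), len(feature[0]), len(feature[0][0])
    out = [[[0.0] * C for _ in range(W)] for _ in range(H)]
    for p in range(C):
        k = kernels[p]
        h, w = len(k), len(k[0])
        top, left = (h - 1) // 2, (w - 1) // 2
        for y in range(H):
            for x in range(W):
                total = 0.0
                for dy in range(h):
                    for dx in range(w):
                        yy, xx = y + dy - top, x + dx - left
                        if 0 <= yy < H and 0 <= xx < W:  # outside = zero padding
                            total += feature[yy][xx][p] * k[dy][dx]
                out[y][x][p] = total
    return out


def pointwise_conv(feature, point_kernels):
    """Eq. 3: each of the M point kernels (length C) mixes the C channels."""
    return [[[sum(v * wgt for v, wgt in zip(pixel, pk)) for pk in point_kernels]
             for pixel in row] for row in feature]


# Toy input: 3 x 3 spatial grid, C = 2 channels (values 1.0 and 2.0 everywhere).
feature = [[[1.0, 2.0] for _ in range(3)] for _ in range(3)]
ones3 = [[1.0] * 3 for _ in range(3)]                 # 3 x 3 kernel of ones
df = depthwise_conv(feature, [ones3, ones3])          # DF_i, shape 3 x 3 x 2
out = pointwise_conv(df, [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # M = 3
print(df[1][1], out[1][1])
```

The depth-wise step never mixes channels, so all cross-channel interaction is deferred to the cheap 1 × 1 step, which is where the savings over a standard convolution come from.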
The DSC effectively reduces the number of parameters and computations compared to standard convolution. For an \(h\times w\) kernel, the number of weights drops from \(h\times w\times {\text{C}}\times {\text{M}}\) in the standard convolution to \(h\times w\times {\text{C}}+{\text{C}}\times {\text{M}}\) in the depth-wise separable convolution, and the computational complexity drops correspondingly from \({\text{O}}\left({\text{H}}\times {\text{W}}\times h\times w\times {\text{C}}\times {\text{M}}\right)\) to \({\text{O}}\left({\text{H}}\times {\text{W}}\times {\text{C}}\times \left(h\times w+{\text{M}}\right)\right)\). Here, (H, W), C, and M represent the spatial, input channel, and output channel dimensions, respectively. Motivated by these advantages, we design a computationally efficient stack of depth-wise separable convolution layers as the backbone of the proposed network for fine-grained feature modelling. The details of the backbone network are given in Sect. 3.1.
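The saving can be checked with simple arithmetic. The sketch below counts weights and multiply-accumulate operations for one layer with the kernel size \(h\times w\) written out explicitly; the function names and the example dimensions are hypothetical, and biases are ignored, with stride 1, size-preserving padding, and a depth multiplier of 1 assumed.

```python
def standard_conv_cost(H, W, C, M, h, w):
    """Weights and multiply-accumulates of a standard convolution layer."""
    params = h * w * C * M          # one h x w x C kernel per output channel
    macs = H * W * params           # every weight is used at every position
    return params, macs


def separable_conv_cost(H, W, C, M, h, w):
    """Weights and multiply-accumulates of a depth-wise separable layer."""
    params = h * w * C + C * M      # depth-wise kernels + 1 x 1 point kernels
    macs = H * W * params
    return params, macs


# Hypothetical layer: 32 x 32 feature map, 16 -> 32 channels, 3 x 3 kernel.
std = standard_conv_cost(32, 32, 16, 32, 3, 3)
sep = separable_conv_cost(32, 32, 16, 32, 3, 3)
print(std, sep, round(std[0] / sep[0], 2))
```

For this hypothetical layer the separable variant needs 656 weights instead of 4608, roughly a sevenfold reduction, and the same ratio applies to the operation count.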
3.3 Backgrounds of Spatial Attention Mechanism
Multiscale features can be extracted by fusing the multilayer features of any backbone network [24]. However, before fusing the multiscale features, it is better to identify and highlight where the important features are in the individual feature maps. This can be done with a Convolution Spatial Attention Block (C-SAB), as described by Woo et al. [22]. The architectural details of the C-SAB are given in Fig. 3. The procedure to obtain a spatial attention map is as follows:
-
Max and average pooling are both applied along the channel axis of the depth-wise separable convolution features \({{\text{F}}}_{{\text{i}}}\); let the resulting features be denoted by \({{\text{F}}}_{{\text{M}}}\in {\mathbb{R}}^{{\text{H}}\times {\text{W}}\times 1}\) and \({{\text{F}}}_{{\text{A}}}\in {\mathbb{R}}^{{\text{H}}\times {\text{W}}\times 1}\), respectively.
-
These features are concatenated to form a tensor \(\left[{{\text{F}}}_{{\text{M}}};{{\text{F}}}_{{\text{A}}}\right]\).
-
A standard convolution layer with sigmoid activation is then applied to the fused features, producing an attention map of shape \(\left[{\text{H}}\times {\text{W}}\right]\). The spatial attention is computed as \({{\text{M}}}_{{\text{s}}}=\updelta \left({{\text{Conv}}}^{7\times 7}\left(\left[{{\text{F}}}_{{\text{M}}};{{\text{F}}}_{{\text{A}}}\right]\right)\right)\). Here, \(\updelta\) and \({{\text{Conv}}}^{7\times 7}\) denote the sigmoid function and the convolution operation with a filter size of 7 × 7, respectively, and \({M}_{{\text{s}}}\in {\mathbb{R}}^{{\text{H}}\times {\text{W}}}\).
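The three steps above can be sketched in a few lines of dependency-free Python. This is an illustration, not the reference implementation of the C-SAB: the helper names are hypothetical, zero padding is assumed, and the 7 × 7 kernel used in the example is a hand-picked stand-in for learned weights.

```python
import math


def channel_pool(feature, reducer):
    """Collapse the channel axis of an [H][W][C] tensor into an [H][W] map."""
    return [[reducer(pixel) for pixel in row] for row in feature]


def mean(values):
    return sum(values) / len(values)


def conv2d_same(maps, kernel):
    """Single-filter convolution over stacked [H][W] maps with zero padding.

    maps: list of K maps; kernel: [K][k][k] weights. Returns one [H][W] map.
    """
    H, W = len(maps[0]), len(maps[0][0])
    k = len(kernel[0])
    off = (k - 1) // 2
    out = [[0.0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            total = 0.0
            for c, m in enumerate(maps):
                for dy in range(k):
                    for dx in range(k):
                        yy, xx = y + dy - off, x + dx - off
                        if 0 <= yy < H and 0 <= xx < W:
                            total += m[yy][xx] * kernel[c][dy][dx]
            out[y][x] = total
    return out


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def spatial_attention(feature, kernel):
    """M_s = sigmoid(Conv([F_M; F_A])) for an [H][W][C] feature tensor."""
    f_max = channel_pool(feature, max)    # F_M
    f_avg = channel_pool(feature, mean)   # F_A
    fused = conv2d_same([f_max, f_avg], kernel)
    return [[sigmoid(v) for v in row] for row in fused]


# Toy 2 x 2 x 3 feature tensor and a stand-in 7 x 7 kernel whose only
# non-zero weight is the centre tap (value 1) for each of the two maps.
feature = [[[1.0, 0.0, 2.0], [0.0, 0.0, 0.0]],
           [[3.0, 3.0, 3.0], [-1.0, 1.0, 0.0]]]
kernel = [[[1.0 if (dy == 3 and dx == 3) else 0.0 for dx in range(7)]
           for dy in range(7)] for _ in range(2)]
attention = spatial_attention(feature, kernel)
print(attention)
```

With the centre-tap kernel each output value reduces to the sigmoid of max plus average at that pixel, which makes the result easy to verify by hand; a trained kernel would also weigh the neighbouring positions.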
This approach to obtaining attention maps has been applied in many research domains, but not yet to AD detection. In addition, the C-SAB can be improved to obtain a better spatial attention map by infusing more features into the map. Section 3.4 describes the details of such an improved spatial attention mechanism.
3.4 Improved Spatial Attention Mechanism for Multiscale Features
The basis behind proposing an improved spatial attention mechanism for multiscale features relies on two key observations from the basic C-SAB structure.
-
First, the spatial attention map \({M}_{{\text{s}}}\) depends solely on the maximum and average pool information from the input feature maps. However, minimum pool properties can also play an important role in providing better discriminant features and therefore should not be discarded. So, three pooling features are obtained from the input feature maps and concatenated. The fused tensor is thus \(\left[{{\text{F}}}_{{\text{M}}};{{\text{F}}}_{{\text{A}}};{{\text{F}}}_{{\text{Min}}}\right]\), where \({{\text{F}}}_{{\text{M}}}\), \({{\text{F}}}_{{\text{A}}}\), and \({{\text{F}}}_{{\text{Min}}}\) are the max, average, and min pool features.
-
Second, the attention map obtained from basic C-SAB can also be improved further by processing and exploiting fine-grained features through multilayers of depth-wise separable convolution layers.
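A two-pixel toy example illustrates why the min-pooled cue mentioned in the first observation is not redundant. The channel values below are made up for illustration and the helper name `cues` is hypothetical: the two pixels share the same maximum and the same average across channels, so a map built only from those two cues cannot tell them apart, whereas the minimum can.

```python
def cues(pixel):
    """Channel-wise max, average, and min of one pixel's channel vector."""
    return {
        "max": max(pixel),
        "avg": sum(pixel) / len(pixel),
        "min": min(pixel),
    }


pixel_a = [1.0, 0.0, 0.5]    # hypothetical channel responses at one location
pixel_b = [1.0, 0.25, 0.25]  # different responses, same max and average

cues_a, cues_b = cues(pixel_a), cues(pixel_b)
print(cues_a)
print(cues_b)
```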
Addressing the above points improves the spatial attention mechanism and thereby enhances the model's ability to focus on the relevant spatial features across the different feature scales of the proposed backbone. This objective is met in two steps:
-
First, improving the spatial attention mechanism, and
-
Second, plugging this mechanism into several layers of the backbone to focus on the relevant spatial features.
The following explains the former point, and the latter is explained in Sect. 3.5.
An improved spatial attention block (I-SAB) is designed that accumulates the aforementioned important features. The details of the I-SAB are shown in Fig. 4. The maximum, minimum, and average pooling features are extracted from the input feature maps, combined into a fused tensor, and passed to a sigmoidal Conv2D layer. The sigmoidal Conv2D layer contains one filter and produces a spatially attentive feature map \(\in {\mathbb{R}}^{H\times W}\). This feature map is then given to a Feature Map Enhancement Module (FMEM) to obtain an improved spatial attention map. The FMEM contains two consecutive sigmoidal depth-wise separable convolution layers with a depth multiplier of two. These two layers produce a fine-grained spatial attention map from their input. The input to the FMEM is multiplied with the fine-grained features through a skip connection, as can be observed in Fig. 4. The enhanced features are then merged by a sigmoidal Conv2D layer to obtain an improved spatial attention map (ISAM). Mathematically, we can represent all these operations as follows:
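One way to write these operations down, consistent with the description above, is sketched below. The symbols \(A\), \(E\), \({\text{DSC}}_{1}\), \({\text{DSC}}_{2}\), and \(\odot\) (element-wise product) and the kernel size \(k\times k\) of the first convolution are notation assumed here for illustration rather than taken from the source; \(\updelta\) is the sigmoid function as in Sect. 3.3. The channel counts of the two DSC layers are not specified, so \(A\) is assumed to broadcast across the channels of \(E\) where needed.

$$A = \updelta \left( {{\text{Conv}}^{k \times k} \left( {\left[ {{\text{F}}_{{\text{M}}} ;{\text{F}}_{{\text{A}}} ;{\text{F}}_{{{\text{Min}}}} } \right]} \right)} \right) \in {\mathbb{R}}^{H \times W}$$

$$E = \updelta \left( {{\text{DSC}}_{2}^{2 \times 2} \left( {\updelta \left( {{\text{DSC}}_{1}^{2 \times 2} \left( A \right)} \right)} \right)} \right)$$

$${\text{ISAM}} = \updelta \left( {{\text{Conv}}^{1 \times 1} \left( {A \odot E} \right)} \right)$$

Read in order, the first line is the three-cue counterpart of the C-SAB map \({M}_{{\text{s}}}\), the second is the FMEM refinement, and the third combines the skip connection with the final merging layer.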
3.5 Improved Spatial Attention Guided Multiscale Feature Modelling
Five I-SAB modules are plugged into the network in parallel; they take the feature maps of the 2nd, 4th, 6th, 8th, and 10th depth-wise separable convolution layers, respectively. As shown in Fig. 2, the feature maps of each even-numbered DSC layer are element-wise multiplied, through a skip connection, with the output of the corresponding I-SAB to form the improved spatially attentive feature maps (ISAF) of that layer of the backbone. The ISAFs are then merged through a depth-wise separable convolution with a kernel shape of \(1\times 1\) and a depth multiplier of 1. All the merged enhanced multilayer features are flattened and fused to form attentive multiscale, or scale-invariant, features. These features are given to a multilayer neural network (MNN) for AD stage classification. The MNN consists of one hidden layer of 256 ReLU neurons and one output layer of 4 neurons. The output layer of the MNN employs SoftMax activation. ReLU serves as the activation function for the backbone layers.
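The flatten-and-fuse step can be illustrated with a short sketch. The five feature-map shapes used below are hypothetical placeholders, since the actual shapes depend on the input resolution and padding, and the helper names are not from the source.

```python
def flatten(feature):
    """Flatten an [H][W][C] nested list into a flat list of values."""
    return [v for row in feature for pixel in row for v in pixel]


def fuse_multiscale(features):
    """Concatenate the flattened feature maps of all attention branches."""
    fused = []
    for feature in features:
        fused.extend(flatten(feature))
    return fused


def make_map(h, w, c, value=1.0):
    """Build a constant [h][w][c] tensor to stand in for a merged ISAF."""
    return [[[value] * c for _ in range(w)] for _ in range(h)]


# HYPOTHETICAL shapes for the five branches (coarser maps deeper in the net).
shapes = [(16, 16, 2), (8, 8, 2), (4, 4, 2), (2, 2, 2), (1, 1, 2)]
branches = [make_map(h, w, c) for h, w, c in shapes]
fused = fuse_multiscale(branches)

# Weights plus biases of the 256-neuron hidden layer fed by the fused vector.
hidden_params = len(fused) * 256 + 256
print(len(fused), hidden_params)
```

Because every branch contributes its own resolution to one vector, the classifier sees coarse and fine spatial evidence side by side; the size of the first dense layer grows linearly with the fused length.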
3.6 Classification of AD Stages and Optimization
The output layer of the proposed model has four neurons, which predict the Non-Demented, Very-Mild-Demented, Mild-Demented, and Moderate-Demented classes. The activation function of the output layer is SoftMax. Let \(X=\left[{x}_{1},{x}_{2},\dots ,{x}_{N}\right]\) be the set of predicted output scores for the N samples, and let \(S=\left[{s}_{1},{s}_{2},\dots ,{s}_{N}\right]\) be the set of ground-truth labels of the samples. For \(i=1,2,\dots ,N\), \(x_{i} = \left[ {x_{{i_{1} }} ,x_{{i_{2} }} ,x_{{i_{3} }} ,x_{{i_{4} }} } \right]\) holds the predicted scores for the four classes and \(s_{i} = \left[ {s_{{i_{1} }} ,s_{{i_{2} }} ,s_{{i_{3} }} ,s_{{i_{4} }} } \right]\) is the one-hot ground-truth vector. Let \(\varnothing\) denote the trainable parameters of the proposed network. The model is trained by minimising the categorical cross-entropy loss between the predicted and ground-truth scores of mini-batches of samples. The categorical cross-entropy loss of the \({t}^{th}\) batch of samples (\(t = 1\) to \(\left\lceil {\frac{N}{b}} \right\rceil\), where \(b\) is the batch size) is calculated using the following equation:
For the network parameters \(\varnothing\), the loss of the \({t}^{th}\) batch is minimised, i.e., \(\mathop {argmin}\limits_{\varnothing } \;Loss_{t}\), using the Adaptive Moment Estimation (Adam) optimizer [25].
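The categorical cross-entropy of one mini-batch can be computed as in the sketch below. It uses the standard definition and assumes that the loss is averaged over the samples of the batch, which the text does not specify; the function names, the clipping constant `eps`, and the example values are hypothetical.

```python
import math


def batch_cross_entropy(predicted, targets, eps=1e-12):
    """Mean categorical cross-entropy over one mini-batch.

    predicted: list of per-sample class-probability vectors (x_i).
    targets:   list of one-hot ground-truth vectors (s_i).
    ASSUMPTION: the batch loss is the average of the per-sample losses.
    """
    total = 0.0
    for x_i, s_i in zip(predicted, targets):
        # Clip probabilities away from zero so that log() stays finite.
        total -= sum(s * math.log(max(x, eps)) for x, s in zip(x_i, s_i))
    return total / len(predicted)


def num_batches(n_samples, batch_size):
    """Number of mini-batches, i.e. ceil(N / b)."""
    return math.ceil(n_samples / batch_size)


# Two hypothetical samples with four classes each.
predicted = [[0.7, 0.1, 0.1, 0.1], [0.25, 0.25, 0.25, 0.25]]
targets = [[1, 0, 0, 0], [0, 0, 1, 0]]
loss = batch_cross_entropy(predicted, targets)
print(round(loss, 4), num_batches(100, 32))
```

Only the probability assigned to the true class contributes to each sample's loss, so a confident correct prediction (0.7) is penalised far less than an uninformative one (0.25).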
4 Dataset, Experimental Setup and Performance Measures
4.1 Stats of Datasets
The Alzheimer's disease dataset published on Kaggle [26] and OASIS-1 [27] are the two publicly accessible datasets that we used. The AD dataset [26] is made up of MRI scans from patients in four dementia classes: very mild, mild, moderate, and non-demented. The resolution of the MRI scans is \(\left[176\times 208\right]\). The dataset is described in Table 1. The split into training and testing sets is adopted from the work of Murugan et al. [9], where 10% of the samples, chosen at random, are used for testing.
The OASIS dataset contains samples of 436 subjects, but only 399 samples can currently be downloaded from OASIS-1. The sample distribution of the OASIS Category 1 dataset is shown in Table 1. As the number of samples in this dataset is very limited, we used two-fold cross-validation for training and testing, following Chui et al. [6]. Figure 5 shows some samples from both the AD [26] and OASIS datasets.
4.2 Experimental Setup
The proposed model is coded in Python using Keras layers with TensorFlow as the backend. The code is executed on the Google Colab platform. The hyperparameters, namely the kernel regularisation parameter, learning rate, and batch size, were set to 0.01, 0.01, and 32, respectively. To prevent overfitting, the model adopts early stopping and L2 regularisation. The patience parameter of early stopping was set to 10 epochs, and the maximum number of training epochs was 1000.
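The stopping rule implied by these settings can be sketched without any framework. The dictionary keys and the function name are hypothetical, and monitoring the validation loss is an assumption, since the monitored quantity is not stated; the loss values in the example are made up.

```python
# Settings quoted in the text; the key names are hypothetical.
CONFIG = {
    "l2_factor": 0.01,       # kernel regularisation parameter
    "learning_rate": 0.01,
    "batch_size": 32,
    "patience": 10,          # early-stopping patience in epochs
    "max_epochs": 1000,
}


def early_stopping_epochs(val_losses, patience, max_epochs):
    """Return (stopped_epoch, best_epoch) for a sequence of per-epoch losses.

    ASSUMPTION: the monitored quantity is the validation loss, and training
    stops once it has failed to improve for `patience` consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best_loss:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return min(len(val_losses), max_epochs), best_epoch


# Made-up loss curve: improves for 5 epochs, then plateaus.
losses = [1.0, 0.8, 0.6, 0.5, 0.4] + [0.45] * 20
stopped, best = early_stopping_epochs(
    losses, CONFIG["patience"], CONFIG["max_epochs"])
print(stopped, best)
```

With a patience of 10, training on this curve ends ten epochs after the last improvement, so the 1000-epoch limit acts only as an upper bound.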
4.3 Performance Measures
Accuracy, precision, recall, and F1-Score are the performance metrics utilised in this article. The following are the detailed descriptions of these metrics.
Here, the terms TP, TN, FP, and FN stand for true positive, true negative, false positive, and false negative, respectively. These measures can be obtained from the confusion matrix given in Fig. 6.
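The sketch below shows how these four metrics follow from a multi-class confusion matrix using the standard definitions: precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as the harmonic mean of the two. Macro averaging over the classes and the row/column convention (rows are true classes, columns are predictions) are assumptions, the function names are hypothetical, and the matrix values are made up.

```python
def per_class_counts(cm, k):
    """TP, TN, FP, FN of class k; rows = true class, columns = prediction."""
    tp = cm[k][k]
    fn = sum(cm[k]) - tp
    fp = sum(row[k] for row in cm) - tp
    tn = sum(map(sum, cm)) - tp - fn - fp
    return tp, tn, fp, fn


def macro_metrics(cm):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    n = len(cm)
    total = sum(map(sum, cm))
    accuracy = sum(cm[k][k] for k in range(n)) / total
    precisions, recalls, f1s = [], [], []
    for k in range(n):
        tp, _, fp, fn = per_class_counts(cm, k)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Made-up 4-class confusion matrix (10 test samples per class).
cm = [[8, 2, 0, 0],
      [1, 9, 0, 0],
      [0, 0, 10, 0],
      [0, 1, 0, 9]]
accuracy, precision, recall, f1 = macro_metrics(cm)
print(accuracy, round(precision, 4), round(recall, 4), round(f1, 4))
```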
5 Analysis of Results
The results of the proposed model on the two datasets are analysed below.
5.1 For AD Dataset
On the AD dataset, the proposed framework achieves an accuracy of 96.25%. Its recall, precision, and F1-score are 96.36%, 96.71%, and 96.52%, respectively. The confusion matrix heat map of the predictions on this dataset is given in Fig. 7. The model performs best when diagnosing MoD samples and is least effective at detecting MD samples. The AUC of the proposed model is 99.29. Additionally, Table 2 compares the proposed model's performance with current state-of-the-art methods. Recent deep models such as Deep ConvNet [28], DCNN-VGG19 [29], Inception-V4 [30], ADDTLA [7], DEMNET [9], and Landmark Feature Modelling [31] are included in this comparison. All these models exploit spatial features without any attention mechanism for those features. With an accuracy of 91.70%, ADDTLA takes second position in Table 2 by adopting a transfer learning approach. Deep ConvNet [28], DEMNET [9], Landmark Feature Modelling [31], DCNN-VGG19 [29], and Inception-V4 [30] obtained accuracies of 90.4%, 85.00%, 79.02%, 77.66%, and 73.75%, respectively, on this dataset. The proposed model, with its improved spatial attention mechanism and enhanced multiscale features, obtained an accuracy of 96.25%, the highest in Table 2. It also outperforms the most recent models in terms of recall, precision, and F1-score. Hence, the proposed model addresses the identified challenges well and performs well on the AD dataset available on Kaggle.
5.2 For the OASIS Dataset
The proposed model's accuracy on the OASIS dataset is 99.75%. The model has a 0.25% error rate, 99.63% precision, 99.91% recall, and 99.77% F1-score. Figure 8 depicts the confusion matrix of the predicted diagnosis labels, and the ROC curves are illustrated in Fig. 9. The proposed model performs well in predicting all the test labels; only one non-demented (ND) sample is misclassified. In addition, the ROC of the proposed model on this dataset is shown in Fig. 10, with an AUC score of 99.99%. The model's performance is also compared with current state-of-the-art methods that used the OASIS dataset, as shown in Table 3. These include Deep Net [32], Ensemble Hybrid Deep Net [33], CNN + Optimal KNN with BO [16], Ensemble-Deep CNN [34], Conv-TL [6], and ANN [11]. Deep Net [32] takes second place in Table 3 with an accuracy of 99.68% by exploiting deep spatial features from its input, whereas GBM-ResNet-50 [36] takes third place with an accuracy of 98.99%. Ensemble approaches using deep learning have shown promising results: Ensemble-Hybrid Deep Net [33] and Ensemble-Deep CNN [34] achieved accuracies of 95.23% and 93.18%, respectively. The multilayer perceptron-based ANN [11] achieved 92.00% accuracy on this dataset. The transfer learning-based approaches CNN + Optimal KNN [16] and Conv-TL [6] achieved accuracies of 94.96% and 93.80%, respectively. Gupta et al. [35] achieved 74.90% accuracy on the OASIS dataset, the lowest in Table 3. All of these methods exploited spatial features without placing emphasis on the spatial features that matter most for AD detection. The proposed model, with its improved spatial attention mechanism, tops Table 3 with an accuracy of 99.75%.
Thus, the proposed model addresses the identified research gaps and achieves better performance than the other methods.
6 Ablation Study and Generalisation Test
This section covers the ablation study and the generalisation test performed on the model.
6.1 Ablation Study
In addition to the results analysis, an ablation study was conducted on the individual components of the proposed model and their combinations. The aim is to show the contribution of each component. For this purpose, the following variants of the model are defined:
- Model-1: Proposed model with SAB instead of I-SAB. In this model, the proposed I-SAB is replaced by the conventional SAB [22] and the rest of the model remains the same. The purpose is to understand and analyse the behaviour of the conventional SAB.
- Model-2: Proposed model with four scales only. This model does not contain the first I-SAB.
- Model-3: Proposed model with three scales. It does not contain the first two I-SABs.
- Model-4: Proposed model with two scales. It does not contain the first three I-SABs.
- Model-5: Proposed model with one scale. It contains only the last I-SAB.
- Model-6: Proposed model without I-SAB and multilayer feature modelling. It contains only the backbone of the proposed model.
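The six variants above can be written down as a small configuration table. The sketch below is illustrative only: the total of five attention-equipped scales is an assumption inferred from the list (the four-scale variant drops only the first I-SAB, and the one-scale variant keeps only the last), and the names `AblationConfig`, `last_scales`, and `ABLATION_CONFIGS` are hypothetical.

```python
from dataclasses import dataclass

# Assumption: the full model carries an attention block at five backbone stages.
NUM_SCALES = 5

@dataclass(frozen=True)
class AblationConfig:
    name: str
    attention: str        # "I-SAB", "SAB", or "none"
    active_scales: tuple  # 1-based stage indices that carry an attention block

def last_scales(n):
    """Return the indices of the last n stages, e.g. n=4 -> (2, 3, 4, 5)."""
    return tuple(range(NUM_SCALES - n + 1, NUM_SCALES + 1))

ABLATION_CONFIGS = [
    AblationConfig("Proposed", "I-SAB", last_scales(5)),
    AblationConfig("Model-1", "SAB", last_scales(5)),    # conventional SAB everywhere
    AblationConfig("Model-2", "I-SAB", last_scales(4)),  # first I-SAB removed
    AblationConfig("Model-3", "I-SAB", last_scales(3)),  # first two I-SABs removed
    AblationConfig("Model-4", "I-SAB", last_scales(2)),  # first three I-SABs removed
    AblationConfig("Model-5", "I-SAB", last_scales(1)),  # only the last I-SAB
    AblationConfig("Model-6", "none", ()),               # backbone only
]

for cfg in ABLATION_CONFIGS:
    print(f"{cfg.name}: attention={cfg.attention}, scales={cfg.active_scales}")
```

Keeping the variants in one table like this makes it easy to check that each ablation differs from the full model in exactly one respect.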
6.1.1 Ablation Study on the AD Dataset
Table 4 presents quantitative results for the six models described above on the AD dataset. Accuracy, error rate, precision, recall, and F1-score are used for the comparison. The confusion matrices of these models are shown in Figs. 11, 12, 13, 14, 15 and 16. The model with the conventional SAB (Model-1) has an accuracy of 94.06%, which is lower than that of the proposed model. The contribution of multiscale features to AD detection is examined through Models 2 to 5, whose accuracies are given in Table 4. Performance decreases gradually as fewer scales are included. The contribution of the backbone alone is examined through Model-6, which has an accuracy of 92.03%. Thus, combining all of these components is important for achieving better performance.
6.1.2 Ablation Study on the OASIS Dataset
The results of the ablation study and the quantitative comparison of the six models on the OASIS dataset are given in Table 5. The backbone of the proposed model (i.e., Model-6) has an accuracy of 94.25%, which is far lower than that of the proposed model. The model with SAB (Model-1) achieved an accuracy of 97.75% on the OASIS dataset, whereas the proposed model achieved 99.75%. This shows that the improved spatial attention block improves the detection accuracy by two percentage points. The multiscale models 4-Scale (Model-2), 3-Scale (Model-3), 2-Scale (Model-4), and 1-Scale (Model-5) achieved accuracies of 99.00%, 97.25%, 96.00%, and 95.25%, respectively, compared with 94.25% for the backbone alone. This demonstrates that the model's performance declines as the number of scales is reduced. The confusion matrices of all six models are shown in Figs. 17, 18, 19, 20, 21 and 22. We conclude from this study that the proposed model performs well in all aspects by using multiscale improved spatial attention features for AD detection on the OASIS dataset.
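To make the trend easier to see, the following sketch computes each variant's accuracy drop relative to the proposed model on OASIS, using only the figures quoted in the text. The assignment of the four multiscale accuracies to Models 2 to 5 follows the order in which they are listed, which is an assumption; Table 5 remains the authoritative source.

```python
# Accuracies (%) on OASIS as quoted in the text; the mapping of the
# multiscale values to Models 2-5 assumes they are listed in order.
PROPOSED_ACCURACY = 99.75
reported_accuracy = {
    "Model-1 (SAB)": 97.75,
    "Model-2 (4-Scale)": 99.00,
    "Model-3 (3-Scale)": 97.25,
    "Model-4 (2-Scale)": 96.00,
    "Model-5 (1-Scale)": 95.25,
    "Model-6 (backbone only)": 94.25,
}

# Drop in percentage points relative to the full model.
accuracy_drop = {
    name: round(PROPOSED_ACCURACY - acc, 2)
    for name, acc in reported_accuracy.items()
}

# Print variants from the smallest to the largest drop.
for name, drop in sorted(accuracy_drop.items(), key=lambda item: item[1]):
    print(f"{name}: -{drop:.2f} percentage points")
```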
6.2 Generalisation Test
A cross-dataset generalisation test is also conducted to examine how the model behaves under domain shift. In this test, the proposed model is trained on the AD dataset and tested on the OASIS dataset. The proposed model obtained an accuracy of 83.95% in this test. The confusion matrix of this test is shown in Fig. 23.
The confusion matrix shows that the proposed model does not classify the moderately demented (MoD) class. The prediction accuracies for VMD and MD samples are nearly the same, at 71.20% and 69.20%, respectively. The performance on ND samples is higher, at 88.20%. The main reason could be the scarcity of MoD, MD, and VMD samples. Nevertheless, the proposed model retains an accuracy of 83.95% in this generalisation test. Future work will focus on improving this result by proposing an advanced domain generalisation model.
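Per-class figures such as those quoted above are obtained by dividing each diagonal entry of the confusion matrix by its row total. The sketch below shows this calculation, including the case of a class with no test samples, for which no accuracy can be reported. The counts are hypothetical and are not taken from Fig. 23; the function name `per_class_accuracy` is likewise illustrative.

```python
import numpy as np

def per_class_accuracy(cm, class_names):
    """Fraction of each true class that was predicted correctly.

    cm[i, j] is the number of samples of true class i predicted as class j.
    Classes with no test samples get None instead of a number.
    """
    cm = np.asarray(cm)
    result = {}
    for k, name in enumerate(class_names):
        support = cm[k].sum()  # number of test samples of class k
        result[name] = None if support == 0 else float(cm[k, k] / support)
    return result

# Hypothetical counts for illustration only (rows: true class, columns: predicted).
classes = ["ND", "VMD", "MD", "MoD"]
hypothetical_cm = [
    [90, 8, 2, 0],
    [20, 70, 10, 0],
    [10, 20, 70, 0],
    [0, 0, 0, 0],  # no MoD samples in this hypothetical test set
]
print(per_class_accuracy(hypothetical_cm, classes))
```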
7 Conclusion
This article has presented a deep learning-based technique for the diagnosis of AD. The model's framework is built from multiple depth-wise separable convolution layers. The depth-wise separable CNN is preferred over a conventional CNN because of its lower computational cost. The model exploits improved spatial attention guided multiscale spatial features for AD detection. The conventional spatial attention mechanism is limited in its ability to produce a good attention map, and this is addressed by the proposed improved spatial attention block (I-SAB). The details of the I-SAB are given in Sect. 3. Multiple I-SABs are plugged into multiple layers of the backbone (illustrated in Fig. 2) to provide improved spatially attentive multilayer features. These features are fused and passed to a multilayer perceptron for disease classification. The behaviour of the model is demonstrated through experiments on two publicly available AD datasets, namely the AD dataset available on Kaggle and the OASIS dataset. The proposed model achieves accuracies of 99.75% and 96.25% on the OASIS and Kaggle datasets, respectively. These results outperform the existing models. An ablation study is also carried out on the proposed model. Six sub-models are derived from the proposed model, and their quantitative results are given in Tables 4 and 5. This study shows that the proposed model outperforms the model with the conventional SAB, and that the fusion of multiscale features is also important for obtaining better accuracy. Additionally, a generalisation test was carried out in which the model was trained on the Kaggle dataset and tested on the OASIS dataset. This test yields an accuracy of 83.95%. Thus, the proposed model performs well in all aspects, but the generalisation accuracy still needs to be improved, which will be the focus of our future research.
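The lower cost of depth-wise separable convolution mentioned above can be illustrated by counting weights. A standard k x k convolution needs k·k·C_in·C_out weights, whereas a depth-wise separable one needs k·k·C_in (depth-wise step) plus C_in·C_out (1 x 1 point-wise step). The sketch below uses this well-known count; the layer sizes are hypothetical and not taken from the article, and bias terms are ignored.

```python
def standard_conv_params(kernel_size, in_channels, out_channels):
    """Weights in a standard 2D convolution (bias ignored)."""
    return kernel_size * kernel_size * in_channels * out_channels

def depthwise_separable_params(kernel_size, in_channels, out_channels):
    """Weights in a depth-wise conv followed by a 1x1 point-wise conv (bias ignored)."""
    depthwise = kernel_size * kernel_size * in_channels  # one k x k filter per input channel
    pointwise = in_channels * out_channels               # 1x1 conv that mixes channels
    return depthwise + pointwise

# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels.
k, c_in, c_out = 3, 64, 128
standard = standard_conv_params(k, c_in, c_out)
separable = depthwise_separable_params(k, c_in, c_out)
print(f"standard: {standard}, separable: {separable}, ratio: {separable / standard:.3f}")
```

For this hypothetical layer the separable version uses 8,768 weights against 73,728, roughly 12% of the standard count, which matches the general ratio 1/C_out + 1/k².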
Data Availability
The datasets used in this paper, i.e., OASIS and the AD dataset on Kaggle, are derived from previously published works that are cited in the text.
References
Global action plan on the public health response to dementia
Deepa, N., Chokkalingam, S.P.: Optimization of VGG16 utilizing the Arithmetic Optimization Algorithm for early detection of Alzheimer’s disease. Biomed. Signal Process. Control (2022). https://doi.org/10.1016/j.bspc.2021.103455
Naz, S., Ashraf, A., Zaib, A.: Transfer learning using freeze features for Alzheimer neurological disorder detection using ADNI dataset. Multimed. Syst. 28, 85–94 (2022). https://doi.org/10.1007/s00530-021-00797-3
Shamrat, F.M.J.M., Akter, S., Azam, S., et al.: AlzheimerNet: an effective deep learning based proposition for Alzheimer’s disease stages classification from functional brain changes in magnetic resonance images. IEEE Access 11, 16376–16395 (2023). https://doi.org/10.1109/ACCESS.2023.3244952
Shahwar, T., Zafar, J., Almogren, A., et al.: Automated detection of Alzheimer’s via hybrid classical quantum neural networks. Electronics (Switzerland) (2022). https://doi.org/10.3390/electronics11050721
Chui, K.T., Gupta, B.B., Alhalabi, W., Alzahrani, F.S.: An MRI scans-based Alzheimer’s disease detection via convolutional neural network and transfer learning. Diagnostics 12(7), 1531 (2022)
Ghazal, T.M., Abbas, S., Munir, S., et al.: Alzheimer disease detection empowered with transfer learning. Comput. Mater. Contin. 70, 5005–5019 (2022). https://doi.org/10.32604/cmc.2022.020866
Qasim Abbas, S., Chi, L., Chen, Y.-P.P.: Transformed domain convolutional neural network for Alzheimer’s disease diagnosis using structural MRI. Pattern Recogn. 133, 109031 (2023). https://doi.org/10.1016/j.patcog.2022.109031
Murugan, S., Venkatesan, C., Sumithra, M.G., et al.: DEMNET: a deep learning model for early diagnosis of Alzheimer diseases and dementia from MR images. IEEE Access 9, 90319–90329 (2021). https://doi.org/10.1109/ACCESS.2021.3090474
Lee, G., Nho, K., Kang, B., et al.: Predicting Alzheimer’s disease progression using multi-modal deep learning approach. Sci. Rep. (2019). https://doi.org/10.1038/s41598-018-37769-z
Bandyopadhyay, A., Ghosh, S., Bose, M., Singh, A., Othmani, A., Santosh, K.C.: In: Santosh, K.C., Goyal, A. (eds.) Recent trends in image processing and pattern recognition, pp. 12–21. Springer Nature Switzerland, Cham (2023)
Shi, R., Sheng, C., Jin, S., et al.: Generative adversarial network constrained multiple loss autoencoder: a deep learning-based individual atrophy detection for Alzheimer’s disease and mild cognitive impairment. Hum. Brain Mapp. 44, 1129–1146 (2023). https://doi.org/10.1002/hbm.26146
Hajamohideen, F., Shaffi, N., Mahmud, M., et al.: Four-way classification of Alzheimer’s disease using deep Siamese convolutional neural network with triplet-loss function. Brain Inform. (2023). https://doi.org/10.1186/s40708-023-00184-w
Houria, L., Belkhamsa, N., Cherfa, A., Cherfa, Y.: Multi-modality MRI for Alzheimer’s disease detection using deep learning. Phys. Eng. Sci. Med. 45, 1043–1053 (2022). https://doi.org/10.1007/s13246-022-01165-9
El-Sappagh, S., Saleh, H., Ali, F., et al.: Two-stage deep learning model for Alzheimer’s disease detection and prediction of the mild cognitive impairment time. Neural Comput. Appl. 34, 14487–14509 (2022). https://doi.org/10.1007/s00521-022-07263-9
Lahmiri, S.: Integrating convolutional neural networks, kNN, and Bayesian optimization for efficient diagnosis of Alzheimer’s disease in magnetic resonance images. Biomed. Signal Process. Control 80, 104375 (2023). https://doi.org/10.1016/j.bspc.2022.104375
Ansingkar, N.P., Patil, R.B., Deshmukh, P.D.: An efficient multi class Alzheimer detection using hybrid equilibrium optimizer with capsule auto encoder. Multimed. Tools Appl. 81, 6539–6570 (2022). https://doi.org/10.1007/s11042-021-11786-z
Orouskhani, M., Zhu, C., Rostamian, S., et al.: Alzheimer’s disease detection from structural MRI using conditional deep triplet network. Neurosci. Inform. 2, 100066 (2022). https://doi.org/10.1016/j.neuri.2022.100066
Liu, J., Jin, H., Xu, G., et al.: Aliasing black box adversarial attack with joint self-attention distribution and confidence probability. Expert Syst. Appl. (2023). https://doi.org/10.1016/j.eswa.2022.119110
Chen, Y., Lin, M., He, Z., et al.: Consistency- and dependence-guided knowledge distillation for object detection in remote sensing images. Expert Syst. Appl. 229 (2023). https://doi.org/10.1016/j.eswa.2023.120519
He, Z., Lin, M., Xu, Z., et al.: Deconv-transformer (DecT): a histopathological image classification model for breast cancer based on color deconvolution and transformer architecture. Inf. Sci. (N Y) 608, 1093–1112 (2022). https://doi.org/10.1016/j.ins.2022.06.091
Woo, S., Park, J., Lee, J.-Y., Kweon, I.S.: CBAM: convolutional block attention module
Panigrahi, S.K., Tripathy, S.K., Bhowmick, A., et al.: Multi-scale based approach for denoising real-world noisy image using curvelet thresholding: scope and beyond. IEEE Access 12, 25090–25105 (2024). https://doi.org/10.1109/ACCESS.2024.3364397
Zhang, Y., Zhou, C., Chang, F., Kot, A.C.: A scale adaptive network for crowd counting. Neurocomputing 362, 139–146 (2019). https://doi.org/10.1016/j.neucom.2019.07.032
Kingma, D.P., Ba, J.L.: Adam: a method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings, pp. 1–15 (2015)
Dubey, S.: https://www.kaggle.com/tourist55/alzheimers-dataset-4-class-of-images (2016)
Marcus, D.S., Wang, T.H., Parker, J., et al.: Open access series of imaging studies (OASIS): cross-sectional MRI data in young, middle aged, nondemented, and demented older adults. J. Cogn. Neurosci. 19, 1498–1507 (2007). https://doi.org/10.1162/jocn.2007.19.9.1498
Sharma, S., Guleria, K., Tiwari, S., Kumar, S.: A deep learning based convolutional neural network model with VGG16 feature extractor for the detection of Alzheimer Disease using MRI scans. Measure. Sens. 24, 100506 (2022). https://doi.org/10.1016/j.measen.2022.100506
Ajagbe, S.A., Amuda, K.A., Oladipupo, M.A., et al.: Multi-classification of alzheimer disease on magnetic resonance images (MRI) using deep convolutional neural network (DCNN) approaches. Int. J. Adv. Comput. Res. 11, 51–60 (2021). https://doi.org/10.19101/ijacr.2021.1152001
Islam, J., Zhang, Y.: A novel deep learning based multi-class classification method for Alzheimer’s disease detection using brain MRI data (2017)
Zhang, J., Liu, M., An, L., et al.: Alzheimer’s disease diagnosis using landmark-based features from longitudinal structural MR images. IEEE J. Biomed. Health Inform. 21, 1607–1616 (2017). https://doi.org/10.1109/JBHI.2017.2704614
El-Geneedy, M., Moustafa, H.E.-D., Khalifa, F., et al.: An MRI-based deep learning approach for accurate detection of Alzheimer’s disease. Alex. Eng. J. 63, 211–221 (2023). https://doi.org/10.1016/j.aej.2022.07.062
Jabason, E., Ahmad, M.O., Swamy, M.N.S.: Classification of Alzheimer’s disease from MRI data using an ensemble of hybrid deep convolutional neural networks. In: 2019 IEEE 62nd International Midwest Symposium on Circuits and Systems (MWSCAS). pp 481–484 (2019)
Islam, J., Zhang, Y.: An ensemble of deep convolutional neural networks for alzheimer’s disease detection and classification (2017)
Gupta, S., Saravanan, V., Choudhury, A., et al.: Supervised computer-aided diagnosis (CAD) methods for classifying Alzheimer’s disease-based neurodegenerative disorders. Comput. Math. Methods Med. (2022). https://doi.org/10.1155/2022/9092289
Fulton, L.V., Dolezel, D., Harrop, J., et al.: Classification of alzheimer’s disease with and without imagery using gradient boosted machines and resnet-50. Brain Sci. (2019). https://doi.org/10.3390/brainsci9090212
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Contributions
All the authors have designed the study, developed the methodology, performed the analysis, and written the manuscript. All authors have contributed equally.
Ethics declarations
Conflict of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
Ethical approval
Not applicable, as the study did not require ethical approval.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Tripathy, S.K., Nayak, R.K., Gadupa, K.S. et al. Alzheimer’s Disease Detection via Multiscale Feature Modelling Using Improved Spatial Attention Guided Depth Separable CNN. Int J Comput Intell Syst 17, 113 (2024). https://doi.org/10.1007/s44196-024-00502-y
DOI: https://doi.org/10.1007/s44196-024-00502-y