1 Introduction

Alzheimer’s disease (AD) is a neurological disorder that progressively impairs brain function and slowly destroys brain cells, leading to memory loss and instability in daily life [122]. AD pathogenesis is thought to be driven by the overproduction of amyloid-β (Aβ) and the hyperphosphorylation of tau protein. This results in the accumulation of Aβ plaques and tau neurofibrillary tangles, which disrupt nucleocytoplasmic transport between neurons and lead to cell death, causing loss of memory and learning ability [112]. Physicians diagnose patients based on several criteria, of which imaging is an essential part. The common symptoms are (1) loss of motor function, (2) speaking difficulties, and (3) memory problems [12].

With economic development and the advent of computer technology and medical information processing, doctors need fast and accurate ways to diagnose and detect the disease in order to help patients and save their lives. In traditional diagnostic workflows, patients pass through several stages before being diagnosed, so the diagnosis may come late, when the patient is already at an advanced stage [109]. Early diagnosis of AD is therefore very important: it helps patients take precautions, helps clinicians assess the risk of AD progression, and gives patients knowledge of the seriousness of the disease, encouraging them to take preventive steps such as lifestyle changes and drugs [87].

Researchers want to find a simple and accurate approach to identify Alzheimer’s disease before symptoms appear, so early detection of AD aims to discover the symptoms before the disease reaches a high-risk stage. AD has several stages; the stage in which the disease appears in its prodromal form is MCI. MCI is a stage of memory loss or loss of other cognitive abilities (such as language or visual/spatial perception) in people who can still perform most of their daily tasks independently [39].

Recently, many researchers have become interested in this field in order to improve patients’ quality of life and to discover drugs by tracking the pathological processes related to the different stages of AD [112]. Because AD is a progressive disease, it has several stages: cognitively normal (CN), mild cognitive impairment (MCI), late mild cognitive impairment (LMCI), and Alzheimer’s disease (AD). Various neuroimaging technologies help researchers in classification; the most common one used to image brain tissue is Magnetic Resonance Imaging (MRI) [6]. Deep learning is the most common and most effective technique for diagnosing and classifying the disease when a large amount of input data is available [6].

Many surveys have been published recently reviewing histopathological image analysis, covering its history and detailed information on general artificial intelligence techniques [29, 52, 54, 55, 136, 139]; their main limitation is the lack of surveys of histopathological image analysis focused on Alzheimer’s disease [29, 52, 54, 63, 152]. Accordingly, this survey presents image analysis from an Alzheimer’s disease point of view.

The main objective of the current survey is to provide a comprehensive overview of the state-of-the-art image analysis and artificial intelligence techniques, specifically for histopathology images in AD, and their challenges. This survey focuses on 159 state-of-the-art related studies, where 110 papers concentrate mainly on Alzheimer’s disease. Figure 1 depicts the corresponding statistical distribution of the studies used in the current survey.

Fig. 1

Statistical Distribution of the Studies used in the current Survey. Left: (I) Number of Publications per Year, Middle: (II) Type of Publisher, and Right: (III) Publisher

1.1 Paper contributions

In summary, the current survey introduces a comprehensive evaluation and analysis of the most recent studies on early AD detection and classification using state-of-the-art deep learning approaches. We also present preprocessing techniques that can enhance image quality and improve the performance of the classification process. Moreover, we highlight the different challenges reported across related studies and how they were overcome. In evaluating and analyzing the existing studies, several common trends and gaps have been identified.

The contributions of the current survey are summarized as follows:

  • Introducing the process and stages of diagnosing Alzheimer’s disease, from acquiring the data to classifying the disease.

  • Presenting the types and modalities of brain imaging, such as MRI, PET, and CT, and comparing their advantages and disadvantages.

  • Summarizing the most important basics and background related to the presented survey with a focus on deep learning as one of the most important recent trends to improve these systems.

  • Categorizing the most important research challenges and, for each, the most recent and significant studies.

  • Presenting a comparison between various articles, including the contribution of each article, its most significant feature, and the advantages (and disadvantages) of the solution used to solve a specific problem.

  • Concluding the most open research points in this area.

1.2 Paper organization

The current survey is organized as follows: Section 2 presents related work on the diagnosis of AD. Section 3 introduces an overview of ML and deep learning (DL), definitions, and challenges. Section 4 gives an overview of the diagnosis of Alzheimer’s disease and highlights the various methods used. Section 5 explores the most important problems and challenges associated with the early detection of AD and provides concluding remarks. We introduce some future possibilities in Section 6 and present our limitations in Section 7. Finally, the survey is concluded in Section 8.

2 Related work

Due to the importance of early diagnosis of AD, many researchers are interested in this field and work to address the problem. This section therefore presents the most important research in the field. Table 1 compares different studies that diagnose AD with different DL techniques. Jain et al. [66] proposed a transfer learning approach for classifying MRI images. They used the PE-SE-CTL mathematical model to differentiate between three classes (AD, MCI, CN). First, they collected data from the ADNI dataset and preprocessed it using FreeSurfer (PE) to eliminate unnecessary information from the MRI images. The preprocessing consisted of five processes: motion correction, non-uniform intensity normalization, Talairach transform computation, intensity normalization, and skull stripping. After preprocessing, they selected the most informative slices (SE) based on entropy. Lastly, they used the VGG16 pre-trained model and transfer learning to build the classification model (CTL). Their proposed technique achieves high accuracies of 95.3%, 99.14%, 99.3%, and 99.22% for 3-way classification (AD vs MCI vs CN), AD vs CN, AD vs MCI, and CN vs MCI, respectively.

Table 1 A literature review of Alzheimer’s disease diagnosis with deep learning techniques

Ding et al. [35] introduced a CNN architecture using an Inception v3 network trained on 90% of the ADNI data and tested on the remaining 10%. Fluorine-18 fluorodeoxyglucose PET images acquired from the ADNI dataset were processed using the grid method. An Otsu threshold was applied to detect brain voxels. The Adam optimizer was used with a learning rate of 0.0001 and a batch size of 8 for training the model. The model was trained on 90% of the dataset (1921 imaging studies), which includes three classes (AD, MCI, and no disease). The proposed architecture achieves 82% specificity and 100% sensitivity.

In Chitradevi et al. [28], several optimization algorithms (Genetic Algorithm, Particle Swarm Optimization, Grey Wolf Optimization, and Cuckoo Search) were used to segment the brain into sub-regions such as the hippocampus, white matter, and gray matter. The images were acquired from Chettinad Health City and contain 200 scans of AD and normal patients; they were processed with various methods such as skull stripping, quality enhancement, and contrast enhancement. After segmentation, the segmented regions were validated against ground-truth images to assess segmentation performance. The validation measures were the Feature Similarity Index, Structural Similarity Index, Dice similarity, Jaccard index, Tanimoto coefficient, and volume similarity. A CNN, specifically AlexNet, was used for feature extraction and classification. Grey Wolf Optimization showed the highest performance compared with the other optimization techniques, achieving a high accuracy of 95%.

Nawaz et al. [94] proposed three models and compared them to determine which achieves the highest accuracy. In the first model, the images are preprocessed, handcrafted features are extracted, and classification is performed using a support vector machine, k-nearest neighbor, and Random Forest. In the second model, a CNN deep learning model is trained from scratch on the preprocessed dataset. In the third model, AlexNet is used to extract deep features, which are then fed to a support vector machine, k-nearest neighbor, and Random Forest to determine the best classifier. Comparing the three models, the deep-features-based model achieved the best accuracy with the support vector machine classifier: the support vector machine achieved the highest accuracy of 99.21%, k-nearest neighbor achieved 57.32%, and Random Forest achieved 93.97%.

Kundaram et al. [79] acquired data from the ADNI dataset and preprocessed the images by rescaling them to 255. A CNN model was used to train on and classify the disease. The images were classified into three classes (AD, MCI, and NC), and 9540 images were used for training. The CNN model consists of three convolutional layers, three max-pooling layers, and four ReLU activation layers. They tried different optimizers such as Adam, SGD, Adagrad, Nadam, Adadelta, and RMSprop. Comparing these optimizers within the proposed framework, Adagrad achieves the best accuracy with the lowest loss. The proposed model achieved 98.57% accuracy on the ADNI dataset.

Table 2 compares recent surveys on AD with DL and our proposed survey. A description and the limitations of each survey are presented to show the differences and similarities between the other surveys and ours.

Table 2 Comparison between other surveys and our proposed survey

3 Machine Learning and Deep Learning overview

Machine learning is a branch of artificial intelligence that has become increasingly widespread and valuable in the last two decades. One definition of ML is the semi-automated extraction of knowledge from data [15]. ML uses data to feed an algorithm that can understand the relationship between the input and the output. When the machine finishes learning, it can predict the value or the class of a new data point [64].

Deep learning is a subset of ML, as shown in Fig. 2. ML is an algorithm with the ability to learn without being explicitly programmed. Artificial intelligence is a technique of getting the machine to work and behave like humans. In DL, the learning phase is done through the artificial neural network [14]. An artificial neural network is an architecture where the layers are stacked on top of each other. There are different DL types such as convolutional neural networks (CNN), recurrent neural networks (RNN), and autoencoders [58].

Fig. 2

Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL) Definitions

Although the performance of ML models has progressively improved in many tasks, they still need guidance (e.g., from human experts) to solve some problems. Developers must refine the architecture or algorithm if incorrect predictions are returned. A DL model, on the other hand, whether its prediction is correct or not, is adaptive and learns from the features automatically on its own [89].

The major differences between ML and DL are summarized in Table 3. DL is a specific category (i.e., branch) of ML. ML extracts relevant features manually from the input data; the extracted features are then used to update the model parameters that support the correct prediction (i.e., classification) process [146]. This does not apply to DL, as relevant features are extracted automatically from the data. In addition, DL implements end-to-end learning, where the data and the task to be performed are passed to the network [26]. As mentioned, the learning process is carried out automatically from the features, and hence no further manual modifications (i.e., enhancements) are required.

Table 3 compares ML and DL according to several factors: data size, training time, and interpretability. ML provides several approaches and models that can be chosen depending on the application, the size of the data being processed, and the type of problem to be solved [69]. To train a high-performance DL model, a very large amount of data (i.e., thousands of records) is required [51]. Over the last few years, DL has provided extensive applications in image recognition [101, 148], speech recognition [33, 97], medicine and pharmacy [84, 144], and natural language processing [100, 155]. An extensive number of DL methods have been proposed recently [10], and these methods can be broadly classified into several algorithms that are discussed in the following sections.

Table 3 Difference between Machine Learning and Deep Learning Techniques

4 An overview on diagnosis of Alzheimer’s disease

Diagnosis (i.e., classification) is an important area in computer science, as reflected by the number of recently published articles [43, 151]. The AD diagnosis process passes through several stages to detect and classify the disease, as shown in Fig. 3.

Fig. 3

The Process of Diagnosing Alzheimer’s Disease

From Fig. 3, the first stage is the data acquisition process for gathering the dataset required for diagnosis. The second stage is preprocessing the dataset to enhance the dataset quality and improve the performance of the classification task. The third stage is the splitting stage. The dataset can be split into training, testing, and validation subsets. The last stage is a learning system with proper and specific techniques to extract the features, learn from the data, update the parameters, and classify the disease to a specific class. The following subsections discuss these stages in detail.

4.1 Data acquisition stage

The first stage is the acquisition of raw data. Information in images describes inner aspects of the body that can be taken with different modalities or techniques. In this process, we can collect neuroimaging images that may utilize different physical principles. There are different modalities of neuroimaging such as (1) Magnetic Resonance Imaging (MRI), (2) Positron Emission Tomography (PET), (3) functional Magnetic Resonance Imaging (fMRI), and (4) Computed Tomography (CT) [138]. Selecting one of these modalities depends on the researcher’s choice, task, and the used model. Datasets can be acquired from different organizations such as hospitals, clinical centers, radiology centers, and online websites [149].

4.1.1 Biomarkers and neuroimaging for Alzheimer’s disease diagnosis

Conventionally, the clinical diagnosis of dementia has concentrated on (1) clinical assessment, (2) neuropsychological testing, and (3) the exclusion of other possible causes [102]. Normally, the diagnosis of AD can be accomplished with three different methods, as depicted in Fig. 4.

Fig. 4

Different Imaging Techniques for AD Diagnosis

From Fig. 4, these three methodologies are (1) memory tests through history and discussion, mental status tests, and neuropsychological tests, (2) numerical laboratory tests, and (3) brain imaging scans [21]. There has been a revolution in the role played by neuroimaging in AD research and practice in recent years. Diagnostically, imaging has moved from a minor exclusionary role to a central position [72]. In research, imaging is helping to address numerous scientific questions. At the same time, the capability of brain imaging has expanded rapidly with new modalities and innovative ways of acquiring and analyzing images [141]. The main modalities involved are magnetic resonance imaging (MRI; both structural and functional) and positron emission tomography (PET; for assessment of both cerebral metabolism and amyloid).

Imaging Modalities and Types

These modalities have different strengths and limitations and, as a result, have different and often complementary roles and scope [108]. Although additional data are required, imaging is beginning to offer prognostic information at this early preclinical phase. The need for an earlier and more definite diagnosis will only grow as disease-modifying therapies are identified [40]. This will be particularly true if, as expected, these therapies work best (or only) when introduced at the preclinical stage. Table 4 summarizes the different brain imaging modalities in AD [73].

Table 4 Summary of the Different Brain Imaging Modalities in Alzheimer’s Disease

Table 4 compares the advantages and disadvantages of the neuroimaging modalities and shows open research areas for the three modalities. Brain imaging includes different scans depending on the type of disease: CT, MRI, and PET imaging. Structural imaging, such as CT and MRI, provides information about the shape and volume of the brain, whereas functional imaging shows the activity of the brain and how its cells work.

4.2 Preprocessing techniques

Datasets, especially images, may contain noise and distortions. Radiography noise is generally caused by changes in the sensitivity of the detector, diminished illumination of the object (i.e., low contrast), photographic limitations, and spontaneous variations in the radiation signal [111]. So, it is essential to preprocess the data to enhance its quality or to optimize its geometric and intensity patterns [107]. Preprocessing lets researchers concentrate on a specific part of the brain and highlights the most vital information that is required in the classification process. Preprocessing techniques are many and the most common ones are depicted in Fig. 5.

Fig. 5

Taxonomy of AD for Classification and Preprocessing Techniques

Figure 5 divides AD diagnosis into two main steps: first, the preprocessing techniques applied to the images; second, the techniques used to learn from and classify the disease. The choice among these techniques depends on the problem the researcher wants to solve and the type of input data.

4.2.1 Intensity normalization

It is an essential preprocessing technique used to map the intensities of all image pixels to a reference scale [133]. In general, data collected from different sources, or from the same source at different points in time, may not have identical intensity ranges [120]. For example, normalization can calibrate the pixel values to a normal distribution, as depicted in Fig. 6 [115].

Fig. 6

Before and After Applying Normalization [115]
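As a minimal illustration (our own sketch, not drawn from any cited study), the following Python snippet applies z-score normalization with NumPy so that two synthetic arrays with very different intensity ranges are mapped to a common reference scale; real pipelines may instead use histogram-based schemes.

```python
import numpy as np

def zscore_normalize(image: np.ndarray) -> np.ndarray:
    """Map pixel intensities to zero mean and unit variance (one common reference scale)."""
    image = image.astype(np.float32)
    return (image - image.mean()) / (image.std() + 1e-8)

# Two synthetic "scans" acquired with very different intensity ranges
scan_a = np.random.randint(0, 255, size=(128, 128)).astype(np.float32)
scan_b = np.random.randint(50, 4000, size=(128, 128)).astype(np.float32)

norm_a, norm_b = zscore_normalize(scan_a), zscore_normalize(scan_b)
print(norm_a.mean(), norm_a.std())  # approximately 0 and 1
print(norm_b.mean(), norm_b.std())  # approximately 0 and 1
```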

4.2.2 Contrast enhancement

Contrast is the difference between the highest and lowest pixel intensities; contrast enhancement stretches this range, as shown in Eq. 1. It improves the quality of images and increases the contrast at borders within the image, which helps to differentiate between organs. It also improves the brightness of the image by expanding the range of pixel values [80].

$$ g(x,y)=\frac{f(x,y)-f_{\min}}{f_{\max}-f_{\min}}\times \text{levels of gray} $$
(1)

where fmin is the minimum intensity value, fmax is the maximum intensity value, f(x, y) is the value of each pixel in the image, and g(x, y) is the enhanced pixel value after contrast enhancement is applied [117]. Figure 7 shows a sample MRI axial brain image with high contrast.

Fig. 7

Applying the Contrast Enhancement Technique on a Sample Image. Left: Input Image and Right: Output Image
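The following Python sketch is a direct implementation of Eq. 1 with NumPy on a synthetic low-contrast image; the array size and intensity range are illustrative assumptions.

```python
import numpy as np

def contrast_stretch(image: np.ndarray, gray_levels: int = 255) -> np.ndarray:
    """Eq. 1: g(x, y) = (f(x, y) - fmin) / (fmax - fmin) * levels of gray."""
    f = image.astype(np.float32)
    f_min, f_max = f.min(), f.max()
    return (f - f_min) / (f_max - f_min + 1e-8) * gray_levels

# Synthetic low-contrast image: intensities squeezed into a narrow band
low_contrast = np.random.randint(90, 140, size=(128, 128))
enhanced = contrast_stretch(low_contrast)
print(low_contrast.min(), low_contrast.max())  # roughly 90 .. 139
print(enhanced.min(), enhanced.max())          # 0 .. ~255
```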

4.2.3 Denoising process

Median filter

The median filter is a technique used to minimize noise without blurring the edges [110]. It is especially appropriate for enhancing MRI images. The median filter identifies noisy pixels by comparing each pixel in the image to its neighboring pixels [117]. It uses a filter (i.e., kernel) of a specific size that passes over each pixel in the image and replaces its value with the corresponding median value. The median value is determined by sorting the surrounding pixels’ values and replacing the pixel with the middle value [127]. Figure 8 shows the effect of applying the median filter on a sample image, where the filter had a size of (3 × 3).

Fig. 8

Left: Before Removing Noise by the Median Filter and Right: After Applying the Median Filter
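As an illustrative sketch (assuming SciPy is available and using a synthetic array in place of a real MRI slice), the median filter can be applied as follows:

```python
import numpy as np
from scipy.ndimage import median_filter

# Synthetic image corrupted with salt-and-pepper noise
image = np.full((128, 128), 120.0, dtype=np.float32)
noisy = image.copy()
mask = np.random.rand(*image.shape) < 0.05            # corrupt about 5% of the pixels
noisy[mask] = np.random.choice([0.0, 255.0], size=mask.sum())

# 3 x 3 kernel: each pixel is replaced by the median of its neighborhood
denoised = median_filter(noisy, size=3)
```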

Gaussian filter

Gaussian filtering is a technique that helps denoise the image; it is performed by convolving the image with a Gaussian kernel of a chosen mask size, as shown in Eq. 2 [159].

$$ G(x)=\frac{1}{\sigma \sqrt{2\pi }}\, e^{-\frac{1}{2}\left(\frac{x-\mu }{\sigma}\right)^{2}} $$
(2)

where σ is the standard deviation that defines the shape of the Gaussian, μ is the mean value, and x is the input value. Figure 9 shows the effect of applying the Gaussian filter on a sample image using a mask size of (3 × 3).

Fig. 9

Left: Before Removing Noise by the Gaussian Filter and Right: After Applying the Gaussian Filter
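Similarly, a minimal sketch of Gaussian smoothing with SciPy, where sigma corresponds to σ in Eq. 2 and the kernel is limited to roughly a 3 × 3 mask; the noise model and sizes are illustrative assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Synthetic image corrupted with additive Gaussian noise
image = np.full((128, 128), 120.0, dtype=np.float32)
noisy = image + np.random.normal(loc=0.0, scale=25.0, size=image.shape)

# sigma sets the spread of the Gaussian in Eq. 2; truncate=1.0 limits the
# kernel to roughly a 3 x 3 mask when sigma = 1.
smoothed = gaussian_filter(noisy, sigma=1.0, truncate=1.0)
```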

4.2.4 Brain extraction and skull stripping

It is one of the preprocessing techniques that remove non-brain tissues such as the eyes, neck, and skull. It segments these regions by using the dark space between the skull and the brain occupied by the cerebrospinal fluid (CSF). Many tools are presented in [75] for brain extraction, such as the ROBEX brain extraction algorithm, which uses machine learning. DL may also be used, but it is a more computationally expensive process and needs specific hardware to run the algorithm. Khademi et al. [75] used a Random Forest classifier for brain extraction. Their goal was to find a binary segmentation mask for the brain. After obtaining that mask, they multiplied it with the original image, as shown in Fig. 10 [126].

Fig. 10

Brain Extraction with a Binary Segmentation Mask [75]
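The mask-multiplication step can be sketched as follows; the mask here is a hypothetical placeholder, whereas in practice it would come from a tool such as ROBEX or a trained classifier, as in [75].

```python
import numpy as np

def apply_brain_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Keep brain tissue only by multiplying the image with a binary segmentation mask."""
    return image * mask.astype(image.dtype)

# Hypothetical inputs standing in for an MRI slice and its brain mask
slice_2d = np.random.rand(128, 128).astype(np.float32)
brain_mask = np.zeros((128, 128), dtype=np.uint8)
brain_mask[20:108, 20:108] = 1        # toy "brain" region
brain_only = apply_brain_mask(slice_2d, brain_mask)
```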

4.2.5 Data augmentation

Data augmentation techniques help the model avoid the overfitting problem [126]. Overfitting means that the validation error increases while the training error decreases; to build a good model, the validation error should keep decreasing along with the training error [5]. After collecting the data, data augmentation can be applied to increase the diversity of images in each class. There are different techniques such as cropping, shifting, shearing, scaling, and zooming [95].
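A minimal sketch of such augmentation using the Keras ImageDataGenerator; the parameter values and the synthetic input are our own illustrative choices, not those of any cited study.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# One synthetic grayscale slice shaped (batch, height, width, channels)
images = np.random.rand(1, 128, 128, 1).astype(np.float32)

augmenter = ImageDataGenerator(
    rotation_range=10,       # small random rotations
    width_shift_range=0.1,   # horizontal shifting
    height_shift_range=0.1,  # vertical shifting
    shear_range=0.1,         # shearing
    zoom_range=0.1,          # zooming
    horizontal_flip=True,    # mirroring
)

# Each call yields a new randomly transformed copy of the input
augmented_batch = next(augmenter.flow(images, batch_size=1))
```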

Table 5 presents a literature review of the preprocessing techniques and the methodologies used to diagnose AD. We also present the advantages and effectiveness of preprocessing the images and report the performance of the trained models.

Table 5 Comparison of different studies showing preprocessing techniques and classifiers

4.3 Classification

DL, as mentioned earlier, is a sub-field of ML [24]. DL is more effective than traditional ML approaches because it extracts features automatically [37]. Also, DL performs “end-to-end learning”, where raw data and tasks are provided to the network [17]. Most researchers have relied on Convolutional Neural Network approaches for detecting Alzheimer’s disease from MRI images, compared to other DL techniques such as Recurrent Neural Networks (shown in Fig. 11(b)), Deep Neural Networks (Fig. 11(a)), Autoencoders (Fig. 11(c)), and Deep Belief Networks (Fig. 11(d)) [8, 62].

Fig. 11

a) DNN Architecture, b) RNN Architecture, c) Autoencoder, and d) Deep Belief Network

4.3.1 Deep neural network (DNN)

DNN, as shown in Fig. 11(a), has an input layer, an output layer, and one (or more) hidden layers [50]. It is distinguished by its ability to deal with complicated problems, understand the relationship between input and output data, and model complex non-linear relationships [42]. It is considered a supervised learning technique and is used in various areas of research to explore previously unknown patterns in the inputs [68]. It requires a large amount of training data to extract the features of the labeled images [98].
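A minimal Keras sketch of such a network, assuming a hypothetical 4096-dimensional feature vector as input and a 3-way (AD/MCI/CN) output; the layer sizes are illustrative only.

```python
from tensorflow.keras import layers, models

# Input: a flattened feature vector extracted from a scan; output: AD / MCI / CN.
dnn = models.Sequential([
    layers.Input(shape=(4096,)),
    layers.Dense(512, activation="relu"),   # hidden layer 1
    layers.Dense(128, activation="relu"),   # hidden layer 2
    layers.Dense(3, activation="softmax"),  # output layer for 3-way classification
])
dnn.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```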

4.3.2 Convolution neural network (CNN)

A CNN is one of the most successful neural-network techniques for image classification and recognition. As shown in Fig. 12, a CNN is composed of several convolution layers, pooling layers, activation layers, fully connected layers, and a classifier layer.

Fig. 12

Convolutional Neural Network Architecture

The convolution layer is an essential layer that extracts feature maps by passing a learned filter (or kernel) of a specific size over the input image [42]. It is followed by the activation function, which decides whether a neuron should be activated or not; it applies a nonlinear transformation to the input, making the network capable of learning and performing more complex tasks [2]. Activation functions come in numerous types, such as sigmoid, Tanh, and ReLU, to create the feature maps [123]. Pooling layers reduce the dimensionality but keep the most important features; they can be considered down-scalers [41]. A fully connected layer connects every neuron from the previous layer to all neurons in the current layer. Finally, the classifier layer selects the class (i.e., label) with the highest probability [7].
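A minimal Keras sketch that mirrors this layer ordering; the input size, filter counts, and 3-way output are illustrative assumptions, not a recommended architecture.

```python
from tensorflow.keras import layers, models

cnn = models.Sequential([
    layers.Input(shape=(128, 128, 1)),             # grayscale slice
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution + ReLU activation
    layers.MaxPooling2D((2, 2)),                   # pooling (down-scaling)
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),          # fully connected layer
    layers.Dense(3, activation="softmax"),         # classifier layer: AD / MCI / CN
])
cnn.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```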

One important property of CNNs is that they can handle large datasets and achieve high performance in the classification task [53]. From the transfer learning point of view [140], CNN has several architectures that were trained on the ImageNet dataset, including VGGNet, LeNet, GoogLeNet, ResNet, and AlexNet [118]. With a pre-trained CNN model, the developer can benefit from the parameters (i.e., weights) of that model and transfer them to the new task [128, 150].
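A hedged sketch of this idea with a pre-trained VGG16 backbone in Keras; the frozen-backbone strategy, input size, and new 3-class head are our own illustrative choices and not the exact pipeline of any study reviewed here.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

# Reuse ImageNet weights as a fixed feature extractor and train only a new head.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = False

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(3, activation="softmax"),  # new head for a hypothetical AD / MCI / CN task
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```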

4.3.3 Recurrent neural network (RNN)

RNNs are used in sequence or time-series problems [91]. The most important advantages of this approach are its memory and hidden state. Figure 11(b) shows a sample RNN architecture with an input, a hidden, and an output layer. The hidden state is effective in remembering relevant information about the problem sequence [106]. Another distinguishing characteristic of RNNs is that they share the same parameters within each layer of the network, unlike feedforward networks [65]. The latter, feedforward networks, have different parameters for each node in the network, resulting in a large number of parameters [114].

RNNs do not have a large number of layers and are not as deep as CNNs or DNNs [88]. However, the model is difficult to train and suffers from vanishing or exploding gradients, limiting its application for modeling long-time activity sequences and temporal dependencies in sensor data [81]. Common applications of RNNs are natural language processing [154], speech recognition [57], and language translation [27].

Long short-term memory (LSTM) networks and gated recurrent units (GRUs) are common RNN architectures [36]. The main purpose of the LSTM is to preserve the error signal as it propagates through the different layers and time steps [96]. It contains cells in the hidden layer, each with three gates: an input, an output, and a forget gate. These gates are responsible for storing information and regulating the flow of information to predict the output of the network [114]. This cell helps the model decide which information to store and when to read and update it through the gates [98]. GRUs use a hidden state and have two gates, a reset gate and an update gate, which control what and how much information is retained [114]. In many tasks, their performance is better than LSTM [36, 42].
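A minimal Keras sketch of a recurrent classifier; the sequence length, feature size, and 3-way output are hypothetical.

```python
from tensorflow.keras import layers, models

# Hypothetical input: a sequence of 10 time steps, each with 64 features
# (e.g., per-visit measurements in a longitudinal setting).
rnn = models.Sequential([
    layers.Input(shape=(10, 64)),
    layers.LSTM(32),                        # gated memory cell; layers.GRU(32) is a drop-in alternative
    layers.Dense(3, activation="softmax"),
])
rnn.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```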

4.3.4 Autoencoder (AE)

AE is an unsupervised learning technique that can take an unlabeled dataset and compress it into an encoded feature representation. It is used for dimensionality reduction and consists of two major parts, an encoder and a decoder, as shown in Fig. 11(c) [98]. The encoder converts the input data to a code (i.e., compressed data), and the decoder then rebuilds the code into an output that resembles the input [105]. Layers in the encoder part may be dense layers or convolution layers, and the number of layers in the encoding part must equal the number of layers in the decoding part. The encoder reduces the dimensionality of the data, while the decoder increases it back [103]. The middle layer is called the bottleneck layer, which holds the compressed representation of the input data [48].
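A minimal Keras sketch of a fully connected autoencoder with a bottleneck layer; the layer sizes are illustrative assumptions.

```python
from tensorflow.keras import layers, models

inputs = layers.Input(shape=(4096,))                          # flattened image
encoded = layers.Dense(512, activation="relu")(inputs)        # encoder
bottleneck = layers.Dense(64, activation="relu")(encoded)     # compressed code
decoded = layers.Dense(512, activation="relu")(bottleneck)    # decoder
outputs = layers.Dense(4096, activation="sigmoid")(decoded)   # reconstruction of the input

autoencoder = models.Model(inputs, outputs)
autoencoder.compile(optimizer="adam", loss="mse")             # trained to reproduce its own input
```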

AE has different variants that improve its performance: (1) the denoising AE, (2) the sparse AE, and (3) the contractive AE [1]. AE faces some problems: simply copying the input layer to the hidden layer leads to inefficient extraction of meaningful features, even though the input can be retrieved at the output layer [134]. The denoising AE solves that problem by corrupting the inputs, which the AE must then reconstruct (i.e., denoise) [34]. This helps the model recognize features from noisy input and hence classify correctly; the model does not simply copy the input to the output without learning features about the data.

The sparse AE adds constraints to reduce the number of hidden nodes and limit which nodes are activated [135]. When the average activation of the hidden nodes is close to zero, only a few nodes in the hidden layer are active while the others are inactive [147]. It learns features by imposing a penalty applied to the hidden layer. There are two ways to impose the sparsity constraint: (1) L1 regularization, which is added to the cost function and helps prevent the overfitting problem [93], and (2) a KL-divergence constraint, which is added to all hidden nodes to enforce a low average activation value [11]. The main purpose of the contractive AE is to support a robust representation that can extract useful information and is less sensitive to small variations in the data [124].

4.3.5 Deep belief network (DBN)

DBN is a supervised learning technique that can link the unsupervised features extracted from the stacked layers [70]. It is a generative graphical model constructed from a stack of Restricted Boltzmann Machines (RBMs), which can extract features and reconstruct the input. A DBN has an undirected connection between its top two layers, as depicted in Fig. 11(d). DBN reduces the weight-initialization effort by using RBMs, which helps the model overcome the overfitting problem [70].

DBN was created to model the distribution between the input and the hidden layers in such a way that the lower layer nodes are connected directly, while the upper layer nodes are connected indirectly [92]. This model is helpful in tasks that need feature extraction, involve biological data, and deal with classes that are not linearly separable [13].

Finally, we summarize the DL branches, including the different models, graphically in Fig. 13.

Fig. 13

Summary of Deep Learning Techniques

5 Research challenges

In this section, we present some of the research challenges, which can be divided into two categories: (1) those related to the data and (2) those related to the classification problem. The important challenges of each category are highlighted in the following subsections. We summarize the challenges in Fig. 14 and discuss them in detail below.

Fig. 14

Summary of Data and Classification Challenges

5.1 Availability of large datasets

To achieve the best results, DL techniques require a large amount of data for the training process. Unavailability of data is a major challenge, as it is difficult to acquire data from hospitals and clinical centers due to patient privacy. A set of online medical datasets is available to researchers, for example ADNI (Alzheimer’s Disease Neuroimaging Initiative), OASIS (Open Access Series of Imaging Studies), COBRE (Center for Biomedical Research Excellence), and FBIRN (Function Biomedical Informatics Research Network). To overcome the small diversity of the available datasets, data augmentation is used to increase the number of images without adding new images, by flipping, padding, rotation, etc. Table 6 summarizes the different methods used in different studies to address the issue of limited datasets.

Table 6 The Different Methods to Overcome Data Limitation

In Table 6, we compare how different studies overcome the limited availability of data. We compare their advantages, disadvantages, image modality types, datasets, and the methods they used to address this problem.

5.1.1 Alzheimer disease datasets

To be able to train systems and compare the performance of architectures using different datasets, large datasets are required for training, testing, and validation. Table 7 summarizes the AD datasets with their links. The ADNI dataset is a multicenter study that aims to develop imaging, clinical, and genetic markers for tracking the progression of the disease and detecting AD at an early (pre-dementia) stage. It contains the following groups: Patient = 28, CN = 834, MCI = 671, EMCI = 340, LMCI = 185, AD = 450, SMC = 115.

Table 7 Datasets for Alzheimer’s disease

OASIS is a collection of neuroimaging datasets that are open access for research and analysis. On XNAT Central, three OASIS datasets can be viewed and downloaded. OASIS-1 is a collection of cross-sectional MRI data from 434 scan sessions including 416 participants; it was first released in 2007. OASIS-2 is a collection of longitudinal MRI data from 373 imaging sessions of 150 people; it was made available in 2010. OASIS-3 is made up of cross-sectional and longitudinal MRI and PET data from 1098 subjects; it was released in 2018.

The Alzheimer’s Dataset (4 classes of images) is from the open-access Kaggle website and consists of MRI images. The dataset contains two subsets, training and testing, containing 5121 and 1279 images, respectively. There are 4 classes in the training subset: Mild Demented = 717, Very Mild Demented = 1792, Moderate Demented = 52, Non Demented = 2560.

MIRIAD is a database of volumetric MRI brain scans of 46 Alzheimer’s patients and 23 healthy elderly people. It includes a total of 708 scans and should be of particular interest for work on longitudinal biomarkers and image analysis. It is also an open-access dataset.

Table 7 presents the available datasets with their links, the number of available images, the number of classes, and whether they are open access to researchers.

5.2 Overcoming data imbalance problem

Data imbalance is one of the problems researchers face when solving a problem with DL. It describes a situation in which the distribution of examples across classes is not equal. For example, in an AD dataset, the number of images with the disease may be larger than the number of images without it, which causes an imbalance in the dataset [71].

There are different suggestions to solve this problem, as shown in Table 8. Under-sampling and over-sampling are two common techniques to mitigate the data imbalance problem. Under-sampling is done by randomly deleting data from the classes that have a sufficient amount of data in order to balance the classes [113]. It helps obtain an equal number of samples per class and a fast training time. Despite the simplicity of this approach, there is a high probability that the deleted data contains information or features essential for classifying that class.

Table 8 The Different Methods to Balance the Different Classes

Over-sampling means increasing the amount of data by copying existing samples. To achieve balanced classes, the size of the minority class, which has fewer samples than the other classes, is increased. Overfitting is the major issue that occurs with over-sampling [113].
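A small sketch of both strategies using scikit-learn’s resample on hypothetical feature vectors and labels; the class names and counts are illustrative.

```python
import numpy as np
from sklearn.utils import resample

# Hypothetical imbalanced dataset: 900 "AD" samples vs. 100 "CN" samples.
X = np.random.rand(1000, 16)
y = np.array(["AD"] * 900 + ["CN"] * 100)

X_major, y_major = X[y == "AD"], y[y == "AD"]
X_minor, y_minor = X[y == "CN"], y[y == "CN"]

# Over-sampling: duplicate minority samples (with replacement) up to the majority size.
X_minor_up, y_minor_up = resample(
    X_minor, y_minor, replace=True, n_samples=len(y_major), random_state=0
)

# Under-sampling: randomly discard majority samples down to the minority size.
X_major_down, y_major_down = resample(
    X_major, y_major, replace=False, n_samples=len(y_minor), random_state=0
)
```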

In Table 8, we compare how different studies overcome the problem of data imbalance. We compare their advantages, disadvantages, image modality types, datasets, and the methods they used to give the different classes the same number of images.

5.3 Multimodality images in classification

There are different scan types for neuroimaging, such as Magnetic Resonance Imaging (MRI), Positron Emission Tomography (PET), functional Magnetic Resonance Imaging (fMRI), and Computed Tomography (CT). Training a model on these different modalities is another challenge. Learning from heterogeneous data may reduce performance because the input data differ in nature and are simply combined [82]. To solve this problem, each modality is learned separately with multiple hidden layers to extract features; in a second stage, the features from the last hidden layer of each modality are combined, and a model is then trained to classify the labels using the combined features.
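A hedged Keras sketch of this late-fusion idea, with hypothetical MRI and PET feature vectors as the two inputs; the branch sizes and 3-way output are illustrative assumptions.

```python
from tensorflow.keras import layers, models

# Two modality-specific branches (e.g., MRI features and PET features), fused late.
mri_in = layers.Input(shape=(4096,), name="mri_features")
pet_in = layers.Input(shape=(4096,), name="pet_features")

mri_branch = layers.Dense(256, activation="relu")(mri_in)  # learn each modality separately
pet_branch = layers.Dense(256, activation="relu")(pet_in)

fused = layers.concatenate([mri_branch, pet_branch])       # combine last-hidden-layer features
fused = layers.Dense(128, activation="relu")(fused)
output = layers.Dense(3, activation="softmax")(fused)      # AD / MCI / CN

fusion_model = models.Model(inputs=[mri_in, pet_in], outputs=output)
fusion_model.compile(optimizer="adam", loss="categorical_crossentropy",
                     metrics=["accuracy"])
```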

A combination of features from different modalities achieves higher performance than training a model on a single modality. Each neuroimaging modality can offer new and different details about the disease that make classification more effective [18]. Table 9 summarizes the different methods used in the reviewed papers to combine different image modalities.

Table 9 The Different Methods to Solve Multimodalities Images in the Dataset

In Table 9, we compare different studies that used multiple image modalities in the learning model. We compare their advantages, disadvantages, image modality types, datasets, the classifiers they used, and the methods they used to overcome this problem.

5.4 Collecting necessary information

MRI images may contain information that is unnecessary for the diagnosis of AD, which increases the processing and training time and the computational load and reduces the efficiency of training the classification model [90]. The solution to this challenge is to preprocess the data before training, as shown in Table 10, which summarizes some of the related articles that address this problem. It is worth mentioning that FreeSurfer is a freely available tool used for processing images, e.g., skull stripping and segmentation of the parts of the brain essential for the diagnosis of Alzheimer’s disease [22].

Table 10 The Different Methods of Removing Unnecessary Data from the Image

In Table 10, we compare how different studies collect the important information in the images and discard the rest. We compare their advantages, disadvantages, image modality types, datasets, and the methods they used to overcome this problem.

5.5 Neuroimages noise manipulation

Adversarial noise may be present in neuroimages, and this reduces the performance of the classification process. To minimize noise, as mentioned earlier in Fig. 5, we presented some preprocessing techniques that help remove noise from images; Table 11 summarizes the different methods used for removing noise from neuroimages. The Gaussian filter, the median filter, and other noise-removal filters improve the quality of training of the classification model.

Table 11 The Different Methods of Removing the Noise from Neuroimages

In Table 11, we compare different studies whose datasets or images contain noise. We compare their advantages, disadvantages, image modality types, datasets, the classifiers they used, and the methods they used to overcome neuroimaging noise.

5.6 Overfitting problem

Overfitting happens when the model is trained on data that contain noise and too many irrelevant details. An insufficient amount of training data is also one of the causes of overfitting; in this case, the problem is solved by increasing the amount of data [59]. Another way to overcome this problem is dropout, which drops units (i.e., hidden and visible) in a neural network, as shown in Table 12 [130], where some articles use this method and others solve the problem in other ways. Dropout means temporarily removing units from the network, thereby training a random sample of neurons rather than all neurons in the network. This can improve the learning of the hidden layers.
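A minimal Keras sketch showing where dropout layers are typically inserted; the layer sizes and dropout rates are illustrative assumptions.

```python
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(4096,)),
    layers.Dense(512, activation="relu"),
    layers.Dropout(0.5),                    # randomly drop half of the units during training
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),                    # a lighter dropout rate deeper in the network
    layers.Dense(3, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```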

Table 12 The Different Methods for Overcoming the Overfitting Problem in Classification

In Table 12, we compare different studies whose architectures suffer from the overfitting problem. We compare their advantages, the classifiers they used, and the methods they used to overcome overfitting, and we show the performance of each study.

5.7 Hybrid approach

A hybrid model is defined as combining more than one approach or technique to achieve high performance or to enhance the training and classification process [99]. Examples of hybrid methods include a feature selector combined with a CNN model, a feature selector with a pre-trained (transfer learning) model, and a hyperparameter optimizer combined with any DL model. As shown in Table 13, some articles combined more than one learning method or classification technique to enhance the overall classification of the disease.

Table 13 Summarize Different Hybrid Approaches in AD

In Table 13, we compare different studies that used a hybrid approach. We compare the methods they used to combine more than one model, show the performance of each model, the image type, the datasets they used, and the classifiers they used, and provide our comments on their methods.

5.8 Black box challenges

One of the trending issues is the black box. Neural networks, which can be thought of as black boxes that convert input into output, are often used in machine learning methods [49]. Although the mathematics used to construct a neural network is straightforward, how the output is arrived at is exceedingly complicated: ML algorithms take a large amount of data as input, identify patterns, and build a predictive model, but understanding how the model works is an issue [85].

Although DL has been the most successful approach, achieving performance close to human level in classification and prediction, DL models operate as black boxes [49]. They do not offer a specific reason or explanation for choosing a specific feature in the training process, why this achieves high or low performance, or how the relations in the training data are reflected in the feature selection, as shown in Table 14 [25].

Table 14 The different articles for black-box challenges

In Table 14, we compare different studies on the black-box challenge. We compare their models, show the performance of each model, the image type, the datasets they used, and the classifiers they used, and present their contributions and our comments on their methods.

6 Future directions

By reviewing the most recent literature on the early diagnosis of Alzheimer’s disease, we conclude that the following points must be taken into account to achieve overall improvement and to upgrade the accuracy of computer-aided diagnosis:

  • One of the challenges is collecting balanced and sufficient brain data related to Alzheimer’s disease [60, 119, 125, 131].

  • Most of the recently deployed methods and techniques are related to DL, including deep sparse multi-task learning [131], stacked auto-encoders [137], and sparse regression models [132], each attempting to overcome the aforementioned challenges. In [119], a deep architecture was proposed that uses hierarchical sparse multi-task learning to select features while avoiding redundant information.

  • Deep learning segmentation (e.g., U-Nets) can be injected in the process to specify only the region of interest.

  • Combining the two different conceptual methods of sparse regression and DL to diagnose AD can be effective [125]. Manifold-based learning is also a promising technique.

  • Data augmentation and scaling techniques can help to improve the overall state-of-the-art performance.

7 Limitations

All studies have both strengths and weaknesses, and ours has several limitations. First, we mentioned only the most common preprocessing techniques (intensity normalization, contrast enhancement, denoising, brain extraction, and data augmentation) used with neuroimaging. Second, we discussed only five DL techniques (DNN, RNN, CNN, AE, and DBN); although there are many approaches, we covered the ones most commonly used in the diagnosis of AD. Third, ML is not discussed in as much detail as DL. Finally, we mentioned four datasets, not all of them, and the current study covers ten years of studies.

8 Conclusions and future work

AD is a progressive neurological disorder and the most common form of dementia in later life. AD causes nerve cell death and tissue loss in the brain, resulting in a substantial decrease in brain volume over time and impairment of most of its functions. In this paper, we started with the main differences between traditional ML and DL, followed by the stages of diagnosing AD. In the diagnosis of AD, images need to be preprocessed to enhance the quality of learning, so we presented some preprocessing techniques used with images. We also presented the DL methods most commonly used in the classification process, such as CNN, RNN, DNN, AE, and DBN. Despite the importance of classifying the disease with DL, there are challenges in dealing with the dataset, so we presented a review of the literature for every challenge and showed the suggested solutions to these problems. The novelty of the current survey can be summarized as follows: (1) it introduces different preprocessing techniques applied to neuroimaging, (2) it combines preprocessing techniques with the most common DL methods in one survey, and (3) it compares different state-of-the-art studies and their challenges in dealing with the dataset and the classification stage. In future studies, to classify AD with a proper dataset, we can (1) use abstract CNN models, (2) apply transfer learning only, (3) apply transfer learning together with abstract CNN models, (4) use a feature selector to select features separately and then use CNN models, (5) use a feature selector to select features separately and then use transfer learning, (6) compare the performance of the two models mentioned in points 4 and 5, and (7) use a hyperparameter optimizer with one of the models, such as CNN or transfer learning.