1 Introduction

Computer-aided diagnostic (CAD) systems assist healthcare providers in identifying, detecting, and diagnosing a particular ailment in the human body (Gendy and Yuce 2023). Today, owing to advances in biomedical engineering, data acquisition techniques, and data analytics, CAD systems are used across almost all fields of the medical sciences (Amorim et al. 2023). Neurodegenerative disorders are ailments characterized by the functional loss of neurons in the brain and the central nervous system. Alzheimer’s disease (AD) is one such progressive neurodegenerative disorder that affects a large population across the world (Knopman et al. 2021). AD occurs in elderly people and results in long-term memory loss, cognitive disabilities, behavioral inconsistencies, and several other such symptoms. AD, a type of dementia, refers to a condition characterized by the gradual or enduring decline in cognitive abilities, particularly affecting memory and abstract thinking, due to the malfunctioning of the entorhinal cortex and hippocampus of the brain (Strikwerda-Brown et al. 2019). These effects in the brain may begin a decade or more before the symptoms first appear (Yu et al. 2021a); this stage is referred to as mild cognitive impairment (MCI) (Garg et al. 2023).

AD affects about one in every nine individuals of the elderly population (Spetz and Flatt 2023). Due to the degenerative nature of the disease, treating AD is a challenging task and can be tackled through clinical management only. Thus, diagnosing AD at early stages (i.e., during the MCI stage) leads to better treatment of the disease (Porsteinsson et al. 2021). With the aid of CAD techniques, early detection of AD has become more efficient and robust. Advancements in neuroimaging techniques such as positron emission tomography (PET) and magnetic resonance imaging (MRI) scans, coupled with new machine learning (ML) techniques, have contributed to the CAD methods (Ghazal and Issa 2022). Several diagnostic datasets are now available from sources such as the Open Access Series of Imaging Studies (OASIS) and the Alzheimer’s Disease Neuroimaging Initiative (ADNI); they include patient health records such as demographic, imaging, clinical, and genetic details. These datasets are being used by researchers, academicians, and industrialists to develop ML techniques for CAD (El-Sappagh et al. 2021a).

In the past, several surveys have been published on CAD of AD. They review works that follow different approaches. Some of them considered different ML techniques for classification of AD stages (for example, Grueso and Viejo-Sobera 2021; Zhao et al. 2023; Billeci et al. 2020), while the most recent surveys have concentrated on deep learning (DL) techniques (for example, Jo et al. 2019; Ebrahimighahnavieh et al. 2020; Fathi et al. 2022). A few other review articles consider parameters like imaging data (Warren and Moustafa 2023), clinical data (Kumar et al. 2021), and prediction results (Rowe et al. 2021). Although there are several other such reviews, they focus either on one approach to CAD or on a single modality of the diagnostic data from popular datasets like ADNI and OASIS. They lack a bird’s-eye view of CAD of AD. They also omit a background on CAD and ML. Further, a comprehensive discussion of future directions with respect to the computational aspects is lacking.

Our review differs from the above works. To present the recent developments in CAD of AD, we have selected a large number of articles (80) from the past 5 years. We classify them into a taxonomy based on the datasets and ML techniques utilized. The merits, demerits, and challenging issues of each reviewed work are also outlined. In this regard, our contributions are as follows.

  • Background on AD: We give a summary of AD that will help the novice in the ML domain to understand the disease’s challenges for CAD from a medical perspective.

  • Implementation details: We provide in-depth implementation details along with the architectural and dataset features.

  • Review, discussion, and comparison of the state-of-the-art: We extensively review the recent works on CAD of AD and conduct a comprehensive analysis of the ongoing trends, which should help future researchers understand the position of CAD with respect to AD.

The rest of the article is organized as follows. Section 2 provides a detailed description of AD from disease and diagnostic perspectives. Furthermore, it provides a background on several implementation tools and datasets. Section 3 reviews the most recent works that address the challenges of CAD of AD. We also present the benefits and drawbacks of each work. Further, in Sect. 4 we discuss the lessons learnt from our review by comparing the works on their implementation parameters, and in Sect. 5 we list a few future directions that could improve the efficiency of CAD for AD. Finally, Sect. 6 concludes the article.

2 Background

In this section, we present a comprehensive overview of the AD. In the first subsection, we discuss the disease from medical and diagnostic perspectives. We also establish the need for its early detection. Further, in the next subsection, we provide a background on the computer-aided diagnosis of the AD. It includes a description of various data sources available for the researchers and developers along with the different ML models used in the CAD.

2.1 Alzheimer’s disease

AD is a brain disorder that gradually deteriorates memory, thinking skills, and eventually the ability to carry out the daily chores of life. Research shows that a number of genetic factors, environmental circumstances, and lifestyle choices collectively lead to the development of AD, although its exact cause is poorly understood. The accumulation of aberrant protein structures in the brain, namely tau tangles and amyloid plaques, is one of the characteristics of AD (Karran and De Strooper 2022). Beta-amyloid protein fragments clump together and build up outside the brain cells to form amyloid plaques (Horie et al. 2022). Tau tangles, on the other hand, are created when tau protein builds up inside the brain cells. Specific genes, such as the apolipoprotein E gene, have been identified as hereditary determinants for developing AD (Liu et al. 2013). Heart disease, head injuries, high blood pressure, and a lack of exercise are a few of the environmental and lifestyle factors linked to an elevated risk of AD (van Praag 2018).

As AD affects the elderly population, their lifestyle and health gradually deteriorate. Moreover, there exists no cure for it. The disease is primarily managed through supportive care by healthcare providers. Thus, its diagnosis at early stages is crucial. AD diagnosis is established by examining a person for cognitive and functional changes (Kim et al. 2020). This includes a thorough examination of medical history, cognitive ability, and physical condition. The purpose of AD diagnosis is to assess the presence and severity of cognitive impairment, to determine the underlying cause of the symptoms, and to eliminate other potential causes of cognitive decline. The diagnostic procedure is complex and consists of a number of steps, during which a physician inquires about the patient’s symptoms and underlying medical conditions.

The cognitive and memory tests include the Montreal Cognitive Assessment and the Mini-Mental State Examination (Pinto et al. 2019). Neurological examinations are also carried out to assess reflexes, muscle strength, coordination, and other elements of nervous system functioning (Galasko et al. 1990). AD-related alterations in the brain are detected with the help of imaging tests such as PET and MRI scans. While both MRI and PET can image different brain regions, they do so using different methods and serve different purposes. MRI is primarily used for identifying structural abnormalities in the brain, while PET is used to identify functional changes that occur in the brain. In some cases, both imaging techniques may be used together to provide a more complete view of the brain and its functions. Laboratory tests such as blood and urine tests are also used to evaluate the overall health condition (Coimbra et al. 2006).

As summarized above, the diagnosis of AD is a complex and tedious process, which leads to ineffective treatment strategies. To address these challenges, there is a need for a computational mechanism for early detection that assists healthcare providers and enhances the treatment process. CAD is one such automated technique that aids in the diagnosis of a disease. With respect to AD, CAD is framed as a classification task in machine learning. A given data sample is labelled through either (i) binary classification (i.e., positive/negative towards the disease) (Tufail et al. 2020) or (ii) multi-class classification (i.e., identifying the different stages of the disease) (Altaf et al. 2018).

As AD is a degenerative disease, a patient suffering from it passes through several stages. Though the medical and research community is divided over the number and types of stages in AD, a four-stage classification is generally accepted. In the first stage, i.e., no dementia (ND), also known as pre-clinical AD, there are no noticeable symptoms. The terms ND, normal condition (NC), cognitively normal (CN), and healthy controls (HC) are used interchangeably. However, an imaging scan might detect deposits of beta-amyloid protein. Detecting these deposits can lead to early diagnosis and thus better clinical management (McCormick et al. 1994). The second stage of AD is referred to as very mild dementia (VMD) or MCI. In this stage, symptoms of the disease start to appear; however, they do not affect the patient’s day-to-day activities. During this stage, several tests, such as cognitive screening tests and biomarker identification, are prescribed. This leads to a collection of multi-modal datasets (Petersen 2009). During the third stage, i.e., mild dementia (MID), the symptoms are prominent and affect the day-to-day activities of the patients. The above medical tests may be prescribed again here to assess and plan the clinical management of the disease. The last stage of AD is further categorized into two sub-stages, i.e., moderate (MOD) and severe dementia. Here, there is a complete loss of memory, in turn leading to other symptoms such as loss of communication ability and decline in physical activities (Andersen et al. 1999).
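
To make the difference between the binary and the stage-wise framing of the classification task concrete, the following minimal Python sketch derives both kinds of target from a diagnosed stage. The stage codes reuse the abbreviations introduced above (with MOD standing in for the whole last stage); the function names and the choice to treat every stage beyond ND as positive in the binary case are illustrative assumptions, not a convention taken from the cited works.

```python
# Hypothetical stage codes, following the four-stage scheme described above.
STAGES = ["ND", "MCI", "MID", "MOD"]


def binary_target(stage: str) -> int:
    """Binary framing: 0 = no dementia, 1 = any later stage (an assumption)."""
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage}")
    return 0 if stage == "ND" else 1


def stage_target(stage: str) -> int:
    """Stage-wise framing: one integer class per disease stage."""
    return STAGES.index(stage)  # raises ValueError for unknown stages


diagnoses = ["ND", "MCI", "MOD", "ND", "MID"]
binary_labels = [binary_target(s) for s in diagnoses]
stage_labels = [stage_target(s) for s in diagnoses]
```

Whether MCI is grouped with the positive class is a design decision; studies that compare only ND against AD would drop the intermediate stages instead.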

2.2 Computer aided diagnostics for AD

As discussed in the previous subsection, diagnosis of AD at early stages is necessary for impactful treatment of the disease. Thus, CAD plays a pivotal role here. The data collected during the screening tests and diagnostic procedures are used by the CAD system to make predictions about the disease and its progression level. A CAD system is, at its core, an ML model trained on historical diagnostic and other relevant data. Owing to advances in data acquisition technologies and machine learning, CAD today is robust and effective, if not on par with a human counterpart. The emerging field of deep learning in ML has contributed significantly to the development of CAD systems for diagnostic purposes. In this subsection, we present the fundamentals of CAD concepts for Alzheimer’s disease (Gupta et al. 2022).

2.2.1 Use of CAD in AD

A computer-aided diagnostic system finds several applications in the diagnosis of the AD. A few of them are discussed here. In neuroimaging analysis, using CAD it is possible to find patterns and alterations that indicate AD in brain scans such as PET and MRI. Automated volumetric analysis tools can measure the volume of different brain regions and identify structural alterations over time, helping to diagnose AD and monitor the progression of the disease (Ferreira and Busatto 2011; Vinutha et al. 2019b). In biomarker analysis, CAD aids in detecting biomarkers linked with AD in medical images, such as amyloid plaques and tau protein tangles. This helps in disease diagnosis, monitoring, and drug discovery (Jack and Holtzman 2013).
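
As a rough illustration of what automated volumetric analysis computes, the sketch below counts the voxels carrying a given label in a segmented scan, converts the count to a volume, and reports the relative change between two visits. It is a toy example under stated assumptions: the label volumes are nested Python lists rather than real image files, the function names are hypothetical, and the segmentation itself (the hard part in practice) is taken as given.

```python
def region_volume_mm3(mask, label, voxel_mm3):
    """Volume of one labelled region: voxel count times voxel volume."""
    count = sum(
        1
        for slice_ in mask
        for row in slice_
        for value in row
        if value == label
    )
    return count * voxel_mm3


def percent_change(baseline, follow_up):
    """Relative volume change between two visits, in percent."""
    return 100.0 * (follow_up - baseline) / baseline


# Toy label volumes (2 slices x 2 rows x 3 columns); label 1 stands in
# for a region of interest such as the hippocampus.
scan_t0 = [[[1, 1, 0], [1, 1, 0]], [[1, 1, 0], [0, 0, 0]]]
scan_t1 = [[[1, 1, 0], [1, 0, 0]], [[1, 0, 0], [0, 0, 0]]]

v0 = region_volume_mm3(scan_t0, label=1, voxel_mm3=1.0)
v1 = region_volume_mm3(scan_t1, label=1, voxel_mm3=1.0)
change = percent_change(v0, v1)  # negative values indicate shrinkage
```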

By examining alterations in brain morphology that take place before the major symptoms manifest, CAD aids in the early diagnosis of AD. Through early intervention, the progression of the condition can be slowed or prevented. This assists in identifying high-risk patients (Swainson et al. 2001). Also, to generate a personalized risk assessment for each person, machine learning algorithms can examine medical scans together with other data such as genetic and demographic risk factors (Patterson et al. 2008).

2.2.2 Diagnostic data

Because physicians perform several types of diagnostic tests to detect AD, many types of data are collected. Figure 1 depicts the classification of the various data types that are used in CAD of AD. They are described below:

Fig. 1 Classification of the data types in CAD of AD

2.2.2.1 Clinical data

Clinical data is vital in the detection of AD. This data contains specifics about the patient’s medical background, physical examination, and other clinical evaluations. The diagnosis of AD is frequently made using clinical symptoms, and the precision of the diagnosis relies on the quality of the clinical data provided. Clinical information used for diagnosing AD includes demographic data, medical history, cognitive and neuropsychological testing, and laboratory results. These types of data reveal important details regarding a patient’s general health, cognitive abilities, and the presence of AD biomarkers (Bateman et al. 2012; Vinutha et al. 2019a).

2.2.2.2 Neuroimaging data

Imaging data has evolved into a significant tool in the diagnosis and detection of AD. MRI and PET are among the imaging modalities used to visualize the anatomical and functional changes that occur inside the brain during AD. MRI scans offer detailed images of the brain’s anatomy and are proficient in recognizing atrophy, or shrinkage, in specific regions of the brain that are linked to AD. PET scans employ radioactive tracers to evaluate brain metabolism and to find tau tangles and amyloid plaques, two signs of AD that accumulate inside the brain. Utilizing imaging data has greatly increased the accuracy of CAD systems, enabling early diagnosis and better treatment outcomes (Johnson et al. 2012).

There are several datasets available to researchers for training ML models. These provide access to the diagnostic data related to AD. They are discussed in a later subsection (see Sect. 2.2.3).

2.2.2.3 Multimodal data

Multimodal data combines data from different perspectives about the same event. In CAD, multimodal data is the integration of diverse forms of diagnostic data, such as genetic data, imaging data, and clinical data. This fusion of data not only increases the accuracy of disease detection but also provides valuable insights obtained from different types of diagnostic tests. It also aids in establishing the correlations among the tests (Kong et al. 2022). Although extensive investigations have been conducted on devising ML models for AD diagnosis, there is still a clear gap in employing multimodal data, as evident from the literature review presented in Sect. 3.3. We discuss the challenges and future directions with respect to multimodal data in later sections.
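
One simple way to integrate modalities is feature-level (early) fusion, in which per-subject feature vectors from each modality are standardized and concatenated before being passed to a classifier. The sketch below illustrates only this idea; the feature values are made up, the function names are hypothetical, and the reviewed works may well use other fusion strategies.

```python
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    mus = [mean(c) for c in columns]
    sigmas = [pstdev(c) or 1.0 for c in columns]  # guard constant columns
    return [
        [(x - m) / s for x, m, s in zip(row, mus, sigmas)]
        for row in rows
    ]


def early_fusion(*modalities):
    """Concatenate per-subject feature vectors from several modalities."""
    scaled = [zscore_columns(m) for m in modalities]
    return [sum(parts, []) for parts in zip(*scaled)]


# Hypothetical features for three subjects (values are invented).
imaging = [[3100.0, 2.4], [2800.0, 2.1], [2500.0, 1.9]]  # e.g. volume, thickness
clinical = [[29.0], [25.0], [21.0]]                      # e.g. a cognitive score
fused = early_fusion(imaging, clinical)
```

Standardizing each modality first keeps a feature measured in large units (such as a volume) from dominating the fused vector.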

2.2.2.4 Hybrid data

While multimodal data draws on different modalities, i.e., different diagnostic tests, hybrid data combines different diagnostic data from the same modality. For example, in the case of neuroimaging diagnostic data, a combination of functional MRI (fMRI) and structural MRI (sMRI) is utilized for training the ML models. In this method, pre-processing of the datasets plays a vital role, i.e., the role of a particular brain region or some context of the diagnostic data in AD diagnosis is inferred here. As with multimodal data, although some works have been carried out in this direction, there is still a need to explore hybrid data in diagnostic ML models. We discuss the promising works in this regard in the future directions section (see Sect. 5).

2.2.3 Databases

As discussed in the previous section, several types of diagnostic data can be used to train an ML model for CAD system design and implementation. Several data repositories help researchers and developers to implement such a system. In the following, we discuss a few of the popular data repositories that hold AD diagnostic data. Table 1 lists the discussed databases along with their descriptions.

Table 1 List of data repositories available for AD diagnosis ML model development

ADNI (Jack Jr et al. 2008)

The ADNI dataset holds longitudinal data that are collected through modern imaging techniques, biomarkers, and cognitive assessments for detecting early brain changes that are connected to AD development and progression. There also exists a Japanese counterpart, J-ADNI (Iwatsubo 2010).

OASIS (Marcus et al. 2007)

It is a library of neuroimaging and clinical data on dementia. It includes diagnostic data from neuroimaging scans, such as MRI and PET images, and also cognitive testing and clinical evaluations. OASIS is a valuable resource for researchers exploring the early detection, and diagnosis of dementia. It is also used in the development of new biomarkers and therapeutic targets.

GEO Omnibus (Chang et al. 2017)

Gene Expression Omnibus (GEO) is a public database of gene expression information. It allows researchers to deposit and access a wide range of functional genomics datasets, such as microarray, RNA-seq, and ChIP-seq data. Datasets pertaining to particular genes, illnesses, or experimental settings can be found through a database search. It offers tools for data analysis and visualization in addition to access to raw data, facilitating easier data exploration and interpretation for researchers.

GARD (Hoskins 2022)

The Genetic and Rare Diseases Information Center (GARD) database is a centralized repository for information on genetic and rare diseases, including a wide range of resources for patients, families, healthcare practitioners, and researchers. More than 7000 uncommon diseases are covered in the database, along with details on genetic testing, clinical studies, and treatment options. Additionally, the GARD database offers a platform for interaction and teamwork between researchers and healthcare professionals working on the study and treatment of rare diseases. The GARD database is a valuable resource for furthering existing understanding of rare illnesses and improving outcomes for patients and families affected by these conditions.

NACC (Beekly et al. 2004)

The National Alzheimer’s Coordinating Center (NACC) database collects data from more than forty-two Alzheimer’s disease centers throughout the United States and includes information on more than 47,000 participants. It contains a wide range of data, including clinical and neuropsychological assessments, neuroimaging data, genetics data, and autopsy data, making it a valuable resource for researchers studying AD and related dementia. The database includes data from a diverse population of participants, including individuals with AD, MCI, and HC, as well as individuals from diverse racial and ethnic backgrounds.

Apart from the above-mentioned data repositories, there are also a few others. These include the Framingham Heart Study (2023), Australian Imaging Biomarkers and Lifestyle (2023), Lewy Body Dementia Center for Excellence at Stanford University (2023), Rohrer and Rosen (2013), UK Biobank (2023), Alzheimer’s Disease Repository Without Borders (2023), Computer Aided Diagnosis of Dementia (2023), and Minimal Interval Resonance Imaging in Alzheimer’s Disease (2023). In addition, some universities and hospitals have created their own repositories for AD, such as Beijing Easy Monitor Technology; University Hospital of Modena, Italy; John Radcliffe Hospital in Oxford, UK; and Center for Biomedical Technology, Madrid, Spain.

2.2.4 ML models for AD diagnosis

As discussed earlier, the core problem in CAD is to identify whether a given data sample is positive towards the disease diagnosis or not. This task is achieved through ML models. In particular, it is a prediction or classification task in machine learning. There exist several techniques or models to achieve it (Vinutha et al. 2018a). In this paper, we adopt a two-type classification strategy to categorize such models. The first type, i.e., traditional models, includes ML classification algorithms that are long established in the field. The second type includes the newer DL models. Figures 2 and 3 categorize the different ML models that we consider for review in this work.

Fig. 2 Different traditional ML models employed by the state-of-the-art works in the last 5 years to design CAD system for the AD

Fig. 3 Different DL models employed by the state-of-the-art works in the last 5 years to design CAD system for the AD

The traditional class includes ML classification and regression models such as support vector machines (SVM), random forest (RF), and k-nearest neighbours (kNN). In these models, feature selection plays a pivotal role and thus aids in identifying the role of various biomarkers in the disease diagnosis. They are also helpful in establishing the correlations among the biomarkers. However, designing such a model is tedious and time-consuming. Also, these models suffer from poorer performance when compared with the state-of-the-art DL models. While traditional ML models require manual feature selection, DL models automate this task (Vinutha et al. 2021). They use multi-layer neural networks to extract complex features from the training set. Owing to this automated feature engineering, DL models have gained popularity for image classification tasks. With respect to AD, the most important biomarkers come from neuroimaging data, and thus DL models are the preferred choice nowadays. The DL models most commonly used in CAD systems for AD diagnosis include convolutional neural networks (CNN), recurrent neural networks (RNN), and generative adversarial networks (GAN). Although the performance of DL models is superior to that of traditional ML models, they lack explainability and are computationally expensive. We further discuss the applicability of these models for AD diagnosis in Sect. 4.
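
To illustrate why feature selection in traditional models also sheds light on individual biomarkers, the sketch below ranks features by a simple Fisher-style score, i.e., the squared difference between the two class means divided by the sum of the within-class variances. This is one of many possible filter criteria and is chosen here only for illustration; the feature names and values are invented, and none of the reviewed works is claimed to use exactly this score.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Squared class-mean gap over summed within-class variance (two classes)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    within = pvariance(a) + pvariance(b)
    return (mean(a) - mean(b)) ** 2 / within if within else float("inf")


def rank_features(rows, labels, names):
    """Return feature names ordered from most to least discriminative."""
    columns = list(zip(*rows))
    scores = {n: fisher_score(c, labels) for n, c in zip(names, columns)}
    return sorted(scores, key=scores.get, reverse=True)


# Invented data: the first feature separates the classes, the second does not.
rows = [[3.2, 5.0], [3.0, 1.0], [3.1, 3.0], [2.4, 4.0], [2.5, 2.0], [2.3, 3.0]]
labels = [0, 0, 0, 1, 1, 1]
ranking = rank_features(rows, labels, ["hippocampal_volume", "noise"])
```

A ranking of this kind is directly readable by a clinician, which is the explainability advantage the traditional models retain over DL models.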

There are also a number of tools available to researchers and developers for implementing CAD systems. Some of them are specific to AD. They are used for tasks such as data preparation, feature selection and extraction, and test validation. A few of them are FreeSurfer, Brain Voyager, and BioImage Suite. A detailed description of such tools is given in Vinutha et al. (2016, 2018b).

3 Literature survey

This section presents a review of recent state-of-the-art research articles that address CAD of AD. It consists of three subsections. In the first subsection, we provide an overview of the systematic search carried out to find studies in various databases. In the second subsection, we provide an overview of our classification taxonomy of the reviewed works. In the last subsection, we provide a comprehensive review of eighty papers. Each work is reviewed with respect to the model used, its implementation details, performance, advantages, and disadvantages.

3.1 Review protocol

This review adhered to the methodological guidelines outlined in the preferred reporting items for systematic reviews and meta-analyses (PRISMA) framework. The systematic search for relevant studies centered on the keywords “Alzheimer’s disease detection”, “computer aided diagnosis”, “deep learning”, “neuroimaging”, and “multimodal data”. The search was conducted across prominent databases, such as ScienceDirect, IEEE Xplore, PubMed, and Scopus, and several other sources. Figure 4 shows the flowchart of the adapted PRISMA approach.

Fig. 4 PRISMA workflow adapted for our survey study selection

To ensure a standardized analysis, only articles published in the English language were considered for inclusion. Further, we considered only recent peer-reviewed scientific publications from the last 5 years. The publications for this review were selected in the following four stages: (i) in the first stage, we carried out a search over the reputed databases with relevant keywords; (ii) in the second stage, we excluded irrelevant articles by reading their abstracts; (iii) next, we classified the works into our proposed taxonomy by going through the full text of the relevant articles; and (iv) last, we carried out an in-depth analysis of the state-of-the-art works by comparing them to derive insights, gaps, and future directions.

The preliminary search yielded a total of 120 documents, which were subjected to a screening process. Through this process, 17 documents were identified as duplicates and subsequently excluded. Among the remaining documents, 15 were deemed irrelevant to the focus of the review and were consequently excluded. Furthermore, an additional 8 documents were excluded based on predetermined eligibility criteria. Through this process, we selected eighty recent state-of-the-art publications from the years 2018 to 2023. Table 2 lists the databases from which the documents were retrieved for review.
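
The screening counts reported above can be checked with simple arithmetic. The short sketch below replays the exclusions in the order stated and confirms that they leave eighty documents; the variable names and the short labels for the exclusion reasons are ours.

```python
identified = 120
excluded = {"duplicates": 17, "irrelevant": 15, "ineligible": 8}

remaining = identified
flow = []  # running total after each exclusion step
for reason, count in excluded.items():
    remaining -= count
    flow.append((reason, remaining))

included = remaining  # 120 - 17 - 15 - 8
```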

Table 2 Databases of the documents collected for review

3.2 Classification overview

Owing to advances in deep learning and data acquisition methods, a plethora of machine learning techniques is available to researchers for the design and development of early diagnosis systems. In the past, CAD of AD utilized traditional ML models. However, the focus has now shifted to DL models. At the same time, multimodal datasets are also taken into consideration by the state-of-the-art works. We have classified the existing works into two broad categories. Figure 5 depicts our classification taxonomy of the present CAD techniques for AD. The first class consists of all the traditional ML models utilized for CAD system development (see Sect. 3.3.1), while the second class includes the modern DL approaches (see Sect. 3.3.2). Further, each of these classes is sub-classified based on the dataset used for training the models, the type of the model or its architecture, and the pre-processing and feature engineering techniques. In the next subsection, we review the articles according to the taxonomy established here.

Fig. 5 Classification of the state-of-the-art CAD models for AD diagnosis

3.3 State-of-the-art review

In this section, we discuss the eighty state-of-the-art works on CAD of the AD that are published in the last 5 years. They are presented according to our classification taxonomy as described earlier.

3.3.1 ML models

Several biomarkers exist for the diagnosis of AD. The feature engineering task for machine learning algorithms makes use of these biomarkers to select appropriate characteristics from the training data as features. These models are also explainable with respect to the features and thus add to the biological knowledge of the disease. In the following paragraphs, we review works that implement traditional ML models according to our classification taxonomy.

In a few works, information about the human brain, such as shape, volume, and region, is taken as features for machine learning algorithms. Fuse et al. (2018) examined the efficacy of a technique for differentiating between healthy people and AD patients using information about the brain’s shape. Shape information is extracted from the lateral ventricle, a brain region, excluding the septum lucidum, using a P-type Fourier descriptor. Further, an SVM model is utilized for classification by combining various descriptors along with the shape information. An accuracy of 87.5% is achieved in the classification task. The study suggests that shape information, as opposed to the traditional volume ratio, is more helpful in diagnosis. A thorough analysis of the traits for classification and interpretation, however, is omitted in this work.

Similarly, Mofrad et al. (2021) designed a framework for developing cognitive decline prediction models based on longitudinal MRI examinations. It effectively detects both the transition from ND to MCI and the transition from MCI to AD. The work focuses on developing a system that uses hippocampal morphometric measures obtained from MRI. The proposed approach demonstrates accuracy rates of 73% and 78% for detecting the transition from ND to MCI and from stable MCI to AD, respectively. One notable strength of this work is its capability to handle longitudinal datasets, accommodating scenarios where subjects have varying numbers of scans captured at various time intervals. However, predicting the conversion from ND to MCI poses inherent challenges. This is primarily because it is difficult to differentiate cognitive decline associated with symptoms of MCI from cognitive decline in which a stable baseline cognitive function is maintained. This limitation represents a significant drawback of the suggested framework. Further, Uysal and Ozturk (2020) proposed an ML model for early AD diagnosis through the analysis of neuroimages. Various classification models, including logistic regression (LR), decision tree (DT), Gaussian naive Bayes (GNB), SVM, kNN, and RF, are utilized to predict the diagnosis, considering factors such as gender, age, and right and left hippocampus volume from ADNI data. Notably, the combination of multiple biomarkers yields superior diagnostic estimations compared to individual biomarker evaluations. Furthermore, the findings highlight the influence of both gender and atrophy values on the diagnostic decision-making process. kNN achieves the highest accuracy of 98% when considering the combination of gender, age, and brain region volumes for the AD-ND diagnostic group. However, the relationship between cognitive decline specifically in the left hemisphere and AD remains insufficiently addressed.
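
For readers new to kNN, the sketch below shows the basic mechanism on the kind of feature vector described above (sex, age, and left and right hippocampal volumes). It is a generic majority-vote kNN and not the implementation of Uysal and Ozturk (2020); all numbers are invented, and the features are assumed to be pre-scaled to comparable ranges so that no single feature dominates the distance.

```python
from collections import Counter


def euclidean(p, q):
    """Straight-line distance between two equal-length feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def knn_predict(train_x, train_y, query, k=3):
    """Label a query point by majority vote among its k nearest neighbours."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: euclidean(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Invented, pre-scaled features: [sex, age / 100, left volume, right volume].
train_x = [
    [0, 0.72, 3.4, 3.5], [1, 0.70, 3.3, 3.6], [0, 0.75, 3.5, 3.4],
    [1, 0.78, 2.5, 2.6], [0, 0.80, 2.4, 2.7], [1, 0.76, 2.6, 2.5],
]
train_y = ["ND", "ND", "ND", "AD", "AD", "AD"]

prediction = knn_predict(train_x, train_y, [1, 0.77, 2.7, 2.6])
```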

Aruchamy et al. (2020) developed a technique to detect AD from 3D MRI scans by using first-order statistical features. It incorporates white matter and grey matter slices from three perspectives: axial, coronal, and sagittal. Feature reduction is carried out using principal component analysis (PCA). The algorithm’s efficacy is evaluated using LR, NB, SVM, and AdaBoost classifiers. The suggested method attains an accuracy of 90.9%. The experimental results reveal that white matter slices with a coronal view yield the highest accuracy. However, the performance of AdaBoost in white matter classification for the sagittal view and the performance of NB in grey matter classification using the axial view are deemed unsatisfactory. Similarly, Bi et al. (2020) developed an unsupervised DL approach based on MRI scans for the diagnosis of AD. The methodology employs PCANet, an unsupervised CNN model, to extract essential patterns from the MRI scans. For classification purposes, a k-means clustering-based technique is employed. The proposed method takes either three orthogonal panels (TOP) or a single slice of an MRI image as input data. Using the TOP slices of the MRI images as input leads to a substantial increase in classification accuracy. Specifically, for a single slice, the method attains an accuracy of 95.52% in MCI versus AD and 90.63% in NC versus MCI classification. In contrast, utilizing TOP data achieves 92.6% accuracy for NC versus MCI and 97.01% for MCI versus AD. The lack of fine-tuning in the k-means approach and the relatively small-scale dataset are the limitations of this study.
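
First-order statistical features describe the distribution of voxel intensities without regard to their spatial arrangement. The sketch below computes a commonly used set (mean, variance, skewness, kurtosis, and entropy) for a toy slice. Which first-order features Aruchamy et al. (2020) actually used is not stated here, so this particular set, like the function name, is an assumption made for illustration.

```python
from collections import Counter
from math import log2


def first_order_features(pixels):
    """Mean, variance, skewness, kurtosis and entropy of an intensity list."""
    n = len(pixels)
    mu = sum(pixels) / n
    var = sum((p - mu) ** 2 for p in pixels) / n
    std = var ** 0.5
    skew = sum((p - mu) ** 3 for p in pixels) / (n * std ** 3) if std else 0.0
    kurt = sum((p - mu) ** 4 for p in pixels) / (n * var ** 2) if var else 0.0
    hist = Counter(pixels)  # intensity histogram
    entropy = -sum((c / n) * log2(c / n) for c in hist.values())
    return {"mean": mu, "variance": var, "skewness": skew,
            "kurtosis": kurt, "entropy": entropy}


# Toy 2 x 4 slice with four equally frequent intensity levels.
slice_2d = [[0, 0, 1, 1], [2, 2, 3, 3]]
features = first_order_features([p for row in slice_2d for p in row])
```

Each slice then contributes one short feature vector, which is the kind of input a reduction step such as PCA would compress before classification.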

Table 3 Summary of latest works based on brain morphology

As evident from the above works, morphological features of the human brain captured through MRI and PET scans are useful to establish disease-relatedness. Further research has to be carried out in this direction in the future, as discussed in Sect. 5. Table 3 summarizes the most recent works discussed above.

Also, genetic data plays a pivotal role in CAD. Mahendran and PM (2022) proposed a classification model that uses DNA methylation data as features. For embedded feature selection, LASSO regression, SVM, RF, and AdaBoost are employed. The accuracy of this model is 88.7%. When quality control is applied, the data is greatly skewed as low-performing samples (p-values > 0.01) are excluded. The flaw in this methodology is that not all genes linked to AD are examined. Along the same lines, Vinutha et al. (2020) devised an ML framework for functional impairments and cognitive assessment in AD through clinical datasets. To improve classification results, an imputation approach is used to predict values that are missing in the study dataset. A sampling method is used to balance the study data, and the resulting model outperforms the model trained on imbalanced data. To retain the fittest individual traits, a genetic algorithm-based elitism technique is applied. R is used to implement the proposed framework. Its maximum accuracy is 96.06%. Feature selection on the selected scores using various optimization strategies is not explored. In Table 4, a summary of these works is presented.
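
As an illustration of balancing a study dataset through sampling, the sketch below applies random oversampling, in which minority-class records are duplicated until every class matches the largest one. The specific sampling method used by Vinutha et al. (2020) is not stated here, so random oversampling is an assumption, and the function name and toy records are hypothetical.

```python
import random
from collections import Counter

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen minority-class samples until all classes
    have as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_samples, out_labels = list(samples), list(labels)
    for cls, count in counts.items():
        # Pool of original samples belonging to this class.
        pool = [s for s, l in zip(samples, labels) if l == cls]
        for _ in range(target - count):
            out_samples.append(rng.choice(pool))
            out_labels.append(cls)
    return out_samples, out_labels

# Hypothetical toy data: 2 AD records and 6 ND records, each with two made-up scores.
samples = [[71, 22.0], [80, 18.5], [65, 29.0], [68, 28.0],
           [70, 30.0], [66, 27.5], [73, 29.5], [69, 28.5]]
labels = ["AD", "AD", "ND", "ND", "ND", "ND", "ND", "ND"]
balanced_samples, balanced_labels = random_oversample(samples, labels)
print(Counter(balanced_labels))
```

Oversampling of this kind is normally applied to the training split only, so that duplicated records do not leak into the evaluation data.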

Table 4 Summary of latest works based on clinical diagnostic data

Diagnostic scans play an equally important role in the detection of disease; earlier models focused on MRI image data for classification. Sivakani and Ansari (2020) proposed an ML framework for AD diagnosis. ML algorithms are utilized for both feature selection and extraction, followed by classification on the OASIS longitudinal dataset. The expectation-maximization (EM) algorithm is used for clustering. Best First Search (BFS) is utilized for selecting the features. A Gaussian process algorithm is employed for classifying the OASIS dataset, and Linear Regression is utilized for constructing the model. ML is used for feature extraction; however, DL yields more precise results. Similarly, Mathew et al. (2018) devised a method for AD diagnosis through MRI scans. For classification, probabilistic neural network (PNN), SVM, and kNN are employed. Approximately 600 images are utilized for training the classifiers. The trained models display accuracies of 85%, 77%, and 68% for PNN, kNN, and SVM, respectively. The primary benefit is the ability to extract several statistical properties from MRI images for in-depth study, including features for contrast, homogeneity, correlation, and energy. In terms of accuracy, PNN, a DL method, demonstrates superior performance compared to the heuristic ML methods such as kNN and SVM.

Sheng et al. (2021) devised an approach that aims to improve the accuracy of classifying AD stages by optimizing feature selection. For the purpose of preparing and segmenting structural and functional brain MRI data, a non-Human connectome project preparation approach is established. With the help of various ML classifiers, a collection of significant network properties that undergo substantial changes during the specific progressive connection is extracted. The average accuracies in the binary classification are as follows: (i) HC versus Early MCI (EMCI) is 90.5%, (ii) HC versus Late MCI (LMCI) is 92%, (iii) HC versus AD is 95.5%, (iv) EMCI versus LMCI is 86.0%, (v) LMCI versus AD is 87.0%, and (vi) EMCI versus AD is 88.5%. DT, kNN, and ensemble approaches are examined in this study, and SVM consistently outperforms them in terms of accuracy. Its primary flaw is its small sample size, which causes model underfitting. Puente-Castro et al. (2020) developed a DL technique using sagittal MRI scans for automatically detecting the presence of AD. This work leads to two primary conclusions: (i) in sagittal MRI, the effects of AD and its stages can be differentiated, and (ii) the results obtained from DL techniques using sagittal MRIs are on par with the state-of-the-art technique, which utilizes MRIs in the horizontal plane. The ResNet architecture is employed for feature extraction, while SVM is utilized for classification. It is implemented on a system with an NVIDIA GPU. It attains an accuracy of 78.64% and 86.81% on the ADNI and OASIS datasets, respectively. This study demonstrates that, despite being less commonly used, sagittal-plane MRIs are as effective as MRIs from other planes in detecting early-stage AD. In comparison to OASIS, the ADNI dataset used for this work lacks a sufficient number of representative cases for each stage across all possible scenarios. Table 5 compares the works that use MRI data as features for ML models.

Table 5 Summary of latest works based on MRI diagnostic data

In some works, a hybrid dataset is used for training the ML models. Features from different scans are combined to form the hybrid dataset. Zhang et al. (2021b) presented a multimodal neuroimaging-based AD multi-class framework incorporating fusion techniques and embedded feature selection. It has three innovative features: (i) the multi-class hinge loss is combined with an \(l_{2,1}\)-norm regularization term, (ii) an \(l_p\)-norm (\(1< p < \infty \)) regularization term is established for effectively combining complementary information from each modality, and (iii) a theorem is presented that transforms the problem of minimizing the multi-class hinge loss using both the \(l_{2,1}\)-norm and the \(l_p\)-norm. SVM achieves the highest accuracy of 87.45%. The optimization procedure is theoretically proven to converge to the global optimum. When handling a large number of subjects, this model encounters substantial challenges due to the significant storage space required to store a kernel matrix. Similarly, Janghel and Rathore (2021) presented an ML-based approach for the detection of AD using ADNI data. For extracting features, the VGG16 CNN model is employed. For the classification task, DT, k-means clustering, SVM, and linear discriminant are employed. An accuracy of 99.95% is obtained for the fMRI dataset classification, whereas the PET dataset has an average accuracy of 73.46%. fMRI achieves the highest accuracy across all classifiers. The execution time of classification needs to be reduced. In Table 6, the works that use hybrid datasets for training the ML models are listed and compared.
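
For readers unfamiliar with the two regularizers named above, the sketch below evaluates the \(l_{2,1}\)-norm of a weight matrix and the \(l_p\)-norm of a vector. It assumes the common row-wise convention for the \(l_{2,1}\)-norm (the sum of the Euclidean norms of the rows, with rows taken as features); Zhang et al. (2021b) may index the matrix differently. The function names and the toy matrix are hypothetical, and the sketch does not reproduce the authors' objective function.

```python
import math

def l21_norm(W):
    """l_{2,1}-norm: sum over rows of the Euclidean norm of each row.

    Rows that are entirely zero contribute nothing, which is why this norm
    encourages row-wise (feature-wise) sparsity.
    """
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def lp_norm(v, p):
    """l_p-norm of a vector for 1 < p < infinity."""
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

# Hypothetical weight matrix: 3 features (rows) x 2 classes (columns).
W = [[3.0, 4.0],
     [0.0, 0.0],   # a feature that has been switched off entirely
     [5.0, 12.0]]

print(l21_norm(W))             # row norms 5, 0, and 13 are summed
print(lp_norm([3.0, 4.0], 2))  # p = 2 reduces to the Euclidean norm
```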

Table 6 Summary of latest works based on hybrid datasets

Best results are achieved through the combination of clinical and diagnostic scan data. Here, clinical datasets are fused with MRI or PET scans for training the ML model. Qiu et al. (2022) proposed an ML framework that identifies people with NC, MCI, AD, and non-AD dementias (nADD) by completing several diagnostic processes in succession. The work demonstrates a variety of models that accept various combinations of commonly gathered clinical data, including medical history, demographics, functional assessments, and neuropsychological testing. It employs ML classifiers like RF, DT, SVM, kNN, and multilayer perceptron (MLP). It is implemented on a system with 4 NVIDIA RTX 2080Ti GPUs and an Intel 3.3 GHz 14-core i9 processor using PyTorch, Matplotlib, NumPy, Pandas, and SciPy. With an accuracy of 95%, the combination of CNN and CatBoost demonstrates superior performance in classifying various cognitive categories. To maximize the utilization of all available data, a common fusion model is employed to combine both MRI and non-imaging features from diverse sources. The lack of multi-label classification makes it impossible to identify coexisting dementia diseases in the same person. Along the same lines, El-Sappagh et al. (2021b) evaluated the effectiveness of five commonly used ML algorithms (LR, DT, kNN, RF, and SVM) for predicting AD progression across a 2.5-year prediction horizon. The comorbidities of the patient, their cognitive scores, their medication history, and demographic data are just a few of the cost-effective time-series features used to optimize these models. These models perform a 4-class classification task covering ND, subjective MCI (sMCI), prodromal MCI (pMCI), and AD. RF outperforms the other ML algorithms with an accuracy of 90.51%. According to these studies, all models have strong predictive value when comorbidity and medication variables are combined with other features early on, with the RF model performing the most accurately when compared to other models. Other ML algorithms’ performances are less than satisfactory in comparison. Also, El-Sappagh et al. (2021a) developed a precise and comprehensible model for diagnosing and detecting the progression of AD. This methodology offers accurate decisions for physicians as well as reasons for each decision. The proposed approach aims to fully integrate high-quality Alzheimer’s data for accurate prediction and identification of the disease progression within a 3-year time frame from the baseline. It employs RF and DT implemented in Python with scikit-learn, and uses Eli5 and SHAP for the explanations. The first layer achieves an accuracy of 93.95%, and the second layer achieves an accuracy of 87.08%. In this work, only baseline data is used to make decisions, although the analysis of time-series data plays a crucial role due to the chronic nature of AD. Table 7 compares these works based on the modality of the diagnostic data used.

Table 7 Summary of latest works based on multimodal data

In some works, traditional ML models are augmented with newer DL models to retain the feature extraction step and improve the performance accuracy. Mohammed et al. (2021) proposed a hybrid approach utilizing DL for diagnosing AD and dementia at early stages. The performance of two CNN architectures, namely ResNet-50 and AlexNet, along with hybrid techniques combining traditional ML and DL such as SVM+ResNet-50 and SVM+AlexNet, is assessed. t-SNE is employed for transforming the high-dimensional data to a lower-dimensional space for representation and visualization. The SVM+AlexNet model produces the highest accuracy of 94.8%. A major advantage is that the hybrid of ML and DL achieves better results than DL alone. However, training the model is time-consuming. Similarly, Abuhmed et al. (2021) devised and tested two hybrid DL architectures for AD detection. Multiple deep BiLSTM models are combined to construct the model. The first architecture is a multitask regression model, and the second is a hybrid architecture that utilizes the BiLSTM model to extract deep features. It is implemented using Python and Keras on a system with an Intel Xeon(R) CPU and three GeForce GTX TITAN X 12 GB GPUs with CUDA 10.0, and attains an accuracy of 82.63%. To enhance the performance of the model, the output layer of the DL model is substituted with more precise classifiers like RF. The impact of fine-tuning the suggested model and the run-time complexity of the models are not investigated.

Further, Khagi et al. (2019) suggested a DNN architecture that uses deep-layer feature extraction without requiring considerable hardware resources for training or incurring considerable classification error. A CNN inspired by AlexNet is used for automatic feature extraction from images obtained through the extraction of slices from whole-brain MRI scans. Feature ranking algorithms such as Laplacian, ReliefF, Mutinffs, etc. are employed for selecting features. ML algorithms such as SVM, kNN, and Subspace Ensemble are tested under various conditions. It achieves classification accuracy ranging from 98 to 99%. Because fewer training resources are available, the CNN is not optimal. In the same direction, Song et al. (2019) developed and evaluated a multi-class graph CNN (GCNN) classifier. It is utilized in classifying patients on the AD spectrum into four distinct categories based on network analysis: ND, EMCI, LMCI, and AD. Structured connection graphs obtained from diffusion tensor imaging data are used to train and evaluate the network. It achieves an accuracy of 89% when implemented on the Caffe platform with an NVIDIA 1080 Ti GPU, using a Thinkmate VSX R5 540V4 workstation. This work demonstrates that the GCNN outperforms the SVM classifier and achieves strong performance even with small sample sizes.

Similarly, AlSaeed and Omar (2022) developed a method for AD diagnosis from MRI images using a pre-trained ResNet-50 CNN model. Its primary purpose is analyzing and enhancing the performance of classifying MRI images for early detection of AD by leveraging DL techniques. Features are extracted using ResNet-50, and SVM, RF, and softmax are used for the classification task. It is implemented on the Google Colaboratory platform using TensorFlow, Keras, scikit-learn, NumPy, OpenCV, NiBabel (2023), Nilearn (2023), and DeepBrain AI - Best AI Video Generator (2023). The model achieves higher accuracy when trained on data from ADNI than when trained on the MIRIAD dataset. Accuracies of 99%, 92%, and 85.7% are obtained when ResNet-50 is combined with softmax, SVM, and RF, respectively. Accuracy is lowest when ResNet-50 is combined with RF. Table 8 presents the summary of the traditional ML models that are augmented with the newer DL models.

Table 8 Summary of latest works based on traditional ML models augmented with DL models

As evident from the above overview of the works, traditional ML models rely on the characteristics of the training data. Although these models are explainable, they lack the computational efficiency of the modern approaches. Such newer ML models are discussed in the next section.

3.3.2 DL models

Deep learning models, a kind of machine learning model, extract higher-level (deeper) features from the given training data. In most cases, a DL model is made up of a large, unbounded number of layers, each constructed from a bounded (fixed) number of neurons. The neurons are the fundamental processing units of DL models that extract features. In the case of traditional ML techniques, the features are drawn manually from the training data through some mathematical models. However, in DL models, the features are automatically extracted by the hidden layers. These characteristics aid in numerous applications, and CAD has become one such application in recent times. In the following paragraphs, we discuss such models that address the problem of AD diagnosis.

As noted earlier, traditional models require manual tuning of the features and thus, pre-processing of the data is an integral part of them. In the case of DL models, where features are extracted automatically, pre-processing still leads to better performance. In this regard, Kabir et al. (2021) created a multi-classification-based AD detection system using a comparative study of brain MRI data. An 18-layer CNN model architecture is used. A special pre-processing method is applied, in which one sequential model is employed for all three anatomical planes of the MRI scan. Pre-trained CNN models, VGG19 (Simonyan and Zisserman 2014) and InceptionV3 (Szegedy et al. 2016), are used for comparison. It shows an accuracy of 80.09%. A hand-crafted pre-processing approach is used to fix issues with the unbalanced data. The architecture can be improved by implementing an improved loss function, clustering procedure, and k-fold cross validation. Further, Hussain et al. (2020) developed a model using brain MRI data for binary classification and AD detection based on a 12-layer CNN. Image scaling and image denoising are the two data pre-processing techniques used. The proposed model attains an accuracy rate of 97.75%. This model outperforms four pre-trained models, namely Xception (Chollet 2017), InceptionV3, VGG19, and MobileNetV2 (Sandler et al. 2018), in terms of performance. The drawback is that the multi-class classification approach is not applied.

Similarly, Yagis et al. (2020) proposed a 3D VGG variant of CNN for evaluating the classification accuracy using two freely accessible datasets, ADNI and OASIS, for AD diagnosis. 3D models are utilized to avoid the information loss that happens when 3D MRI data is transformed into 2D images for analysis by 2D convolutional filters. Data pre-processing through co-registration, skull stripping, brain masking, and re-sampling is employed. It is implemented on a system with an NVIDIA RTX2080 GPU using Keras and TensorFlow. Under 5-fold cross validation, the proposed model attains accuracies of 73.4% on the ADNI dataset and 69.9% on the OASIS dataset. Tuning the network shape and hyperparameters would likely have increased the classification accuracy. Along the same lines, Zheng et al. (2018) devised an algorithm for early AD diagnosis through PET scans utilizing an ensemble of AlexNets (EnAlexNets). The EnAlexNets algorithm effectively addresses the limited information provided by a single image mode and simplifies the requirements for multi-mode images in the diagnosis process. The suggested method is divided into four main steps: (i) anatomical volume identification, (ii) enhancement of data, (iii) classification of patches, and (iv) discriminating different types of dementia. It attains an accuracy of 91%. Compared to utilizing multi-modal images, this approach is more robust in capturing distinguishing features and demonstrates superior performance. These analyses can be made more efficient by strengthening the CNN architecture and increasing the size of the dataset.
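
Since several of the reviewed works report accuracy under k-fold cross validation, the following minimal sketch shows how the train and validation indices for each fold can be generated. It is a generic illustration, not the procedure of any particular study; the function name is hypothetical, and in practice the sample order is usually shuffled first and the reported accuracy is typically the mean over the folds.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, validation_indices) pairs for k-fold cross validation.

    The first (n_samples % k) folds receive one extra sample so that every
    sample appears in exactly one validation fold.
    """
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

# Example with 12 samples and 5 folds.
folds = list(k_fold_indices(12, k=5))
for fold_number, (train, validation) in enumerate(folds, start=1):
    print(fold_number, validation)
```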

Marzban et al. (2020) developed a robust method for classifying MCI and AD against HC. This system is designed to operate on a low-cost network, characterized by a shallow architecture and efficient processing. CNN is used as the classification algorithm, with grey matter volumes and diffusion maps as input images. It demonstrates the significance of integrating data from diverse imaging modalities such as sMRI and diffusion tensor imaging (DTI) and elucidates its impact on the overall results. The model is implemented on a 64-bit Windows Server 2019 computer featuring a 2 GHz eight-core Intel Xeon CPU and 384 GB of RAM. The CNN model is created with MATLAB R2018b. The model is 71.1% accurate for HC versus MCI and 88.9% accurate for HC versus AD. A caveat is that subject IDs must not overlap between the testing and training sets. In the same direction, Mehmood et al. (2020) devised a Siamese CNN model for classifying dementia stages by drawing inspiration from the VGG16 architecture. By utilizing augmentation techniques, this method expands the incomplete and unbalanced data. The model is implemented using the Keras library and is run on a system with an Intel Xeon(R) E5-2630 v3 2.40 GHz × 32 CPU and 64 GB of memory. A validation accuracy of 99.05% is reported for dementia stage classification. Normalization procedures are utilized to minimize overfitting and regularize the model’s performance. Because there are fewer annotated data and pre-processing phases, as well as intricate parameters for segmentation and normalization, it is difficult to handle numerous parameters effectively.
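
The requirement that subject IDs do not overlap between the training and testing sets can be enforced by splitting at the subject level rather than at the scan level. The sketch below illustrates this under the assumption that each record is a (subject ID, scan) pair; the function name, the IDs, and the scan labels are hypothetical and are not taken from Marzban et al. (2020).

```python
import random

def split_by_subject(records, test_fraction=0.2, seed=0):
    """Split (subject_id, scan) records so that no subject appears in both sets."""
    subjects = sorted({subject_id for subject_id, _ in records})
    rng = random.Random(seed)
    rng.shuffle(subjects)
    # Hold out whole subjects, not individual scans.
    n_test = max(1, round(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [r for r in records if r[0] not in test_subjects]
    test = [r for r in records if r[0] in test_subjects]
    return train, test

# Hypothetical data: 10 subjects with 3 scans each.
records = [("S%02d" % (i // 3), "scan_%d" % i) for i in range(30)]
train, test = split_by_subject(records)

train_ids = {subject_id for subject_id, _ in train}
test_ids = {subject_id for subject_id, _ in test}
print(len(train), len(test), train_ids & test_ids)
```

Splitting at the scan level instead would allow different scans of the same person to land on both sides of the split, which tends to inflate the reported accuracy.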

Also, Alshammari and Mezher (2021) implemented a CNN for AD stage classification and detection using MRI scans. Basic pre-processing techniques are employed in the implementation methodology before the features are extracted and reconverted into a 1D vector that is submitted to the CNN with accompanying labels. In accordance with the four different stages of AD that are considered, four labels are employed: ND, VMD, MID, and MOD. This model is implemented on a system with Windows 7 OS using TensorFlow, OpenCV, Pandas, NumPy, Matplotlib, and PySimpleGUI in the PyCharm IDE with Anaconda developer GUI. Only 10 epochs are run to determine the model’s performance. The model’s documented accuracy is 97%. Due to the smaller number of training samples, the class labeled as "Moderate Demented" exhibits the lowest accuracy score; other classes vary similarly. Also, the number of training epochs is very small. Further, Bringas et al. (2020) developed a DL model for identifying the AD stages employing mobility data of patients. The methodology consists of two distinct phases: (i) a pre-processing phase that reduces the length of accelerometer data sequences and ensures uniform intervals between data points for improved data consistency, and (ii) a supervised learning phase that creates a CNN for predicting different stages of AD. It is implemented on a system with an RTX2070 graphics card using Keras and the scikit library, attaining an accuracy of 90.91%. In comparison to traditional supervised learning models, the utilization of this CNN-based strategy significantly enhances the accuracy in identifying different stages of AD. The data depends on accelerometer sensors that do not guarantee the data acquisition rate.
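
The pre-processing phase described for accelerometer data (shortening the sequence and enforcing uniform intervals between data points) can be illustrated with a simple resampling routine. The sketch below uses linear interpolation onto an evenly spaced time grid; Bringas et al. (2020) do not necessarily use this method, so the interpolation scheme, the function name, and the toy readings are all assumptions.

```python
def resample_uniform(timestamps, values, n_points):
    """Linearly interpolate irregular samples onto n_points evenly spaced times.

    Assumes timestamps are strictly increasing and n_points >= 2.
    """
    t_start, t_end = timestamps[0], timestamps[-1]
    step = (t_end - t_start) / (n_points - 1)
    resampled = []
    j = 0  # index of the left neighbour of the current target time
    for i in range(n_points):
        t = t_start + i * step
        # Advance to the segment [timestamps[j], timestamps[j + 1]] that contains t.
        while j < len(timestamps) - 2 and timestamps[j + 1] < t:
            j += 1
        t_left, t_right = timestamps[j], timestamps[j + 1]
        fraction = (t - t_left) / (t_right - t_left)
        resampled.append(values[j] + fraction * (values[j + 1] - values[j]))
    return resampled

# Hypothetical irregularly sampled readings (time in seconds, one acceleration axis).
timestamps = [0.0, 1.0, 3.0, 4.0]
values = [0.0, 2.0, 6.0, 8.0]
resampled = resample_uniform(timestamps, values, n_points=3)
print(resampled)
```

Choosing n_points smaller than the original number of samples shortens the sequence, while the even spacing removes the dependence on the sensor's actual acquisition rate.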

Finally, Tufail et al. (2022) suggested a CNN model for early-stage classification of AD into MCI, AD, and NC classes utilizing PET scans. To create 2D CNN architectures, ideas including distant domain transfer learning (TL) and blurring before subsampling are applied. CNN architectures performed well in the 3D domain, showcasing the significance of training in higher dimensions. Three binary classifications and one multi-class classification are carried out. The statistical analysis reveals that, through five-fold cross validation, the 3D-CNN model performs better, reaching an accuracy of 62.25% on NC versus MCI, 71.70% on MCI versus AD, and 89.21% on NC versus AD binary classification, and 59.73% on AD versus MCI versus NC multi-class classification. The above works are summarized and compared in Table 9.

Table 9 Summary of latest works based on DL model data pre-processing

In a few of the works, the morphological features of the human brain are utilized for feature extraction. For example, Murugan et al. (2021) developed a model named DEMentia NETwork (DEMNET) for detecting different stages of dementia. A CNN is employed for identifying particular AD traits from the MRI images. The proposed model takes into consideration the four phases of dementia and performs a specific diagnosis by generating high-quality disease probability maps based on regional brain structure. It attains an accuracy of 95.23%. The suggested model demonstrates effectiveness in identifying specific brain areas correlated with AD and serves as an effective decision support system in its diagnosis. However, barriers to obtaining better results include the unbalanced dataset and pre-processing operations like intensity normalization and skull stripping. Along the same lines, Basher et al. (2021) devised a method that utilizes volumetric characteristics extracted from sMRI data of the right and left hippocampi on a slice-by-slice basis for diagnosing AD. This method combines a CNN model and a DNN model. A two-stage ensemble of Hough-CNN is employed to automatically localize the brain regions. By leveraging the extracted volumetric features, the suggested method achieves an accuracy of 94.82% for the right hippocampi and 94.02% for the left hippocampi. The suggested method excels in accurately classifying AD and NC. The performance of the earlier proposed DVE-CNN and Hough-CNN methods may pose challenges when compared to the current approach. Similarly, Huang et al. (2023) developed a DL approach for analyzing cerebral grey matter alterations in MCI through voxel-based morphological analysis. The CNN-based DL approach is utilized for extracting characteristics from the images of cerebral grey matter. The accuracy attained by the model for identifying people with MCI is 80.9%. These findings show that brain morphology research offers an excellent method for non-invasive, objective clinical evaluation and early detection of AD. Despite the ADNI database’s abundance of MCI samples, clinically identifying MCI patients is difficult.

Further, Liu et al. (2022) devised a technique utilizing 3D DCNN to precisely differentiate patients with mild AD dementia from those with MCI and CN using sMRI. A reference model is constructed from the dimensions and thickness of brain regions already known to be affected by the disease. The NVIDIA CUDA parallel computing platform is used for model simulation. A progression forecast is made using the model. A batch size of four is used due to computational restrictions, and thus the learning rate is low and training is time-intensive. Table 10 presents the comparison of a few of the recent works that rely on human brain morphology for training the models.

Table 10 Summary of latest works on DL model that are based on brain morphology

Also, in one of the works, an existing CNN architecture is modified: Basheer et al. (2021) devised a method for predicting the diagnosis of AD using a capsule network (He et al. 2021). The proposed model is Modified CapsNets (M-CapNets), a type of modified capsule network that is more effective than CNN. This model is implemented using Python, NumPy, seaborn, pandas, Matplotlib, sklearn, and TensorFlow on a system with an 8th-generation Intel i5 processor, 16 GB RAM, and CentOS, obtaining an accuracy of 92.39%. Better hierarchical linkages, processing power, and accuracy are provided by M-CapNets. Applying a capsule network to heterogeneous data makes it more computationally expensive because it requires several steps to implement.

As seen earlier, there exist several datasets that hold AD diagnostic data. Due to the numerous tests available for diagnosis, their data types also vary. In a few works, this is leveraged to construct DL models that fit particular data types. Basaia et al. (2019) constructed and verified a DL algorithm for predicting the diagnosis of MCI and AD. CNNs are trained using MRI images from ADNI. CNN performance in identifying AD, convertible MCI (cMCI), and sMCI is evaluated. All classifications exhibit high accuracy rates, with the highest obtained in HC versus AD. In particular, the HC versus AD classification yields an accuracy of 99% when utilizing only the ADNI dataset, and a slightly lower accuracy of 98% when combining the ADNI and non-ADNI datasets. CNNs are highly valuable in facilitating automatic individual patient diagnosis across the AD continuum. Despite the diversity of imaging techniques and scanners, this strategy performs effectively without any prior feature engineering. Clinical diagnosis and algorithm performance can be improved by a longer clinical follow-up. Similarly, Etminani et al. (2022) proposed a 3D DL model that employs fluorine 18 fluorodeoxyglucose PET scans for predicting the ultimate clinical diagnosis of dementia and AD. The diagnostic outcomes predicted by this model demonstrate a commendable performance in comparison to human readers and their collective agreement. The 3D CNN model is constructed utilizing the VGG16 CNN architecture. It is implemented using Keras on a system running the Ubuntu 18.09 OS and equipped with an NVIDIA Quadro GV100 GPU. It achieves a validation accuracy of 78.9%. The reasonably diversified datasets and sizable test sets in this study are its benefits. The method’s robustness is limited to the specific characteristics and data distribution found within the ADNI datasets.

Further, Giovannetti et al. (2021) developed a Deep-MEG approach by ensembling classifiers grounded on deep CNNs merged with image-based representations of magnetoencephalography (MEG) data. In order to anticipate the early stages of AD, brain biomagnetic signals arising from spatially distinct brain areas are utilized as MEG data in this study. This study employs TL via pre-trained CNNs. An accuracy of 89% is achieved for predicting AD conversion in a sample of 54 MCI subjects, and 87% is obtained in a sample of 87 subjects that includes 33 HCs. The proposed Deep-MEG technique is useful for early detection of changes in the spectral-temporal connectivity profiles. However, the results of three-class classification (HC, pMCI, sMCI) are not satisfactory. Also, Balboni et al. (2022) used spatial warping network segmentation to segment the MRI scans. The implementation is performed using Python 3.6.9 along with the Keras and TensorFlow libraries. In this work, TL has been successfully set up on CNN, boosting network performance. The study’s key disadvantage is the limited number of samples utilized to evaluate the new models, which limits the statistical power of the results. Finally, Folego et al. (2020) employ AD biomarkers from sMRI for CAD design. It is implemented on a system with an Intel Xeon E5645 2.40 GHz CPU, around 2 GB of RAM, and an NVIDIA GeForce GTX TITAN X GPU. When the method is set up with ADNet-DA, an ML domain adaptation, an accuracy of 52.3% is obtained. One contribution of this study is the construction of a DL system that is totally autonomous, relatively fast, and capable of providing competitive results without relying on any domain-specific disease knowledge from patients. Table 11 compares the above DL models based on the dataset used for model development.

Table 11 Summary of latest works on DL models based on dataset utilized

In other works, several DL models are used for training the CAD system. In the following paragraphs, we summarize such research efforts. Ghazal and Issa (2022) developed a technique based on multi-class classification by employing TL for AD detection utilizing MRI scans. The suggested system model is based on a pre-trained AlexNet for detecting AD in its early phases. It categorizes images into four stages: MID, MOD, ND, and VMD. The proposed model has an accuracy of 91.70%. It does not require any hand-crafted features and is quick and simple to use. Another advantage is that it works with modest image databases. The model’s accuracy is limited by the small number of epochs.

Odusami et al. (2021) devised a DL-based approach for predicting MCI, EMCI, LMCI, and AD. For successfully classifying fMRI brain scans, it employs a modified ResNet18 fine-tuning technique. The notion of TL underpins the process of network fine-tuning. In all of the studies, the PyTorch library is utilized with Python. The model achieves an accuracy of (i) 99.99% for AD versus EMCI, (ii) 99.95% for MCI versus EMCI, and (iii) 98.5% for AD versus LMCI. The advantage of the proposed scheme is that fine-tuning of the model yields better accuracy and optimal performance without the need for a dropout layer. The fine-tuned model showed signs of overfitting, and the use of dropout did not effectively mitigate this issue.

Similarly, a few CAD systems for AD utilize the GAN architecture for training and predictions. In this regard, Jung et al. (2023) developed a conditional GAN (cGAN) for generating high-resolution 3D MRI images representing different phases of AD. To ensure a realistic and smooth transition in 3D space, an additional module is incorporated into the cGAN architecture. This allows for the generation of visually coherent and realistic MRI images that capture the progression of AD across different stages. This model comprises a 2D discriminator and a 2D attention-based generator. The approach incorporates a 3D discriminator that can produce a continuous axial view in 2D slices, leading to the generation of high-resolution 3D MR volumes. It provides an adaptive identity loss for maintaining the original identity while achieving seamless deformation. Similarly, Roychowdhury and Roychowdhury (2020) correlated brain shrinkage with AD progression. GANs are used to generate synthetic MRI images using a set of conditional deep convolutional generators. Further, these synthetic images are assessed by calculating the fractal dimension of the cortical brain ribbons. The feasibility of this approach is illustrated by employing a cascade of adversarial networks that replicate various stages of AD. However, the training phase of the proposed model requires a significant amount of GPU memory and is also time-consuming.

A few other works use an autoencoder as the DL model. Pinaya et al. (2021) analyzed structural neuroimaging data from AD and MCI patients to assess normative models based on deep autoencoders. A normative model is created using an adversarial autoencoder, and an RVM with a linear kernel is used for the classification analysis. The model is built with the TensorFlow and sklearn_rvm packages. In most situations, the classifiers outperformed the normative method in terms of mean performance. Although traditional classifiers performed better in certain circumstances, the observed difference is not statistically reliable in most cases. Further, Guo and Zhang (2020) developed an early detection approach for AD based on a DNN, using resting-state fMRI together with various other medical data. The fMRI images and textual data such as gender, genetics, and age are all used in model training and data classification. Functional intellectual networks are constructed by leveraging the correlation of resting-state fMRI (R-fMRI) signals. These networks enhance the construction of the neural networks by incorporating information from the correlation coefficients. Compared with standard procedures, the proposed methodology improves diagnostic accuracy by about 25%. After around 50 training iterations, the convergence of the loss function starts to slow down, indicating inadequate hyperparameter selection.
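The correlation-based network construction mentioned above can be illustrated with a small sketch. It assumes that each brain region is represented by one averaged time series and that connectivity is measured with the Pearson correlation coefficient; the names `pearson`, `connectivity_matrix`, and the toy `roi_signals` values are hypothetical and are not taken from the reviewed work.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def connectivity_matrix(signals):
    """Symmetric matrix of pairwise correlations between region signals."""
    n = len(signals)
    return [
        [1.0 if i == j else pearson(signals[i], signals[j]) for j in range(n)]
        for i in range(n)
    ]


# Toy time series for three hypothetical regions of interest (illustrative values).
roi_signals = [
    [1.0, 2.0, 3.0, 4.0],  # region 0
    [2.0, 4.0, 6.0, 8.0],  # region 1: rises with region 0
    [4.0, 3.0, 2.0, 1.0],  # region 2: falls as region 0 rises
]
matrix = connectivity_matrix(roi_signals)
```

Each entry of `matrix` can then serve as an edge weight of a functional network or as an input feature to a classifier. Real pipelines typically add pre-processing steps that this sketch leaves out.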

In addition, other DL models have been experimented with in the domain of CAD systems for AD. Our findings on them are described below. Zhu et al. (2021) proposed a dual-channel DL model. It comprises three main constituents: (i) Patch-Nets that incorporate spatial attention blocks, (ii) a pooling operation based on attention multi-instance learning (MIL), and (iii) a global classifier that includes attention mechanisms. It is implemented using the PyTorch package on a system with an NVIDIA GTX Titan X GPU. An accuracy of 92.4% is obtained for NC versus AD classification and 80.2% for sMCI versus pMCI classification. The advantage of this model is its superior diagnostic performance, surpassing several state-of-the-art works, while also detecting discriminative abnormal regions in sMRI scans. A notable limitation of this approach is that isolating the patch location proposals, which are based on group comparison, from the subsequent network hinders optimal performance. Bi and Wang (2019) proposed a multi-task learning deep probabilistic model that aims to classify electroencephalography (EEG) spectrum images into different AD classes. The strategy has two components: (i) learning multiple tasks simultaneously, and (ii) an advanced model that integrates discriminative and generative abilities using deep convolutional networks. Since the developed model links feature extraction and classification, it performs better than other generative models, with an accuracy of 95.04%. Another critical element is the use of multi-task learning to overcome overfitting. The use of regularization instead of other data types distinguishes the model. Misclassifications between MCI and AD account for a substantial proportion of the overall errors.

Zhang et al. (2021a) developed an explainable DL model for a CAD system. It is based on residual attention neural networks and enables end-to-end learning from sMRI images. The technique makes two contributions: (i) a residual self-attention DNN that gathers spatial, local, and global information from MRI scans to increase diagnostic performance, and (ii) a method based on gradient-based localization class activation mapping for increased interpretability. The implementation uses the PyTorch framework on a server with an Intel(R) Xeon(R) CPU, an NVIDIA 2080 Ti GPU, and 64GB of memory. The proposed method automatically extracts nonlinear, high-dimensional features from the entire MRI image, leading to enhanced classification performance in the diagnosis of AD. The batch size used is small, which increases the noise in the gradient estimates. EL-Geneedy et al. (Marwa et al. 2023) built a DL-based pipeline for precise AD stage classification and diagnosis. The proposed analysis pipeline uses 2D T1-weighted MRI images and a shallow CNN architecture. The proposed model supports both global and local classification as well as quick and accurate diagnosis. The model is implemented on an NVIDIA Tesla P100 GPU and has a 99.68% accuracy rate. The advantage of this approach is the improved accuracy that can be reached with an appropriate choice of network architecture. However, it is not suitable for large-scale datasets.

Bangyal et al. (2022) devised an approach employing a DCNN for AD diagnosis. For a fair comparison, ML-based approaches, namely LR, RF, kNN, SVM, SGD, gradient boosting, XGB, and MLP, are used. The CNN model classifies AD into four stages: MID, ND, MOD, and VMD. The method is implemented on a system equipped with an Intel Core i7–1045 H processor featuring a 2.6 GHz base frequency with Turbo Boost, 12MB cache, 6 cores, and 12 threads, and an NVIDIA GeForce RTX 2070 Max-Q with 8GB RAM. It achieves an accuracy of 94.61%. A limitation is that the moderate demented class has fewer images. Ahmad et al. (2021) implemented a CNN-based DL model for CAD. By identifying the difference between the affected brain and the healthy brain, it is useful for early AD diagnosis. The proposed CNN and CAD models improve on the performance of simple and residual neural networks, with a 97% accuracy rate. According to this study, DL algorithms are the best tools for categorizing clinical MRI data using the scale and shift homogenous distill properties that the CNN extracted. Nonetheless, a notable drawback is the limited size of the data used for training.

Fan et al. (2021) developed a model based on U-Net (Ronneberger et al. 2015) using 3D MRI scans for the diagnosis of AD. The suggested U-Net design includes structural characteristics and can be used for classification tasks. The model is implemented with the Keras library on a system with an NVIDIA Titan Xp GPU and achieves 86.47% accuracy. The U-Net model, with its skip connections, is effective in various tasks, including medical image segmentation and image classification. Deep supervision effectively enhances the model's performance in challenging classification tasks. The dataset employed in the experiment is too small for real-world deployment.

Ding et al. (2019) proposed and validated a DL-based algorithm for predicting the final diagnosis of AD. Prospective 18F-FDG PET brain scans from ADNI are used as the dataset, and InceptionV3 is used as the network. It is implemented using Python, SciPy, and Keras on a system with a 6-core i7 5930k 3.5GHz processor and an NVIDIA Pascal Titan X GPU with CUDA 8.0 and cuDNN 6.0. With strong performance and resilience on external test data, this method shows that DL algorithms can accurately predict the AD diagnosis from brain PET imaging studies. The DL algorithm's robustness is constrained by the limited quantity of test data available, and its performance is primarily applicable to the specific clinical distribution found within the ADNI training set.

Al-Khuzaie et al. (2021) proposed a DL methodology for distinguishing between AD patients and healthy people using 2D anatomical slices obtained via MRI. In contrast to most earlier research, which used a 3D CNN, this work uses 2D slices as input data for the CNN. The CNN structure, known as Alzheimer Network (AlzNet), is trained on the 2D slices to learn the weights of the deep network. It is implemented in Python 3.6 with Keras and attains an accuracy of 99.30%. Several parameters, including the dropout rate, the number of filters, and the number of layers, are examined in this study to determine their influence on the AlzNet system. Li and Liu (2018) developed an ensemble model by combining DenseNet with k-means clustering. It captures many local features of MRI scans. It is implemented using the Keras library on a system with an NVIDIA GeForce GTX 1080 Ti GPU. It classifies AD versus NC with 89.5% accuracy and MCI versus NC with 73.8% accuracy. A useful property of the suggested model is that it handles the challenges of small training sets. However, the DenseNet model's parameters, including the type and number of layers, are not selected optimally for enhanced feature selection.

Salehi et al. (2020) implemented a CNN to classify and diagnose AD at early stages using MRI images. It is implemented with TensorFlow on a system with Intel HD 6000 graphics (1536 MB) and 8GB RAM, reaching 99% accuracy. It combines two image data sources to increase the number of images, which improves the model's performance. Its restriction is that the moderate dementia class is not included. Ajagbe et al. (2021) presented a multi-class classification of AD with CNN, VGG16, and VGG19. The study aims to improve the classification of AD images using DCNNs. The model is implemented on a system with an Intel Core i7 4600U processor, 8GB RAM, a 256GB SSD, a 14-inch screen, and the Windows 10 64-bit OS. Matplotlib, TensorFlow, and Keras are the libraries used for implementation. The accuracies attained are 77.66%, 77.04%, and 71.02%, respectively. VGG19 performed the best of the three models used, and CNN performed better than VGG16. The absence of real-life or local datasets, as well as the lengthy compute time of the machine used for implementation, are the constraints of this work.

Ahila et al. (Hamdi et al. 2022) proposed a CAD technique for differentiating NC from AD patients based on characteristics from 18FDG-PET scans. Each FDG-PET image is decomposed into multiple 2D slices, and features are extracted from the individual slices. The proposed model has a high capacity to discriminate AD from NC, with a classification accuracy of 96.8%. Because only 855 samples are used in model training, the tested accuracy does not carry over to live datasets such as ADNI and OASIS. Ebrahimi et al. (2020) proposed a technique for early detection of AD by employing TL in a 3D ResNet18 model. It enables knowledge transfer from 2D to 3D image datasets. The study compares 2D and 3D CNNs, and the findings indicate that using a 3D CNN with TL increases the accuracy of the model, which reaches 96.88%. The networks are constructed and trained on an NVIDIA DGX Station using the Deep Learning Toolbox in MATLAB. The studies are performed on a system having 256GB GPU and 32 GB memory. TL significantly enhanced the accuracy of identifying AD from MRI data. The training, validation, and test sets contained only 200, 32, and 32 scans, respectively, which is a relatively small number.

Goceri (2019) proposed a fully supervised approach built on 3D characteristics to provide a reliable AD diagnosis from MRI. The work contributes (i) a new 3D CNN architecture, (ii) a novel optimization approach based on Sobolev gradients that incorporates weights for each decision parameter, (iii) AD diagnosis using the suggested optimizer and CNN architecture, and (iv) comparisons with the outcomes of the most recent AD diagnosis methods. The proposed model effectively extracts the desired information from MRI images and achieves an accuracy of 98.06%. When a batch size of 15 is selected, the validation accuracy decreases significantly, indicating poor performance. The above works are summarized and compared in Table 12.

Table 12 Summary of latest works based on standalone architecture

In addition, several different DL models have been used independently to build CAD systems. For instance, Saratxaga et al. (2021) provided a DL method for analyzing brain MRI sequences to estimate the presence of AD. The approach makes use of the ResNet18, BrainNet2D (Jiang et al. 2019), and BrainNet3D network architectures. A balanced accuracy of up to 93% is attained. This shows that DL-based methodologies are useful for developing a reliable CAD system based on MRI data. Incorporating a larger fused dataset sourced from multiple origins increases the diversity of input samples across the target classes, thereby strengthening the model. Battineni et al. (2023) proposed a DNN for the detection of AD in brain imaging studies. It involves three models for AD classification: ANN, DenseNet, and MobileNet. These three models are contrasted with two traditional ML models, LR and SVM. The model attained a classification accuracy of 95.41%. MobileNet outperforms the other models across all performance metrics. The performance of the traditional ML models is unsatisfactory relative to that of the DL algorithms.
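Because balanced accuracy is reported here instead of plain accuracy, a short sketch may help show how the two differ on imbalanced data. It assumes the usual definition of balanced accuracy as the mean of the per-class recalls; the labels and predictions below are invented for illustration and do not come from any reviewed study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class counts equally."""
    recalls = []
    for cls in sorted(set(y_true)):
        indices = [i for i, label in enumerate(y_true) if label == cls]
        correct = sum(1 for i in indices if y_pred[i] == cls)
        recalls.append(correct / len(indices))
    return sum(recalls) / len(recalls)


# Illustrative, imbalanced toy labels: 8 NC subjects and 2 AD subjects.
y_true = ["NC"] * 8 + ["AD"] * 2
# A classifier that misses one of the two AD subjects.
y_pred = ["NC"] * 8 + ["NC", "AD"]

plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
balanced = balanced_accuracy(y_true, y_pred)

print(f"plain accuracy:    {plain_accuracy:.2f}")
print(f"balanced accuracy: {balanced:.2f}")
```

In this toy case the plain accuracy is dominated by the majority class, whereas the balanced score drops because half of the minority class is missed. This is one reason the metric suits AD datasets with few subjects in some classes.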

Similarly, Islam and Zhang (2018) proposed a DCNN using brain MRI data for AD diagnosis. InceptionV4 and ResNet are developed as baseline DCNNs, and their architectures are tweaked to categorize 3D brain MRI data. The implementation is carried out on a Linux machine equipped with an NVIDIA GeForce GTX 770 GPU, an AMD A8 CPU, and 16GB RAM. With an accuracy of 93.18%, the model accurately diagnoses AD in its early stages. It classifies well even with sparse data and is unaffected by the vanishing gradient problem. However, the moderate demented class has poor classification performance. Sadat et al. (2021) implemented five state-of-the-art CNN models, viz. ResNet152V2, Inception-ResNetV2 (Szegedy et al. 2017), VGG19, EfficientNetB5 (Tan and Le 2019), and EfficientNetB6, together with a custom-designed model. Ensemble learning is employed by combining various combinations of the six models to improve the overall outcome. This approach achieves an accuracy of 96%. With the aid of TL, it is further improved by using the newest architectures. The weakness of this work is its reliance on smaller datasets.
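One common way to combine several trained classifiers is soft voting, in which the class probabilities predicted by each model are averaged. The sketch below assumes this scheme purely for illustration; the text above does not state which combination rule the reviewed ensemble uses, and the function name `soft_vote`, the class labels, and the probability values are hypothetical.

```python
def soft_vote(model_probs):
    """Average class probabilities across models and pick the top class.

    model_probs: list of per-model probability lists, one value per class.
    Returns (index of the winning class, averaged probabilities).
    """
    n_models = len(model_probs)
    n_classes = len(model_probs[0])
    averaged = [
        sum(probs[c] for probs in model_probs) / n_models
        for c in range(n_classes)
    ]
    winner = max(range(n_classes), key=lambda c: averaged[c])
    return winner, averaged


# Hypothetical class labels and per-model outputs for a single scan.
CLASSES = ["ND", "VMD", "MID", "MOD"]
model_probs = [
    [0.6, 0.2, 0.1, 0.1],  # model A favours ND
    [0.3, 0.5, 0.1, 0.1],  # model B favours VMD
    [0.2, 0.6, 0.1, 0.1],  # model C favours VMD
]
winner, averaged = soft_vote(model_probs)
print(CLASSES[winner], [round(p, 3) for p in averaged])
```

Averaging lets a confident minority model influence the result, unlike hard majority voting, which only counts each model's top choice.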

Along the same lines, Suganthe et al. (2021) devised a DCNN model for detecting AD and its stages using MRI scans. The model is trained using the following classes: ND, VMD, MID, and MOD. A combination of the Inception and ResNet formulations is used to diagnose AD. It achieves an accuracy of 79.12%. The model uses the sagittal view of T1-weighted MRI images and is created in Python. Increasing the number of training and validation samples, as well as fine-tuning the CNN model, enhances model accuracy.

Further, Hazarika et al. (2023) used brain MRI images to implement some DNN models that are commonly used for AD classification. They presented a lightweight hybrid approach that incorporates AlexNet and LeNet. It is implemented in Python on a system with an i7 processor, 12GB RAM, a 500GB SSD, and 2GB graphics. It achieves an accuracy of 93.58%. LeNet (LeCun et al. 1998) and AlexNet perform exceptionally well in terms of computing speed. However, the concept of dense blocks is not employed. Shamrat et al. (2023) proposed a fine-tuned CNN model named AlzheimerNet for identifying the stages of AD and the NC class. Initially, five existing models are trained and tested: MobileNetV2, InceptionV3, AlexNet, VGG16, and ResNet50. Because InceptionV3 had the highest accuracy, it was selected and further refined to create the AlzheimerNet model using the RMSprop optimizer. It is implemented using Python, OpenCV, NumPy, and scikit-learn on a system with an AMD Ryzen 7 3800X CPU and an AMD Radeon RX580 series GPU, achieving an accuracy of 98.68%. The model achieves significantly higher performance than the five existing models. A hybrid model based on PET and fMRI datasets from ADNI is an option for diagnosing AD but is not used. A brief overview of these works is given in Table 13.

Table 13 Summary of latest works based on multiple independent DL models

In a few of the works, instead of using multiple DL models, the CAD architecture is designed by fusing concepts from different types of DL models. For example, Zhao et al. (2020) devised a multi-class classifier using a 3D DenseNet model. A 3D multi-informational generative adversarial network (mi-GAN) is incorporated along with the DenseNet model. mi-GAN comprises a 3D CNN-based discriminator and a 3D U-Net-based generator that are trained concurrently. It predicts pristine brain MRI scans at upcoming time intervals. To determine the stages of AD, a focal loss function is designed. It is built using TensorFlow (2023) on an NVIDIA Titan Xp GPU and achieves an accuracy of 76.67%. The multi-class classification model accepts the images produced by mi-GAN, and the key contribution of this work is the reduction of the focal loss. It performs poorly at predicting both the distribution of grey matter and the short-term brain image. Similarly, Ebrahimi et al. (2021a) proposed deep sequence modeling for MRI-based AD detection. ResNet18 (He et al. 2016), a CNN trained on the ImageNet dataset, is used in this study. A temporal convolutional network (TCN) and several varieties of RNN, namely LSTM, Bi-LSTM, and GRU, are used as sequence-based models. It is implemented using the MATLAB Deep Learning Toolbox (TensorFlow 2023) on a system with 96GB RAM and an NVIDIA V100 GPU, achieving an accuracy of 91.78%. Enhanced parallel processing capabilities and improved control over the receptive field size are the two benefits of using TCNs in this work. The main disadvantage is that classification and feature extraction cannot be executed simultaneously.
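To give a sense of what a focal loss does, the following sketch uses the widely known form FL(p) = -alpha * (1 - p)^gamma * log(p), where p is the probability the model assigns to the true class. This is an assumption made for illustration; the exact loss designed in the reviewed work is not specified here, and the default values of `gamma` and `alpha` are illustrative only.

```python
import math


def cross_entropy(p_true):
    """Standard cross-entropy for the probability assigned to the true class."""
    return -math.log(p_true)


def focal_loss(p_true, gamma=2.0, alpha=1.0):
    """Focal loss: cross-entropy scaled down for well-classified samples.

    gamma controls how strongly easy samples are down-weighted;
    gamma = 0 recovers alpha-weighted cross-entropy.
    """
    return -alpha * (1.0 - p_true) ** gamma * math.log(p_true)


# Compare an easy sample (p = 0.9) with a hard one (p = 0.1).
for p in (0.9, 0.1):
    ce = cross_entropy(p)
    fl = focal_loss(p)
    print(f"p={p}: cross-entropy={ce:.4f}, focal={fl:.4f}, ratio={fl / ce:.2f}")
```

With `gamma = 2`, the easy sample keeps only about 1% of its cross-entropy loss while the hard sample keeps about 81%, so training concentrates on the cases the model still gets wrong. This is useful when some AD stages are under-represented.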

Also, Liu et al. (2021) suggested a method for classifying AD using a depthwise separable CNN. In place of traditional convolution, depthwise separable convolution (DSC) is employed. TL is additionally used to enhance model performance. GoogLeNet (Szegedy et al. 2015) and AlexNet (Krizhevsky et al. 2017), two trained models with average classification rates of 93.02% and 91.40%, respectively, are employed for TL, and DSC achieves an accuracy of 77.79%. The suggested neural network has a low power requirement, considerably fewer parameters, and lower computing costs. Training a model from scratch typically results in poor classification accuracy, particularly with limited datasets, which can lead to overfitting or underfitting. Hedayati et al. (2021) provided a model for deep feature extraction employing an ensemble of convolutional autoencoders. It consists of two stages: in the first, features are extracted through a combination of pre-trained autoencoders, and in the second, a CNN model is built on these features to classify the AD stages. With this method, the accuracy rates for NC versus AD, NC versus MCI, and AD versus MCI are 92.5%, 95%, and 90%, respectively. Fewer errors in NC diagnosis and high sensitivity for early diagnosis of AD are among its benefits. Neither the autoencoders nor the CNN's convolutional filters are tuned, which results in a decline in classification performance.
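The parameter savings of depthwise separable convolution can be checked with simple arithmetic. The sketch below assumes 2D square kernels and ignores bias terms; the layer sizes are chosen only for illustration and do not describe the reviewed network.

```python
def standard_conv_params(kernel, c_in, c_out):
    """Weights in a standard 2D convolution (bias terms ignored)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel, c_in, c_out):
    """Weights in a depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


# Illustrative layer: 3x3 kernel, 64 input channels, 128 output channels.
kernel, c_in, c_out = 3, 64, 128
standard = standard_conv_params(kernel, c_in, c_out)
separable = depthwise_separable_params(kernel, c_in, c_out)
ratio = separable / standard

print(f"standard:  {standard} weights")
print(f"separable: {separable} weights ({ratio:.1%} of standard)")
```

The ratio equals 1/c_out + 1/kernel^2, so for 3x3 kernels the separable layer needs roughly one ninth of the weights once the number of output channels is large.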

Along the same lines, Venugopalan et al. (2021) proposed multimodal DL models for early AD stage detection. Imaging, electronic health record (EHR), and single nucleotide polymorphism (SNP) data are fused using DL models. After that, the ML models are applied to these multimodal data. The data are placed into two groups, i.e., (i) EHR and SNP, and (ii) MRI. A stacked autoencoder is used for the former group, while a 3D CNN is applied to the latter. Once the model is individually trained for each data modality, DT, RF, SVM, and kNN are applied to perform the integrated classification of AD stages. SNP+EHR and imaging+EHR achieve accuracies of 78% and 77%, respectively. Although the work combines multimodal data, these data are grouped meaninglessly and do not provide any significant knowledge on AD. Li and Liu (2019) used structural MRI images for detailed hippocampal analysis in AD by constructing a hybrid RNN and CNN. To learn the shape and intensity properties, DenseNets are built on decomposed image patches of the internal and external hippocampus regions. To learn the high-level features for disease classification, an RNN with stacked BGRU is cascaded to merge data from the hippocampus regions. It is implemented using the Keras library on an Ubuntu system with an NVIDIA GTX 1080 Ti GPU. It obtains 89.1%, 75%, and 72.5% accuracy for AD versus NC, NC versus MCI, and pMCI versus sMCI, respectively. The suggested model does not require tissue segmentation or nonlinear registration, which saves computational expense and speeds up the procedure. A significant disadvantage is that the DenseNet model's parameters are not chosen optimally.

Ebrahimi et al. (2021b) proposed a method for early AD identification in MRI scans using CNNs. The research implements and contrasts several DL models, such as RNNs and 2D and 3D CNNs. Its key contribution is the application of TL to 3D CNNs using a collection of 2D images. The model is implemented on an NVIDIA DGX Station using the MATLAB Deep Learning Toolbox and achieves an accuracy of 96.88%. The use of TL improved the model's performance. A significant disadvantage is that there are few AD subjects in medical datasets. Chui et al. (2022) devised an approach for AD detection employing a CNN on MRI images. TL is employed to improve hyperparameter fine-tuning and detection accuracy. A GAN is employed to generate additional training data for the minority classes of the benchmark datasets. Three OASIS datasets, viz. OASIS-2, OASIS-1, and OASIS-3, are treated as heterogeneous datasets, with the model achieving 96.1%, 96.9%, and 97.5% accuracy, respectively. The merits of this approach are additional data creation, automatic feature extraction, a less biased detection model, and improved hyperparameter adjustment. One drawback of the model is the variability in the accuracies of individual classes.

Helaly et al. (2021) developed an end-to-end framework for the early identification of AD and the classification of medical images into different AD stages. The work employs a DL method, namely a CNN. When the pre-trained VGG19 CNN model is fine-tuned, an accuracy of 97% is achieved in the multi-class classification of AD stages. The advantage of this model is that it decreases memory requirements and computing complexity, keeps training time manageable, and reduces overfitting. The lack of MRI image segmentation to emphasize AD characteristics prior to AD stage classification proved to be a hindrance.

Further, Chen and Xia (2021) developed a combined model consisting of deep feature extraction and identification stages. The extraction module captures the global-to-local structured data extracted from 62 cortical areas. The critical cortical regions are then identified and integrated into the extraction module using a newly constructed sparse regression module. It employs 3D ResNet10, which has an accuracy of 77.6% for sMCI versus pMCI and 95.32% for AD versus CN. FreeSurfer 5.3 is used for data pre-processing, MATLAB 2016b for training, and Python 3.5 for testing. Training the model is time-consuming, requiring approximately 18 h to complete. There is a need to carefully select sparse features from each anatomical structure and to explore sparsity in multi-modal imaging data. Feng et al. (2019) developed a DL framework for AD diagnosis via FSBi-LSTM and 3D-CNN. First, a 3D-CNN model is developed to extract deep feature representations from both PET and MRI imaging data. The deep feature maps are then processed using FSBi-LSTM to extract hidden spatial information and further improve performance. Finally, the model is tested using the dataset obtained from ADNI. It is run on a Windows system equipped with an NVIDIA TITAN Xt GPU. It attains accuracies of 65.35%, 86.36%, and 94.82% for distinguishing sMCI from NC, AD from NC, and pMCI from NC.

Finally, Raju et al. (2021) devised a technique for multilevel classification of AD using DL. The data are categorized as ND, VMD, MID, and MOD. It employs TL with VGG16 using fastai. The method has 99% prediction accuracy. The combination of VGG16 and fastai allows the model to perform well in the four-way classification of AD, and the training procedure takes only a few epochs. TL is used to avoid costly training from scratch and to achieve improved efficiency with a limited number of datasets. The disadvantage of this model is that it does not use data from other modalities such as fMRI and PET. A few of the above works that employ fused DL model architectures are presented in Table 14.

Table 14 Summary of latest works based on fused DL models

Based on the aforementioned review, it is evident that DL models are considered the state-of-the-art methods for designing and developing CAD systems. These models are effective when compared to the traditional ML models but are expensive in terms of computational resources and time. In the next section, we discuss the lessons learnt from the above review.

4 Comparison and discussion

In the previous section, we reviewed publications from the past 5 years that address the classification task of AD diagnosis. As evident from our taxonomy, the present state-of-the-art CAD works for AD focus on two perspectives, i.e., (i) selection of an appropriate dataset (clinical, imaging, etc.), and (ii) choice of ML models. In this section, we present a critical discussion of our findings from the previous section.

4.1 Insights on data source and feature selection

The type of dataset used can significantly impact the accuracy and effectiveness of the ML model used for CAD of AD. The choice of dataset often depends on the features relevant to the diagnosis task. As depicted in Fig. 6a, most of the works make use of MRI data. In particular, out of fifty-nine articles, three works utilize fMRI images and eight use sMRI data, while the remaining MRI-based works apply the ML model to plain MRI images. Multimodal data come next, with eight works using them. PET data are used by five works and hybrid data by four. MRI scan data receive the most attention because neuroimaging diagnostic data are available in popular repositories like ADNI and OASIS (as seen in Fig. 6b). In addition, other scan data are in 3D and require huge computational power, and hence are not time efficient. However, a few of the works find that PET scans hold many morphological features that can aid feature selection for AD diagnosis. The review also finds that only four works concentrate on the fusion of MRI and PET images. Although a physician makes equal use of both scans during diagnosis, CAD researchers have paid little attention to hybrid data models. Clinical data suffer the same fate, which is attributed to the lack of such data in public repositories.

Fig. 6

a Distribution of the data types used by the recent publications for feature engineering. b Distribution of the datasets used by the recent publications. c Percentage of works that use traditional ML models for CAD system development. d Different feature engineering techniques used by the recent publications

Figure 6b shows how frequently each data source is used by the recent publications. Most of the works utilize the ADNI dataset due to its popularity and ease of access, and OASIS takes the next position for the same reasons. ADNI is the more popular of the two because its collection is larger and more diverse. It includes data from multiple sites and a larger number of samples. This diversity allows for more comprehensive training and testing of ML models, enhancing their performance and generalizability. Moreover, ADNI provides valuable longitudinal data, which are particularly beneficial for DL models, as they allow the exploration of temporal patterns and the development of predictive models that track disease progression accurately. While the OASIS dataset is also a valuable resource for AD research, its primary focus is on cross-sectional rather than longitudinal data. This limitation makes it less suitable for studying disease progression and developing predictive models using DL techniques. Additionally, ADNI's frequent use by researchers has led to its recognition as a benchmark dataset. A few of the works also utilize data from different sources and fuse them before the pre-processing or training step. This is required when a multimodal or hybrid approach is essential for ML model development.

As discussed earlier, feature selection plays a pivotal role in the construction of an efficient classifier model. Thus, most of the works employ different feature selection strategies based on the modality of the training data. The distribution of these methods is shown in Fig. 6d. As the pie chart shows, most of the works use dimensionality reduction techniques for feature processing. This is because most of the CAD systems developed so far use neuroimaging data to train the classifiers. Owing to the large feature size of these scan data, numerous pre-processing methods are used. Although existing works make use of these techniques, feature selection strategies based on the new biomarkers identified by the medical community are still lacking. Future works should take this into consideration so as to establish a correlation between biological significance and feature selection strategies. This would, in turn, help establish the explainability of the ML models.

4.2 Insights on ML model type

We have classified the existing works on CAD of AD into two main categories. There are twenty-one works in the traditional ML model category and fifty-nine works in the DL model category. Figure 6c shows the distribution of the traditional models employed by researchers in the last 5 years for AD diagnosis. Among them, SVM is the most popular technique because it can classify non-linear data effectively. kNN, DT, and RF are the next most popular classifiers. Further, Fig. 7 shows the distribution of the different DL models used by the 58 works. Here, more than 40 works employ CNNs, and the remainder employ other DL techniques such as RNNs and generative models. Figure 8 shows the distribution of the different DL models in detail. It is evident from these figures that CNN models are more popular than the others. CNNs are now used almost universally for image classification tasks, as they extract features from high-dimensional image data without losing much information. However, the focus now has to shift to devising other DL models that address the challenges of longitudinal data presented by the multimodal diagnostic data of AD.

Fig. 7

Distribution of different DL models used by the recent publications

Fig. 8

Distribution of different DL models surveyed in Sect. 3

A summary of the different models that we surveyed in Sect. 3.3 is presented in Table 15. It provides an overview for the purpose of selecting a particular model for developing a CAD system. Further, in Tables 16 and 17, we compare the traditional ML and DL models, respectively. Most of the existing works on traditional ML models use MRI scans from ADNI to train the classifiers. Although these models concentrate on the feature engineering task for training the classifiers, their performance is lower than that of the DL models. These models are also impractical for large datasets because the training time increases drastically with the training sample size. It can also be noted that the traditional ML algorithms perform best on hybrid data and can thus aid effective diagnosis when biomarkers other than neuroimaging data are considered. These models can further be fused with the newer DL models during the feature engineering step to extract the biomarker information. In the subsequent stages, DL models can be trained to strengthen the correlation with the diagnostic procedure and the explainability of the overall CAD model.

Table 15 ML models used by different works for AD diagnosis

The popularity of DL models for developing CAD systems is explained by their higher classification accuracy compared with the traditional models. The availability of neuroimaging data across several data sources has contributed to this as well. However, these models are not explainable from the feature engineering perspective and thus do not help establish a correlation with the diagnostic procedure followed by clinicians. Also, these models are computationally expensive in terms of processing and memory requirements. They require modern processing units to carry out the resource-intensive automated feature extraction. Most of the works from the last 5 years that we reviewed employ a CNN architecture. Although these models attain higher classification accuracy, they fail to capture the characteristics of longitudinal data, which correlate with the clinicians' understanding of disease progression. To address this issue, more attention must be given to longitudinal data. Also, modalities from other diagnostic tests are to be taken into account to build a data model that incorporates these characteristics. Further, the data model must possess transfer learning attributes to include explainable features from other comorbidities.

There is also a need to establish a benchmark dataset for AD diagnosis with the existing available biomarkers. Such a dataset would not only help address the challenges of unbalanced data, but would also establish a standard framework for designing CAD models. Further, it would motivate researchers to focus on the multimodal aspect of diagnosis, which aids treatment strategies developed through advancements in fields such as Smart Healthcare and the Internet of Things.

Table 16 Comparison of the state-of-the-art CAD traditional ML models for AD diagnosis
Table 17 Comparison of the state-of-the-art CAD DL models for AD diagnosis

4.3 Prediction tasks in CAD of the AD

Although a lot of work has been done in the past with respect to the classification of AD and its stages, there is a clear gap in prediction tasks associated with the disease. AD is characterized by variations in the rate and pattern of progression among individuals, which complicates the development of universal predictive models. Also, biomarkers used for AD prediction, such as amyloid-beta, tau protein, neuro-inflammatory markers and genetic markers, exhibit variability over time, making it challenging to establish consistent baselines and track disease progression accurately. Research should focus on identifying and validating novel biomarkers, including blood-based markers and digital health data, that can enhance the accuracy of predictions and facilitate early detection. Recent research has shown that AD progression can be predicted using specific biomarkers. For instance, a study by Whelan et al. (2019) demonstrated the utility of tau protein levels in cerebrospinal fluid as a reliable predictor of cognitive decline in AD.

Using neuroimaging modalities such as MRI and PET for AD prediction is challenging due to the high-dimensional nature of the data, the risk of overfitting, and variability in imaging protocols. Zhao et al. (2020) used MRI as the neuroimaging feature and predicted the disease effectively with a GAN model.

Using clinical data for AD prediction raises its own challenges: variability in data quality, limited availability of comprehensive clinical records, and the need for careful feature selection to extract relevant information from diverse clinical measures while addressing missing data. Alzubaidi et al. (2023) describe approaches and tools for dealing with the challenge of data scarcity.

AD is a multifactorial disease, and integrating various biomarkers, including genetic, neuroimaging, and biochemical markers, improves the prediction accuracy. To this end, El-Sappagh et al. (2021a) demonstrate the use of multimodal data that involves neuroimaging, clinical and genetic data. In conclusion, addressing these facets (enhancing biomarkers, leveraging neuroimaging, incorporating cognitive and clinical data, and integrating multimodal data) constitutes the pathway to refining and advancing the prediction of AD progression.

4.4 Data scarcity

Data scarcity is a major challenge in deep learning, and it has particularly serious effects on CAD models used to diagnose AD. DL models, such as CNN and RNN, rely heavily on large and diverse datasets for effective training. Particularly in the context of AD diagnosis, obtaining such datasets with well-labeled medical information can be extremely challenging.

This scarcity of data can lead to several issues. Firstly, limited data can result in models that lack the ability to recognize subtle patterns in brain scans or clinical data, potentially causing overfitting or underfitting and reducing the model's diagnostic accuracy. Secondly, it can lead to bias in the model's predictions, as the available data may not adequately represent the entire population, causing data imbalance. These issues hinder the model's real-world clinical applicability.

To address data scarcity in deep learning, Alzubaidi et al. (2023) conducted a survey in which they provide various solutions and tips. One such solution is transfer learning (TL), where models pre-trained on related tasks are fine-tuned with limited data. Shamrat et al. (2023), AlSaeed and Omar (2022), Tufail et al. (2022), Balboni et al. (2022) and Ghazal and Issa (2022), among others, have incorporated TL to tackle data scarcity. Moreover, Zhao et al. (2020), Roychowdhury and Roychowdhury (2020), Chui et al. (2022) and Jung et al. (2023) used generative adversarial networks for synthetic data generation. Similarly, Murugan et al. (2021) and El-Sappagh et al. (2021b) used the synthetic minority over-sampling technique (SMOTE) to address class imbalance. In addition, collaborating with medical institutions to gather patient data, while ensuring patient privacy and respecting ethical considerations, is another way to obtain larger datasets.
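The core idea behind SMOTE mentioned above can be sketched in a few lines: a synthetic minority sample is placed at a random position on the line segment between an existing minority sample and one of its nearest minority-class neighbours. The snippet below is a minimal illustration using only the Python standard library; the function name `smote_like_oversample`, its parameters, and the toy feature vectors are assumptions made for this example and are not taken from any of the reviewed works.

```python
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between minority
    samples and their nearest minority-class neighbours (SMOTE's core idea)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: sq_dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-dimensional feature vectors (hypothetical values, not real data).
minority_samples = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.2], [1.1, 2.1]]
new_points = smote_like_oversample(minority_samples, n_new=6)
```

Because every synthetic point lies between two real minority samples, this kind of oversampling cannot add information that is absent from the original data; it only rebalances the class distribution seen by the classifier.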

4.5 Model explainability

Model explainability refers to the ability to understand how a machine learning model makes its decisions. It involves gaining insights into the features or data points that influence a model's decision, which is crucial for transparency and trust in AI systems. This challenge arises because deep learning models, especially sophisticated neural networks, often involve complex computations across numerous hidden layers, making it difficult to understand why a model generates a certain output.

This lack of transparency can hinder clinical acceptance and decision-making, as medical professionals need to understand why a model makes a particular diagnosis. It also poses potential risks of misdiagnosis or bias, as it is difficult to identify the specific features or patterns the model uses to make its predictions. The resulting lack of trust among clinicians can hinder the adoption of AI-based diagnostic tools, even if they show promising accuracy rates. Furthermore, in the medical field, it is crucial to provide explanations for diagnoses so that patients and doctors can make informed decisions about treatments. A systematic review conducted by Albahri et al. (2023) examines the trustworthiness and explainability of AI applications in healthcare, incorporating the assessment of quality, bias risk, and data fusion.
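As an illustration of how the features that influence a model's decision can be probed without opening the model itself, the sketch below implements permutation importance: each feature column is shuffled in turn, and the resulting drop in accuracy is taken as a rough measure of how much the model relies on that feature. This is a generic, model-agnostic technique chosen here for illustration, not a method attributed to the reviewed works; `toy_model` is a hypothetical stand-in for a trained classifier and the data are made up.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)


def permutation_importance(model, X, y, n_repeats=20, seed=0):
    """Mean accuracy drop observed when each feature column is shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the data with only column `col` permuted.
            X_perm = [row[:col] + [v] + row[col + 1:] for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    # Hypothetical classifier: the decision depends on feature 0 only.
    return 1 if row[0] > 0.5 else 0


X = [[0.1, 0.9], [0.2, 0.1], [0.3, 0.8], [0.4, 0.2],
     [0.6, 0.7], [0.7, 0.3], [0.8, 0.6], [0.9, 0.4]]
y = [toy_model(row) for row in X]
scores = permutation_importance(toy_model, X, y)
```

In this toy setting the score of the second feature is exactly zero, since the model never reads it, whereas shuffling the first feature degrades accuracy. For image-based DL models, saliency or occlusion-based methods play a comparable role.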

4.6 Algorithmic fairness

Algorithmic fairness involves ensuring that AI systems give everyone a fair chance and do not unintentionally favor some groups over others or make biased decisions. It is a critical consideration in the development of CAD models for diseases like AD, as biased predictions can lead to disparities in diagnosis and treatment. Data scarcity amplifies the challenge of algorithmic fairness because limited data results in the misrepresentation of certain demographic groups in the training data, making it difficult for the model to learn unbiased patterns. A study conducted by Chen et al. (2021) elaborates on algorithmic fairness in AI in the field of medicine and healthcare.
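One simple way to check whether a diagnostic model favors some groups over others is to compare its sensitivity (true positive rate) across demographic groups; a large gap means that patients with the disease in one group are missed more often than in another. The sketch below computes this gap, often called the equal-opportunity difference. The labels, predictions, and group tags are hypothetical values made up for the example.

```python
def true_positive_rate(y_true, y_pred):
    """Share of actual positives (label 1) that were predicted as positive."""
    positives = [(t, p) for t, p in zip(y_true, y_pred) if t == 1]
    if not positives:
        return float("nan")
    return sum(p == 1 for _, p in positives) / len(positives)


def tpr_by_group(y_true, y_pred, groups):
    """True positive rate computed separately for each group tag."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, tag in enumerate(groups) if tag == g]
        rates[g] = true_positive_rate([y_true[i] for i in idx],
                                      [y_pred[i] for i in idx])
    return rates


# Hypothetical labels (1 = AD, 0 = control), predictions, and group tags.
y_true = [1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1]
groups = ["A"] * 6 + ["B"] * 6

rates = tpr_by_group(y_true, y_pred, groups)
equal_opportunity_gap = max(rates.values()) - min(rates.values())
```

Here the model detects 75% of the positive cases in group A but only 25% in group B, giving a gap of 0.5. Sensitivity is only one of several possible fairness criteria, and the appropriate choice depends on the clinical setting.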

5 Future directions

In the previous section, we discussed the lessons learnt from our review. In this section, we outline a few future directions for researchers, industry practitioners and academicians to consider when developing a robust, multimodal, explainable and accurate CAD system for the detection of Alzheimer's disease.

  1. Exploring biomarkers: From the review in Sect. 3.3 it is found that neuroimaging is widely used as a biomarker by the ML models for AD detection. Although it plays a major role in the diagnostic process, a few other biomarkers are also currently used by physicians. They include genetic information, human activity, biochemistry of the bio-molecules, cognitive assessments, and a few others. These sets of biomarkers have not yet been addressed by the research community. Identifying a single biomarker that precisely indicates the disease's presence is difficult, so there is a need to explore combinations of biomarkers to enhance diagnostic accuracy. Moreover, distinguishing between the biomarker alterations associated with normal aging and those distinctive to AD poses yet another challenge. Thus, there is a need to explore additional new biomarkers of AD for early diagnosis. To this end, a few promising directions are mentioned in Yu et al. (2021b), Snoun et al. (2023), Rabbito et al. (2020) and Schneider and Goldberg (2020).

  2. Identification and extraction of features: As features are the most significant factors for learning in ML models, their identification is crucial. The collected datasets contain noise or irrelevant information that can obscure meaningful patterns. The challenge is to identify the most informative features while eliminating noise. Not all features are equally relevant in distinguishing between healthy individuals and those with AD. Effective feature selection is crucial to focus computational resources on the most informative attributes that contribute to accurate diagnosis. Since biomarkers act as features, their mapping in the training dataset is essential. Local binary patterns (LBP) (Francis et al. 2021), volumetric analysis (Das and Kalita 2022), gray-level co-occurrence matrix (GLCM) (Gao 2021), surface-based analysis (Zhao et al. 2022), diffusion tensor imaging (DTI) (Torso et al. 2021), and CNN (AlSaeed and Omar 2022) are encouraging in this direction.

  3. Bridging the gap between traditional ML and DL models: There is a trade-off between traditional ML models and DL models. Though DL models outperform traditional ML models in terms of accuracy, they lack the explainability that is achievable with traditional models. As discussed earlier, recognition of different biomarkers enables feature selection, which can be accomplished through traditional ML models. Thus, there is a clear need to combine both traditional and DL models in early AD diagnosis. As noted from the review, hybrid models outperform standalone models. A few other works to look at in this direction are Mohammed et al. (2021), Alatrany et al. (2021), Venugopalan et al. (2021), Janghel and Rathore (2021), AlSaeed and Omar (2022) and Puente-Castro et al. (2020).

  4. Multimodal learning: With the identification of new biomarkers, new diagnostic approaches appear from time to time. Thus, it is appropriate to develop data fusion models that incorporate features from different modalities (for example, imaging, cerebrospinal fluid, and blood-based markers). In this regard, several data fusion approaches have been proposed, such as sensor fusion (Yeong et al. 2021), decision-level fusion (Gumaei et al. 2022), data-level fusion (Zhang et al. 2022), feature-level fusion (Cai et al. 2020), and knowledge-based fusion (Iakovidou et al. 2020), that are useful to proceed in this direction.
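To make the distinction between two of the fusion strategies named in the last item concrete, the sketch below contrasts feature-level (early) fusion, where per-modality feature vectors are concatenated before a single classifier is trained, with decision-level (late) fusion, where each modality has its own classifier and only the predicted probabilities are combined. The function names, feature values, probabilities, and weights are all hypothetical and chosen purely for illustration.

```python
def feature_level_fusion(*modality_features):
    """Early fusion: concatenate per-modality feature vectors into one vector."""
    fused = []
    for features in modality_features:
        fused.extend(features)
    return fused


def decision_level_fusion(probabilities, weights=None):
    """Late fusion: weighted average of per-modality predicted probabilities."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    if len(weights) != len(probabilities):
        raise ValueError("one weight is required per modality")
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, probabilities)) / total


# Hypothetical per-modality features for one subject.
mri_features = [0.42, 0.13]      # e.g. imaging-derived measurements
csf_features = [0.77]            # e.g. a cerebrospinal fluid marker
cognitive_features = [24.0]      # e.g. a cognitive assessment score

# Early fusion yields a single input vector for one downstream classifier.
fused_vector = feature_level_fusion(mri_features, csf_features, cognitive_features)

# Late fusion combines the outputs of three separate (hypothetical) classifiers,
# here giving the imaging model twice the weight of the other two.
fused_probability = decision_level_fusion([0.8, 0.6, 0.7], weights=[2.0, 1.0, 1.0])
```

Early fusion lets one model learn interactions between modalities but requires every modality to be present for every subject, whereas late fusion tolerates a missing modality more easily because its classifier can simply be left out of the average.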

Apart from the above-mentioned directions, several other challenges and their possible solutions are mentioned in Jo et al. (2019), Ebrahimighahnavieh et al. (2020), Fathi et al. (2022) and Zhao et al. (2023). Addressing these challenges aids in devising new techniques and models that enable early detection of AD and thus improve its treatment strategy.

6 Conclusions

The early detection of AD plays a significant role in its treatment strategy. There exist numerous diagnostic tests to validate the disease. They make use of several biomarkers, viz. neuroimaging, clinical, neuropsychological score, genetics, etc. These tests lead to a diverse collection of data, and CAD makes use of them to aid in early diagnosis of the disease. In this study, we presented a comprehensive and systematic overview of the existing state-of-the-art CAD methods developed for the diagnosis of AD. We provided a comprehensive list of ML models employed over the last 5 years. A systematic classification method was devised, through which an overview of the existing models is presented. The CAD models were reviewed based on the diagnostic modality utilized, the data repositories employed, and the types of models implemented. Throughout our review we establish the significance of incorporating hybrid and multimodal data, and also emphasize the use of augmented and ensembled ML models. Further, our investigation shows that newer-generation CAD systems must possess explainability features to correlate with their human counterparts. While both traditional ML and DL models have been employed for this purpose, it is evident from our review that both types of models are necessary to develop a robust, explainable, multimodal and accurate CAD system for early detection of AD.