Introduction

Alzheimer's disease (AD) [1] is the most prevalent neurological disease causing dementia in older adults over the age of 65, affecting around 6.5 million people worldwide [2]. AD is a specific disease, whereas dementia is an umbrella term for a decline in cognitive function, including memory, thinking, and reasoning, that is severe enough to interfere with daily life. AD is the leading cause of dementia, accounting for between 60 and 80% of cases. Typical signs of AD are memory loss, difficulties with language and communication, disorientation, mood changes, and, eventually, difficulty with basic self-care tasks such as dressing and bathing. A combination of genetic, environmental, and behavioral factors is thought to contribute to the development of AD. There is currently no cure for AD, although therapies can help control symptoms and slow the progression of the illness. These treatments may include drugs to improve cognitive function, therapy to address behavioral and psychological issues, and lifestyle changes such as physical activity and a balanced diet [3].

Alzheimer's disease typically begins with mild memory problems that progressively worsen over time, leading to impaired brain function. While the exact cause of Alzheimer's is not fully understood, there are several factors that are thought to contribute to its development, including aging, genetic predisposition, untreated clinical depression, lifestyle factors, severe head injury, and prolonged hypertension.

The human brain is composed of billions of neurons that form connections with each other. In Alzheimer's disease, these connections are lost due to the buildup of abnormal protein structures known as “plaques” and “tangles,” which ultimately leads to the death of neurons. Plaques are deposits of amyloid beta (Aβ), an insoluble peptide [4]. Alzheimer's disease is typically divided into three stages: early, middle, and late. In the late stage, individuals may develop severe dementia.

In the early stage of Alzheimer's disease, a person may experience repeated forgetfulness that differs from normal forgetfulness. While anyone may forget things, in early-stage Alzheimer's disease a person's cognitive capacity progressively declines, making it difficult to remember regular activities. Confusion is also common in the early stage: a person may forget what they were doing or what they intended to do. Common challenges in the early stage of Alzheimer's disease include the following [5]:

  • Difficulty recalling recent conversations or events.

  • Frequently misplacing items.

  • Struggling to remember the names of places and things.

  • Having trouble finding the right words.

  • Making poor judgments or struggling to make decisions.

  • Becoming less adaptable and more resistant to change.

  • Experiencing memory issues that interfere with daily activities.

  • Finding it difficult to solve problems or plan ahead.

  • Having trouble completing routine tasks.

  • Confusion about time or location.

  • Losing track of items and the ability to recall past events.

During the middle stage of Alzheimer's disease, confusion tends to worsen gradually, and patients may also experience difficulties with sleep [6]. In the late stage, individuals may become less responsive to their environment, struggle with communication, and lose control over their movements. Although they may still be able to express words or phrases, they may have trouble conveying their emotions. The risk factors of AD are depicted in Fig. 1.

Fig. 1

Risk factors of Alzheimer's disease (AD)

Recently, researchers and academicians have shown interest in computer-assisted machine learning methods for analyzing medical data and predicting diseases. Many state-of-the-art works have applied traditional pattern analysis techniques, such as linear discriminant analysis (LDA), the linear program boosting method (LPBM), logistic regression (LR), support vector machines (SVM), and support vector machine-recursive feature elimination (SVM-RFE). These approaches have shown promising results for the early detection of Alzheimer's disease and the prediction of AD progression [7].

Implementing machine learning models requires designing an architecture and preprocessing the data. In general, machine learning-based classification studies involve four steps: feature extraction, feature selection, dimensionality reduction, and feature-based classification. These steps require specialized knowledge and several optimization stages, which can be time-consuming, and the reproducibility of these methods has been a problem [8]. In the feature selection step, for instance, AD-related features are selected from various neuroimaging modalities to derive more informative combinatorial measures, which may include mean subcortical volumes, gray matter densities, cortical thickness, brain glucose metabolism, and cerebral amyloid accumulation in regions of interest (ROIs) such as the hippocampus [9]. The rapid growth of high-volume biomedical datasets (neuroimaging and related biological data) over the past decade, together with advances in machine learning (ML), has opened new pathways for the diagnosis and prognosis of neurodegenerative and neuropsychiatric illnesses [10, 11]. In this paper, we adopt an ensemble-based machine learning approach to detect AD.

The rest of the paper is organized as follows. The "Review of Machine Learning Usage in Medical Clinical Diagnostics" section reviews the literature. The methodology is presented in the "Materials and Methods" section. The results are analyzed with respect to different performance metrics in the "Results and Discussions" section, and the "Conclusions" section concludes the paper.

Review of Machine Learning Usage in Medical Clinical Diagnostics

In medical science, many applications currently use machine learning techniques for data analysis and innovation. Machine learning has been used in a number of recent healthcare studies, including the diagnosis of COVID-19 from X-rays [12, 13], the identification of tumors in MRIs [14, 15], and the prediction of cardiovascular diseases [16, 17], dengue [18, 19], stroke [20], and cancer [21, 22]. Kader et al. [23] created a model that employed a feature selection and extraction strategy to predict Alzheimer's disease using machine learning techniques, with the OASIS longitudinal dataset used for classification. A concise introduction to the many methods used to analyze brain scans in order to identify brain illnesses is also given, together with the technique that is most reliable for identifying brain diseases. Using 22 brain disease databases, the authors identified the most precise diagnostic technique. The research combines current findings on Alzheimer's disease, Parkinson's disease, epilepsy, and brain malignancies.

Mehmood et al. [24] presented an overview of recent studies on the classification and diagnosis of Alzheimer's disease, with examples of the methods used to identify and classify AD. Martinez-Murcia et al. [25] developed a model that uses deep convolutional autoencoders to analyze AD data. The model extracts features from MRI images and characterizes a person's cognitive problems and the underlying neurodegenerative process through data-driven decomposition. Imaging-derived markers and MMSE or ADAS11 scores can be used to predict an AD diagnosis in more than 80% of cases. Helaly et al. [26] developed a model that employs a coherent approach for the early detection of Alzheimer's disease, using convolutional neural networks (CNN) to classify AD. Two techniques are applied to predict AD. Through a web application, clinicians and patients can remotely screen for Alzheimer's disease, and the system also determines the patient's stage along the AD spectrum. An improved version of the pre-trained VGG19 model identifies AD stages with an accuracy of 97%.

Kavitha et al. [27] constructed a model that uses optimal parameters for predicting Alzheimer's disease. Gradient Boosting, Support Vector Machine, Decision Tree, and Voting classifiers are used to identify AD. The study reports an average validation accuracy of 83%, which the authors state is significantly higher than that of prior efforts.

Ghazal et al. [28] developed a model that applies transfer learning to the multiclass classification of brain MRIs, assigning the images to four categories: very mild dementia (VMD), mild dementia (MD), moderate dementia (MOD), and non-dementia (ND). According to simulation results, the accuracy of the proposed system is 91.70%, and the technique is reported to be more accurate than earlier methods.

Gaudiuso et al. [29] developed a model that combines Laser-Induced Breakdown Spectroscopy (LIBS) with machine learning. Analyzing micro-drops of plasma samples from AD patients and healthy controls (HC) yielded robust classification. After acquiring the LIBS spectra of 67 plasma samples from 31 AD patients and 36 HC, the authors detected late-onset AD (> 65 years old) with an overall classification accuracy of 80%, a specificity of 75%, and a sensitivity of 85%.

Nawaz et al. [30] developed a deep feature-based strategy for predicting the stage of Alzheimer's disease using a convolutional neural network. The initial layers of a pre-trained AlexNet model were transferred to extract deep features, which were then classified with popular machine learning approaches, namely k-nearest neighbor (KNN), random forest (RF), and support vector machine (SVM). The evaluation showed that the deep feature-based model outperformed handcrafted and deep learning techniques, with an accuracy of 99.21%. Basheer et al. [31] constructed a model using the OASIS dataset, in which demented and non-demented classes were distinguished. The result was confirmed by checking the correlation accuracy across a number of repetitions, giving an accuracy of 92.39%.

Prajapati et al. [32] developed a model using deep neural networks (DNN) that produced improved classification outcomes. The proposed DNN with the highest validation accuracy achieved test accuracies of 85.19%, 76.93%, and 72.73%. Lucas et al. [33] developed a model based on a quantitative EEG (qEEG) processing technique to automatically distinguish AD patients from healthy people. The EEGs of 19 healthy subjects and 16 AD patients with probable mild to moderate symptoms were analyzed. The accuracy and sensitivity of the EEG analysis were 79.9% and 83.2%, respectively; when each patient's particular diagnosis was taken into account, they were 87.0% and 91.7%, respectively. Sudharsan et al. [34] created a model using structural magnetic resonance imaging (sMRI). The study used data from patients with mild cognitive impairment (MCI) and healthy controls to diagnose early Alzheimer's disease.

The main contributions to this proposed research work are highlighted as follows:

  • To employ a feature selection approach that identifies the most relevant features and avoids data redundancy. The feature selection method chooses the k most important features, and a standard scaler is applied to scale the feature values.

  • To train the model on the OASIS dataset, which contains numerous missing values; we handle them with mean imputation.

  • To predict AD, we apply six different machine learning classifiers and propose a voting-based ensemble approach which demonstrates around 96% accuracy.

Materials and Methods

The proposed technique has three basic steps: data collection, preprocessing, and prediction of Alzheimer's disease with machine learning algorithms. Pandas [35] is used to load the initial dataset, and the required preprocessing libraries are imported. Inconsistency and redundancy in the input dataset reduce the accuracy of machine learning algorithms, so for optimal outcomes the data are cleaned before being used, eliminating unnecessary values and attributes. The preprocessed data are then randomly split into training and testing sets in an 80:20 ratio, i.e., 80% of the data are used to train the model and 20% to test it. Figure 2 illustrates the system's process for the early prediction of Alzheimer's disease.
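The preprocessing and prediction steps above can be sketched end to end with scikit-learn, which the study reports using. This is a minimal sketch under stated assumptions, not the authors' code: the data are a synthetic stand-in generated with `make_classification` (not the OASIS records), every column is mean-imputed, `k=5` is an arbitrary choice, the split is stratified (the text only says random), and XGBoost is left out of the ensemble to avoid a dependency outside scikit-learn.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def build_pipeline(k=5, seed=0):
    """Mean imputation -> SelectKBest(f_classif) -> StandardScaler -> hard-voting ensemble."""
    ensemble = VotingClassifier(
        estimators=[
            ("gnb", GaussianNB()),
            ("dt", DecisionTreeClassifier(random_state=seed)),
            ("rf", RandomForestClassifier(random_state=seed)),
            ("gb", GradientBoostingClassifier(random_state=seed)),
        ],
        voting="hard",
    )
    return make_pipeline(
        SimpleImputer(strategy="mean"),
        SelectKBest(score_func=f_classif, k=k),
        StandardScaler(),
        ensemble,
    )


# Synthetic stand-in for the tabular features (NOT the real data):
# 300 rows, 9 numeric attributes, 3 classes, about 5% of entries set to NaN.
X, y = make_classification(n_samples=300, n_features=9, n_informative=5,
                           n_classes=3, random_state=0)
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Random 80:20 split, stratified so that each class keeps its proportion.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y)

model = build_pipeline().fit(X_train, y_train)
print(f"test accuracy: {model.score(X_test, y_test):.3f}")
```

Wrapping imputation, selection, and scaling in one pipeline means their statistics are learned from the training split only, so no information from the 20% test split leaks into preprocessing.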

Fig. 2

Working flowchart of the proposed system to predict AD

Dataset

A longitudinal dataset [36] is used in this research to predict Alzheimer's disease. The initial task is to examine the data cross-sectionally, at a given time point or at baseline. A thorough data analysis then compares the primary study variables and the data obtained at each visit. The study includes MRI data of 150 individuals aged 60 to 96. Each patient was scanned at least once during the study, and all patients are right-handed. At the initial examination, 64 patients were characterized as demented and 72 as non-demented, and both groups remained so throughout the study; the remaining 14 patients were non-demented at their initial visit and were characterized as demented at a later visit.

Table 1 describes the OASIS longitudinal dataset. The attribute Visit gives the visit number of each record during the study. M/F gives the patient's gender, where M and F stand for male and female, respectively. Age is the patient's age, and EDUC is the patient's years of education. SES stands for the patient's socioeconomic status. MMSE is the Mini-Mental State Examination score, and nWBV is the normalized whole-brain volume. eTIV and ASF denote the estimated total intracranial volume and the atlas scaling factor, respectively.

Table 1 Description of the dataset

Data Preprocessing

The raw dataset contains redundant data and missing values [37, 38]. As part of data handling, features with missing values are identified and their missing entries imputed. Feature selection and feature scaling are also performed as part of preprocessing.

Data Analysis

Data analysis is the process of inspecting, cleansing, transforming, and modeling a dataset to put it into a suitable form. Data analysis [39] aims to find useful information that can later support decision making; for example, it can help evaluate the characteristics of the data and the correlations between attributes. Table 2 displays the minimum, maximum, and median values of each attribute of the dataset.

Table 2 Each attribute's maximum, minimum, and median values

Missing Data Handling

The OASIS dataset contains a number of missing values, which can bias the outcomes of an ML model or reduce its accuracy. In this study, missing values are mainly imputed with the mean method: the mean (average) of all observed values of an attribute is computed as in Eq. (1) and substituted for each missing value.

$$\text{Mean }=\frac{\text{Sum of all data points }}{\text{Number of data points}}.$$
(1)

The SES column contains 19 missing values and the MMSE column contains 2. Imputation replaces missing values with substitute values estimated from the observed data. The 19 missing SES values were replaced with the mean value, and the 2 missing MMSE values were replaced with the mode value. Figure 3 shows the data before and after missing-data handling.
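The mean and mode imputation described above can be illustrated in plain Python. This is a sketch assuming missing entries are encoded as `None`; the column values are made up for illustration. In scikit-learn, the same two strategies correspond to `SimpleImputer(strategy="mean")` and `SimpleImputer(strategy="most_frequent")`.

```python
from statistics import mean, mode


def impute(column, strategy="mean"):
    """Replace missing entries (None) with the mean or the mode of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if strategy == "mean":
        fill = mean(observed)          # Eq. (1)
    elif strategy == "mode":
        fill = mode(observed)          # most frequent observed value
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in column]


# Hypothetical columns with gaps (not actual OASIS records).
ses = [2, None, 3, 1, None, 4]
mmse = [27, 30, None, 30, 29]
print(impute(ses, "mean"))   # [2, 2.5, 3, 1, 2.5, 4]
print(impute(mmse, "mode"))  # [27, 30, 30, 30, 29]
```

The mode is a natural choice for a score such as MMSE, because the imputed value is then one that actually occurs in the data.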

Fig. 3

Data visualization: a before missing-data handling, b after missing-data handling

Feature Selection

When building a predictive model, feature selection chooses the relevant features and thereby reduces the dimensionality of the input. Feature selection [40] is crucial in machine learning: by decreasing the number of input variables, it removes redundant attributes, which can help the model generalize and improve its accuracy. In this research, feature selection is applied to the clinical data related to Alzheimer's disease. The SelectKBest method retains the k features with the highest scores; in this work it is used with f_classif as the scoring function. This method was chosen because it handles both classification and regression data simply by changing the score_func option. In Eq. (2), x is the raw score, μ is the population mean, and σ is the standard deviation. Table 3 lists the score of each attribute obtained with score_func.

Table 3 Score value of each attribute using “score_func”
$$Z=\frac{x-\mu }{\sigma }$$
(2)
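SelectKBest with `f_classif` scores each feature by its one-way ANOVA F statistic, that is, the ratio of between-class to within-class variance, and keeps the `k` highest. The plain-Python sketch below computes that statistic for toy, hypothetical features. It assumes at least two classes and a non-zero within-class variance, and the helper names are illustrative rather than scikit-learn's.

```python
def anova_f(values, labels):
    """One-way ANOVA F statistic of one feature across the class labels."""
    groups = {}
    for v, c in zip(values, labels):
        groups.setdefault(c, []).append(v)
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    # Between-class and within-class sums of squares.
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_k_best(features, labels, k):
    """Names of the k features with the highest F scores."""
    scores = {name: anova_f(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy features: "a" separates the two classes, "b" does not.
labels = [0, 0, 0, 1, 1, 1]
features = {"a": [1, 2, 3, 7, 8, 9], "b": [5, 4, 6, 5, 6, 4]}
print(anova_f(features["a"], labels), anova_f(features["b"], labels))  # 54.0 0.0
print(select_k_best(features, labels, k=1))                            # ['a']
```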

Correlation Coefficient

The correlation coefficient measures the linear relationship between two variables: it is their covariance normalized by the product of their standard deviations. Correlation coefficients make it simple to discover links between the different phases of Alzheimer's disease. A drawback of this approach is that it is sensitive to outliers, which are likely here because the information is gathered from a variety of sources. Equation (3) defines the correlation between two variables J and M [41].

$$\rho_{J,M}=\frac{\text{cov}(J,M)}{\sigma_{J}\,\sigma_{M}}$$
(3)
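Eq. (3) can be computed directly, and evaluating it for every pair of attributes gives the correlation matrix that a heat-map such as Fig. 4 colours. The sketch below uses population (divide-by-n) moments, whose normalizing factors cancel in the ratio, and made-up column values; `correlation_matrix` is a hypothetical helper name.

```python
from math import sqrt


def pearson(j, m):
    """Eq. (3): cov(J, M) / (sigma_J * sigma_M)."""
    n = len(j)
    mj, mm = sum(j) / n, sum(m) / n
    cov = sum((a - mj) * (b - mm) for a, b in zip(j, m)) / n
    sj = sqrt(sum((a - mj) ** 2 for a in j) / n)
    sm = sqrt(sum((b - mm) ** 2 for b in m) / n)
    if sj == 0 or sm == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (sj * sm)


def correlation_matrix(columns):
    """Pearson correlation for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Hypothetical columns (not OASIS values).
cols = {"Visit": [1, 2, 3, 4], "MR Delay": [0, 400, 800, 1200], "ASF": [1.2, 0.9, 1.3, 1.0]}
corr = correlation_matrix(cols)
print(round(corr[("Visit", "MR Delay")], 3))  # 1.0
```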

Heat-Map

A heat map is a data visualization method that shows the magnitude of a phenomenon in two dimensions using color. The color variation lets the reader see how the phenomenon is grouped or fluctuates in space [42]. The values of the first dimension are presented as rows of the table, and the values of the second dimension as columns. The color of each cell is determined by the value of the measurement for that row and column. The pattern emphasizes distinctions and variation within the same dataset. In Fig. 4, warmer colors indicate higher values and colder colors lower values. The cell for Visit against itself has the highest value, 1, and is shown in warm yellow; MR Delay and Visit are also highly correlated and appear in warm yellow. ASF and Visit have a lower correlation, shown in a colder light green. Overall, every attribute is perfectly correlated with itself, whereas Visit has low correlation with ASF, nWBV, and eTIV.

Fig. 4

Heat-map for correlation between every attribute

Standard Scaler

Standardization rescales each feature so that its observed values have a mean of 0 and a standard deviation of 1, using the z-score of Eq. (2). scikit-learn's StandardScaler is used for this in the present work.
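The standardization of Eq. (2) can be sketched as a fit/transform pair: the mean and standard deviation are estimated on the training values and then reused on test values. This is an illustrative sketch with hypothetical function names; like scikit-learn's StandardScaler, it uses the population standard deviation (ddof = 0) and leaves a constant column unscaled.

```python
from math import sqrt


def fit_scaler(column):
    """Mean and population standard deviation (ddof = 0) of the training values."""
    n = len(column)
    mu = sum(column) / n
    sigma = sqrt(sum((x - mu) ** 2 for x in column) / n)
    return mu, sigma


def transform(column, mu, sigma):
    """z = (x - mu) / sigma, Eq. (2); a constant column (sigma = 0) is only centred."""
    scale = sigma if sigma > 0 else 1.0
    return [(x - mu) / scale for x in column]


train = [2, 4, 4, 4, 5, 5, 7, 9]
mu, sigma = fit_scaler(train)
print(transform(train, mu, sigma))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(transform([11], mu, sigma))   # [3.0]  (test values reuse the training mu and sigma)
```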

Classifier Models

Gaussian Naive Bayes (Gaussian NB)

Gaussian Naive Bayes is a simple probabilistic algorithm and one of the most widely used Naive Bayes (NB) algorithms based on Bayes' theorem. It is designed for continuous features: within each class, the values of each feature are assumed to follow a Gaussian distribution. The NB family has several key benefits, including efficient supervised training, applicability to real-world classification problems, and the need for little training data. Its main weakness is the assumption that the features are conditionally independent given the class, which rarely holds exactly in practice. For continuous data, the class-conditional probability is computed with Eq. (4) [43].

$$P\left(X=x \mid C=c\right)=\frac{1}{\sqrt{2\pi \sigma^{2}}}\,e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$$
(4)

where x is the value of the variable, c is the class, \(\mu\) is the mean, and \(\sigma\) is the standard deviation of the variable within class c.
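A minimal Gaussian NB built on Eq. (4) fits a prior plus a per-class mean and standard deviation for each feature, then picks the class with the largest log posterior. This sketch assumes numeric features given as lists; the `min_sigma` floor is an assumption added to avoid division by zero (scikit-learn's GaussianNB uses a `var_smoothing` term for the same purpose), and the toy data are hypothetical.

```python
from math import exp, log, pi, sqrt


def gaussian_pdf(x, mu, sigma):
    """Class-conditional likelihood P(X = x | C = c) from Eq. (4)."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / sqrt(2 * pi * sigma ** 2)


def log_gaussian(x, mu, sigma):
    """Log of Eq. (4); log space avoids underflow when many likelihoods are multiplied."""
    return -((x - mu) ** 2) / (2 * sigma ** 2) - 0.5 * log(2 * pi * sigma ** 2)


def fit(X, y, min_sigma=1e-6):
    """Per class: prior P(c) and the per-feature mean and standard deviation."""
    model = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        mus = [sum(col) / len(col) for col in cols]
        sigmas = [max(sqrt(sum((v - m) ** 2 for v in col) / len(col)), min_sigma)
                  for col, m in zip(cols, mus)]
        model[c] = (len(rows) / len(X), mus, sigmas)
    return model


def predict(model, x):
    """argmax over c of log P(c) + sum_k log P(x_k | c), treating features as independent."""
    def log_posterior(c):
        prior, mus, sigmas = model[c]
        return log(prior) + sum(log_gaussian(v, m, s) for v, m, s in zip(x, mus, sigmas))
    return max(model, key=log_posterior)


# Toy two-feature data (hypothetical values, not from OASIS).
X = [[1.0, 20.0], [1.2, 22.0], [0.8, 21.0], [3.0, 30.0], [3.2, 29.0], [2.8, 31.0]]
y = [0, 0, 0, 1, 1, 1]
model = fit(X, y)
print(predict(model, [1.1, 21.5]), predict(model, [3.1, 30.5]))  # 0 1
```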

XGBoost

XGBoost (Extreme Gradient Boosting) is a fast and efficient implementation of gradient-boosted decision trees. Let \(D=\{(x_{i}, y_{i})\}\) denote the dataset, where n is the number of training samples, i ranges from 1 to n, and \((x_{i}, y_{i})\) is the ith training example. Eq. (5) gives the estimated label as the sum of the outputs of the trees.

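$$\hat{y}_{i} = \phi \left( x_{i} \right) = \sum\nolimits_{M} f_{M} \left( x_{i} \right), \quad f_{M} \in P.$$
(5)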

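where P is the space of regression trees, also known as Classification and Regression Trees (CART), and each \(f_{M}\) corresponds to an independent tree structure. Eq. (6) is the regularized objective function minimized by the boosted-tree algorithm, where \(l\) is a differentiable convex loss function and \(\Omega (f)\) is a regularization term that penalizes model complexity; in the original XGBoost formulation, \(\Omega (f) = \gamma T + \tfrac{1}{2}\lambda \lVert w \rVert^{2}\), where T is the number of leaves and w is the vector of leaf weights.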

$$ \mathcal{L}\left( \phi \right) = \sum\nolimits_{i} l\left( y_{i} ,\,\hat{y}_{i} \right) + \sum\nolimits_{M} \Omega \left( f_{M} \right). $$
(6)

Because the model is trained additively, Eq. (7) gives the objective function at iteration t, where \(\hat{y}_{i}^{(t)}\) is the estimate for the ith instance at the tth iteration.

$$ \mathcal{L}^{\left( t \right)} = \sum\nolimits_{i = 1}^{n} {l\left( {y_{i} ,\,\hat{y}_{i}^{{\left( {t - 1} \right)}} + f_{t} \left( {x_{i} } \right)} \right)} + \Omega \left( {f_{t} } \right). $$
(7)

Equation (8) shows the second-order approximation, which is used to optimize the objective faster.

$$ \tilde{\mathcal{L}}^{\left( t \right)} \simeq \sum\nolimits_{i = 1}^{n} \left[ l\left( y_{i} ,\,\hat{y}_{i}^{\left( t - 1 \right)} \right) + g_{i} f_{t} \left( x_{i} \right) + \tfrac{1}{2} h_{i} f_{t}^{2} \left( x_{i} \right) \right] + \Omega \left( f_{t} \right). $$
(8)

In Eq. (8), \(g_{i} = \partial_{\hat{y}^{(t-1)}} l\left( y_{i} ,\,\hat{y}^{(t-1)} \right)\) and \(h_{i} = \partial_{\hat{y}^{(t-1)}}^{2} l\left( y_{i} ,\,\hat{y}^{(t-1)} \right)\) are the first- and second-order gradient statistics of the loss function [44].
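For a concrete case of Eq. (8), consider the logistic loss on raw scores, for which \(g_i = p_i - y_i\) and \(h_i = p_i(1 - p_i)\) with \(p_i = 1/(1 + e^{-\hat{y}_i^{(t-1)}})\). Assuming the regularizer \(\Omega(f) = \gamma T + \tfrac{1}{2}\lambda\lVert w\rVert^2\) of the original XGBoost formulation, setting the derivative of \(\sum_{i \in \text{leaf}} (g_i w + \tfrac{1}{2} h_i w^2) + \tfrac{1}{2}\lambda w^2\) to zero gives the optimal leaf weight \(w^* = -G/(H + \lambda)\), where G and H are the sums of \(g_i\) and \(h_i\) in the leaf. The sketch below is a binary-loss illustration with made-up inputs; the study's task is multiclass, for which XGBoost uses a softmax loss instead.

```python
from math import exp


def sigmoid(z):
    return 1.0 / (1.0 + exp(-z))


def grad_hess_logistic(y, raw_score):
    """g_i and h_i of the logistic loss w.r.t. the previous raw prediction y_hat^(t-1)."""
    p = sigmoid(raw_score)
    return p - y, p * (1.0 - p)


def optimal_leaf_weight(labels, raw_scores, lam=1.0):
    """Minimizer of sum_i (g_i w + 0.5 h_i w^2) + 0.5 * lam * w^2, i.e. w* = -G / (H + lam)."""
    G = H = 0.0
    for y, f in zip(labels, raw_scores):
        g, h = grad_hess_logistic(y, f)
        G += g
        H += h
    return -G / (H + lam)


# Four samples in one leaf, all currently predicted with raw score 0 (p = 0.5).
print(optimal_leaf_weight([1, 1, 1, 0], [0.0, 0.0, 0.0, 0.0]))  # 0.5
```

A larger `lam` shrinks the leaf weight towards zero, which is how the L2 part of \(\Omega\) limits the contribution of each new tree.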

Decision Tree (DT)

A decision tree (DT) predicts the value of a target variable by learning simple decision rules from the features: each interior node tests a feature, and each leaf node holds a class label [45]. Equations (9)–(11) give the criteria commonly used to choose the split at each node.

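Information gain

$$\text{Gain}(M, J) = \text{Entropy}(M) - \sum_{i=1}^{c}\frac{|M_{i}|}{|M|}\,\text{Entropy}(M_{i}), \qquad \text{Entropy}(M) = -\sum_{j} p_{j}\log_{2} p_{j}.$$
(9)

where \(M_{1}, \ldots, M_{c}\) are the subsets of M induced by the c values of attribute J, and \(p_{j}\) is the proportion of class j in M.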

Gain Ratio: \(\text{GainRatio}(M, J) = \text{Gain}(M, J)/\text{SplitInformation}(M, J)\), where

$$\text{SplitInformation}(M, J) = -\sum_{i=1}^{c}\frac{|M_{i}|}{|M|}\log_{2}\frac{|M_{i}|}{|M|}.$$
(10)

Gini value

$$\text{Gini }(\text{D}) = 1 -\sum_{j=1}^{n}{p}_{j}^{2}.$$
(11)

where \(p_{j}\) is the relative frequency of class \(j\) in \(D\). If dataset \(D\) is split on an attribute \(A\) into two subsets \(D_{1}\) and \(D_{2}\), the Gini index of the split is defined as \(\text{Gini}_{A}(D) = \frac{|D_{1}|}{|D|}\text{Gini}(D_{1}) + \frac{|D_{2}|}{|D|}\text{Gini}(D_{2})\).
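The split criteria above can be checked on a toy example. This sketch implements information gain (entropy of M minus the weighted entropy of the subsets \(M_1, \ldots, M_c\)), split information, gain ratio, the Gini value, and the weighted Gini index of a binary split; the attribute values and labels are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    n = len(labels)
    return -sum((k / n) * log2(k / n) for k in Counter(labels).values())


def partition(values, labels):
    """Split the labels of M into subsets M_1..M_c, one per distinct value of attribute J."""
    groups = {}
    for v, label in zip(values, labels):
        groups.setdefault(v, []).append(label)
    return list(groups.values())


def information_gain(values, labels):
    n = len(labels)
    return entropy(labels) - sum(len(s) / n * entropy(s) for s in partition(values, labels))


def split_information(values, labels):
    n = len(labels)
    return -sum(len(s) / n * log2(len(s) / n) for s in partition(values, labels))


def gain_ratio(values, labels):
    si = split_information(values, labels)
    return information_gain(values, labels) / si if si > 0 else 0.0


def gini(labels):
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def gini_split(d1, d2):
    """Weighted Gini index of a binary split of D into D1 and D2."""
    n = len(d1) + len(d2)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)


# Hypothetical toy example: an attribute that separates the classes perfectly.
labels = ["D", "D", "N", "N"]
age_band = ["old", "old", "young", "young"]
print(information_gain(age_band, labels), gain_ratio(age_band, labels))  # 1.0 1.0
print(gini(labels), gini_split(["D", "D"], ["N", "N"]))                   # 0.5 0.0
```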

Random Forest (RF)

A random forest model usually generalizes better than a single decision tree because combining many trees reduces overfitting. A random forest is composed of many decision trees, each trained on a different bootstrap sample of the data and a random subset of features, so the trees differ from one another. Together the trees form the forest, and each tree is evaluated to produce its own prediction, which are combined into the forest's output [46]. Equation (12) gives the Gini index used for classification:

$$\text{Gini}=1-\sum_{i=1}^{n}{\left({p}_{i}\right)}^{2}.$$
(12)

In Eq. (12), \({p}_{i}\) is the probability that an object is classified into class \(i\).

Gradient Boosting

In gradient boosting, both the base-learner models and the loss function can be chosen freely. For a particular loss function \(\psi(y, f)\) and/or a particular base-learner \(h(x, \theta_{t})\), the exact parameter estimates can be difficult to compute in practice [47]. To address this, a new function \(h(x, \theta_{t})\) is chosen to be most parallel to the negative gradient \(\{g_{t}(x_{i})\}_{i=1}^{N}\) along the observed data, where

$$ g_{t}(x) = E_{y}\left[ \frac{\partial \psi\left( y, f(x) \right)}{\partial f(x)} \,\Big|\, x \right]_{f(x) = \hat{f}^{t-1}(x)}. $$
(13)
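With the squared loss \(\psi(y, f) = \tfrac{1}{2}(y - f)^2\), the negative gradient is simply the residual \(y - f(x)\), so each boosting round fits a base learner to the current residuals and adds a shrunken copy of it to the model. The sketch below illustrates this with one-dimensional decision stumps as the base learner \(h(x, \theta_t)\). The data, the number of rounds, and the learning rate (shrinkage) are assumptions for illustration, and a classifier would use a different loss.

```python
def fit_stump(x, residuals):
    """Least-squares decision stump on a 1-D feature; needs at least two distinct x values."""
    best = None
    for t in sorted(set(x))[:-1]:                 # candidate thresholds
        left = [r for xi, r in zip(x, residuals) if xi <= t]
        right = [r for xi, r in zip(x, residuals) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda v: lv if v <= t else rv


def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Squared loss, so the negative gradient at each round is the residual y - f(x)."""
    f0 = sum(y) / len(y)                          # initial constant model
    stumps, pred = [], [f0] * len(y)
    for _ in range(n_rounds):
        residuals = [yi - fi for yi, fi in zip(y, pred)]
        h = fit_stump(x, residuals)               # base learner fitted to the negative gradient
        stumps.append(h)
        pred = [fi + learning_rate * h(xi) for fi, xi in zip(pred, x)]
    return lambda v: f0 + learning_rate * sum(h(v) for h in stumps)


# Hypothetical 1-D data with a step between x = 3 and x = 4.
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(x, y)
print(round(model(2), 3), round(model(5), 3))  # 1.0 5.0
```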

Voting

Voting is one of the simplest methods of merging the predictions of several learning algorithms. Voting classifiers are not really classifiers themselves, but wrappers around several classifiers that are trained and evaluated in parallel to benefit from their individual strengths. Each base classifier is trained on the data, and their predictions are combined to produce the final result. The combined prediction can be obtained in two ways: hard voting, described below, and soft voting, which averages the class probabilities predicted by the base classifiers.

Hard Voting Hard voting is the simplest form of majority voting: each classifier votes for one class, and the class with the most votes (Nc) is selected [48]. In Eq. (14), the class label j is predicted by majority (plurality) voting over the classifiers M1, …, Mn:

$$j = \text{mode}\left\{ M_{1}(x), M_{2}(x), \ldots, M_{n}(x) \right\}.$$
(14)
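Eq. (14) amounts to taking, for each sample, the most frequent label among the classifiers' predictions. This plain-Python sketch assumes integer class labels; the tie-breaking rule (smallest label wins) is an arbitrary choice for illustration, and the predictions are made up.

```python
from collections import Counter


def hard_vote(votes):
    """Eq. (14): the most frequent label among the classifiers' votes.
    Ties go to the smallest label, an arbitrary rule chosen for this sketch."""
    counts = Counter(votes)
    top = max(counts.values())
    return min(label for label, n in counts.items() if n == top)


def ensemble_predict(per_model_predictions):
    """per_model_predictions[m][s] is the label classifier M_m assigns to sample s."""
    return [hard_vote(sample_votes) for sample_votes in zip(*per_model_predictions)]


# Hypothetical predictions of three classifiers for four samples.
preds = [
    [0, 1, 2, 1],   # M_1
    [0, 2, 2, 0],   # M_2
    [1, 1, 2, 2],   # M_3
]
print(ensemble_predict(preds))  # [0, 1, 2, 0]
```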

Model Validation

In this study, model validation is used to reduce overfitting. Cross-validation is used to estimate model accuracy, since making an ML model free of noise is a daunting task. Specifically, the proposed study uses 10-fold cross-validation, which divides the whole dataset into 10 folds of equal size. In each iteration, the model is trained on 9 folds and evaluated on the remaining fold, and the performance of the approach is the mean over all 10 folds.
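The 10-fold procedure can be sketched as an index generator plus an averaging loop. This is an illustrative sketch: `train_and_score` is a hypothetical callback that trains on the given training indices and returns a score on the test indices, and folds are formed by a seeded random shuffle. When class sizes are unequal, a stratified split such as scikit-learn's `StratifiedKFold` keeps the class proportions in every fold.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists; every sample lands in exactly one test fold."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]   # the other k - 1 folds
        yield train, test
        start += size


def cross_validate(n_samples, train_and_score, k=10, seed=0):
    """Mean of the per-fold scores; train_and_score(train, test) is supplied by the caller."""
    scores = [train_and_score(tr, te) for tr, te in k_fold_indices(n_samples, k, seed)]
    return sum(scores) / len(scores)


folds = list(k_fold_indices(25, k=10))
print([len(test) for _, test in folds])  # [3, 3, 3, 3, 3, 2, 2, 2, 2, 2]
```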

Results and Discussions

In this work, several performance metrics are used, including accuracy, precision, recall, and F1 score. The 10-fold cross-validation approach is used to find each model's best parameters, after which each model's accuracy is evaluated. The confusion matrix, which can be binary or multiclass, is used to explain the performance assessments. A machine learning classifier is constructed and validated to distinguish individuals truly affected by Alzheimer's disease from the rest of a given population. The confusion-matrix counts are used to compute the precision, recall, accuracy, and F-score metrics. Recall (sensitivity) is the percentage of individuals with Alzheimer's disease who are correctly classified as having it, whereas specificity is the percentage of individuals without the condition who are correctly identified as not having it. Accuracy denotes the percentage of subjects that are correctly classified, and F1 is the harmonic mean of precision and recall. The patient is given a report that details the findings and the current stage of Alzheimer's disease. Identifying the stage is crucial because the stages depend on the patients' responses, and recognizing the stage helps doctors understand how the disease is affecting the patient. The following environment and libraries were used for the experiments and data analysis:

  (a) Environment: Python 3

  (b) Libraries: scikit-learn for machine learning

Figure 5 shows that, in this dataset, men are more likely than women to have dementia or Alzheimer's disease. Figure 6 shows that the non-demented group had higher MMSE (Mini-Mental State Examination) scores than the dementia group.

Fig. 5

Distribution of demented and non-demented rates by gender (gender group: Female = 0, Male = 1; Non-dementia = 2)

Fig. 6

Distribution of MMSE scores for the demented (female and male) and non-demented groups of patients

Figure 7a–c shows the distributions of ASF, eTIV, and nWBV for the demented and non-demented groups. As Fig. 7 shows, the normalized brain volume of the non-demented group is larger than that of the demented group. Figure 8 shows the corresponding analysis of years of education (EDUC) for demented and non-demented people.

Fig. 7

a–c Distribution of ASF, eTIV, and nWBV for the demented and non-demented groups

Fig. 8

Distribution of years of education

Figure 8 shows the analysis of years of education, and Fig. 9 shows the proportion of affected individuals by age in the demented and non-demented groups. People with dementia tend to be older, mostly between 70 and 80 years old, than patients without dementia. Few participants are above 90 years old, which may reflect the lower survival of people with this disease.

Fig. 9

Age distribution of people in the demented and non-demented groups

The following is a summary of the intermediate findings from the analysis of the attributes shown above.

  1. In this cohort, men are more likely than women to have dementia or Alzheimer's disease.

  2. Patients with dementia have fewer years of education.

  3. Brain volume differs between those with and without dementia.

  4. The demented group had more patients in their 70s and 80s than the non-demented group.

Performance Evaluation Measures

Here, the following performance metrics (Eqs. (15)–(18)) are computed from the true-positive (TP), false-positive (FP), true-negative (TN), and false-negative (FN) counts.

Accuracy This measure is the percentage of correctly classified examples out of all examples.

$$\text{Accuracy}=\frac{\text{TP }+\text{ TN}}{\text{TP }+\text{ FP }+\text{ TN }+\text{ FN}}.$$
(15)

Precision This measure is the proportion of true positives among all predicted positives. A precision of 1 means that the classifier produced no false positives.

$$\text{Precision}=\frac{TP}{\text{TP }+\text{ FP}}.$$
(16)

Recall Recall is the true-positive rate. A recall of 1 means that the classifier missed no positive cases.

$$\text{Recall }=\frac{TP}{\text{TP }+\text{ FN}}.$$
(17)

F1 Score The F1 score is the harmonic mean of precision and recall. It equals 1 only when both precision and recall are 1.

$$\text{F}1=2\,\frac{\text{Precision}\cdot\text{Recall}}{\text{Precision}+\text{Recall}}.$$
(18)
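Eqs. (15)–(18) translate directly into code. The sketch below takes confusion-matrix counts for one positive class; for the three-class problem here, the counts would be taken one class versus the rest and the per-class scores averaged (the averaging scheme used in the study is not stated). The counts in the example are hypothetical, and returning 0.0 when a denominator is zero is a convention assumed here.

```python
def classification_metrics(tp, fp, tn, fn):
    """Eqs. (15)-(18) from confusion-matrix counts (binary, or one class versus the rest)."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


m = classification_metrics(tp=40, fp=10, tn=45, fn=5)
print({k: round(v, 3) for k, v in m.items()})
# {'accuracy': 0.85, 'precision': 0.8, 'recall': 0.889, 'f1': 0.842}
```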

Precision and recall are the primary performance indicators in medical diagnostics research. Because it combines precision and recall into a single number that is easier to use for comparisons, the F-measure is especially useful for reporting performance in medical diagnostics. Recall is viewed as the more important parameter in the medical field, since a false negative is typically more damaging than a false positive: a missed condition can have severe repercussions for a patient, whereas false positives (which lower precision) can be dismissed by doctors on further examination. Given this, a modified version of F1 that emphasizes recall over precision is a more useful statistic. The confusion matrices of the ML models are shown in Fig. 10.
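One standard recall-weighted variant of F1 is the \(F_\beta\) score, \(F_\beta = (1 + \beta^2)\,\frac{\text{Precision}\cdot\text{Recall}}{\beta^2\,\text{Precision} + \text{Recall}}\), where \(\beta > 1\) gives recall more weight (\(\beta = 2\) is a common choice, assumed here) and \(\beta = 1\) recovers Eq. (18). The precision and recall values below are illustrative.

```python
def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Same F1 (about 0.667) in both cases, but F2 rewards the high-recall classifier.
print(round(f_beta(0.5, 1.0), 3), round(f_beta(1.0, 0.5), 3))  # 0.833 0.556
```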

Fig. 10

Confusion matrices for a GaussianNB, b decision tree, c random forest, d XGBoost, e GradientBoost, f voting classifier

Table 4 compares the training and testing accuracies of each model to check for overfitting, and also presents the precision, recall, accuracy, and F1 score of each model. The results in Table 4 indicate that the best-performing models are random forest, GaussianNB, and the voting classifier.

Table 4 Performance evaluation of several ML models

Figure 11 gives a graphical comparison of the evaluation metrics attained by the proposed system and the existing classifiers: GaussianNB, Decision Tree, Random Forest, XGBoost, GradientBoost, and Voting Classifier. It compares accuracy, precision, recall, and F1 score across the ML algorithms, so the highest and lowest values are easy to identify.

Fig. 11

Graphical comparison of evaluation metrics attained by the proposed system and existing classifiers

Figure 12 shows the ROC curves of the multiclass classifiers: (a) GaussianNB, (b) Decision Tree, (c) Random Forest, (d) XGBoost, (e) GradientBoost, and (f) Voting Classifier. In each panel, the blue curve is the micro-averaged ROC curve, and the yellow, green, and red curves are the per-class ROC curves for class 0, class 1, and class 2.
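The micro-averaged curve in Fig. 12 pools every (sample, class) decision into a single binary problem after one-vs-rest binarization of the labels. The sketch below assumes scikit-learn and a GaussianNB model trained on synthetic three-class data; it computes the per-class ROC AUCs and the micro-average, and cross-checks the latter against `roc_auc_score` with `average="micro"`.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import auc, roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import label_binarize

# Synthetic 3-class data (assumption: stand-in for the AD dataset).
X, y = make_classification(n_samples=600, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

clf = GaussianNB().fit(X_train, y_train)
y_score = clf.predict_proba(X_test)                # shape (n_samples, 3)
y_bin = label_binarize(y_test, classes=[0, 1, 2])  # one-vs-rest indicator matrix

# Per-class (one-vs-rest) ROC curves and their areas.
per_class_auc = {}
for k in range(3):
    fpr, tpr, _ = roc_curve(y_bin[:, k], y_score[:, k])
    per_class_auc[k] = auc(fpr, tpr)

# Micro-average: flatten all (sample, class) pairs into one binary problem.
fpr_micro, tpr_micro, _ = roc_curve(y_bin.ravel(), y_score.ravel())
micro_auc = auc(fpr_micro, tpr_micro)

print({k: round(v, 3) for k, v in per_class_auc.items()})
print(f"micro-average AUC = {micro_auc:.3f} "
      f"(check: {roc_auc_score(y_bin, y_score, average='micro'):.3f})")
```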

Fig. 12

ROC curves for (a) GaussianNB, (b) Decision Tree, (c) Random Forest, (d) XGBoost, (e) GradientBoost, and (f) Voting Classifier

Figure 13 shows the results on the AD test dataset, where the model identifies the features with the lowest squared error. In SGD learning, all missing values are removed and nominal attributes are converted into binary ones; in MLP learning, every attribute is normalized.
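The preprocessing steps named in this paragraph (removing missing values, converting nominal attributes into binary ones, and normalizing every attribute) can be sketched with pandas and scikit-learn as follows. The toy columns `Age`, `Sex`, and `TestScore` and their values are hypothetical, and min-max scaling to [0, 1] is an assumed choice of normalization.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Hypothetical toy records; column names and values are illustrative only.
df = pd.DataFrame({
    "Age": [72, 80, None, 68, 75],
    "Sex": ["M", "F", "F", "M", None],
    "TestScore": [27, 22, 29, 30, 24],
})

# 1. Remove records that contain missing values.
df = df.dropna()
# 2. Convert the nominal attribute into binary (one-hot) indicator columns.
df = pd.get_dummies(df, columns=["Sex"], dtype=int)
# 3. Normalize every attribute to the range [0, 1].
scaled = pd.DataFrame(MinMaxScaler().fit_transform(df),
                      columns=df.columns, index=df.index)
print(scaled)
```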

Fig. 13

Comparison of the accuracy achieved by different classifiers

Comparison with Existing Works

Table 5 compares the proposed model with existing work on ML-based AD prediction. Martinez-Murcia et al. [25] and Basheer et al. [31] used both machine learning and deep learning. Sudharsan et al. [34] combined machine learning with PCA and achieved 78.31% accuracy in predicting AD. Kavitha et al. [27] used the OASIS dataset and obtained 83% accuracy with ML. Basheer et al. [31] also used the OASIS dataset, predicting AD with machine learning and deep learning models and obtaining an accuracy of 92.39%. As Table 5 shows, the proposed method outperforms these approaches from the literature.

Table 5 Analysis of the proposed model in comparison to the closest-related works

Data Integrity Issue

Data preprocessing is a crucial stage in machine learning because it cleans, transforms, and prepares the data used to train models. However, preprocessing can also compromise the integrity of the data, which can degrade the performance of machine learning models for disease identification. We have adopted the following approaches to address this issue:

  • We apply data augmentation, which produces new synthetic samples from the existing dataset. This increases the diversity and size of the dataset and hence the robustness of the machine learning models.

  • We apply cross-validation to assess the performance of the machine learning models. The dataset is partitioned into several folds, and each fold in turn serves as the validation set while the remaining folds are used for training. This helps reveal problems such as overfitting or underfitting that data preprocessing may introduce.

  • We employ feature selection, which selects the most relevant features from the dataset. This reduces the dataset's dimensionality and can improve the performance of the machine learning models.

  • We detect outliers, that is, data points that deviate markedly from the usual distribution of the data.

  • Finally, we have developed robust algorithms that are less susceptible to data preparation problems such as missing values, outliers, and skewed distributions.
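A minimal sketch of k-fold cross-validation with scikit-learn, assuming k = 5, stratified folds, a Random Forest model, and synthetic data (the paper does not state its fold count or setup). Placing the scaler inside a pipeline means it is refit on each training split only, which is one concrete way to keep preprocessing from leaking information out of the validation fold.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic 3-class data (assumption: stand-in for the AD dataset).
X, y = make_classification(n_samples=500, n_features=12, n_informative=6,
                           n_classes=3, random_state=1)

# The scaler is fit inside each training split, never on the validation fold.
model = make_pipeline(StandardScaler(),
                      RandomForestClassifier(n_estimators=100, random_state=1))
# Stratified folds keep the class proportions similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(f"fold accuracies: {scores.round(3)}; "
      f"mean={scores.mean():.3f}, std={scores.std():.3f}")
```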

To overcome the potential integrity issues created by data preprocessing, we apply data augmentation, cross-validation, feature selection, and outlier detection and removal to build a robust AD detection model.
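Outlier detection and feature selection could be combined as in the sketch below. The paper does not specify its methods, so this assumes the interquartile-range (IQR) rule with the conventional factor 1.5 for outliers and scikit-learn's `SelectKBest` with the ANOVA F-test (`f_classif`) for feature selection; the helper `iqr_inlier_mask`, the choice k = 5, and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic 3-class data (assumption: stand-in for the AD dataset).
X, y = make_classification(n_samples=400, n_features=15, n_informative=5,
                           n_classes=3, random_state=7)


def iqr_inlier_mask(X: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Hypothetical helper: True for rows whose every feature lies in [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.all((X >= lower) & (X <= upper), axis=1)


# Outlier removal: drop rows with any feature outside the IQR fences.
mask = iqr_inlier_mask(X)
X_clean, y_clean = X[mask], y[mask]

# Feature selection: keep the 5 features with the highest ANOVA F-scores.
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X_clean, y_clean)
print(f"kept {mask.sum()} of {len(X)} rows; "
      f"selected feature indices: {selector.get_support(indices=True)}")
```

In a full pipeline, both steps should be fit on the training split only and then applied to the test split, for the same leakage reason discussed under cross-validation.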

Conclusions

Since there is currently no proven cure for Alzheimer's disease, reducing risk, providing early diagnosis, and thoroughly assessing symptoms are more important than ever. The literature shows that numerous efforts have been made to identify Alzheimer's disease using a variety of machine learning algorithms and micro-simulation approaches; nevertheless, it remains difficult to establish relevant traits that can detect Alzheimer's disease at an early stage. To identify the most reliable factors for Alzheimer's disease prediction, this study applied several approaches: GaussianNB, Decision Tree, Random Forest, XGBoost, GradientBoost, and a Voting Classifier. Feature selection and feature scaling were used to improve the accuracy of the machine learning algorithms. Among the evaluated models, the Voting Classifier achieved the highest accuracy, 96%, on the AD test data. To further improve detection accuracy, future research will focus on removing redundant and unnecessary features from existing feature sets and on extracting and analyzing new features that are more likely to aid in the detection of Alzheimer's disease.