1 Introduction

Diabetes, a chronic metabolic disease, is associated with disturbances in protein, glucose, and fat metabolism caused by a relative or absolute insulin deficiency [1]. The disease can be diagnosed from high blood glucose levels resulting from insufficient insulin production [2]. There are two main types of diabetes, Type 1 (T1D) and Type 2 (T2D). Diabetes affects an estimated 537 million adults worldwide [3], and if this trend continues, the number of diabetics is projected to exceed 629 million by 2045 [4]. According to the World Health Organization, diabetes is a worldwide epidemic [5]. Approximately 760 billion dollars are spent annually on the fight against diabetes [6], and the cost of treating diabetes-related complications places a significant and steadily growing economic burden on healthcare systems [7].

With the widespread prevalence of diabetes, machine learning (ML) models are increasingly used to predict the disease and its possible complications because of their ability to handle large and complex datasets. Tan et al. [8] used ML methods to predict T2D complications and reported that the random forest (RF) method produced successful T2D predictions. Grupta et al. [9] proposed a Deep Density Layer Neural Network (DDLNN) for diabetes prediction and compared it with ML algorithms frequently used in the literature, namely Naive Bayes (NB), K-nearest neighbor (KNN), logistic regression (LR), support vector machine (SVM), and decision tree (DT); the proposed model was reported to be more effective than the other ML methods. Popo and Khosa [10] estimated blood glucose in T1D patients over 30- and 60-minute prediction horizons using a multilayer long short-term memory-based recurrent neural network and obtained successful results; their findings are intended to guide the determination of the insulin dose to be given to T1D patients. Zheng et al. [11] proposed a multivariate risk prediction model for T2D based on 5 independent influencing factors and the LR method; the study comprehensively presented the clinical features of T2D and obtained successful results. Zhu et al. [12] proposed a fast-adaptive and confident neural network (FCNN), combining deep learning and meta-learning, to predict blood glucose in T1D patients; the method was developed on 3 different datasets and produced successful predictions of future blood glucose over 18- and 64-minute horizons. Alqushaibi et al. [13] created a hybrid structure combining a Bayesian optimization algorithm with a convolutional neural network (CNN) for T2D risk prediction and obtained successful results with the Bayesian CNN architecture. Li et al. [14] developed a diabetes risk prediction model based on XGBoost that combines ensemble learning, deep learning, and logistic regression, achieving an AUC of 0.91. Aslan and Sabancı [15] proposed a new deep learning-based method for diabetes prediction in which numerical data were first converted into images; the images were fed to the ResNet-18 and ResNet-50 CNN models, the extracted ResNet features were classified with SVM, and in the last step the fused features were again classified with SVM, yielding successful results. Nguyen et al. [7] used KNN, LR, SVC, AdaBoost, gradient boosting, and RF classifiers to diagnose T2D patients; in this comparative study, RF was reported to produce more successful results than the other ML methods. Naz and Ahuja [16] proposed an expert system for T2D detection that hybridizes the synthetic minority oversampling technique (SMOTE) for data preprocessing with the sequential minimal optimization (SMO) algorithm for classification, achieving an accuracy of 99.07%.

Although various methods for predicting diabetes are frequently employed in the literature, their predictive performance remains limited by challenges such as hyperparameter selection and parameter optimization [13, 15, 17]. The choice of hyperparameters plays a crucial role in the classification performance of these methods [18], yet tuning problem-specific hyperparameters by trial and error is nearly impossible [19, 20]. Because the hyperparameters form a multidimensional search space, hybridizing these methods with optimization algorithms increases their predictive capability [21]. Hence, the efficacy of structures built with hyperparameters chosen solely by trial and error on particular models remains constrained. Furthermore, the existing literature lacks the comprehensive studies needed to develop real-time implementations of successful models. Patient decision support system applications built on such models are pivotal tools for safeguarding and guiding patients’ health, thereby mitigating many adverse effects of diseases at an early stage. In light of these considerations, the primary objectives of this study are:

  • To achieve high accuracy in predicting patients’ diabetes risk.

  • To develop a hybrid architecture that combines stacked autoencoders with a Softmax classifier, and to optimize both the structure and the hyperparameters of this architecture with a genetic algorithm.

  • To offer a hybrid methodology designed to optimize entire architectures and hyperparameters tailored to specific problems and datasets.

In line with these goals and purposes:

  • To enhance classifier efficacy by synergizing the unique strengths of the stacked autoencoder, the Softmax classifier, and the genetic algorithm, we devised an optimization-based hybrid deep learning network that combines these components into a more potent classifier.

  • To obtain effective problem-specific outcomes, GA was used to optimize both the architecture parameters and the hyperparameters within the architecture.

  • The proposed hybrid deep learning model was applied to the early stage diabetes risk prediction dataset from UCI. In addition, different artificial intelligence methods frequently used in the literature were applied to this dataset, and the findings are presented comparatively.

  • The results obtained with the proposed hybrid deep learning model are presented in comparison with the results of other studies conducted with the same data set in the literature.

The novelties resulting from the experiments are:

  • We developed a hybrid deep learning network approach featuring a problem-specific model and an innovative hyperparameter optimization technique.

  • Our approach achieved higher success rates in predicting early stage diabetes risk compared to both the methods used in this study and those documented in the existing literature for this dataset.

  • Furthermore, we developed a web-based application for early stage diabetes risk prediction, which can be integrated into patient decision support systems.

2 Materials and methods

2.1 Genetic algorithm

Optimization is acknowledged in the literature as the process of finding the best attainable result, i.e., the one closest to the true solution; it permits the determination of unknown parameter values within specific allowable constraints [22]. Often utilized in the literature, GA is an optimization technique that mimics biological evolutionary processes [23]. The main goal of GA is to produce the best solutions based on the principle of survival of the fittest [24]. The steps of GA are shown in Fig. 1.

Fig. 1
figure 1

GA structure

As shown in Fig. 1, many random solutions are first generated for the problem in the genetic search space. Each of these solutions is called an individual, and the quality of each individual is measured by the fitness function [25]. The higher an individual’s quality, the more likely it is to survive both selection and the passage to the next generation. The crossover operation produces new individuals from two selected individuals, and mutation is applied to individuals to maintain diversity in the population [26].
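
The operations in Fig. 1 can be summarized in a few lines of code. The sketch below is a generic, illustrative GA skeleton in MATLAB; the real-valued encoding, bounds, and fitness function are placeholders and do not reproduce the configuration used later in this paper.

```matlab
% Generic GA skeleton following Fig. 1 (illustrative sketch, not the paper's code).
popSize = 20; nGenes = 8; maxIter = 100; pMut = 0.05;
lb = 0; ub = 1;                                       % assumed common bounds
fitness = @(x) -sum((x - 0.5).^2);                    % placeholder fitness function
pop = lb + (ub - lb) .* rand(popSize, nGenes);        % random initial population

for it = 1:maxIter
    f = zeros(1, popSize);
    for i = 1:popSize
        f(i) = fitness(pop(i, :));                    % measure quality of each individual
    end
    p = f - min(f) + eps;  p = p / sum(p);            % roulette-wheel probabilities
    cs = cumsum(p);
    sel = arrayfun(@(r) find(cs >= r, 1, 'first'), rand(1, popSize));
    pop = pop(sel, :);                                % fitness-proportional selection
    for i = 1:2:popSize - 1                           % single-point crossover on pairs
        c = randi(nGenes - 1);
        tmp = pop(i, c+1:end);
        pop(i, c+1:end) = pop(i+1, c+1:end);
        pop(i+1, c+1:end) = tmp;
    end
    mask = rand(popSize, nGenes) < pMut;              % mutation maintains diversity
    pop(mask) = lb + (ub - lb) * rand(nnz(mask), 1);
end

for i = 1:popSize, f(i) = fitness(pop(i, :)); end     % evaluate the final population
[bestFitness, bestIdx] = max(f);
bestIndividual = pop(bestIdx, :);
```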

2.2 Stacked autoencoder

SAE is an unsupervised neural network formed by stacking a collection of autoencoder (AE) neural networks. An AE basically consists of two parts, an encoder and a decoder: the encoder converts the input data into features, and the decoder reconstructs the input data from these features [27]. A simple AE structure is presented in Fig. 2.

Fig. 2
figure 2

AE structure

As shown in Fig. 2, the structure of an AE consists of 3 layers, and the output of each layer is transferred to the next layer as its input. AEs are divided into two parts: the encoder and the decoder [28]. In the encoder, the AE reduces the dimensionality of the data through feature extraction, using Eq. (1).

$$\begin{aligned} y=s \left(W^tx+b\right) \end{aligned}$$
(1)

In the equation, s represents the activation function (e.g., Gaussian, sigmoid, or tanh), W represents the weights between the input layer and the hidden layer, b represents the bias value, and x represents the input; y is a neuron’s scalar output. In the decoder, the dimensionally reduced data in the hidden layer are decoded so that the reconstructed data approach the dimensions of the input data, using Eq. (2).

$$\begin{aligned} z=s\left( Wy+b'\right) \end{aligned}$$
(2)

where z represents the reconstructed version of the input values. After this step, the back-propagation algorithm is used to bring the reconstructed values close to the input data by minimizing Eq. (3).

$$\begin{aligned} min\sum _{i=1}^m(z-x)^2 \end{aligned}$$
(3)

This process is performed to reveal the salient information in the data [29].
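
As a small illustration of Eqs. (1)–(3), the following MATLAB lines encode a single input vector, decode it, and compute the reconstruction error; the weights, dimensions, and the choice of the sigmoid activation are arbitrary assumptions made only for the example.

```matlab
% Numeric illustration of Eqs. (1)-(3) with arbitrary random weights.
x  = rand(8, 1);                       % one input sample with 8 features (assumed size)
W  = randn(8, 3);                      % weights between input (8) and hidden (3) layers
b  = randn(3, 1);  b2 = randn(8, 1);   % encoder and decoder biases (b and b')
s  = @(t) 1 ./ (1 + exp(-t));          % sigmoid, one possible choice for s

y   = s(W' * x + b);                   % Eq. (1): encoded features (dimension reduced to 3)
z   = s(W * y + b2);                   % Eq. (2): reconstruction of the input
err = sum((z - x).^2);                 % Eq. (3): squared reconstruction error to minimize
```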

The overfitting problem within the AE architecture is addressed by the regularization method presented in Eq. (4) [30].

$$\begin{aligned} min\left[ \sum _{j=1}^m(x'-x)^2+\gamma L(w)\right] \end{aligned}$$
(4)

In Eq. (4), \(\gamma\) represents the regularization parameter and L(w) the weight penalty term. The reconstruction error is augmented with this weight penalty during back-propagation to avoid overfitting. The values of the regularization terms that follow the summation are determined by trial and error to obtain the best result.

For more efficient classification, multiple autoencoders can be connected to each other; the connected autoencoders form a stacked AE. A simple stacked AE is shown in Fig. 3.

Fig. 3
figure 3

Stacked autoencoder and deep learning structure

As shown in Fig. 3, the output data obtained with stacked AE are used as input data for the Softmax classifier. With this structure, the data in the input layer are classified by the Softmax classifier in the output layer. Softmax classifier is a probability-based linear classifier used when there are two or more classes [30].
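
A minimal sketch of the structure in Fig. 3 is given below, assuming MATLAB's Deep Learning Toolbox (trainAutoencoder, trainSoftmaxLayer, stack) is available; the data, layer sizes, and epoch counts are placeholders rather than the settings optimized later in this study.

```matlab
% Minimal stacked AE + Softmax structure as in Fig. 3 (placeholder data).
X = rand(16, 100);                            % 16 features x 100 samples (assumed)
T = full(ind2vec(randi(2, 1, 100)));          % one-hot labels for 2 classes

ae1 = trainAutoencoder(X, 20, 'MaxEpochs', 100, 'ShowProgressWindow', false);
f1  = encode(ae1, X);                         % features from the first AE
ae2 = trainAutoencoder(f1, 10, 'MaxEpochs', 100, 'ShowProgressWindow', false);
f2  = encode(ae2, f1);                        % features from the second AE
soft = trainSoftmaxLayer(f2, T, 'ShowProgressWindow', false);
deepnet = stack(ae1, ae2, soft);              % stacked AE + Softmax classifier
probs = deepnet(X);                           % class probabilities for each sample
```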

In the stacked autoencoder architecture shown in Fig. 3, the following parameters directly affect the performance of the architecture:

  • Number of encoders and decoders

  • Number of layers in encoders and decoders

  • Activation function used in layers in the encoder section

  • Activation function used in layers in the decoder section

  • Weight regularization coefficient (L2WeightRegularization) value used to prevent overfitting and improve the generalization ability of models

  • Sparsity regularization coefficient (SparsityRegularization) value, which controls the weight of the sparsity penalty applied during training

  • Sparsity proportion coefficient (SparsityProportion) value, which sets the desired sparsity (average activation) of the hidden-layer neurons

  • Data scaling (ScaleData) value

It is almost impossible to set these parameters appropriately by trial and error while still achieving high success in classification problems. To obtain highly successful results, hybrid structures with optimization algorithms must be established so that these parameters can be determined specifically for the problem at hand [18].
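
To make the role of these hyperparameters concrete, the sketch below shows one possible way a candidate configuration could be turned into an SAE plus Softmax classifier in MATLAB; the gene field names and the mapping of activation-function indices are assumptions for illustration, not the authors' implementation.

```matlab
% Builds one candidate SAE + Softmax classifier from a gene (illustrative sketch).
% Assumed gene fields follow the abbreviations of Table 5: EnDeN, EnDeHLN,
% EnAF, DeAF, L2WR, SPR, SPP, SD. The index-to-function mapping is an assumption.
function net = buildCandidate(gene, Xtrain, Ttrain)
    encFuns = {'logsig', 'satlin'};             % possible EnAF choices (assumed)
    decFuns = {'logsig', 'satlin', 'purelin'};  % possible DeAF choices (assumed)
    X = Xtrain;
    aes = cell(1, gene.EnDeN);
    for k = 1:gene.EnDeN                        % number of stacked autoencoders
        aes{k} = trainAutoencoder(X, gene.EnDeHLN, ...   % EnDeHLN used as hidden size (assumption)
            'EncoderTransferFunction', encFuns{gene.EnAF}, ...
            'DecoderTransferFunction', decFuns{gene.DeAF}, ...
            'L2WeightRegularization', gene.L2WR, ...
            'SparsityRegularization', gene.SPR, ...
            'SparsityProportion', gene.SPP, ...
            'ScaleData', gene.SD == 1, ...
            'ShowProgressWindow', false);
        X = encode(aes{k}, X);                  % features feed the next autoencoder
    end
    soft = trainSoftmaxLayer(X, Ttrain, 'ShowProgressWindow', false);
    net = stack(aes{:}, soft);                  % final SAE + Softmax network
end
```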

2.3 Other methods used in experimental studies

The artificial intelligence methods utilized in experimental studies for early stage diabetes risk prediction, commonly found in the literature, are presented in Table 1.

Table 1 Other methods used in experimental studies

These methods presented in Table 1 are frequently used in classification problems [35,36,37,38].

3 Experimental studies and results

3.1 Dataset

In the study, the early stage diabetes risk prediction dataset available at the University of California Irvine (UCI) was used for diabetes risk prediction. This dataset is frequently used in the literature [39,40,41]. The attributes in this dataset are shown in Table 2.

Table 2 Dataset and attributes

In the dataset, whose attributes are shown in Table 2, there are 16 disease-defining attributes, and these attributes indicate whether or not a person belongs to the diabetes class. The dataset contains a total of 520 individuals, 192 women and 328 men; 200 of them are healthy and 320 were diagnosed with diabetes.

The data in the dataset were divided into two groups: 70% for training and 30% for testing.
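
A minimal sketch of such a split is shown below, assuming the features are held in a 16-by-520 matrix X and the labels in a matrix T; the variable names and the fixed random seed are illustrative assumptions.

```matlab
% 70/30 split of the 520 samples (X: 16-by-520 features, T: labels; assumed names).
rng(1);                                       % fixed seed for a reproducible split
idx      = randperm(520);
trainIdx = idx(1:364);                        % 70% -> 364 training samples
testIdx  = idx(365:end);                      % 30% -> 156 test samples
Xtrain = X(:, trainIdx);  Ttrain = T(:, trainIdx);
Xtest  = X(:, testIdx);   Ttest  = T(:, testIdx);
```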

3.2 Performance evaluation metrics

In this study, accuracy, precision, recall (sensitivity), and F1 score values are used to determine the success of each classifier. To calculate these values, the confusion matrix of each classifier must first be created. For a 2-class problem, the confusion matrix is shown in Fig. 4.

Fig. 4
figure 4

2-class confusion matrix

In Fig. 4, TP represents the true positives, that is, the number of healthy individuals predicted to be healthy. FP represents the false positives, that is, the number of sick individuals predicted to be healthy. TN represents the true negatives, that is, the number of sick individuals predicted to be sick. FN represents the false negatives, that is, the number of healthy individuals predicted to be sick.

The calculation of accuracy, precision, recall, and F1 score from the confusion matrix shown in Fig. 4 is given in Eqs. (5)–(8).

$$\begin{aligned} Accuracy\,(A)= & {} \frac{TP+TN}{TP+TN+FN+FP} \end{aligned}$$
(5)
$$\begin{aligned} Precision\,(P)= & {} \frac{TP}{TP+FP} \end{aligned}$$
(6)
$$\begin{aligned} Recall\,(R)= & {} \frac{TP}{TP+FN} \end{aligned}$$
(7)
$$\begin{aligned} F1 score= & {} \frac{2*P*R}{P+R} \end{aligned}$$
(8)
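
The sketch below computes Eqs. (5)–(8) directly from confusion-matrix counts; the TP, FP, TN, and FN values are placeholders, not results from this study.

```matlab
% Metrics of Eqs. (5)-(8) from confusion-matrix counts (placeholder values).
TP = 60; FP = 3; TN = 90; FN = 3;
accuracy  = (TP + TN) / (TP + TN + FN + FP);                % Eq. (5)
precision = TP / (TP + FP);                                 % Eq. (6)
recall    = TP / (TP + FN);                                 % Eq. (7)
f1score   = 2 * precision * recall / (precision + recall);  % Eq. (8)
```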

3.3 Proposed deep learning model

In this section, a hybrid deep learning model architecture is created from the GA, the SAE, and the Softmax classifier. The flow diagram of the architecture is shown in Fig. 5.

Fig. 5
figure 5

Hybrid deep learning model

As shown in Fig. 5, SAE architectures are initially generated with various configurations and hyperparameters within defined limits, one for each member of the population, and each architecture is paired with a Softmax classifier. These composite structures are then trained and subsequently tested on previously unseen data. In each iteration of the GA, both the architectural parameters and the hyperparameters within the architectures are optimized to enhance prediction accuracy.

The dataset used in the flow diagram presented in Fig. 5 was first preprocessed. In this preprocessing step, 364 samples were randomly selected from the dataset for training; the remaining 156 samples were allocated as the test set used to evaluate each gene’s architecture. The representations of the data used in the proposed model are shown in Table 3.

Table 3 Data preprocessing

The dataset comprises 16 attributes that evaluate the risk of early stage diabetes, and these attributes form the feature space for classification. The proposed model makes its predictions using these 16 attributes. The representation of the attributes shown in Table 3 simplifies the coding.

In the flow diagram presented in Fig. 5, the first step is to determine the initial parameters. In the proposed hybrid deep learning model, these parameters were determined through experimental studies and are shown in Table 4. The model was developed on the MATLAB platform because of the ease it offers for coding with multilayer data.

Table 4 GA parameters for hybrid deep learning model

The parameters presented in Table 4 are necessary for the GA to function in the proposed hybrid deep learning model. After determining the initial parameters, the next step for the proposed model is to create the initial population. In the proposed hybrid deep learning model, each gene contains parameters that can create SAE and Softmax classifiers. The chromosome sequence of each gene is shown in Table 5.

Table 5 Chromosome Sequence

These parameters, presented in Table 5 and contained in each gene structure, are determined randomly. The EnDeN value of each gene (G) is randomly generated subject to the constraint function presented in Eq. (9).

$$\begin{aligned} G_i\,EnDeN_j(x)=\begin{cases} 1, & x<1\\ x, & 1\le x\le 3\\ 3, & x>3 \end{cases} \end{aligned}$$
(9)

The term \(G_{i}EnDeN_{j}(x)\) in Eq. (9) represents the EnDeN value of gene i at iteration j.

The value of EnDeHLN for each G is randomly determined according to the constraint function presented in Eq. (10).

$$\begin{aligned} G_i\,EnDeHLN_j(x)=\begin{cases} 5, & x<5\\ x, & 5\le x\le 100\\ 100, & x>100 \end{cases} \end{aligned}$$
(10)

The term \(G_{i}EnDeHLN_{j}(x)\) in Eq. (10) represents the EnDeHLN value of gene i at iteration j.

The EnAF value for each G is randomly determined according to the constraint function presented in Eq. (11).

$$\begin{aligned} G_i\,EnAF_j(x)=\begin{cases} 1, & x<1\\ x, & 1\le x\le 2\\ 2, & x>2 \end{cases} \end{aligned}$$
(11)

The term \(G_{i}EnAF_{j}(x)\) in Eq. (11) represents the EnAF value of gene i at iteration j. The activation functions corresponding to the determined \(G_{i}EnAF_{j}(x)\) value are shown in Table 6.

Table 6 EnAf parameters and values

The DeAF value for each G is randomly determined according to the constraint function presented in Eq. (12).

$$\begin{aligned} G_i\,DeAF_j(x)=\begin{cases} 1, & x<1\\ x, & 1\le x\le 3\\ 3, & x>3 \end{cases} \end{aligned}$$
(12)

The term \(G_{i}DeAF_{j}(x)\) in Eq. (12) represents the DeAF value of gene i at iteration j. The activation functions corresponding to the determined \(G_{i}DeAF_{j}(x)\) value are shown in Table 7.

Table 7 DeAf parameters and values

The L2WR value for each G is randomly determined according to the constraint function presented in Eq. (13).

$$\begin{aligned} G_i\,L2WR_j(x)=\begin{cases} 0.001, & x<0.001\\ x, & 0.001\le x\le 0.01\\ 0.01, & x>0.01 \end{cases} \end{aligned}$$
(13)

The term \(G_{i}L2WR_{j}(x)\) in Eq. (13) represents the L2WR value of gene i at iteration j.

The SPR value for each G is randomized according to the constraint function presented in Eq. (14).

$$\begin{aligned} G_i\,SPR_j(x)=\begin{cases} 1, & x<1\\ x, & 1\le x\le 5\\ 5, & x>5 \end{cases} \end{aligned}$$
(14)

The term \(G_{i}SPR_{j}(x)\) in Eq. (14) represents the SPR value of gene i at iteration j.

The SPP value for each G is randomly determined according to the constraint function presented in Eq. (15).

$$\begin{aligned} G_i\,SPP_j(x)=\begin{cases} 0, & x<0\\ x, & 0\le x\le 1\\ 1, & x>1 \end{cases} \end{aligned}$$
(15)

The term \(G_{i}SPP_{j}(x)\) in Eq. (15) represents the SPP value of gene i at iteration j.

The SD value for each G is randomly determined according to the constraint function presented in Eq. (16).

$$\begin{aligned} G_i\,SD_j(x)=\begin{cases} 1, & x<1\\ x, & 1\le x\le 2\\ 2, & x>2 \end{cases} \end{aligned}$$
(16)

The term \(G_{i}SD_{j}(x)\) in Eq. (16) represents the SD value of gene i at iteration j. The corresponding value in the hybrid structure for the determined \(G_{i}SD_{j}(x)\) value is shown in Table 8.

Table 8 SD parameters and values
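
Each of the constraint functions in Eqs. (9)–(16) is a clamping operation, so a single generic helper suffices; the sketch below is illustrative, and the random sampling used to draw the raw values is an assumption.

```matlab
% Generic clamp helper covering the constraint functions of Eqs. (9)-(16).
clamp = @(x, lo, hi) min(max(x, lo), hi);

% Examples of drawing constrained random values (sampling ranges are assumptions):
EnDeN   = clamp(randi(4), 1, 3);              % Eq. (9):  1 <= EnDeN <= 3
EnDeHLN = clamp(randi(120), 5, 100);          % Eq. (10): 5 <= EnDeHLN <= 100
L2WR    = clamp(0.02 * rand, 0.001, 0.01);    % Eq. (13): 0.001 <= L2WR <= 0.01
SPP     = clamp(1.2 * rand, 0, 1);            % Eq. (15): 0 <= SPP <= 1
```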

In the next step of the proposed hybrid deep learning model, each G trains a deep network according to the values it contains, using the training portion reserved from the dataset. After the learning phase, the SAE created by each G is tested with the remaining 30% of the dataset, and the test results reveal the performance of the structure created by that G. The SAE or AE performance of each G is determined according to the fitness functions given in Eqs. (17)–(19).

$$\begin{aligned} Traf(G_i)=\text {Learning-Accuracy}\left( YAENCO_i\right) \end{aligned}$$
(17)
$$\begin{aligned} Valf(G_i)=\text {Test-Accuracy}\left( YAENCO_i\right) \end{aligned}$$
(18)
$$\begin{aligned} Fitf(G_i)=Max\left( Traf(G_i)+Valf(G_i)\right) \end{aligned}$$
(19)

In these equations, \(YAENCO_i\) represents the SAE or AE generated by gene i, \(Traf(G_{i})\) represents the fitness of that SAE or AE during learning, \(Valf(G_{i})\) represents its fitness at the test stage, and \(Fitf(G_i)\) represents the overall fitness value of gene i.
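
A minimal sketch of this fitness evaluation is shown below, assuming net is a stacked AE plus Softmax network of the kind built earlier and that the labels are one-hot encoded; the function name and interface are illustrative assumptions.

```matlab
% Fitness evaluation of Eqs. (17)-(19) for one gene's trained network (sketch).
% net is a stacked AE + Softmax network; the T* matrices hold one-hot class labels.
function [fitf, traf, valf] = geneFitness(net, Xtrain, Ttrain, Xtest, Ttest)
    [~, predTr] = max(net(Xtrain), [], 1);    % predicted class = largest Softmax output
    [~, trueTr] = max(Ttrain, [], 1);
    traf = mean(predTr == trueTr);            % Eq. (17): training accuracy

    [~, predTe] = max(net(Xtest), [], 1);
    [~, trueTe] = max(Ttest, [], 1);
    valf = mean(predTe == trueTe);            % Eq. (18): test accuracy

    fitf = traf + valf;                       % Eq. (19): fitness maximized by the GA
end
```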

The next step of the proposed hybrid deep learning model is the selection process. The cumulative Fitf values of all G’s in the population are calculated and placed on a roulette wheel according to these values. Random roulette values are then drawn, and as many genes as the population size shown in Table 4 are selected. The roulette wheel method was chosen for this step because it increases the survival probability of genes with high fitness values and produces successful results.

The next step in the proposed hybrid deep learning model is the crossover process. In this step, genes were randomly selected in pairs, and crossover was performed between each pair over a single randomly determined point.

The next step in the proposed hybrid deep learning model is the mutation process. In this step, a random value is generated for each chromosome of each gene and compared with the mutation rate shown in Table 4 to determine which chromosomes will be mutated. While applying mutation to the chromosomes in the genes, the constraint functions presented in Eqs. (9)–(16) are also enforced.
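
The sketch below illustrates one possible constraint-aware mutation operator; the gene field names, step sizes, and the way the mutation rate is applied per chromosome are assumptions rather than the exact implementation used in the study.

```matlab
% Constraint-aware mutation of one gene (illustrative; field names and step sizes assumed).
function gene = mutateGene(gene, pMut)
    clamp = @(x, lo, hi) min(max(x, lo), hi);
    if rand < pMut, gene.EnDeN   = clamp(gene.EnDeN + randi([-1 1]), 1, 3);        end  % Eq. (9)
    if rand < pMut, gene.EnDeHLN = clamp(gene.EnDeHLN + randi([-10 10]), 5, 100);  end  % Eq. (10)
    if rand < pMut, gene.EnAF    = randi(2);                                       end  % Eq. (11)
    if rand < pMut, gene.DeAF    = randi(3);                                       end  % Eq. (12)
    if rand < pMut, gene.L2WR    = clamp(gene.L2WR + 0.002 * randn, 0.001, 0.01);  end  % Eq. (13)
    if rand < pMut, gene.SPR     = clamp(gene.SPR + randn, 1, 5);                  end  % Eq. (14)
    if rand < pMut, gene.SPP     = clamp(gene.SPP + 0.1 * randn, 0, 1);            end  % Eq. (15)
    if rand < pMut, gene.SD      = randi(2);                                       end  % Eq. (16)
end
```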

As a result of the processes executed in each iteration within the hybrid deep learning model, each gene possesses a distinct architecture, varying numbers of autoencoders, and diverse hyperparameters. Consequently, during each iteration, models with disparate architectures and hyperparameters are trained using the training dataset and evaluated with the test dataset to ascertain their effectiveness.

The proposed hybrid deep learning model was run for 200 iterations, as shown in Table 4. After the run, the most successful gene in the population was determined according to the fitness functions given in Eqs. (17)–(19). The chromosome values of this most successful gene are shown in Table 9.

Table 9 The most successful gene chromosomes

An AE and Softmax classifier structure built with the parameters presented in Table 9 is shown in Fig. 6.

Fig. 6
figure 6

An AE and Softmax classifier structure

4 The experimental results and discussion

In this study, the KNN, DT, SVM, and CNN classification methods were used for early stage diabetes risk prediction alongside the proposed hybrid deep learning model. The confusion matrix of each model is shown in Fig. 7.

Fig. 7
figure 7

Confusion matrices of the models used

According to the confusion matrices depicted in Fig. 7, the SVM model exhibited the poorest performance in classifying early stage diabetes risk, whereas the proposed hybrid deep learning model emerged as the most accurate predictor and performed a more successful classification than the other classifiers. The ROC curves of each classifier, derived from the confusion matrices in Fig. 7, are presented in Fig. 8.

Fig. 8
figure 8

ROC curves of the models used

The ROC curve plots the false-positive rate on the x-axis and the true-positive rate on the y-axis. The AUC (area under the curve) is the area under the ROC curve and is used to measure the success of a model: the larger the AUC, the better the performance, with 1 being the best possible value. Figure 8 shows that the proposed hybrid deep learning model performs best among the compared methods.
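
A typical way to obtain such a curve in MATLAB is sketched below, assuming the Statistics and Machine Learning Toolbox (perfcurve) is available; the labels and scores are placeholders standing in for a classifier's outputs on the test set.

```matlab
% ROC curve and AUC with perfcurve (placeholder labels and scores).
trueLabels = [ones(1, 50), zeros(1, 50)];              % 1 = positive class (assumed)
scores = [0.3 + 0.7 * rand(1, 50), 0.7 * rand(1, 50)]; % positive-class scores (placeholders)
[fpr, tpr, ~, auc] = perfcurve(trueLabels, scores, 1);
plot(fpr, tpr);
xlabel('False-positive rate'); ylabel('True-positive rate');
title(sprintf('ROC curve (AUC = %.3f)', auc));
```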

Table 10 shows the results of the methods used to classify the early stage diabetes risk prediction dataset according to the evaluation metrics given in Eqs. (5)–(8).

Table 10 Results for diabetes classification

In Table 10, the SVM model yielded the lowest accuracy, followed by the DT, KNN, and CNN models, whereas the proposed hybrid deep learning model achieved the highest accuracy and thus produced more successful results than the other methods. Table 11 compares the results obtained with the proposed hybrid deep learning model against previous studies in the literature on the early stage diabetes risk prediction dataset.

Table 11 Literature comparison with the proposed hybrid deep learning model

As can be seen in Table 11, the proposed hybrid deep learning model has produced more successful results than the previous studies in the literature.

5 Application

In this part of the study, a web-based application for early stage diabetes risk prediction was developed using the findings from the experimental studies. First, a function implementing the architecture found by the proposed hybrid deep learning model was coded on the MATLAB platform. This function takes the attributes presented in Table 2 as parameters, in order, and returns 0 or 1 according to the data it receives. The function was then compiled into a DLL file on the MATLAB platform, and this DLL was used in the web-based software developed on the Visual Studio 2022 Community platform. C# was chosen as the programming language and Visual Studio 2022 Community as the platform because they are free and offer an easy interface. By adding the DLL file to the References section of the application project, an early stage diabetes risk prediction application written in C# was created. The interfaces of the application are shown in Fig. 9.

Fig. 9
figure 9

Web-based application interfaces

In Fig. 9a, the parameters used as input by the proposed hybrid deep learning model are received from the user and sent through the application to the function in the DLL. Depending on the 0 or 1 value returned by the DLL, the user is informed as shown in Fig. 9b and c.
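
A minimal sketch of the kind of MATLAB function that could be compiled into such a DLL is shown below; the function name, the trainedNet.mat file, and the 0/1 encoding of the output are illustrative assumptions, not the authors' deployed code.

```matlab
% Prediction function intended for compilation into a DLL (illustrative sketch).
% 'trainedNet.mat' and the 0/1 output encoding are assumptions, not the paper's files.
function result = predictDiabetesRisk(varargin)
    % varargin: the 16 attribute values of Table 2, already numerically encoded
    persistent net
    if isempty(net)
        data = load('trainedNet.mat');        % network found by the hybrid model
        net  = data.net;
    end
    x = cell2mat(varargin(:));                % 16-by-1 input vector
    scores = net(x);                          % Softmax class probabilities
    [~, cls] = max(scores, [], 1);
    result = cls - 1;                         % 0 = healthy, 1 = diabetes risk (assumed)
end
```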

The early stage diabetes risk prediction application showcased in Fig. 9 is web-based and designed for seamless integration into medical field applications. Consequently, it holds substantial potential for widespread adoption and utilization.

6 Conclusion and discussion

Diabetes is a disease that has become a worldwide epidemic; if no precautions are taken, it reduces quality of life, damages other organs, and can even result in death in its very advanced stages. Measures taken after early diagnosis can prevent these severe consequences. In this article, a hybrid deep learning model based on a GA-SAE-Softmax classifier is proposed for early stage diabetes risk prediction. The early stage diabetes risk prediction dataset from UCI, which is frequently used in the literature, was used in the proposed model. In the proposed deep learning model, the SAE and Softmax classifier are combined, and the hyperparameters of the structure are optimized with GA to maximize the performance of the created architecture. In addition, different methods were applied to the same dataset for comparison. In the experimental studies, the proposed hybrid deep learning model achieved a prediction accuracy of 98.72% and produced better results both than the other methods evaluated and than the studies conducted on the same dataset so far.

With the proposed hybrid deep learning model;

  • A model with higher accuracy than the studies in the literature has been created by using the attributes in the early stage diabetes risk prediction dataset.

  • The number of encoders and decoders to be used in an architecture built with SAE and a Softmax classifier for early stage diabetes risk prediction, the number of hidden layers in the encoder and decoder, the activation functions to be used in the encoder layers, the activation functions to be used in the decoder layers, the weight regularization coefficient used to prevent overfitting and increase the generalization ability of the model, the sparsity regularization coefficient, and the sparsity proportion coefficient were determined by optimizing the architecture as a whole.

  • In addition to optimizing all the parameters required for a new architecture to be created, the proposed model also provides access to these parameter values.

  • The proposed hybrid deep learning model distinguishes itself from other methods by crafting problem-specific structures capable of achieving high success rates through the fusion of prominent features from various methods incorporated within the model.

With the web-based application prepared by using the values of the parameters determined by the hybrid deep learning model as a result of experimental studies;

  • Doctors who are not experts in the field will be able to perform diabetes risk prediction.

  • With the application, early diagnosis can be made and the necessary precautions taken before the disease progresses, preventing diabetes from causing serious consequences.

  • Thanks to early diagnosis, the application can significantly reduce the medication that patients would otherwise need in later stages, together with the cost and time spent on these drugs.

  • It will help prevent the serious health problems that accompany the progression of diabetes, and the death rate caused by the disease will decrease.

This developed web-based application can be applied as an alternative method in health decision support systems.

A limitation of this study is the size of the dataset used. To enhance the precision and effectiveness of predictions and diagnoses, it is advisable to create larger datasets and subsequently refine them for further analysis.

In future studies, in light of the findings obtained here, a smartphone application that does not require an internet connection can be developed.