1 Introduction

As the twenty-first century progresses, globalization, industrialization, and changes in people's lifestyles are reshaping the pattern of diseases [1]. Until recently, contagious diseases were considered the major health problem in third world countries, but the growing contribution of non-contagious diseases to mortality, especially in developing countries, is now a serious threat. Diabetes is one of the most important diseases in this group [2]. Diabetes is a chronic endocrine disorder characterized by a malfunction in glucose metabolism due to problems with the production or utilization of the hormone insulin. Its long-term risks are extremely serious: premature death, blindness, loss of limbs if gangrene is not controlled, and impotence. Patients who require insulin treatment and whose disease began in childhood, adolescence, or early adulthood are particularly at risk for such problems [1].

Self-care behavior, a key concept in health promotion, refers to the decisions and activities a person can use to adapt to a health problem or improve their health. Self-care behaviors prevent early and late complications of the disease and help ensure a long life for the patient. In diabetes, self-care is one of the most important factors for controlling the disease. Empowerment and acceptance status are personality factors that affect patients' condition and increase their ability to deal with problems such as illness. According to existing studies, the most important predictor of mortality in diabetic patients is lack of self-care [3].

Nowadays, medical centers collect large volumes of data on various diseases for a variety of goals. Mining these data to obtain useful results and models for diseases is one of those goals. However, the sheer volume of data, and the confusion resulting from it, prevents us from achieving acceptable results by manual analysis. Data mining is therefore used to overcome this problem and find useful relationships between risk factors in diseases [1].

The intensity of competition in the scientific, social, economic, political, and military fields has increased the importance of fast access to information. There is therefore a clear need for systems that can quickly discover information of interest to users with minimal human intervention, and for analysis methods that scale with the volume of bulk data. At present, data mining is the most important technology for the efficient, accurate, and rapid processing of bulk data, and its importance is increasing. Data mining bridges statistics, computer science (CS), artificial intelligence (AI), pattern recognition (PR), and machine learning. It is a complex process for identifying correct, new, and potentially useful patterns and models in a large amount of data, such that these patterns and models are understandable to humans [4].

Data mining is not a product that can be purchased but a scientific process that must be implemented as a project. Data are often bulky and cannot be used directly; it is the knowledge hidden in the data that is valuable. Therefore, using data mining to identify patterns and models, as well as the relationships between different elements in a database, to discover the knowledge behind the data and ultimately convert data into information, becomes ever more essential. Data mining usually refers to the discovery of useful patterns among data, where a useful pattern is a model that describes the relationship between a subset of the data and is valid, simple, understandable, and new [4].

In the information age, data are one of the most important assets of any organization. However, data become a valuable resource only when used correctly. To transform the potential value of data into usable information and knowledge, many organizations have adopted data mining, through which it is possible to discover the relationships, trends, and patterns hidden in data and to gain new knowledge about explicit and latent organizational challenges [5].

In this paper, we build a data mining system that first preprocesses data collected from laboratory tests of 1573 patients in the endocrinology department of Mazandaran University of Medical Sciences. Second, we use a one-versus-all SVM classifier to predict, from each patient's medical data, seven different diabetic complications: eye problems, high blood pressure, dialysis history, heart attack, stroke, diabetic foot ulcer, and diabetic coma. Third, we improve the accuracy of the SVM by feeding it features selected with an improved grey wolf optimizer (GWO). The improved GWO applies a weighted adaptive middle filter (WAMF) at each step of the algorithm to filter outlier solutions (wolves far from the target) through a dynamic window. The GWO algorithm [6] belongs to the family of swarm intelligence algorithms [7], which are widely used in many practical applications [8–10]. In this paper, we show how the GWO algorithm can be used in the medical domain.

In brief, the structure of the paper is organized as follows: In Sect. 2, related work is presented. The proposed method is fully described in Sect. 3. The simulation results of the proposed algorithm and conclusion are summarized in Sects. 4 and 5, respectively.

2 Related work

Until now, many classification methods have been proposed for diabetes diagnosis problems; they can be broadly grouped into three major categories.

Artificial neural network (ANN)-based classification methods are the most frequently reported in the literature. In 2007, Anbananthen et al. [11] used an ANN and a DT built with the C4.5 algorithm to diagnose diabetes in individuals based on features such as age and blood pressure. In 2008, Chan et al. [12] studied the microvascular complications of diabetes by comparing the C5.0 algorithm and the multilayer perceptron neural network (MLP NN); different factors were identified for each complication, and their effect on each complication was studied. Patil and Durga [13] used the Apriori algorithm to generate association rules for finding hidden relationships between variables. In 2009, Fang [14] used various data mining techniques to cluster patients with diabetes; the important features considered were age, family history, and weight, and the accuracy of the clustering-based model was 80%. In 2014, Ganapathy et al. [15] proposed a pattern classification system combining temporal features with a fuzzy min–max (TFMM) neural-network-based classifier for effective decision support in medical diagnosis. In that work, a particle swarm optimization (PSO)-based rule extractor was proposed to improve detection accuracy, and the accuracy of the TFMM-PSO method was compared with other methods [16–20] using the University of California Irvine (UCI) Machine Learning Repository dataset [21]. Most of the reviewed methods fail to select a proper number of features, which makes the classifiers slow.

Decision-tree-based algorithms form the second batch of methods used for diabetes prediction. Breault et al. [22] performed classification and regression analysis using the classification and regression tree (CART) system in 2002 and deduced dependencies between a series of features; the classification accuracy was 59.9%. Miyaki et al. [23] also used the CART method to judge the factors influencing the incidence of diabetes in 2002. Rohlfing et al. [24] used linear regression analysis to examine the relationship between type 1 diabetes and HbA1c in 2002. Silverstein et al. [25] performed experiments on three medical databases, produced rules, and then compared these rules with predetermined ones.

Trautvetter et al. [26] used association rules and a decision tree (DT) to extract knowledge from a medical database. Juan et al. [27] developed a type 2 diabetes data processing system (DDPS) using a combination of the C4.5 and EM (expectation maximization) algorithms in 2007. Jarullah [28] used a DT, generated with the J48 decision tree classification algorithm (DTCA) in the Weka software, to diagnose type 2 diabetes. Aljumah et al. [29] used regression to analyze the prediction of diabetes treatment in young and old age groups based on drug treatment and side effects. Antonelli et al. [30] proposed a multi-level clustering-based analysis framework for identifying treatment pathways and examining patients for specific diseases; the method worked well in identifying groups of patients with similar disease histories and increasing severity of complications. All decision-tree-based algorithms need prior knowledge about the different classes, which requires many expert-annotated samples to design the tree.

SVM-based algorithms are the third type of method found in our literature review. In 2007, Huang et al. [31] studied the major factors affecting diabetes control by using feature selection in a patient management system. In 2008, Han et al. [32] predicted diabetes in a patient database using the RapidMiner software and the ID3 decision tree algorithm (DTA). In 2007, Cho et al. [33] predicted the presence of neuropathy in diabetic patients using SVM classification, feature selection, and visualization. In [34], the authors attempted to diagnose diabetes using data mining algorithms, which are very important in diagnosis and prediction; SVM, k-nearest neighbor, Bayes network (BN), ID3, C4.5, C5.0, and CART were used for diabetes detection. That study used 768 diabetic patients from the PID dataset with 8 important features, 80% of which were used as training data and 20% as test data; the results show that the SVM model is more accurate than the other algorithms, with an accuracy of 81.77%. Han et al. [35] developed a batch system for the diagnosis of diabetes; they used SVM specifically to screen for diabetes while adding a group learning module to make the black box of SVM decisions more comprehensible and transparent, and their scheme is also a useful method for handling the class imbalance problem. Radha and Srinivasan [36] used three classification methods, C4.5, SVM, and k-nearest neighbor, to predict diabetes, comparing the results of supervised data mining algorithms on the UCI disease dataset based on accuracy, computation time, and bootstrap accuracy. In [37, 38], the authors used hybrid methods for feature selection and SVM for classification. Existing databases contain indistinct and redundant features that strongly affect the success of the classification tool and the system processing time, so the systems developed in those studies attempted to increase speed and success by eliminating redundant features; the feature selection algorithm based on the Bee Colony Optimization Algorithm (BCOA) developed there was the first use of BCOA for feature selection. We also choose SVM to classify diabetic complications. However, SVM alone is not very accurate, so we improve it by selecting relevant features with an improved GWO.

3 The proposed data mining system

In this section, we describe the preprocessing method, the improved GWO, and the feature selection part of the SVM classifier, which together form a complete data mining system.

3.1 Data aggregation

Required data are collected from the endocrinology department of Mazandaran University of Medical Sciences. The file information is from the second half of the year 2015. There are 1573 initial patient records, 53 of which lack complete information. The average age of the patients is 53 years; 30% are male and the rest are female, and 70% have a family history of diabetes. The laboratory features of the patients are evaluated and identified at this stage. For each patient, 23 features are registered: name, family, file number, address, height, weight, age, body mass index, gender, heredity, maximum blood pressure, minimum blood pressure, education, fasting blood sugar, 2-h blood sugar, cholesterol, harmful fat, useful fat, triglyceride, blood urea, creatinine, activity rate, and tobacco use, together with 8 complications: high blood lipids, eye complications, high blood pressure, dialysis history, cardiac problems, stroke, diabetic foot ulcer, and diabetic coma.

3.2 Preprocessing

Preprocessing in data mining usually involves data cleaning (DC), data integration (DI), data reduction (DR), and data transformation (DT). In the real world, data are rarely perfect, and this is especially true of medical information. Therefore, if the quality of the data is not good enough, preprocessing steps should be performed to improve data quality and deliver high-quality data to the data mining algorithm, minimizing the impact of data weaknesses. Data preprocessing and preparation usually consumes more than 70% of the time required for data mining, and 75–90% of the success of data mining projects depends on it. In this study, the KNIME software (https://www.knime.com/) is used for data preprocessing and preparation. Table 1 describes the details of the dataset.

Table 1 Field description of the Mazandaran University of Medical Sciences dataset

In this work, we employ the DC and DT preprocessing steps, which are discussed in the following subsections.

3.2.1 Data cleaning (DC)

DC, sometimes called data cleansing, is the process of detecting and deleting or correcting corrupt or inaccurate records in a database. It focuses on quantifying or removing null attributes, smoothing noisy values, and detecting and deleting out-of-bounds values.

In this paper, information such as name, family, file number, and address is removed from the file. Next, we excluded the records of patients with incomplete test information, such as cases with zero values for blood pressure, fasting blood glucose, blood glucose 2 h after a meal, or triglyceride, because of the impact of these features on the final result. Incomplete test results occur in two cases: either the patient does not cooperate and does not complete the test process, or the patient is not recognized as diabetic at the first visit. Chen and Astebro [39] showed that rational deletion, together with imputation techniques such as mean substitution, random assignment, regression imputation, and Bayesian models, is an efficient way of handling missing values of important features. Samples with several missing features are deleted, and the remaining missing values are initialized using common and probable values. Some features, such as blood urea and creatinine, are not important on their own: a urea-to-creatinine ratio between 10 and 20 mg/dL (milligrams per deciliter) indicates a normal condition, while more than 20 mg/dL suggests gastrointestinal bleeding or urinary tract obstruction, so the ratio of these features indicates the likelihood of a kidney complication. Similarly, height and weight are not important in isolation, but the derived body mass index is effective. As a result, these raw features have been removed and the related indicators used instead.
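To make these cleaning steps concrete, here is a minimal pandas sketch; the column names (`sys_bp`, `fbs`, `bs_2h`, `triglyceride`, `urea`, `creatinine`, `weight_kg`, `height_m`) are hypothetical placeholders for the fields described above, not the dataset's actual schema.

```python
import pandas as pd

def clean_records(df: pd.DataFrame) -> pd.DataFrame:
    # Drop identifying fields that carry no predictive value.
    df = df.drop(columns=["name", "family", "file_number", "address"], errors="ignore")

    # Exclude records with zero values in the tests that drive the final result.
    key_tests = ["sys_bp", "fbs", "bs_2h", "triglyceride"]
    df = df[(df[key_tests] != 0).all(axis=1)]

    # Urea and creatinine matter only through their ratio (kidney-complication indicator);
    # height and weight matter only through the body mass index (Eq. 1).
    df = df.assign(urea_creatinine_ratio=df["urea"] / df["creatinine"],
                   bmi=df["weight_kg"] / df["height_m"] ** 2)

    # The raw components are dropped in favour of the derived indicators.
    return df.drop(columns=["urea", "creatinine", "weight_kg", "height_m"])
```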

Classification (binning) is a data preprocessing technique that minimizes the impact of minor errors that occur when data are recorded, and it is used here to handle noise in the data. Data can be categorized in various ways, and the data of each category can then be represented in a more general form. Based on reputable scientific and medical resources and with the approval of a specialist physician, features such as body mass index, systolic blood pressure, diastolic blood pressure, fasting blood sugar, 2-h postprandial blood sugar, cholesterol, high-density lipoprotein (HDL), low-density lipoprotein (LDL), and triglyceride are classified as follows.

\(\bullet\) Body mass index classification

Body mass index is a statistical measure for comparing a person's weight and height. It does not measure obesity directly but is a useful tool for estimating a healthy weight for a given height. The index was developed between 1830 and 1850. It is very simple to calculate and is widely used to determine overweight and underweight. Body mass index is obtained by dividing a person's weight in kilograms by the square of their height in meters, as shown in Eq. 1. Table 2 shows the body mass index classification.

$$\begin{aligned} \mathrm{{Body}}\;\mathrm{{mass}}\;\mathrm{{index}}= \frac{\mathrm{{weight}}\;\mathrm{{in}}\; \mathrm{{kilograms}}}{\left( \mathrm{{height}}\;\mathrm{{in}}\;\mathrm{{meters}}\right) ^2} \end{aligned}$$
(1)
Table 2 Body mass index classification
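As an illustration of Eq. 1 together with a binning of the kind shown in Table 2, here is a small sketch; the cut-offs follow the common WHO convention and are an assumption, since Table 2 defines the exact classes used in this paper.

```python
def bmi_category(weight_kg: float, height_m: float) -> str:
    bmi = weight_kg / height_m ** 2          # Eq. 1
    if bmi < 18.5:                           # assumed WHO-style cut-offs
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    return "obese"

# Example: 80 kg at 1.75 m -> BMI of about 26.1, classified as "overweight".
print(bmi_category(80, 1.75))
```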

\(\bullet\) High-blood-pressure classification

High blood pressure (hypertension) is a chronic condition in which the blood pressure in the arteries is elevated. Following this increase in pressure, the heart must work harder than normal to maintain circulation through the blood vessels. Blood pressure has two scales, systolic and diastolic, corresponding to the contraction (systole) and relaxation (diastole) of the heart muscle between beats. Nearly 50% of patients with high blood pressure are unaware of their disease, and many learn of it only accidentally. Table 3 shows the classification of systolic and diastolic blood pressure; the units are millimeters of mercury (mmHg).

Table 3 The classification of systolic and diastolic blood pressure

\(\bullet\) Blood sugar classification

High blood glucose (sugar) is one of the risk factors that increase the risk of complications of diabetes. This dataset includes two types of blood sugar test (fasting and 2 h after a meal). Table 4 shows the blood sugar classification; the values in this table are measured in milligrams per deciliter (mg/dL).

Table 4 Blood sugar classification

\(\bullet\) Cholesterol classification

Cholesterol is a fatty, wax-like substance that is made in the liver and other cells. Cholesterol moving through the blood attaches to proteins, forming a package called a lipoprotein. Lipoproteins are divided into high-density and low-density groups. Tables 5, 6, and 7 show the classification of cholesterol, HDL, and LDL.

Table 5 Cholesterol classification
Table 6 HDL classification
Table 7 LDL classification

\(\bullet\) Triglyceride classification

Triglyceride is a type of fat in the body that acts as an energy source. When a lot of energy is needed, the body breaks down these fats and converts them into energy for the cells to use. However, increased levels of triglycerides in the blood can block arteries and damage the pancreas. Table 8 shows the triglyceride classification.

Table 8 Triglyceride classification

In statistics, outliers are data points that lie far from the rest of the data. Different methods, such as regression and clustering, are used to deal with outliers and smooth them. In this work, the box plot in the KNIME software is used to handle the outlier problem.
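A box plot flags values beyond the whiskers as outliers. The following numpy sketch applies the usual interquartile-range whisker rule and clips the offending values; whether the KNIME node clips, removes, or otherwise smooths them is an assumption here.

```python
import numpy as np

def smooth_outliers(x: np.ndarray) -> np.ndarray:
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # box-plot whisker limits
    # Clip out-of-bounds values to the whiskers instead of deleting the records.
    return np.clip(x, lo, hi)
```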

3.2.2 Data transformation (DT)

Data transformation, also called data conversion, converts and consolidates data into a form suitable for data mining. There are several transformation methods, such as minimum–maximum normalization. Normalization puts data into a similar domain: a data miner may encounter features whose values lie in very different ranges, and large-valued features may have a much greater impact on the cost function than small-valued ones. Normalizing the features so that their values share the same domain solves this problem, allows more accurate comparison of different datasets, and reduces the impact of sharp differences between feature values. Before model training begins, each value is divided by the largest corresponding value so that it is normalized to lie between zero and one. This minimizes the effect of the actual scale, places all entries in the same domain, and makes it possible to compare data measured with different criteria.

Equation 2 shows details of the min–max normalization used in our data conversion phase:

$$\begin{aligned} X'=\frac{X-X_{\mathrm{{min}}}}{X_{\mathrm{{max}}}-X_{\mathrm{{min}}}} \end{aligned}$$
(2)
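A vectorized, per-feature version of Eq. 2 (a sketch; the small epsilon guarding against constant-valued features is our addition, not part of the paper):

```python
import numpy as np

def min_max_normalize(X: np.ndarray) -> np.ndarray:
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min + 1e-12)   # Eq. 2, applied column-wise
```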

3.3 Proposed classification method

There are many data mining methods for modeling. In this paper, SVM classification is used to find the optimal model and pattern, with modeling carried out in the KNIME software. The main approach used here is predictive data mining. Ten-fold cross-validation, a common technique for estimating classifier efficiency, is used to split the training and test data and to evaluate the performance of the proposed method.

In short, training is the process of providing feedback to the algorithm to tune its predictive power, and testing is the process of determining the true accuracy of the resulting classifier; during testing, data that never participated in training are classified. Usually, validation is performed after each training step: the validation step provides no feedback for adjusting the classifier but only indicates when the training algorithm should terminate. The error and mean error are then calculated at each stage. To determine the category label (the type of complication), after consulting diabetes specialists it was concluded that, for greater accuracy, each complication should be studied separately rather than splitting the complications into microvascular and macrovascular groups. Accordingly, the category labels (types of complication) in the created model are shown in Table 9.

Table 9 Model category label
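The protocol described above corresponds to ten-fold cross-validation around a one-versus-all SVM. A scikit-learn sketch of that setup follows; the arrays `X` and `y` are random placeholders for the preprocessed patient features and the complication labels of Table 9.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X = np.random.rand(200, 18)            # placeholder: preprocessed patient features
y = np.random.randint(0, 8, 200)       # placeholder: complication labels (Table 9)

clf = OneVsRestClassifier(SVC(kernel="rbf"))   # one-versus-all SVM
scores = cross_val_score(clf, X, y, cv=10)     # ten-fold cross-validation
print(f"mean accuracy {scores.mean():.3f}, error {1 - scores.mean():.3f}")
```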

3.3.1 Grey wolf optimizer (GWO)

GWO is one of the more recent optimization methods, designed and implemented based on the social behavior and hunting strategy of grey wolves. For some problems, this algorithm provides better results than other algorithms, such as the PSO algorithm and the multi-objective decomposition-based evolutionary algorithm [40].

Grey wolves are considered apex predators, since they have no natural hunters. They usually live in groups of 5–20 wolves. The leaders, known as alphas (the best solution), decide on hunting. The second level of the hierarchy is the beta class (second-best solutions); beta wolves help the alphas in decision making and other group activities. Wolves that are not alpha, beta, or omega belong to the delta category (third-best solutions); delta wolves follow the alphas and betas but dominate the omegas. The lowest level in the hierarchy is the omega wolves (the rest of the candidate solutions), which play the role of scapegoat and must submit to the higher classes.

In brief, the common steps of the GWO algorithm are as follows:

  • Generate initial population of wolves based on a set of random solutions,

  • Calculate the corresponding objective value for each wolf,

  • Choose the three best wolves and save them as alpha, beta, and delta,

  • Update the position of the rest of the population (omega wolves) using the equations given in [40],

  • Update parameters a, A, and C,

  • Go to the second step if the criterion is not satisfied,

  • The position and score of the alpha solution are returned as the best solution.

3.3.2 Improved GWO using weighted adaptive middle filter

The most important factor controlling the performance and accuracy of an optimization algorithm is the trade-off between exploration and exploitation. Exploration is the ability of the search algorithm to explore different areas of the search space to locate a promising optimum; exploitation is the ability to focus the search within a promising range to refine the solution. A good optimization algorithm balances these two contradictory goals, and any algorithm, or improved version of it, aims to raise performance by controlling them. Experience shows that exploration should dominate the early iterations, with exploitation becoming more pronounced over time: in the initial iterations the algorithm searches the space broadly, and in the final iterations it searches the discovered regions more precisely.

To increase the efficiency and accuracy of the GWO in reaching optimal values, the results of each step of the GWO are filtered using the WAMF; in other words, the value of the search criteria is adjusted more precisely to increase optimization accuracy. At each step of the algorithm, outlier solutions (wolves far from the target) are filtered through a WAMF with a dynamic window. Algorithm 1 shows the pseudo-code of the improved GWO.

3.3.3 Applying filter at each step of the GWO implementation

As shown in Algorithm 1, a temperature parameter is defined at the beginning of the algorithm, with an initial value of zero and a final value of 1000. The number of wolves, or agents, is set to \(n=25\). The GWO starts by creating a random population of grey wolves (candidate solutions). After assigning random values to the parameters C, a, and A, the fitness of each individual is calculated based on non-dominated sorting, as in the NSGA-II [41], one of the most popular such approaches; the calculated fitness then places each agent into the \(\alpha\), \(\beta\), \(\delta\), or \(\omega\) category based on its value.

Once the set of agents has been specified, the position of each agent is updated at each iteration using Eqs. 3–5 [6].

$$\begin{aligned} \overrightarrow{D_{\alpha }}&= |\overrightarrow{C_{1}}\cdot \overrightarrow{X_{\alpha }}-\overrightarrow{X}|,\overrightarrow{D_{\beta }}=|\overrightarrow{C_{2}}\cdot \overrightarrow{X_{\beta }}-\overrightarrow{X}|,\overrightarrow{D_{\delta }}=|\overrightarrow{C_{3}}\cdot \overrightarrow{X_{\delta }}-\overrightarrow{X}| \end{aligned}$$
(3)
$$\begin{aligned} \overrightarrow{X_{1}}&= \overrightarrow{X_{\alpha }}-\overrightarrow{A_{1}}\cdot \overrightarrow{D_{\alpha }},\overrightarrow{X_{2}}=\overrightarrow{X_{\beta }}-\overrightarrow{A_{2}}\cdot \overrightarrow{D_{\beta }},\overrightarrow{X_{3}}=\overrightarrow{X_{\delta }}-\overrightarrow{A_{3}}\cdot \overrightarrow{D_{\delta }} \end{aligned}$$
(4)
$$\begin{aligned} \overrightarrow{X}\left( t+1\right)&= \frac{\overrightarrow{X_{1}}+\overrightarrow{X_{2}}+\overrightarrow{X_{3}}}{3} \end{aligned}$$
(5)
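For reference, here is a compact numpy sketch of the standard GWO loop with the position update of Eqs. 3–5 written out; the sphere function stands in for the real objective, which in this paper is the SVM error rate.

```python
import numpy as np

def gwo(fitness, dim=10, n_wolves=25, max_iter=500, lb=-10.0, ub=10.0):
    X = np.random.uniform(lb, ub, (n_wolves, dim))             # random initial pack
    for t in range(max_iter):
        idx = np.argsort([fitness(x) for x in X])
        alpha, beta, delta = (X[idx[k]].copy() for k in range(3))  # three best wolves
        a = 2 - 2 * t / max_iter                               # "a" decays linearly 2 -> 0
        for i in range(n_wolves):
            x_new = np.zeros(dim)
            for leader in (alpha, beta, delta):
                r1, r2 = np.random.rand(dim), np.random.rand(dim)
                A, C = 2 * a * r1 - a, 2 * r2                  # coefficient vectors
                D = np.abs(C * leader - X[i])                  # Eq. 3
                x_new += leader - A * D                        # Eq. 4
            X[i] = np.clip(x_new / 3, lb, ub)                  # Eq. 5: mean of X1, X2, X3
    best = min(X, key=fitness)
    return best, fitness(best)

# Example: minimize the sphere function as a stand-in objective.
best_pos, best_val = gwo(lambda x: float(np.sum(x ** 2)))
```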

As seen in the pseudo-code (Algorithm 1), the algorithm enters the filtering stage before updating the parameters C, a, and A. In this step, we first define the parameter temp (the current temperature divided by the final value), the variable Rand (a random number between zero and one), and the variable K (the size of the filter window). In the filtering phase, a probability of being filtered is assigned to each wolf depending on its category: wolves farther from the target are more likely to be chosen. The selection probabilities are as follows:

  • \(P=0.1\) when agent(i) is \(X_{\alpha }\)

  • \(P=0.2\) when agent(i) is \(X_{\beta }\)

  • \(P=0.3\) when agent(i) is \(X_{\delta }\)

  • \(P=0.4\) when agent(i) is \(X_{\omega }\)

If \(P\cdot Rand\le temp\), the selected wolf is eligible for filtering and enters the final step of applying the filter; otherwise, the next wolf is considered. In the final step, a window containing the K nearest neighbors of the selected wolf is formed, with an initial window size of 3. Each neighbor is assigned a weight depending on its category; the weights of each category of wolves, in order of priority, are as follows:

  • \(weight(j) = 4\) when window(j) is \(X_{\alpha }\)

  • \(weight(j) = 3\) when window(j) is \(X_{\beta }\)

  • \(weight(j) = 2\) when window(j) is \(X_{\delta }\)

  • \(weight(j) = 1\) when window(j) is \(X_{\omega }\)

The weights decrease with the rank of the category: alpha wolves receive the highest weight because they are the best expected solutions. The wolves in the window are then weighted and sorted, and Med, the median of the positions of the K nearest neighbors of the selected wolf, is computed.

In the final stage, the fitness of the median agent is calculated. If this value is less than the fitness of the selected wolf, the new position of the selected wolf is computed using Eq. 6, as the average of its old position and Med.

$$\begin{aligned} \mathrm{{New}}\;\mathrm{{position}}=\frac{\left( \mathrm{{Med}}+\mathrm{{Old}}\; \mathrm{{position}}\;\mathrm{{of}}\;\mathrm{{the}}\;\mathrm{{current}} \;\mathrm{{search}}\;\mathrm{{agent}}\right) }{2} \end{aligned}$$
(6)

Then the parameters C, a, and A are updated, the agents' fitness is recalculated and assigned to the \(\alpha\), \(\beta\), and \(\delta\) categories, and the algorithm starts the next iteration. Otherwise, the filter window size is increased by one unit, and the weighting of the neighboring wolves, median selection, and fitness calculation are repeated; if the fitness of the median agent is still not better than that of the selected wolf, \(K=K+1\) again. This operation continues for each selected wolf until \(K=10\).
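Below is a sketch of this filtering step for one selected wolf, under the category probabilities and window weights listed above; the names (`positions`, `ranks`, `fitness`) and the Euclidean-distance choice of nearest neighbors are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

P = {"alpha": 0.1, "beta": 0.2, "delta": 0.3, "omega": 0.4}   # filter-selection probabilities
W = {"alpha": 4, "beta": 3, "delta": 2, "omega": 1}           # window weights by category

def wamf_step(i, positions, ranks, fitness, temp, k=3, k_max=10):
    # Wolves farther from the target (higher P) are more likely to be filtered;
    # the temperature controls the overall filtering pressure.
    if P[ranks[i]] * np.random.rand() > temp:
        return positions[i]                                    # wolf not selected this round
    while k <= k_max:
        d = np.linalg.norm(positions - positions[i], axis=1)
        window = np.argsort(d)[1:k + 1]                        # k nearest neighbours (self excluded)
        # Weighted median: repeat each neighbour according to its category weight.
        repeats = [W[ranks[j]] for j in window]
        med = np.median(np.repeat(positions[window], repeats, axis=0), axis=0)
        if fitness(med) < fitness(positions[i]):
            return (med + positions[i]) / 2                    # Eq. 6
        k += 1                                                 # otherwise enlarge the window
    return positions[i]
```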

3.3.4 Preventing the improved GWO from getting stuck in local optima

The filtering operation is controlled by the temperature parameter. Initially, the temperature is 0, which corresponds to very low filtering pressure. As the algorithm runs, the temperature increases, and with it the filtering pressure. In this way, different amounts of filter pressure are applied while the algorithm is running: the algorithm starts filtering at very low pressure (almost zero) and increases the pressure over time. To prevent the algorithm from getting stuck in a local optimum at the beginning of the run, we increase exploration by preserving diversity and allowing wolves to change category. The improved GWO is used to adjust the SVM parameters C and \(\alpha\): the search range for the penalty parameter C is 0.01–3500, and for the parameter \(\alpha\) it is 0.01–32 [42]. The objective function of the improved GWO is defined as follows:

$$\begin{aligned} \mathrm{{Objective}}\;\mathrm{{function}}=\mathrm{{Minimize}} \left( \mathrm{{Error}}\;\mathrm{{rate}}\right) \end{aligned}$$
(7)
Algorithm 1 Pseudo-code of the improved GWO
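To show how Eq. 7 can be wired to the SVM, here is a sketch of a fitness function in which each wolf encodes a candidate \((C, \alpha)\) pair; treating \(\alpha\) as the RBF kernel parameter `gamma` is an assumption on our part.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def svm_error_rate(wolf, X, y):
    # wolf = [C, alpha], searched over [0.01, 3500] x [0.01, 32] as in the text.
    C = float(np.clip(wolf[0], 0.01, 3500))
    alpha = float(np.clip(wolf[1], 0.01, 32))
    accuracy = cross_val_score(SVC(C=C, gamma=alpha), X, y, cv=10).mean()
    return 1.0 - accuracy                    # Eq. 7: minimize the error rate
```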

3.3.5 Features selection

The purpose of feature selection techniques is to remove irrelevant and ineffective features from the data, since unrelated features provide no useful information to the classifier. Feature selection techniques are a subset of feature extraction methods: in feature extraction, new features are created as functions of all the original features, while in feature selection a subset of the original features is chosen. Feature selection reduces training and computation time and increases the generalization capability of the classifier. A feature selection algorithm uses a search method to select a subset of features and an evaluation criterion to rank that subset. In the simplest algorithm, all possible feature subsets are examined and the subset with the lowest classification error rate is selected, but such a full search of the feature space has a high computational cost; the GWO is therefore used for feature selection to address this problem. Each feature matters differently in the diagnosis and prediction of diabetes complications; in other words, not all features are of equal value. For example, in the diagnosis of diabetes, body mass index and family history have different importance. Determining the value of each feature and the role it plays in diagnosing the disease is an important issue. In this paper, the value and role of each feature in identifying the various complications is carefully determined by weighting each feature. The feature selection process in the proposed method involves the following steps:

\(\bullet\) A. Producing function

This function generates the candidate sets in the initial population of the GWO for selecting and weighting the features.

\(\bullet\) B. Fitness function

This function evaluates the set of candidate solutions for feature selection and weighting at each stage of the GWO and returns the prediction accuracy as the fitness of each agent.

\(\bullet\) C. Update agent position

Based on the GWO, the position of the agents is updated at each stage.

\(\bullet\) D. Using adaptive middle filter

In order to improve the efficiency of GWO in achieving optimal accuracy in prediction, the results of each step are filtered by WAMF. In other words, the value of the exploration criteria is adjusted more precisely to increase optimization accuracy.

\(\bullet\) E. Termination condition

  1. Reaching an optimal accuracy in predicting diabetes complications.

  2. Completing the predetermined number of iterations of the GWO.

In the proposed method, to determine the value and role of each feature in the diagnosis and prediction of diabetes complications, a random number between 0 and 1 is assigned to each feature, indicating its degree of importance, and is then optimized by the GWO. The weighted feature values are given to the SVM as input, and features are selected based on the final weight of each feature. In the validation section, the error percentage of the proposed method is first compared with that of GWO-SVM, GA-SVM, and PSO-SVM over 500 iterations. Then, the results of the proposed method are compared with those of machine learning algorithms such as DT, SB, and the multilayer perceptron neural network (MLP NN).
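A sketch of the feature-weighting fitness used in this loop: a wolf is a weight vector with one entry in [0, 1] per feature, the features are scaled by these weights before the SVM is trained, and the prediction accuracy is returned as fitness. The final selection threshold below is illustrative, not a value given in the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def feature_weight_fitness(weights, X, y):
    # Scale each feature by its candidate importance before classification.
    Xw = X * np.clip(weights, 0.0, 1.0)
    return cross_val_score(SVC(), Xw, y, cv=10).mean()   # prediction accuracy as fitness

def select_features(best_weights, threshold=0.5):
    # Keep features whose optimized weight exceeds the (illustrative) threshold.
    return np.where(np.asarray(best_weights) >= threshold)[0]
```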

4 Experimental results and discussion

Data of 1573 patients are collected from the endocrinology department of Mazandaran University of Medical Sciences. After preprocessing, the data are described, simulated, and analyzed in MATLAB 2016 on an Intel Core i7 processor with a 2.60 GHz CPU and 16 GB of RAM, running Microsoft Windows 8.1. In the proposed method, to determine the value and role of each feature in predicting complications, a weight in the interval [0, 1] indicating its degree of importance is assigned to each feature by the proposed optimization method after 500 iterations. Table 10 shows an example of weighted features for all 8 diabetic complications.

Table 10 An example of weighted features for all 8 diabetic complications

The weight of each feature given as input to the SVM is optimized over 500 iterations. For all listed complications, the data samples were divided into ten subsets: nine were used for training and the remaining one for testing, and this procedure was repeated ten times until each of the ten subsets had been evaluated. Since GWO is a metaheuristic method, it must be run multiple times to obtain the best result; hence, we repeated the whole process of choosing test and training data ten times (each with randomized data sequences). After optimization for each complication, the error percentage obtained at each stage of the proposed algorithm is compared with that of GWO-SVM, GA-SVM, and PSO-SVM. Later, we compare the accuracy of the proposed method with other machine learning algorithms using two different datasets and present the results in the following subsections.

4.1 Prediction of health complications

4.1.1 Increased blood lipids complication

Based on the proposed objective function, the proposed method, GWO, PSO, and GA have been used to improve the performance of the SVM algorithm in predicting the complication of increased blood lipids (hyperlipidemia). In Fig. 1, the vertical axis shows the error percentage in predicting increased blood lipids, and the horizontal axis represents the number of iterations. In the first iterations, since the initial population is random, the error reduction is noticeable; in subsequent iterations the rate of error reduction decreases, and eventually the proposed method achieves the greatest error reduction by the end of the simulation.

Fig. 1
figure 1

Error percentage in predicting increased blood lipids

4.1.2 Eye problem complication

Figure 2 shows the error percentage of the proposed method, GWO, PSO, and GA in improving the performance of the SVM for predicting eye problems. In this figure, the vertical axis represents the error percentage of predicting eye problems and the horizontal axis represents the number of iterations. As can be seen, in the first iterations the proposed method has a higher error percentage than the other methods; after the filtering phase starts, its error percentage drops below that of the PSO and GA, and at iteration 140 it improves over the GWO, a trend that continues until iteration 500. As a result, the proposed method reaches the highest accuracy in predicting eye problems.

Fig. 2
figure 2

Error percentage in predicting eye problem

4.1.3 High blood pressure complication

As can be seen in Fig. 3, the error percentage of predicting high blood pressure in the proposed method is compared to that of the GWO, PSO, and GA. In this figure, the vertical axis represents the error percentage of predicting the high blood pressure complication and the horizontal axis represents the number of iterations. In the early iterations, the error percentage of the PSO and GA is higher than that of the proposed method; the error percentage of the proposed algorithm is then approximately equal to that of the GWO, and this trend continues until the end of the simulation. At the end of the simulation, the proposed method has a lower error than the others, indicating an improved error percentage.

Fig. 3
figure 3

Error percentage of predicting high blood pressure complication

4.1.4 Dialysis history complication

The error percentage for predicting the dialysis complication with the proposed algorithm is compared to that of the GWO, GA, and PSO in Fig. 4. The vertical axis represents the error percentage in predicting dialysis history, and the horizontal axis indicates the number of iterations. In the early iterations, the error percentage of the GWO is higher than that of the proposed method, while that of the GA is lower; the error percentage of the proposed method then becomes approximately equal to that of the GWO and lower than that of the GA. Until iteration 100, the PSO has a lower error percentage than the proposed method; as the iterations increase, its error becomes higher than that of the proposed algorithm but remains lower than the GA's, and this trend persists until the end of the simulation. At the end of the simulation, the proposed method has less error than the others, indicating an improved error percentage.

Fig. 4
figure 4

Error percentage in predicting dialysis history complication

4.1.5 Heart attack complications

Figure 5 shows the error percentage in predicting heart problems using the proposed algorithm compared with the GWO, GA, and PSO. The vertical axis represents the error percentage in predicting heart problems, and the horizontal axis indicates the number of iterations. As can be seen, at the beginning of the simulation the proposed method has the highest error percentage of all the algorithms, but as the iterations increase and the filtering phase executes, its error percentage falls. At the end of the simulation, the proposed method has a lower error than the others, indicating an improved error percentage.

Fig. 5
figure 5

Error percentage in predicting heart attack problems

4.1.6 Stroke complications

As can be seen in Fig. 6, the error percentage in predicting stroke complications in the proposed method is compared to that of the GWO, PSO, and GA. In this figure, the vertical axis represents the error percentage of predicting stroke complications and the horizontal axis represents the number of iterations. In the first iterations, the error percentage of the proposed algorithm is higher than that of the PSO and GA; it then becomes approximately equal to that of the GWO, and this trend continues until the 70th iteration, after which the error percentage of the proposed algorithm decreases. At the end of the simulation, the proposed method has less error than the other methods, indicating an improved error percentage.

Fig. 6
figure 6

Error percentage for predicting stroke complication

4.1.7 Diabetes foot ulcer complication

As can be seen in Fig. 7, the error percentage of predicting the diabetic foot ulcer complication in the proposed method is compared to that of the GWO, PSO, and GA. In this figure, the vertical axis represents the error percentage of predicting diabetic foot ulcer and the horizontal axis represents the number of iterations. In the first iterations, the error percentage of the PSO and GA is higher than that of the proposed algorithm, while until iteration 270 the GWO has a lower error percentage than the proposed algorithm. From then until the last iteration, the proposed algorithm has the lowest error percentage.

Fig. 7
figure 7

Error percentage for predicting diabetic foot ulcer complication

4.1.8 Diabetes coma complication

Figure 8 shows the error percentage of predicting the diabetic coma complication in the proposed method, GWO, GA, and PSO. At the beginning of the simulation, the proposed method has a lower error percentage than the GWO and a higher one than the GA and PSO. Once filtering takes effect, at iteration 280 the error percentage of the proposed algorithm becomes lower than that of the others, and this trend continues until the last iteration of the algorithm.

Fig. 8
figure 8

Error percentage for predicting diabetic coma complication

4.2 Evaluation and comparison of proposed method based on machine learning algorithms

In this section, the proposed model is compared with three machine learning algorithms: DT, SB, and MLP NN. The relationship between the actual classes and the predicted classes can be calculated using the confusion matrix, whose required parameters are described below. According to Eqs. 8–13, the criteria accuracy, sensitivity, specificity, precision, recall, and F-measure are used to compare the proposed model with the others.

$$\begin{aligned} \mathrm{{Accuracy}}&= \frac{\left( \mathrm{{TP}}+\mathrm{{TN}}\right) }{\mathrm{{All}}} \end{aligned}$$
(8)
$$\begin{aligned} \mathrm{{Sensitivity}}&= \frac{\mathrm{{TP}}}{\left( \mathrm{{TP}}+\mathrm{{FN}}\right) } \end{aligned}$$
(9)
$$\begin{aligned} \mathrm{{Specificity}}&= \frac{\mathrm{{TN}}}{\left( \mathrm{{FP}}+\mathrm{{TN}}\right) } \end{aligned}$$
(10)
$$\begin{aligned} \mathrm{{Precision}}&= \frac{\mathrm{{TP}}}{\left( \mathrm{{TP}}+\mathrm{{FP}}\right) } \end{aligned}$$
(11)
$$\begin{aligned} \mathrm{{Recall}}&= \frac{\mathrm{{TP}}}{\left( \mathrm{{TP}}+\mathrm{{FN}}\right) } \end{aligned}$$
(12)
$$\begin{aligned} \mathrm{{F-Measure}}&= \frac{2\cdot \mathrm{{Precision}}\cdot \mathrm{{Recall}}}{\mathrm{{Precision}}+\mathrm{{Recall}}} \end{aligned}$$
(13)
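For completeness, the six criteria computed directly from the binary confusion-matrix counts (a direct transcription of Eqs. 8–13, with the F-measure in its corrected harmonic-mean form):

```python
def metrics(tp, tn, fp, fn):
    accuracy    = (tp + tn) / (tp + tn + fp + fn)        # Eq. 8
    sensitivity = tp / (tp + fn)                          # Eq. 9 (identical to recall, Eq. 12)
    specificity = tn / (fp + tn)                          # Eq. 10
    precision   = tp / (tp + fp)                          # Eq. 11
    f_measure   = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. 13
    return accuracy, sensitivity, specificity, precision, f_measure
```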

Table 11 shows the results of predicting diabetes complications with the DT, SB, and MLP NN techniques in terms of accuracy. The complications of diabetes include increased blood lipids, eye problems, high blood pressure, dialysis history, heart problems, stroke, diabetic foot ulcer, and diabetic coma. For all the mentioned complications, the accuracy of the proposed method is higher than that of the DT, SB, and MLP NN (Fig. 9).

Table 11 Predicting diabetes complications by accuracy criterion
Fig. 9
figure 9

Comparison of predicting diabetes complications with accuracy criterion

4.2.1 Increased blood lipids complication

Table 12 and Fig. 10 compare the proposed method, MLP NN, SB, and DT in predicting the increased blood lipids complication. Evaluation based on the sensitivity, specificity, precision, recall, and F-measure criteria illustrates the superiority of classification with weighted features over classification with equally weighted features.

Table 12 Comparison of predicting increased blood lipids complication
Fig. 10
figure 10

Comparison of predicting increased blood lipids complication

4.2.2 Eye problem complication

Table 13 and Fig. 11 compare the proposed method, MLP NN, SB, and DT in predicting eye problem complications. Comparison of the results based on the F-measure, sensitivity, specificity, precision, and recall criteria indicates the superiority of the proposed algorithm over the compared ones.

Table 13 Comparison of predicting eye problem complication
Fig. 11
figure 11

Comparison of predicting eye problem complication

4.2.3 High blood pressure complication

A comparison of predicting this complication with the proposed algorithm, DT, SB, and MLP NN based on the sensitivity, specificity, precision, recall, and F-measure criteria is shown in Table 14 and Fig. 12. The results show the improvement of classification with weighted features over classification with equally weighted features.

Table 14 Comparison of predicting high blood pressure complication
Fig. 12
figure 12

Comparison of predicting high-blood-pressure complication

4.2.4 Dialysis history complication

The performance of weighted-features-based classification compared to modeling with equally weighted features in predicting the dialysis history complication is evaluated in Table 15 and Fig. 13. The comparison confirms the superiority of the proposed method over DT, SB, and MLP NN in terms of the sensitivity, specificity, precision, recall, and F-measure criteria.

Table 15 Comparison of predicting dialysis history complication
Fig. 13
figure 13

Comparison of predicting dialysis history complication

4.2.5 Heart attack complications

The performance of weighted-features-based classification compared to modeling with equally weighted features in predicting heart attack complications, based on the sensitivity, specificity, precision, recall, and F-measure criteria, is evaluated in Table 16 and Fig. 14. The proposed method yields better results in the diagnosis of heart attack complications than the DT, SB, and MLP NN methods.

Table 16 Comparison of predicting heart attack complication
Fig. 14
figure 14

Comparison of predicting heart attack complications

4.2.6 Stroke complications

The performance of weighted-features-based classification compared to modeling with equally weighted features in predicting stroke complications is evaluated in Table 17 and Fig. 15. The proposed method yields better results in the diagnosis of stroke complications than the DT, SB, and MLP NN methods based on the sensitivity, specificity, precision, recall, and F-measure criteria.

Table 17 Comparison of predicting stroke complication
Fig. 15
figure 15

Comparison of predicting stroke complication

4.2.7 Diabetes foot ulcer complication

The sensitivity, specificity, precision, recall, and F-measure criteria for evaluating the prediction of diabetic foot ulcer complications indicate the superiority of the proposed classification method over MLP NN, SB, and DT, as shown in Table 18 and Fig. 16.

Table 18 Comparison of predicting diabetes foot ulcer complication
Fig. 16
figure 16

Comparison of predicting diabetic foot ulcer complication

4.2.8 Diabetes coma complication

As shown in Table 19 and Fig. 17, the sensitivity, specificity, precision, recall, and F-measure criteria have been evaluated to compare the prediction of the diabetes coma complication. The results indicate the superiority of the proposed classification method over the MLP NN, SB, and DT methods.

Table 19 Comparison of predicting diabetes coma complication
Fig. 17
figure 17

Comparison of predicting diabetes coma complication

Predicting and correctly diagnosing diabetes complications using AI and machine learning increases the chances of successful treatment. In this study, a middle filter is used to improve the GWO algorithm, introducing a new model for predicting and diagnosing diabetes complications. The simulation results show that the proposed model is more accurate than SB, DT, and MLP NN; this high accuracy in diagnosing the complications of diabetes indicates the superiority of the proposed method. Complexity and time-consuming execution are its weaknesses.

4.3 Experimental evaluation on UCI dataset

To compare the proposed method with related methods in this domain, we use the UCI Machine Learning Repository dataset [21]. The diabetes files in this dataset consist of four fields per record: date, time, code, and value. As shown in Table 20, the code field is an integer describing the status of a patient. The UCI dataset contains 70 text files, each holding one patient's disease history. The patients are insulin deficient; this disease is manifested by many so-called metabolic effects, the main one being high blood glucose, which can be detected by measurements.

Table 20 Description of the code field in UCI dataset [21]

In this experiment, we compare the accuracy of previous works with that of our proposed method on the UCI Machine Learning Repository dataset. Previous works [15, 16, 21] used fuzzy methods to classify diabetic patients; [15] also combined a fuzzy method with PSO and reported results on the UCI dataset. In all listed methods, the data samples were divided into ten subsets, where nine were used for training and the remaining one for testing; this procedure was repeated ten times until each of the ten subsets had been evaluated. Since PSO and GWO are metaheuristic methods, they need to be run multiple times to obtain the best result. Hence, we repeated the whole process of choosing test and training data ten times (each with randomized data sequences) and list the averaged results in Table 21.

Table 21 Comparison of classification accuracy of different methods on UCI dataset [21]

The results clearly show a boost in the accuracy of the classification system, since we use SVM combined with GWO, which yields better optimization results than PSO.

5 Conclusion

Diabetes is one of the most common chronic diseases and a major public health problem worldwide. It is a rapidly growing and serious chronic disease, and its prevalence has been increasing in Asian countries. The increasing prevalence of diabetes mellitus (DM), the emergence of its complications as a cause of death and early disability, and the burden on healthcare systems have made it a health priority. Diabetes profoundly affects quality of life physically, socially, and mentally; studies have shown that it can have negative effects on public health and on the sense of well-being, in other words on quality of life. Diabetes cannot be cured, but it can be controlled, and controlling diabetes means preventing and delaying its complications. Poor control leads to chronically elevated blood sugar levels, which are strongly linked to late complications such as retinopathy, nephropathy, and cardiovascular disease; these complications are associated with high healthcare costs and reduced quality of life. In this paper, the grey wolf optimizer (GWO) is used to address the diabetes diagnosis problem. The key point of the proposed algorithm is its accuracy: the improved GWO used together with the SVM raises the accuracy to an acceptable level compared with other classification algorithms. The proposed method is superior to DT, SB, and MLP NN in predicting increased blood lipids with an accuracy of 0.96, eye problems with 0.94, high blood pressure with 0.92, dialysis history with 0.97, heart problems with 0.95, stroke with 0.96, diabetic foot ulcer with 0.96, and diabetic coma with 0.97. The high accuracy in diagnosing diabetes complications indicates the superiority of the proposed method in improving the results of the GWO. We also compared the proposed method with other classifiers on the UCI dataset, where it likewise shows an advantage over fuzzy-based classifiers. Complexity and time-consuming execution are the weaknesses of this method; in future research, we aim to reduce the time complexity by making changes to the GWO method.