1 Introduction

Covid-19 has caused a significant alteration in all facets of life in countries throughout the world since its initial appearance in Wuhan, China, in December 2019. In fact, as of 27 August 2021, the World Health Organization (WHO) had received reports of 214,468,601 confirmed cases of Covid-19, including 4,470,969 deaths, from national authorities [1]. Fever, lethargy, dry cough, loss of appetite, body aches and mucus are the most prevalent symptoms. A person's symptoms can take 5–6 days to manifest after contact [2]. The majority of Covid-19 cases are mild, but some people (14%) develop more severe forms of the disease that necessitate oxygen therapy in the hospital, and about 5% require intensive care unit hospitalization [3].

Both computed tomography (CT) scans and real-time reverse transcription polymerase chain reaction (RT-PCR) are utilized as diagnostic procedures for Covid-19 disease. Although RT-PCR is the most widely applied approach to diagnose Covid-19 cases and is considered the gold standard, it is unable to distinguish between live and dead viruses [2, 4]. Another drawback of RT-PCR is that it can give a false negative result when the amount of viral ribonucleic acid (RNA) does not reach the detection limit of the test. Although standard CT scans are available in most hospitals and can aid in early detection of suspected cases, the images of many viral pneumonias are similar and overlap with those of other infectious and inflammatory lung disorders. As a result, radiologists have difficulty distinguishing between Covid-19 and other viral pneumonias. Because RT-PCR and CT scans can mislead the detection model and prevent it from giving accurate results, blood tests have been used to overcome these problems and provide accurate results [2, 4].

Data mining (DM), a sophisticated artificial intelligence technique, is used to extract new and useful knowledge from large datasets. DM identifies correlations and patterns in datasets and has been used to predict and diagnose a variety of diseases including Covid-19 [2, 4]. The large Covid-19 dataset produced around the world is a precious resource that must be analyzed to extract important and innovative patterns and make better decisions to contain the outbreak of the Covid-19 epidemic. Nowadays, DM is applied extensively in the healthcare sector for many different purposes, including modeling of health outcomes, hospital rankings, recovery, evaluation of treatment efficacy, prediction of patient outcomes, infection control and stability [5].

Covid-19 is a very dangerous disease due to its rapid spread, so it requires rapid and accurate detection. In fact, the diagnostic process depends on the features selected from the Covid-19 dataset. Thus, feature selection is a very important process that allows the diagnostic model to deal only with effective features and ignore irrelevant ones, reducing time consumption and increasing diagnostic performance [2, 4]. Combining filter and wrapper feature selection methods into hybrid methods is an important way to exploit their respective advantages and select an effective subset of features. In this paper, the main contributions are summarized as follows:

  1. CDS is provided as a new diagnostic strategy to quickly give an accurate diagnosis. The CDS combines two phases, which are FSP and DP.

  2. In FSP, features are extracted from the Covid-19 dataset containing blood test findings. Then, the best set of features is selected using EGWO, which includes both wrapper and filter techniques.

  3. The EGWO combines two stages called FS and WS. While FS uses many filter methods as a fast way to select subsets of features from the input data, WS uses the subsets of features from FS as input to the BGWO method to choose the meaningful features that enable the diagnostic methodology in DP to give more accurate results.

  4. In DP, the selected features from FSP are passed to HDM to accurately diagnose patients. In fact, HDM uses NB as a weighting method to weight the patients in WP2 and then uses KNN as a diagnostic method to accurately diagnose patients in DP2.

  5. Related to HDM, KNN is used to diagnose a new patient based on the degree of affiliation of each patient in the testing dataset, which is used as a training dataset.

The remainder of this paper is structured as follows: The related work on Covid-19 classification strategies is introduced in Sect. 2. The new Covid-19 diagnostic strategy is discussed in detail in Sect. 3, and the experiments and results are analyzed in Sect. 4. In Sect. 5, the conclusions of this paper and future works are presented.

2 Related work

Some recent research on the diagnosis of Covid-19 disease is presented in this section. In [4], distance biased naïve Bayes (DBNB) was proposed to identify Covid-19 patients based on laboratory findings. DBNB consists of two stages for diagnosing Covid-19 cases. In the first stage, advanced particle swarm optimization (APSO), which includes wrapper and filter approaches, was used to select the most effective features from the input data. In the second stage, Covid-19 cases were classified based on the selected features using DBNB, which was suggested to overcome the disadvantages of classical NB. Although DBNB achieved high accuracy, it was not applied to nominal data.

As presented in [6], the automatic COVID screening (ACoS) model was implemented using conventional machine learning techniques and radiomic texture descriptors to classify normal, suspected and COVID-19 cases. The radiomic texture descriptors were obtained from chest X-ray images. ACoS used majority voting based on the ensemble classification principle with five supervised learning methodologies. Although the results in [6] showed that ACoS provided high performance for diagnosing COVID-19 patients, it did not perform well when applied to tuberculosis and influenza.

As illustrated in [7], a novel fusion model handcrafted with deep learning features (FM-HCF-ACOSF) technique was used to diagnose COVID-19 cases using chest X-ray images. The FM-HCF-ACOSF model was implemented in three main stages: Gaussian filtering-based preprocessing, feature extraction using a fusion model and classification. First, a preprocessing stage was carried out using Gaussian filtering to eliminate the noise present in the input image. Second, the fusion model was applied to determine the best features after the preprocessing stage. Finally, a multilayer perceptron (MLP) was applied to detect COVID-19 patients.

As illustrated in [2], feature correlated naïve Bayes (FCNB) was proposed for Covid-19 diagnosis based on laboratory tests. FCNB was implemented through four basic phases: (i) the feature selection phase (FSP), implemented to select only the suitable features from the dataset; (ii) the feature clustering phase (FCP), implemented to group the selected features into many clusters called master features (MF); (iii) the master feature weighting phase (MFWP), implemented to weight each master feature depending on its degree of importance; and (iv) the feature correlated naïve Bayes phase (FCNBP), used to classify patients depending on a weighted NB.

In [8], an automatic COVID-19 diagnosis method based on CT images, called handcrafted feature generation technique and hybrid feature selector (HFGT-HFS), was proposed. HFGT-HFS is implemented in three main steps. First, preprocessing was used to convert images into 2D matrices. Then, statistical and textural features were selected using feature generation. Finally, a deep neural network (DNN) and an artificial neural network (ANN) were implemented for classification. Related to [8], the experimental results showed that the DNN model achieved 95.84% classification accuracy while the ANN model achieved 94.10% classification accuracy.

As depicted in [9], X-ray images were passed to a new classification method based on a convolutional neural network (CNN) for Covid-19 detection. The CNN was enhanced using the EfficientNet architecture and applied to both binary and multi-class classification. The performance of the CNN was measured using tenfold cross-validation. The experimental results in [9] showed that the accuracy of the CNN is 99.62% for binary classification and 96.70% for multi-class classification.

As presented in [10], a COVID-19 diagnostic model (CDM) composed of a feature selection technique, the genetic algorithm (GA), and four different classifiers was introduced. The four classifiers are decision trees (C4.5), NB, CNN and KNN. The proposed CDM used a binary genetic algorithm as a wrapper feature selection method to select relevant features from datasets extracted from laboratory findings. After selecting an effective subset of features, the four classifiers were trained on the same databases and applied to the same testing data. The experimental results in [10] showed that the CDM model based on CNN achieved a high performance of 80%.

In [11], X-ray images were passed to the proposed fusion of a convolutional neural network (CNN), a support vector machine (SVM) and a Sobel filter (CNN-SVM + Sobel) to diagnose COVID-19 cases. The CNN-SVM + Sobel model relied on data augmentation to augment the input data and overcome overfitting. A Sobel filter was applied to obtain the edges of the image and improve model performance. Then, in the preprocessing step, the image dimensions were changed. Finally, CNN-SVM and NN-sigmoid were used for classification.

As presented in [12], a transfer learning based COVID-19 screening technology (TL-CST) has been proposed for the automatic diagnosis of diseases such as COVID-19. In this model, the dataset was initially augmented to increase its size. Then, in the preprocessing step, input images were resized to the same dimensions and a median filter was used to eliminate noise from the input. Next, the Visual Geometry Group network from Oxford (VGG16) was applied to extract the main features from CT images, and principal component analysis (PCA) was applied to select only the effective subset of features. Finally, the classification was performed using four classifiers: extreme learning machine (ELM), deep convolutional neural network (DCNN), bagging ensemble with support vector machine (SVM) and online sequential ELM. The experimental results in [12] showed that the SVM classifier can classify with high accuracy.

In [13], a new method called automatic bone age assessment (ABAA) was proposed to accurately assess children's maturity based on the calculation of bone age from hand X-ray images. ABAA includes two main methods, a convolutional neural network (CNN) and a graph convolutional network (GCN). The CNN was applied to extract features, whereas bone key regions were inferred using the GCN. Related to [13], the experimental results showed that the suggested ABAA can classify with high accuracy. As provided in [14], a new diabetic retinopathy diagnosis method called lesion-attention pyramid network (LAPN) was introduced to accurately diagnose patients. According to the experimental results, LAPN is superior to other existing methods because it can accurately diagnose patients and can fuse the lesion activation maps. A comparison of recent Covid-19 diagnostic strategies is provided in Table 1.

Table 1 A comparison of recent Covid-19 diagnostic strategies

3 The proposed Covid-19 diagnostic strategy (CDS)

The CDS, which is provided to automatically introduce a rapid and accurate diagnosis, is discussed in detail in this section. CDS includes two basic phases called the feature selection phase (FSP) and the diagnosis phase (DP) as shown in Fig. 1. First, the best features are selected in FSP before the diagnostic model in DP is trained, to prevent overfitting. Additionally, selecting the important features enables the diagnostic model to accurately diagnose patients. In the FSP, enhanced gray wolf optimization (EGWO), which combines wrapper and filter techniques, is used to determine the best features that affect Covid-19 patients. EGWO includes two basic stages called the filter stage (FS) and the wrapper stage (WS). Many different filter methods are used in FS, and then binary gray wolf optimization (BGWO) is applied as a wrapper method in WS based on the output of FS. In the DP, a fast and more accurate diagnosis is provided using a hybrid diagnosis methodology (HDM) based on the selected features from FSP. The HDM contains two basic phases called the weighting patient phase (WP2) and the diagnostic patient phase (DP2). In WP2, the degree to which each patient in the testing dataset belongs to a class category is calculated using NB as a weighting method. Then, a fast and accurate diagnosis is provided in DP2 using KNN, with the weighted patients of the testing dataset serving as a new training dataset. In the next subsections, the phases of the proposed CDS, FSP and DP, are described in detail.

Fig. 1 Proposed Covid-19 diagnostic strategy (CDS)

3.1 Feature selection phase (FSP)

The input data contain many features, both relevant and irrelevant. Therefore, the feature selection process is very important to eliminate the features that have the least impact on the diagnosis or classification model. This process aims to increase the performance of the Covid-19 diagnosis model and reduce its computational time [2]. Generally, feature selection methods are classified into two basic types, namely wrapper and filter techniques [15,16,17,18,19,20]. Filter techniques are faster than wrapper methods, can deal with high-dimensional datasets, do not waste implementation time and are also cheap. Despite these benefits, filter methods do not offer high performance because they ignore the interaction between a set of features and the applied diagnosis technique. On the contrary, wrapper techniques can offer high performance for the used diagnosis model, but they suffer from long computational time and are also more expensive [2, 4].
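As an illustration of this difference (not part of the proposed method), the following minimal sketch scores features with two common filter criteria (chi-square and information gain, via scikit-learn) and then evaluates one candidate subset in wrapper fashion with a KNN classifier; the toy data and parameter choices are assumptions.

```python
# Illustration only: a fast filter scoring pass (chi-square, information gain) followed
# by a wrapper-style evaluation of one candidate subset with a KNN classifier.
import numpy as np
from sklearn.feature_selection import chi2, mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((200, 10))          # toy, non-negative blood-test-like features
y = rng.integers(0, 2, size=200)   # toy diagnosis labels (1 = Covid, 0 = non-Covid)

# Filter: score every feature independently of any classifier (fast, one pass).
chi_scores, _ = chi2(X, y)
ig_scores = mutual_info_classif(X, y, random_state=0)
top_by_chi2 = np.argsort(chi_scores)[::-1][:4]   # keep the 4 best-scoring features

# Wrapper: evaluate a concrete subset with the diagnosis classifier itself (slower,
# but accounts for the interaction between the subset and the classifier).
cv_acc = cross_val_score(KNeighborsClassifier(n_neighbors=5), X[:, top_by_chi2], y, cv=5).mean()
print("chi2 top features:", top_by_chi2, "info-gain scores:", np.round(ig_scores, 2))
print("wrapper CV accuracy on that subset:", round(cv_acc, 3))
```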

In this section, enhanced gray wolf optimization (EGWO), a new selection algorithm that includes wrapper and filter techniques, is introduced. EGWO combines the benefits of both wrapper and filter techniques to provide the best features that have an impact on the Covid-19 diagnosis model. EGWO consists of two basic stages: (i) the filter stage (FS), using different filter techniques that act as quick selection techniques, and (ii) the wrapper stage (WS), using binary gray wolf optimization (BGWO) as an accurate technique. BGWO is a wrapper technique that has the ability to choose the significant features in input datasets. On the other hand, it suffers from long computational time because it depends on a randomly generated initial population and input data which may contain a huge number of features.

Accordingly, FS tries to overcome the problems of WS by applying a number of filter selection techniques equal to the number of wolves (search agents) in the initial population. The output (subset of features) of each filter method is passed to WS as the initial value of one wolf. Accordingly, the output of the FS forms the initial population for WS, reducing the computational time and the complexity of BGWO and providing an enhanced technique called EGWO that can increase the performance of the diagnosis model.

Generally, gray wolf optimization (GWO) is designed to deal with continuous optimization problems [21, 22]. To deal with binary (or discrete) optimization problems like the feature selection process, BGWO should be used [23]. Consequently, the positions of the wolves must be converted from continuous values to binary form. This conversion is performed using a sigmoidal transfer function so that, after conversion, each element of the wolf's position takes one of two values, 1 or 0. A value of 1 refers to a selected feature, while a value of 0 refers to an unselected feature, as presented in Table 2. Table 2 presents a single search agent of the population in an m-dimensional space, assuming m = 10, which is the number of features in the Covid-19 dataset.

Table 2 An example of single wolf in population
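For illustration, a short sketch (with an arbitrary example position, in the spirit of Table 2) of how a binary wolf position maps to a concrete feature subset:

```python
# Illustrative sketch: a binary wolf position of length m selects the features whose
# entries are 1 (the example values are arbitrary, in the spirit of Table 2 with m = 10).
import numpy as np

m = 10
wolf = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])     # example binary position
feature_names = [f"x{i + 1}" for i in range(m)]

selected = [name for name, bit in zip(feature_names, wolf) if bit == 1]
print("selected features:", selected)                # ['x1', 'x3', 'x4', 'x7', 'x9']
```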

EGWO is an improvement of BGWO that overcomes its problems in choosing the best features in the input dataset for Covid-19 diagnosis. To implement EGWO, the filter methods in FS must be executed first. The main reason is to determine the size of the population in WS, which is equal to the number of used filter methods, and also to determine the initial values of the search agents in the population, which are the outputs of the filter methods. For example, if the number of filter methods used in FS equals "pt," then the initial population size in WS equals "pt," with initial values that are the outputs of implementing these filter methods. To implement BGWO in WS, K-nearest neighbor (KNN) is applied as a fitness (or evaluation) technique to identify the best candidate solution [24].

BGWO is an optimization algorithm that simulates the social leadership and hunting strategy of gray wolves. The size of each pack is between 5 and 12 search agents (individuals). Alpha, beta, delta and omega are the four groups of the wolf hierarchy. The first leader wolf is called alpha, and the second and third ones are called beta and delta, respectively [22, 23, 25, 26]. To hunt a prey, the encircling behavior of the pack can be formulated as (1) [23].

$$Wo\left( {r + 1} \right) = Wo_{t} \left( r \right){-}S \times D$$
(1)

where Wot is the prey’s position and r represents the current iteration number. r + 1 is the next iteration number, S is the coefficient vector and D is expressed as (2) [21].

$$D = \left| {E \times Wo_{t} \left( r \right) - Wo\left( r \right)} \right|$$
(2)

where E refers to the coefficient vector and Wo refers to the wolf’s position. The coefficient vectors called S and E are determined by using (3) and (4) [23, 27].

$$S = \left| {2 \times y \times p_{1} - y} \right|$$
(3)
$$E = 2 \times p_{2}$$
(4)

where p1 and p2 refer to two random numbers which are independent and uniformly distributed in [0, 1]. y refers to the encircling coefficient applied to balance the trade-off between exploitation and exploration. In EGWO, y is a parameter that decreases linearly from 2 to 0 and can be calculated by using (5) [23, 25].

$$y = 2{-}2 \times \frac{r}{R}$$
(5)

where R refers to the maximum number of iterations. In EGWO, the three leaders are called the alpha, beta and delta wolves. These leaders have better knowledge about the potential position of the target (prey). Accordingly, the omega wolves are guided by these leaders to move toward the best position. The wolf's position is updated to a new position by using (6) [23, 25].

$$Wo\left( {r + 1} \right) = \frac{{Wo_{1} + Wo_{2} + Wo_{3} }}{3}$$
(6)

where Wo1, Wo2 and Wo3 are calculated by using (7–9) [23, 25].

$$Wo_{1} = \left| {Wo_{\alpha } - S_{1} \times D_{\alpha } } \right|$$
(7)
$$Wo_{2} = \left| {Wo_{\beta } - S_{2} \times D_{\beta } } \right|$$
(8)
$$Wo_{3} = \, \left| {Wo_{\delta } - S_{3} \times D_{\delta } } \right|$$
(9)

where Woα, Woβ and Woδ refer to the position of alpha, beta and delta at iteration r. S1, S2 and S3 are measured by using (3). Dα, Dβ and Dδ are defined as presented in (10–12) [23, 25].

$$D_{\alpha } = \left| {E_{1} \times Wo_{\alpha } - Wo} \right|$$
(10)
$$D_{\beta } = \left| {E_{2} \times Wo_{\beta } {-}Wo} \right|$$
(11)
$$D_{\delta } = \left| {E_{3} \times Wo_{\delta } - Wo} \right|$$
(12)

where E1, E2 and E3 are calculated by (4). In fact, the new positions of the search agents are not in binary form. Thus, the sigmoid function should be applied to the new position of each wolf to transform it into binary form by using (13) [23].

$$Wo_{b}\left( {r + 1} \right) = \begin{cases} 1 & {\text{if}}\;{\text{sigmoid}}\left( {Wo} \right) \ge {\text{rand}}\left( {0,1} \right) \\ 0 & {\text{otherwise}} \end{cases}$$
(13)

where Wob(r + 1) is the binary position value of wolf Wo in m dimensions (m = number of features) at iteration r. Sigmoid(Wo) is the sigmoidal function of wolf Wo, defined by (14) [23].

$${\text{sigmoid}}\left( {Wo} \right) = \frac{1}{{1 + e^{{ - 10\left( {Wo - 0.5} \right)}} }}$$
(14)

The main objective of EGWO is to increase the classification accuracy and reduce the number of features in order to reduce the execution time. For this purpose, the fitness function is computed by using (15).

$${\text{Fitness}}\left( {Wo} \right) = \eta \times N + \left( {1 - \eta } \right) \times \frac{M - Z}{M}$$
(15)

where N is the accuracy of the KNN classifier as a standard classifier, Z refers to the number of selected features and M refers to the total number of features in the data. η refers to the classification accuracy weight, (1 − η) refers to the feature selection quality weight, and η ∈ [0, 1]. After implementing EGWO on the Covid-19 dataset, the output contains only the best features that have an impact on the Covid-19 diagnosis model (those with value 1). Algorithm 1 shows the sequence of executing the EGWO.

Algorithm 1 The sequence of executing the EGWO
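To make the flow of Algorithm 1 concrete, the following is a minimal, self-contained sketch of the EGWO loop described by eqs. (1)–(15). It is an illustration only: the η value, the iteration count and the use of scikit-learn's KNN with cross-validation as the fitness classifier are assumptions, not the authors' implementation.

```python
# Illustration only (not the authors' code): filter outputs seed the population,
# positions move toward the alpha/beta/delta leaders (eqs. 3-12), are binarized by a
# sigmoid (eqs. 13-14) and evaluated with the KNN-based fitness of eq. (15).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

def sigmoid(wo):
    return 1.0 / (1.0 + np.exp(-10.0 * (wo - 0.5)))           # eq. (14)

def binarize(wo, rng):
    return (sigmoid(wo) >= rng.random(wo.shape)).astype(int)   # eq. (13)

def fitness(mask, X, y, eta=0.99):
    # Eq. (15): reward classification accuracy N and the fraction of removed features.
    M, Z = X.shape[1], int(mask.sum())
    if Z == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                          X[:, mask == 1], y, cv=3).mean()
    return eta * acc + (1 - eta) * (M - Z) / M

def egwo(filter_masks, X, y, R=30, seed=0):
    """filter_masks: binary outputs of the FS stage, one row per filter method."""
    rng = np.random.default_rng(seed)
    pop = np.array(filter_masks, dtype=float)                  # FS output seeds the WS population
    n_wolves, m = pop.shape
    best_mask, best_fit = None, -np.inf
    for r in range(R):
        masks = np.array([binarize(w, rng) for w in pop])
        fits = np.array([fitness(mk, X, y) for mk in masks])
        order = np.argsort(fits)[::-1]
        leaders = pop[order[:3]].copy()                        # alpha, beta and delta wolves
        if fits[order[0]] > best_fit:
            best_fit, best_mask = fits[order[0]], masks[order[0]]
        y_coef = 2 - 2 * r / R                                 # eq. (5): linearly decreasing
        for i in range(n_wolves):
            new_pos = np.zeros(m)
            for leader in leaders:
                p1, p2 = rng.random(m), rng.random(m)
                S = np.abs(2 * y_coef * p1 - y_coef)           # eq. (3)
                E = 2 * p2                                     # eq. (4)
                D = np.abs(E * leader - pop[i])                # eqs. (10)-(12)
                new_pos += np.abs(leader - S * D)              # eqs. (7)-(9)
            pop[i] = new_pos / 3.0                             # eq. (6)
    return best_mask, best_fit

# Toy usage: four filter outputs over m = 6 features (mirroring the example below).
rng = np.random.default_rng(1)
X_toy, y_toy = rng.random((120, 6)), rng.integers(0, 2, 120)
filters = [[1, 1, 0, 1, 0, 1], [0, 1, 1, 1, 1, 0], [0, 1, 1, 1, 1, 0], [1, 0, 1, 0, 1, 0]]
mask, fit = egwo(filters, X_toy, y_toy, R=10)
print("selected feature mask:", mask, "fitness:", round(fit, 3))
```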

For clarification, assume that the filter methods used in FS are four filters: correlation-based feature selection (CBFS) [28], Chi-square (C-square) [29, 30], information gain (I-gain) [31, 32] and Fisher score (F-score) [33]. Additionally, assume that the number of features in the blood test dataset is equal to 6 (m = 6): F = {x1, x2, x3, x4, x5, x6}. After applying the first stage of EGWO, called FS, the subsets of features selected by these four filter methods are: CBFS = {x1, x2, x4, x6}, C-square = {x2, x3, x4, x5}, I-gain = {x2, x3, x4, x5} and F-score = {x1, x3, x5}. Then, these four outputs are forwarded to the second stage of EGWO, called WS, where the number of wolves in the population is equal to four, the same as the number of filter techniques: Pop = {Wo1, Wo2, Wo3, Wo4}. Additionally, the initial values of the agents (wolves) in the population are the outputs of these four filter methods.

Then, the execution of BGWO in WS depends on several assumptions as depicted in Table 3. According to Table 3, assume that BGWO has been executed for three iterations, producing a new population with new values for the 4 search agents: Wo1 = {0,0,1,1,0,1}, Wo2 = {1,0,1,0,1,0}, Wo3 = {1,0,1,1,1,0} and Wo4 = {1,0,0,0,1,1}. After evaluating the wolves of Pop = {Wo1, Wo2, Wo3, Wo4}, Wo3 is considered the best solution (leader wolf) because it has the best fitness value. Accordingly, the most significant subset of features is provided by Wo3. Finally, the highest performance of the used diagnosis model can be achieved by using this subset of features of the dataset: F_new = {x1, x3, x4, x5}.

Table 3 Assumptions for executing BGWO in WS
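Continuing the worked example, the following short sketch shows only the final selection step; the fitness values are invented for illustration and do not reproduce the assumptions of Table 3.

```python
# Sketch of the leader-selection step in the worked example above; fitness values
# are hypothetical.
import numpy as np

wolves = {
    "Wo1": [0, 0, 1, 1, 0, 1],
    "Wo2": [1, 0, 1, 0, 1, 0],
    "Wo3": [1, 0, 1, 1, 1, 0],
    "Wo4": [1, 0, 0, 0, 1, 1],
}
assumed_fitness = {"Wo1": 0.81, "Wo2": 0.84, "Wo3": 0.90, "Wo4": 0.79}   # hypothetical values

leader = max(assumed_fitness, key=assumed_fitness.get)                   # Wo3 in this example
mask = np.array(wolves[leader])
F_new = [f"x{i + 1}" for i, bit in enumerate(mask) if bit == 1]
print(leader, "selects", F_new)                                          # ['x1', 'x3', 'x4', 'x5']
```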

3.2 Diagnosis phase (DP)

DP presents the hybrid diagnosis methodology (HDM) as a new hybrid diagnostic method based on the KNN classifier for the Covid-19 diagnosis process. Although KNN is characterized by simplicity, high accuracy and ease of implementation, it is a lazy learner and the k value affects the diagnostic process and may lead to misdiagnosis [34]. In fact, KNN relies on voting to classify a patient, and this may give an incorrect diagnosis. Thus, weighting the training patients is an important step before using KNN to classify a new patient; it aims to enable KNN to provide a correct diagnosis. HDM starts by weighting the patients in the testing dataset with NB as a weighting method, and then those weighted patients are used as training patients when applying the KNN technique to a new patient. Thus, HDM consists of two phases for Covid-19 diagnosis: (i) the weighting patient phase (WP2), using NB as a weighting method, and (ii) the diagnostic patient phase (DP2), using KNN as a diagnosis method. The steps of implementing the HDM technique are shown in Fig. 2. In the following subsections, the weighting patient phase and the diagnostic patient phase are explained in detail.

Fig. 2 Steps of implementing the hybrid diagnosis methodology (HDM)

3.2.1 Weighting patient phase (WP2)

Patients in the testing dataset are weighted using the NB method. A patient's weight indicates the degree to which the patient belongs to a class category. The NB method is a probabilistic classifier which assumes that each feature is independent and does not need large data for the training process [2, 35]. In fact, NB depends on Bayes' theorem to determine which class category the patient belongs to [35, 36]. Hence, NB is used in the HDM technique as a weighting method to calculate the degree of affiliation of each patient in the testing dataset to the class category using probability. To clarify the idea, suppose that the Covid-19 dataset consists of "N" patients divided into "n1" patients as a training dataset and "n2" patients as a testing dataset. The patients in the training dataset are expressed as V = {V1, V2, V3, …, Vn1} while the patients in the testing dataset are expressed as P = {P1, P2, P3, …, Pn2}. Each patient Vt ϵ V and Pj ϵ P is formulated as an ordered set of "Dim" features: Vt(f1, f2, f3, …, fDim) = [f1t, f2t, f3t, …, fDimt] and Pj(f1, f2, f3, …, fDim) = [f1j, f2j, f3j, …, fDimj]. Accordingly, each patient Vt and Pj can be expressed in a "Dim"-dimensional space of features in which the considered "Dim" features are the Dim dimensions of that space. After training the NB method on the patients in the training dataset V, each ith patient Pi in the testing dataset P is weighted (Weight(Pi)) using (16).

$${\text{Weight}}\left( {P_{i} } \right) = {\text{NB}}\left( {c \mid P_{i} } \right)$$
(16)

where NB(c | Pi) is the naïve Bayes probability that measures the belonging degree of patient Pi to class category c. NB(c | Pi) can be calculated based on probability using (17).

$${\text{NB}}\left( {c \mid P_{i} } \right) = {\text{Pro}}\left( c \right) \times {\text{Pro}}\left( {P_{i} \mid c} \right)$$
(17)

where Pro(c) refers to the probability of class c. Additionally, Pro(Pi|c) refers to the probability of the testing patient Pi given the class c. Algorithm 2 presents the implementation steps of WP2. In the next subsection, the testing dataset and the weight values of its patients are used as training data for KNN to enable it to give accurate diagnoses.

Algorithm 2 Implementation steps of WP2
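As an illustration of WP2, the following sketch uses scikit-learn's GaussianNB as a stand-in for the NB weighting method; the toy data and the use of the (normalized) posterior probability of the predicted class as Weight(Pi) are assumptions.

```python
# Minimal sketch of WP2 (eqs. 16-17), assuming numeric blood-test features.
import numpy as np
from sklearn.naive_bayes import GaussianNB

def weight_testing_patients(V_train, y_train, P_test):
    """Return, for every testing patient Pi, its predicted class category and Weight(Pi)."""
    nb = GaussianNB().fit(V_train, y_train)          # learn NB on the training patients V
    proba = nb.predict_proba(P_test)                 # Pro(c) * Pro(Pi | c), normalized over c
    classes = nb.classes_[np.argmax(proba, axis=1)]  # class category each Pi belongs to
    weights = proba.max(axis=1)                      # belonging degree to that class
    return classes, weights

# Toy usage with random data standing in for the Covid-19 blood-test dataset.
rng = np.random.default_rng(0)
V = rng.random((100, 5))                                         # n1 training patients
y = np.where(rng.random(100) > 0.5, "Covid", "non-Covid")        # their class categories
P = rng.random((20, 5))                                          # n2 testing patients to weight
cls, w = weight_testing_patients(V, y, P)
print(cls[:5], np.round(w[:5], 3))
```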

3.2.2 Diagnostic patient phase (DP2)

DP2 aims to accurately diagnose Covid-19 patients using a KNN classifier. In this phase, the testing dataset is used as a training dataset to enable KNN to accurately diagnose a new patient based on the weights of the training patients from WP2. Generally, KNN is a nonparametric classifier that provides a robust decision in multiple fields such as pattern recognition, diagnosis and classification based on the geometrical surrounding neighborhood [34, 37]. Although KNN is a simple classifier that does not generate any training model and is easy to implement, it suffers from several problems that reduce its performance. The main problems of KNN are that its performance depends on the voting process among the K nearest neighbors and that it does not take into account the belonging degree of the patients in the dataset to their class categories. In fact, the voting process used to determine the category of a new patient may lead to a misdiagnosis. Accordingly, this work aims to overcome the problems of classic KNN by using the weights of the K nearest neighbors (nearest training patients) of a new patient rather than the voting process, to provide an accurate diagnosis.

Implementing the modified KNN using the weights of the nearest training patients rather than the voting process requires several steps, as presented in Algorithm 3. The modified KNN implementation begins with a representation of the training dataset generated by WP2 in the feature space. Then, the distance between a new patient St and any training patient Pj in the feature space, Dist(St, Pj), is calculated as the Euclidean distance by using (18) [34].

$${\text{Dist}}\left( {S_{{\text{t}}} ,P_{{\text{j}}} } \right) = \sqrt {\mathop \sum \limits_{i = 1}^{{{\text{Dim}}}} \left( {S_{ti} - P_{ji} } \right)^{2} }$$
(18)

where Dist(St,Pj) represents the Euclidean distance between the two patients St and Pj, Sti is the ith feature value of the new patient and Pji is the ith feature value of the jth training patient. Additionally, Dim represents the number of selected features. After calculating the distance between the new patient and every training patient separately, the closest K training patients are determined using (19).

$${\text{Neighbors}}\;\left( {S_{t} } \right) = k\;{\text{of}}\;{\text{ training}}\;{\text{patients}}\;{\text{with}}\;{\text{the}}\;{\text{smallest}}\;{\text{Dist}}\left( {S_{t} ,P_{j} } \right)$$
(19)

Assume that the K nearest neighbors of a new patient are divided into Kc, the number of nearest neighbors belonging to the "Covid" class, and Knc, the number of nearest neighbors belonging to the "non-Covid" class: K = Kc + Knc. Thus, the new patient's diagnosis can be determined based on his belonging degree to every class category, which depends on his neighbors in that category. The belonging degree of a patient to a class category is the cumulative sum, over his neighbors in that class, of each neighbor's weight divided by the distance between the patient and this neighbor. The belonging degree of the new patient St to the "Covid" class category (Belong_Degree_C(St)) based on Kc can be calculated using (20).

$${\text{Belong}}\_{\text{Degree}}\_{\text{c}}\left( {S_{t} } \right) = \mathop \sum \limits_{q = 1}^{{K_{c} }} \frac{{{\text{Weight}} \left( {P_{q} } \right)}}{{{\text{Dist}}\left( {P_{q} , S_{t} } \right)}}$$
(20)

where Weight(Pq) is the weight of the qth training patient who belongs to the "Covid" class and is close to the new patient St. Dist(Pq, St) refers to the distance between Pq as a training patient and St as a new patient. Additionally, the belonging degree of the new patient St to the "non-Covid" class category (Belong_Degree_nC(St)) based on Knc can be calculated using (21).

$${\text{Belong}}\_{\text{Degree}}\_{\text{nc}}\left( {S_{t} } \right) = \mathop \sum \limits_{r = 1}^{{K_{nc} }} \frac{{{\text{Weight}}\ \left( {P_{r} } \right)}}{{{\text{Dist}}\ \left( {P_{r} , S_{t} } \right)}}$$
(21)

where Weight(Pr) is the weight of the rth training patient who belongs to the "non-Covid" class and is close to the new patient St. Dist(Pr, St) refers to the distance between Pr as a training patient and St as a new patient. Finally, if Belong_Degree_C(St) is greater than Belong_Degree_nC(St), the new patient is classified as a Covid patient. Otherwise, the new patient is classified as a non-Covid patient. Hence, the final decision to diagnose a new patient is based on the weights of the K training patients nearest to the new patient rather than on the voting process.

Algorithm 3 Implementation steps of the modified KNN in DP2
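The following sketch illustrates the DP2 decision rule of eqs. (18)–(21); the variable names and toy data are illustrative assumptions, and a small epsilon is added to the distance only to avoid division by zero.

```python
# Minimal sketch of DP2: the weighted testing patients from WP2 act as the training set,
# and the class of a new patient is decided by weight/distance sums over its K nearest
# neighbours instead of voting.
import numpy as np

def diagnose(new_patient, P, labels, weights, k=13, eps=1e-9):
    """P: weighted training patients (from WP2); labels: 'Covid'/'non-Covid';
    weights: Weight(Pi) of each training patient."""
    dist = np.sqrt(((P - new_patient) ** 2).sum(axis=1))               # eq. (18)
    nearest = np.argsort(dist)[:k]                                     # eq. (19)
    belong = {"Covid": 0.0, "non-Covid": 0.0}
    for idx in nearest:
        belong[str(labels[idx])] += weights[idx] / (dist[idx] + eps)   # eqs. (20)-(21)
    return max(belong, key=belong.get), belong                         # final decision

# Toy usage.
rng = np.random.default_rng(2)
P = rng.random((50, 4))
labels = np.where(rng.random(50) > 0.5, "Covid", "non-Covid")
weights = 0.5 + 0.5 * rng.random(50)                                   # NB weights in [0.5, 1]
decision, degrees = diagnose(rng.random(4), P, labels, weights, k=13)
print(decision, {c: round(v, 2) for c, v in degrees.items()})
```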

4 Experimental results

The CDS is evaluated in this section. As introduced in the previous section, the CDS includes two basic phases: FSP and DP. In the FSP, the best features are selected by EGWO, which includes two stages called FS and WS. On the other hand, the HDM, which consists of WP2 and DP2, is implemented in DP based on the chosen features from FSP to give a rapid and more accurate diagnosis. WP2 gives a weight to each patient in the testing dataset using the NB method before the modified KNN is implemented as a diagnosis method in DP2 to quickly assign a new patient to the correct class category. For this purpose, the experimental results are produced in several ordered steps. First, a dataset that includes both Covid and non-Covid cases is collected. Then, EGWO selects the best group of features in the used dataset. Finally, the FSP output is fed into the HDM in the DP to produce rapid and more accurate results.

In this paper, the experimental results follow three basic scenarios. In the first scenario, EGWO is implemented to determine the best features in the collected dataset and is compared to other advanced feature selection methods. This scenario is intended to demonstrate the superiority of EGWO over other feature selection methods. In the second scenario, HDM is tested against other recent classification methods on the Covid-19 dataset that includes the best set of features selected by the EGWO method. In the third scenario, the complete CDS strategy that includes both EGWO and HDM is applied to diagnose patients who suffer from Covid-19. The implementation of all scenarios is performed using the Covid-19 dataset [38, 39]. The Covid-19 data are divided into two groups, training data and testing data. While the diagnostic technique is trained on the training data, the testing data are used to measure the efficiency of the model. Confusion matrix performance metrics are applied to calculate the efficiency of the suggested diagnostic model [2]. A number of parameters are used during the implementation of these three scenarios. Table 4 shows the used values of these parameters.

Table 4 Applied parameters and the used values

In fact, the value of K is set experimentally. Different values of K are used to implement the KNN classifier based on 1000 different patients from the used dataset, where 800 patients are used for training and 200 patients for testing. The accuracy and error values of the KNN method are calculated for each value of K to determine the best value of K which enables KNN to provide maximum accuracy and minimum error. The range of K used in our case is K ∈ [1, 40]. The best value of K is 13 because this value enables KNN to give the minimum error rate as shown in Fig. 3. Accordingly, k = 13 is used in the next experiments.

Fig. 3 Error rate vs. K-value
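The K-tuning experiment described above can be sketched as follows; the random toy data stands in for the actual 1000 patients, so the best K found here will differ from the reported K = 13.

```python
# Sketch of the K-tuning experiment: train on 800 patients, test on 200, and record
# the KNN error rate for K in [1, 40].
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = rng.random((1000, 8)), rng.integers(0, 2, 1000)
X_train, y_train, X_test, y_test = X[:800], y[:800], X[800:], y[800:]

errors = {}
for k in range(1, 41):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    errors[k] = 1.0 - knn.score(X_test, y_test)        # error rate = 1 - accuracy

best_k = min(errors, key=errors.get)
print("best K:", best_k, "error:", round(errors[best_k], 3))
```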

4.1 Covid-19 dataset’s description

The OSR dataset, a Covid-19 dataset consisting of routine blood test results, is used to identify patients who suffer from Covid-19 [38, 39]. The OSR dataset consists of 1624 patients of the San Raffaele Hospital (OSR), collected from 19-2-2020 to 31-5-2020. This dataset includes personal information about patients such as age and gender (female or male). The dataset originally includes 34 features, which are reduced to 20 features by filtering out irrelevant features using the EGWO method, as presented in Table 5. Table 5 lists the features selected by the EGWO method (20 features), their description and their normal range. Additionally, the opinion of a medical team member (doctor) about the suitable type of collected data that should be used to correctly diagnose Covid-19 patients has been taken into consideration.

Table 5 Description of the selected features

Additionally, the doctor's opinion has been taken to identify the normal range of values for each feature and also the limit values that should be excluded, by taking a neighborhood around these limits so that only discriminatory values are kept for each feature in the dataset. This process is very important because the discriminatory values of the features enable the diagnostic model to determine Covid-19 patients and healthy cases in an accurate manner. The OSR dataset is divided into two groups, training data and testing data. While the training data contain 1300 cases, the testing data contain 324 cases. The dataset is divided into two main class categories, Covid and non-Covid cases, as presented in Table 6. The distribution of the used cases with respect to "Age" and "Gender" is illustrated in Figs. 4, 5 and 6.

Table 6 Description of dataset
Fig. 4 Total number of cases related to age

Fig. 5 Total number of cases related to gender and age

Fig. 6 Presentation of Covid-19 cases and non-Covid-19 cases distribution
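The dataset split described above can be sketched as follows; the feature names and values are placeholders (not the actual OSR columns), and only the 1624 total cases, the 1300/324 train/test split and the two class categories come from the text.

```python
# Illustrative sketch of preparing the train/test split of the dataset.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((1624, 4)), columns=["WBC", "CRP", "LDH", "Platelets"])  # hypothetical features
df["Diagnosis"] = np.where(rng.random(1624) > 0.5, "Covid", "non-Covid")

train, test = df.iloc[:1300], df.iloc[1300:]             # 1300 training / 324 testing cases
X_train, y_train = train.drop(columns="Diagnosis"), train["Diagnosis"]
X_test, y_test = test.drop(columns="Diagnosis"), test["Diagnosis"]
print(len(X_train), len(X_test), y_train.value_counts().to_dict())
```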

4.2 Evaluation metrics

In the following experiments, the recall (sensitivity), accuracy, precision and error are measured as evaluation parameters. Accordingly, the micro-average, macro-average and F-measure are computed based on the precision and recall calculations. These metrics can be calculated using the confusion matrix constructed in Table 7. As shown in Table 8, various formulas are used to summarize the confusion matrix performance metrics. Finally, seconds are used as the unit to assess the execution time of the Covid-19 detection algorithms.

Table 7 Confusion matrix
Table 8 Confusion matrix formulas
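The metrics listed in Tables 7 and 8 can be computed from predictions as sketched below; the toy labels are arbitrary and only illustrate the formulas, assuming "Covid" is treated as the positive class.

```python
# Sketch of computing the evaluation metrics with scikit-learn.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_true = np.array(["Covid", "Covid", "non-Covid", "Covid", "non-Covid", "non-Covid"])
y_pred = np.array(["Covid", "non-Covid", "non-Covid", "Covid", "non-Covid", "Covid"])

acc = accuracy_score(y_true, y_pred)
err = 1.0 - acc
prec = precision_score(y_true, y_pred, pos_label="Covid")
rec = recall_score(y_true, y_pred, pos_label="Covid")             # sensitivity
macro_prec = precision_score(y_true, y_pred, average="macro")
micro_rec = recall_score(y_true, y_pred, average="micro")
f_measure = f1_score(y_true, y_pred, pos_label="Covid")
print(confusion_matrix(y_true, y_pred, labels=["Covid", "non-Covid"]))
print(round(acc, 2), round(err, 2), round(prec, 2), round(rec, 2),
      round(macro_prec, 2), round(micro_rec, 2), round(f_measure, 2))
```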

4.3 Testing the proposed feature selection technique

The proposed EGWO is examined and compared to other recent methodologies on the considered Covid-19 dataset. These methodologies, which are BSFS [40], HLBDA [41], APSO [2] and ACO [42], are presented in Table 9. To demonstrate the performance of the EGWO technique against the other methods, a standard classifier called KNN is implemented [34]. The obtained results show that EGWO is superior to the other feature selection methods as shown in Figs. 7, 8, 9, 10, 11, 12, 13, 14, 15 and 16.

Table 9 Used selection techniques for evaluation
Fig. 7 Accuracy of different feature selection techniques

Fig. 8 Error of different feature selection techniques

Fig. 9 Precision of different feature selection techniques

Fig. 10 Recall of different feature selection techniques

Fig. 11 Macro-average precision of different feature selection techniques

Fig. 12 Macro-average recall of different feature selection techniques

Fig. 13 Micro-average precision of the different feature selection techniques

Fig. 14 Micro-average recall of the different feature selection techniques

Fig. 15 F-measure of the different feature selection techniques

Fig. 16 Run time of the different feature selection techniques

As presented in Figs. 7, 8, 9 and 10, when the number of training patients is equal to 1300, the BSFS, HLBDA, APSO, ACO and EGWO approaches produce accuracy of 0.86, 0.85, 0.84, 0.83 and 0.87, respectively. In fact, EGWO has the highest accuracy value because it can precisely choose the meaningful features that can enable the Covid-19 diagnostic model to give more accurate results. Additionally, the error values for the BSFS, HLBDA, APSO, ACO and EGWO approaches are 0.14, 0.15, 0.16, 0.17 and 0.13, respectively. EGWO has a precision value of 0.80, whereas BSFS, APSO, ACO and HLBDA have precision values of 0.79, 0.77, 0.74 and 0.72, respectively. EGWO has a recall value of 0.85 while BSFS, HLBDA, APSO and ACO have recall (sensitivity) values of 0.83, 0.79, 0.73 and 0.78, respectively. Related to these results, Figs. 7, 8, 9 and 10 illustrate that EGWO outperforms other recent methods such as BSFS, HLBDA, APSO and ACO because it can give the best accuracy and error values.

Figures 11, 12, 13, 14 and 15 indicate that EGWO introduces the best value of macro-average precision, equal to 0.70, when the number of training patients is 1300, whereas HLBDA introduces the lowest value of macro-average precision, equal to 0.60. Furthermore, the best value of macro-average recall is generated by EGWO, with a value of 0.74, whereas the worst value is generated by ACO, with a value of approximately 0.68, at 1300 training patients. EGWO offers the best micro-average precision, equal to 0.75, whereas BSFS has 0.70, which represents the lowest micro-average precision value at 1300 training patients. At 1300 training patients, the micro-average recall value of EGWO is 0.67, whereas BSFS, HLBDA, APSO and ACO have values of 0.65, 0.60, 0.61 and 0.64, respectively. Additionally, the F-measure value for EGWO is approximately 0.72, whereas the values of BSFS, HLBDA, APSO and ACO are approximately 0.69, 0.70, 0.71 and 0.71, respectively. In Fig. 16, the shortest run time is provided by EGWO, equal to 51 s, while the longest is provided by ACO, equal to 55 s. Finally, EGWO outperforms other recent methods such as BSFS, HLBDA, APSO and ACO as it can rapidly select the accurate features.

4.4 Testing the proposed classification technique

Based on the best features selected by the EGWO technique, the proposed HDM technique is examined and compared to other recent classification techniques using the considered Covid-19 dataset without irrelevant features. These recent techniques are the classical KNN [34], NB [2], SVM [12] and ANN [8]. The obtained results show that HDM is superior to the other classification methods as shown in Figs. 17, 18, 19, 20, 21, 22, 23, 24, 25 and 26. The best values of accuracy, precision, recall (sensitivity), macro- and micro-average and F-measure are provided by HDM. This demonstrates the efficiency of HDM compared to the other methods using the best set of features presented in Table 5.

Fig. 17 Accuracy of different classification techniques

Fig. 18 Error of different classification techniques

Fig. 19 Precision of different classification techniques

Fig. 20 Recall of different classification techniques

Fig. 21 Macro-average precision of different classification techniques

Fig. 22 Macro-average recall of different classification techniques

Fig. 23 Micro-average precision of the different classification techniques

Fig. 24 Micro-average recall of the different classification techniques

Fig. 25 F-measure of the different classification techniques

Fig. 26 Run time of the different classification techniques

Related to Figs. 17, 18, 19 and 20, at the number of training cases equal to 1300, the KNN, NB, SVM, ANN and HDM approaches produce accuracy of 0.87, 0.89, 0.89, 0.90 and 0.99, respectively. In fact, HDM has the highest accuracy value because it can precisely diagnose Covid-19 patients. The error values for the KNN, NB, SVM, ANN and HDM approaches are 0.13, 0.11, 0.11, 0.10 and 0.01, respectively. Moreover, HDM provides 0.88 as a precision value while KNN, NB, SVM and ANN have precision values of 0.80, 0.79, 0.77 and 0.82, respectively. HDM has a recall value of 0.90 while KNN, NB, SVM and ANN have values of 0.85, 0.83, 0.80 and 0.88, respectively. Related to these results, Figs. 17, 18, 19 and 20 illustrate that HDM outperforms other recent methods such as KNN, NB, SVM and ANN as it gives the best accuracy and error values.

Figures 21, 22, 23, 24 and 25 indicate that HDM gives the maximum value of macro-average precision, approximately 0.89, when the number of training data equals 1300. On the contrary, KNN gives the lowest value of macro-average precision, approximately 0.70. Furthermore, the highest value of macro-average recall is provided by HDM with a value of 0.87, whereas the lowest value is generated by SVM with a value of 0.72 when the number of training data equals 1300. HDM offers the best micro-average precision, approximately 0.86, whereas KNN has the lowest value, 0.75, when the number of training patients equals 1300. When the number of training patients is 1300, the micro-average recall value of HDM is 0.85 while the values of KNN, NB, SVM and ANN are 0.67, 0.68, 0.72 and 0.79, respectively. Additionally, HDM has an F-measure value of 0.91, whereas the values of KNN, NB, SVM and ANN are approximately 0.72, 0.74, 0.80 and 0.85, respectively. In Fig. 26, the shortest run time is achieved by HDM, 28 s, whereas the longest is that of the traditional KNN, 51 s. Finally, HDM outperforms other recent classification methods such as the traditional KNN, NB, SVM and ANN because it can quickly diagnose Covid-19 patients with high accuracy.

4.5 Testing the proposed Covid-19 diagnostic strategy (CDS)

The proposed CDS technique, which includes the two phases called the feature selection phase and the diagnosis phase, is examined in this section. In other words, the proposed CDS that includes both EGWO as a feature selection approach and HDM as a classification technique is tested in this section. To ensure that the CDS strategy is effective, it is compared against the other Covid-19 diagnosis strategies shown in Table 1. These strategies are DBNB [4], ACoS [6], CDM [10], FCNB [2], TL-CST [12] and CNN [11]. As shown in Figs. 27, 28, 29, 30, 31, 32, 33, 34, 35 and 36, the best performance, error value and run time are achieved by CDS. The accuracy, precision, recall (sensitivity), macro-average, micro-average and F-measure of CDS are measured. This demonstrates that the performance of CDS is better than that of the other strategies because its phases collaborate effectively.

Fig. 27 Accuracy of different diagnosis techniques

Fig. 28 Error of different diagnosis techniques

Fig. 29 Precision of different diagnosis techniques

Fig. 30 Recall of different diagnosis techniques

Fig. 31 Macro-average precision of different diagnosis techniques

Fig. 32 Macro-average recall of different diagnosis techniques

Fig. 33 Micro-average precision of the different diagnosis techniques

Fig. 34 Micro-average recall of the different diagnosis techniques

Fig. 35 F-measure of the different diagnosis techniques

Fig. 36 Run time of the different diagnosis techniques

In Figs. 27, 28, 29 and 30, the accuracies of DBNB, ACoS, FCNB, CDM, TL-CST, CNN and CDS are 0.80, 0.79, 0.88, 0.85, 0.88, 0.90 and 0.99, respectively, when the number of training data equals 1300. These results prove that the best accuracy is achieved by CDS because it relies on a preprocessing step before using the diagnostic model to give more accurate results. Also, the errors of DBNB, ACoS, FCNB, CDM, TL-CST, CNN and CDS are 0.20, 0.21, 0.12, 0.15, 0.12, 0.10 and 0.01, respectively. The precision of CDS is 0.88, whereas the precisions of DBNB, ACoS, FCNB, CDM, TL-CST and CNN are 0.75, 0.74, 0.68, 0.73, 0.74 and 0.71, respectively. Additionally, the recall of CDS is 0.90, whereas the recalls of DBNB, ACoS, FCNB, CDM, TL-CST and CNN are 0.71, 0.77, 0.75, 0.74, 0.75 and 0.78, respectively. Hence, Figs. 27, 28, 29 and 30 demonstrate that CDS is superior to other recent methods such as DBNB, ACoS, FCNB, CDM, TL-CST and CNN because CDS has the highest accuracy and lowest error.

The results in Figs. 31, 32, 33, 34 and 35 show that CDS gives the highest macro-average precision value, equal to 0.89, when the number of training data equals 1300 patients. On the contrary, CDM has the worst value of macro-average precision, which reaches 0.67, at the same number of training patients. Furthermore, CDS has a macro-average recall of 0.87, which is the highest value among the compared strategies, while FCNB has the lowest value, 0.73, at 1300 training patients. Although CDS achieves the maximum micro-average precision value of 0.86, FCNB provides the minimum micro-average precision value of 0.67. CDS has the best micro-average recall value, 0.85, whereas DBNB, ACoS, FCNB, CDM, TL-CST and CNN have 0.70, 0.69, 0.67, 0.68, 0.69 and 0.71, respectively. Additionally, CDS provides the best F-measure value, 0.91, while DBNB achieves the lowest value, 0.63, at 1300 training patients. In Fig. 36, CDS has the shortest run time, equal to 28 s, while the longest run time, equal to 50 s, is that of TL-CST. Finally, CDS is better than the other strategies, DBNB, ACoS, FCNB, CDM, TL-CST and CNN, because CDS can provide a fast and more precise diagnosis. In fact, both proposed methods, EGWO and HDM, help the CDS to provide fast and more accurate results compared to other recent strategies, but the effect of EGWO is greater than that of HDM. Hence, selecting the best set of features has a significant impact on the diagnostic model and helps it give quick and more accurate results.

5 Conclusions and future works

As a result of the rapid spread of Covid-19 disease and the increase in the number of infections and deaths, a rapid and accurate detection process is very important to limit this spread and isolate the infected. In this paper, CDS was provided as a new diagnostic strategy to give a quick and more accurate diagnosis. The CDS consists of two main parts, FSP and DP. A new feature selection technique called EGWO was used in FSP to identify the relevant and effective features from the Covid-19 dataset. Then, the selected features were passed to HDM as a new diagnosis method in DP to give a fast and more accurate diagnosis. HDM used NB in WP2 to calculate the probability (as a weight) of each patient and then used the modified KNN in DP2 with the weights of the nearest training patients rather than the voting process among them. Experimental results show that the CDS gives a faster and more accurate diagnosis than the compared strategies according to the confusion matrix measurements, namely accuracy, F-measure, precision, error and recall. The accuracy, F-measure, precision, error and recall of CDS are 99%, 91%, 88%, 1% and 90%, respectively.

In future work, the study will focus on combining a deep learning algorithm with our proposed diagnostic model to get the most out of the benefits of both. Additionally, the proposed CDS will be tested using several Covid-19 datasets from different regions to ensure its general usability.