1 Introduction

Coronaviruses are a family of viruses that have threatened the world for years, causing epidemics among humans and many animal species. An outbreak of coronavirus infection, which severely affected the population of Wuhan, the capital of China's Hubei Province, began in December 2019. The World Health Organization (WHO) declared a Public Health Emergency of International Concern on January 30, 2020, named the disease caused by the new coronavirus COVID-19, and characterized the outbreak as a pandemic on March 11, 2020. Coronaviruses can infect birds, mammals, and humans, whereas bats host coronaviruses without becoming ill [1]. As of October 22, 2021, more than 241 million cases had been recorded worldwide, and more than 4.9 million infected people had died. According to the official WHO web page, the three regions with the highest numbers of COVID-19 cases, with more than 2 million weekly cases, were the Americas, South-East Asia, and Europe, respectively. When COVID-19 first spread, the Chinese government announced that the diagnosis of COVID-19 could be confirmed with real-time reverse transcription polymerase chain reaction (RT-PCR) [1]. However, the high false-negative rate of RT-PCR tests called their reliability into question [2]. Failure to detect infected people and to start the necessary treatment in time increases both the risk of transmission of COVID-19 and the risk of death.

COVID-19 tests detect either the virus or antibodies against it. COVID-19 diagnosis relies on two basic approaches. The first comprises laboratory-based approaches, including nucleic acid tests, antigen tests, and serology tests. The second comprises lung imaging-based approaches such as X-rays and computed tomography (CT) scans [3]. Laboratory tests are performed on samples obtained by nasopharyngeal swab, throat swab, or sputum collection, and the nasopharyngeal swab is one of the most commonly used [4]. X-rays and CT scans are important for confirming the diagnosis in patients suspected of being infected with the virus. However, because COVID-19 can affect the lungs in a way that resembles many other diseases, such as pneumonia, on X-ray and CT images, a definite positive COVID-19 result may not be reached from lung images alone without a clinical diagnosis [5]. When combined with clinical findings, chest CT has high sensitivity for diagnosing COVID-19. The diagnosis can therefore be made by combining the patient's symptoms and laboratory findings with radiological imaging, in which the radiological features of COVID-19 are detected on X-ray or CT images. Radiologists mostly prefer chest X-ray images for diagnosing COVID-19. However, chest CT scans are used for more accurate detection, since X-ray devices cannot accurately distinguish soft tissues in chest images [6].

Studies in the literature on diagnosing COVID-19 from chest CT scans have increased considerably. As noted above, diagnostic approaches for COVID-19 fall into two general categories: laboratory-based approaches and approaches based on medical imaging such as X-rays and CT scans. Among the latter, chest CT scans are used as a primary tool in the clinical process because of their success in diagnosing COVID-19 [7]. Today, artificial intelligence (AI)-based machine learning (ML) and deep learning (DL) technologies are applied to chest CT scans to diagnose SARS-CoV-2 infection. ML algorithms help radiologists make decisions when diagnosing COVID-19 from chest CT images. In addition, researchers prefer deep neural networks (DNN) for imaging-based problems that require feature extraction, such as the diagnosis of COVID-19.

Several studies have addressed COVID-19 diagnosis from CT scans. Alom et al. [8] used a total of 425 CT scans to develop the Inception Recurrent Residual Convolutional Neural Network (IRRCNN) and NABLA-N models for COVID-19 detection and CT scan segmentation, respectively. Silva et al. [9] developed an efficient deep learning technique that evaluates each chest CT scan independently and can process CT images of different quality acquired with different CT devices, depending on environmental conditions. To diagnose COVID-19 from chest CT scans and to segment and classify lesions, the following models were developed: a multi-task deep learning model by Amyar et al. [10], a weakly supervised DL framework by Wang et al. [11], a deep transfer learning model based on DenseNet201 by Jaiswal et al. [12], and a DL model combining multi-objective differential evolution (MODE) and convolutional neural networks (CNN) by Singh et al. [3].

In our literature review, we did not find any study in which a Bayesian optimization (BO)-based approach was applied to CT scans. The motivation for this paper is therefore to use BO to find the hyperparameters (HPs) of both DNN and ML algorithms and, to our knowledge, to be the first to demonstrate the resulting performance on CT scans. Our study seeks the best models by searching a defined search space for the parameter values best suited to both DNN and ML algorithms. We also plan to contribute to real-time disease diagnosis by deploying these models on the web, where the generalization of the models is important for performance. Because our work is intended to support expert opinion, we must verify that it works correctly. For this reason, GridSearchCV (GS) [13] is used as an alternative to BO for optimizing the parameters of the ML algorithms, and this comparison supports the usability of our study.

The contributions of our study can be summarized as follows:

  • Providing a decision support mechanism, based on BO-tuned models, that supports expert opinion with high accuracy.

  • Showing that datasets created from CT scans can give different results depending on the model and features.

  • Examining the contribution of HP tuning to performance.

  • Offering a fast, integrated approach for real-time disease diagnosis.

2 Related work

This section reviews studies on COVID-19 data that combine Bayesian methods, including BO, with artificial neural networks (ANN) and DL, as well as other recent DL approaches to CT-based diagnosis.

Cabras [14] proposed a semi-parametric approach to estimate the evolution of COVID-19 in Spain, combining DL techniques with a Bayesian Poisson-Gamma model. The resulting general model predicted the future course of the disease time series in all regions and the outcomes of future scenarios, with an overall success rate of 95%. Ghoshal et al. [15] studied a large number of posteroanterior (PA) chest radiographs and attempted to improve diagnostic performance using Dropweights-based Bayesian convolutional neural networks (BCNN). Compared with standard CNNs, BCNN achieved higher accuracy (over 92%). Dhamodharavadhani et al. [16] used statistical neural network (SNN) models, namely the probabilistic neural network (PNN), the radial basis function neural network (RBFNN), and the generalized regression neural network (GRNN), which incorporate the Bayesian decision rule and Parzen probability density function estimators, to predict future COVID-19 deaths in India on two separate datasets. Using the correlation coefficient (R) and the root mean square error (RMSE) as criteria, PNN gave better results on both. Ucar et al. [17] adapted SqueezeNet for COVID-19 diagnosis and used BO to optimize its HPs. Their method classified X-ray images into three classes (Normal, Pneumonia, and COVID), with accuracies of 98.04% for Normal, 96.73% for Pneumonia, and 100% for COVID-19. Arman et al. [18] used BO to optimize the HPs of the VGG16, MobileNetV2, InceptionV3, and Xception architectures for detecting COVID-19 in chest X-ray images. Their method classified the same three classes with accuracies of 100% for Normal, 100% for Pneumonia, and 98.3% for COVID-19. Majid et al. [19] designed a new series network with five convolutional layers and used this CNN as a deep feature extractor. The extracted deep features were fed to ML algorithms, namely k-nearest neighbor, support vector machine (SVM), and decision tree, whose HPs were optimized with BO; the best accuracy, 98.7%, was achieved with SVM. Stefan et al. [20] used an ANN to process a large number of plausible hypothetical scenarios generated by a simulation program and, after training, estimated Bayesian posterior distributions. Their network has three levels: the first extracts features from the observation data, the second reduces preprocessed time series of different lengths to fixed-size statistical summaries, and the third is a Bayesian inference network that estimates the parameters from these summary statistics. The numbers of new infections, recoveries, and deaths were estimated with 95% success. Ratnabali et al. [21] proposed a shallow long short-term memory (LSTM)-based neural network to estimate the COVID-19 risk situation of countries. BO was used to design a country-specific optimized network from each country's data; each network was trained for at most 5000 iterations, and an average accuracy of 77.6% was obtained on the country-specific datasets. Ankur et al. [22] showed, using Bayesian neural network (BNN) and deep ensemble (DE) models, that uncertainty estimation decreases when the amount of training data is low. Their approach accurately measured the underlying uncertainty of the predictions of a deep k-nearest neighbor (kNN) classifier and, for COVID-19 diagnosis from chest X-rays, measured uncertainty better than state-of-the-art methods. Tested on three datasets (COVID-19 training, COVID-19 Unseen, and Shoulder), it achieved accuracies of 99.9%, 60%, and 50.1%, respectively.

Gao et al. [23] used a total of 1918 CT scans to develop a dual-branch combination network (DCN) with a lesion attention module for COVID-19 diagnosis and segmentation, reporting a highest classification accuracy of 96.74%. Panwar et al. [24] considered three datasets: 1) COVID-chest X-ray, 2) SARS-COV-2 CT-scan, and 3) Chest X-ray Images (Pneumonia). Their deep learning model detected COVID-19 positive cases within 2 s, faster than the RT-PCR tests currently used. He et al. [25] created a publicly available dataset containing hundreds of CT scans and developed sample-efficient deep learning methods that achieve high diagnostic accuracy for COVID-19 even when the number of CT images is limited. Specifically, they proposed a Self-Trans approach that integrates contrastive self-supervised learning with transfer learning to learn powerful and unbiased feature representations and reduce the risk of overfitting. Wu et al. [26] developed a Joint Classification and Segmentation (JCS) system for real-time and explainable COVID-19 diagnosis from chest CT, which obtained an average sensitivity of 95.0% and a specificity of 93.0% on the classification test set.

3 Methodology

3.1 Hyperparameter tuning

HPs play an important role in both ML and DL algorithms, because tuning them is essential for achieving the best performance [17] and few ML algorithms are free of HPs. HPs are also central to training algorithms [27]. In CNN studies in particular, tuning can be time-consuming given the size of the model, the activation function, the optimization algorithm, and the structure of the network [28]. BO is a convenient approach when each evaluation takes a long time [27, 29].

HP optimization methods fall into two groups: manual search and automatic search. Manual search relies on an expert's experience; as the number of HPs and their value ranges grow, so does the likelihood of error [27], and the trial-and-error process slows down optimization [28]. Automatic HP optimization methods were proposed to reduce the likelihood of errors and to speed up optimization. They aim to reduce human effort in ML, improve performance, and make studies reproducible [27]. Three types of techniques are commonly used to optimize the HPs of ML algorithms: Grid Search CV (GS), Random Search CV (RS) [27, 28], and informed search methods.

3.1.1 GridSearch CV

The GS method is a full factorial design: it evaluates every possible combination of parameter values [27, 28]. A finite set of values is defined for each HP, and the Cartesian product of these sets is evaluated [27]. A large number of HPs or a large search space therefore increases the computation time [28]. RS, which evaluates randomly sampled combinations, is more efficient than GS in high-dimensional spaces, but it can perform poorly when tuning complex models [27].
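The difference between GS and RS can be illustrated with scikit-learn. The sketch below is a minimal example under assumed settings: the SVM grid values and the synthetic data from `make_classification` are illustrative, not those of the study. `ParameterGrid` enumerates the Cartesian product that GS evaluates, while `RandomizedSearchCV` evaluates only `n_iter` sampled combinations of the same space.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, ParameterGrid, RandomizedSearchCV
from sklearn.svm import SVC

# Illustrative search space (assumed values, not those used in the study)
param_grid = {
    "C": [0.1, 1, 10, 100],
    "kernel": ["linear", "rbf", "poly"],
    "degree": [2, 3],
}

# GS evaluates the full Cartesian product: 4 * 3 * 2 = 24 candidates
n_grid = len(ParameterGrid(param_grid))

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Full factorial search with 5-fold cross-validation
gs = GridSearchCV(SVC(), param_grid, cv=5)
gs.fit(X, y)

# Random search samples only n_iter candidates from the same space
rs = RandomizedSearchCV(SVC(), param_grid, n_iter=8, cv=5, random_state=0)
rs.fit(X, y)

print(n_grid, len(gs.cv_results_["params"]), len(rs.cv_results_["params"]))
```

Here GS fits 24 candidates per fold and RS only 8; the gap widens quickly as HPs are added, which is the cost problem described above.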

3.1.2 Bayesian optimization

Bayesian optimization is the most popular informed search method and typically needs fewer evaluations than the GS and RS methods. BO [28, 29] is preferred especially because of the high computational cost of DL algorithms: it is designed to optimize objective functions that take a long time to evaluate [29,30,31]. BO is a model-based HP optimization algorithm [31,32,33] that iteratively updates a probabilistic model of the function being optimized.

Let \(f: X \to \mathbb{R}\), \(y = f(x)\), be a black-box function and let \(D=\{(x_i, y_i)\}_{i=1}^{n}\) be the set of observations collected so far. BO fits a probabilistic surrogate model (SM) to D and maximizes an acquisition function (AF) that decides which point to evaluate next [32,33,34]. A model f is considered a black box if its functional form is unknown [34, 35]; the optimization problem for the HPs of such a model is given in Eq. 1.

$${x}^{*}=\underset{x\in X}{\mathrm{argmax}}\,f(x)$$
(1)

The goal of this optimization problem is to find the point at which f attains its global maximum (or minimum), where X is the search space of x. BO is a Bayesian approach based on Bayes' theorem: prior information is combined with the information obtained from the data to yield an updated, posterior belief [36, 37].

In BO, the Bayesian approach is used to build an SM [27, 28]. The SM is usually a Gaussian process (GP), random forest regression (RFR), or a tree-structured Parzen estimator (TPE). GPs, stochastic processes that exploit the properties of the normal distribution, are the most common choice because they provide smooth, well-calibrated uncertainty estimates and can be computed in closed form [33]. A GP predicts a distribution for each HP setting rather than a single value [27, 28]. A GP is specified by a mean function μ and a covariance kernel K, written \(f \sim \mathcal{GP}(\mu, K)\). In this study, the Matérn kernel with ν = 5/2, which is widely used to define the covariance between two points separated by the distance \(d(x_i, x_j)\), was used (Eq. 2).

$$K(x_i,x_j)=\left(1+\frac{\sqrt{5}\,d}{\rho }+\frac{5{d}^{2}}{3{\rho }^{2}}\right)\mathrm{exp}\left(-\frac{\sqrt{5}\,d}{\rho }\right)$$
(2)

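where \(d(x_i, x_j)\) is the Euclidean distance between the two points, ρ is the length-scale parameter, and ν is the smoothness parameter of the covariance function (fixed here at 5/2, which gives the form in Eq. 2).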
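Eq. 2 can be checked numerically. The following sketch implements the Matérn 5/2 covariance with NumPy, assuming unit signal variance and a single length scale ρ; the function name `matern52` and the parameter name `rho` are ours. In scikit-learn, the same kernel is available as `Matern(length_scale=rho, nu=2.5)`.

```python
import numpy as np

def matern52(xi, xj, rho=1.0):
    """Matern covariance with nu = 5/2 (Eq. 2), assuming unit signal variance."""
    # Euclidean distance d(xi, xj)
    d = np.linalg.norm(np.atleast_1d(np.asarray(xi, float)) - np.atleast_1d(np.asarray(xj, float)))
    r = np.sqrt(5.0) * d / rho
    return float((1.0 + r + 5.0 * d**2 / (3.0 * rho**2)) * np.exp(-r))

k_same = matern52([0.0, 0.0], [0.0, 0.0])   # identical points
k_near = matern52([0.0, 0.0], [0.3, 0.4])   # d = 0.5
k_far = matern52([0.0, 0.0], [3.0, 4.0])    # d = 5.0
```

The covariance equals 1 for identical points (about 0.83 at d = 0.5) and decays smoothly with distance; a larger ρ slows that decay, so distant HP settings remain more strongly correlated.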

In BO, the SM and prior knowledge are combined into a posterior, and the function maximized over this posterior to choose the next sampling point is called the acquisition function, u [27, 28, 32]. The AF enables BO to make informed choices [37, 38]. A suitable AF should be easy to evaluate and maximize and should balance exploration and exploitation. Probability of Improvement (PI), Expected Improvement (EI), and Upper Confidence Bound (UCB) are commonly used AFs; PI was used in this study. Denoting by \(x^{+}\) the best point observed so far, the next point to evaluate, \(x_{t+1}\), is the one that maximizes the AF given the data D (Eq. 3):

$$x_{t+1}=\underset{x\in X}{\mathrm{argmax}}\,u\left(x\mid D\right),\qquad {x}^{+}=\underset{x_i\in \{x_1,\dots ,x_t\}}{\mathrm{argmax}}\,f(x_i)$$
(3)

PI favors points that are likely to improve on the best value observed so far. It selects the point where the probability of improvement is maximized [27, 28], with a trade-off parameter ε added [38, 39], as in Eq. 4, where Φ(·) is the standard normal cumulative distribution function and μ(x) and σ(x) are the posterior mean and standard deviation of the GP at x. The search terminates when the number of iterations reaches the maximum.

$$\mathrm{PI}\left(x\right)=P\left(f\left(x\right)\ge f\left({x}^{+}\right)+\varepsilon \right)=\Phi \left(\frac{\mu \left(x\right)-f\left({x}^{+}\right)-\varepsilon }{\sigma (x)}\right)$$
(4)

where ε is a parameter that tunes the tradeoff between exploration and exploitation.
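Eq. 4 translates directly into code. The sketch below assumes a maximization problem and takes the GP posterior mean μ(x) and standard deviation σ(x) at three candidate points as given; the candidate values and the function name `probability_of_improvement` are hypothetical. It also shows how ε shifts PI from exploitation toward exploration.

```python
import numpy as np
from scipy.stats import norm

def probability_of_improvement(mu, sigma, f_best, eps=0.01):
    """PI(x) = Phi((mu(x) - f(x+) - eps) / sigma(x)), as in Eq. 4 (maximization)."""
    mu = np.asarray(mu, dtype=float)
    # Guard against zero predictive std at already-sampled points
    sigma = np.maximum(np.asarray(sigma, dtype=float), 1e-12)
    z = (mu - f_best - eps) / sigma
    return norm.cdf(z)

# Hypothetical posterior mean/std at three candidate points; best observed value is 0.88
mu = [0.80, 0.90, 0.85]
sigma = [0.05, 0.01, 0.20]

pi = probability_of_improvement(mu, sigma, f_best=0.88, eps=0.01)
next_idx = int(np.argmax(pi))            # small eps: exploit the confident point

pi_explore = probability_of_improvement(mu, sigma, f_best=0.88, eps=0.1)
explore_idx = int(np.argmax(pi_explore))  # larger eps: favor the uncertain point
```

With ε = 0.01, PI picks the second point, whose mean is just above the incumbent and whose uncertainty is small; raising ε to 0.1 moves the choice to the third, more uncertain point.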

The BO process iterates until the maximum number of iterations is reached. BO makes the search efficient by using all the information gathered in the optimization history [39]. The pseudocode of BO is given in Algorithm 1.
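The loop described above can be sketched in Python. This is not the study's implementation (the study used MATLAB's `bayesopt` for the DNNs); it is a minimal illustration assuming a one-dimensional toy objective, a discretized search space, scikit-learn's `GaussianProcessRegressor` with a Matérn 5/2 kernel as the SM, and PI (Eq. 4) as the AF. The names `objective` and `bayes_opt` are hypothetical.

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def objective(x):
    # Hypothetical black-box function to maximize (stands in for a CV score)
    return -(x - 2.0) ** 2

def bayes_opt(f, low, high, n_init=3, max_iter=20, eps=0.01, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(low, high, 401).reshape(-1, 1)  # discretized search space X
    X = rng.uniform(low, high, size=(n_init, 1))             # initial design
    y = np.array([f(x[0]) for x in X])
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-6,
                                  normalize_y=True, random_state=seed)
    for _ in range(max_iter):                                 # stop at the iteration limit
        gp.fit(X, y)                                          # update the SM with D
        mu, sigma = gp.predict(candidates, return_std=True)
        sigma = np.maximum(sigma, 1e-12)
        pi = norm.cdf((mu - y.max() - eps) / sigma)           # PI acquisition (Eq. 4)
        x_next = candidates[np.argmax(pi)]                    # maximize the AF
        X = np.vstack([X, x_next])                            # evaluate f and extend D
        y = np.append(y, f(x_next[0]))
    best = int(np.argmax(y))                                  # x+ : best observed point
    return X[best, 0], y[best]

best_x, best_y = bayes_opt(objective, 0.0, 4.0)
```

On this toy problem the loop should approach the maximizer x = 2; in an HP-tuning setting, `objective` would instead train a model with the candidate HPs and return its cross-validated score.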


3.2 Deep neural networks

DL is a subfield of ML concerned with algorithms inspired by the structure and function of the brain, called artificial neural networks (ANN). DNNs are the models through which DL applications are carried out; they are built from layers such as convolutional, pooling, and fully connected layers. Many models, including those used in our study, have been developed from these layers, and detailed information about them can be found in [40]. Beyond these common layers, each model has its own distinguishing features.

3.2.1 ResNet-50

The ResNet-50 architecture has four stages. Each ResNet [41] architecture begins with a convolution with a 7 × 7 kernel followed by max pooling with a 3 × 3 kernel. Each stage of a ResNet consists of several residual blocks. In our study, 1000 features were extracted from the "fc1000" layer of this model.

3.2.2 MobileNetv2

MobileNetv2 [42] introduces a new CNN layer module, the inverted residual with a linear bottleneck, which provides high accuracy and efficiency in mobile and embedded vision applications. Designed especially for devices with low computing power, the model reduces the computational cost of the network as well as the model size. In our study, 1000 features were again extracted, this time from the "Logits" layer of this model.
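Much of this efficiency comes from depthwise separable convolutions, on which MobileNetv2's inverted residual blocks are built. The arithmetic below compares the weight counts of a standard 3 × 3 convolution and its depthwise separable counterpart; the 64-channel sizes are illustrative assumptions, and biases and batch normalization are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    # One k x k filter for every (input channel, output channel) pair
    return k * k * c_in * c_out

def depthwise_separable_params(k, c_in, c_out):
    # One k x k depthwise filter per input channel, then a 1 x 1 pointwise projection
    return k * k * c_in + c_in * c_out

std = standard_conv_params(3, 64, 64)          # 9 * 64 * 64
sep = depthwise_separable_params(3, 64, 64)    # 9 * 64 + 64 * 64
ratio = sep / std                               # equals 1 / c_out + 1 / k**2
```

For these sizes the separable version needs 4672 weights instead of 36864, roughly 8 times fewer, and the ratio 1/c_out + 1/k² shows that the saving grows with the number of output channels.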

Structures of the models are summarized in Table 1.

Table 1 Structure of models

3.3 Machine learning algorithms

3.3.1 Support vector machine

SVM is one of the basic approaches to supervised learning. It is widely used in classification and regression applications, and frequently also in clustering [43, 44], feature selection [45,46,47], and feature extraction [48, 49]. SVM is based on statistical learning theory [50, 51] and is a distribution-free learning algorithm, since it does not require knowledge of the joint distribution function. Its basic working principle is to find a hyperplane that optimally separates the samples of two classes [52]. SVM applies the structural risk minimization principle, which minimizes both the empirical error and the complexity of the learner [50]. In this study, the C, degree, and kernel parameters of the SVM were obtained through HP tuning. SVM is illustrated in Fig. 1.

Fig. 1

Support vector machine

Equations (5) and (6) give the equation of a hyperplane and of a line, respectively. SVM must find weights such that the data points are separated according to a decision rule.

$$wx + b = 0$$
(5)
$$y = mx + b$$
(6)

Here w is the weight vector, x is the input vector, and b is the bias; in Eq. (6), m is the slope of the line. C is a regularization parameter whose value is set during HP optimization. The higher the value of C, the narrower the margin and the more heavily misclassifications are penalized. As C decreases, samples are allowed to fall inside the margin or on the wrong side of it, because keeping the margin between the two classes as wide as possible becomes the goal of SVM [53]. The degree parameter, used by the polynomial kernel, determines the flexibility of the decision boundary. The lowest-order polynomial corresponds to the linear kernel, which is not sufficient when there is a nonlinear relationship between the features. Increasing the degree also increases the training time. The kernel parameter has a strong influence on the decision boundary because it selects the type of hyperplane: the linear kernel uses a linear hyperplane, whereas the rbf, sigmoid, and poly kernels produce nonlinear decision boundaries.
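The roles of w, b, C, kernel, and degree can be seen in a small scikit-learn sketch. It uses synthetic two-class data from `make_blobs` as a stand-in for the deep features (an assumption, not the study's data), checks that a linear SVC's decision value equals w·x + b (Eq. 5), and compares the number of support vectors for a small and a large C.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two classes of synthetic points (stand-in for deep features)
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.0, random_state=0)

# Linear kernel: decision value is w.x + b (Eq. 5); the predicted class follows its sign
lin = SVC(kernel="linear", C=1.0).fit(X, y)
w, b = lin.coef_[0], lin.intercept_[0]
manual = X @ w + b

# Small C -> wider margin, more points inside it, hence more support vectors
n_sv = {C: SVC(kernel="linear", C=C).fit(X, y).n_support_.sum() for C in (0.01, 100.0)}

# Nonlinear kernels; `degree` is only used by the polynomial kernel
poly = SVC(kernel="poly", degree=3, C=1.0).fit(X, y)
rbf = SVC(kernel="rbf", C=1.0).fit(X, y)
```

Inspecting `n_sv` shows many more support vectors at C = 0.01 than at C = 100, which is the margin-width trade-off described above.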

3.3.2 k-Nearest neighbor algorithm

The kNN algorithm is a nonparametric classification method that is simple yet effective [54]. The kNN classifier assigns a sample of unknown class to the class of its most similar examples [55]. The first choice to be made in the kNN algorithm is how to measure the distance between samples; the Euclidean, Manhattan, and Minkowski distances are commonly used.

The Euclidean distance, the most widely used in practice, between samples \(X_i\) and \(X_j\) is defined as [56]:

$$d({X}_{i},{X}_{j})=\sqrt{{({X}_{i1}-{X}_{j1})}^{2}+{({X}_{i2}-{X}_{j2})}^{2}+\dots +{({X}_{in}-{X}_{jn})}^{2}}$$
(7)
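Eq. 7 is the p = 2 case of the Minkowski distance, and the Manhattan distance is the p = 1 case. The short sketch below (the function name `minkowski` is ours) computes both for two example vectors.

```python
import numpy as np

def minkowski(xi, xj, p=2):
    """Minkowski distance: p = 2 is Euclidean (Eq. 7), p = 1 is Manhattan."""
    diff = np.abs(np.asarray(xi, dtype=float) - np.asarray(xj, dtype=float))
    return float(np.sum(diff ** p) ** (1.0 / p))

a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]
euclid = minkowski(a, b, p=2)      # sqrt(3^2 + 4^2 + 0^2) = 5.0
manhattan = minkowski(a, b, p=1)   # 3 + 4 + 0 = 7.0
```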

The second choice is the parameter k, which determines the number of neighbors considered. Choosing an appropriate k value significantly affects the success of the classification. There are many ways to choose k, but the simplest is to run the algorithm with different k values and select the one with the best performance [57]. A very small k makes the decision sensitive to individual samples and can create spurious decision regions, whereas a very large k merges regions so that fewer classes are distinguished than should be, which increases the classification error. In general, larger k values are more robust to noise in the data and produce smoother boundaries between classes [58]. In this study, the n_neighbors (k), weights, and metric parameters were obtained by tuning.
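The effect of k and of the weights parameter can be shown with a from-scratch kNN sketch on five hypothetical training points. It uses Euclidean distance and majority voting; with weights set to "distance", each vote is weighted by 1/d, mirroring scikit-learn's `weights="distance"` option. This is an illustration, not the study's implementation, and the function name `knn_predict` is ours.

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, query, k=3, weights="uniform"):
    """Classify one query point by (optionally distance-weighted) majority vote."""
    # Euclidean distance from the query to every training sample (Eq. 7)
    d = np.sqrt(((X_train - query) ** 2).sum(axis=1))
    nearest = np.argsort(d)[:k]                 # indices of the k nearest neighbors
    votes = Counter()
    for i in nearest:
        # "uniform": one vote each; "distance": closer neighbors weigh more (1/d)
        w = 1.0 if weights == "uniform" else 1.0 / max(d[i], 1e-12)
        votes[int(y_train[i])] += w
    return votes.most_common(1)[0][0]

# Hypothetical data: three class-0 points far away, two class-1 points close to the query
X_train = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [3.0, 3.0], [3.1, 2.9]])
y_train = np.array([0, 0, 0, 1, 1])
q = np.array([2.6, 2.6])

pred_k1 = knn_predict(X_train, y_train, q, k=1)
pred_k5 = knn_predict(X_train, y_train, q, k=5)
pred_k5_dist = knn_predict(X_train, y_train, q, k=5, weights="distance")
```

With k = 1 the query is assigned to class 1, but with k = 5 the three distant class-0 points outvote the two nearby class-1 points, an example of an overly large k; distance weighting restores the local decision (class 1).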

Figure 2 shows the neighborhoods of two query points for k = 2.

Fig. 2

A simple example of 3-nearest neighbor classification

If k = 2, point q1 is mainly labeled as cluster 2 and point q2 as cluster 1.

3.4 Proposed approach

This study consists of three steps. In the first step, the DNNs were trained separately on each dataset. The learning rate, momentum, and L2 regularization parameters required by the stochastic gradient descent (SGD) optimizer with momentum, which updates the weights during training, were found through BO using the "bayesopt" function in MATLAB. BO searched the following ranges: initial learning rate [1e-2, 1], momentum [0.8, 0.98], and L2 regularization [1e-10, 1e-2]. The parameter values that gave the best result for each trained network were recorded, and the network trained with these values was used for feature extraction in the second step.
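The three HPs tuned by BO in this step control the SGD-with-momentum update. The NumPy sketch below uses a common heavy-ball formulation (velocity v ← γv − α(∇E + λθ), then θ ← θ + v), which is not necessarily MATLAB's internal code, on a hypothetical quadratic loss. The dictionary keys mirror MATLAB's `trainingOptions` names, and the HP values are assumed, not the ones BO found in the study.

```python
import numpy as np

# Ranges searched by BO in the first step (key names mirror MATLAB trainingOptions)
SEARCH_SPACE = {
    "InitialLearnRate": (1e-2, 1.0),
    "Momentum": (0.8, 0.98),
    "L2Regularization": (1e-10, 1e-2),
}

def sgdm(grad_fn, theta0, lr, momentum, l2, n_steps=500):
    """SGD with momentum and L2 regularization (common heavy-ball form)."""
    theta = np.asarray(theta0, dtype=float).copy()
    v = np.zeros_like(theta)
    for _ in range(n_steps):
        g = grad_fn(theta) + l2 * theta   # L2 term adds lambda * theta to the gradient
        v = momentum * v - lr * g         # velocity accumulates past update directions
        theta = theta + v
    return theta

target = np.array([1.0, -2.0])

def grad(theta):
    # Gradient of the toy loss 0.5 * ||theta - target||^2
    return theta - target

# Hypothetical values that BO might return, checked against the search ranges
hp = {"InitialLearnRate": 0.05, "Momentum": 0.9, "L2Regularization": 1e-3}
assert all(lo <= hp[name] <= hi for name, (lo, hi) in SEARCH_SPACE.items())

theta = sgdm(grad, [0.0, 0.0], hp["InitialLearnRate"], hp["Momentum"], hp["L2Regularization"])
# With L2 regularization the minimizer shrinks from target to target / (1 + lambda)
```

Because each training run with a new HP setting is expensive for a DNN, evaluating this update many times per setting is exactly the cost that motivates BO over exhaustive search.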

In the second step, features were extracted from the DNN models trained on each dataset. These features are the network activations of a chosen layer. For each image, 1000 features were extracted from the "Logits" layer for MobileNetv2 and the "fc1000" layer for ResNet-50. Feature extraction was performed separately for each dataset, and the features were saved as *.mat files.
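Because the features are produced in MATLAB and classified in Python, the *.mat files act as the bridge between the two steps. The sketch below writes and reads such a file with SciPy; the variable names `features` and `labels` and the random 10 × 1000 matrix are hypothetical stand-ins for the saved activations (in MATLAB, the activations of a layer such as "fc1000" can be obtained with the Deep Learning Toolbox `activations` function).

```python
import os
import tempfile

import numpy as np
from scipy.io import loadmat, savemat

# Hypothetical stand-in for MATLAB output: 10 images x 1000 activations
features = np.random.default_rng(0).normal(size=(10, 1000))
labels = np.array([0, 1] * 5)

path = os.path.join(tempfile.mkdtemp(), "features.mat")
savemat(path, {"features": features, "labels": labels})   # MATLAB-compatible .mat file

data = loadmat(path)
X = data["features"]              # (10, 1000) feature matrix
y = data["labels"].ravel()        # 1-D arrays come back as 1 x n, so flatten them
```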

In the last step, the extracted features were classified with ML algorithms implemented in Python, and BO was used to find their HPs. GS was also used to find these parameters so that its results could be compared with those of BO. Five-fold cross-validation was used to ensure the reliability of both methods. SVM and kNN were the ML algorithms chosen in the proposed method. The tuned HPs were C, kernel, and degree for SVM, and n_neighbors (k), metric, and weights for kNN.
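The GS side of this step can be sketched with scikit-learn's `GridSearchCV` and five-fold cross-validation over the kNN HPs named above. The synthetic 1000-dimensional data and the candidate values are assumptions for illustration; the paper does not name the Python BO library it used, so the BO counterpart is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in for 1000-dimensional deep features (not the study's data)
X, y = make_classification(n_samples=300, n_features=1000, n_informative=20, random_state=0)

# kNN HPs tuned in the study; the candidate values here are assumptions
knn_grid = {
    "n_neighbors": [3, 5, 7, 9],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan", "minkowski"],
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)  # five-fold CV
search = GridSearchCV(KNeighborsClassifier(), knn_grid, cv=cv, scoring="accuracy")
search.fit(X, y)
best = search.best_params_
```

The same pattern applies to SVM by swapping in `SVC()` and a grid over C, kernel, and degree.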

The proposed approach is shown in Fig. 3.

Fig. 3

Flowchart of the BO-based proposed approach

4 Results

4.1 Dataset

4.1.1 Dataset 1

The first dataset was taken from [59] and includes two classes, COVID and Non-COVID. The COVID-CT-dataset contains 349 CT images showing clinical signs of COVID-19 from 216 patients, and the Non-COVID set contains 396 CT images. According to [59], the images were confirmed by a senior radiologist at Tongji Hospital, Wuhan, China, who diagnosed and treated a large number of COVID-19 patients during the outbreak between January and April. The authors of [59] also state that the images were collected from COVID-19 articles in medRxiv, bioRxiv, NEJM, JAMA, Lancet, etc. In our study, 698 images (349 from each class) were used so that both classes contain the same number of images. Of these images, 20% were used as test data for the DNNs.

4.1.2 Dataset 2

The second dataset is from Kaggle. It includes 1252 CT images with SARS-CoV-2 infection (COVID-19) and 1230 CT images without COVID-19 (Non-COVID), collected from real patients in hospitals in Sao Paulo, Brazil [60, 61].

Example images from the datasets are shown in Fig. 4, and the numbers of samples used in the training and testing phases are given in Table 2.

Fig. 4

COVID-19-infected and non-infected CT images from the datasets used (Dataset1 and Dataset2)

Table 2 Sample sizes used for Training and Testing in Dataset1 and Dataset2

4.2 Evaluation metrics

The application developed for our study was written in MATLAB and Python. It ran on a computer with 16 GB of RAM and an Intel Core i7 processor, and the models were run on an NVIDIA GeForce GTX 1070 GPU. The performance metrics [62] were obtained from a confusion matrix, an N × N matrix where N is the number of classes; since our study has two classes, the matrix is 2 × 2. The metrics are: accuracy, the proportion of all predictions that are correct; precision (positive predictive value), the proportion of predicted positive cases that are truly positive; negative predictive value, the proportion of predicted negative cases that are truly negative; sensitivity (recall), the proportion of actual positive cases correctly identified; specificity, the proportion of actual negative cases correctly identified; and the F-score, the harmonic mean of precision and recall. The confusion matrix for the 2-class case is shown in Fig. 5, and the metrics are calculated according to Eqs. 8–12.

Fig. 5

Confusion matrix for 2-class

$${S}_{e}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}}$$
(8)
$${S}_{p}=\frac{\mathrm{TN}}{\mathrm{TN}+\mathrm{FP}}$$
(9)
$$\mathrm{Pre}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}}$$
(10)
$$\text{F-score}=\frac{2\mathrm{TP}}{2\mathrm{TP}+\mathrm{FP}+\mathrm{FN}}$$
(11)
$$\mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{TN}+\mathrm{FP}+\mathrm{FN}}$$
(12)
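Eqs. 8–12 can be computed directly from the four confusion-matrix counts. The sketch below uses hypothetical labels (1 = COVID, 0 = Non-COVID) and scikit-learn's `confusion_matrix`, whose flattened 2 × 2 output for labels [0, 1] is ordered tn, fp, fn, tp; it also computes the negative predictive value described above. The function name `binary_metrics` is ours.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def binary_metrics(tp, tn, fp, fn):
    return {
        "sensitivity": tp / (tp + fn),                 # Eq. 8 (recall)
        "specificity": tn / (tn + fp),                 # Eq. 9
        "precision": tp / (tp + fp),                   # Eq. 10 (PPV)
        "npv": tn / (tn + fn),                         # negative predictive value
        "f_score": 2 * tp / (2 * tp + fp + fn),        # Eq. 11
        "accuracy": (tp + tn) / (tp + tn + fp + fn),   # Eq. 12
    }

# Hypothetical labels: 100 COVID (1) and 100 Non-COVID (0) cases, illustrative only
y_true = np.array([1] * 100 + [0] * 100)
y_pred = np.array([1] * 90 + [0] * 10 + [0] * 85 + [1] * 15)

# For labels [0, 1], sklearn flattens the 2 x 2 matrix as tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
m = binary_metrics(tp, tn, fp, fn)
```

For these counts (TP = 90, TN = 85, FP = 15, FN = 10), sensitivity is 0.90, specificity 0.85, and accuracy 0.875.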

4.3 Experiments

The experimental studies performed consist of three steps.

4.3.1 Experiment 1

In the first experiment, BO was used to find the HPs of the DNN models.

In this experiment, BO was used to find optimal values of the learning rate, momentum, and L2 regularization parameters of the SGD optimizer for the MobileNetv2 and ResNet-50 models. The results obtained with these optimal values are presented in Table 3. The MATLAB bayesopt function was used for BO, with the cross-validation loss as the objective function.

Table 3 Optimal DNN results obtained using BO in Experiment 1

As Table 3 shows, the MobileNetv2 model achieved an accuracy of 97.86% on Dataset 1. On Dataset 2, both models achieved accuracies above 99%, and their precision, recall, and F1-score values also exceed 99%. On the Mixed Dataset, created by combining both datasets, the ResNet-50 model achieved an accuracy of 98.50%. These values are derived from confusion matrices; the confusion matrices of both models for the Mixed Dataset are given in Fig. 6.

Fig. 6

Confusion matrices of MobileNetv2 and ResNet-50 for the Mixed Dataset

4.3.2 Experiment 2

In the second experiment, the features obtained from Dataset 1 and Dataset 2 were classified with the SVM and kNN algorithms. Again, BO was used to find the HPs of these models, and GS was also used to verify the reliability of the HP optimization. The HPs found for each model and each ML algorithm are given in Table 4. The results obtained after HP optimization are given for each dataset in turn: Table 5 shows the SVM and kNN results for Dataset 1, and Table 6 shows them for Dataset 2.

Table 4 HPs found with BO for the SVM and kNN algorithms
Table 5 Results obtained by the SVM and the kNN algorithms for Dataset1
Table 6 Results obtained by the SVM and the kNN algorithms for Dataset2

For Dataset 1, classifying the MobileNetv2 features with SVM yielded an accuracy of 97.85%, while kNN reached 97.14% on the same features. For the ResNet-50 features, the accuracy was 97.85%, compared with 96.42% for kNN tuned with BO; the highest accuracy for these features, 97.85%, was achieved with GS.

For Dataset 2, accuracy exceeded 99% for the features of both models, as it did after DNN training.

4.3.3 Experiment 3

The third experiment was carried out on the Mixed Dataset obtained by combining Dataset 1 and Dataset 2. The results are shown in Table 7.

Table 7 Results obtained by the SVM and the kNN algorithms for Mixed Dataset

For the Mixed Dataset, the MobileNetv2 features achieved an accuracy of 98.12% with SVM and 95.78% with kNN.

For ResNet-50, SVM achieved an accuracy of 99.064%, while kNN achieved 99.376%, the highest accuracy for the Mixed Dataset. The confusion matrix for this highest-accuracy combination is given in Fig. 7, and the ROC curves obtained in this experiment are given in Fig. 8.

Fig. 7

Confusion Matrix for ResNet features with the kNN algorithm

Fig. 8

ROC curves obtained in Experiment 3 for the MobileNetv2 and ResNet-50 features

Table 8 summarizes the accuracy values obtained at every step of our study.

Table 8 Accuracy values obtained using the SVM and kNN algorithms by the MobileNetv2 and the ResNet-50

According to this table, MobileNetv2 achieves the highest accuracy among the DNN training results, and the highest performance is obtained on Dataset 2. For Dataset 1, the SVM models tuned with GridSearchCV and with BO achieved higher accuracy than kNN. For Dataset 2, accuracy exceeds 99% for all models. For the Mixed Dataset, DNN training yielded an accuracy of 98.59% with the ResNet-50 model. On this dataset, SVM gives higher accuracy than kNN for the MobileNetv2 features, whereas for the ResNet-50 features kNN reached an accuracy of 99.37% with both GridSearchCV and BO.

5 Discussion

Many studies have recently been carried out in this field, using different datasets and different methods. Since the disease was declared a pandemic, each of these studies has contributed to the field. Our primary aim in this study was to add to this body of work, and we hope that our approach, which combines BO with DNN and ML algorithms, will be among the studies that contribute to the field. Table 9 compares the results of studies on CT images for COVID-19 with those of our approach.

Table 9 COVID-19 classification results in the literature using different methods

Table 10 compares the performance of previous publications with that of our study on these datasets. Our study addresses COVID-19 diagnosis on CT images, a topic on which research is still ongoing. One advantage of our study is that it uses a model with low computational requirements, MobileNetv2, which we expect to produce results quickly. This matters because infected individuals must be isolated from others as quickly as possible, and our approach was designed with this in mind. Compared with the other studies in this field listed in Table 10, our study achieved high accuracy. As described in the experimental results, by reporting results separately for more than one dataset, we showed that performance can differ between datasets. Another advantage of our study is that BO was tested on two DNN models and the results were reported, together with the results of ML algorithms tuned with BO on deep features. In other words, methods that are presented separately in other studies are combined in ours.
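The low computational cost of MobileNetv2 comes largely from depthwise separable convolutions, which replace a standard convolution with a per-channel (depthwise) convolution followed by a 1 x 1 (pointwise) convolution. For a D_K x D_K kernel, M input channels, N output channels and a D_F x D_F output map, the multiply-accumulate counts are D_K^2 M N D_F^2 for the standard convolution and D_K^2 M D_F^2 + M N D_F^2 for the separable one, so their ratio is 1/N + 1/D_K^2. The sketch below evaluates these formulas for one hypothetical layer size; it ignores the expansion layers of MobileNetv2's inverted residual blocks and is not a measurement of either network used in this study.

```python
def standard_conv_macs(dk: int, m: int, n: int, df: int) -> int:
    """Multiply-accumulates of a standard dk x dk convolution producing a df x df map."""
    return dk * dk * m * n * df * df


def depthwise_separable_macs(dk: int, m: int, n: int, df: int) -> int:
    """Depthwise dk x dk convolution per input channel, then a 1 x 1 pointwise convolution."""
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise


layer = dict(dk=3, m=32, n=64, df=112)  # hypothetical layer size
std = standard_conv_macs(**layer)
sep = depthwise_separable_macs(**layer)
print(std, sep, sep / std)
# std == 231211008 and sep == 29302784; the ratio is about 0.127 (= 1/64 + 1/9).
```

For 3 x 3 kernels the separable form therefore needs roughly 8 to 9 times fewer multiply-accumulates once N is large, which is the main reason such models run quickly.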

Table 10 Performance comparison of the COVID-19/normal classification in this study according to the literature

6 Conclusions

In this study, a BO-based approach for diagnosing COVID-19 on CT images is proposed. In the first stage of the study, optimum HPs were determined for the MobileNetv2 and ResNet-50 DNN models. In the second stage, these models were used for feature extraction. Two datasets from different countries were used in the experiments, and a Mixed Dataset was created by combining them; the performance of the models is also reported for this dataset. Among ML algorithms, SVM and kNN were chosen because, according to our literature review, they are the most widely used in this field. BO was again used to select optimal values for some parameters of these algorithms, and GS, another method for HP selection, was used to compare its results with those of BO. For the Mixed Dataset, the kNN algorithm with BO-selected parameters achieved an accuracy of 99.37%. The study is expected to serve as a decision support mechanism that helps experts diagnose this disease. Future studies with different models and methods could make further contributions to this field.
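BO, as used in this study to choose HPs, replaces exhaustive search with a surrogate model that decides which candidate value to evaluate next. The study's own BO implementation is not described in this section, so the following is a minimal pure-Python sketch of the standard technique under stated assumptions: a zero-mean Gaussian-process surrogate with an RBF kernel (length scale 1 and a small noise term), the expected-improvement acquisition function, the two end points of the grid as the initial design, and a toy objective standing in for validation accuracy as a function of log10 of a hypothetical SVM parameter C.

```python
import math
from typing import Callable, List, Sequence, Tuple


def rbf(a: float, b: float, ls: float) -> float:
    """Squared-exponential (RBF) kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / ls) ** 2)


def cholesky(mat: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with L L^T == mat (mat must be positive definite)."""
    n = len(mat)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(mat[i][i] - s) if i == j else (mat[i][j] - s) / low[j][j]
    return low


def forward(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L x = b by forward substitution."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def backward(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Solve L^T x = b by back substitution."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def gp_posterior(xs: Sequence[float], ys: Sequence[float], x: float,
                 ls: float = 1.0, noise: float = 1e-6) -> Tuple[float, float]:
    """Posterior mean and standard deviation of a zero-mean GP at x."""
    kmat = [[rbf(a, b, ls) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
            for i, a in enumerate(xs)]
    low = cholesky(kmat)
    alpha = backward(low, forward(low, ys))  # K^{-1} y
    k_star = [rbf(a, x, ls) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = forward(low, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean: float, std: float, best: float) -> float:
    """Expected improvement over `best` (maximisation) for a Gaussian prediction."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayes_opt(objective: Callable[[float], float], candidates: Sequence[float],
              n_iter: int, ls: float = 1.0) -> Tuple[float, float]:
    """Maximise `objective` over a finite candidate set with a GP + EI loop."""
    xs = [candidates[0], candidates[-1]]  # initial design: the grid end points
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        remaining = [c for c in candidates if c not in xs]
        if not remaining:
            break
        mu = sum(ys) / len(ys)
        sd = math.sqrt(sum((y - mu) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mu) / sd for y in ys]  # standardise scores to match the unit prior
        best = max(zs)
        nxt = max(remaining,
                  key=lambda c: expected_improvement(*gp_posterior(xs, zs, c, ls), best))
        xs.append(nxt)
        ys.append(objective(nxt))
    i_best = max(range(len(ys)), key=lambda i: ys[i])
    return xs[i_best], ys[i_best]


# Toy objective (hypothetical) standing in for validation accuracy versus log10(C).
log_c_grid = [-3, -2, -1, 0, 1, 2, 3]
best_log_c, best_score = bayes_opt(lambda x: -(x - 1) ** 2, log_c_grid, n_iter=3)
print(best_log_c, best_score)
```

Each iteration refits the surrogate to all scores seen so far and evaluates only the candidate with the largest expected improvement, which is why BO can need fewer evaluations than GS when each evaluation, such as training a classifier, is expensive.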