1 Introduction

COVID-19 emerged in early December 2019 and rapidly turned into a pandemic. According to World Health Organization (WHO) data, 227,940,972 people had been infected and 4,682,899 had died of the disease worldwide at the time of writing [1]. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the virus that causes COVID-19 [2]. Common symptoms of COVID-19 include fever, muscle pain, dry cough, headache, sore throat, and chest pain [3, 4]. Because of these symptoms, COVID-19 is regarded as a respiratory tract disease. Symptoms may take 2 to 14 days to appear in a person who has been infected with the virus [5]. Despite recent attempts at finding a treatment, such as a drug or vaccine, no viable solution to COVID-19 has been found yet. Medical imaging techniques such as X-ray and computed tomography (CT) are important tools in the diagnosis of COVID-19 cases [6, 7]. Because the coronavirus usually causes lung infections, chest X-ray and CT images are widely used by physicians and radiologists for accurate and quick diagnosis of infected patients.

The polymerase chain reaction (PCR) test is widely used for the diagnosis of COVID-19. However, the test is not always accessible at all healthcare points. It must also be noted that, compared to PCR tests, X-ray and CT-based imaging techniques are usually more reliable and accessible. When CT and X-ray methods are compared, X-ray machines are preferred by radiologists and physicians because they are available in nearly every location, including remote rural areas, are cost-effective, and can perform imaging in a fairly short period of time [5]. However, evaluating patients’ X-ray images is time-consuming for physicians and radiologists. Furthermore, it carries the risk of inaccurate diagnosis because detecting infected areas in an image requires technical know-how and medical experience. Therefore, an accurate and quick computer-assisted diagnosis system is needed for COVID-19 cases. The literature indicates that deep learning (DL) algorithms have been used successfully to diagnose COVID-19 in X-ray images [5, 8,9,10,11,12].

Introduced by Kononen [13] in 1998, tissue microarray (TMA) is an innovative and high-performance technique used for the analysis of multiple tissue samples. It is a high-end technology with a remarkable performance and has recently been used in the analysis of molecular identifiers. There is sufficient evidence to claim that epidermal growth factor receptor (EGFR) plays an important role in tumor development [14]. In parallel with this, it was also observed that EGFR plays an important role in the initiation and progress of colorectal cancer [15].

The present study proposes a convolutional neural network (CNN) classification approach whose hyperparameters are optimized using the gradient-based optimizer (GBO) algorithm [16]. CNN is the most widely used DL model. The proposed approach was used to classify COVID-19, normal, and viral pneumonia cases. In addition, it can also be used to classify other types, such as epithelial and stromal regions in EGFR-colon in digitized tumor TMAs.

Real-world applications in many different fields such as medicine, agriculture, and engineering can be approached as an optimization problem. To this day, numerous optimization approaches have been developed in order to solve real-world problems in an effective way. However, high-performance optimization approaches are needed due to the fact that the difficulty of these optimization problems is increasing day by day. In this respect, metaheuristic algorithms (MAs), which are known as global optimization techniques, have been widely used to solve challenging optimization problems [17,18,19,20,21,22].

Artificial neural network (ANN) is an important machine learning approach inspired by the human nervous system. It consists of an input layer, one or more hidden layers, and an output layer, and its training process aims to find optimal values for the weights of the neurons [23]. The performance of an ANN structure is heavily affected by the amount and variety of training data. If too little data is used in the training process, the performance of the ANN is very likely to decrease.

Various changes have been so far applied to ANN structure to design feedback and multi-layer model structures, which paved the way for the solution of non-linear problems. With the advent of multi-layer neural network models, the number of layers in an ANN structure has also increased and led to the development of CNN, which is a high-performance version of ANN models. Introduced during the 1990s, CNN was not preferred due to computer hardware incapacity in this period [23]. However, thanks to the technological developments in computer hardware and graphical processing unit (GPU) in the following years, CNN performances have also increased remarkably in recent years, and it became one of the most widely used machine learning approaches in various fields such as health, transportation, security, stock exchange, and law.

Various CNN architectures have been so far proposed in the existing literature, as manifested by several examples such as MobileNet-V2, ShuffleNet, GoogleNet, VGG-16, VGG-19, and AlexNet. In these CNN architectures, hyperparameters such as learning rate, solver, L2 regularization, gradient threshold method, and gradient threshold are known to affect CNN performance directly. Therefore, it is not surprising that various studies in the existing literature attempted to offer solutions to the optimization of these hyperparameters.

The present study benefited from AlexNet, DarkNet-19, Inception-v3, MobileNet, ResNet-18, and ShuffleNet architectures for the proposed approach, i.e., a COVID-19 and colon cancer diagnosis system with hyperparameters optimized using GBO. In order to optimize hyperparameters such as learning rate, solver, L2 regularization, gradient threshold method, and gradient threshold in these architectures, the GBO algorithm proposed by Ahmadianfar et al. [16] was used. Inspired by Newton’s method, GBO is one of the most recent metaheuristic optimization approaches. The present study aims to optimize the hyperparameters of AlexNet, DarkNet-19, Inception-v3, MobileNet, ResNet-18, and ShuffleNet and thereby increase their classification performance.

The main contributions of the present study can be summarized as follows:

1) The present study proposes a high-performance approach which can classify both COVID-19 and colon cancer in TMAs. No approach which can classify both diseases has been so far proposed in the current literature.

2) The proposed COVID-CCD-Net approach benefits from GBO [16] algorithm proposed in 2020 in order to optimize hyperparameters in AlexNet, DarkNet-19, Inception-v3, MobileNet, ResNet-18, and ShuffleNet.

3) The present study aims to obtain a high level of accuracy with a low value of epoch in AlexNet, DarkNet-19, Inception-v3, MobileNet, ResNet-18, and ShuffleNet architectures in the proposed COVID-CCD-Net approach. On the other hand, the non-optimized CNN methods obtained a much lower level of accuracy with the same value of epoch.

The organization of the present study is as follows: Section 2 describes the related works. Section 3 presents gradient-based optimizer and convolutional neural networks. Section 4 describes the proposed COVID-CCD-Net approach. Section 5 presents experiments and results, and Section 6 concludes the study.

2 Related works

2.1 Hyperparameter optimization

In order to optimize hyperparameters in CNN, various approaches have been proposed so far, such as adaptive gradient optimizer [24], Adam optimizer [25], Bayesian optimization [26], equilibrium optimization [27], evolutionary algorithm [28], genetic algorithm [29], grid search [30], particle swarm optimization [31, 32], random search [30, 33], simulated annealing [33], tree-of-Parzen estimators [33], whale optimization algorithm [34], and weighted random search [35].

Grid search is an exhaustive search algorithm that aims to identify the best hyperparameter values within a manually specified subset of the hyperparameter space [36]. However, since the grid of configurations grows exponentially with the number of hyperparameters, the algorithm is often not useful for the optimization of deep neural networks [36]. During hyperparameter optimization in CNN, it may take a few hours or a whole day to evaluate a single hyperparameter selection, which causes serious computational problems. Like grid search, random search also has difficulty sampling a sufficient number of points to be evaluated [37].
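To make the exponential growth concrete, the following minimal sketch counts the configurations of a full grid. The figures used in the usage note (5 candidate values per hyperparameter, 2 hours per evaluation) are illustrative assumptions and are not taken from the cited studies.

```python
def grid_size(values_per_hyperparameter: int, num_hyperparameters: int) -> int:
    """Number of configurations in a full grid over all hyperparameters."""
    return values_per_hyperparameter ** num_hyperparameters


def grid_cost_hours(values_per_hyperparameter: int,
                    num_hyperparameters: int,
                    hours_per_evaluation: float) -> float:
    """Sequential training time if every grid point is evaluated exactly once."""
    return grid_size(values_per_hyperparameter, num_hyperparameters) * hours_per_evaluation
```

Under these assumptions, `grid_size(5, 5)` gives 3125 configurations for five hyperparameters, and `grid_cost_hours(5, 5, 2.0)` gives 6250 hours of sequential training.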

Bayesian optimization has recently become a popular technique for hyperparameter optimization [38]. One of the main advantages of Bayesian optimization–based neural network optimization is that it does not require running the neural network completely. On the other hand, its complexity and the high-dimensional hyperparameter space make Bayesian optimization an impractical and expensive approach for hyperparameter optimization [36].

One of the biggest disadvantages of genetic algorithm is that it often becomes stuck in a local optimum, which leads to premature convergence and non-optimal solutions [39]. Therefore, hyperparameter optimization techniques which benefit from genetic algorithm–based approaches are also likely to be problematic.

Lima [33] compared various hyperparameter optimization algorithms such as random search, simulated annealing, and tree-of-Parzen estimators in order to find the most effective CNN architecture for the classification of benign and malignant small pulmonary nodules. Kumar and Hati [24] proposed the adaptive gradient optimizer–based deep convolutional neural network (ADG-dCNN) approach for bearing and rotor fault detection in squirrel cage induction motors. Ilievski et al. [40] used a radial basis function (RBF) as a surrogate in hyperparameter optimization in order to reduce the complexity of the original network. Talathi [41] proposed a simple sequential model-based optimization algorithm in order to optimize hyperparameters in deep CNN architectures.

Rattanavorragant and Jewajinda proposed an approach using an island-based genetic algorithm in order to optimize hyperparameters in DNN automatically [42]. This approach involves two steps: hyperparameter search and a detailed DNN training. Navaneeth and Suchetha proposed the optimized one-dimensional CNN with support vector machine (1-D CNN-SVM) approach in order to diagnose chronic kidney diseases using PSO algorithm [43].

Compared to the literature review above, the main contribution of the present study is that the proposed COVID-CCD-Net approach can detect two important diseases: COVID-19 and colon cancer in TMAs. In addition, the proposed approach benefits from GBO, which is a metaheuristic approach, for the optimization of CNN models to overcome various problems mentioned in the existing literature.

2.2 Deep learning approaches for COVID-19

In recent times, many studies focusing on the diagnosis of COVID-19 using CNN have been published [44,45,46,47,48,49,50]. The literature review indicates that some of these studies [45,46,47] focused on distinguishing COVID-19 cases from non-COVID cases. On the other hand, there are also studies which classified cases into three groups: COVID, normal, and pneumonia [48,49,50]. Within the framework of the present study, the proposed COVID-CCD-Net approach classifies chest X-ray images into three groups: COVID, normal, and pneumonia.

Shi et al. [51] performed a detailed literature review regarding the state-of-the-art computer-assisted methods for the diagnosis of COVID-19 in X-ray and CT scans. Castiglioni et al. [52] benefited from two chest X-ray datasets containing 250 COVID-19 and 250 non-COVID cases in order to perform training, validation, and testing processes for Resnet-50.

Hemdan et al. [53] proposed a deep learning–based approach called COVIDX-Net in order to diagnose COVID-19 in chest X-ray images automatically. This study involved seven different deep architectures, namely MobileNetV2, VGG19, InceptionV3, DenseNet201, InceptionResNetV2, ResNetV2, and Xception. Khan et al. [54] proposed a CNN-based approach called CoroNet in order to diagnose COVID-19 using X-ray and CT scans based on Xception architecture. The experimental studies demonstrated that the proposed model yielded an overall accuracy rate of 89.6% in four different classes (COVID vs. pneumonia bacterial vs. pneumonia viral vs. normal) and an overall accuracy rate of 95% in three different classes (normal vs. COVID vs. pneumonia).

The proposed COVID-CCD-Net approach differs from other studies on the detection of COVID-19 using CNN models in that it improves classification performance by optimizing hyperparameters of CNN models thanks to GBO approach.

2.3 Computer-aided colon cancer detection approaches

As can be seen in various studies in the existing literature, the number of studies dealing with automatic diagnosis of colon cancer in TMAs is limited. Nguyen et al. [55] analyzed different ensemble approaches for colorectal tissue classification using highly efficient TMAs and proposed an ensemble deep learning–based approach with two different neural network architectures called VGG16 and CapsNet. Thanks to this approach, they classified colorectal tissues in highly efficient TMAs into three different categories, namely tumor, normal, and stroma/others.

Xu et al. [56] proposed a deep CNN approach in order to perform the segmentation and classification of epithelial and stromal regions in TMAs. This study benefited from two different datasets containing breast and colorectal cancer images. Finally, Linder et al. [57] proposed an approach for the automatic detection of epithelial and stromal regions in colorectal cancer TMAs using texture features and an SVM classifier.

The proposed COVID-CCD-Net approach is superior to other studies on the detection of colon cancer in TMAs using CNN models in that it optimizes the hyperparameters of CNN models, which significantly increases the detection accuracy rates of colon cancer. The effective performance of CNN in image classification also gives the present study an advantage over studies in the existing literature that use other approaches for the classification of colon cancer in TMAs.

3 Theoretical background

3.1 Gradient-based optimizer

Inspired by gradient-based Newton’s method, GBO was proposed by Ahmadianfar et al. [16] as one of the most recent metaheuristic algorithms. This algorithm is based on two main operators: gradient search rule (GSR) and local escaping operator (LEO). Main steps of GBO are described below.

3.1.1 Initialization process

In GBO, each member of the population is called a “vector” and, as seen in Eq. 1, the population consists of N number of vectors in a D-dimension search space.

$${X}_{n,d}=\left[{X}_{n,1},{X}_{n,2},\dots, {X}_{n,D}\right],n=1,2,\dots, N,d=1,2,\dots, D$$
(1)

As shown in Eq. 2, each vector in the initial population is created by assigning random values within the boundaries of search space.

$${X}_n={X}_{\mathrm{min}}+\operatorname{rand}\left(0,1\right)\times \left({X}_{\mathrm{max}}-{X}_{\mathrm{min}}\right)$$
(2)

Here, Xmin and Xmax are lower and upper boundaries in the search space, respectively, while rand(0,1) is a random number in a range of [0,1].
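The initialization in Eq. 2 can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `init_population` is hypothetical, and drawing a separate random number for every component of every vector is an assumption, since Eq. 2 does not state whether rand(0,1) is drawn per vector or per dimension.

```python
import random


def init_population(n_vectors, lower, upper, rng=None):
    """Create N vectors with components drawn uniformly between the bounds (Eq. 2).

    lower and upper are per-dimension bounds of equal length D.
    Assumption: one independent rand(0, 1) draw per component.
    """
    rng = rng or random.Random()
    return [
        [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
        for _ in range(n_vectors)
    ]
```

For example, `init_population(10, [0.0, -1.0], [1.0, 1.0])` returns 10 two-dimensional vectors whose components lie within the given bounds.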

3.1.2 Gradient search rule

The GSR operator is used in GBO in order to increase exploration ability, avoid local minima, and accelerate the convergence rate. Thus, optimal solutions can be obtained within the search space [16].

The position of a vector in the next iteration, $x_n^{m+1}$, is calculated using Eqs. 3 and 4 from $X1_n^m$, $X2_n^m$, and $x_n^m$, where $x_n^m$ denotes the current position of the vector.

$${x}_n^{m+1}={r}_a\times \left({r}_b\times X{1}_n^m+\left(1-{r}_b\right)\times X{2}_n^m\right)+\left(1-{r}_a\right)\times X{3}_n^m$$
(3)
$$X{3}_n^m={x}_n^m-{p}_1\times \left(X{2}_n^m-X{1}_n^m\right)$$
(4)

$r_a$ and $r_b$ are random numbers in the range [0, 1]. $X1_n^m$ and $X2_n^m$ in these equations are defined as follows:

$$X{1}_n^m={x}_n^m-\mathrm{GSR}+ DM$$
(5)
$$X{2}_n^m={x}_{\mathrm{best}}-\mathrm{GSR}+ DM$$
(6)

Here, $x_n^m$ and $x_{\mathrm{best}}$ are the current position and the best vector in the population, respectively. GSR denotes the gradient search rule, while DM represents the direction of movement. GSR introduces randomness into the search, which improves the exploration ability of GBO and helps it escape local minima. GSR is calculated as shown in the following equations [16]:

$$\mathrm{GSR}=\mathrm{randn}\times {p}_1\times \frac{2\Delta x\times {x}_n}{\left({x}_{\mathrm{worst}}-{x}_{\mathrm{best}}+\varepsilon \right)}$$
(7)
$$\Delta x=\operatorname{rand}\left(1:N\right)\times \mid \mathrm{step}\mid$$
(8)
$$\mathrm{step}=\frac{\left({x}_{\mathrm{best}}-{x}_{r1}^m\right)+\delta }{2}$$
(9)
$$\delta =2\times \operatorname{rand}\times \left(\left|\frac{x_{r1}^m+{x}_{r2}^m+{x}_{r3}^m+{x}_{r4}^m}{4}-{x}_n^m\right|\right)$$
(10)

Here, rand(1:N) is an N-dimensional random number; r1, r2, r3, and r4 denote random integers selected from the range [1, N]; and step represents the step size. $x_{\mathrm{worst}}$ is the worst vector in the population, and ε is a small constant.

DM shown in Eq. 11 helps the current position of the vector (xn) move along the direction of xbest - xn and thus provides local searching in order to improve convergence speed of GBO [16].

$$DM=\operatorname{rand}\times {p}_2\times \left({x}_{\mathrm{best}}-{x}_n\right)$$
(11)
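Equations 3 to 11 can be sketched for a single vector as follows. This is an illustrative reading of the equations, not the reference implementation: the function name `gsr_update`, the default value of `eps`, the component-wise random draws, and the independent sampling of r1 to r4 (repeats allowed) are all assumptions, because the text does not fix these details. The balance parameters p1 and p2 are passed in as arguments.

```python
import random


def gsr_update(x_n, x_best, x_worst, population, p1, p2, eps=1e-3, rng=None):
    """Return the next position of x_n following Eqs. 3 to 11 (component-wise)."""
    rng = rng or random.Random()
    # Assumption: r1..r4 are drawn independently, so repeats are possible.
    r1, r2, r3, r4 = (rng.randrange(len(population)) for _ in range(4))
    new_x = []
    for d in range(len(x_n)):
        mean_r = (population[r1][d] + population[r2][d]
                  + population[r3][d] + population[r4][d]) / 4.0
        delta = 2.0 * rng.random() * abs(mean_r - x_n[d])        # Eq. 10
        step = ((x_best[d] - population[r1][d]) + delta) / 2.0   # Eq. 9
        dx = rng.random() * abs(step)                            # Eq. 8
        gsr = (rng.gauss(0.0, 1.0) * p1 * 2.0 * dx * x_n[d]
               / (x_worst[d] - x_best[d] + eps))                 # Eq. 7
        dm = rng.random() * p2 * (x_best[d] - x_n[d])            # Eq. 11
        x1 = x_n[d] - gsr + dm                                   # Eq. 5
        x2 = x_best[d] - gsr + dm                                # Eq. 6
        x3 = x_n[d] - p1 * (x2 - x1)                             # Eq. 4
        ra, rb = rng.random(), rng.random()
        new_x.append(ra * (rb * x1 + (1.0 - rb) * x2)
                     + (1.0 - ra) * x3)                          # Eq. 3
    return new_x
```

A useful sanity check on this sketch is that with p1 = p2 = 0 both GSR and DM vanish, so every component of the result lies between the corresponding components of the current vector and the best vector.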

Global exploration and local exploitation must be balanced in an algorithm in order to find solutions closer to a global optimal value. p1 and p2 parameters in Eqs. 4, 7, and 11 are used to balance exploration and exploitation in GBO [16]. These parameters are calculated using the following equations:

$${p}_1={p}_2=2\times \operatorname{rand}\times \alpha -\alpha$$
(12)
$$\alpha =\left|\beta \times \sin \left(\frac{3\pi }{2}+\sin \left(\beta \times \frac{3\pi }{2}\right)\right)\right|$$
(13)
$$\beta ={\beta}_{\mathrm{min}}+\left({\beta}_{\mathrm{max}}-{\beta}_{\mathrm{min}}\right)\times {\left(1-{\left(\frac{m}{M}\right)}^3\right)}^2$$
(14)

Here, βmin and βmax are 0.2 and 1.2, respectively, m denotes the current iteration number, and M represents the maximum number of iterations.
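The schedule in Eqs. 13 and 14 can be inspected with a few lines of code. The sketch below covers only β and α; the function names are hypothetical, and the default values of βmin and βmax are the ones stated above.

```python
import math


def beta(m, max_iter, beta_min=0.2, beta_max=1.2):
    """Eq. 14: decreases from beta_max at m = 0 to beta_min at m = max_iter."""
    return beta_min + (beta_max - beta_min) * (1.0 - (m / max_iter) ** 3) ** 2


def alpha(m, max_iter, beta_min=0.2, beta_max=1.2):
    """Eq. 13: non-negative value that never exceeds beta at the same iteration."""
    b = beta(m, max_iter, beta_min, beta_max)
    return abs(b * math.sin(3.0 * math.pi / 2.0 + math.sin(b * 3.0 * math.pi / 2.0)))
```

Because |sin| is at most 1, α is bounded by β, so the magnitude available to p1 and p2 shrinks as the iterations proceed, which shifts the search from exploration toward exploitation.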

3.1.3 Local escaping operator

LEO is used to improve the efficiency of GBO. It can change the position of the vector $x_n^{m+1}$ significantly. LEO creates a new vector, $X_{\mathrm{LEO}}^m$, as shown in Eqs. 15 and 16, and assigns it to $x_n^{m+1}$, as shown in Eq. 17.

$${X}_{\mathrm{LEO}}^m={x}_n^{m+1}+{f}_1\times \left({u}_1\times {x}_{\mathrm{best}}-{u}_2\times {x}_k^m\right)+{f}_2\times {p}_1\times \frac{u_3\times \left(X{2}_n^m-X{1}_n^m\right)+{u}_2\times \left({x}_{r1}^m-{x}_{r2}^m\right)}{2},\kern0.5em \mathrm{if} \operatorname {rand}<0.5$$
(15)
$${X}_{\mathrm{LEO}}^m={x}_{\mathrm{best}}+{f}_1\times \left({u}_1\times {x}_{\mathrm{best}}-{u}_2\times {x}_k^m\right)+{f}_2\times {p}_1\times \frac{u_3\times \left(X{2}_n^m-X{1}_n^m\right)+{u}_2\times \left({x}_{r1}^m-{x}_{r2}^m\right)}{2},\kern0.5em \mathrm{if} \operatorname {rand}\ge 0.5$$
(16)
$${x}_n^{m+1}={X}_{\mathrm{LEO}}^m$$
(17)

Here, $f_1$ and $f_2$ are random numbers generated in the range [−1, 1]; $u_1$, $u_2$, and $u_3$ are three different randomly generated numbers; and $x_k^m$ is a newly generated vector. $u_1$, $u_2$, $u_3$, and $x_k^m$ are defined in the following equations:

$${u}_1=\left\{\begin{array}{c}2\times \operatorname{rand},\kern0.5em \mathrm{if}\ {\mu}_1<0.5\\ {}1\kern2em ,\kern0.5em \mathrm{else}\end{array}\right.$$
(18)
$${u}_2=\left\{\begin{array}{c}\operatorname{rand},\kern0.5em \mathrm{if}\ {\mu}_1<0.5\\ {}1\kern2em ,\kern0.5em \mathrm{else}\end{array}\right.$$
(19)
$${u}_3=\left\{\begin{array}{c}\operatorname{rand},\kern0.5em \mathrm{if}\ {\mu}_1<0.5\\ {}1\kern2em ,\kern0.5em \mathrm{else}\end{array}\right.$$
(20)
$${x}_k^m=\left\{\begin{array}{c}{x}_{\mathrm{rand}},\kern0.5em \mathrm{if}\ {\mu}_2<0.5\\ {}{x}_p^m\kern1em ,\kern0.5em \mathrm{else}\end{array}\right.$$
(21)

Here, rand, μ1, and μ2 are random numbers in the range [0, 1], $x_{\mathrm{rand}}$ denotes a randomly generated new vector, and $x_p^m$ is a vector randomly selected from the population [16]. The flowchart of GBO is shown in Fig. 1.
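The local escaping operator in Eqs. 15 to 21 can be sketched as follows. Several details are assumptions rather than statements from the source: the function name `leo`, uniform sampling of f1 and f2, scalar (not per-dimension) random draws, independent selection of the vectors indexed by r1 and r2, and the use of μ2 for the choice of $x_k^m$.

```python
import random


def leo(x_next, x_best, x1, x2, population, p1, lower, upper, rng=None):
    """Return X_LEO built from Eqs. 15 to 21; the caller assigns it to x_next (Eq. 17)."""
    rng = rng or random.Random()
    f1, f2 = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
    mu1, mu2 = rng.random(), rng.random()
    u1 = 2.0 * rng.random() if mu1 < 0.5 else 1.0   # Eq. 18
    u2 = rng.random() if mu1 < 0.5 else 1.0         # Eq. 19
    u3 = rng.random() if mu1 < 0.5 else 1.0         # Eq. 20
    if mu2 < 0.5:                                   # Eq. 21: new random vector
        x_k = [lo + rng.random() * (hi - lo) for lo, hi in zip(lower, upper)]
    else:                                           # Eq. 21: random population member
        x_k = rng.choice(population)
    x_r1, x_r2 = rng.choice(population), rng.choice(population)
    # Eq. 15 starts from x_next, Eq. 16 from x_best; each is used with probability 0.5.
    base = x_next if rng.random() < 0.5 else x_best
    return [
        base[d]
        + f1 * (u1 * x_best[d] - u2 * x_k[d])
        + f2 * p1 * (u3 * (x2[d] - x1[d]) + u2 * (x_r1[d] - x_r2[d])) / 2.0
        for d in range(len(x_next))
    ]
```

Here `x1` and `x2` stand for $X1_n^m$ and $X2_n^m$ from the gradient search rule, so a caller would compute them first and then pass them in.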

Fig. 1 Flowchart of the GBO

3.2 Convolutional neural networks

Convolutional neural networks (CNNs) are a special type of neural network inspired by the biological model of the animal visual cortex [58, 59]. They are particularly used in image and sound processing because of their main advantage: automatic and adaptive feature extraction during training [60]. In CNNs, the variables that define the network structure (kernel size, stride, padding, etc.) and those that govern training (learning rate, momentum, optimization strategy, batch size, etc.) are known as hyperparameters [29], which must be set carefully for effective CNN performance.

In the present study, learning rate, solver, L2 regularization, gradient threshold method, and gradient threshold value, which are among the training hyperparameters of AlexNet, DarkNet-19, Inception-v3, MobileNet, ResNet-18, and ShuffleNet, were optimized using the GBO algorithm. The learning rate, also known as the step size, determines how strongly the weights are updated [61, 62]. The solver is the optimization method used for training, such as Adam, SGDM, or RMSProp [63]. L2 regularization, also called weight decay, is a simple regularization method that scales weights down in proportion to their current size [64, 65]. Gradient threshold method and gradient threshold value are parameters related to gradient clipping. If the gradient grows exponentially in magnitude, training is unstable and can diverge within a few iterations. Gradient clipping helps avoid this exploding gradient problem: if the gradient exceeds the gradient threshold, it is clipped according to the gradient threshold method [66, 67].
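The effect of these hyperparameters can be illustrated with a minimal sketch. The functions below are simplified stand-ins written for this explanation, not the routines of any deep learning toolbox; they operate on a flat list of gradient components, and the plain gradient step with an added L2 term is an assumed, textbook form of weight decay.

```python
import math


def clip_l2norm(grad, threshold):
    """Rescale the gradient so that its L2 norm does not exceed threshold."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= threshold:
        return list(grad)
    scale = threshold / norm
    return [g * scale for g in grad]


def clip_absolute_value(grad, threshold):
    """Limit each component to [-threshold, threshold] while keeping its sign."""
    return [max(-threshold, min(threshold, g)) for g in grad]


def l2_regularized_step(weights, grad, learning_rate, l2):
    """One gradient step on loss + (l2 / 2) * ||w||^2, i.e. weights decay toward zero."""
    return [w - learning_rate * (g + l2 * w) for w, g in zip(weights, grad)]
```

With a threshold of infinity, `clip_l2norm` returns the gradient unchanged, which corresponds to training without clipping. A global variant would apply the same norm-based rescaling to the gradients of all learnable parameters taken together instead of one parameter at a time.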

Input image size in AlexNet architecture, developed by Krizhevsky et al. [68], is 227×227. It consists of 5 convolution and 3 fully connected layers, thus reaching a depth of 8 layers. DarkNet-19 has a depth of 19 layers and its input image size is 256×256 [69]. Introduced by Szegedy et al. [70], Inception-v3 model has a depth of 48 layers with an input image size of 299×299. ResNet-18, which has a depth of 18 layers and an input image size of 224×224, was developed by He et al. [71]. Zhang et al. [72] proposed ShuffleNet model with a depth of 50 layers and an input image size of 224×224. Finally, MobileNet, which was proposed by Sandler et al. [73], has a depth of 53 layers and an input image size of 224×224.

4 Hyperparameter optimization of CNN models using gradient-based optimizer

In the present study, the hyperparameters of the AlexNet, DarkNet-19, Inception-v3, MobileNet, ResNet-18, and ShuffleNet CNN models, namely learning rate, solver, L2 regularization, gradient threshold method, and gradient threshold value, were optimized using the GBO algorithm in order to classify COVID-19, normal, and viral pneumonia cases in chest X-ray images. In addition, other types, such as epithelial and stromal regions in epidermal growth factor receptor (EGFR) colon in TMAs, can also be classified. The proposed approach is called COVID-CCD-Net, as shown in the flowchart in Fig. 2.

Fig. 2 Flowchart of COVID-CCD-Net

In the proposed COVID-CCD-Net approach, the initial parameters of GBO, such as ε, the population size, and the maximum number of iterations, are set first. Then, an initial population is created from vectors with randomly assigned values. Each vector has 5 dimensions, which represent the learning rate, solver, L2 regularization, gradient threshold method, and gradient threshold parameters of the CNN models. The lower boundary (LB) and upper boundary (UB) values of these parameters are given in Table 1. Learning rate, L2 regularization, and gradient threshold are real values randomly generated between their LB and UB values. A solver value of 1, 2, or 3 selects the “sgdm,” “adam,” or “rmsprop” optimization method, respectively. A gradient threshold method value of 1, 2, or 3 selects the “l2norm,” “global-l2norm,” or “absolute-value” method, respectively. Within these boundaries, each vector in the initial population is generated using Eq. 22:

$${X}_{n,i}={LB}_i+\operatorname{rand}\left(0,1\right)\times \left({UB}_i-{LB}_i\right),n=1,2,\dots, N\ \mathrm{and}\ i=1,2,\dots, 5$$
(22)
Table 1 Hyperparameters to be optimized and their ranges

The following steps are taken in order to calculate the fitness value of each vector: Firstly, Xn vector whose fitness value will be calculated is sent to CNN model and the values of Xn vector are assigned to learning rate, solver, L2 regularization, gradient threshold method, and gradient threshold parameters of CNN model. Later, CNN model is trained using the training dataset. Following the training processes, validation accuracy value obtained from the training is sent back to GBO and assigned as the fitness value of Xn vector.
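The fitness evaluation described above can be sketched as follows. The names `decode`, `fitness`, and `train_and_validate` are hypothetical, and rounding the two categorical dimensions to the nearest index in 1 to 3 is an assumption, since the text does not state how a real-valued component is mapped to a solver or clipping method. The training routine is supplied by the caller and is expected to return the validation accuracy.

```python
SOLVERS = {1: "sgdm", 2: "adam", 3: "rmsprop"}
CLIP_METHODS = {1: "l2norm", 2: "global-l2norm", 3: "absolute-value"}


def decode(vector):
    """Map a 5-dimensional GBO vector to named training options."""
    learning_rate, solver, l2, clip_method, clip_threshold = vector
    # Assumption: categorical dimensions are rounded and clamped to 1..3.
    solver_index = min(3, max(1, round(solver)))
    clip_index = min(3, max(1, round(clip_method)))
    return {
        "learning_rate": learning_rate,
        "solver": SOLVERS[solver_index],
        "l2_regularization": l2,
        "gradient_threshold_method": CLIP_METHODS[clip_index],
        "gradient_threshold": clip_threshold,
    }


def fitness(vector, train_and_validate):
    """Train with the decoded options and return the validation accuracy."""
    return train_and_validate(decode(vector))
```

Keeping the trainer as a callable argument means the same wrapper can be reused for each CNN architecture without changing the optimizer code.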

As shown in Fig. 2, the steps of the algorithm are repeated until the maximum number of iterations is reached. At the end, the vector with the best fitness value is accepted as the solution of the problem.

5 Experiments and results

The present study proposes the COVID-CCD-Net approach in which learning rate, solver, L2 regularization, gradient threshold method, and gradient threshold parameters of AlexNet, DarkNet-19, Inception-v3, MobileNet, ResNet-18, and ShuffleNet were optimized using GBO. The classification performance of the proposed approach was tested using two different medical image classification datasets. Additionally, the results of this test were compared with those obtained from non-optimized AlexNet, DarkNet-19, Inception-v3, MobileNet, ResNet-18, and ShuffleNet CNN models. In addition, Quasi-Newton (Q-N) algorithm [74], one of the most fundamental optimization methods, was also used to optimize the hyperparameters of CNN models and compared with the proposed COVID-CCD-Net approach. The following sub-sections describe medical image classification datasets, experiment setup, and present comparative experimental findings.

5.1 Medical image classification datasets

COVID-19 [75, 76] and Epistroma [77] datasets were selected for the experimental studies. The COVID-19 dataset consists of three classes, namely “Covid-19,” “Normal,” and “Viral Pneumonia,” with a total of 3829 images. The Epistroma dataset, on the other hand, consists of two classes, namely “epithelium” and “stroma,” with a total of 1376 images. In both datasets, 80% of the images were used for training and 20% for testing, and 5-fold cross-validation was performed. Ten percent of the training data in each dataset was also used for validation. Sample images from both datasets are shown in Fig. 3.
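One way to realize this split is sketched below. It is an assumed reading of the protocol, not the authors' script: each of the 5 folds serves once as the 20% test set, 10% of the remaining images (rounded) are held out for validation, and the shuffling is not stratified by class. The function name and the seed are hypothetical.

```python
import random


def five_fold_splits(n_images, k=5, val_fraction=0.10, seed=0):
    """Return k (train, validation, test) index triples covering all images."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train_all = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        n_val = round(val_fraction * len(train_all))
        splits.append((train_all[n_val:], train_all[:n_val], test))
    return splits
```

Under these assumptions, the 3829 COVID-19 images give test folds of 765 or 766 images and a validation set of 306 images in every fold.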

Fig. 3 Sample images from datasets

5.2 Experimental setup

All experimental studies were carried out on the MATLAB R2020a platform. The number of vectors in the GBO population and the maximum number of iterations were both set to 10 in the proposed COVID-CCD-Net approach. In other words, the fitness function is called 100 times. The Q-N algorithm searches from a single starting point rather than performing a population-based search. For a fairer comparison with the proposed approach, the maximum number of iterations was set to 100 for the Q-N algorithm so that the fitness function is also called 100 times. In addition, the default MATLAB values for the solver, L2 regularization, gradient threshold method, and gradient threshold parameters, namely “sgdm,” “0.0001,” “l2norm,” and “Inf,” respectively, were used for the non-optimized CNN models. The number of epochs for all CNN models was set to 2 for the COVID-19 dataset and 5 for the Epistroma dataset. The mini-batch size was set to 25. Twenty independent runs were conducted on these datasets for all CNN models, and the resulting mean accuracy, maximum accuracy, F1-score, and standard deviation values were compared to measure the performance of all models.
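The evaluation budget and the summary statistics over repeated runs can be sketched as follows. The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say whether the sample or population form was reported.

```python
import statistics


def fitness_evaluations(population_size, max_iterations):
    """Number of fitness calls for a population-based search with a fixed budget."""
    return population_size * max_iterations


def summarize_runs(accuracies):
    """Mean, maximum, and sample standard deviation of per-run accuracies."""
    return {
        "mean": statistics.mean(accuracies),
        "max": max(accuracies),
        "std": statistics.stdev(accuracies),  # assumption: sample standard deviation
    }
```

With the settings above, `fitness_evaluations(10, 10)` returns 100, which matches the 100 iterations given to the Q-N algorithm.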

5.3 Experimental results

Mean accuracy, maximum accuracy, F1-score, and standard deviation values obtained from 20 different independent studies on COVID-19 and Epistroma datasets are given in Tables 2 and 5, respectively. The findings were also shown in bar charts in Figs. 4 and 5 to give a clearer picture of the overall findings.

Table 2 Accuracy, F1-score and Std. dev. results of validation and test for the COVID-19 dataset

Fig. 4 Bar charts for the COVID-19 dataset

Fig. 5 Bar charts for the Epistroma dataset

The findings related to the COVID-19 dataset demonstrated that, in the training process, COVID-CCD-Net (ResNet-18) reached the highest mean validation accuracy, maximum validation accuracy, and F1-score values with 97.977, 98.532, and 98.063, respectively. The second highest values were yielded by COVID-CCD-Net (DarkNet-19) with 97.553, 98.532, and 97.654, while non-optimized MobileNet displayed a lower performance with 82.007, 86.134, and 81.716. In the testing process, COVID-CCD-Net (ResNet-18) classified test images with a mean accuracy rate of 98.107%, followed by DarkNet-19 with a mean accuracy rate of 97.369%. MobileNet displayed the lowest performance in terms of training and testing. Validation and test accuracy values for the COVID-19 dataset before and after optimization with COVID-CCD-Net are given in Table 3; the results demonstrated that COVID-CCD-Net increased the classification performance of the non-optimized CNN models by 6.22–13.29%. Performance also improved when the Q-N algorithm was used to optimize the hyperparameters of the non-optimized CNN models. However, that improvement ranged only from 2.92 to 8.40%, demonstrating that GBO displays a higher performance in hyperparameter optimization on the COVID-19 dataset.

Table 3 Validation and test accuracy before and after optimization with COVID-CCD-Net on the COVID-19 dataset

The findings related to the Epistroma dataset show that, in the training process, the highest mean accuracy, maximum accuracy, and F1-score values were obtained by COVID-CCD-Net (Inception-v3) with 99.705, 100, and 99.692, respectively. Similarly, COVID-CCD-Net (Inception-v3) also yielded the highest values in the testing process with 98.964, 99.636, and 98.924. It was followed by ResNet-18 with 99.545, 100, and 99.526 for the training and 98.836, 99.636, and 98.793 for the testing process. On the other hand, the lowest performance in the training and testing processes was displayed by non-optimized ShuffleNet with 89.454, 92.273, and 89.149 and 90.491, 93.818, and 90.270, respectively. Validation and test accuracy values for the Epistroma dataset before and after optimization with COVID-CCD-Net are given in Table 4; the results demonstrated that COVID-CCD-Net increased the classification performance of the non-optimized CNN models by 2.11–6.81%. Performance also improved when the Q-N algorithm was used to optimize the hyperparameters of the non-optimized CNN models. It can be seen in Table 5 that this improvement ranged from 1.81 to 5.43%, demonstrating that GBO displays a higher performance in hyperparameter optimization on the Epistroma dataset.

Table 4 Validation and test accuracy before and after optimization with COVID-CCD-Net on the Epistroma dataset
Table 5 Accuracy, F1-score, and Std. dev. result of validation and test for the Epistroma dataset

As shown in Tables 2 and 5, the GBO algorithm remarkably improves the performance of the non-optimized CNN models on the COVID-19 and Epistroma datasets. Additionally, the experiments indicated that the GBO algorithm outperformed the Q-N algorithm in hyperparameter optimization on both datasets.

Mean training accuracy curves of all models on the COVID-19 dataset are shown in Fig. 6. COVID-CCD-Net (ResNet-18) converged faster, whereas non-optimized MobileNet converged more slowly. Mean training accuracy curves of all models on the Epistroma dataset are shown in Fig. 7. COVID-CCD-Net (Inception-v3), COVID-CCD-Net (ResNet-18), and COVID-CCD-Net (DarkNet-19) converged quickly in the first 20 iterations and more slowly in the remaining iterations.

Fig. 6 Mean training accuracy curves for the COVID-19 dataset

Fig. 7 Mean training accuracy curves for the Epistroma dataset

Maximum and mean confusion matrix values of all models obtained from the testing processes for the COVID-19 and Epistroma datasets are shown in Fig. 8 and Fig. 9, respectively. A confusion matrix is a table that describes the performance of a model by showing, for each class, how many samples were classified correctly and how many were confused with each other class. Rows and columns in a confusion matrix correspond to the predicted class (output class) and true class (target class), respectively.
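The row and column convention described above can be illustrated with a minimal Python sketch. The label lists below are hypothetical toy data, not results from this study, and the helper names (`confusion_matrix`, `accuracy`, `per_class_recall`) are introduced here only for illustration.

```python
from collections import Counter

def confusion_matrix(true_labels, predicted_labels, classes):
    """Build matrix[i][j]: samples predicted as classes[i] whose true class is classes[j].

    Rows are predicted (output) classes and columns are true (target) classes,
    matching the convention stated in the text.
    """
    counts = Counter(zip(predicted_labels, true_labels))
    return [[counts[(pred, true)] for true in classes] for pred in classes]

def accuracy(matrix):
    """Overall accuracy: diagonal entries divided by the total number of samples."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0

def per_class_recall(matrix):
    """Per-class accuracy (recall): diagonal entry divided by its column total."""
    n = len(matrix)
    recalls = []
    for j in range(n):
        column_total = sum(matrix[i][j] for i in range(n))
        recalls.append(matrix[j][j] / column_total if column_total else 0.0)
    return recalls

# Hypothetical toy labels (NOT data from the paper).
classes = ["COVID-19", "normal", "viral pneumonia"]
true_labels = ["COVID-19", "COVID-19", "normal", "normal",
               "viral pneumonia", "viral pneumonia"]
predicted_labels = ["COVID-19", "normal", "normal", "normal",
                    "viral pneumonia", "COVID-19"]

cm = confusion_matrix(true_labels, predicted_labels, classes)
for name, row in zip(classes, cm):
    print(f"predicted {name:16s} {row}")
print("accuracy:", accuracy(cm))
print("per-class recall:", per_class_recall(cm))
```

With this layout, each column sums to the number of test samples that truly belong to that class, so the per-class rate is read down a column rather than across a row.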

Fig. 8 Confusion matrices of COVID-19 dataset

Fig. 9 Confusion matrices of Epistroma dataset

The receiver operating characteristic (ROC) curves for the COVID-19 and Epistroma datasets are provided in Fig. 10 and Fig. 11, respectively. They show the relationship between the false positive rate (FPR) and the true positive rate (TPR). It can be clearly seen that COVID-CCD-Net (ResNet-18) on the COVID-19 dataset and COVID-CCD-Net (Inception-v3) on the Epistroma dataset have higher true positive rates.
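As a rough illustration of how the FPR and TPR pairs on an ROC curve arise, the following Python sketch sweeps a decision threshold over classifier scores for a binary problem. The scores and labels are hypothetical toy values, not outputs of the models in this study, and the function names `roc_points` and `auc` are assumptions made for this example. For a three-class problem such as the COVID-19 dataset, one common option (assumed here, not confirmed by the text) is to draw one such curve per class in a one-vs-rest fashion.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points obtained by lowering the threshold over the scores.

    labels: 1 for the positive class, 0 for the negative class.
    Samples with tied scores are processed together, giving one point per threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present to draw an ROC curve")

    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Accept every sample whose score equals the current threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Hypothetical toy scores (NOT data from the paper).
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]

points = roc_points(scores, labels)
print(points)
print("AUC:", auc(points))
```

A curve that rises toward TPR = 1 while FPR is still small encloses a larger area, which is one way to read "higher true positive rates" from the figures.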

Fig. 10 ROC curves for COVID-19 dataset

Fig. 11 ROC curves for Epistroma dataset

Table 6 and Table 7 compare the performance of COVID-CCD-Net with several state-of-the-art methods on the COVID-19 and Epistroma datasets. COVID-CCD-Net has the highest classification accuracy among the compared methods for both datasets.

Table 6 Comparison of the results with state-of-the-art CNN methods for COVID-19 dataset
Table 7 Comparison of the results with state-of-the-art CNN methods for Epistroma dataset

6 Conclusion

To accurately classify COVID-19, normal, and viral pneumonia cases in chest X-ray images, as well as epithelial and stromal regions in TMA images, the present study proposed the COVID-CCD-Net approach, in which the hyperparameters of the AlexNet, DarkNet-19, Inception-v3, MobileNet, ResNet-18, and ShuffleNet CNN models are optimized using GBO, one of the most recent metaheuristic optimization algorithms. Training hyperparameters of these CNN models, such as learning rate, solver, L2 regularization, gradient threshold method, and gradient threshold, were optimized and tuned using the GBO algorithm. In GBO, each vector of the population represents a set of CNN hyperparameters, and the algorithm searches for the hyperparameter values that help the model achieve the highest classification performance. Two different medical image classification datasets, i.e., COVID-19 and Epistroma, were used in the experimental study. While GBO hyperparameter optimization improved the performance of the non-optimized CNN models on the COVID-19 dataset by 6.22% to 13.29%, the Q-N algorithm improved it by only 2.92% to 8.40%. Similarly, GBO hyperparameter optimization improved the performance of the non-optimized CNN models on the Epistroma dataset by 2.11% to 6.81%, whereas the Q-N algorithm improved it by only 1.81% to 4.53%. These results demonstrated that the proposed approach significantly improved the classification performance of the AlexNet, DarkNet-19, Inception-v3, MobileNet, ResNet-18, and ShuffleNet CNN models compared to their non-optimized counterparts.

One of the main problems in CNN-based classification approaches is that a successful classification performance requires both a large number of high-quality images and optimal values for the hyperparameters of the CNN architecture. In the present study, a sufficient number of images was used to complete the training process of the CNN architectures, and the proposed COVID-CCD-Net approach was used to optimize their hyperparameters in order to overcome the above-mentioned problems. Future studies will focus on the optimization of different hyperparameters such as filter size, filter number, stride, and padding using various metaheuristic optimization algorithms.
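The idea that each population vector represents one set of CNN hyperparameters can be sketched as follows. This is only an illustration of the encoding and of picking the best candidate. It does not implement the GBO update rules, and the value ranges, solver names, threshold-method names, and the `toy_fitness` function are all assumptions made for this example rather than settings taken from the study.

```python
import random

# Assumed option lists and ranges (hypothetical, for illustration only).
SOLVERS = ["sgdm", "adam", "rmsprop"]
THRESHOLD_METHODS = ["l2norm", "global-l2norm", "absolute-value"]

def pick(options, u):
    """Map u in [0, 1] to one of the categorical options."""
    return options[min(int(u * len(options)), len(options) - 1)]

def decode(vector):
    """Turn a 5-dimensional vector in [0, 1]^5 into a hyperparameter set."""
    lr_u, solver_u, l2_u, method_u, threshold_u = vector
    return {
        "learning_rate": 10 ** (-5 + 4 * lr_u),        # assumed log range 1e-5 .. 1e-1
        "solver": pick(SOLVERS, solver_u),
        "l2_regularization": 10 ** (-6 + 4 * l2_u),    # assumed log range 1e-6 .. 1e-2
        "gradient_threshold_method": pick(THRESHOLD_METHODS, method_u),
        "gradient_threshold": 1.0 + 9.0 * threshold_u,  # assumed range 1 .. 10
    }

def toy_fitness(params):
    """Placeholder objective (hypothetical).

    In a real setup this would train the CNN with `params` and return its
    validation accuracy. Here it simply rewards learning rates near 1e-3.
    """
    return -abs(params["learning_rate"] - 1e-3)

def best_candidate(population, fitness):
    """Return the population vector whose decoded hyperparameters score highest."""
    return max(population, key=lambda vector: fitness(decode(vector)))

rng = random.Random(0)
population = [[rng.random() for _ in range(5)] for _ in range(10)]
best = best_candidate(population, toy_fitness)
print(decode(best))
```

A metaheuristic such as GBO would repeatedly update the vectors in `population` and re-evaluate them, whereas this sketch evaluates a single random population once.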