1 Introduction

Coronavirus disease 2019 (COVID-19) broke out in December 2019 as one of the deadliest pandemics in decades and has been escalating ever since. It has disrupted economies and caused turbulence on a scale that could scarcely have been imagined. With the virus's highly mutating positive-sense single-stranded RNA genome and no cure at hand, India has risen to become the country with the world's second-highest number of infected people, with over 10,950,201 cases and a death toll of 156,014. Countries such as the USA, Spain, India, Italy, China, the UK and Iran have been suffering from several forms of COVID-19; coronaviruses are also widely present in humans, cats, dogs, pigs, poultry and rodents in different forms. The discernible symptoms of COVID-19 are sore throat, loss of smell and taste, fatigue, fever, runny nose and cough. The disease weakens the immune system and has proven to be fatal, with an increased chance of onset in the 45–60 year age group. Acute respiratory symptoms such as difficulty in breathing, weakness and chest pain can be an indication. Being contagious, the disease has spread rapidly all over the globe; it can proliferate through physical touch, exhaled droplets, hand contact or contact with mucus. The world has witnessed pandemics of such severity in bygone centuries as well; severe acute respiratory syndrome (SARS) and Middle East respiratory syndrome (MERS) belong to the same category. The worldwide count of infected people is appalling: 109,594,835 people had been infected and 2,424,060 had died as of February 19, 2021, and the fatality count has been escalating all over the world.

COVID-19 particularly endangers the elderly and people with chronic health problems. Although the common symptoms of COVID-19 appear mild, the disease can progress to pneumonia, multi-organ failure and death. One way to test for it is laboratory testing, which turns out to be highly time consuming and costly, and requires a well-developed laboratory for analysis. Chest X-ray diagnosis comes to the rescue, speeding up the diagnostic process; it is widely used for other typical and viral pneumonia diseases such as influenza, SARS and MERS. In the absence of proper medication, detection of the disease at an early stage followed by immediate quarantine is one of the better interventions. The Chinese government deployed real-time polymerase chain reaction (RT-PCR) for the diagnosis of the disease. However, it did not always provide accurate results, so patients could not be diagnosed and treated on time. Moreover, because of this inaccuracy, with a high false-negative rate and considerable time consumption, the highly communicable infection keeps infiltrating healthy people's bodies. A suitable alternative has been found for this cumbersome process: infected people show bilateral changes in chest X-ray images, and this detection route can also provide a large amount of pathological knowledge about the infection. This motivates a deep-learning-based research technique to analyze the chest X-ray without a radiologist's intervention. This paper addresses the following research problems:

  • Custom CNNs and pre-trained models conventionally follow standard optimization for whatever problem they are fitted to. Any performance improvement must therefore be derived from enhanced base learning that captures the required patterns.

  • Most existing research explores pre-trained models and regular CNN architectures rather than architectures customized for COVID-19 detection.

The main objective of this paper is to experiment with a fast diagnostic method, namely chest X-ray classification for COVID-19-infected patients. Deep convolutional neural networks were employed and tested as a deep-learning method for determining whether a patient is affected. By considering sensitivity and specificity, a multi-objective fitness function is designed to classify COVID-19-infected patients. This paper proposes a convolutional neural network (CNN) based model for the analysis/detection of COVID-19, dubbed CovCNN in the remainder of this paper. The proposed CovCNN model is trained on chest X-ray images of COVID-19 patients. The major contributions of the paper are:

  • Three custom deep-learning architectures, CovCNN_1, CovCNN_2 and CovCNN_3, are developed to set the baseline for the final proposed custom model, CovCNN_4.

  • The proposed model integrates various convolution blocks with effective activation layers to support better learning and to extract salient patterns for classification.

  • The data is also tested on different pre-trained transfer-learning models, and hyperparameter tuning is carried out to achieve optimal accuracy.

  • The results for both custom and pre-trained models are presented as ROC curves and training and validation loss curves.

The remainder of the paper is structured as follows: Sect. 2 discusses the existing work in the field of COVID-19; Sect. 3 explains the proposed models, architectures developed; Sect. 4 details the experimental analysis and results of the proposed and pre-trained models and Sect. 5 concludes the work with the future scope.

2 Related work

Researchers have investigated various machine learning models specifically for detecting the disease in COVID-19 X-ray datasets, since machine learning methods come in handy in critical tasks; hence, deep neural computer-vision techniques can be used to detect COVID-19. Without the need for manual feature extraction, deep learning, a quite successful Artificial Intelligence (AI) research field, allows the development of end-to-end models that achieve the expected outcomes from the input data. It has also been used for several other problems such as arrhythmia detection, skin cancer recognition, breast cancer detection, diagnosis of brain disease, pneumonia identification from chest X-ray images and segmentation of fungus images. Pneumonia in infants was diagnosed by Sousa et al. [27] through computer-aided systems from radiographic images using support vector machine (SVM), K-Nearest Neighbor and Naïve Bayes classifiers, where SVM outperformed the others. Turker et al. [28] proposed a novel method to detect COVID-19 with a residual exemplar local binary pattern (ResExLBP), using a dataset of 87 COVID-19 X-ray images from 26 female, 41 male and 20 undetermined patients. A deep-learning-based model is proposed in [3] using 1020 CT slices from 108 patients with AlexNet, VGG-16, VGG-19, SqueezeNet, GoogleNet, MobileNet-V2, ResNet-18, ResNet-50, ResNet-101 and Xception, among which ResNet-101 and Xception performed best. RT-PCR sensitivity versus chest CT for COVID-19 detection was researched by Fang et al. [7]; based on the travel history and signs of two patients, chest CT showed better sensitivity and superior detection compared to RT-PCR. With a whopping 97.4% accuracy, Mahmud et al. [17] proposed CovXNets for detecting COVID-19 and other pneumonias using chest X-ray images.
The Guangzhou Medical Center, China analyzed two datasets comprising 5856 images, in parallel with another dataset of 305 COVID-19 X-rays collected from Sylhet Medical College, Bangladesh. A deep neural network technique was further proposed by Panwar et al. [22] to analyze COVID-19 using nCOVnet, which includes 24 layers where the first layer is the input layer and the remaining layers are combinations of convolution + rectified linear unit (ReLU) and max-pooling layers; this scheme achieved an accuracy of 97.97%. Similarly, 121 chest CT images were analyzed by Berheim et al. [5] to identify the relationship between the symptoms among patients. To distinguish community-acquired pneumonia and other non-pneumonia lung diseases, Li et al. [15] extracted features from CT scans using the deep learning model COVNet. With an astounding 98.2% sensitivity and 92.2% specificity, Gozes et al. [10] successfully built an AI-based tool to test for COVID-19. Shan et al. [24] developed a VB-Net deep learning model to segment the infection sites in CT scan images. For discriminating pneumonia and influenza, Lui et al. [16] processed the data using a CNN-based prediction model and achieved 86.7% accuracy. Wang et al. [31] achieved an 89.5% prediction level using a modified Inception transfer-learning model. Narin et al. [19] performed automatic deep CNN-based prediction using chest X-ray images, with ResNet50 achieving 98% accuracy. Sethy et al. [23] extracted features from chest X-ray images using deep learning techniques, classified them using SVM and achieved 95.38% accuracy. Using chest X-ray images for binary and multi-class classification, Ozturk et al. [21] achieved an accuracy of 98.8% for binary and 87.02% for multi-class classification using the DarkNet model for COVID-19 detection. Hemdan et al. [11] made use of the COVIDX-Net deep learning model to diagnose COVID-19, whereas Singh et al. [25] used chest CT images and fine-tuned the CNN parameters with a multi-objective differential evolution to obtain an optimized CNN for classifying COVID-19. Wang et al. [30] proposed COVID-Net, achieving 98.75% accuracy for multi-class classification of normal, non-COVID pneumonia and COVID-19 classes. Apostolopoulos et al. [1] achieved an accuracy of 98.75% using a deep learning model with 224 COVID-19 images. Zheng et al. [26] proposed a three-dimensional deep CNN model to detect COVID-19 from CT scan imagery and reported 90.8% accuracy. Song et al. [26] achieved an accuracy of 86% using modified Inception models on CT images.

COVIDX-Net [11], a model that involves seven separate deep CNN architectures, including VGG19 and a modified version of Google MobileNet, evaluated normalised intensities of X-rays to diagnose COVID-19. CNN-based pre-trained models were used by Bi et al. [6] to diagnose patients infected with coronavirus pneumonia with the help of chest CT images. Wang et al. [32] proposed a novel transfer learning model (L2TFL) that determines the optimal layers to retain through a novel selection algorithm and combines results using a fusion approach; the model analysed 284 COVID-19, 281 pneumonia, 293 tuberculosis and 306 healthy images and achieved accuracies of 95.61%, 96.25%, 98.30% and 97.86% for the four classes. Transfer-learning models were experimented with for classifying X-rays by Apostolopoulos [2], using off-the-shelf features for a 3-class classifier covering COVID-19, pneumonia and other pulmonary diseases. Khan et al. analysed Xception as the head layer followed by a CNN for classifying COVID-19 from chest X-rays [14]. Wang et al. [32] proposed FGCNet, a deep-feature-fusion model combining a graph convolutional network (GCN) and a CNN, using 320 COVID-19 and 320 healthy images; the features are extracted using the GCN and combined using DFF technology, achieving a 15% performance improvement compared to others. Minaee et al. [18] classified COVID-19 using CNNs, including ResNet18, ResNet50, SqueezeNet and DenseNet-121, on 5000 chest X-ray images and achieved 98% accuracy. Ohata et al. [20] analysed COVID-19 using a CNN pre-trained on ImageNet to extract features from 194 COVID-19 chest X-rays; these features were classified using k-Nearest Neighbor, Bayes, Random Forest, multilayer perceptron (MLP) and support vector machine (SVM) classifiers.

In recent years, much work has been done on classifying COVID-19 using X-ray images. The task is challenging due to the inherent texture variations of the images and their similarity to other diseases such as pneumonia. Several further studies based on computer vision algorithms are under development for the classification of COVID-19.

2.1 Convolutional neural networks

Convolutional networks are influenced by biological processes [8, 9, 12, 13], where the pattern of communication between neurons resembles the response of a neuron in the visual cortex to a specific stimulus. An individual neuron responds only to stimuli within its receptive field; to span the entire field of vision, the receptive fields of different neurons partly overlap. A multilayer perceptron, by contrast, is a fully connected network: each neuron in one layer is linked to all neurons in the next layer. CNNs [33] are used extensively for the classification of images, where the hierarchical structure and the extracted features make the CNN a dynamic model for image classification. A convolutional neural network includes an input and an output layer, as well as several hidden layers. The hidden layers include a series of convolution layers that apply a convolution operation. A convolution layer is characterised by its kernels (height and width), the number of input and output channels, and convolution filters whose depth matches that of the feature map. The convolution operation is given in Eq. (1):

$${F}_{cov }\left(x,y\right)=\left(D*F\right)\left(x,y\right)= {\sum }_{i}{\sum }_{j}D\left(x+i,y+j\right)F\left(i,j\right)$$

where D is the input matrix representing the input image, F is the 2-D filter, \({F}_{cov}(x,y)\) is the output feature map at position \((x,y)\), and \(D*F\) denotes the convolution operation.
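The operation in Eq. (1) can be sketched in NumPy as follows (a minimal illustration of the sliding-window sum over a 'valid' region, not the implementation used in the paper):

```python
import numpy as np

def conv2d_valid(D, F):
    """Eq. (1): F_cov(x, y) = sum_i sum_j D(x+i, y+j) * F(i, j).

    D is the input image and F the 2-D filter. With no padding ('valid'
    mode), an H x W input and an h x w filter yield an
    (H-h+1) x (W-w+1) feature map.
    """
    H, W = D.shape
    h, w = F.shape
    out = np.zeros((H - h + 1, W - w + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            # Element-wise product of the window with the filter, then sum
            out[x, y] = np.sum(D[x:x + h, y:y + w] * F)
    return out
```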

By using regularized weights over fewer parameters, CNNs avoid the vanishing- and exploding-gradient problems seen in traditional neural networks during backpropagation [4, 29]. Figure 1 shows a sample convolution operation.

Fig. 1
figure 1

Example for convolution operation

The output of the convolution layer is fed to an activation function that introduces non-linearity. An activation function works much like the human brain: just as the brain decides how to react based on its perception, the activation function decides whether a neuron should be activated based on the inputs received, using a specific function. The non-linear activation function commonly used in deep learning networks is ReLU, which outputs 0 for values less than 0 and the raw input otherwise. The mathematical representation of ReLU is given in Eq. (2).

$${R}_{cov }\left(x\right)=max(0 , x)$$

Subsampling or downsampling, also known as pooling, is a simple process that reduces the size or dimensionality of the feature map while retaining the most important features. Max, average and sum are the three types of pooling that can be applied. A 2 × 2 pooling operation halves the width and height of the previous layer and thus removes 75% of its activations. Fully connected (FC) means that all nodes in one layer are connected to the outputs of the previous layer. The FC layer outputs the class probabilities, with each class assigned a probability. In a fully connected layer each neuron from the previous layer is connected to every neuron in the next layer, and every value contributes to predicting how well the input fits a particular class. The output of the fully connected layer is then forwarded to an activation function that outputs the class scores. The Softmax function, which produces a probability distribution over the ‘n’ output classes, is given in Eq. (3).

$${S}_{m}^{Cov}=\frac{{e}^{{Y}_{m}}}{{\sum }_{j=1}^{k}{e}^{{Y}_{j}}}$$

where Y is the input feature vector and S denotes the output; the outputs sum to 1. The loss function used is the cross-entropy defined in Eq. (4).

$$L\left(D,V\right)= - {\sum }_{i}{D}_{i}\mathrm{log}\left({V}_{i}\right)$$

where D is the true label distribution and V the predicted probability distribution.
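Eqs. (2)–(4) can be written out directly in NumPy; the following is a minimal sketch of the three functions:

```python
import numpy as np

def relu(x):
    # Eq. (2): max(0, x), applied element-wise
    return np.maximum(0, x)

def softmax(y):
    # Eq. (3): exponentiate and normalise so that the outputs sum to 1
    g = np.exp(y - np.max(y))  # subtract the max for numerical stability
    return g / g.sum()

def cross_entropy(d, v):
    # Eq. (4): L(D, V) = -sum_i D_i log(V_i),
    # with d the true labels and v the predicted probabilities
    return -np.sum(d * np.log(v))
```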

2.2 Transfer learning models

Transfer learning, in its abstract form, is a method of applying knowledge acquired on one task to solve other, similar problems. Here, the knowledge gained by pre-trained networks is applied to detect COVID-19-infected patients. In this research work, seven pre-trained models are used to separate COVID-19 patients from normal patients: ResNet-101, VGG-16, VGG-19, Inception-V3, ResNet-50V2, InceptionResNet-V2 and Xception. These deep transfer-learning models are used as feature extractors, and a classifier is trained on top of each extractor. ResNet-101 and ResNet-50V2 have their characteristic residual blocks: ResNet-101 includes 33 residual blocks and is 101 layers deep, whereas ResNet-50V2 includes 16 residual blocks and is 50 layers deep. ResNet-50V2 applies batch normalization and ReLU activation prior to the convolution operation in each weight layer. The VGG-16 and VGG-19 architectures proposed by Simonyan and Zisserman include five convolution blocks and three fully connected layers; the difference between them is that VGG-16 includes 13 convolution layers whereas VGG-19 has 16. Inception-V3 is the third version of Google's Inception CNN and is made up of different inception blocks, where each block includes convolution and pooling layers of different sizes and performs multi-level feature extraction. InceptionResNet-V2 combines the ideas of Inception and ResNet to produce a hybrid architecture with better performance; it is much deeper than Inception-V3 and consists of 164 layers. The Xception model, proposed by François Chollet, is derived from Inception; in contrast to Inception, it uses depthwise separable convolutions in place of inception modules. It starts with convolution layers followed by depthwise separable convolutions, with a total of 71 layers.
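The feature-extractor setup described above can be sketched with Keras; this is an illustrative outline, where the classifier head (global average pooling plus a softmax layer) and the optimizer are assumptions, and `weights=None` stands in for the ImageNet weights so the sketch stays self-contained offline:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_transfer_model(num_classes=2):
    """Pre-trained backbone used as a frozen feature extractor, with a
    small classifier trained on top (a sketch; the exact head used in
    the paper is not specified)."""
    base = tf.keras.applications.VGG19(
        include_top=False, weights=None,   # weights='imagenet' in practice
        input_shape=(224, 224, 3))
    base.trainable = False                 # freeze the feature extractor
    model = models.Sequential([
        base,
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    return model
```

The same pattern applies to the other six backbones by swapping the `tf.keras.applications` constructor.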

3 Proposed architecture

In this research work, a custom CNN model is presented along with seven well-known pre-trained models to diagnose COVID-19 from chest X-ray images. The proposed work (i.e. the contribution of this paper) comprises four variants of the CovCNN model, of which the last version (CovCNN_4) achieves the best results.

3.1 CovCNN_1 model

The CovCNN_1 model includes two 2-D convolution layers and average-pooling layers, each followed by an activation layer with ReLU as the activation function. The initial convolutional layer processes the image with 32 filters of 5 × 5 pixels, producing 32 feature maps. A 3 × 3 kernel is used in the average-pooling layer, reducing the dimensionality. Similarly, in order to extract finer-scale features of the image, a second convolution block with 64 filters of 5 × 5 pixels, followed by a 3 × 3 average-pooling kernel, results in a 23 × 23 × 64 feature volume. A flatten layer is then stacked as a utility layer to convert the n-dimensional output of the convolution layers into a 1-dimensional vector, which is fed to the fully connected (FC) network. Instead of feeding the features directly to the FC network, a 30% dropout is applied to the less-contributing neurons to prevent the network from overfitting, which in turn lowers the generalization error. Finally, the flattened output is fed to a feed-forward neural network with two dense layers, one with 256 neurons and a ReLU activation function, and the other with two neurons and sigmoid as the activation function.

ReLU maintains sparsity and reduces the likelihood of the vanishing gradient, which is more likely to occur in a dense model. The last layer uses the sigmoid activation function to avoid the dying-ReLU problem, which prohibits further learning. These fully connected layers act as classification layers and help in learning a nonlinear function from non-linear combinations of the high-level features represented by the output of the convolutional layers. Binary cross-entropy is used as the loss function and RMSprop as the optimizer during training. This choice of optimizer results in fast convergence, as it uses an exponentially decaying average of squared gradients and discards history from the extreme past. The model achieves a classification accuracy of 93.94%, specificity of 0.9091, and sensitivity of 1. As a shallow network it performs decently and thus acts as the baseline model. Figure 2 shows the flow diagram of the CovCNN_1 model and its details. Appendix 1 shows the summary of the CovCNN_1 model.
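As a sketch, the CovCNN_1 architecture described above might be assembled in Keras as follows; the padding mode and pool strides are assumptions, chosen so that the second pooling layer reproduces the reported 23 × 23 × 64 feature volume:

```python
from tensorflow.keras import layers, models

def build_covcnn1():
    """Sketch of CovCNN_1 as described in the text ('valid' padding and
    default pool strides are assumed)."""
    model = models.Sequential([
        layers.Conv2D(32, (5, 5), activation='relu',
                      input_shape=(224, 224, 3)),   # 32 feature maps
        layers.AveragePooling2D(pool_size=(3, 3)),
        layers.Conv2D(64, (5, 5), activation='relu'),
        layers.AveragePooling2D(pool_size=(3, 3)),  # -> 23 x 23 x 64
        layers.Flatten(),
        layers.Dropout(0.3),                        # drop 30% of neurons
        layers.Dense(256, activation='relu'),
        layers.Dense(2, activation='sigmoid'),
    ])
    model.compile(optimizer='rmsprop', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```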

Fig. 2
figure 2

Flow diagram of CovCNN_1 model

3.2 CovCNN_2 model

CovCNN_2 is obtained by modifying the last layer, loss function and optimizer of CovCNN_1. A softmax activation function is used, since the last layer includes two neurons that require a probability distribution across the output nodes. The CovCNN_2 model achieved an improved accuracy of 95.45%, with a sensitivity of 1.00 and a specificity of 0.93. The focal point is to achieve a high sensitivity; since this model achieves 1, it correctly identifies all those actually having the disease (true positive rate). Figure 3 shows the flow diagram of the CovCNN_2 model and Appendix 2 details the model summary.

Fig. 3
figure 3

Flow diagram of CovCNN_2 model

3.3 CovCNN_3

CovCNN_3 extends CovCNN_1 with a new convolution block of 64 3 × 3 filters and a max-pooling layer with a kernel size of 2 × 2, and the number of neurons in the first dense layer is decreased to 128. The newly added convolution layer performs an optimized extraction of fine-grained features. Average pooling smooths out the image, so sharp features such as lesions that indicate the presence of the COVID-19 virus in the chest X-ray might be missed; max-pooling gains the upper hand here because it selects the brighter pixels of the image, which is very useful when the background is dark, as in X-ray images. Although there is a decrease in the sensitivity and specificity values, the accuracy of this model is 96.97%, which exceeds that of the previous model, CovCNN_2. In order to achieve both high accuracy and high sensitivity, the parameters and layers were fine-tuned further. Figure 4 shows the flow diagram of the CovCNN_3 model and its model summary is shown in Appendix 3.

Fig. 4
figure 4

Flow diagram of CovCNN_3 model

3.4 CovCNN_4

The fine-tuned CovCNN_4 deep network includes 15 layers, with the convolutional layers stacked sequentially with increasing filter counts. All convolutional layers use 5 × 5 filters with ReLU as the activation function, and the max-pooling kernel size is 2 × 2. Three convolution blocks are introduced in the CovCNN_4 model, each comprising two 2-D convolutional layers and a max-pooling layer, followed by a dropout layer with a 30% rate to prevent the model from overfitting. The fine-tuned parameters support good gradient flow through the convolution blocks. As with the CovCNN_1 model, CovCNN_4 uses binary cross-entropy as the loss function and RMSprop as the optimizer. CovCNN_4 outperforms every other model, achieving 98.48% accuracy, 1.0 sensitivity, and 0.9773 specificity. Figure 5 shows the flow diagram of the CovCNN_4 model and its model summary is given in Appendix 4.
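A possible Keras reading of this 15-layer pattern is sketched below; the filter counts (32/64/128), `same` padding, and the dense head are assumptions, since only the block structure, filter size and dropout rate are reported:

```python
from tensorflow import keras
from tensorflow.keras import layers, models

def build_covcnn4():
    """Sketch of the 15-layer CovCNN_4 pattern: three blocks of two 5x5
    convolutions plus 2x2 max-pooling, each followed by 30% dropout.
    Filter counts and the dense head are assumed, not reported."""
    model = models.Sequential()
    model.add(keras.Input(shape=(224, 224, 3)))
    for filters in (32, 64, 128):              # increasing filter counts
        model.add(layers.Conv2D(filters, (5, 5), activation='relu',
                                padding='same'))
        model.add(layers.Conv2D(filters, (5, 5), activation='relu',
                                padding='same'))
        model.add(layers.MaxPooling2D((2, 2)))
        model.add(layers.Dropout(0.3))         # guard against overfitting
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation='relu'))
    model.add(layers.Dense(2, activation='sigmoid'))
    model.compile(optimizer='rmsprop', loss='binary_crossentropy',
                  metrics=['accuracy'])
    return model
```

Counting the layers (6 convolutions, 3 pooling, 3 dropout, flatten and 2 dense) gives the stated total of 15.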

Fig. 5
figure 5

Flow Diagram of CovCNN_4 Model

4 Results and discussions

This section presents a comprehensive view of the dataset, experimentation, model training, and validation. A performance comparison of the proposed approach with the existing works has also been presented in this section.

4.1 Data acquisition and pre-processing

The dataset containing COVID-19 radiography images is acquired from Kaggle (https://www.kaggle.com/tawsifurrahman/covid19-radiography-databas), real data recently collected from various labs treating infected patients. The database includes chest X-ray images of COVID-19-positive cases along with normal and viral pneumonia images: 219 COVID-19-positive images, 1341 normal images and 1345 viral pneumonia images, with a total size of approximately 1.15 GB. In this work, not all non-COVID-19 samples (normal + pneumonia) are included, because the number of COVID-19 samples is significantly smaller, which would create a class-imbalance scenario. Instead, 219 images are randomly selected from the normal set and 219 from the viral pneumonia set, contributing 438 images to the non-COVID-19 class. Figure 6a shows a sample COVID-19 image, where the arrow highlights the infected part. Figure 6b shows non-COVID-19 normal images.

Fig. 6
figure 6

a COVID-19 image sample. b Non-COVID-19 image samples

4.2 Data augmentation

Datasets of COVID-19 chest X-rays are continuously sourced from public donations and are not adequate for complex CNN training. This research adopts a lossless augmentation technique to increase and maintain the sample count. The augmentation parameters are: a rotation range of 15 degrees, shear range of 0.2, height-shift range of 0.2, zoom range of 0.3, width-shift range of 0.2 and 'nearest' fill mode. The images in the dataset are of different sizes, so all images are converted to 224 × 224 × 3 pixels; the color-gradient axis represents the RGB channel ordering. Figure 7a and b show sample images generated by the augmentation process.
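With the Keras `ImageDataGenerator`, the augmentation settings listed above would read roughly as follows (a sketch; the paper does not state which library was used, and the directory path in the comment is hypothetical):

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation settings listed in the text:
# rotation, width/height shifts, shear, zoom, nearest-neighbour fill.
datagen = ImageDataGenerator(
    rotation_range=15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.3,
    fill_mode='nearest',
)

# Images are resized to 224 x 224 x 3 when loading, e.g.:
# datagen.flow_from_directory('data/', target_size=(224, 224), batch_size=32)
```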

Fig. 7
figure 7

a Different variations of a COVID-19 sample after augmentation process. b different variations of a normal sample after augmentation process

4.3 Hyperparameter tuning

Hyperparameter tuning was performed on the CovCNN models using grid search over five hyperparameters: number of epochs, learning rate, dropout rate in the dropout layer, batch size and the gradient-update optimization algorithm. The hyperparameter ranges were: dropout factor between 10 and 35% in steps of 5%; number of epochs of 20, 25, 30, 35, 40 or 45; batch size of 16, 32 or 64; and ADAM, SGD or RMSprop as the gradient optimizer. Dropout probabilities of 25% and 30% resonated well in most versions of the CovCNN models. The best epoch count was 40 with a batch size of 32, which yielded better performance. RMSProp was the best gradient optimizer in most versions, but in some versions of CovCNN, ADAM yielded better results. To avoid overfitting and increase generalization, a 30% dropout is applied for regularization.
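The grid search described above can be sketched as a plain loop over the stated ranges; `evaluate` is a hypothetical callback standing in for training a CovCNN variant and returning its validation accuracy (the learning-rate range is not stated in the text, so it is omitted here):

```python
from itertools import product

def grid_search(evaluate):
    """Exhaustive grid search over the hyperparameter ranges stated above.

    `evaluate(params)` must return a score (e.g. validation accuracy)
    for one hyperparameter combination; the best combination wins.
    """
    grid = {
        'dropout':   [0.10, 0.15, 0.20, 0.25, 0.30, 0.35],
        'epochs':    [20, 25, 30, 35, 40, 45],
        'batch':     [16, 32, 64],
        'optimizer': ['adam', 'sgd', 'rmsprop'],
    }
    best_score, best_params = -1.0, None
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        score = evaluate(params)   # train + validate one configuration
        if score > best_score:
            best_score, best_params = score, params
    return best_params, best_score
```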

4.4 Model training and validation

The proposed CovCNN model is trained with the augmented chest X-ray images, and the custom and pre-trained models are analyzed and evaluated. The samples are distributed into training and validation datasets in a 4:1 ratio, i.e. 525 images for training and 132 images for testing. The 525 training images comprise 219 COVID-19 samples and 306 non-COVID-19 (normal + viral pneumonia) samples. The input size of the training and testing images is 224 × 224 × 3, and a batch size of 32 is used during the experiments. The models were trained for 40 to 50 epochs with an early-stopping configuration, i.e. training stops once the model stops improving its performance on the validation set. The models were developed on a system with a 10th-generation Intel Core i7 processor, 16 GB RAM, a 512 GB SSD and a 4 GB NVIDIA graphics card.

The model is evaluated on various classification metrics, including binary accuracy (the frequency with which the predicted value matches the true value, an idempotent operation that simply divides the total by the count), confusion matrix, recall, F1-score, sensitivity and specificity. To improve the proposed model's ability to generalize, the dataset is artificially expanded by generating different versions of its images through the augmentation technique. Table 1 records the learning process of the CovCNN models with respect to the number of epochs, showing the model loss and model accuracy curves.

Table 1 Learning process recorded by the CovCNN models with respect to the number of epochs: model loss and model accuracy curves

From the learning curves detailed in Table 1, the dynamics of the models can be analyzed for under-fit, over-fit and good fit. A model is said to be under-fitted if its training loss curve remains flat regardless of training, or if the training loss continues to decrease until the end of training. A model is said to be over-fitted if the training loss continues to decrease with experience while the validation loss decreases to a certain point and then starts increasing. Finally, a model is said to be a good fit when the training loss and validation loss both decrease to a point of stability and the gap between the two curves is small.

In the case of the CovCNN_1 model, the learning curves depict good-fit behavior during the first 30 epochs, but after 30 epochs the model starts overfitting as the validation loss begins to increase. The CovCNN_2 model shows some improvement, depicting good-fit behavior, though the validation accuracy moves noisily around the training accuracy due to the low complexity of the model. CovCNN_3, trained for 32 epochs (early-stopping configuration), shows a good-fit learning curve and delivers an acceptable accuracy of 96.97% with very good sensitivity and specificity values. For CovCNN_4, a deep neural network model, the learning plots depict good-fit behavior within 25 epochs, achieving an accuracy of 98.48%.

4.5 Comparison with pre-trained models

The performance of the pre-trained models with respect to the training and validation accuracy curves is shown in Table 2, and the interpretations are detailed in this section. ResNet101 produces a good fit up to 10 epochs, as the loss curve keeps descending; after that, the loss function increases and disturbs the accuracy curve, from which we can infer that the model overfits the data, resulting in an accuracy of 90.91%. The Xception model fits well up to epoch 13, yielding an acceptable accuracy of about 96.21%; after 13 epochs, however, the gap between the training and validation curves indicates overfitting.

Table 2 Learning process recorded for the pre-trained models with respect to the number of epochs, model loss, model accuracy curve respectively

InceptionV3, ResNet50V2, InceptionResNetV2 and VGG16 each achieved an accuracy of 96.97%, with slight overfitting observable from the gap between the training and validation losses, incurring a generalization-error penalty. VGG19 shows a good improvement over all other pre-trained models, with an accuracy of 98.48%. The overall validation accuracy of the proposed CovCNN models and the pre-trained models is shown in Table 3. The receiver operating characteristic (ROC) curve and the confusion matrix are produced after training on the respective validation sets. Table 4 records the ROC observations for the custom CovCNN models along with the confusion matrices; similarly, the ROC curves and confusion matrices for the transfer-learning models are presented in Table 5.

Table 3 Overall training and validation accuracy of the proposed CovCNN models and the pre-trained models
Table 4 Confusion matrices and ROC curves recorded for the CovCNN models
Table 5 Confusion matrices and ROC curves recorded for the pre-trained transfer-learning models

4.6 Visual interpretation of the trained model features

Figure 8 visualizes the features extracted by the inner layers of the proposed CovCNN model during training. This visualization helps in understanding how the model interprets the image internally. The image (Cov2d_45) shows the output after applying 32 3 × 3 filters. The following image (max_pooling_2d_38) shows the output after max-pooling each of the 32 filter outputs; here the edges are strengthened compared to the Cov2d_45 image. Subsequent convolution blocks extract increasingly fine-grained features from the image. In the image (Cov2d_47), 128 versions have been created, each representing some feature and contributing to the fully connected layers for the final classification.

Fig. 8
figure 8

Visualization of the most salient features on the convolution and pooling layers

4.7 Effectiveness of the proposed architecture

The effectiveness of the proposed CovCNN model is analyzed using performance metrics such as accuracy, sensitivity, specificity and F1-score. The resulting metric values are detailed in this section.

4.7.1 Binary accuracy

Binary accuracy is the frequency with which the predicted value matches the actual value, i.e., the number of correct predictions divided by the total number of predictions made. The expression for binary accuracy is given in Eq. (5).

$$Accuracy = \frac{TN + TP}{TN + FN + TP + FP}$$

4.7.2 Sensitivity and specificity

The proportion of correctly predicted positive samples is called sensitivity, whereas the proportion of correctly predicted negative samples is called specificity; these are given in Eqs. (6) and (7). In the proposed work, sensitivity refers to the fraction of patients correctly identified as COVID-19 positive, and specificity to the fraction correctly identified as non-COVID-19.

$$Sensitivity = \frac{TP}{TP + FN}$$
$$Specificity = \frac{TN}{TN + FP}$$

4.7.3 F1-score

The F1-score combines precision and recall. Precision is the proportion of true positives among all examples the network classifies as positive, while recall is identical to the sensitivity defined above; the corresponding expressions are given in Eqs. (8), (9) and (10).

$$F1 = 2\times \frac{Precision\times Recall}{Precision + Recall}$$
$$Precision = \frac{TP}{TP + FP}$$
$$Recall =\frac{TP}{TP + FN}$$
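Equations (5)–(10) all derive from the four confusion-matrix counts, so they can be computed together. A minimal sketch (the function name and the example counts are illustrative, not taken from the paper's experiments):

```python
def binary_metrics(tp, fp, tn, fn):
    """Compute the Eq. (5)-(10) metrics from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + tn + fp + fn)   # Eq. (5)
    sensitivity = tp / (tp + fn)                    # Eq. (6); identical to recall, Eq. (10)
    specificity = tn / (tn + fp)                    # Eq. (7)
    precision   = tp / (tp + fp)                    # Eq. (9)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)  # Eq. (8)
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "precision": precision, "f1": f1}

# illustrative counts: 8 true positives, 2 false negatives, 9 true negatives, 1 false positive
m = binary_metrics(tp=8, fp=1, tn=9, fn=2)
```

With these counts, accuracy is 17/20 = 0.85, sensitivity 0.8, specificity 0.9, precision 8/9, and F1 = 16/19 ≈ 0.842, showing how F1 sits between precision and recall.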

Tables 4 and 5 compile the performance metrics of all the models together with the confusion matrix for each. Tables 6 and 7 report the results of the custom and pre-trained models respectively. CovCNN_1 achieved an accuracy of 93.9%, a sensitivity of 0.909 and a specificity of 1. This initial model was improved as CovCNN_2, achieving an accuracy of 95%, a sensitivity of 0.932 and a specificity of 1, and further as CovCNN_3, achieving an accuracy of 97%, a sensitivity of 0.97 and a specificity of 0.98. The best custom model for binary classification is CovCNN_4, with an accuracy of 98.48%, a sensitivity of 1 and a specificity of 0.97. In the pre-trained category, the best model is VGG19, with an accuracy of 98.48%, a sensitivity of 97.73% and a specificity of 98.86%. The accuracy of the proposed algorithm is compared with existing approaches in Table 8.

Table 6 Quantitative performance validation results of CovCNN architectures on the COVID-19 X-ray dataset
Table 7 Quantitative performance validation results of pre-trained architectures on the COVID-19 X-ray dataset
Table 8 Comparison of the proposed accuracy with the existing algorithm

5 Conclusion and future scope

A significant problem in the current scenario is the precise classification and identification of COVID-19 from imaging data. One crucial aspect is the discovery of discriminative features that distinguish COVID-19 from non-COVID-19 images. This work presents a promising deep-learning approach that uses chest X-ray images to distinguish COVID-19 infection from normal lung X-rays. The objective of this research is to automatically classify chest X-ray images with an efficient deep-learning model to provide faster classification. The proposed methodology develops a unique CovCNN model with better generalization performance for classifying COVID-19 lung X-ray images. Four custom deep-learning models were developed, three of which (CovCNN_1, CovCNN_2, CovCNN_3) set the baseline for the final proposed model, CovCNN_4. In addition, the data were evaluated on several pre-trained transfer-learning models, with hyperparameter tuning applied to achieve optimal accuracy. The proposed CovCNN model achieved an accuracy of 98.5%, outperforming its earlier versions as well as the pre-trained models. This demonstrates the efficiency of the proposed CovCNN model over existing pre-trained transfer-learning models such as ResNet-101, VGG-16, VGG-19, Inception-V3, ResNet-50V2, InceptionResNet-V2 and Xception. The model can be extremely helpful to medical practitioners and radiologists by supporting fast and accurate diagnosis and case follow-up. The following are recommendations for further research:

  (i) The CovCNN_4 model can be further improved by fusing CNN and pre-trained model features to enhance accuracy.

  (ii) Multi-class classification can be designed to classify various types of lung diseases.

  (iii) A generalized system can be developed to detect the different mutants of COVID-19.

  (iv) Hyperparameter tuning and modifications to the CovCNN_4 model might help in detecting different mutants of COVID-19 from X-ray and/or CT-scan images.