1 Introduction

The COVID-19 pandemic has taken millions of lives and created havoc worldwide within a short period. COVID-19 is a highly contagious respiratory disease that severely affects the lungs [13]. A key means of containing it is to detect infected individuals and break the chain of infection. Presently, Reverse Transcription Polymerase Chain Reaction (RT-PCR) is widely used to detect COVID-19 [25]. RT-PCR testing requires expert lab technicians and testing kits [3], and testing a sample is costly and takes from 2 h to a few days. Moreover, RT-PCR does not produce accurate results in some cases because of its high false-negative rate (39–60%) [15]. New variants of the SARS-CoV-2 virus have made detection with existing diagnostic techniques even more difficult [20].

To overcome the above-mentioned issues, researchers have used artificial intelligence (AI) techniques to develop automated diagnosis systems for coronavirus infection [9]. Over the past decade, AI coupled with medical imaging has assisted many fields, especially healthcare, in various diagnoses and treatments. Recently, deep neural networks [26] have been widely used in healthcare systems. A well-known deep neural network is the convolutional neural network (CNN) [8]. CNNs require a large amount of training data to detect coronavirus infection in chest X-ray (CXR) images [16]. However, suitable CXR datasets containing equal numbers of COVID-19 and normal chest images are not available [23]. This scarcity of labeled COVID-19 data may lead to the class imbalance problem [27].

To handle the class imbalance problem, semi-supervised learning techniques are used to perform the classification task [4]. Data augmentation is used to overcome the small-dataset problem [24]. It includes operations such as flipping, rotation, and translation, which are widely used in most computer vision applications [1]. A newer and more efficient form of augmentation is synthetic data augmentation, which generates good-quality samples [2]. Synthetic samples enrich the dataset and improve training efficiency. However, classical data augmentation cannot generate completely new images [14]. Nowadays, Generative Adversarial Networks (GANs) are used to enlarge datasets with synthetic data [5].

This paper uses the deep convolutional GAN (DGAN) [17], a special type of GAN, to generate synthetic COVID-19 CXR images, and develops a novel deep learning-based model for detecting coronavirus infection in CXR images. The key contributions of this paper are as follows:

  i. An efficient deep convolutional generative adversarial network and convolutional neural network (DGCNN) is designed to diagnose suspected COVID-19 subjects.

  ii. A deep convolutional generative adversarial network (DGAN) is utilized, in which two networks are trained adversarially: one generates fake images and the other distinguishes them from real images.

  iii. A convolutional neural network (CNN) is utilized for classification.

  iv. Extensive experiments are conducted to evaluate the performance of the proposed DGCNN.

The rest of this paper is organized as follows. Section 2 reviews related work on COVID-19 diagnosis. Section 3 describes the proposed deep learning-based model. Section 4 presents the experimental results and discussion. Section 5 presents concluding remarks and possible future research directions.

2 Related work

Many studies have implemented a supervised learning framework to develop CNN models for COVID-19 diagnosis using CXRs. Hemdan et al. [13] integrated seven convolutional networks into a COVIDX-Net model for CXRs. COVIDX-Net performed better than the individual models; however, its efficiency was tested only on a very small dataset. Wong [25] developed COVIDNet to detect and classify pneumonia cases along with COVID-19, attaining a classification accuracy of 92.4%. Ioannis [3] expanded the dataset to include 224 COVID-19 positive cases; this model obtained a 98.75% success rate for binary classification and 93.48% accuracy for three-class classification. Narin [15] applied ResNet-50 to CXRs and achieved 98% diagnostic accuracy. Sethy and Behra [20] classified features extracted by various deep networks with a support vector machine (SVM) classifier. Besides these, many recent studies have applied deep learning to Computed Tomography (CT) scans of the lungs to classify and detect COVID-19 cases. However, because of the lack of publicly available datasets, these deep learning models were trained on very small datasets.

DeGrave [9] conducted further experiments to test the robustness and generalizability of CNN models. They achieved good test accuracy when replicating various supervised models, such as COVID-Net [26], on the COVID-x dataset. However, their predictive performance fell by 50% when the models were validated on external COVID and non-COVID datasets [8]. They also used saliency maps and image edges for detecting COVID-19 [16]. It was found that non-COVID markers, such as the cardiac silhouette and the diaphragm, did not contribute to the prediction. Images obtained from different scanners can be problematic, as models with high accuracy on a particular dataset may not generalize to other datasets.

This data inadequacy in medical diagnosis has led researchers to explore new ways to expand image datasets. Recently, GANs have been utilized by many researchers in medical imaging [23]. Zhao [27] proposed a VGG-16 network with DGAN for synthetic image generation in lung-nodule classification. A progressively grown GAN (PGGAN) was trained to synthesize fundus images containing the vascular pathology of retinopathy of prematurity (ROP), as well as MRI scans [4]. Waheed et al. [24] presented an auxiliary classifier GAN (ACGAN)-based model named CovidGAN to generate synthetic CXR images for COVID-19 classification. CovidGAN improved the performance of the CNN; however, the performance could be improved further by using PGGAN. Acar et al. [1] hybridized GAN, data augmentation, and segmentation to improve the performance of a COVID-19 classifier. The synthetic CT images generated by this hybrid approach were fed to a CNN to identify COVID-19, but the hybrid approach improved the classifier's performance only slightly. Al-Shargabi et al. [2] utilized a conditional GAN to generate synthetic CXR images. These images were used to train an InceptionResNet_V2 model, which performed better than InceptionResNet_V2 without GAN; however, the model has a high computational time. Schlegl [19] studied the data distribution of healthy retinal tissue using a GAN, which was then used for anomaly detection on new and healthy image patches of the retinal region.

The above-mentioned literature shows that GAN-based techniques can be useful for detecting infection caused by COVID-19. Studies have shown that anomalies in chest radiographs are more frequent during the initial symptom period and at the peak of illness. Thus, CXR should be used as a major screening method for COVID-19 diagnosis; it also offers easy accessibility and portability. However, only a limited number of CXR images are available because of the short duration of this pandemic. Hence, GANs can be used to create synthetic training data: CXR images can be synthesized from scratch using a GAN and, combined with other methods, produce the desired results. These facts motivated us to develop a novel deep learning-based model for identifying COVID-19 in CXR images.

3 Materials and methods

3.1 Dataset

The CXR dataset used in the experiments is obtained from [7]. It is a well-structured dataset that consists of four classes, namely COVID-19, normal, pneumonia viral, and pneumonia bacterial. It has 306 images in total, divided into 270 training images and 36 test images (refer to [7]). Figure 1 shows a sample view of the CXR test dataset. Table 1 shows the subject-wise distribution of the dataset, which shows that the dataset is balanced. Its only limitation is the small number of images available for deep learning models. To produce synthetic images for each of the four classes, the real images of the CXR dataset [7] are used as input to DGAN. The synthetic images and the actual CXR images are then used together to train the proposed model.

Fig. 1

Sample view of the CXR test dataset

Table 1 Subject-wise distribution of the dataset

3.2 Proposed architecture

To augment the dataset, DGAN is used to generate synthetic images for each of the four classes of the original dataset. A CNN classifier is then used to evaluate the effect of the synthetic images on classification performance.

3.2.1 Generative adversarial network (GAN)

GANs are a specific framework for generative models. Given a set of sample images x1, x2, …, xn, a generative model learns the data distribution f(data) in order to generate new samples. DGAN is used when both the discriminator and the generator are deep CNNs [11]. It consists of two simultaneously trained networks, the discriminator (D) and the generator (G). D discriminates between real and fake images: it takes an image i as input and outputs D(i), the probability that i is a real sample. The generator takes inputs j1, …, jn drawn from a uniform distribution p(j) and maps them to the image space, synthesizing samples G(j) whose distribution is f(g). The goal of G is to obtain f(g) = f(data). The adversarial networks are trained by optimizing the loss function of a two-player minimax game:

$$ \underset{G}{\min }\ \underset{D}{\max }\ {E}_{i\sim f(data)}\left[\log D(i)\right]+{E}_{j\sim p(j)}\left[\log \left(1-D\left(G(j)\right)\right)\right] $$
(1)

D is trained to maximize D(i) for real images i ∼ f(data) and to minimize D(G(j)) for generated images. G outputs images G(j) and tries to deceive the discriminator during training, so that G(j) ∼ f(data). Hence, the generator is trained to maximize D(G(j)), or equivalently to minimize 1 − D(G(j)) [6]. During training, G improves its ability to synthesize realistic images, while D improves its ability to distinguish real images from synthesized ones.
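For a fixed generator, the minimax objective in Eq. (1) has a well-known closed-form optimum: the best discriminator is D*(i) = f_data(i) / (f_data(i) + f_g(i)), and when f(g) = f(data) it outputs 1/2 everywhere and the objective reaches −log 4. The sketch below checks this numerically on a made-up three-point sample space; the distributions and helper names are illustrative assumptions, not data or code from this study.

```python
import math

def optimal_discriminator(p_data, p_g):
    """D*(x) = f_data(x) / (f_data(x) + f_g(x)) for a fixed generator."""
    return [pd / (pd + pg) if pd + pg > 0 else 0.5 for pd, pg in zip(p_data, p_g)]

def value_function(p_data, p_g, d):
    """V(D, G) of Eq. (1), written as sums over a finite sample space."""
    real_term = sum(pd * math.log(dx) for pd, dx in zip(p_data, d) if pd > 0)
    fake_term = sum(pg * math.log(1.0 - dx) for pg, dx in zip(p_g, d) if pg > 0)
    return real_term + fake_term

# Toy distributions over three outcomes (illustrative values only).
p_data = [0.1, 0.4, 0.5]
p_g_far = [0.6, 0.3, 0.1]    # generator far from the data distribution
p_g_match = list(p_data)     # generator has reached f(g) = f(data)

v_far = value_function(p_data, p_g_far, optimal_discriminator(p_data, p_g_far))
v_match = value_function(p_data, p_g_match, optimal_discriminator(p_data, p_g_match))
print(optimal_discriminator(p_data, p_g_match))   # [0.5, 0.5, 0.5]
print(round(v_match, 4), round(-math.log(4), 4))  # -1.3863 -1.3863
print(v_far > v_match)                            # True
```

Any mismatch between f(g) and f(data) leaves the optimal discriminator's objective above −log 4, which is why driving the generator toward the data distribution is equivalent to minimizing Eq. (1) over G.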

a. Generator

The generator takes as input a vector of 100 random numbers drawn from a uniform distribution and produces a 64 ∗ 64 ∗ 1 image. The network consists of a fully connected layer whose output is reshaped to 4 ∗ 4 ∗ 512, followed by four fractional-stride convolutions with 4 ∗ 4 kernels that up-sample the image. A fractional-stride convolution is obtained by inserting zeros between the input pixels before convolving. Batch normalization is applied to every layer except the output layer. Normalizing the responses across the mini-batch stabilizes the learning process of DGAN and prevents the generator from collapsing all samples to a single point. The output layer uses the Tanh activation function, and all other layers use ReLU. Figure 2 shows the generator network architecture along with the trainable parameters.
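The up-sampling path described above can be traced with a few lines of plain Python. The sketch assumes the standard DCGAN settings of stride 2 and padding 1 for the 4 ∗ 4 kernels, and assumes intermediate widths of 256, 128, and 64 channels; neither is stated in the text. In PyTorch, each such layer would be an `nn.ConvTranspose2d(in_channels, out_channels, kernel_size=4, stride=2, padding=1)`.

```python
def insert_zeros(row, stride=2):
    """Fractional-stride view: put (stride - 1) zeros between neighbouring pixels."""
    out = []
    for k, value in enumerate(row):
        if k > 0:
            out.extend([0] * (stride - 1))
        out.append(value)
    return out

def conv_transpose_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after one fractional-stride (transposed) convolution."""
    return (size - 1) * stride - 2 * padding + kernel

def generator_shapes(latent_dim=100, channels=(512, 256, 128, 64, 1)):
    """Trace (H, W, C) from the noise vector to the generated image."""
    shapes = [(1, 1, latent_dim)]
    size = 4
    shapes.append((size, size, channels[0]))  # fully connected output reshaped to 4x4x512
    for c in channels[1:]:                    # four fractional-stride convolutions
        size = conv_transpose_out(size)
        shapes.append((size, size, c))
    return shapes

print(insert_zeros([1, 2, 3]))  # [1, 0, 2, 0, 3]
for shape in generator_shapes():
    print(shape)
# (1, 1, 100) -> (4, 4, 512) -> (8, 8, 256) -> (16, 16, 128) -> (32, 32, 64) -> (64, 64, 1)
```

With these assumed settings each fractional-stride convolution exactly doubles the spatial size, so four of them take the 4 ∗ 4 seed to the 64 ∗ 64 output.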

Fig. 2

Generator architecture in DGAN

b. Discriminator

The discriminator network takes CXR images of size 64 ∗ 64 ∗ 1 as input and outputs a single decision on whether the CXR is real. Figure 3 depicts the architecture of the discriminator. It consists of four convolutional layers with 4 ∗ 4 kernels followed by fully connected layers. Strided convolutions are used to reduce the spatial dimensionality. The Leaky Rectified Linear Unit (Leaky ReLU) function f(i) = max(i, leak ∗ i) and batch normalization are used in all layers except the input and output layers. A sigmoid function at the output produces a probability score in [0, 1] that the CXR is real. To prevent over-fitting, dropout is used in every layer except the second and the output layers.
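The sketch below traces the discriminator's down-sampling and implements the Leaky ReLU and sigmoid functions named above. Stride 2, padding 1, and the channel widths 64, 128, 256, and 512 are assumptions chosen to mirror the generator; the leak slope of 0.2 is the value reported in the hyper-parameter settings.

```python
import math

def leaky_relu(x, leak=0.2):
    """f(i) = max(i, leak * i); valid for 0 < leak < 1."""
    return max(x, leak * x)

def sigmoid(x):
    """Numerically stable logistic function, mapping a logit to [0, 1]."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def conv_out(size, kernel=4, stride=2, padding=1):
    """Spatial size after one strided convolution."""
    return (size + 2 * padding - kernel) // stride + 1

def discriminator_shapes(channels=(64, 128, 256, 512)):
    """Trace (H, W, C) from a 64x64x1 CXR down to the final feature map."""
    size, shapes = 64, [(64, 64, 1)]
    for c in channels:  # four strided convolutions, each halving H and W
        size = conv_out(size)
        shapes.append((size, size, c))
    return shapes

print(discriminator_shapes()[-1])                        # (4, 4, 512), then flattened
print(leaky_relu(-1.0), leaky_relu(3.0), sigmoid(0.0))   # -0.2 3.0 0.5
```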

Fig. 3

Discriminator architecture in DGAN

3.2.2 Classifier

The architecture of the proposed classifier and its trainable parameters are shown in Fig. 4. CNN architectures for medical imaging generally use fewer convolution layers because datasets and input sizes are small. In this paper, the CNN classifier takes fixed-size 64 ∗ 64 grayscale CXRs whose pixel values are normalized to the range [0, 1].

Fig. 4

Proposed architecture of CNN classifier

The network consists of three convolutional layers with 3 ∗ 3 kernels, ReLU activation functions, batch normalization, and max pooling. The output stage has two fully connected layers with a dropout rate of 0.5 and a Softmax output over the four classes. Batch normalization and dropout are used to prevent over-fitting.
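A minimal shape trace of the classifier is given below, assuming that each of the three convolution blocks uses stride 1 with padding 1 and is followed by 2 ∗ 2 max pooling, and assuming channel widths of 32, 64, and 128 (the text does not state them). The softmax turns the final scores into probabilities over the four classes; the scores used here are illustrative.

```python
import math

CLASSES = ("COVID-19", "Normal", "Pneumonia bacterial", "Pneumonia viral")

def classifier_feature_shape(size=64, channels=(32, 64, 128)):
    """Trace H, W, C through three conv (3x3, padding 1) + 2x2 max-pool blocks."""
    for _ in channels:
        # A 3x3 convolution with stride 1 and padding 1 keeps H and W unchanged;
        # the following 2x2 max pooling halves them.
        size //= 2
    return (size, size, channels[-1])

def softmax(logits):
    """Convert the final-layer scores into class probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

h, w, c = classifier_feature_shape()
print((h, w, c), h * w * c)              # (8, 8, 128) 8192 inputs to the first FC layer
probs = softmax([2.0, 0.5, 0.1, -1.0])   # illustrative scores for one CXR
print(CLASSES[probs.index(max(probs))])  # COVID-19
```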

3.2.3 Model building

Having defined the network architecture, this section explains the training of the proposed DGCNN model, which involves selecting an appropriate loss function, the number of epochs, and the learning rates. DGANs can produce extremely good results, but training a DGAN is quite challenging because both sub-networks are trained simultaneously and may affect each other's training. Often, improvements to one sub-network come at the expense of the other.

CNN architectures for medical imaging generally consist of fewer convolution layers because datasets and input sizes are small [22]. In this paper, training continues until the loss curve stabilizes around its minimum, which may depend on the size of the dataset. Since the dataset sizes differ, up to 1500 epochs are allowed.
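The stopping rule above (train until the loss curve stabilizes, up to 1500 epochs) can be made concrete with a plateau test. The sketch below is one hypothetical way to do it: `has_plateaued`, `train_until_stable`, the 50-epoch window, and the tolerance are illustrative choices, not settings from this study.

```python
def has_plateaued(losses, window=50, tol=1e-3):
    """True when the mean loss of the last `window` epochs differs from the
    mean of the preceding `window` epochs by less than `tol`."""
    if len(losses) < 2 * window:
        return False
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(previous - recent) < tol

def train_until_stable(run_epoch, max_epochs=1500, window=50, tol=1e-3):
    """Call `run_epoch(epoch)`, which returns that epoch's loss, until the
    loss curve stabilizes or the 1500-epoch cap is reached."""
    history = []
    for epoch in range(max_epochs):
        history.append(run_epoch(epoch))
        if has_plateaued(history, window, tol):
            break
    return history

# Toy loss curves standing in for real training runs.
flat = train_until_stable(lambda e: 0.7)         # stable from the start
rising = train_until_stable(lambda e: float(e))  # never stabilizes
print(len(flat), len(rising))                    # 100 1500
```

Comparing window means rather than single epochs keeps the noisy, oscillating GAN losses from triggering an early stop.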

i. Batch size

A large batch size is not recommended for training DGAN because, in the initial phases of training, the discriminator receives many examples and can overpower the generator. This may lead to an unstable model that does not converge properly. Considering the size of the dataset, the batch size is set to 32, which allows both the discriminator and the generator to train stably.

For the classifier, a small-batch regime is used, in which a subset of 256 training samples is randomly drawn to approximate the gradient. The quality of the model degrades significantly for larger batch sizes. Among the batch-size strategies evaluated, this one appears best suited to the proposed DGCNN.
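Small-batch training draws a random subset of the training data at every step. One common way to do this, sketched below under the assumption of sampling without replacement within an epoch, is to shuffle the indices once per epoch and cut them into consecutive batches; the helper `minibatches` and the 1000-sample dataset are hypothetical.

```python
import math
import random

def minibatches(num_samples, batch_size, seed=None):
    """Yield lists of sample indices covering one epoch: the indices are
    shuffled, then cut into consecutive batches (the last may be smaller)."""
    rng = random.Random(seed)
    order = list(range(num_samples))
    rng.shuffle(order)
    for start in range(0, num_samples, batch_size):
        yield order[start:start + batch_size]

# Batch sizes from the text: 32 for DGAN, 256 for the classifier.
for batch_size in (32, 256):
    sizes = [len(b) for b in minibatches(1000, batch_size, seed=0)]
    print(batch_size, len(sizes), math.ceil(1000 / batch_size), sizes[-1])
# 32 32 32 8
# 256 4 4 232
```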

ii. Transformation of training samples

To train the CNN, separate training sets were built from classically augmented and synthetically augmented CXRs. The following image transforms are applied before the training samples are fed to DGCNN:

  • Resize: It resizes the input image to a given size. Since DGAN takes 64 ∗ 64 images, the input images are resized to 64 ∗ 64 before being provided to DGCNN.

  • Random horizontal flip: It augments the images by flipping them horizontally at random, with a given probability.

  • Grayscale: It converts the input image to a single channel before it is passed to DGCNN.

  • Normalize: It normalizes the CXR images to zero mean and unit standard deviation.

  • Color Jitter: It randomly alters the saturation, brightness, and contrast of CXR images.

  • Random Rotation: It rotates CXRs by a random angle drawn from a predefined range of degrees.

  • Random Affine: It applies random translation and scaling to CXR images within predefined ranges.

iii. Hyper-parameters

Four separate DGANs [21] are trained to synthesize images, one for each of the four classes. The two networks in each DGAN are trained iteratively. For each CXR class ∈ (COVID, Normal, Pneumonia bacteria, and Pneumonia virus), mini-batches of n = 32 CXR samples x(1), …, x(n) and n = 32 noise samples j(1), …, j(n) drawn from a uniform distribution are used. The leak slope of the Leaky ReLU is 0.2. The Adam optimizer is used with parameters α1 = 0.5 and α2 = 0.999, and the initial learning rate is 0.0002 for both networks. Different combinations of generator and discriminator learning rates are used to analyze the convergence of DGAN; finally, the learning rate of the generator is increased to 0.002.
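To make the role of α1 and α2 concrete, the sketch below implements a single-parameter Adam step, reading α1 and α2 as Adam's β1 and β2 decay rates. `ScalarAdam` is a hypothetical helper for illustration; in PyTorch the same settings would be passed as `torch.optim.Adam(params, lr=0.002, betas=(0.5, 0.999))` for the generator and with `lr=0.0002` for the discriminator.

```python
import math

class ScalarAdam:
    """Adam update for a single parameter; beta1 and beta2 correspond to the
    α1 = 0.5 and α2 = 0.999 reported in the text."""

    def __init__(self, lr, beta1=0.5, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0  # running mean of gradients (first moment)
        self.v = 0.0  # running mean of squared gradients (second moment)
        self.t = 0    # number of steps taken, used for bias correction

    def step(self, theta, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return theta - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

# Learning rates from the text: D keeps 0.0002, G is raised to 0.002.
opt_d = ScalarAdam(lr=0.0002)
opt_g = ScalarAdam(lr=0.002)
# On its first step, Adam moves a parameter by about lr, whatever the gradient scale.
print(opt_d.step(0.0, 5.0), opt_g.step(0.0, 5.0))
```

Because Adam's step size is governed mainly by `lr` rather than by the raw gradient magnitude, raising the generator's rate to 0.002 lets it take steps roughly ten times larger than the discriminator's.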

For training the classifier, the batch size is 256. Stochastic gradient descent with a high learning rate (LR = 0.01) is used, as it performed better for this specific task. Nesterov momentum updates (m = 0.9), which evaluate the gradient at a look-ahead position, are used to speed up training and improve performance. Separate CNNs are trained on the differently augmented CXR scans of each available dataset for 150 to 750 epochs, depending on the size of the dataset.
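The look-ahead idea behind Nesterov momentum is shown below on a one-dimensional quadratic, using the reported learning rate of 0.01 and momentum of 0.9. This is a textbook sketch, not the authors' code; PyTorch exposes the optimizer as `torch.optim.SGD(params, lr=0.01, momentum=0.9, nesterov=True)`, which uses a rearranged form of the update rather than evaluating the gradient at a separate look-ahead point.

```python
def nesterov_sgd(grad_fn, theta, lr=0.01, momentum=0.9, steps=200):
    """SGD with Nesterov momentum: the gradient is evaluated at the look-ahead
    point theta + momentum * velocity rather than at theta itself."""
    velocity = 0.0
    for _ in range(steps):
        lookahead = theta + momentum * velocity
        velocity = momentum * velocity - lr * grad_fn(lookahead)
        theta = theta + velocity
    return theta

# Toy objective f(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta = nesterov_sgd(lambda t: 2.0 * (t - 3.0), theta=0.0)
print(round(theta, 3))  # 3.0
```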

4 Experimental setup and analysis

This section discusses the performance metrics and the comparative analysis of the proposed DGCNN model. The experiments use the open-source machine learning library PyTorch, and both the DGAN and the classifier models are trained on Google Colab.

4.1 Evaluation metrics

The training performance of the proposed DGCNN is evaluated using the following quantities:

  • Loss_D: The discriminator loss, taken as the sum of the losses over the real and fake batches: log(D(i)) + log(1 − D(G(j))).

  • Loss_G: The generator loss, taken as log(D(G(j))).

  • D(i): The average discriminator output over the real batch. It starts close to 1 and theoretically converges to 0.5 as G improves.

  • D(G(j)): The average discriminator output over the fake batch. The first value, D(G(j1)), is recorded before D is updated, and the second value, D(G(j2)), after D is updated. D(G(j2)) starts close to 0 and later converges to 0.5 as G improves.
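The four logged quantities can be computed from the discriminator's outputs on one real batch and one fake batch. The sketch below follows the common PyTorch practice of logging the binary cross-entropy forms that are actually minimized, which are the negatives of the log-likelihood expressions listed above; `gan_batch_stats` and the example outputs are hypothetical.

```python
import math

def _mean(values):
    return sum(values) / len(values)

def gan_batch_stats(d_real, d_fake_before, d_fake_after, eps=1e-12):
    """Per-iteration statistics from discriminator outputs in (0, 1).

    d_real:        D(i) on the real batch
    d_fake_before: D(G(j)) on the fake batch before D is updated (j1)
    d_fake_after:  D(G(j)) on the fake batch after D is updated (j2)
    """
    loss_d = -(_mean([math.log(max(p, eps)) for p in d_real])
               + _mean([math.log(max(1.0 - p, eps)) for p in d_fake_before]))
    loss_g = -_mean([math.log(max(p, eps)) for p in d_fake_after])
    return {
        "Loss_D": loss_d,
        "Loss_G": loss_g,
        "D(i)": _mean(d_real),
        "D(G(j1))": _mean(d_fake_before),
        "D(G(j2))": _mean(d_fake_after),
    }

# Early in training the discriminator usually wins easily (illustrative values).
print(gan_batch_stats([0.90, 0.95], [0.10, 0.05], [0.08, 0.04]))
```

At the theoretical equilibrium, where every output equals 0.5, these forms give Loss_D = 2 log 2 ≈ 1.386 and Loss_G = log 2 ≈ 0.693.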

Each discriminator update pushes D(i) towards 1 and, at the same time, D(G(j)) towards 0. Each generator update tries to increase D(G(j)), i.e., to dupe D into classifying the images generated from noise as real. In the ideal case, the discriminator cannot differentiate between real and fake images [12]. In practice, however, this equilibrium is not easily achieved.

To evaluate the classifier, the training set contains 9 CXRs for each of the four classes (9, 9, 9, and 9). Therefore, a batch of 36 CXRs is used to keep the classes balanced. The average performance is computed over 500 iterations, and the average test accuracy is used to evaluate the performance of DGCNN. Sensitivity, specificity, and F1-score are also computed for each class.
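Per-class sensitivity, specificity, and F1-score follow from one-vs-rest counts of true and false positives and negatives. The function below is a minimal sketch; the label lists are made-up examples, not predictions from this study.

```python
def per_class_metrics(y_true, y_pred, classes):
    """One-vs-rest sensitivity, specificity, and F1-score for each class."""
    pairs = list(zip(y_true, y_pred))
    results = {}
    for c in classes:
        tp = sum(t == c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        tn = len(pairs) - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * sensitivity / (precision + sensitivity)
              if precision + sensitivity else 0.0)
        results[c] = {"sensitivity": sensitivity, "specificity": specificity, "f1": f1}
    return results

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = ["COVID-19", "COVID-19", "Normal", "Normal"]
y_pred = ["COVID-19", "Normal", "Normal", "Normal"]
print(accuracy(y_true, y_pred))  # 0.75
print(per_class_metrics(y_true, y_pred, ["COVID-19", "Normal"]))
```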

4.2 Performance analysis

4.2.1 Fake images generation using DGAN

The COVID-19 CXR dataset [7] is used to train DGAN. For each trained network, the relevant quantities D(G(j1)), D(G(j2)), D(i), Loss_D, and Loss_G are recorded and plotted against the epochs. Figure 5 shows pairs of real CXRs from the dataset and fake CXRs generated by the trained DGAN. The stability of both networks increases with the number of epochs. The fake images, shown in the right portion of Fig. 5, are single-channel images obtained from an individual DGAN trained for 512 epochs for each of the four classes.

Fig. 5

Pairs of real (left) and fake (right) images generated using DGAN

4.2.2 Comparison for different learning rates

Four DGANs are trained, one for each class of the dataset. Initially, a low learning rate of 0.0002 for the generator network resulted in poor-quality synthetic images, mainly because the discriminator overpowered the generator. The quality of the synthetic images improved when the learning rate was increased to 0.002. Thus, DGAN performs efficiently on the used dataset with a learning rate of 0.002 (see Fig. 6a–d).

Fig. 6

Analysis of the learning rate: a and b generator loss analysis with respect to epochs, c and d discriminator loss analysis, and e Generated fake image for the generator network when LR = 0.0002, and f generated fake image when LR = 0.002

4.2.3 Analysis of epochs

The performance of DGAN is also evaluated using 128, 256, and 512 epochs. Figure 7 shows the fake images obtained with the different numbers of epochs. Image quality improves as the number of epochs increases: with more epochs, DGAN produces fake images that are closer to the actual CXR images.

Fig. 7

Epochs analysis a Actual pneumonia virus CXR. b Generated image at epoch 50. c Generated image at epoch 100. d Generated image at epoch 200

4.2.4 Convergence failure

If the generator and the discriminator fail to reach a balance during training, convergence may fail. When the discriminator dominates, the generator score D(G(j)) approaches 0 and the discriminator score D(i) approaches 1, i.e., the discriminator overpowers the generator, as shown in Fig. 8.

Fig. 8

Discriminator overpowering the generator

Conversely, when the generator dominates, the generator score approaches 1 and remains near 1 for many iterations, so the discriminator is duped by the generator almost every time. Figure 9 shows this case, where the generator overpowers the discriminator.

Fig. 9

Generator overpowering the discriminator
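The two failure modes can be flagged automatically from the logged averages D(i) and D(G(j)). The heuristic below is hypothetical: the 100-iteration window and the 0.05/0.95 thresholds are illustrative assumptions, not values used in this study.

```python
def diagnose_balance(d_real_hist, d_fake_hist, window=100, low=0.05, high=0.95):
    """Hypothetical check on logged batch averages: d_real_hist holds D(i)
    per iteration, d_fake_hist holds D(G(j)) per iteration."""
    if len(d_real_hist) < window or len(d_fake_hist) < window:
        return "insufficient history"
    real = sum(d_real_hist[-window:]) / window
    fake = sum(d_fake_hist[-window:]) / window
    if fake < low and real > high:
        return "discriminator dominates"  # generator score -> 0, discriminator score -> 1
    if fake > high:
        return "generator dominates"      # D is fooled almost every time
    return "no dominance detected"

print(diagnose_balance([0.99] * 100, [0.01] * 100))  # discriminator dominates
print(diagnose_balance([0.50] * 100, [0.98] * 100))  # generator dominates
print(diagnose_balance([0.55] * 100, [0.45] * 100))  # no dominance detected
```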

4.2.5 Classification analysis

Figure 10 shows the flowchart of the proposed DGCNN for evaluating synthetic data augmentation in diagnosing suspected cases. First, the performance of existing data augmentation techniques is evaluated. DGAN is then used to synthesize CXR scans, and the resulting images are combined with the actual CXR scans for training. The following steps describe the proposed DGCNN for COVID-19 diagnosis.

Fig. 10

Flowchart of the proposed DGCNN for synthetic data augmentation to diagnose COVID-19 suspected cases

Step 1: Existing augmentation

In this step, the classification results of the CNN model are evaluated on the actual and the augmented training datasets. The CNN is trained and evaluated separately on both sets of data, i.e., actual and augmented CXRs. Let Dclassic denote the training data that includes augmented CXR scans. A fraction of the CXR scans is also held out for evaluation at test time. Additional data groups are formed to examine the effect of increasing the number of samples. The first data group consists of only actual CXR scans. Various data augmentations are applied to each original scan (Nrot = 2, Ncolor = 2, Nflip = 2, Nscale = 4, and Ntrans = 4), resulting in N = 2 × 2 × 2 × 4 × 4 = 128 augmented images per CXR scan. Therefore, 8000 samples per class are obtained. Thereafter, images are selected by sampling random augmented scans such that the same augmentation volume is sampled for each original scan. In summary, the data groups are prepared by adding 500, 1000, 2000, and 3000 augmented samples to each fold. Training is cross-validated over four folds. Fig. 11 shows sampled images after data augmentation.
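The combinations behind N = 128 can be enumerated directly. The sketch below lists them and draws the same number of distinct combinations for every original scan, as described above; `augmentation_grid`, `sample_per_scan`, and the scan identifiers are hypothetical, and each combination index stands for one concrete parameter setting that the text does not specify.

```python
import itertools
import random

# Number of settings per augmentation type (Nrot, Ncolor, Nflip, Nscale, Ntrans).
AUG_SETTINGS = {"rot": 2, "color": 2, "flip": 2, "scale": 4, "trans": 4}

def augmentation_grid(settings=AUG_SETTINGS):
    """Every combination of setting indices; each tuple defines one augmented copy."""
    return list(itertools.product(*(range(n) for n in settings.values())))

def sample_per_scan(scan_ids, per_scan, seed=0):
    """Draw the same number of distinct combinations for every original scan,
    so that each scan contributes an equal augmentation volume."""
    rng = random.Random(seed)
    grid = augmentation_grid()
    return {scan: rng.sample(grid, per_scan) for scan in scan_ids}

grid = augmentation_grid()
print(len(grid))  # 128
plan = sample_per_scan(["scan_001", "scan_002"], per_scan=8)
print({scan: len(combos) for scan, combos in plan.items()})  # {'scan_001': 8, 'scan_002': 8}
```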

Fig. 11

Sampled images after the data augmentation

Step 2: Synthetic augmented datasets

In the second step, synthetic CXR scans are generated by DGAN for data augmentation. The optimal point for classic augmentation \( {D}_{classic}^{optimal} \) is taken, and the corresponding augmented data group is used to train DGAN; the existing data augmentation is incorporated because the dataset is small. DGAN [10, 18] is trained separately for each class using the same data fraction, and after learning each class's data distribution, the generator synthesizes new samples. Some examples of synthesized CXR scans from each class are presented in Fig. 12. Data groups are constructed in the same way: numerous synthetic scans are collected for all four classes, and data groups Dsynthetic of synthetic data are evaluated. The same number of synthetic scans is sampled for every class to keep the classes balanced. In summary, 100, 500, 1000, and 2000 synthetic samples are appended to each of the four folds.
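The fold construction and the balanced appending of synthetic scans can be sketched as follows. `k_folds` and `add_balanced_synthetic` are hypothetical helpers, the folds are not stratified, and the sketch assumes that the reported group sizes (100, 500, 1000, and 2000) are totals split evenly across the four classes, which the text does not state explicitly.

```python
import random

def k_folds(items, k=4, seed=0):
    """Shuffle the items and deal them into k disjoint folds."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return [shuffled[f::k] for f in range(k)]

def add_balanced_synthetic(train_set, synthetic_pools, per_class, seed=0):
    """Append the same number of synthetic (scan, label) pairs from every class pool."""
    rng = random.Random(seed)
    extra = []
    for label, pool in synthetic_pools.items():
        extra.extend((scan, label) for scan in rng.sample(pool, per_class))
    return list(train_set) + extra

classes = ("COVID-19", "Normal", "Pneumonia bacterial", "Pneumonia viral")
# Hypothetical identifiers standing in for real scans and DGAN-generated scans.
real = [(f"real_{c}_{n}", c) for c in classes for n in range(10)]
pools = {c: [f"syn_{c}_{n}" for n in range(3000)] for c in classes}

folds = k_folds(real, k=4)
for f, test_fold in enumerate(folds):
    train_real = [x for g, fold in enumerate(folds) if g != f for x in fold]
    train = add_balanced_synthetic(train_real, pools, per_class=500 // len(classes), seed=f)
    print(f, len(train_real), len(train), len(test_fold))  # 30 real + 500 synthetic = 530
```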

Fig. 12

GAN trained generated images for bacterial pneumonia CXRs

Figure 13 shows the experimental results of synthetic augmentation with DGAN. The baseline results for the existing data augmentation techniques are marked in red. The total accuracy for CXR diagnosis is evaluated for each data group, and the average test results of CNN prediction over 500 iterations are reported in the tables. The blue line shows the results of the existing data augmentation scenario, and the red line shows the results of the combined approach, in which synthetic data augmentation is added to the existing data augmentation of the CXR scans. Without any augmentation, the accuracy is 76.9%, owing to over-fitting. Table 2 shows the performance of the proposed model trained for 750 epochs without data augmentation.

Fig. 13

Accuracy analysis of existing data augmentation and DGAN based CXR scans with respect to increase in training set size

Table 2 Performance analysis of CNN model on training set without data augmentation

Table 3 shows the performance of the CNN model trained for 750 epochs on the training dataset with the existing data augmentation techniques. Average accuracy improves by 1.7463% over the model trained without augmentation. The main reason for this improvement is that the larger training dataset reduces the impact of over-fitting.

Table 3 Performance analysis of CNN model on training dataset with the existing data augmentation

Table 4 shows the performance of the CNN model trained for 750 epochs on a training dataset built with both the existing data augmentation and DGAN. Average accuracy improves by 5.8472% compared with no data augmentation and by 4.1009% compared with the existing data augmentation-based CNN model. The main reason for this improvement is that the substantially larger training dataset significantly reduces the impact of over-fitting.
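The three reported gains are mutually consistent if they are read as absolute differences in average accuracy (in percentage points). Writing A_none, A_classic, and A_DGCNN for the average accuracies without augmentation, with the existing augmentation, and with the existing plus DGAN-based augmentation:

$$ \begin{aligned} A_{\text{classic}} - A_{\text{none}} &= 1.7463,\\ A_{\text{DGCNN}} - A_{\text{none}} &= 5.8472,\\ A_{\text{DGCNN}} - A_{\text{classic}} &= 5.8472 - 1.7463 = 4.1009, \end{aligned} $$

which matches the 4.1009% gain reported relative to the existing data augmentation-based CNN model.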

Table 4 Performance analysis of CNN model on training dataset with both existing data augmentation and synthetic augmentation

4.3 Discussion

Table 5 compares the proposed model with existing COVID-19 diagnosis models such as CSEN [34], TL-SVM [36], COVID-Net [33], MC-ResNet [40], COVID-DenseNet [39], COVIDX-Net [32], CNN-SVM [31], ResExLBP-SVM [30], COVIDiagnosis-Net [29], Xception and ResNet50V2 [28], ChestNet [18], MobileNet and SqueezeNet-based SVM [5], Xception [12], CovXNet [14], and DCNN [10]. The proposed DGCNN model achieves better accuracy than these models. The results also show that the proposed model achieves a significant improvement by using both the existing data augmentation and the synthetic data augmentation generated by DGAN: average accuracy improves by 5.8472% compared with no data augmentation and by 4.1009% compared with the existing data augmentation-based CNN model.

Table 5 Accuracy analysis among the existing and the proposed DGCNN models

5 Conclusion

The sensitivity of the RT-PCR test is not sufficient to control the COVID-19 outbreak successfully. Therefore, an efficient DGCNN has been designed to diagnose suspected COVID-19 subjects. In the proposed DGCNN model, DGAN first trains two networks adversarially, such that one generates fake images and the other differentiates them from real ones. A CNN is then used to diagnose suspected cases from CXR scans. Extensive experiments have been conducted to evaluate the performance of the proposed DGCNN. The performance analysis has shown that DGCNN can substantially improve diagnostic performance: it achieved an improvement in average accuracy of 5.8472% compared with no data augmentation and 4.1009% compared with the existing data augmentation-based CNN model. The main reason for this improvement is that the substantially larger training dataset significantly reduces the impact of over-fitting.

In the near future, the proposed model will be improved further by designing an evolving DGCNN model. Additionally, the proposed DGCNN model will be applied to other datasets.