This section discusses the performance metrics and the comparative analysis of the proposed DGCNN model. For experimentation, the open-source machine learning library PyTorch is used, and Google Colab is used to train both the DGAN and the classifier models.
Evaluation metrics
The training performance of the proposed DGCNN is evaluated using the following quantities:
- Loss_D: the discriminator loss, taken as the sum of the losses over all real and fake batches, given as log(D(G(j))) + log(D(i)).
- Loss_G: the generator loss, taken as log(D(G(j))).
- D(i): the average discriminator output over all real batches. It starts near 1 and theoretically converges to 0.5 according to the quality of G.
- D(G(j)): the average discriminator output over all fake batches. The first value, D(G(j1)), is computed before updating D, and the second value, D(G(j2)), after updating D. D(G(j2)) starts near 0 and later converges to 0.5 according to the quality of G.
Whenever the discriminator is updated, it tries to push D(i) towards 1 and, at the same time, push D(G(j)) towards 0. An updated generator tries to increase D(G(j)), i.e., it tries to fool D into classifying the images generated from noise as real ones. In the ideal case, the discriminator cannot differentiate between real and fake images [12]; however, this scenario is rarely achieved in practice.
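As an illustration of how these quantities relate to a single training step, a minimal PyTorch sketch is given below. It assumes a standard DCGAN-style setup with a sigmoid-output discriminator and binary cross-entropy loss; the handles netD, netG, optD, and optG are hypothetical placeholders, not the authors' implementation.

```python
import torch
import torch.nn as nn

criterion = nn.BCELoss()  # assumes netD ends with a sigmoid

def train_step(netD, netG, optD, optG, real, nz=100, device="cpu"):
    b = real.size(0)
    real_label = torch.ones(b, device=device)
    fake_label = torch.zeros(b, device=device)

    # Update discriminator: push D(i) towards 1 and D(G(j)) towards 0
    optD.zero_grad()
    out_real = netD(real).view(-1)                 # D(i): output on a real batch
    loss_real = criterion(out_real, real_label)

    noise = torch.randn(b, nz, 1, 1, device=device)
    fake = netG(noise)
    out_fake1 = netD(fake.detach()).view(-1)       # D(G(j1)): on fakes before D update
    loss_fake = criterion(out_fake1, fake_label)

    loss_D = loss_real + loss_fake                 # Loss_D: sum over real and fake batches
    loss_D.backward()
    optD.step()

    # Update generator: push D(G(j)) towards 1 (i.e., try to fool D)
    optG.zero_grad()
    out_fake2 = netD(fake).view(-1)                # D(G(j2)): on fakes after D update
    loss_G = criterion(out_fake2, real_label)      # Loss_G
    loss_G.backward()
    optG.step()

    return (loss_D.item(), loss_G.item(),
            out_real.mean().item(),                # D(i)
            out_fake1.mean().item(),               # D(G(j1))
            out_fake2.mean().item())               # D(G(j2))
```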
To evaluate the classifier, the number of CXRs drawn from the training set is (9, 9, 9, and 9) for the four classes of subjects. Therefore, a batch of 36 CXRs is used to maintain the balance of CXR images in every class. The average performance is computed over 500 iterations, and the average testing accuracy is used to evaluate the performance of DGCNN. Additional measures, namely sensitivity, specificity, and F1-score, are also computed for each category.
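A minimal sketch of how per-class sensitivity, specificity, and F1-score can be computed from a confusion matrix in a one-versus-rest fashion is shown below; the function name and the row/column convention are illustrative assumptions rather than the paper's code.

```python
import numpy as np

def per_class_metrics(conf):
    """conf: KxK confusion matrix, rows = true class, columns = predicted class."""
    metrics = {}
    total = conf.sum()
    for k in range(conf.shape[0]):
        tp = conf[k, k]
        fn = conf[k, :].sum() - tp
        fp = conf[:, k].sum() - tp
        tn = total - tp - fn - fp
        sensitivity = tp / (tp + fn) if tp + fn else 0.0   # recall for class k
        specificity = tn / (tn + fp) if tn + fp else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = (2 * precision * sensitivity / (precision + sensitivity)
              if precision + sensitivity else 0.0)
        metrics[k] = dict(sensitivity=sensitivity,
                          specificity=specificity, f1=f1)
    return metrics
```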
Performance analysis
Fake image generation using DGAN
The COVID-19 CXR dataset [7] is used for training DGAN. For the analysis of every trained network, the relevant parameters, namely D(G(j1)), D(G(j2)), D(i), Loss_D, and Loss_G, are obtained and plotted against the epochs. Figure 5 shows a stack of real CXRs from the dataset alongside fake CXRs generated by the trained DGAN. It is found that the stability of both networks increases as the number of epochs increases. The fake images, shown in the right portion of Fig. 5, are single-channel images obtained from an individual DGAN trained for 512 epochs for each of the four classes.
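A sketch of how such single-channel fake CXRs could be drawn from the four independently trained generators is given below; the generators dictionary, latent size nz = 100, and grid layout are assumptions for illustration only.

```python
import torch
import torchvision.utils as vutils

@torch.no_grad()
def sample_fakes(generators, n_per_class=16, nz=100, device="cpu"):
    """generators: dict mapping class name -> trained generator (one DGAN per class)."""
    grids = {}
    for name, netG in generators.items():
        netG.eval()
        noise = torch.randn(n_per_class, nz, 1, 1, device=device)
        fakes = netG(noise)                              # single-channel synthetic CXRs
        grids[name] = vutils.make_grid(fakes, nrow=4, normalize=True)
    return grids
```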
Comparison for different learning rates
Four DGANs are trained, one for each individual class of the dataset. Initially, a low learning rate of 0.0002 for the generator network resulted in poor-quality synthetic images, the major cause being the discriminator overpowering the generator network. The quality of the output synthetic images improved when the learning rate was increased to 0.002. Thus, DGAN performs efficiently on the used dataset with a learning rate of 0.002 (see Fig. 6a–d).
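A minimal configuration sketch of the two generator learning rates is shown below; only the values 0.0002 and 0.002 are taken from the text, while the Adam optimizer, its beta values, and the placeholder networks are assumptions.

```python
import torch.nn as nn
import torch.optim as optim

# Placeholder networks so the snippet runs; the actual DGAN generator and
# discriminator architectures are described elsewhere in the paper.
netG = nn.Sequential(nn.ConvTranspose2d(100, 1, kernel_size=4))
netD = nn.Sequential(nn.Conv2d(1, 1, kernel_size=4), nn.Sigmoid())

# Only the generator learning rates come from the text; Adam with
# betas=(0.5, 0.999) is an assumed, standard DCGAN choice.
opt_g_low  = optim.Adam(netG.parameters(), lr=0.0002, betas=(0.5, 0.999))  # D overpowered G
opt_g_high = optim.Adam(netG.parameters(), lr=0.002,  betas=(0.5, 0.999))  # improved fakes
opt_d      = optim.Adam(netD.parameters(), lr=0.002,  betas=(0.5, 0.999))  # assumed value
```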
Analysis of epochs
The performance of DGAN is also evaluated using 128, 256, and 512 epochs. Figure 7 shows the fake images obtained with the different epoch values. The quality of the images increases as the number of epochs increases. It is found that DGAN produces fake images that are closer to the actual CXR images for larger numbers of epochs.
Convergence failure
During the training process, if the generator and the discriminator fail to reach a balance, the result may be a convergence failure. When the discriminator dominates, the generator score approaches 0 and the discriminator score approaches 1, so the discriminator overpowers the generator, as shown in Fig. 8.
In contrast, when the generator dominates, the generator score approaches 1 and remains near 1 for many iterations, and the discriminator is duped by the generator almost every time. Figure 9 shows the case where the generator overpowers the discriminator.
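A simple heuristic for flagging either failure mode from the logged discriminator scores could look like the following sketch; the window size and thresholds are illustrative assumptions.

```python
def detect_dominance(d_real_scores, d_fake_scores, window=50, hi=0.95, lo=0.05):
    """Flag convergence failure from recent average discriminator outputs.

    d_real_scores: per-iteration mean D(i) on real batches.
    d_fake_scores: per-iteration mean D(G(j)) on fake batches.
    """
    if len(d_real_scores) < window:
        return "insufficient history"
    real_avg = sum(d_real_scores[-window:]) / window
    fake_avg = sum(d_fake_scores[-window:]) / window
    if real_avg > hi and fake_avg < lo:
        return "discriminator dominates (generator score near 0)"
    if fake_avg > hi:
        return "generator dominates (discriminator duped almost every time)"
    return "balanced"
```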
Classification analysis
Figure 10 shows the flowchart of the proposed DGCNN for evaluating synthetic data augmentation in the diagnosis of suspected cases. Initially, the performance of the existing data augmentation techniques is evaluated. DGAN is then used to synthesize CXR scans, and the resultant images are combined with the actual CXR scans for training. The following subsections discuss the various steps of the proposed DGCNN for COVID-19 diagnosis.
Step 1: Existing augmentation
In this step, the classification results of the CNN model are evaluated on the actual and the data augmentation-based training datasets. The CNN is trained and its performance is evaluated separately for both sets of data, i.e., actual and augmented CXRs. Let \( D_{classic} \) represent the training data that includes the augmented CXR scans; a fraction of the CXR scans is also held out for evaluation at testing time. Additional data groups are formed to examine the effect of increased samples. The first data group consists of only the actual CXR scans. Various data augmentations are applied to each original scan (\( N_{rot} = 2 \), \( N_{color} = 2 \), \( N_{flip} = 2 \), \( N_{scale} = 4 \), and \( N_{trans} = 4 \)), which results in \( N = 128 \) augmented images per CXR scan and therefore 8000 samples per class. Thereafter, the images are selected by sampling random augmented scans such that the same augmentation volume is sampled for each original scan. To summarize this data group preparation process, 500, 1000, 2000, and 3000 augmented samples are added to each fold, respectively. The training process is cross-validated over 4 different folds. Figure 11 shows the sampled images after data augmentation.
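A possible torchvision-based realization of this augmentation scheme is sketched below; only the counts (\( 2 \times 2 \times 2 \times 4 \times 4 = 128 \) variants per scan) follow the text, whereas the rotation angles, brightness factors, scales, and translation offsets are illustrative assumptions.

```python
import random
from itertools import product
import torchvision.transforms.functional as TF

# Illustrative parameter grids; only the counts per transform follow the text.
ROTATIONS = [-10, 10]                                # N_rot = 2 (degrees)
COLORS    = [0.9, 1.1]                               # N_color = 2 (brightness factors)
FLIPS     = [False, True]                            # N_flip = 2
SCALES    = [0.9, 0.95, 1.05, 1.1]                   # N_scale = 4
TRANS     = [(-10, 0), (10, 0), (0, -10), (0, 10)]   # N_trans = 4 (pixel offsets)

def augment_all(img):
    """Return all 128 augmented variants of a single CXR (PIL image or tensor)."""
    variants = []
    for rot, col, flip, scale, (tx, ty) in product(ROTATIONS, COLORS, FLIPS, SCALES, TRANS):
        out = TF.affine(img, angle=rot, translate=[tx, ty], scale=scale, shear=[0.0])
        out = TF.adjust_brightness(out, col)
        if flip:
            out = TF.hflip(out)
        variants.append(out)
    return variants

def sample_augmented(img, k):
    """Sample k random augmentations so every original scan contributes equally."""
    return random.sample(augment_all(img), k)
```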
Step 2: Synthetic augmented datasets
In the second step, synthetic CXR scans are generated for data augmentation using DGAN. The optimal point for classic augmentation, \( {D}_{classic}^{optimal} \), is taken, and the augmented data group is used for training DGAN. Because the dataset is small, the existing data augmentation is incorporated for effective training. DGAN [10, 18] is employed to train each class separately using the same data fraction. After learning each class's data distribution separately, the generator synthesizes new samples; some examples of synthesized CXR scans from each class are presented in Fig. 12. The same approach is used for constructing the data groups. Additionally, numerous synthetic scans for all four classes are collected, and data groups \( D_{synthetic} \) of synthetic data are evaluated. The same number of synthetic scans is sampled for every class to keep the classes balanced. To summarize the data group preparation process for synthetic augmentation, 100, 500, 1000, and 2000 samples are appended to each of the four folds.
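The balanced sampling of synthetic scans could be realized as in the sketch below; the function and variable names are illustrative, and the per-class generators are assumed to be the separately trained DGANs described above.

```python
import torch

@torch.no_grad()
def build_synthetic_group(generators, n_per_class, nz=100, device="cpu"):
    """Sample the same number of synthetic CXRs per class so D_synthetic stays balanced.

    generators: dict mapping class label -> per-class trained DGAN generator.
    Returns a list of (image_tensor, label) pairs, e.g. for appending to a training fold.
    """
    samples = []
    for label, netG in generators.items():
        netG.eval()
        noise = torch.randn(n_per_class, nz, 1, 1, device=device)
        fakes = netG(noise).cpu()
        samples.extend((img, label) for img in fakes)
    return samples
```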
Figure 13 shows the experimental results obtained from DGAN-based synthetic augmentation, with the baseline results for the existing data augmentation techniques included for reference. The total accuracy measure for each data group in CXR diagnosis is evaluated, and the average test results of the CNN prediction over 500 iterations are reported in the tables. The blue line demonstrates the result of the existing data augmentation scenario, while the red line shows the result obtained from the combined approach of synthetic data augmentation and existing data augmentation-based CXR scans. When no augmentation is used, the accuracy is 76.9%, which is due to the over-fitting problem. Table 2 shows the performance evaluation of the proposed model without data augmentation over 750 epochs.
Table 2 Performance analysis of CNN model on training set without data augmentation
Table 3 shows the performance analysis of the CNN model when it is trained for 750 epochs on the training dataset using the existing data augmentation techniques. It clearly shows an improvement of 1.7463% in average accuracy. The main reason behind this improvement is that the impact of over-fitting is reduced by the increase in the size of the training dataset.
Table 3 Performance analysis of CNN model on training dataset with the existing data augmentation
Table 4 demonstrates the performance analysis of the CNN model when it is trained for 750 epochs on the training dataset with both the existing data augmentation and DGAN-based augmentation. It clearly shows an improvement in average accuracy of 5.8472% compared to the model trained without any data augmentation technique and of 4.1009% compared to the existing data augmentation-based CNN model. The main reason behind this improvement is that the impact of over-fitting is significantly reduced due to the major increase in the size of the training dataset.
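These two gains are mutually consistent with the improvement reported in Table 3, since \( 5.8472\% - 1.7463\% = 4.1009\% \), i.e., the gain over the existing augmentation-based model equals the difference between the two gains measured against the no-augmentation baseline.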
Table 4 Performance analysis of CNN model on training dataset with both existing data augmentation and synthetic augmentation
Discussion
Table 5 shows the comparative analysis of the proposed model and the existing COVID-19 diagnosis models, namely CSEN [34], TL-SVM [36], COVID-Net [33], MC-ResNet [40], COVID-DenseNet [39], COVIDX-Net [32], CNN-SVM [31], ResExLBP-SVM [30], COVIDiagnosis-Net [29], Xception and ResNet50V2 [28], ChestNet [18], MobileNet and SqueezeNet-based SVM [5], Xception [12], CovXNet [14], and DCNN [10]. Compared to these models, the proposed DGCNN model achieves significantly better accuracy. It is also observed from the results that a significant improvement is achieved by the proposed model by using both the existing data augmentation and the synthetic data augmentation generated by DGAN: the improvement in average accuracy is 5.8472% compared to the model without any data augmentation technique and 4.1009% compared to the existing data augmentation-based CNN model.
Table 5 Accuracy analysis among the existing and the proposed DGCNN models