This section describes the dataset, experimentation, model training, and validation. A performance comparison of the proposed approach with existing works is also presented.
Data acquisition and pre-processing
The dataset containing COVID-19 radiography images is acquired from Kaggle (https://www.kaggle.com/tawsifurrahman/covid19-radiography-databas) and consists of real data recently collected from infected patients through various labs. The database includes chest X-ray images for COVID-19 positive cases along with normal and viral pneumonia images. The dataset includes 219 COVID-19 positive images, 1341 normal images and 1345 viral pneumonia images, with a total size of approximately 1.15 GB. In this work, not all of the non-COVID-19 images (normal + pneumonia) are used, because the number of COVID-19 samples is significantly smaller and using all of them would create class imbalance. Instead, 219 images are randomly selected from the normal set and 219 images from the viral pneumonia set, contributing 438 images to the non-COVID-19 class. Figure 6a shows a sample COVID-19 image, where the arrow highlights the infected region. Figure 6b shows non-COVID-19 normal images.
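The balancing step described above can be illustrated with a minimal sketch of random undersampling. The file names and seeds are hypothetical placeholders, not the paths or settings used in this work; only the class sizes come from the text.

```python
import random

# Class sizes reported for the Kaggle dataset.
N_COVID, N_NORMAL, N_PNEUMONIA = 219, 1341, 1345


def undersample(items, k, seed):
    """Return k items drawn without replacement, reproducibly for a given seed."""
    rng = random.Random(seed)
    return rng.sample(items, k)


# Hypothetical file names standing in for the real image paths.
normal = [f"normal_{i}.png" for i in range(N_NORMAL)]
pneumonia = [f"pneumonia_{i}.png" for i in range(N_PNEUMONIA)]

# Draw as many images from each non-COVID-19 set as there are COVID-19 images.
non_covid = undersample(normal, N_COVID, seed=0) + undersample(pneumonia, N_COVID, seed=1)

print(len(non_covid))            # 438
print(N_COVID + len(non_covid))  # 657
```

Under these assumptions the working set contains 657 images in total.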
Data augmentation
Publicly contributed datasets of COVID-19 chest X-rays are still growing and are not yet adequate for training a complex CNN. This work therefore applies data augmentation, which increases the sample count without any loss of the original samples. The augmentation parameters are a rotation range of 15 degrees, a shear range of 0.2, a height shift range of 0.2, a width shift range of 0.2, a zoom range of 0.3 and a fill mode of nearest. The images in the dataset are of different sizes, and therefore all the images are resized to 224 × 224 × 3 pixels, where the third dimension corresponds to the three RGB color channels. Figure 7a and b show sample images generated using the augmentation process.
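As a rough illustration of the listed settings, the sketch below stores them in a dictionary and draws one random transform from them. The key names mirror the arguments of the Keras `ImageDataGenerator`, and each range is treated as symmetric around the identity transform; both are assumptions, since the text does not name the library or the sampling scheme.

```python
import random

# Augmentation settings as listed in the text. The key names follow the
# Keras ImageDataGenerator arguments, which is an assumption.
AUGMENTATION = {
    "rotation_range": 15,       # degrees
    "shear_range": 0.2,
    "height_shift_range": 0.2,  # fraction of the image height
    "width_shift_range": 0.2,   # fraction of the image width
    "zoom_range": 0.3,
    "fill_mode": "nearest",
}
TARGET_SIZE = (224, 224, 3)  # height, width, RGB channels


def sample_transform(cfg, rng):
    """Draw one random transform, assuming symmetric ranges around identity."""
    height, width, _ = TARGET_SIZE
    return {
        "rotation_deg": rng.uniform(-cfg["rotation_range"], cfg["rotation_range"]),
        "shear": rng.uniform(-cfg["shear_range"], cfg["shear_range"]),
        "shift_y_px": rng.uniform(-1, 1) * cfg["height_shift_range"] * height,
        "shift_x_px": rng.uniform(-1, 1) * cfg["width_shift_range"] * width,
        "zoom": rng.uniform(1 - cfg["zoom_range"], 1 + cfg["zoom_range"]),
    }


transform = sample_transform(AUGMENTATION, random.Random(42))
print(sorted(transform))
```

Each call yields a different combination of rotation, shear, shift and zoom, which is how a single source image can give rise to many augmented variants.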
Hyperparameter tuning
Hyperparameter tuning was performed using grid search on the CovCNN models. The tuning process covered five hyperparameters: number of epochs, learning rate, dropout rate in the dropout layer, batch size and gradient update optimization algorithm. The ranges of the hyperparameters were as follows: the dropout rate was varied between 10% and 35% in steps of 5%; the number of epochs was 20, 25, 30, 35, 40 or 45; the batch size was 16, 32 or 64; and the gradient optimizer was ADAM, SGD or RMSprop. Dropout probabilities of 25% and 30% performed well in most versions of the CovCNN models. Training for 40 epochs with a batch size of 32 yielded the best performance. RMSprop was the best gradient optimizer in most versions, although in some versions of CovCNN, ADAM yielded better results. To avoid overfitting and improve generalization, a dropout of 30% is used for regularization.
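The search space stated above can be enumerated with a short sketch. The learning-rate values are not given in the text, so that dimension is left out here; the `evaluate` callback is a hypothetical stand-in for training a model and returning its validation score.

```python
from itertools import product

# Search space as stated in the text (learning-rate values are not reported,
# so that dimension is omitted from this sketch).
SEARCH_SPACE = {
    "dropout": [0.10, 0.15, 0.20, 0.25, 0.30, 0.35],
    "epochs": [20, 25, 30, 35, 40, 45],
    "batch_size": [16, 32, 64],
    "optimizer": ["adam", "sgd", "rmsprop"],
}


def grid(space):
    """Return every combination of the listed hyperparameter values."""
    keys = list(space)
    return [dict(zip(keys, values)) for values in product(*(space[k] for k in keys))]


def grid_search(space, evaluate):
    """Return the configuration with the highest score under evaluate(config)."""
    return max(grid(space), key=evaluate)


configs = grid(SEARCH_SPACE)
print(len(configs))  # 324
```

Without the learning rate, the grid contains 6 × 6 × 3 × 3 = 324 configurations per model, which gives a sense of the cost of an exhaustive search.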
Model training and validation
The proposed CovCNN models and the pre-trained models are trained on the augmented chest X-ray images and then evaluated. The samples are divided into training and validation sets in a 4:1 ratio, i.e. 525 images are used for training and 132 images for testing. The 525 training images include 219 samples of the COVID-19 class and 306 samples of the non-COVID-19 (normal + viral pneumonia) class. The input size of the training and testing images is 224 × 224 × 3, and a batch size of 32 is used during the experiment. The models were trained for 40 to 50 epochs with an early-stopping configuration, i.e. training stops once the model stops improving on the validation set. The model was developed on a system with a 10th-generation Intel Core i7 processor, 16 GB RAM, a 512 GB SSD and a 4 GB NVIDIA graphics card.
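The 4:1 split and the early-stopping rule can be sketched as follows. The `patience` value and the rounding rule (rounding the validation share up) are assumptions, since the text states neither; the sketch only shows that a 20% validation share of 657 images is consistent with the reported 525/132 split.

```python
import math


def split_sizes(n_samples, val_fraction=0.2):
    """Return (n_train, n_val), rounding the validation share up (an assumption)."""
    n_val = math.ceil(n_samples * val_fraction)
    return n_samples - n_val, n_val


def early_stopping_epoch(val_losses, patience=5, max_epochs=50):
    """Return the 1-based epoch at which training stops.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs (a hypothetical setting), or when the
    epoch budget or the recorded losses run out.
    """
    best = math.inf
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return min(len(val_losses), max_epochs)


print(split_sizes(657))  # (525, 132)
```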
The model is evaluated on several classification metrics: binary accuracy (the frequency with which the predicted value matches the true value, computed by dividing the number of matching predictions by the total number of predictions), confusion matrix, sensitivity (recall), specificity, and F1-score. To improve the ability of the proposed model to generalize, the dataset is artificially expanded by generating different versions of its images using the augmentation technique. Table 1 shows the learning process recorded by the CovCNN model with respect to the number of epochs, model loss and model accuracy.
Table 1 Learning process recorded by the CovCNN model with respect to the number of epochs, model loss, model accuracy curve respectively
From the learning curves in Table 1, the dynamics of the model can be analyzed for under-fit, over-fit, and good fit. A model is said to be under-fitted if its training loss curve remains flat regardless of training, or if its training loss continues to decrease until the end of training. A model is said to be over-fitted if the training loss continues to decrease with experience while the validation loss decreases up to a certain point and then starts increasing. Finally, a model is said to be a good fit when the training loss and validation loss both decrease to a point of stability and the gap between the two curves is small.
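These three cases can be turned into a simple rule of thumb, sketched below. The threshold `tol` and the exact tests are hypothetical choices made for illustration; in this work the curves are interpreted by visual inspection rather than by such a rule.

```python
def diagnose_fit(train_loss, val_loss, tol=0.05):
    """Label a pair of loss curves using the definitions given in the text.

    `tol` is a hypothetical threshold for "flat", "still decreasing" and
    "small gap"; it is not a value taken from the paper.
    """
    # Over-fit: validation loss has risen noticeably above its minimum.
    if val_loss[-1] - min(val_loss) > tol:
        return "overfit"
    # Under-fit: training loss barely moved, or is still dropping at the end.
    flat = train_loss[0] - train_loss[-1] < tol
    still_decreasing = train_loss[-2] - train_loss[-1] > tol
    if flat or still_decreasing:
        return "underfit"
    # Good fit: both curves have stabilized and the final gap is small.
    if abs(val_loss[-1] - train_loss[-1]) <= tol:
        return "good fit"
    return "inconclusive"


print(diagnose_fit([1.0, 0.6, 0.4, 0.3, 0.2, 0.1], [1.0, 0.7, 0.5, 0.6, 0.8, 0.9]))  # overfit
```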
For the CovCNN_1 model, the learning curves show good-fit behavior during the first 30 epochs, but after 30 epochs the model starts overfitting as the validation loss begins to increase. The CovCNN_2 model shows some improvement and exhibits good-fit behavior, although the validation accuracy fluctuates noisily around the training accuracy; this is due to the low complexity of the model. CovCNN_3 was trained for 32 epochs (early-stopping configuration); its learning curve shows good-fit behavior and it reaches an acceptable accuracy of 96.97% with very good sensitivity and specificity values. CovCNN_4 is a deep neural network model whose learning curves show good-fit behavior within 25 epochs, achieving an accuracy of 98.48%.
Comparison with pre-trained models
The performance of the pre-trained models in terms of training and validation accuracy curves is shown in Table 2, and the interpretations are detailed in this section. ResNet101 shows a good fit up to 10 epochs, as the loss curve keeps descending. After that, the loss increases and disturbs the accuracy curve, from which we can infer that the model overfits the data, resulting in an accuracy of 90.91%. The Xception model fits well up to epoch 13, resulting in an acceptable accuracy of about 96.21%. However, after 13 epochs a gap appears between the training and validation curves, which indicates overfitting.
Table 2 Learning process recorded for the pre-trained models with respect to the number of epochs, model loss, model accuracy curve respectively
InceptionV3, ResNet50V2, InceptionResNetV2 and VGG16 each achieved an accuracy of 96.97%, with slight overfitting that can be observed from the gap between the training and validation loss and that incurs a penalty in generalization error. VGG19 shows a good improvement over all other pre-trained models, resulting in an accuracy of 98.48%. The overall validation accuracy of the proposed CovCNN models and the pre-trained models is shown in Table 3. The receiver operating characteristic (ROC) curve and the confusion matrix are obtained after training on the respective validation sets. Table 4 records the ROC observations obtained for the custom CovCNN models along with the confusion matrices. Similarly, the ROC curves and confusion matrices obtained for the transfer learning models are presented in Table 5.
Table 3 Represents the overall training and accuracy of the proposed CovCNN models as well as pre-trained models
Table 4 Observations recorded by the CovCNN model in the evolution of predicting Confusion matrix and ROC curve respectively
Table 5 Observations recorded by the transfer learning pre-trained models in the evolution of predicting Confusion matrix and ROC curve respectively
Visual interpretation of the trained model features
Figure 8 shows the features extracted by the inner layers of the proposed CovCNN model during the training process. This visualization helps in understanding how the model interprets the image internally. The image (Cov2d_45) shows the output after applying 32 filters of size 3 × 3. The next image (max_pooling_2d_38) shows the output after applying max-pooling to each of the 32 filter outputs; here the edges are strengthened compared with the Cov2d_45 image. Similarly, the subsequent convolution blocks help in extracting more fine-grained features from the image. In the image (Cov2d_47), 128 feature maps have been created, each representing some feature and contributing to the fully connected layers for the final classification.
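To make the convolution and pooling stages concrete, the following toy sketch applies a single 3 × 3 filter and then max-pooling to a small image containing one vertical edge. The kernel values, the absence of padding, the stride of 1 and the 2 × 2 pooling window are all assumptions for illustration; the filters in CovCNN are learned during training.

```python
def conv2d_valid(image, kernel):
    """2-D cross-correlation with no padding and stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(fmap, size=2):
    """Non-overlapping max pooling; incomplete trailing windows are dropped."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


# Toy 6 x 6 image: dark left half, bright right half (one vertical edge).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-written vertical-edge kernel (illustrative, not a learned filter).
kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

feature_map = conv2d_valid(image, kernel)
pooled = max_pool2d(feature_map)
print(feature_map[0])  # [0, 3, 3, 0]
print(pooled)          # [[3, 3], [3, 3]]
```

The feature map responds only where the edge lies, and pooling keeps the strongest response in each window, which is one way to read the stronger edges seen after the pooling layer.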
Effectiveness of the proposed architecture
The effectiveness of the proposed CovCNN model is analyzed using various performance metrics like accuracy, sensitivity, specificity and F1-score. The resultant performance metric values are detailed in this section.
Binary accuracy
Binary accuracy is the number of predictions that match the actual value divided by the total number of predictions made. The expression for binary accuracy is given in Eq. (5).
$$Accuracy = \frac{TN + TP}{TN + FN + TP + FP}$$
(5)
Sensitivity and specificity
The proportion of positive samples that are correctly predicted is called sensitivity, whereas the proportion of negative samples that are correctly predicted is called specificity; these are given in Eqs. (6) and (7). In the proposed work, sensitivity is the proportion of COVID-19 positive patients who are correctly identified as COVID-19 positive, and specificity is the proportion of non-COVID-19 patients who are correctly identified as non-COVID-19.
$$Sensitivity = \frac{TP}{TP + FN}$$
(6)
$$Specificity = \frac{TN}{TN + FP}$$
(7)
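Eqs. (5) to (7) can be checked with a small worked example. The confusion-matrix counts below are hypothetical: they are chosen only because they reproduce the accuracy, sensitivity and specificity reported for VGG19 on a 132-image validation set, and they are not taken from the confusion matrices in Table 5.

```python
def accuracy(tp, tn, fp, fn):
    """Eq. (5): share of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def sensitivity(tp, fn):
    """Eq. (6): share of actual positives that are predicted positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Eq. (7): share of actual negatives that are predicted negative."""
    return tn / (tn + fp)


# Hypothetical counts for a 132-image validation set (not from the paper).
tp, fn, tn, fp = 43, 1, 87, 1

print(f"{accuracy(tp, tn, fp, fn):.4f}")  # 0.9848
print(f"{sensitivity(tp, fn):.4f}")       # 0.9773
print(f"{specificity(tn, fp):.4f}")       # 0.9886
```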
F1-score
The F1-score is the harmonic mean of precision and recall. Here, precision is the proportion of true positives among all the examples that are classified as positive by the neural network, whereas recall is the same as the sensitivity defined previously. The expressions are given in Eqs. (8), (9) and (10).
$$F1 = 2\times \frac{Precision\times Recall}{Precision + Recall}$$
(8)
$$Precision = \frac{TP}{TP + FP}$$
(9)
$$Recall =\frac{TP}{TP + FN}$$
(10)
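A brief sketch of Eqs. (8) to (10) follows. The counts are hypothetical and serve only to illustrate the arithmetic; they are not results from this work.

```python
def precision(tp, fp):
    """Eq. (9): share of predicted positives that are actually positive."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Eq. (10): share of actual positives that are predicted positive."""
    return tp / (tp + fn)


def f1_score(tp, fp, fn):
    """Eq. (8): harmonic mean of precision and recall."""
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)


# Hypothetical counts (not from the paper).
tp, fp, fn = 43, 1, 1

print(f"{precision(tp, fp):.4f}")     # 0.9773
print(f"{recall(tp, fn):.4f}")        # 0.9773
print(f"{f1_score(tp, fp, fn):.4f}")  # 0.9773
```

When precision and recall are equal, as in this example, the F1-score equals both; when they differ, the harmonic mean lies closer to the smaller of the two.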
Tables 4 and 5 present the compiled performance metrics of all the models along with the confusion matrix for each model. Tables 6 and 7 present the results of the custom and pre-trained models respectively. CovCNN_1 achieved an accuracy, sensitivity and specificity of 93.9%, 0.909 and 1 respectively. This initial CovCNN model was improved as CovCNN_2, achieving an accuracy, sensitivity and specificity of 95%, 0.932 and 1 respectively. The model was further improved as CovCNN_3, achieving an accuracy, sensitivity and specificity of 97%, 0.97 and 0.98 respectively. The best model for binary classification in the custom model category is CovCNN_4, with an accuracy of 98.48%, a sensitivity of 1 and a specificity of 0.97. In the pre-trained category, the best model is VGG19, with an accuracy of 98.48%, a sensitivity of 97.73% and a specificity of 98.86%. The accuracy achieved with the proposed algorithm is compared with that of existing approaches in Table 8.
Table 6 Quantitative performance validation results of CovCNN architectures on the COVID-19 X-ray dataset
Table 7 Quantitative performance validation results of pre-trained architectures on the COVID-19 X-ray dataset
Table 8 Comparison of the proposed accuracy with the existing algorithm