1 Introduction

A chest infection is an infection that impairs the proper functioning of the lungs (both the larger and smaller airways) [1]. The severity of a lung infection depends on several factors, such as its cause (virus or bacteria) and the overall health of the infected person. The most common lung infections are pneumonia, chronic obstructive pulmonary disease (COPD), asthma, bronchitis, and lung cancer. Coronavirus disease, popularly known as COVID-19, is one such lung infection. It is caused by a newly discovered virus known as severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Coronaviruses are a family of viruses known to cause diseases such as the common cold, severe acute respiratory syndrome (SARS), and Middle East respiratory syndrome (MERS) [2]. The disease was first identified in Wuhan, China, in December 2019. The unprecedented rise in COVID-19 cases has affected the worldwide economy, and the outbreak has been declared a pandemic by the World Health Organization [3].

As of 22 May 2020, a total of 5,207,918 patients had been infected with COVID-19, and 334,848 deaths had been reported across 215 countries [4]. To control the spread of the COVID-19 virus, accurate detection and treatment are required. Reverse transcriptase polymerase chain reaction (RT-PCR) is the standard diagnostic test for COVID-19 [5]. The popularity of PCR is due to its high selectivity and sensitivity, i.e., over 90%. The limitations of COVID-19 testing with the PCR technique are that it is (a) very time consuming, (b) expensive, and (c) subject to a shortage of kits due to long production time [6]. Considering the alarming rate at which COVID-19 spreads, a faster and cheaper testing mechanism is required to tackle this outbreak. The need for a faster screening technique to control the COVID-19 outbreak has also been studied by the authors in [7, 8]. Researchers have found that radiological analysis, such as X-rays and chest computed tomography (CT) scans, has high accuracy in COVID-19 diagnosis and can be an effective tool for large-scale screening. A high correlation between RT-PCR and radiological results for COVID-19 diagnosis is established in [9]. Moreover, COVID-19 infection is identified through ground-glass opacity (GGO) patches in radiographic scans of patients. This encouraged the development of a faster and cheaper COVID-19 screening mechanism using a radiological approach. Deep learning also plays a critical role in medical image analysis, which motivates its use in the screening of COVID-19. The growing role of deep learning has been analyzed in the study of AI-based COVID classification techniques [10], where the authors find that deep-learning-based techniques can provide very promising results for COVID classification. The details of the techniques available in the literature for COVID-19 diagnosis are put forth in Table 1.

From the detailed analysis of the state of the art in COVID-19 diagnosis, it can be inferred that chest radiography (X-rays and CT scans) is the best alternative to RT-PCR test kits for COVID-19 detection [31]. However, the CT scan modality appears to be more effective than chest X-ray for the following reasons: (a) a CT scan gives a detailed 3-dimensional view of the diagnosed organ, whereas X-rays give a 2-D view, and (b) in a CT scan the organs do not overlap, whereas in X-rays the ribs overlap the lungs and heart. Owing to the high precision of a CT-scan-based screening system, a deep learning-based three-step model is proposed, which consists of a transfer-learning-based feature extractor, a feature selector, and a feature classifier. In the proposed work, a truncated VGG16 architecture is used for extracting features. The last two blocks of the truncated architecture are fine-tuned with differential learning rates. PCA is applied to the features extracted by the CNN. For the classification task, four different classifier models are compared.

The study addresses various issues with the current COVID-19 datasets and proposes techniques to overcome them. The transfer learning capabilities of various models are demonstrated and compared. Furthermore, techniques such as truncation and differential learning rates are proposed to increase robustness. The effect of various feature selection techniques is also studied. Finally, bagging SVM is chosen for classification after a comparative study of popular classifiers. The proposed model achieved an accuracy of 95.7%, a precision of 95.8%, an area under the curve (AUC) of 0.958, and an F1 score of 95.3% on the 208 test images, with a prediction time of 385 ms. The results obtained on diverse datasets demonstrate the robustness of the proposed work.

The rest of the paper is organized as follows: Section 2 describes the proposed methodology, and Section 2.7 puts forth the details of the different classifiers. Section 3 describes the experimentation, and Section 4 presents the results. Section 5 concludes the proposed work.

Table 1 Summary of techniques available in the literature for COVID-19 screening

2 Proposed methodology

The chest CT scans of COVID-19 patients contain patches of ground glass opacity (GGO); thus, a multi-dimensional feature extractor is required for screening [32]. In the proposed work, the VGG16 architecture is fine-tuned and used to extract features from lung CT scan images. Since the size of the COVID-19 dataset is very small, a truncated version of the VGG16 architecture is used. PCA is used to reduce the dimensionality of the features obtained from truncated VGG-16. The final classification is performed using four different classifiers. The self-explanatory block diagram of the proposed methodology for COVID-19 classification is shown in Fig. 1.

Fig. 1 Self-explanatory block diagram of the proposed methodology of COVID-19 screening

2.1 Training data

In the proposed work, the dataset is collected from three different sources to ensure the robustness of the model. The brief details of datasets used are:

  a) Dataset 1 (D1): A CT scan dataset of 617 COVID and non-COVID images, compiled by Zhao et al. [14]

  b) Dataset 2 (D2): COVID-19 image data collection (53 COVID CT scans), by Joseph Paul Cohen, Paul Morrison, and Lan Dao [24]

  c) Dataset 3 (D3): Italian society of medical and interventional research (60 COVID-19 CT scans) [15]

The final split of the data is summarized below:

  • Training : 432 images (204 COVID and 228 non-COVID before augmentation)

  • Validation : 62 images (29 COVID and 33 non-COVID)

  • Test : 208 images (111 COVID and 97 non-COVID)

Some images in these datasets had markings and other non-removable artifacts and hence had to be dropped. Table 2 puts forth the details of the CT scan images available in D1, D2, and D3, along with the details of the training, validation, and test sets used. The minimum, average, and maximum image heights are 153, 491, and 1853 pixels. The minimum, average, and maximum image widths are 124, 383, and 1485 pixels. These images are from 216 patient cases. Of the patients labeled as positive, 169 have age information and 137 have gender information. The images are from multiple sources, including hospital donations, pre-prints, and reports released by the ISMIR [15]. The exact patient and image details can be found in [14, 15, 24].

2.2 Pre-processing module

As the input images are of different sizes, all of them are resized to 112 × 112 × 3 to maintain uniformity.

CT scans have artifacts such as beam hardening, noise, and scatter, which reduce the accuracy of the model. To overcome this, a median filter is applied first. Median filtering is a widely used nonlinear method for removing noise from images while preserving edges. The median filter operates by replacing each pixel value with the median value of the neighboring pixels. A 5 × 5 kernel is a popular median filter choice for biomedical images and has hence been chosen [33]. Finally, a morphological close transformation is applied to the image. A morphological close operation is a dilation operation followed by an erosion operation. It removes holes and any remaining salt-and-pepper noise from the images. It has been shown to be highly effective on binary and gray-scale images [34].
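The two denoising steps described above can be illustrated with a minimal NumPy sketch. It is an illustration under stated assumptions rather than the implementation used in this study: the function names are hypothetical, the image is assumed to be a single 2-D gray-scale channel, borders are handled by edge replication, and the 3 × 3 structuring element for the close operation is an assumed value, since only the 5 × 5 median kernel is specified in the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def _windows(img, k):
    """Return every k x k neighborhood of a 2-D image (edge-replicated borders)."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    return sliding_window_view(padded, (k, k))


def median_filter(img, k=5):
    """Replace each pixel with the median of its k x k neighborhood."""
    return np.median(_windows(img, k), axis=(-2, -1))


def morphological_close(img, k=3):
    """Gray-scale closing: a dilation (local max) followed by an erosion (local min)."""
    dilated = _windows(img, k).max(axis=(-2, -1))
    return _windows(dilated, k).min(axis=(-2, -1))


def denoise(img):
    """Median filtering followed by a morphological close, in the order given in the text."""
    return morphological_close(median_filter(img, k=5), k=3)
```

For a three-channel input, the same functions could be applied to each channel in turn.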

Since the images are of different scales and have labels and markings around the corners, an adaptive ROI selector is applied to the images. The filter first centers and straightens the image and then applies an elliptical mask such that the non-lung parts are cropped out. The ellipse is fitted to the image using the abrupt change in pixel values close to the rib-cage bones (black to white color change). Furthermore, all the masked images are manually checked to ensure that no image is over-cropped or under-cropped; any such image is fixed manually. Figure 2 shows a pictorial representation of the various stages of the pre-processing module used in the study.
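The masking step of the ROI selector can be sketched as follows. This sketch covers only the application of an elliptical mask once its center and semi-axes are known; the adaptive fitting of the ellipse to the rib-cage boundary is not reproduced here. The function names and the default ellipse (centered, touching the image borders) are assumptions made for illustration.

```python
import numpy as np


def elliptical_mask(height, width, center=None, axes=None):
    """Boolean mask that is True inside an axis-aligned ellipse.

    center = (cy, cx) and axes = (semi-axis along y, semi-axis along x).
    By default the ellipse is centered and spans the whole image (assumed values).
    """
    cy, cx = center if center is not None else ((height - 1) / 2, (width - 1) / 2)
    ay, ax = axes if axes is not None else (height / 2, width / 2)
    yy, xx = np.ogrid[:height, :width]
    return ((yy - cy) / ay) ** 2 + ((xx - cx) / ax) ** 2 <= 1.0


def apply_roi(img, mask):
    """Set every pixel outside the mask to zero (2-D gray-scale image assumed)."""
    return np.where(mask, img, 0)
```

In practice, the center and semi-axes would come from the fitting procedure described above rather than from the defaults.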

Fig. 2 Pictorial representation of various stages of the pre-processing module

Table 2 The brief details of the dataset for the proposed model

2.3 Image augmentation

Data augmentation allows the model to learn a more diverse set of features and also increases the size of the dataset, thereby reducing overfitting. Each training image is augmented by a random affine transformation, a random flip, and random changes in the hue, brightness, and saturation of the image. The random affine transformation consists of shearing and rotation. The image augmentation parameters are (a) rotation: within a range of 0° to 30°, (b) shearing: 0.2, (c) zooming: 0.2, and (d) brightness level: within a range of 0.75 to 1.5. The augmentation parameters were chosen based on a study of the effectiveness of image augmentation techniques on deep networks [35]. The same parameters have also been used in [36], where the authors achieved strong results with these settings on a similar CT scan classification problem.
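To make the parameter ranges concrete, the following sketch samples one set of augmentation parameters and builds the corresponding 2 × 2 affine matrix. It is an illustration only: the function names are hypothetical, the shear and zoom values of 0.2 are interpreted as symmetric ranges (shear in [-0.2, 0.2], zoom factor in [0.8, 1.2]), which is an assumption, and the hue and saturation changes are omitted because their ranges are not stated.

```python
import math
import random


def sample_augmentation(rng):
    """Draw one random set of augmentation parameters from the stated ranges."""
    return {
        "rotation_deg": rng.uniform(0.0, 30.0),
        "shear": rng.uniform(-0.2, 0.2),      # assumed symmetric range
        "zoom": rng.uniform(0.8, 1.2),        # assumed symmetric range around 1
        "brightness": rng.uniform(0.75, 1.5),
        "flip": rng.random() < 0.5,
    }


def affine_matrix(p):
    """2 x 2 matrix for rotation, shear, zoom, and an optional horizontal flip."""
    theta = math.radians(p["rotation_deg"])
    c, s = math.cos(theta), math.sin(theta)
    k, z = p["shear"], p["zoom"]
    # Rotation times shear, [[c, -s], [s, c]] @ [[1, k], [0, 1]], scaled by the zoom factor.
    m = [[z * c, z * (c * k - s)],
         [z * s, z * (s * k + c)]]
    if p["flip"]:
        # A horizontal flip negates the x axis, i.e. the first column.
        m = [[-m[0][0], m[0][1]], [-m[1][0], m[1][1]]]
    return m


params = sample_augmentation(random.Random(0))
matrix = affine_matrix(params)
```

The brightness factor would be applied separately by multiplying the pixel intensities and clipping them to the valid range.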

After augmentation, the training data consist of (a) 612 COVID-19 images and (b) 684 non-COVID-19 images.

2.4 VGG-based feature extractor

Table 3 shows the results of a comparative study of popular CNN architectures. The accuracy reported is obtained on the test set after adding a classification layer to each model, and it can be seen that VGG16 outperforms the deeper architectures. It is an interesting observation that deeper models perform worse on the current COVID-19 datasets. This is probably due to the size and quality of the currently available datasets. Owing to its better performance, VGG16 is the model of choice in this study. Similar observations have been made in [37], where the authors use VGG16 to classify COVID-19 from a multi-modal input.

Initially, the VGG model is trained on the ImageNet database, which contains over 14 million images [38]. Instead of using large receptive fields, VGG16 uses very small receptive fields (3 × 3 with a stride of 1). VGG16 incorporates 1 × 1 convolution layers to make the decision function more non-linear without changing the receptive fields. Since the COVID-19 dataset is much smaller, with only 591 training images (before augmentation), a highly complex feature set would be difficult to generalize from. To prevent this, a truncated VGG16 architecture is proposed, which limits the complexity of the features. The first four convolution blocks of the VGG16 architecture are used for the proposed truncated architecture, as shown in Fig. 3. The truncation reduces the model complexity and the number of trainable parameters, which helps in reducing overfitting. This technique has also been used in the Inception-net-based COVID classification in [39], where the authors truncated the architecture to reduce overfitting. The truncation layer is determined by evaluating performance on the validation set with different points of truncation, as detailed in Table 4.
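The effect of truncation on the number of convolutional parameters can be checked with a short calculation. The sketch below assumes the standard VGG16 layout with 3 × 3 kernels throughout (2, 2, 3, 3, and 3 convolution layers per block with 64, 128, 256, 512, and 512 output channels) and counts only convolution weights and biases; the names used are hypothetical.

```python
# (input channels, output channels) for each 3 x 3 convolution, grouped by block.
VGG16_BLOCKS = [
    [(3, 64), (64, 64)],
    [(64, 128), (128, 128)],
    [(128, 256), (256, 256), (256, 256)],
    [(256, 512), (512, 512), (512, 512)],
    [(512, 512), (512, 512), (512, 512)],
]


def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k convolution layer."""
    return k * k * c_in * c_out + c_out


def conv_base_params(n_blocks):
    """Total convolutional parameters in the first n_blocks blocks."""
    return sum(conv_params(c_in, c_out)
               for block in VGG16_BLOCKS[:n_blocks]
               for c_in, c_out in block)


full = conv_base_params(5)       # 14,714,688
truncated = conv_base_params(4)  # 7,635,264
```

Under these assumptions, keeping the first four blocks roughly halves the convolutional parameter count (7,635,264 versus 14,714,688).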

Fig. 3 Architecture of truncated VGG16 model

2.5 Transfer learning

Training a neural network from scratch requires huge amounts of data. As the COVID-19 dataset available is significantly smaller, transfer learning is applied to extract an accurate and concise feature set from the training data. This is a popular technique and has also achieved great results in [11, 13, 17, 19].

Table 3 Comparative study of various popular CNN architectures

In the proposed methodology, a representation learning-based approach is used. A pre-trained VGG16 model is fine-tuned, and its intermediate outputs act as a representation of the raw data. This representation serves as the feature set for the classifier module. The first four blocks of the VGG16 architecture, initialized with weights pre-trained on ImageNet, are used for this purpose [13, 17]. Since the ImageNet dataset does not overlap with the problem domain, the last 8 layers, i.e., the third and fourth convolution blocks, are fine-tuned on the augmented CT scan training data [40]. While training these, it is desired that the fourth block adapts more to the data than the third block. The third block carries relatively less complex features, which do not need to change much. Hence, a higher learning rate is used for the fourth convolutional block than for the third convolutional block during fine-tuning [11]. The extracted features are displayed as a color map in Fig. 4. Figure 5 shows the confusion matrices of the proposed model with and without fine-tuning of the VGG16-based feature extractor.
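The idea of differential learning rates can be sketched with a plain gradient-descent update in which each block has its own step size. The learning-rate values below are hypothetical, since the values used in this study are not reported here, and the parameters are represented as flat lists of floats purely for illustration.

```python
# Hypothetical per-block learning rates: blocks 1 and 2 stay frozen,
# and block 4 adapts faster than block 3.
BLOCK_LR = {"block1": 0.0, "block2": 0.0, "block3": 1e-5, "block4": 1e-4}


def sgd_step(params, grads, block_lr=BLOCK_LR):
    """One SGD update where every block uses its own learning rate."""
    return {
        name: [w - block_lr[name] * g for w, g in zip(weights, grads[name])]
        for name, weights in params.items()
    }


params = {name: [1.0, 1.0] for name in BLOCK_LR}
grads = {name: [1.0, 1.0] for name in BLOCK_LR}
updated = sgd_step(params, grads)
```

Deep learning frameworks typically expose the same idea through per-layer or per-parameter-group optimizer settings.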

The feature extractor module reduces the dimension of the data to 25,000 features per image for an image size of 112 × 112 × 3 pixels. However, with only 591 training examples (before augmentation), the model would still overfit the features. To prevent this, feature selection and dimensionality reduction of data are performed.
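The feature count quoted above can be reproduced with a short calculation. The sketch assumes that the convolutions preserve the spatial size ("same" padding) and that each of the four retained blocks, including the fourth, ends with a 2 × 2 max-pooling layer; the function name is hypothetical.

```python
def truncated_vgg16_output_shape(height, width, n_blocks=4):
    """Output shape (height, width, channels) after the first n_blocks VGG16 blocks."""
    channels = [64, 128, 256, 512, 512]
    for _ in range(n_blocks):
        # Each block ends with a 2 x 2 max pool that halves both spatial dimensions.
        height, width = height // 2, width // 2
    return height, width, channels[n_blocks - 1]


h, w, c = truncated_vgg16_output_shape(112, 112)
n_features = h * w * c  # 7 * 7 * 512 = 25,088
```

Under these assumptions, a 112 × 112 × 3 input yields a 7 × 7 × 512 output, i.e., 25,088 values, which agrees with the approximately 25,000 features per image reported.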

Fig. 4 Intermediate color-mapped outputs. a Layer 1. b Layer 4. c Layer 8. d Layer 14

Fig. 5 Comparison of confusion matrices before and after fine-tuning of VGG16 by evaluation on the test set with bagging SVM as the classifier

2.6 Feature selector

Principal component analysis (PCA), autoencoders, and variance-based selectors are the most popular feature selectors for image data. [41] finds PCA to perform significantly better as a feature selector on biomedical data. PCA finds the eigenvectors of a covariance matrix with the highest eigenvalues and then uses those to project the data into a new subspace of equal or fewer dimensions. Autoencoders compress the input to a lower dimension. Variance-based methods select the features which have the highest variance over the data. PCA, autoencoder, and variance-based selector have been used to reduce the dimensionality of the feature set, and then their accuracies on the validation set are compared after classification with an SVM. Applying PCA with 95% variance representation yields 359 components. Since 95% variance is a standard value for variance-based reductions, the autoencoder and variance-based selectors were also configured to retain 95% variance of the original feature set. The results of the analysis are tabulated in Table 5. For the proposed model, PCA gives the highest accuracy because it represents the low-dimensional sample and synchronized variables. Furthermore, the extracted features from the training set are used to train the classification module to screen COVID-19 CT scans. The better performance of PCA as a feature selector has also been emphasized in [42].
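The PCA step with a 95% variance threshold can be sketched with NumPy as follows. This is a minimal illustration with hypothetical function names, computed through a singular value decomposition of the centered data; it is not the implementation used in this study.

```python
import numpy as np


def fit_pca(X, variance_to_keep=0.95):
    """Return the feature mean and the leading principal axes that together
    explain at least `variance_to_keep` of the total variance."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    # Smallest k whose cumulative explained variance reaches the threshold.
    k = int(np.searchsorted(np.cumsum(explained), variance_to_keep)) + 1
    return mean, vt[:k]


def transform(X, mean, components):
    """Project samples onto the retained principal axes."""
    return (X - mean) @ components.T
```

The mean and components would be fitted on the training features only and then reused to project the validation and test features.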

2.7 Classification

For the classification task, the required features are extracted using the truncated VGG16 model and selected using PCA. In machine learning, no single algorithm is suitable for all problems. Thus, to achieve the highest performance, four different classification models are evaluated. The classification techniques used in the proposed work are as follows: (a) deep CNN, (b) bagging ensemble with SVM, (c) extreme learning machine (ELM), and (d) online sequential ELM (OS-ELM).

2.7.1 Deep CNN

A CNN can successfully capture the spatial and temporal dependencies in an image through the application of relevant filters. The architecture fits the image dataset better owing to the reduction in the number of parameters involved and the re-usability of weights [43]. Since VGG is itself a CNN architecture, for the deep CNN model, a fully connected layer of size 1024 is added to the truncated VGG architecture, followed by a softmax layer for classification. This gives the most direct classification model, where feature extraction and classification are performed in the same CNN architecture. The deep CNN utilizes the fine-tuned weights and uses them to directly predict the output. A similar model has been used by the authors in [11], where a fine-tuned ResNet-50 is used for chest CT scan classification and achieves an accuracy of 93%.

Table 4 Summary of various VGG16 truncation point accuracy evaluated on the validation set with SVM as classifier

2.7.2 Extreme learning machine

ELMs are single-hidden-layer feedforward neural networks (SLFNs) that randomly choose hidden nodes and analytically determine the output weights through the generalized inverse of the hidden-layer output matrix. The implementation of the ELM is as described in [44]. The number of hidden nodes in the model is determined experimentally together with the best-suited gamma (width multiplier for the RBF distance). An L2-normalized RBF activation function is used. Experiments were conducted with varying numbers of neurons in the hidden layer; based on the validation set accuracy, the highest accuracy was observed with 1000 hidden nodes. The performance of the ELM is comparable to [45], where the authors used an ELM to classify COVID-19 chest X-rays and achieved an F1 score of 0.95.
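The ELM training procedure described above can be sketched in a few lines of NumPy. The sketch is an illustration under stated assumptions and not the implementation of [44]: the class name is hypothetical, the RBF centers are drawn at random from the training samples (an assumed choice), and the output weights are obtained with the Moore-Penrose pseudo-inverse of the L2-normalized hidden-layer output matrix.

```python
import numpy as np


class RBFExtremeLearningMachine:
    """Single-hidden-layer network with random RBF nodes and analytic output weights."""

    def __init__(self, n_hidden=1000, gamma=1.0, seed=0):
        self.n_hidden, self.gamma, self.seed = n_hidden, gamma, seed

    def _hidden(self, X):
        # Squared Euclidean distance from every sample to every RBF center.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(axis=2)
        H = np.exp(-self.gamma * d2)
        # L2-normalize each row of the hidden-layer output.
        norms = np.linalg.norm(H, axis=1, keepdims=True)
        return H / np.maximum(norms, 1e-12)

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        rng = np.random.default_rng(self.seed)
        # Hidden nodes are chosen at random and never trained (assumed: training samples as centers).
        idx = rng.choice(len(X), size=min(self.n_hidden, len(X)), replace=False)
        self.centers = X[idx]
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        # Output weights from the generalized inverse of the hidden-layer output matrix.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        scores = self._hidden(np.asarray(X, dtype=float)) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]
```

Only the output weights are learned, which is why training reduces to a single linear least-squares solve.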

2.7.3 Online sequential ELM

OS-ELM can learn data chunk by chunk with varying chunk sizes and provides faster sequential learning. The implementation of the model is the same as that described in [46]. It uses the idea of ELMs with a sequential determination of the output weights through the recursive least-squares (RLS) algorithm. OS-ELM consists of two phases, namely an initialization phase and a sequential learning phase. In the initialization phase, a base extreme learning machine model is trained using a small chunk of initial training data. In the sequential learning phase, the output weights are updated as each new chunk of data arrives. For classification using OS-ELM, an SLFN is implemented with a sigmoid activation function and 2500 hidden nodes. As the model has very few hyperparameters, they have been optimized using a grid search.
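The two phases of OS-ELM can be sketched as follows. This is a minimal NumPy illustration of the recursive least-squares update, not the implementation of [46]; the class and method names are hypothetical, and the initial chunk is assumed to contain at least as many samples as there are hidden nodes so that the initial matrix inverse exists.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class OSELM:
    """Online sequential ELM: random sigmoid hidden nodes, RLS-updated output weights."""

    def __init__(self, n_inputs, n_hidden, n_outputs, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.normal(size=(n_inputs, n_hidden))  # random, fixed input weights
        self.b = rng.normal(size=n_hidden)              # random, fixed biases
        self.beta = np.zeros((n_hidden, n_outputs))
        self.P = None

    def hidden(self, X):
        return sigmoid(X @ self.W + self.b)

    def init_phase(self, X0, T0):
        """Initialization phase: batch ELM on the first chunk."""
        H = self.hidden(X0)
        self.P = np.linalg.inv(H.T @ H)
        self.beta = self.P @ H.T @ T0

    def update(self, X, T):
        """Sequential phase: RLS update with a new chunk of any size."""
        H = self.hidden(X)
        K = np.linalg.inv(np.eye(len(X)) + H @ self.P @ H.T)
        self.P = self.P - self.P @ H.T @ K @ H @ self.P
        self.beta = self.beta + self.P @ H.T @ (T - H @ self.beta)

    def predict(self, X):
        return self.hidden(X) @ self.beta
```

After all chunks have been processed, the output weights coincide (up to numerical error) with the least-squares solution computed on the full dataset at once, which is the property that makes the sequential scheme attractive.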

2.7.4 Bagging ensemble with SVM

A single SVM achieves an accuracy of 93.4%, and its performance is limited by its high time and space complexity. To improve on this, an SVM ensemble with bagging is used. A single classifier may have a high test error, but many small classifiers can produce a low test error and increase robustness because diversity compensates for error. For classification using the bagging SVM, the dataset is randomly divided into 10 parts. The individual classifiers are trained independently with the bootstrap technique and aggregated to make a joint decision through a deterministic averaging process. An SVM with an “RBF” kernel and tuned hyperparameters is used as the base estimator. The bagging ensemble with SVM achieves the highest accuracy of 95.7% on the testing data. Owing to this high accuracy, bagging with SVM is the proposed classification method for COVID-19 screening. SVMs have also achieved exceptional results in biomedical studies such as [47, 48].
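A bagging ensemble of RBF-kernel SVMs can be assembled with scikit-learn as sketched below. This is an illustration rather than the configuration used in this study: the helper name is hypothetical, the values of `C` and `gamma` are placeholders for the tuned hyperparameters, and ten bootstrap-trained estimators are assumed to correspond to the ten parts mentioned above. Because `SVC` is used without probability estimates, `BaggingClassifier` aggregates the base predictions by voting.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.svm import SVC


def make_bagging_svm(n_estimators=10, C=1.0, gamma="scale", seed=0):
    """Bagging ensemble whose base estimator is an RBF-kernel SVM."""
    base = SVC(kernel="rbf", C=C, gamma=gamma)  # placeholder hyperparameters
    return BaggingClassifier(base, n_estimators=n_estimators,
                             bootstrap=True, random_state=seed)


# Toy usage on synthetic two-class data standing in for the PCA-reduced features.
rng = np.random.default_rng(0)
X_demo = np.vstack([rng.normal(-2.0, 0.5, (40, 2)), rng.normal(2.0, 0.5, (40, 2))])
y_demo = np.array([0] * 40 + [1] * 40)
clf = make_bagging_svm().fit(X_demo, y_demo)
```

In the pipeline described here, the inputs would be the PCA-reduced VGG16 features instead of the synthetic points.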

Table 5 Performances analysis of feature selection techniques on validation set using SVM as classifier
Table 6 Performance parameters of different classifiers on testing data

2.8 Evaluation metrics

Confusion matrices for the different classifiers are shown in Fig. 8. The classifiers are evaluated on the test set with 111 COVID-19 images and 97 non-COVID images. The features for the model are extracted using the truncated VGG16 model and selected using PCA. The screening performance of the model is assessed using generalized performance parameters derived from the confusion matrix. Table 6 puts forth the generalized performance parameters, namely, true positive (TP), false positive (FP), true negative (TN), false negative (FN), AUC, accuracy (ACC), precision (PRE), sensitivity (S1), specificity (S2), negative predictive value (NPV), and F1 score (F1).
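The performance parameters listed above follow directly from the four confusion-matrix counts, as the sketch below shows using their standard definitions. The function name is hypothetical, and the counts in the usage line are illustrative values, not results from this study.

```python
def screening_metrics(tp, fp, tn, fn):
    """Standard metrics derived from confusion-matrix counts."""
    precision = tp / (tp + fp)
    sensitivity = tp / (tp + fn)
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "PRE": precision,
        "S1": sensitivity,              # sensitivity (recall)
        "S2": tn / (tn + fp),           # specificity
        "NPV": tn / (tn + fn),
        "F1": 2 * precision * sensitivity / (precision + sensitivity),
    }


# Illustrative counts only.
example = screening_metrics(tp=8, fp=2, tn=9, fn=1)
```

The AUC is not included because it requires the classifier scores across thresholds rather than a single confusion matrix.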

3 Experimentation

3.1 Testing environment

The proposed methodology is implemented in Python and run on a CPU. The system used has an Intel Core i7 processor at 1.80 GHz, a 4 GB graphics card, a 64-bit operating system, and 16 GB RAM.

3.2 Screening of COVID-19 based on different classifier

Figure 6 shows the convergence graph of training and validation accuracy of the transfer learning–based CNN model for the bagging ensemble classifier with SVM. Figure 5 shows the confusion matrices of the proposed architecture with and without fine-tuning of the VGG16 based feature extractor. The confusion matrices are obtained by evaluating the models on the test set with bagging SVM as the classifier.

Fig. 6 Convergence graph of accuracy vs epoch for proposed methodology (VGG16+PCA+bagging ensemble with SVM)

3.3 K-fold cross-validation

K-fold cross-validation divides the training set into k subsets and trains k models such that one of the subsets is left out while training each model. The accuracy of these k models is then averaged. The benefit of k-fold cross-validation is that the entire dataset is used for both training and validation, and each sample is used for validation exactly once. The 10-fold cross-validation curve obtained for the bagging SVM, plotted against the number of training examples, is shown in Fig. 7. It shows how the robustness of the model changes with experience and indicates that the proposed model generalizes well. The average scores over the 10 folds are shown in the plot.
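The splitting scheme can be sketched in plain Python as follows. The function names are hypothetical, and `fit_and_score` stands for any user-supplied routine that trains a model on the training indices and returns its accuracy on the validation indices.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal, disjoint subsets
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def cross_val_score(fit_and_score, n_samples, k=10):
    """Average the scores of the k models, each validated on its held-out fold."""
    scores = [fit_and_score(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)
```

Because the folds are disjoint and cover all indices, every sample appears in a validation fold exactly once.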

Fig. 7 Learning curve for proposed method using 10-fold cross-validation

3.4 Adversarial defense

Deep learning models are often fooled by noise perturbations in the image. Such perturbations or attacks lead to misclassification of images. To defend the model against such noise attacks, a defense module has been designed. To remove noise from an image before prediction, three image denoisers are applied, namely total variation, Gaussian filter, and wavelet denoising. The predictions for all three denoised images are passed to an ensemble, which finally classifies the image as shown in Fig. 8. On evaluating this module on the test set after adding random noise, the model gave an accuracy of 82.34%.
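The structure of the defense module can be sketched as follows. The rule used by the ensemble to combine the three predictions is not specified above, so a simple majority vote is assumed here; the function name is hypothetical, and the denoisers and classifier are passed in as callables.

```python
from collections import Counter


def defended_predict(image, denoisers, classify):
    """Classify each denoised copy of the image and return the majority label."""
    votes = [classify(denoise(image)) for denoise in denoisers]
    return Counter(votes).most_common(1)[0][0]


# Toy usage with stand-in callables (not real denoisers or a real classifier).
toy_denoisers = [
    lambda img: img,
    lambda img: [0.5 * v for v in img],
    lambda img: [0.0 for _ in img],
]
toy_classify = lambda img: "COVID" if sum(img) > 1.0 else "non-COVID"
label = defended_predict([1.0, 1.0, 1.0], toy_denoisers, toy_classify)
```

In practice, the three callables could wrap, for instance, the total variation, Gaussian, and wavelet denoising routines of an image processing library such as scikit-image.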

Fig. 8 Confusion matrices of the proposed methodology with different classifiers

4 Results

In the proposed work, the best-performing model achieves an accuracy of 95.67% along with a precision of 96.83%. The area under the ROC curve (AUC) obtained is 0.958, as shown in Fig. 9. The proposed method aims to reduce the false-negative rate as much as possible, since false-positive cases can potentially be identified in subsequent tests, but false-negative cases might not have that chance. The proposed model has a false-negative rate of 4.33%, which is significantly lower than that of other COVID-19 CT scan screening models. The model proposed in this study achieves a very high accuracy of 95.67% on the testing data with a very low prediction time of 358 ms. This indicates that deep learning-based approaches could be used to screen COVID-19 effectively and accurately at very large scales. Table 7 puts forth the comparative analysis of the proposed methodology with other existing techniques.

Fig. 9 ROC characteristics curve for the proposed methodology (VGG16+PCA+bagging ensemble with SVM)

Table 7 Comparative analysis of COVID-19 detection proposed methodology with techniques available in the literature on the used dataset

5 Conclusion

A deep learning-based truncated VGG16 model is proposed in this study to screen COVID-19 patients using chest CT scans. The VGG16 architecture is fine-tuned and used to extract features from CT scan images. An interesting observation has been that pre-trained models are able to learn features very effectively with fine-tuning. The study demonstrates that VGG16 outperforms other models on biomedical image feature extraction. Another important observation has been that truncation improves the model’s performance on the limited dataset. The performance also improves on applying dimensionality reduction techniques, indicating a high correlation among features that need to be removed to boost the performance of the classifier. It was experimentally found that PCA performs much better than auto-encoders for biomedical image features, as has been verified by several other studies. Finally, a comparison of various popular classification techniques shows us that a bagging ensemble of SVM gives the best results and outperforms direct CNN classification, ELM, and OS-ELM.

Our study addresses the challenges associated with the limited and poor quality of COVID-19 radiology data. The study has proposed techniques like transfer learning, fine-tuning, model-truncation, image augmentation, and feature-reduction to overcome these. This should be helpful for practitioners aiming to use these datasets for their research and development. Furthermore, the importance of various pre-processing techniques has also been highlighted. While comparing different pre-trained models, it was found that both VGG16 and VGG19 gave great results within the constraints of the small dataset. While deeper networks were found to struggle, they will perform better when larger datasets are available. For currently available datasets, we conclude that VGG16 with appropriate truncation and fine-tuning gives the best feature maps. These features give good results when classified directly but the results are found to improve significantly when a feature selector like PCA is applied before classification.

As the quality of available COVID-19 data increases, clinically verified and trusted deep learning models may be developed for fast diagnosis of COVID-19. The strong performance of deep models may lead to AI-based diagnosis for various diseases, especially in times of outbreak, when rapid screening and early identification are crucial for effective containment. Future work will focus on making the model interpretable and on supplementing the classification with a severity score, which may be useful for screening. Furthermore, marking the region of infection may assist medical staff in treating the infection.