1 Introduction

The COVID-19 outbreak has grown into a global pandemic that has caused millions of deaths. It originated in late December 2019 in the city of Wuhan, China. Since then, it has become a global health hazard, spreading through personal and community-based transmission [1]. Its symptoms are similar to those of pneumonia [2], but it is more fatal. It is difficult to obtain a standard medication for COVID-19 because the virus mutates quickly, forming different variants. In March 2020, the World Health Organization (WHO) declared COVID-19 a pandemic. The rampant spread of the disease and the unavailability of a standard medication had caused 3.7 million deaths across the globe by May 2021 [3]. It is therefore essential to diagnose the disease at an early stage so that its transmission can be curbed and the pandemic can be combated. RT-PCR (Reverse Transcription Polymerase Chain Reaction) is the most widely used molecular test for COVID-19 detection in human beings. An RT-PCR test conducted within three weeks of the onset of symptoms has a 90% accuracy in detecting positive COVID-19 cases [4]. However, the RT-PCR test has several limitations: it is not widely available, and it takes more than 6 hours to return results. The Lateral Flow Test (LFT) is another test that can be used to detect COVID-19, but it is not as reliable as the RT-PCR test. When molecular tests such as RT-PCR are unavailable and time is constrained, the analysis of chest radiography images can be used to detect COVID-19 efficiently and quickly. In addition to detection, chest radiography analysis also reveals the extent of the infection, which can aid in prescribing a more suitable treatment for the patient. Two well-known chest radiography techniques are the chest X-ray and the CT scan.
Both the CT scan and the chest X-ray have advantages and disadvantages. The equipment required for a CT scan is more expensive than that required for a chest X-ray. Moreover, the disinfection time for a CT scanner is approximately 15 minutes, which is far longer than for a chest X-ray machine; the latter is also available in a portable version, which exposes the patient to fewer bacteria. The ionizing radiation to which the patient is exposed is also greater in a CT scan than in a chest X-ray. Since chest X-rays are readily available in hospitals and expose patients to less ionizing radiation, they are preferable to CT scans [5].

Various medical professionals and radiologists have acknowledged the efficacy of chest radiology for detecting COVID-19, which gives a performance comparable to the RT-PCR test, with an accuracy of 90% [6]. However, manual diagnosis of COVID-19 from chest X-rays on a large scale is impractical and cumbersome. Hence, radiologists often use Computer-Aided Diagnosis (CAD) tools to diagnose COVID-19 from chest X-rays and other chest radiographs, and researchers have worked extensively on developing efficient and reliable CAD tools for this task. Initial research used machine learning-based approaches for detecting pathogenic lungs infected with COVID-19 [7]. Methods using fuzzy color-based preprocessing have also been presented for chest X-ray images [8]. In this approach, features are extracted from the processed images using the pre-trained models MobileNetV2 and SqueezeNet. The discriminant features are then obtained by stacking the extracted features and applying Social Mimic optimization, and a Support Vector Machine uses them to make the final COVID-19 detection. A Generative Adversarial Network (GAN) based approach was proposed by Rasheed et al. [9]. They augmented the dataset using a GAN and then employed a Convolutional Neural Network for representation learning. The higher-dimensional features were reduced to lower-dimensional features using Principal Component Analysis (PCA), and Logistic Regression was then used to make the final COVID-19 classification. The data augmentation using the GAN and the dimensionality reduction helped the authors achieve improved performance. Recently, transfer learning models have emerged to achieve regularization and generalization of deep learning models [10].
Another work performed feature extraction as well as classification using a single model instead of separate models [11]. Several pre-trained CNN models such as Xception, VGG16, DenseNet121, EfficientNet, and NASNet were employed and fine-tuned for COVID-19 detection from chest X-rays [12].

The detection of COVID-19 by a radiological expert is a time-consuming and error-prone task [13]. Hence, an automated framework is required for COVID-19 detection using chest X-rays. Most existing works on COVID-19 detection use deep neural network architectures, but each relies on the findings of a single model. These methods therefore depend on unilateral decisions and do not combine the classification power of other methods. This motivates the present work, in which we introduce a novel Trained Output-based Transfer Learning (TOTL) approach for COVID-19 detection from chest X-rays. We combine the results of various transfer learning models by training a deep neural network on their outputs, whereas most existing works use conventional ensemble techniques such as majority voting. Overall, the proposed work provides better results for COVID-19 detection by combining the strengths of various deep transfer learning models in terms of the extracted features. Moreover, because deep learning models trained on general problem statements adapt well to new tasks, we can apply them to medical imagery-based COVID-19 detection through fine-tuning. Most existing works in this field report classical performance metrics such as Accuracy, Precision, Recall, and F1 Score; we use these classical metrics as well as some other important performance metrics. We also present our simulations on chest X-rays instead of CT scans, which have been extensively used in recent works. We use chest X-rays because they are readily available in hospitals and expose patients to less ionizing radiation, which makes them preferable to CT scans.
We performed technical analyses covering the loss convergence of our model, the effect of data augmentation, a hyperparameter analysis relating to epochs and the optimizer, and confidence intervals. In general, the experimental analysis shows that our method performs better than the existing baselines and recently proposed techniques. We start by processing the chest X-ray images using denoising with a Gaussian filter, contrast enhancement using CLAHE, segmentation using U-Net, and cropping around the lung region to capture only the relevant details from the images. These processed images are then reshaped and passed through various deep transfer learning-based pre-trained models, namely InceptionV3, InceptionResNetV2, Xception, MobileNet, ResNet50, ResNet50V2, VGG16, and VGG19. The models are fine-tuned on the processed chest X-ray images, and the classification outputs are generated. The classification outputs of the various pre-trained models are then fed to a deep neural network, which is trained on them to achieve a more balanced output that combines the capabilities of all the models. This model is finally used to detect COVID-19 from the chest X-ray images. We performed extensive experiments on four benchmarked datasets and evaluated several performance metrics. The performance of our model was also compared with several classical deep transfer learning models and multiple contemporary COVID-19 detection frameworks. The experimental results reveal the utility of our proposed framework as an efficient COVID-19 detection model. The major contributions of this work are as follows:

  1. (i)

    We proposed a novel Trained Output-based Transfer Learning (TOTL) approach for COVID-19 detection from chest X-rays. It trains the combined outputs of several fine-tuned deep transfer learning-based pre-trained models to capture the performance capabilities of every model and achieve improved performance.

  2. (ii)

    We process the chest X-ray images using various preprocessing techniques such as denoising using a Gaussian filter, contrast enhancement using CLAHE, and segmentation using U-Net.

  3. (iii)

    We fine-tune several deep transfer learning-based pre-trained models like InceptionV3, InceptionResNetV2, Xception, MobileNet, ResNet50, ResNet50V2, VGG16, and VGG19 using preprocessed chest X-ray images.

  4. (iv)

    We propose an automated framework for COVID-19 detection using chest X-rays, whose real-world applicability is demonstrated by our extensive experiments.

The rest of the paper is organised as follows. Section 2 discusses some of the existing work in the field of COVID-19 detection using chest X-rays. The preliminary concepts required for a better understanding of this manuscript are mentioned in Sect. 3. In Sect. 4 we discuss the various datasets and the evaluation metrics used by us for this study. Section 5 illustrates our proposed Trained Output based Transfer Learning approach for COVID-19 detection from chest X-rays. The experimental results and analysis are discussed in Sect. 6. Finally, the concluding remarks and scope for future extensions are given in Sect. 7.

2 Related Work

Due to the recent COVID-19 pandemic, a lot of work is being done on detecting and tackling the disease, and numerous research works have been published in this field. Recent advancements in computer vision have paved the way for the application of deep learning and convolutional neural networks to medical image diagnosis tasks such as retinopathy identification [14,15,16], tumor detection [17,18,19], and COVID-19 detection [20]. In this section, we discuss several recent research works on COVID-19 detection using chest X-rays and CT scans. These works involve the use of deep neural networks, convolutional neural networks, transfer learning, ensemble learning, etc.

Luz et al. [21] presented a deep learning based approach to COVID-19 detection. They used the Covidx dataset to evaluate the efficiency of their model, which achieved a 93.9% accuracy and a 96.8% sensitivity. Hemdan et al. [22] proposed an automatic framework, COVIDX-Net, to detect COVID-19 from chest X-rays. They used seven different deep learning architectures and a dataset containing 50 chest X-ray images, on which VGG16 and DenseNet201 both provided a 90% accuracy. Basu et al. [23] presented an extension to the detection framework, which used a transfer learning based method. As part of their extension, they presented an algorithm for detecting the various abnormalities caused by COVID-19 that are present in the chest X-rays of patients. They used a Gradient Class Activation Map for feature extraction from the chest X-ray images. They evaluated their performance on the NIH chest X-ray dataset and achieved an accuracy of 95.3%. Ozturk et al. [24] proposed an automated COVID-19 detection framework, DarkCovidNet, which uses chest X-ray images. They used a dataset containing 125 images and achieved a binary class accuracy of 98.08% and a multi-class accuracy of 87.02%. One of the drawbacks of their approach is the small dataset size.

Das et al. [25] proposed a deep learning based framework for COVID-19 detection using chest X-ray images and attained an accuracy of 97.4%. Tuncer et al. [26] presented an automated COVID-19 detection method for chest X-ray images. They performed feature extraction using a residual exemplar local binary pattern, selected the important features through iterative Relief, and fed the chosen features into several well-known classifiers. They evaluated their approach on a small dataset of just 321 chest X-ray images and obtained a classification accuracy of 99% using the SVM classifier. Methods using fuzzy color-based preprocessing have also been presented for chest X-ray images [8]. In this approach, features are extracted from the processed images using the pre-trained models MobileNetV2 and SqueezeNet. The discriminant features are then obtained by stacking the extracted features and applying Social Mimic optimization, and a Support Vector Machine uses them to make the final COVID-19 detection. This method achieved classification accuracies of 98.25% and 97.81% with MobileNetV2 and SqueezeNet, respectively. A COVID-19 detection framework called Covid-Net was proposed by Wang et al. [27]. Covid-Net presented a multi-class classification study for detecting Normal, Pneumonia, and COVID-19 infected chest X-ray images, and it performed better than the VGG19 and ResNet50 frameworks.

Data augmentation techniques have been extensively used for regularization and generalization of deep learning models [28,29,30,31]. Zheng et al. [32] worked on improving the performance of deep convolutional neural networks by proposing a full-stage simultaneous data augmentation for the training and testing phases. The consistent data augmentation helps transfer the domain-specific knowledge to the model appropriately, thereby improving performance. Shen et al. [33] proposed a metaheuristic based approach for COVID-19 detection from chest X-rays. They performed feature extraction using a pre-trained ResNet18 architecture. The obtained features then undergo feature selection using a discrete social learning particle swarm optimization algorithm (DSLPSO), and the selected features are passed through an SVM classifier for final classification. Lung segmentation is often used in the detection of lung diseases to achieve improved performance [34, 35]. Oulefki et al. [36] proposed a lung segmentation approach that extracts the infected lung region from CT scans using an efficient Kapur entropy-based multilevel thresholding unsupervised procedure. Liu et al. [37] carried out further work on lung segmentation using a weakly supervised scribble annotation algorithm. Several researchers have also used respiratory sounds based on coughing, breathing, and voice parameters [38]. Lella et al. [39] proposed a 1D convolutional network framework for detecting COVID-19 from respiratory parameters. They later improved this work by using a deep convolutional network [40] and a lightweight convolutional neural network with modified MFCC and enhanced GFCC [41].

A COVID-19 study based on deep transfer learning frameworks was performed by Narin et al. [42]. They used ResNet50, InceptionV3, and InceptionResNetV2 and obtained a classification accuracy of 98% for COVID-19 detection from chest X-ray images. An ensemble learning based approach was proposed by Chouhan et al. [43]. They presented an ensemble model comprising deep transfer learning based frameworks, namely GoogleNet, AlexNet, ResNet18, InceptionV3, and DenseNet121. Their approach achieved an accuracy of 96.4%. Shelke et al. [44] used DenseNet-161 for the analysis and diagnosis of COVID-19 from chest X-ray images. They performed experiments on a very small dataset containing only twenty-two X-ray images and achieved an accuracy of 98.9%. A hybrid approach combining ResNet50V2 and Xception was proposed by Rahimzadeh et al. [45] for discerning the structural patterns in chest X-rays that identify COVID-19. Another framework named DeTraC was proposed by Abbas et al. [46] for COVID-19 detection. A pre-trained CNN architecture was used for feature extraction, and local features were extracted from the images using a class decomposition method. For making the final classification, they used a gradient descent based method. The obtained classification results are further enhanced using a class composition layer. They used a dataset of 105 chest X-ray images for their experiments and attained an accuracy of 95.12%.

The discussion above shows that most existing works on COVID-19 detection do not achieve a detection accuracy high enough for real-world application. Moreover, most of the existing techniques use pre-trained transfer learning models either for feature extraction or for fine-tuning, and only a few provide an ensemble based approach that combines the predictive powers of various models. The few ensemble approaches that have been presented only create a strong classifier from weak classifiers. Also, the existing works mostly use datasets with very few data samples. Through our work, we present a novel approach that combines several pre-trained deep transfer learning models by training a deep neural network on their outputs to achieve improved performance. The datasets we use contain more data samples in each category, namely Normal, COVID-19, and Pneumonia, which makes our results more reliable.

3 Preliminaries

In this section, we explain several basic concepts such as deep neural networks, convolutional neural networks, and transfer learning. An understanding of these concepts will help readers to understand and implement our Trained Output based Transfer Learning (TOTL) approach for COVID-19 detection from chest X-rays.

3.1 Deep Neural Networks

Deep neural networks are inspired by the structure of the brain’s cerebral cortex. At the basic level, a perceptron is a mathematical model of a biological neuron. It combines the inputs to the neuron into a single output using a pre-defined function, and an activation function converts this output into a usable form. A deep neural network consists of an input layer, a network of hidden layers, and an output layer. The input layer takes the feature set of the data points as input. The hidden layers, each of which contains multiple neurons, process the input. The output layer finally makes the prediction for the data points. The prediction can be regression, classification, or clustering-based. During training, the weights associated with each connection between the neurons are adjusted iteratively. The ideal weights are determined using back-propagation, which computes the gradients that an optimizer such as gradient descent uses to update the weights. The quality of the predictions is measured by a loss function such as Mean Squared Error or Categorical Cross-Entropy, and the deep neural network aims to minimize this loss.
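
As a minimal illustration of the ideas above, the following sketch trains a single neuron with a sigmoid activation by gradient descent on a squared-error loss. The weights, the input sample, the label, and the learning rate are hypothetical values chosen only for illustration; they are not taken from our experiments.

```python
import math


def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_forward(weights, bias, inputs):
    """Weighted sum of the inputs followed by a sigmoid activation."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def squared_error(weights, bias, inputs, target):
    """Loss 0.5 * (prediction - target)^2 for one sample."""
    return 0.5 * (neuron_forward(weights, bias, inputs) - target) ** 2


def gradient_step(weights, bias, inputs, target, lr=0.5):
    """One gradient-descent update of the neuron's weights and bias."""
    y = neuron_forward(weights, bias, inputs)
    # Chain rule: dLoss/dz = (y - target) * sigmoid'(z) = (y - target) * y * (1 - y)
    delta = (y - target) * y * (1.0 - y)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias


# Hypothetical starting point and training sample.
weights, bias = [0.1, -0.2], 0.0
sample, label = [1.0, 2.0], 1.0

loss_before = squared_error(weights, bias, sample, label)
for _ in range(100):
    weights, bias = gradient_step(weights, bias, sample, label)
loss_after = squared_error(weights, bias, sample, label)
```

A deep network repeats the same update for every weight in every layer, with back-propagation supplying the per-layer gradients.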

3.2 Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are variations of deep neural networks used for computer vision tasks such as image classification, object detection, image segmentation, and image recognition. A CNN exploits template matching techniques to complete a vision-associated task. A stack of convolutional layers extracts essential features from the input images, and the network also contains pooling layers, filters (kernels), and finally fully connected (FC) layers. The last layer uses an activation function to make the final prediction for the input sample as a probability between 0 and 1 [47]. The convolutional layers are made up of filters of varying sizes, each of which extracts a different feature and produces an activation map. The activation maps are stacked to form the output volume of the layer. Most CNNs take an input of a pre-defined fixed size and produce an output of a different shape. A convolutional layer is described by parameters such as the number of filters, the stride, and the activation function. A pooling layer, such as average pooling or max-pooling, typically follows a convolutional layer to downsample the outputs generated by its filters, which makes the network more efficient. Convolutional layers reduce overfitting and make neural networks concentrate on the more essential details [48].
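
The interplay of a convolutional filter, an activation, and a pooling layer can be sketched in a few lines of plain Python. The example below is only an illustration under simplifying assumptions: a single-channel toy image, one hand-written 3x3 vertical-edge filter, stride 1, no padding, and 2x2 max-pooling. Real CNN layers learn their filters and operate on multi-channel tensors.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding).

    As in most deep learning libraries, this is a cross-correlation:
    the kernel is not flipped before it is applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Set negative activations to zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 window."""
    h = len(feature_map) // 2 * 2
    w = len(feature_map[0]) // 2 * 2
    return [
        [
            max(feature_map[i][j], feature_map[i][j + 1],
                feature_map[i + 1][j], feature_map[i + 1][j + 1])
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


# Toy 5x5 image: dark left part, bright right part (a vertical edge).
image = [[0, 0, 1, 1, 1] for _ in range(5)]
# Hand-written vertical-edge filter (an assumption for illustration).
vertical_edge = [[-1, 0, 1] for _ in range(3)]

feature_map = conv2d_valid(image, vertical_edge)  # 3x3 activation map
pooled = max_pool2x2(relu(feature_map))           # 1x1 after pooling
```

The filter responds strongly where the window straddles the edge and not at all in the uniform bright region, which is the template-matching behaviour described above.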

3.3 Transfer Learning Based Deep Convolutional Networks

Transfer learning refers to reusing a large and complex model that has been pre-trained on a large and diverse dataset to obtain a generalized framework. The pre-trained model is applied to a problem similar to the one it was originally trained on, but with a relatively smaller dataset; only a few layers are fine-tuned while all other layers are frozen. The generic representations learned by the complex architecture allow it to perform well on a variety of problem statements. Transfer learning can be used for feature extraction and for fine-tuning based classification. For feature extraction, the classification part of the pre-trained model is removed, and the remaining model is treated as a feature extractor that can be used with another classification algorithm [49]. For classification, some layers of the pre-trained model are frozen and are not trained again on the new dataset, while the remaining layers are fine-tuned and used to make the classification [50]. Transfer learning is beneficial when the available data is scarce. A generalized framework also helps to limit overfitting and to reduce the effect of random weight initialization on the training and fine-tuning process.

For our study, we use several pre-trained deep convolutional neural networks, namely VGG16, VGG19, ResNet50, ResNet50V2, MobileNet, InceptionV3, Xception, and InceptionResNetV2, which have been pre-trained on the ImageNet dataset [51]. VGG16 is a 16-layer deep model that improves on the performance of AlexNet, one of the first winners of the ImageNet competition, by increasing the depth of the model and reducing the size of the kernel filters from 11 and 5 to 3. VGG19 is similar to VGG16 but has 19 layers. ResNet50 is a deep residual network whose residual blocks help avoid the vanishing gradient problem. It uses skip connections to connect the output of one layer to a later layer, skipping several layers in between. ResNet50V2 is a modified version of the ResNet50 architecture that improves the propagation formulation of the connections between blocks. Compared to ResNet50, MobileNet is a lightweight, mobile application-friendly deep neural network that has fewer trainable parameters because it uses depthwise separable convolutions. InceptionV3 is a 42-layer deep model. It improves the performance of its predecessors by incorporating Factorization into Smaller Convolutions, Spatial Factorization into Asymmetric Convolutions, Utility of Auxiliary Classifiers, and Efficient Grid Size Reduction. The Xception model, on the other hand, is 78 layers deep and uses a modified depthwise separable convolution to perform better than InceptionV3. InceptionResNetV2 is an amalgamation of the Inception and ResNet architectures. It is 164 layers deep and combines multiple-sized convolutional filters with residual connections to avoid the degradation problem and reduce the training time.

4 Datasets and Evaluation Metrics

In this section, we describe the various datasets and evaluation metrics used by us. We used four real-life COVID-19 X-ray image based datasets collected from various online data sources. We also used several benchmarked evaluation metrics to obtain a measure of the performance of our proposed algorithm. The description of the datasets and the performance metrics are as follows:

4.1 Datasets

We used four different datasets for the purpose of our study. Figure 1 shows the sample images of various datasets belonging to the COVID-19, Normal, and Pneumonia categories. The used datasets are described below:

  1. 1.

    Dataset 1: This dataset was proposed by Asraf. It is obtained from Kaggle and contains 1525 images each for the COVID-19, Normal, and Pneumonia categories. The dataset can be accessed from https://www.kaggle.com/amanullahasraf/covid19-pneumonia-normal-chest-xraypa-dataset

  2. 2.

    Dataset 2: This dataset was proposed by Chowdhury et al. at the University of Dhaka and Qatar University along with several medical practitioners and collaborators. It contains 3616 COVID-19 samples along with 10192 Normal and 1345 Viral Pneumonia chest X-rays. The dataset is available at https://www.kaggle.com/tawsifurrahman/covid19-radiography-database

  3. 3.

    Dataset 3: This dataset is also a chest X-rays based COVID-19 detection image dataset. It contains 137 COVID-19 samples, 90 Normal samples, and 90 Pneumonia samples. The dataset can be accessed from https://www.kaggle.com/pranavraikokte/covid19-image-dataset

  4. 4.

    Dataset 4: This dataset contains 576 COVID-19 image samples, 1583 Normal image samples, and 4273 Pneumonia image samples. Since the COVID-19 samples are far fewer than the Normal and Pneumonia samples, we exclude the Pneumonia samples from our experiments and use only the COVID-19 and Normal image samples. The dataset can be obtained from https://www.kaggle.com/prashant268/chest-xray-covid19-pneumonia.

Fig. 1

Examples of Chest X-rays in all considered datasets having COVID-19, normal, and pneumonia samples

4.2 Evaluation Metrics

We test the performance of our proposed model on several benchmarked evaluation metrics for the classification tasks, namely, Accuracy, Precision, Recall, F1 Score. The description of these performance metrics is given below:

  1. 1.

    Accuracy: Accuracy is the fraction of samples that the model classifies correctly, that is, assigns to their corresponding labels. Mathematically, it can be expressed as follows:

    $$\begin{aligned} Accuracy\, =\, \frac{TP + TN}{TP + TN + FP + FN} \end{aligned}$$
    (1)
  2. 2.

    Precision: Precision measures the exactness of the algorithm. It is the fraction of the samples predicted as positive by the model that are actually positive. Mathematically, it can be expressed as follows:

    $$\begin{aligned} Precision\, =\, \frac{TP}{TP + FP} \end{aligned}$$
    (2)
  3. 3.

    Recall: Recall, also known as the sensitivity of a model, signifies the completeness of the model. A higher value of Recall indicates that fewer false negatives are predicted. Mathematically, it can be expressed as follows:

    $$\begin{aligned} Recall\, =\, \frac{TP}{TP + FN} \end{aligned}$$
    (3)
  4. 4.

    F1 Score: The F1 score shows the balance between Precision and Recall. It is the harmonic mean of the two, so it is high only when both Precision and Recall are high. Mathematically, it can be expressed as follows:

    $$\begin{aligned} F1\ Score\, =\, \frac{2 \times Precision \times Recall}{Precision + Recall} \end{aligned}$$
    (4)

Here, TP (True Positive) means that a patient is COVID-19 positive and the model has correctly identified the patient as positive. FP (False Positive) means that a patient does not have COVID-19 but the model has detected the patient as COVID-19 positive. TN (True Negative) means that a patient does not have COVID-19 and the model has predicted the patient not to have COVID-19. FN (False Negative) means that a patient has COVID-19 but the model has predicted the patient not to have COVID-19.
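
Equations (1) to (4) translate directly into code. The short sketch below computes the four metrics from the confusion counts; the counts passed in the example call are hypothetical and serve only to illustrate the arithmetic, and the guards against empty denominators are our own addition.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, Precision, Recall, and F1 Score from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    # Guard against empty denominators (no predicted or no actual positives).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts for a binary COVID-19 vs. non-COVID-19 test set.
metrics = classification_metrics(tp=90, tn=85, fp=15, fn=10)
```

With these counts, Recall is 90/100 while Precision is 90/105, which shows how the same number of true positives can yield different values once false negatives and false positives are counted separately.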

5 Proposed Work

In this section, we describe our proposed Trained Output based Transfer Learning approach for more accurate COVID-19 detection based on chest X-ray images. Our approach consists of three phases: (i) Preprocessing, (ii) Training Deep Transfer Learning models, and (iii) Training outputs. The details of these phases are given below.

5.1 Preprocessing

The chest X-rays in the datasets we use contain noise, varied brightness, and irregular shapes. To counter such effects, we preprocess our dataset. We first remove the noise from the images using a Gaussian filter, which stabilizes the variance in the images [52]. The contrast of the images is then improved using the Contrast Limited Adaptive Histogram Equalization (CLAHE) approach [53], which improves the visibility of all the images across the dataset. Then we perform lung segmentation using U-Net to extract only the lung region [54]. The parameters used for the U-Net are as follows: ResNet34 is used as the backbone, ImageNet weights are used as the encoder weights, and transpose is used as the decoder block type. We train the U-Net on the Chest Xray Masks and Labels dataset and achieve a training loss and Intersection over Union (IoU) of 0.0421 and 0.9886, respectively. Each image is then cropped around the lungs to capture only the details essential to COVID-19 detection. The entire process is shown in Fig. 2. Moreover, we also augment the data to obtain a more generalized framework. Since neural networks are complex architectures with many neurons and numerous parameters, a sufficient amount of data is needed to obtain an appropriate learning curve. Hence, to compensate for the lack of available data, we augment the training and validation sets using rotation by 15 degrees, vertical flipping, horizontal flipping, and a shear transformation with a slant angle of 0.2. The processed images are then resized to match the input of the pre-trained transfer learning models we use: 224\(\times \)224\(\times \)3 for the VGG16, VGG19, ResNet50, ResNet50V2, and MobileNet architectures and 299\(\times \)299\(\times \)3 for the InceptionV3, Xception, and InceptionResNetV2 models. We then split the entire dataset into training, validation, and testing sets. We first split the dataset in an 80:20 ratio. The 20% portion is kept for testing, while the 80% portion is split again so that 70% of the entire dataset is used for training and 10% for validation. We use a larger share of the dataset for training because the model learns its parameters from it. However, increasing the training share further might result in the model overfitting to the training dataset and losing its robustness on the testing dataset.
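
The 70:10:20 partition described above can be sketched as follows. This is a minimal illustration, not our exact pipeline: the function name `split_dataset`, the fixed seed, and the use of integer indices in place of image paths are assumptions, and the sketch shuffles without stratifying by class. Cutting the shuffled list at 70% and 80% is equivalent to the two-step 80:20 split followed by the 70:10 split.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.1, seed=0):
    """Shuffle the items and cut them into train, validation, and test parts.

    The test part receives whatever remains after the train and
    validation fractions (20% with the default arguments).
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded for reproducibility
    n_train = int(round(train_frac * len(items)))
    n_val = int(round(val_frac * len(items)))
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Integer indices stand in for image file paths in this sketch.
train_set, val_set, test_set = split_dataset(range(1000))
```

Because the three parts are slices of one shuffled list, no image can appear in more than one part.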

Fig. 2

Various preprocessing techniques used by us in our approach

5.2 Training Deep Transfer Learning models

In this phase, we extract several semantic spatial representations and other crucial features from the COVID-19 X-rays. Training a deep Convolutional Neural Network architecture from scratch is neither efficient nor effective because COVID-19 data of sufficient quality and quantity is lacking. To overcome this, we use several state-of-the-art pre-trained transfer learning models: VGG16, VGG19 [55], ResNet50, ResNet50V2 [56], InceptionV3 [57], Xception [58], MobileNet [59], and InceptionResNetV2 [60]. These models were selected after an extensive experimental study, which showed that the essential features they extract contribute to improved classification performance. The models are initialized with the pre-trained ImageNet weights and then fine-tuned on our dataset. The statistical details of each model, consisting of the input shape, the number of parameters, the number of layers, the number of frozen layers, and the size of the spatial features obtained, are given in Table 1. Figure 3 shows the implementation of transfer learning for our proposed approach: complex convolutional neural networks pre-trained on the ImageNet dataset are fine-tuned on the COVID-19 chest X-ray dataset, and the fine-tuned models are then used to make the COVID-19 predictions. All the chosen models are fine-tuned on the training datasets mentioned in Sect. 4.1. These fine-tuned models are then used in the next section to make the final COVID-19 classifications.

Table 1 Statistical details about the various transfer learning models used by us
Fig. 3 The implementation of transfer learning for the task of COVID-19 detection

5.3 Training Outputs

The models trained in the previous phase give varied outputs for the same input image: some perform well on some images, while others perform well on other images. We therefore need a framework that optimally combines the outputs of these models and reaches a consensus that produces better results. To do this, we feed the outputs of all the transfer learning models fine-tuned in the previous phase to a deep neural network that combines them and assigns appropriate weightage and importance to each. The outputs generated by the fine-tuned transfer learning models act as the features for the deep neural network, which is trained against the ground-truth labels of the chest X-rays. Combining the features extracted by the various transfer learning models in this way yields an improved performance. We use a six-layered deep neural network whose first five layers have 64, 128, 256, 128, and 64 neurons, respectively, with the ReLU activation function [61] and a dropout rate of 0.2. The final layer is a classification layer with softmax activation. The entire framework, comprising the image preprocessor described in Sect. 5.1, the deep transfer learning models discussed in Sect. 5.2, and the output-combining deep neural network described in this section, is used to make the final COVID-19 classification from the input chest X-ray images.
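The forward pass of such an output-combining network can be sketched in plain Python as below. This is only an illustration under stated assumptions: the weights are random rather than trained, the two-class setting (`NUM_CLASSES = 2`) and the concatenation of the eight models' class probabilities into one feature vector are assumptions, and all function names are hypothetical. Dropout is omitted because it is active only during training.

```python
import math
import random

HIDDEN_SIZES = [64, 128, 256, 128, 64]  # hidden layer widths stated in the text
NUM_BASE_MODELS = 8                     # eight fine-tuned transfer learning models
NUM_CLASSES = 2                         # assumption: binary classification


def init_layer(n_in, n_out, rng):
    """Create random (untrained) weights and zero biases for one dense layer."""
    scale = math.sqrt(2.0 / n_in)
    weights = [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Affine transform: x @ W + b."""
    weights, biases = layer
    return [
        sum(x[i] * weights[i][j] for i in range(len(x))) + biases[j]
        for j in range(len(biases))
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def build_combiner(n_features, rng):
    """Five hidden layers plus one softmax classification layer (six in total)."""
    sizes = [n_features] + HIDDEN_SIZES + [NUM_CLASSES]
    return [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]


def combine(base_outputs, layers):
    """Concatenate the base models' outputs and pass them through the network."""
    x = [p for probs in base_outputs for p in probs]
    for layer in layers[:-1]:
        x = relu(dense(x, layer))
    return softmax(dense(x, layers[-1]))


rng = random.Random(0)
layers = build_combiner(NUM_BASE_MODELS * NUM_CLASSES, rng)
# Hypothetical class probabilities produced by the eight base models for one image.
base_outputs = [[0.9, 0.1]] * NUM_BASE_MODELS
print(combine(base_outputs, layers))
```

In practice the weights would be learned against the ground-truth labels, so that the network gives more weight to the base models that are more reliable.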

Figure 4 shows our proposed Trained Output based Deep Transfer Learning approach for COVID-19 detection using chest X-rays. We start by preprocessing the raw images as described in Sect. 5.1. We then resize the images to fit the input requirements of the various transfer learning based models and feed them to these models to fine-tune them for the COVID-19 detection task. Finally, we use the outputs of these models to train our deep neural network architecture for the final COVID-19 detection based on the input chest X-ray images.

Fig. 4 The algorithmic flowchart of our proposed trained output based transfer learning (TOTL) approach

Table 2 The values of the various parameters used in the study

5.4 Hyperparameters

Hyperparameters are the settings fixed before training that define a model's architecture and control how it is trained. The hyperparameters of our model are given in Tab. 2. They are chosen with the Random Search technique, which samples random combinations of hyperparameters from a search space, evaluates each one, and keeps the best-performing combination. The activation functions chosen are ReLU and softmax, the loss function is categorical cross-entropy, and the optimizer is SGD (lr = 1e-4, momentum = 0.9) with a decay rate of 0.9. The dropout rate is set to 0.25, the callback is ReduceLROnPlateau, and the entire framework is run for 25 epochs.
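The Random Search procedure described above can be sketched as follows. The search space, the function names, and the toy objective are all hypothetical: the text reports only the selected values, not the candidate ranges, and a real run would replace `toy_evaluate` with training the model and returning its validation accuracy.

```python
import random

# Hypothetical candidate values; the study reports only the selected settings.
SEARCH_SPACE = {
    "optimizer": ["sgd", "adam", "rmsprop"],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "momentum": [0.0, 0.5, 0.9],
    "dropout": [0.1, 0.25, 0.5],
}


def random_search(evaluate, space, n_trials, seed=0):
    """Sample random configurations and keep the one with the highest score."""
    rng = random.Random(seed)
    best_config, best_score = None, float("-inf")
    for _ in range(n_trials):
        config = {name: rng.choice(values) for name, values in space.items()}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Synthetic stand-in objective, NOT a measured accuracy.

    It only makes the sketch runnable without training a network.
    """
    return -((config["learning_rate"] - 1e-3) ** 2) - (config["dropout"] - 0.3) ** 2


best_config, best_score = random_search(toy_evaluate, SEARCH_SPACE, n_trials=20)
print(best_config, best_score)
```

Unlike grid search, this procedure evaluates a fixed number of sampled combinations, so its cost does not grow with the size of the search space.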

6 Experimental Analysis

In this section, we discuss the performance of our proposed Trained Output based Transfer Learning (TOTL) approach for COVID-19 detection from chest X-rays. We performed experiments on all the datasets mentioned in Sect. 4.1 using the performance metrics mentioned in Sect. 4.2. We compared the performance of our proposed approach with several recent chest X-ray based COVID-19 detection approaches, namely ResNet50 + SVM [62], CovidGAN [63], COVIDX-Net [22], CAAD [64], Dark COVID-Net [24], CORO-Net [65], Covid-Net [27], CNN-LSTM [66], ResNet+PSO [33], and DNN+XGBoost [67]. Since our technique integrates the outputs of several state-of-the-art pre-trained Transfer Learning models, VGG16, VGG19 [55], ResNet50, ResNet50V2 [56], InceptionV3 [57], Xception [58], MobileNet [59] and InceptionResNetV2 [60], we also compared our results with those obtained by each of these models individually. The entire code is developed in the Python programming language. To obtain results for the existing techniques, we used several Python libraries such as sklearn, NumPy, and pandas, along with some publicly available GitHub repositories. We performed the simulations on a personal computer with an Intel i7 11th generation processor, 16 GB RAM, and an RTX 3070 graphics card. The training loss of the converged model for the various datasets is shown in Fig. 5. The figure shows that our proposed model converges at around 25 epochs for all the datasets, which justifies the choice of the number of epochs for our framework. Further experimental results are discussed below.

Fig. 5 Training loss of the converged model for various datasets

Table 3 Results obtained for all evaluation metrics for various transfer learning algorithms for dataset 1

6.1 Dataset 1

Tab. 3 shows the results obtained by the various algorithms for all the evaluation metrics on Dataset 1. Our proposed model performs exceptionally well in terms of Accuracy, reaching 96.82%. ResNet50 + SVM is the best performer among the contemporary models, and InceptionV3 is the best among the classical transfer learning-based algorithms, with an accuracy of 95.01%. InceptionV3 is also the best performer in terms of Precision with a score of 98.54%, while the contemporary COVIDX-Net obtains the best Recall with a score of 98.42%. Our proposed TOTL approach performs reasonably well on these two metrics as well. For the F1 Score, our proposed TOTL and ResNet50 + SVM perform the best, with almost equal scores. The results also show that only our proposed algorithm gives stable results across all the performance metrics. This can be attributed to the pre-processing techniques we use, namely image denoising, contrasting, and segmentation. Moreover, the ability of our model to optimally combine the results of all the chosen pre-trained models helps it capture better details and achieve improved performance compared to other models.
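The four metrics discussed above follow from the entries of a binary confusion matrix. The sketch below uses the standard definitions; the function name and the example counts are hypothetical and are not taken from the reported tables.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute Accuracy, Precision, Recall, and F1 Score from confusion counts.

    tp/fp/fn/tn are the true positives, false positives, false negatives,
    and true negatives for the positive (COVID-19) class.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of Precision and Recall.
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the function.
print(classification_metrics(tp=90, fp=10, fn=5, tn=95))
```

Because F1 is a harmonic mean, it is high only when Precision and Recall are both high, which is why it serves as a measure of balance between the two.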

Table 4 Results obtained for all evaluation metrics for various transfer learning algorithms for dataset 2

6.2 Dataset 2

Tab. 4 shows the results obtained by the various algorithms on Dataset 2 for the different performance metrics. Our proposed Trained Output-based Transfer Learning (TOTL) approach performs the best on all the performance metrics compared to all the contemporary algorithms and all the transfer learning algorithms. Amongst the transfer learning algorithms, Xception performs the best for accuracy, recall, and F1 Score, while InceptionResNetV2 performs the best for precision. Amongst the chosen contemporary algorithms, CNN + LSTM performs the best across the evaluation metrics. The better performance of our approach can be attributed to the use of appropriate pre-processing techniques, which make the input images clearer and less noisy. The segmentation and cropping of the images around the lung regions also help our model focus on only the relevant details. The feature extraction done by the various transfer learning models further helps our model capture crucial details regarding COVID-19 infection from the chest X-rays. Our proposed algorithm outperforms all the other algorithms by a considerable margin, thus establishing its efficacy as a COVID-19 detection framework.

Table 5 Results obtained for all evaluation metrics for various transfer learning algorithms for dataset 3

6.3 Dataset 3

In Tab. 5, we present the experimental results obtained on Dataset 3 for all algorithms across all the performance metrics. The results show that our proposed Trained Output-based Transfer Learning (TOTL) approach outperforms all the other algorithms in terms of Accuracy and F1 Score. For Precision, our algorithm falls short only of Dark COVID-Net, by a mere 0.11%, while for Recall, it lags behind COVIDX-Net and Xception by a small margin. By obtaining the best F1 Score, however, our proposed algorithm shows that it achieves a better balance between Precision and Recall than the other contemporary algorithms. Moreover, its high accuracy signifies the strong classification ability of our algorithm. The better performance of our approach can be attributed to training a deep neural network on the outputs of the various transfer learning approaches instead of using trivial ensemble techniques like majority voting. Moreover, each transfer learning method extracts different features from the dataset according to its own capabilities, which also helps our approach give better results.
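For contrast, the majority-voting baseline mentioned above can be sketched in a few lines. The function name and the example labels are hypothetical. The point of the sketch is that every model's vote counts equally, regardless of how reliable or how confident that model is, whereas the trained combiner learns a weighting.

```python
from collections import Counter


def majority_vote(predicted_labels):
    """Return the label predicted by the largest number of base models.

    On a tie, the label that appears first among the tied ones is returned.
    """
    counts = Counter(predicted_labels)
    return counts.most_common(1)[0][0]


# Hypothetical predictions of three base models for one chest X-ray.
print(majority_vote(["covid", "normal", "covid"]))
```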

Table 6 Results obtained for all evaluation metrics for various transfer learning algorithms for dataset 4

6.4 Dataset 4

In this section, we discuss the experimental results obtained for Dataset 4. Tab. 6 presents the results obtained by the various transfer learning algorithms and several contemporary COVID-19 detection algorithms across all the chosen evaluation metrics for Dataset 4. The results exhibit the superior performance of our proposed Trained Output-based Transfer Learning (TOTL) approach compared to the other chosen algorithms. Our model gives better results for all the evaluation metrics apart from recall, where it falls short only of VGG16, ResNet50, and COVID-Net. The better results obtained by our approach can be attributed to the pre-processing techniques, which improve the quality of the input images and highlight the lung region necessary for detecting COVID-19 from the chest X-rays. Combining various transfer learning models also helps our model extract salient features according to the capabilities of each transfer learning model. The above discussion demonstrates the utility of our proposed Trained Output-based Transfer Learning (TOTL) approach as an effective framework for COVID-19 detection from the chest X-rays of the patients.

Table 7 Accuracy results for all the datasets obtained by various contemporary algorithms along with our proposed TOTL framework with and without data augmentation

6.5 Results Without Data Augmentation

This section compares the accuracy obtained on all the datasets by various contemporary algorithms and by our proposed TOTL framework with and without data augmentation. The results are presented in Tab. 7. They show that our algorithm performs worse without data augmentation than with it, and even falls behind several transfer learning-based approaches. This can be attributed to over-fitting during the training phase, which biases our framework towards the class having the larger number of samples. However, the results obtained by our framework without data augmentation are still better than those of several contemporary algorithms like COVIDX-Net, CAAD, Dark COVID-Net, and CORO-Net.

Table 8 The accuracy results obtained by our model on various datasets as the various hyperparameters are varied

6.6 Hyperparameter Analysis

This section illustrates the impact of the hyperparameters on the final results. Tab. 8 shows the accuracy obtained by our model on the various datasets as the hyperparameters are varied. From Tab. 8, we can see how the accuracy improves as the number of epochs is increased for all four datasets. It also shows the impact of the choice of optimizer on the performance of our model: the chosen SGD optimizer gives the best performance compared to the other optimizers. The obtained results support our choice of hyperparameters.

Table 9 Statistical significance of the model performance based on the confidence interval on accuracy for all the datasets

Tab. 9 presents the statistical significance of the model performance based on the confidence interval on accuracy for all the datasets. From Tab. 9, we can see that the obtained confidence intervals are very close to the accuracy results achieved by us. This indicates that the performance of our model is statistically supported and superior to that of the other algorithms. Through the above discussion, we infer that our proposed Trained Output based Transfer Learning (TOTL) approach performs reasonably well for the task of COVID-19 detection using chest X-rays. Our model also achieves a consistent performance across the datasets and performance metrics and outperforms the chosen transfer learning approaches and contemporary COVID-19 detection frameworks. The good results of our method can be attributed to the use of preprocessing techniques like denoising using a Gaussian filter, contrasting using CLAHE, segmentation using U-Net, and cropping around the lung region to capture only the details relevant to COVID-19 detection. We also train a deep neural network on the outputs generated by the transfer learning models to assign appropriate importance to each of them, thereby achieving an enhanced performance compared to their individual results. The extensive size of the datasets used by us also provides sufficient data to train the model parameters well.
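A confidence interval on accuracy can be computed in several ways, and the method behind Tab. 9 is not stated here. As one common possibility, the sketch below uses the normal-approximation (Wald) interval for a binomial proportion. This choice, the function name, and the test-set size of 1000 are assumptions made for illustration only.

```python
import math


def accuracy_confidence_interval(accuracy, n_samples, z=1.96):
    """Normal-approximation interval for an accuracy measured on n_samples.

    z = 1.96 corresponds to a 95% confidence level. This is an assumed
    method, not necessarily the one used to produce Tab. 9.
    """
    if not 0.0 <= accuracy <= 1.0:
        raise ValueError("accuracy must lie in [0, 1]")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    std_error = math.sqrt(accuracy * (1.0 - accuracy) / n_samples)
    margin = z * std_error
    return max(0.0, accuracy - margin), min(1.0, accuracy + margin)


# Accuracy reported for Dataset 1, with a hypothetical test-set size of 1000.
low, high = accuracy_confidence_interval(0.9682, n_samples=1000)
print(f"95% CI: [{low:.4f}, {high:.4f}]")
```

The interval narrows as the number of test samples grows, so a larger test set gives a tighter estimate of the true accuracy.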

7 Conclusion

COVID-19 is a pandemic that has spread across the globe and caused millions of deaths. The existing COVID-19 screening methods are expensive, time-consuming, and laborious. In this paper, we introduced a novel Trained Output-based Transfer Learning approach for COVID-19 detection using chest X-rays. We started by preprocessing the chest X-rays using various techniques, namely, denoising using Gaussian filters, contrasting using the CLAHE algorithm, segmentation using U-Net, and cropping around the lung region to use only the relevant details. We then passed these images through various deep transfer learning-based pre-trained models, namely InceptionV3, InceptionResNetV2, Xception, MobileNet, ResNet50, ResNet50V2, VGG16, and VGG19. The models were fine-tuned on the chest X-ray images, and their outputs were fed to a deep neural network, which allocated appropriate importance to the outputs of each model. This trained framework was then used to detect COVID-19 from chest X-rays. We performed extensive experiments using four benchmark datasets and evaluated several popular performance metrics. We also compared the performance of our model with several classical transfer learning models and various contemporary COVID-19 detection frameworks. The results obtained exhibit the efficiency of our proposed algorithm for COVID-19 detection. This work can be extended by using other symptoms or infections of other organs to detect COVID-19. Moreover, other features like age, gender, medical history, genetic history, and location information can be used along with chest X-rays to detect COVID-19. Various respiratory parameters can also be considered in future research, and several other diseases can be explored using computer vision based approaches.