1 Introduction

COVID-19 was declared a pandemic by the World Health Organization (WHO) on 11 March 2020, since the disease is highly infectious and was spreading rapidly across the world (Shereen et al. 2020). It is still affecting the world with severe consequences. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the cause of this disease, whose symptoms include cough, fever, dizziness, shortness of breath, and acute respiratory distress (Huang et al. 2020; Kaur and Singh 2020).

In the absence of a specific medical treatment, COVID-19 must be detected at an early stage to control further spread and to treat patients in time. For early classification of COVID-19 infection, researchers have proposed numerous approaches based on radiological images such as chest X-rays and CT scans. These images can help in the early diagnosis of COVID-19 patients (Ai et al. 2020; Singh et al. 2021b) as well as in severity assessment (Ng et al. 2020; Das et al. 2020b). However, X-ray imaging is preferable to CT scanning because of its lower cost, the wide availability of machines in hospitals, and the portability of machines in ICUs and field hospitals (LINDA 2020; Kaur and Singh 2021a).

Recently, deep learning models have been used extensively in biomedical image processing and have proven their efficacy in the classification of numerous diseases such as respiratory distress, tuberculosis, and pneumonia (Jung et al. 2018; Kaur et al. 2019, 2020; Girdhar et al. 2021; Yeung et al. 2019). Inspired by this, deep learning models are considered in this work for COVID-19 detection from X-ray images.

Due to the limited number of images available for COVID-19 compared to other diseases, developing effective techniques for the detection of COVID-19 from chest X-rays is quite challenging. Nevertheless, researchers have proposed many techniques using the available data, such as the CNN-based DarkNet (Ozturk et al. 2020; Kaur et al. 2021b), VGG19 (Simonyan and Zisserman 2014; Gianchandani et al. 2020), MobileNet v2 (Howard et al. 2017; Kaur et al. 2021a; Aggarwal et al. 2021), Inception (Szegedy et al. 2017; Singh et al. 2021a), and XceptionNet (Rahimzadeh and Attar 2020b; Kaur and Singh 2021b).

In image classification, a large and properly labeled dataset helps extract features more effectively: it provides ample data to train the model and yields well-separated classes. A model trained on such heterogeneous data can be used across disciplines. Such models also help in learning and representing different classes of medical data, which is limited, unbalanced, and highly prone to generalization errors. Transfer learning enables the reuse of models pre-trained on larger datasets (e.g., ImageNet) with well-defined class boundaries. It also speeds up classification and is therefore very helpful for designing an early diagnosis of COVID-19 suspected cases. These features have motivated us to incorporate transfer learning in the proposed ensemble model.

The main novelties of this paper are described below:

i. An ensemble model is developed for the early detection of COVID-19 infection from chest X-ray images.

ii. The proposed model can also classify a suspected patient as infected with pneumonia or tuberculosis, or as a healthy person.

iii. The ensemble model utilizes pre-trained models both to extract the potential features and to classify them.

iv. The proposed model is evaluated on two popular datasets.

v. A comparison between the proposed model and competitive models is carried out using various performance metrics, including accuracy, area under the curve, F-measure, precision, and recall.

The remainder of the paper is organized as follows: related work is discussed in Sect. 2, and preliminaries are described in Sect. 3. Section 4 presents the proposed model for early diagnosis of COVID-19 suspected cases. Comparative analysis is presented in Sect. 5, and Sect. 6 concludes the paper.

2 Related work

Dadário et al. (2020) proposed a three-dimensional deep learning model for the diagnosis of COVID-19. A total of 4356 chest CT scans were used to validate the performance of the method, and the experimental results showed good sensitivity and high specificity for the detection of COVID-19 infection. Another algorithm based on transfer learning was developed for the classification of medical images, and its performance was compared with various existing CNN-based systems. Two datasets were used: the first contained 1427 X-ray images, of which 224 were confirmed COVID-19 cases, and the second contained 1442 X-ray images, of which 224 were confirmed COVID-19 cases. It was found that deep learning with CNNs is very effective for the detection of COVID-19 from X-ray images (Apostolopoulos and Mpesiana 2020).

A deep learning-based CNN model, i.e., truncated InceptionNet, was proposed to classify chest X-rays into COVID-19 positive, pneumonia positive, tuberculosis positive, and healthy cases; it achieved an accuracy of 99.96% (Das et al. 2020a). A system was developed for the classification of chest CT images to detect COVID-19 infection, in which the initial parameters of the CNN were tuned using differential evolution; this model achieved an accuracy of 98.24% (Singh et al. 2020). A deep CNN-based system using projection-expansion-projection (PEP) patterns was designed for COVID-19 detection. A total of 13,975 chest X-ray images from 13,870 patients were used, collected in a dataset named COVIDx by integrating five different public datasets; a classification accuracy of 93.3% and a sensitivity of 91% were achieved (Wang et al. 2020).

A deep learning based VB-Net model was developed for the segmentation of COVID-19 infected regions and showed significant performance in detecting such regions (Shan et al. 2020). An automatic CNN model was proposed for the detection of COVID-19, with experiments on a multiclass (COVID vs. No-Findings vs. Pneumonia) classification dataset. The DarkNet classifier used in the YOLO architecture was employed, with 17 convolutional layers, and showed significant results for the initial screening of patients (Ozturk et al. 2020).

3 Preliminaries

We propose an ensemble model designed by combining three different transfer learning models, i.e., EfficientNet, GoogLeNet, and XceptionNet. Such an ensemble yields a powerful approach and provides better results with fewer errors. The proposed model uses CNNs, pre-trained transfer learning models, and ensemble learning to achieve better diagnostic results.

3.1 Convolutional neural networks (CNN)

CNN models have shown strong performance in various applications such as agriculture, industry, and the diagnosis of medical diseases (Rahimzadeh and Attar 2020a; Ghosh et al. 2020; Dekhtiar et al. 2018). The architecture of a CNN imitates the visual cortex system of humans (Majeed et al. 2020; Basavegowda and Dagnew 2020). The CNN architecture is shown in Fig. 1. It consists of three types of layers: the convolution layer, the pooling layer, and the fully connected layer (Guo et al. 2017; Gupta et al. 2019). The convolution and pooling layers are responsible for learning, and the fully connected layer performs the classification.

Fig. 1
figure 1

Architecture of convolutional neural networks
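As a minimal sketch of this layered structure (assuming TensorFlow/Keras; the layer sizes are illustrative and are not those of the proposed model), a small CNN for four-class chest X-ray classification could look as follows:

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Minimal CNN mirroring Fig. 1: convolution -> pooling -> fully connected.
model = keras.Sequential([
    keras.Input(shape=(224, 224, 3)),              # resized RGB chest X-ray
    layers.Conv2D(32, (3, 3), activation="relu"),  # convolution layer: learns local features
    layers.MaxPooling2D((2, 2)),                   # pooling layer: downsamples feature maps
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),          # fully connected layer
    layers.Dense(4, activation="softmax"),         # 4 classes: healthy, TB, pneumonia, COVID-19
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```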

3.2 Transfer learning

It is often difficult to obtain a large dataset in the medical imaging field and, as stated earlier, the data for COVID-19 is even scarcer; deep models cannot provide the desired results on small datasets (Ozturk et al. 2020; Wang et al. 2019; Khan et al. 2020; Osterland and Weber 2019). Deep learning models require a large dataset for training, and on smaller datasets they suffer from overfitting. These problems can be addressed with transfer learning (Zhou et al. 2020; Wiens 2019), in which pre-trained models are utilized. These pre-trained models were trained on different datasets with a large volume of images. Transfer learning addresses the training-data problem by transferring existing knowledge to the target field, where very little or no sample data is available. It also enables data training at a lower model-building cost.
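A hedged sketch of this idea, assuming TensorFlow/Keras and an ImageNet-pretrained VGG16 backbone (one of the comparison models used later), in which the backbone is frozen and only a new classification head is trained:

```python
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.applications import VGG16

# Reuse ImageNet weights; freeze the backbone so only the small new head is trained.
base = VGG16(include_top=False, weights="imagenet", input_shape=(224, 224, 3))
base.trainable = False

model = keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(4, activation="softmax"),   # new head for the four chest X-ray classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
```

Fine-tuning could later unfreeze part of the backbone at a lower learning rate, but freezing it is the simplest way to exploit the pretrained features on a small dataset.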

3.3 EfficientNet

EfficientNet (Tan and Le 2019) was introduced to deal with the scalability issues of CNNs. CNNs need to be scaled up in width and depth to provide better accuracy; however, this scaling increases training and testing time. EfficientNet resolves the scaling problem with a compound scaling method that scales the network with a fixed ratio across all dimensions to make it wider, deeper, and of higher resolution (see Fig. 2). This type of scaling provides better accuracy and performance. The model comprises eight variants, named B0 to B7, with B0 being the most compact and B7 the most scaled-up configuration of EfficientNet.

Fig. 2
figure 2

Architecture of pretrained EfficientNet model
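The compound scaling rule of Tan and Le (2019), which underlies the B0–B7 variants, can be summarized as:

$$\begin{aligned}&\text {depth: } d=\alpha ^{\phi },\quad \text {width: } w=\beta ^{\phi },\quad \text {resolution: } r=\gamma ^{\phi }\\&\quad \text {s.t. } \alpha \cdot \beta ^{2}\cdot \gamma ^{2}\approx 2,\quad \alpha \ge 1,\; \beta \ge 1,\; \gamma \ge 1 \end{aligned}$$

where \(\phi\) is the compound coefficient that is increased from B0 to B7, and \(\alpha\), \(\beta\), and \(\gamma\) are constants determined by a small grid search on the base network.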

3.4 GoogLeNet

GoogLeNet (Szegedy et al. 2015) comprises nine inception modules and has twenty-two layers. The inception modules allow selection among the available filter sizes in each block. Convolution kernels of three different sizes (i.e., 1\(\times\)1, 3\(\times\)3, and 5\(\times\)5) are applied to the output of the previous layer, so that features are extracted at different scales and passed to the next layer, as depicted in Fig. 3. The overfitting and gradient vanishing problems are mitigated by arranging the inception modules into three groups and adding three objective functions, one for each group.

Fig. 3
figure 3

Architecture of pretrained GoogLeNet model
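A simplified sketch of one inception module along these lines, assuming the Keras functional API (filter counts are illustrative; the original module also inserts 1\(\times\)1 reductions before the 3\(\times\)3 and 5\(\times\)5 paths):

```python
from tensorflow.keras import layers

def inception_block(x, f1=64, f3=128, f5=32, fp=32):
    """Parallel 1x1, 3x3 and 5x5 convolutions plus a pooling path, concatenated
    so that features are extracted at several scales from the same input."""
    b1 = layers.Conv2D(f1, (1, 1), padding="same", activation="relu")(x)
    b3 = layers.Conv2D(f3, (3, 3), padding="same", activation="relu")(x)
    b5 = layers.Conv2D(f5, (5, 5), padding="same", activation="relu")(x)
    bp = layers.Conv2D(fp, (1, 1), padding="same", activation="relu")(
        layers.MaxPooling2D((3, 3), strides=1, padding="same")(x))
    return layers.Concatenate(axis=-1)([b1, b3, b5, bp])
```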

3.5 Xception

Xception (Chollet 2017) is a deep CNN that takes the inception concept to its extreme and can be regarded as an extension of the Inception architecture. It introduces inception-style layers built first from depth-wise convolution layers and then from a point-wise convolution layer, as shown in Fig. 4. The architecture resembles a linear stack of depthwise separable convolution layers. This depthwise separable formulation decouples the learning of cross-channel and spatial features, and it also reduces memory requirements and computational cost. The network has thirty-six convolutional layers organized into fourteen modules; all modules except the first and last have linear residual connections around them.

Fig. 4
figure 4

Architecture of pretrained Xception model
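A small sketch of the depthwise-then-pointwise factorization, assuming TensorFlow/Keras (the function name separable_block is illustrative):

```python
from tensorflow.keras import layers

def separable_block(x, filters):
    """Depthwise-then-pointwise factorization used throughout Xception-style stacks."""
    x = layers.DepthwiseConv2D((3, 3), padding="same", activation="relu")(x)  # per-channel spatial filtering
    x = layers.Conv2D(filters, (1, 1), padding="same", activation="relu")(x)  # 1x1 pointwise conv mixes channels
    return x

# Keras also provides the fused form of the same operation:
# layers.SeparableConv2D(filters, (3, 3), padding="same", activation="relu")
```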

3.6 Ensemble learning

Ensemble learning (Chen et al. 2018) is a method of combining various deep learning models to obtain an ensembled predictive model. Combining several models in this way reduces variance (via bagging), reduces bias (via boosting), and improves predictions (via stacking). It also enhances the learning system’s generalization ability. In ensemble learning, base classifiers can be generated in two ways: (i) the dataset is the same but the learning algorithms differ, or (ii) different datasets are used with the same learning algorithm. The former yields a heterogeneous ensemble and the latter a homogeneous one. The purpose of ensemble learning is to prevent overfitting by combining different methods. When ensemble learning is used for classification, the decisions of the individual models are combined by voting to obtain the final result. Voting can be relative or absolute.
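A minimal NumPy sketch of the two voting schemes, assuming that “relative” voting corresponds to averaging softmax probabilities (soft voting) and “absolute” voting to one vote per classifier (hard voting); the probability values are illustrative:

```python
import numpy as np

# Softmax outputs of three base classifiers for one X-ray (4 classes).
probs = np.array([
    [0.10, 0.05, 0.15, 0.70],   # classifier 1
    [0.20, 0.10, 0.30, 0.40],   # classifier 2
    [0.05, 0.05, 0.25, 0.65],   # classifier 3
])

# Relative (soft) voting: average the probabilities, then take the arg-max.
soft_vote = np.argmax(probs.mean(axis=0))

# Absolute (hard) voting: each classifier casts one vote for its top class.
hard_vote = np.bincount(np.argmax(probs, axis=1)).argmax()

print(soft_vote, hard_vote)  # both schemes predict class 3 in this example
```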

4 Proposed ensemble model

This section presents the steps followed to design the proposed ensemble model for classifying COVID-19 suspected cases. The flow of the work is represented in Fig. 5. The MBConv block is an inverted residual block (as used in MobileNetV2), sometimes augmented with a squeeze-and-excitation block.

Fig. 5
figure 5

Proposed ensemble model

Step 1: Initially, the multiclass classification dataset is loaded.

Step 2: The obtained dataset of chest X-ray images, referred to as \(CXR\_Sample\), is then partitioned. The size of the sample is \(|CXR\_Sample|=9300\). \(CXR\_Sample\) is divided into four subsets depending on the type of image, namely \(CXR\_Healthy\), \(CXR\_Tuberculosis\), \(CXR\_Pneumonia\), and \(CXR\_COVID\), of sizes 2400, 2350, 2375, and 2175, respectively.

$$\begin{aligned}&CXR\_Sample\nonumber \\&\quad =[ CXR\_Healthy \; CXR\_Tuberculosis \; \nonumber \\&\qquad CXR\_COVID \; CXR\_Pneumonia] \end{aligned}$$
(1)
Step 3: Ten (10)-fold cross-validation is applied to the four subsets created in Step 2 to obtain the training and testing sample sets (a sketch of this partitioning follows Eq. (2)). To obtain the 10-fold cross-sample sets, a partition algorithm is used that divides each sample subset into 10 uniform parts.

$$\begin{aligned}&\{CXR\_Sample\_TrainingSet\nonumber \\&\qquad CXR\_Sample\_TestingSet\}\nonumber \\&\quad =Ten\_Cross(CXR\_Sample) \end{aligned}$$
(2)
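A sketch of this partitioning, assuming scikit-learn's stratified K-fold as the partition algorithm (the function name ten_cross and the label encoding are illustrative):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# CXR_images: array of pre-loaded X-ray images; CXR_labels: 0=healthy, 1=TB, 2=pneumonia, 3=COVID.
def ten_cross(CXR_images, CXR_labels, n_splits=10, seed=42):
    """Yield (training set, testing set) pairs while keeping class proportions in every fold."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in skf.split(CXR_images, CXR_labels):
        yield (CXR_images[train_idx], CXR_labels[train_idx]), \
              (CXR_images[test_idx], CXR_labels[test_idx])
```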
Step 4: Pre-trained models are used to train the network and generate the individual classifiers (a sketch of this step follows Eq. (3)). \(EN\_{SFX}\), \(GN\_{SFX}\), and \(Xception\_{SFX}\) represent the softmax functions of EfficientNet, GoogLeNet, and Xception, respectively.

$$\begin{aligned} EN\_{SFX}= & {} DTL(EfficientNet,{SFX}) \nonumber \\ GN\_{SFX}= & {} DTL(GoogLeNet,{SFX})\nonumber \\ Xception\_{SFX}= & {} DTL(Xception,{SFX}) \end{aligned}$$
(3)
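A sketch of the DTL step in Eq. (3), assuming TensorFlow/Keras backbones pretrained on ImageNet with a new four-way softmax (SFX) head; InceptionV3 is used here as an available stand-in for GoogLeNet, and the head design is an assumption rather than the exact configuration of the proposed model:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import EfficientNetB0, InceptionV3, Xception

def DTL(backbone_cls, num_classes=4, input_shape=(224, 224, 3)):
    """Attach a softmax (SFX) head to an ImageNet-pretrained backbone."""
    base = backbone_cls(include_top=False, weights="imagenet", input_shape=input_shape)
    base.trainable = False  # reuse ImageNet features; layers could be unfrozen later for fine-tuning
    return models.Sequential([base,
                              layers.GlobalAveragePooling2D(),
                              layers.Dense(num_classes, activation="softmax")])

EN_SFX = DTL(EfficientNetB0)   # EfficientNet classifier
GN_SFX = DTL(InceptionV3)      # Inception-style stand-in for GoogLeNet
Xception_SFX = DTL(Xception)   # Xception classifier
```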
Step 5: Obtain the individual classifiers by training \(EN\_{SFX}\), \(GN\_{SFX}\), and \(Xception\_{SFX}\) on the training sample set \(CXR\_Sample\_TrainingSet\) as follows:

$$\begin{aligned} EN\_{SFX}= & {} Train(EN\_{SFX}, \nonumber \\&CXR\_Sample\_TrainingSet) \nonumber \\ GN\_{SFX}= & {} Train(GN\_{SFX}, \nonumber \\&CXR\_Sample\_TrainingSet) \nonumber \\ Xception\_{SFX}= & {} Train(Xception\_{SFX}, \nonumber \\&CXR\_Sample\_TrainingSet) \end{aligned}$$
(4)

Step 6: Implement ensemble learning to obtain the resultant classifier \((Ensemble\_EGX)\) by integrating the above three classifiers and applying relative voting (a sketch follows Eq. (5)).

$$\begin{aligned} Ensemble\_EGX= & {} Ensemble \nonumber \\&(EN\_{SFX},GN\_{SFX},Xception\_{SFX}) \end{aligned}$$
(5)
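A sketch of Eq. (5), assuming relative (soft) voting is realized by averaging the three trained classifiers' softmax outputs (the function name ensemble_EGX mirrors the notation above):

```python
import numpy as np

def ensemble_EGX(base_models, images):
    """Average the softmax outputs of the trained base classifiers and take the arg-max."""
    probs = np.mean([m.predict(images, verbose=0) for m in base_models], axis=0)
    return np.argmax(probs, axis=1)

# Example usage (trained models and test images assumed available):
# predicted = ensemble_EGX([EN_SFX, GN_SFX, Xception_SFX], CXR_Sample_TestingSet)
```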

5 Performance analysis

The proposed model is compared with several transfer learning models, namely ResNet152V2, VGG16, DenseNet201, and InceptionResnetV2, to validate its performance. The following subsections discuss the datasets and the comparative analysis.

5.1 Dataset

Two popular datasets have been utilized for the experiments. Dataset 1 is obtained from the Kaggle dataset resource (Gianchandani et al. 2020). This dataset comprises X-ray images of pneumonia, tuberculosis, COVID +ve, and COVID -ve patients. For binary classification, the COVID +ve and COVID -ve images from dataset 1 are used.

Dataset 2 is obtained from the dataset of Mporas and Naronglerdrit (2020) and from Qatar University and the University of Dhaka (Chowdhury et al. 2020). The size of the dataset is \(|CXR\_Sample|=9300\). \(CXR\_Sample\) is divided into four subsets depending on the type of image, namely \(CXR\_Healthy\), \(CXR\_Tuberculosis\), \(CXR\_Pneumonia\), and \(CXR\_COVID\), of sizes 2400, 2350, 2375, and 2175, respectively. Dataset 2 is used for multiclass classification. Figure 6 shows sample images from datasets 1 and 2.

Fig. 6
figure 6

A view of the multiclass classification dataset

5.2 Data preparation and preprocessing

First, the X-ray images are resized to 224\(\times\)224\(\times\)3 (RGB). The lack of data is addressed with transfer learning, in which models pre-trained on larger datasets are reused. Data augmentation is also used to achieve better generalization, since a neural network with millions of parameters benefits from proportionally more data. Augmentation is performed using horizontal and vertical flipping, shear transformation (with a slant of 0.2), and \(45^{\circ }\) rotation. Because the model is validated on a variety of inputs, augmentation is also applied to the validation dataset. Image normalization, performed by dividing the pixel values by 255 to bring the data into the \([0,\;1]\) range, is applied to achieve a better convergence rate during network training. Finally, the dataset is divided into training, validation, and testing subsets: 15% of the data is used for testing, 17% for validation, and 68% for training the proposed model. Most of the data is used for training because the learning and weight assignment of the model is done on the training data. The results may differ if even slight variations (increases or decreases) are made to these proportions.
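A sketch of this preprocessing pipeline, assuming Keras' ImageDataGenerator; the directory paths are hypothetical and the parameter values follow the description above:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rescaling to [0, 1] plus the augmentations described above; also applied to validation data.
augment = ImageDataGenerator(
    rescale=1.0 / 255,    # divide pixel values by 255 for normalization
    horizontal_flip=True,
    vertical_flip=True,
    shear_range=0.2,      # shear transformation with a 0.2 slant
    rotation_range=45,    # rotations up to 45 degrees
)

train_gen = augment.flow_from_directory(
    "data/train", target_size=(224, 224), class_mode="categorical", batch_size=32)
val_gen = augment.flow_from_directory(
    "data/val", target_size=(224, 224), class_mode="categorical", batch_size=32)
```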

5.3 Experimentation 1: Four class classification

Quantitative metrics based on the confusion matrix are used to evaluate the performance of the proposed model: accuracy, precision, recall, and F1-score.
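A sketch of how these confusion-matrix-based metrics could be computed, assuming scikit-learn and integer-encoded labels and predictions (macro averaging over the four classes is an assumption):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# y_true, y_pred: integer class labels for the test set.
def report(y_true, y_pred):
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),  # macro-averaged over classes
        "recall":    recall_score(y_true, y_pred, average="macro"),
        "f1":        f1_score(y_true, y_pred, average="macro"),
        "confusion": confusion_matrix(y_true, y_pred),
    }
```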

The comparison between the proposed model and competitive models such as ResNet152V2, VGG16, DenseNet201, and InceptionResnetV2 is presented in Table 1. These models achieve comparable results, with ResNet152V2 performing best among the four existing models at 98.15% accuracy. The proposed model attains an accuracy of 99.32% and therefore outperforms the state-of-the-art pretrained models; its generalization ability is also better than that of the competitive models. For training, the proposed technique takes 7.4733 min, and for testing it takes only 1.4343 s. A high precision value is achieved for each class, which indicates that the proposed model could also be used to classify other chest-related diseases. The validity of the proposed model is further supported by its F-measure, macro-average precision, and recall values of \(99.39 \%\), \(99.21 \%\), and \(99.20 \%\), respectively.

Table 1 Performance analysis of the proposed model on four-class classification dataset

5.4 Experimentation 2: Two class classification

For training, the base models took approximately ten minutes each. For binary classification, ResNet152V2 and VGG16 both attained the best accuracy among the base models; however, considering the criticality of COVID-19, efforts should be made to enhance the sensitivity and precision scores. Hence, the ensemble model used for multiclass diagnosis is also applied here. The proposed model outperformed the base models with 96.15% accuracy, which is approximately 1.2% higher. It also obtained a precision of 0.959, which indicates the correctness of the predicted results.

It can be noted from the results that the proposed model achieves a high specificity, indicating very few false-positive predictions. High specificity makes the system more reliable and helps the health care system use testing kits efficiently, directing facilities and kits to people who are in genuine need.

Table 2 Detailed parameters of the original VGG-16 model (Simonyan and Zisserman 2014)

5.5 Comparative analysis

It is observed from the experimental results that the proposed model provides a rapid, low-cost solution for COVID-19 detection from chest X-ray images. The performance of the state-of-the-art models along with the proposed model is presented in Table 3. The non-availability of sufficient training data results in models that generalize poorly; to deal with this problem, we have used transfer learning, which reuses models trained on larger datasets. We have also tried to keep false predictions to a minimum. Considering the statistical metrics in Table 3, the proposed model achieves better generalization, better accuracy, and fewer false predictions, outperforming the state-of-the-art models.

Table 3 Performance of state-of-the-art techniques for binary classification

For multiclass classification, the performance of the state-of-the-art models along with the proposed model is presented in Table 4, which gives the comparative analysis on the multiclass dataset. The proposed ensemble model outperforms the other techniques with higher accuracy for multiclass classification as well.

Table 4 Performance of state-of-the-art techniques for multiclass classification

6 Conclusion

A deep transfer learning-based ensemble model was designed by integrating EfficientNet, GoogLeNet, and XceptionNet for early diagnosis of COVID-19 infection. The proposed model is capable of detecting COVID-19 as well as differentiating between normal, COVID-19 (+), pneumonia, and tuberculosis infected cases. Two datasets were used to test the proposed model, on which it achieved accuracies of 99.21% for the multiclass and 98.95% for the binary classification problem, respectively. Hence, the proposed model emerges as a timely solution that health officials can use for the early diagnosis of COVID-19.

In the near future, the proposed work can be extended to predict the degree of risk and the survival chances of COVID-19 (+) patients, which would be very helpful for medical practitioners in the management and healthcare planning of infected patients.