1 Introduction

With the advent of the COVID-19 pandemic, a massive amount of multimedia healthcare data has been generated, and its analysis is critical for technology-driven solutions. Machine learning (ML) and deep learning (DL) techniques have exhibited notable performance in processing such massive data for disease diagnosis, and many applications built on them now outperform conventional computer-aided systems. Deep learning models are mainly used when a large medical dataset is available, as they automatically extract features from the images for prediction and detection models, greatly lessening the data engineering and manual feature extraction effort. In particular, deep learning techniques have shown significant potential to detect lung abnormalities by processing chest X-rays [1, 2].

Effective detection and screening measures, along with proper and speedy medical action, are the need of the hour. The reverse transcription polymerase chain reaction (RT-PCR) test is a useful screening technique for COVID-19, but it is complicated and time-consuming, with an accuracy of about 63% [3]. With complex manual testing procedures and a shortage of testing kits, infected people continue to interact with the healthy population worldwide, leading to an exponential rise in active cases [4]. The medical symptoms of severe COVID-19 infection are bronchopneumonia, causing fever, cough, dyspnoea, and pneumonia [4,5,6,7].

The similarity in visual appearance between the chest X-rays of COVID-19 patients [12] and those of viral pneumonia patients [8,9,10,11] can lead to misdiagnosis of the disease; there have also been instances of radiologists misdiagnosing chest X-rays.

This study is significant because the transfer learning models used, namely EfficientNetB0, InceptionV3, and VGG16, have proven suitable for practical COVID-19 detection owing to their balance of accuracy and efficiency, with parameter counts small enough for mobile networked applications [56]. This study provides substantial evidence that computer vision technology can achieve better accuracy with lower human intervention in screening for COVID-19.

The rest of the article is structured as follows. Section 2 presents the literature review; Section 3 describes the methodology, including the datasets, model selection, and pre-processing; Section 4 provides the performance evaluation and discussion; and Section 5 concludes the paper with findings and directions for further research.

2 Literature review

Recent years have seen developments in deep learning in many fields, such as big multimedia data, business analytics for medical multimedia research, and the management of media-related healthcare data analytics [13,14,15,16,17,18, 54]. Computer-aided diagnosis (CAD) for lung diseases has been part of medical research for nearly half a century; originally based on simple rule-based algorithms for prediction, it has now developed into ML via deep neural networks [10, 19,20,21, 53]. The extreme workload on radiologists has recently made CAD imperative in lung disorder analysis [22]. Convolutional networks can now extract image features hidden from the naked eye [23,24,25,26], a deep learning technique widely acknowledged and utilized in research [14, 27,28,29]. In medical image analysis, the application of CNNs was established by [30] to enhance low-light images, and they were used to identify the nature of disease from CT and chest X-ray images. CNNs have also proven reliable for feature extraction and learning in image recognition from endoscopic videos. For chest X-ray analysis, CNNs have gathered interest because X-rays are low in cost and provide an abundance of training data for computer vision models. For classification, Rajkomar et al. applied GoogLeNet with data augmentation and pre-training on ImageNet to classify chest X-rays with 100 percent accuracy [31], essential evidence of deep learning's applicability to clinical image classification.

Transfer learning through pre-trained models was implemented by Vikash et al. [32] in a study on pneumonia detection. Classification models for lung mapping and abnormality detection were built through a customized VGG16 transfer learning model [33]. Studies training CNN models on a large training set were performed by Wang et al. [34], and with data augmentation by Ronneberger et al. [35]. Accurate detection of 14 different diseases by feature extraction through deep learning CNN models was reported in [36]. Sundaram et al. [37] achieved an AUC of 0.9 with transfer learning through AlexNet and GoogLeNet for lung disease detection. A ResNet50 model [38] delivered an outcome with 96.2% accuracy. The InceptionV3 model has been used successfully to separate chest X-rays (CXRs) affected by bacterial and viral pneumonia from normal ones with an AUC of 0.940 [39]. In a different study, an attempt was made to screen and identify the disorder in chest X-rays with an area under the curve of 0.633 [40]. A gradient visualization technique was used to localize regions of interest with heatmaps for lung disease detection; a 121-layer deep neural network achieved an area under the curve of 0.768 for pneumonia identification [41]. Philipsen et al. [42] examined the performance of computerized chest-radiograph-based T.B. detection and reported an AUC value of 0.93. Bharathi et al. [43] proposed a successful hybrid deep learning framework called “VDSNet” for time-efficient lung disease diagnosis through machine learning. Yoo et al. [44] proposed a deep-learning-based decision tree for fast COVID-19 decision making, reporting accuracies of 98%, 80%, and 95% for three decision trees.

Fig. 1
figure 1

Process of computer vision-enabled classification

A comparative analysis of the study is tabulated in Table 1.

Table 1 Comparative analysis of the study

3 Methodology

The process of the medical-image-based COVID-19 detection CNN classification model is shown in Fig. 1. A deep convolutional neural network's classification capability depends on the amount and quality of data available for training; when the dataset is sufficiently large, the resulting model is observed to outperform models trained on a smaller set. Transfer learning utilizes pre-trained weights: a model previously trained on a more extensive training set is reused on a relatively small one, with modifications as required. This reduces training time, since the model is not trained from scratch, and it also reduces the load on the system's hardware, allowing training on general-purpose computers like the one used in this work. Transfer learning was achieved using the TensorFlow library. After loading the respective model, the learned weights were adapted to the present dataset.
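As a minimal sketch of this approach in Keras, a pre-trained backbone can be loaded with its convolutional layers frozen and a new classification head attached for the three X-ray classes. The head design is an illustrative assumption; the study would load ImageNet weights (weights="imagenet"), replaced here by weights=None only so the sketch runs without a download.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_transfer_model(input_shape=(224, 224, 3), n_classes=3):
    # Pre-trained backbone; in practice weights="imagenet" would be used.
    base = tf.keras.applications.EfficientNetB0(
        include_top=False, weights=None, input_shape=input_shape)
    base.trainable = False  # freeze the convolutional feature extractor

    # New classification head (the trainable "network head")
    x = layers.GlobalAveragePooling2D()(base.output)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(base.input, outputs)

model = build_transfer_model()
```

Only the head's weights remain trainable; the frozen backbone supplies generic image features learned on the larger source dataset.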

The details of the dataset used, details of model selection and process of model architecture are mentioned in the following sections.

3.1 Description of dataset

This study has used a posterior-to-anterior view of chest X-ray images. Figure 2 demonstrates some sample X-ray images of different classes. This view is most commonly referred to by radiologists in the detection of pneumonia.

Fig. 2
figure 2

COVID-19, Normal and viral pneumonia X-ray images

There are two broad subsections from where images have been sourced, details of which are as follows.

3.1.1 COVID-19 radiography database

M.E.H. Chowdhury et al. [45], in their research “Can AI help in screening Viral and COVID-19 pneumonia?”, collected chest X-ray images of COVID-19-positive patients along with those of normal subjects and patients suffering from viral pneumonia; the collection is publicly available on Kaggle.

3.1.2 Actualmed-COVID-chest X ray-dataset

This dataset comprises medical data compiled by Actualmed together with José Antonio Heredia Álvaro and Pau Agustí Ballester of Universitat Jaume I (UJI) for research [46].

The models were trained on 3106 images, 16% of which were used for validation. The three algorithms were then tested on 806 non-augmented images of different categories to evaluate each algorithm's performance. The details of the dataset split are given in Table 2.
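The split itself can be sketched as a shuffled partition of the image paths; the 16% validation fraction matches the text, while the helper name and fixed seed are illustrative assumptions.

```python
import random

def split_dataset(paths, val_fraction=0.16, seed=42):
    """Shuffle image paths and split them into training and validation sets."""
    rng = random.Random(seed)     # fixed seed for a reproducible split
    paths = list(paths)
    rng.shuffle(paths)
    n_val = round(len(paths) * val_fraction)
    return paths[n_val:], paths[:n_val]   # (train, validation)
```

With 3106 training images this yields 2609 training and 497 validation samples; the 806 test images are held out separately and never augmented.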

Table 2 Details of training and test set

3.2 Model selection and pre-processing

The models were selected for this research because of their significance. EfficientNetB0 was chosen on the premise that better accuracy and efficiency can be achieved by balancing the network's dimensions; it surpasses conventional CNNs in accuracy while significantly reducing the number of parameters, as shown in Fig. 3 [47].

Fig. 3
figure 3

EfficientNetB0 architecture

VGG-16 [48], shown in Fig. 4 and developed in 2014, is a popular model pre-trained for image classification.

Fig. 4
figure 4

VGG16 model [48]

The InceptionV3 [49] network, developed in 2015, is shown in Fig. 5 [49]. Its main idea is to stack Inception modules that use relatively few weights, making its computational cost suitable for mobile applications and big data.

Fig. 5
figure 5

Inception V3 model

Image pre-processing is done to resize the X-ray images to a standard input size. As per the model requirements, the images were resized to 224 × 224 pixels and normalized according to the pre-trained model standards.
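This step can be sketched with each backbone's own normalization routine from Keras Applications; the helper name and the default backbone choice are illustrative assumptions.

```python
import tensorflow as tf

def preprocess_xray(image, model_name="efficientnet"):
    """Resize a chest X-ray to the 224 x 224 standard input and apply
    the normalization expected by the chosen pre-trained backbone."""
    image = tf.image.resize(image, (224, 224))
    if model_name == "vgg16":
        return tf.keras.applications.vgg16.preprocess_input(image)
    if model_name == "inception_v3":
        return tf.keras.applications.inception_v3.preprocess_input(image)
    return tf.keras.applications.efficientnet.preprocess_input(image)
```

Each `preprocess_input` applies the scaling the corresponding pre-trained weights were produced with (e.g. InceptionV3 maps pixel values into [-1, 1]), so inputs at inference match the training-time statistics.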

The chest X-rays were subjected to augmentation before training by rotation, scaling, and translation, including nearest neighbor fill techniques, as shown in Fig. 6.
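The augmentation pipeline can be sketched with Keras's `ImageDataGenerator`, mirroring the transformations listed above; the exact ranges below are illustrative assumptions, not the study's reported settings.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Rotation, scaling (zoom), translation (shifts), and horizontal flip,
# with nearest-neighbour filling of pixels exposed by the transforms.
augmenter = ImageDataGenerator(
    rotation_range=15,
    zoom_range=0.1,
    width_shift_range=0.1,
    height_shift_range=0.1,
    horizontal_flip=True,
    fill_mode="nearest",
)

batch = np.random.rand(4, 224, 224, 3).astype("float32")
augmented = next(augmenter.flow(batch, batch_size=4, shuffle=False))
```

Augmentation is applied only to the training stream; the 806 test images remain non-augmented, as stated in Sect. 3.1.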

Fig. 6
figure 6

Plot of augmented horizontal flip

3.3 Process of model architecture

The following steps were incorporated to implement the classification model.

The architecture depicted in Fig. 7 was implemented by training some layers and keeping others frozen to fine-tune the model. In a CNN, the bottom layers capture features that do not depend on the classification problem, whereas the top layers capture problem-dependent features. Steps 3, 4, and 5 are frozen, and the final layers are unfrozen after feature transfer; this unfrozen, fully connected layer is the network head and is responsible for classification. Backpropagation and weight decay were used to reduce over-fitting in the models. The total number of epochs for training is 25, with a batch size of 18, and the base learning rate is 0.00001.
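The freeze-and-fine-tune step can be sketched as follows; the Adam optimizer and the helper name are assumptions (the study does not name its optimizer), while the epoch count, batch size, and learning rate match the values above.

```python
import tensorflow as tf

EPOCHS = 25        # training epochs reported in the study
BATCH_SIZE = 18    # batch size reported in the study
BASE_LR = 1e-5     # base learning rate reported in the study

def compile_for_finetuning(model, n_frozen):
    """Freeze the first n_frozen layers; leave the head trainable."""
    for layer in model.layers[:n_frozen]:
        layer.trainable = False
    for layer in model.layers[n_frozen:]:
        layer.trainable = True
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=BASE_LR),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Training would then call `model.fit(train_data, epochs=EPOCHS, batch_size=BATCH_SIZE, ...)`, with gradients flowing only through the unfrozen head.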

Fig. 7
figure 7

Steps to implement the model

4 Results and discussion

In this section, we present the multi-classification results followed by a brief discussion of the results given by each model.

A confusion matrix is used to check how well a model performs on new data. Equations (1) to (5) give the formulae for the performance metrics used to measure each class's binary (one-vs-rest) classification performance.

$$\mathrm{Accuracy}=\frac{\mathrm{TP}+\mathrm{TN}}{\mathrm{TP}+\mathrm{FP}+\mathrm{TN}+\mathrm{FN}} \tag{1}$$
$$\mathrm{Sensitivity\ (Recall)}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FN}} \tag{2}$$
$$\mathrm{Specificity}=\frac{\mathrm{TN}}{\mathrm{FP}+\mathrm{TN}} \tag{3}$$
$$\mathrm{Precision}=\frac{\mathrm{TP}}{\mathrm{TP}+\mathrm{FP}} \tag{4}$$
$$\mathrm{F1\ score}=\frac{2\,(\mathrm{Precision}\cdot\mathrm{Recall})}{\mathrm{Precision}+\mathrm{Recall}} \tag{5}$$
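Eqs. (1) to (5) translate directly into code given the confusion-matrix counts; the function name is an illustrative choice.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, precision, and F1
    score (Eqs. 1-5) from one class's confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    sensitivity = tp / (tp + fn)          # recall
    specificity = tn / (fp + tn)
    precision = tp / (tp + fp)
    f1 = 2 * (precision * sensitivity) / (precision + sensitivity)
    return accuracy, sensitivity, specificity, precision, f1
```

For a multi-class problem such as this one, the counts are taken one class at a time (one-vs-rest) from the confusion matrices in Fig. 8.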

The results for VGG16 (Table 3) indicate that normal CXRs were detected with reasonable sensitivity (89%) due to low false negatives, with precision and specificity of 91.01% and 93% and an accuracy of 91.8%. Viral pneumonia (Table 3) is reported within acceptable values. The COVID-19 class (Table 3) is reported with good specificity (90%) but low precision (68%). The overall accuracy is observed to be 82.34% (Fig. 8a).

Table 3 Evaluation for VGG16
Fig. 8
figure 8

Confusion matrices of VGG16 (a), InceptionV3 (b), and EfficientNetB0 (c)

The results for InceptionV3 (Table 4) indicate that normal CXRs were detected with good sensitivity (93%), with better precision and specificity (95% and 94%) and an accuracy of 94.42%. Viral pneumonia (Table 4) is reported with an accuracy of 94%. The COVID-19 class (Table 4) is reported with better specificity (95%) and acceptable precision (77%). The overall accuracy is observed to be 93.38% (Fig. 8b).

Table 4 Evaluation for InceptionV3

The results for EfficientNetB0 (Table 5) indicate that normal CXRs were detected with very good sensitivity (94%), with the highest precision and specificity (95% and 96.53%) and an accuracy of 95.53%. Viral pneumonia (Table 5) is reported with an accuracy of 95%. The COVID-19 class (Table 5) is reported with high specificity (96%) and reasonable precision (79%). The overall accuracy is observed to be 94.79% (Fig. 8c).

Table 5 Evaluation for EfficientNetB0

Table 6 describes the overall performance parameters of the three classification models. The results are observed to be best for EfficientNetB0.

Table 6 Overall performance parameters

It is observed that the main cause of misclassifying COVID-19 as normal was reduced opacity in the left and right upper lobes and the suprahilar region on posterior-to-anterior X-ray images, which appear very similar to normal X-ray images.

5 Conclusion and future scope

The COVID-19 pandemic has clearly posed a threat to human existence, and efforts to curb the spread of the disease have burdened the healthcare sector. Testing measures to detect the infection are costly and may be insufficient to reach the wider population. Deep learning methods have proven to be an essential aid for screening big data with greater accuracy. This study aimed to provide evidence on the successful application of deep learning techniques to detect the presence of COVID-19 infection. The results confirm that deep CNN computer vision models are suitable for practical implementation in the healthcare sector to screen for and detect COVID-19 from chest X-rays, and transfer learning techniques have proven beneficial in enhancing the models' learning capability. The EfficientNetB0 model reported the highest accuracy of 94.79% in detecting and classifying COVID-19 chest X-rays against the other categories of chest abnormalities, with an overall accuracy of 92.93%. This paper provides evidence that the burden on medical facilities can be lowered through the effective use of AI technology. Implementing this technique also reduces the risk of spreading the disease and of rising case counts, since doctors and patients do not require physical contact at the screening level. The misclassified images were due to reduced opacity in the left and right upper lobes and the suprahilar region on posterior-to-anterior X-ray images, which appear very similar to normal X-ray images.

The observations are based on a limited dataset, which can be expanded as more data becomes available for future research; the models can then be made country-specific to provide more detailed insights. The models were trained for a limited number of epochs, which can be increased on computer systems with enhanced processing capabilities. Further deep learning techniques and models may also be implemented to compare results in multimedia medical image screening. The models selected and implemented in this study can serve as a base for further research in this domain.