Introduction

Coronavirus disease (COVID-19) is an infectious respiratory tract disease that has spread across the world [1]. Its causative virus belongs to a family of viruses whose infections can cause complications ranging from the common cold to shortness of breath [2]. Patients may also develop a pneumonia termed Novel Coronavirus Pneumonia (NCP), which can result in acute respiratory failure with a very poor prognosis and high mortality [3, 4]. Consequently, the pandemic nature of the coronavirus and the absence of reliable vaccines make COVID-19 diagnosis an urgent medical need.

At present, the standard testing method for COVID-19 diagnosis is the real-time Reverse Transcription Polymerase Chain Reaction (rRT-PCR) test. In this test, a nasal swab is collected from the patient and placed in a special medium, called the “virus transport medium”, that protects the RNA. Upon reaching the lab, the swab is further processed to determine whether the patient is positive for the coronavirus [5]. The entire process takes several hours, and the results generally arrive after a day or two, depending on how long the swab takes to reach the lab.

The current spread of the COVID-19 virus calls for its quick diagnosis and treatment. Studies such as [6, 7] have shown that the COVID-19 virus infects the lungs and creates smooth and thick mucus in the patient’s affected lungs, which is visible in chest X-rays and CT scans. However, analyzing X-ray images is a tedious task that requires expert radiologists. To this end, several computer algorithms and diagnosis tools such as [8, 9] have been proposed to extract detailed insights from X-ray images. Although these studies have performed well, they fall short in terms of accuracy, generalization, computational time, and error rate. To mitigate these shortcomings, recent studies such as [10,11,12,13] have incorporated machine learning (ML) and deep learning (DL) tools to investigate chest X-ray images. Selecting a suitable DL-based automated analyzer and predictor for coronavirus patients would be very beneficial for the medical community and society. Additionally, ML and DL approaches can provide test results faster and more economically than laboratory-based tests.

Furthermore, as COVID-19 spreads rapidly through person-to-person contact, hospitals and healthcare professionals are becoming increasingly overburdened, sometimes to the point of complete breakdown. Clearly, an alternative remote, online diagnostic and testing solution is required to fill this urgent and unmet need. The Internet of Medical Things (IoMT) could be extended to achieve this healthcare-specific solution. With this motivation, the present work proposes an AI-based Healthcare Cyber-Physical System (H-CPS) that incorporates convolutional neural networks (CNNs) (see Fig. 1). The system allows healthcare practitioners to screen COVID-19 positive and negative patients promptly and automatically from their chest X-ray images.

Fig. 1 Process flow of the proposed COVID-19 classification

The organization of the paper is as follows: “Related prior research works” discusses existing COVID-19 detection models, their shortcomings, and our contributions in the H-CPS framework. “Proposed CoviLearn model for automatic initial screening of COVID-19” explains the proposed solution and its functioning, followed by “Performance evaluation”, which validates the model using real-life data. Finally, “Conclusions and future scope” concludes the paper and outlines directions for future study.

Related Prior Research Works

How Existing Research Models Function

Over the past 2 years, many techniques have been proposed for effective COVID-19 detection [14]. From this extensive body of work, we have selected some state-of-the-art methods, focusing only on deep learning-based COVID-19 detection. A CNN called COVIDNet was trained in [15] using more than 15,000 chest radiography images of COVID-19 positive and negative cases; this deep neural network (DNN) reported an accuracy of 92.4\(\%\) and a sensitivity of 80\(\%\). A three-dimensional convolutional ResNet-50 network, termed COVNet, was proposed in [16]; it used volumetric chest CT images consisting of community-acquired pneumonia (CAP) and other non-pneumonia cases, and reported an AUC of 0.96. A similar ResNet-50 model proposed in [17] reported an AUC of 0.996, although it was tested on a much smaller dataset. In [18], a location-attention network based on ResNet-18 was trained on CT samples from COVID-19 patients, influenza-A-infected patients, and healthy individuals to classify COVID-19 cases, and reported an accuracy of 86.7\(\%\). Samples from 4 classes (healthy, bacterial pneumonia, non-COVID-19 pneumonia, and COVID-19) were used in [19] to train drop-weight-based Bayesian CNNs, which reported an accuracy of 89.92\(\%\).

In [20], a modified inception transfer-learning model was proposed that reported an accuracy of 79.3\(\%\), specificity of 0.83, and sensitivity of 0.67. In [21], a multilayer perceptron combined with an LSTM neural network was trained using clinical data from 133 patients, 54 of whom belonged to the critical care domain. The authors in [22] implemented a two-dimensional deep CNN architecture, while the authors in [23] combined three-dimensional UNet and ResNet-50 architectures; both were trained on volumetric CT scan data of patients categorized as COVID-19 positive and negative. The method in [24] used a pre-trained ResNet-50 network with chest X-ray images from 50 COVID-19 positive and 50 COVID-19 negative patients and reported an accuracy of 98\(\%\). In [25], four state-of-the-art DNNs (AlexNet, ResNet-18, DenseNet-201, and SqueezeNet) were ensembled, using chest X-ray images of normal, viral pneumonia, and COVID-19 cases. A novel CNN augmented with a pre-trained AlexNet using transfer learning was proposed in [26]; it was tested on both X-ray and CT images, with reported accuracies of 98\(\%\) and 94.1\(\%\), respectively.

Shortcomings in the Existing Research Works

Although the domain is very new and many deep learning-based methods have been proposed, most of them suffer from shortcomings in accuracy, model generalization, computational cost, and error rate. Even when certain works achieve higher accuracy, they suffer from lower sensitivity or specificity, or use a small test dataset. Moreover, augmenting IoMT frameworks with COVID-19 diagnosis is a new prospect, and its incorporation can further help the existing healthcare system cope in these difficult times. Also, the training dataset for certain methods is limited by class imbalance, that is, there are fewer coronavirus images than normal lung images. This imbalance lowers model accuracy and efficiency. Table 1 provides a comprehensive comparison of the existing research works.

Table 1 Comparative perspective with related AI works for COVID-19 detection

Our Vision of CoviLearn in the H-CPS Framework

We propose an AI-based H-CPS framework, termed “CoviLearn”, that gives healthcare professionals the ability to perform automatic screening of COVID-19 patients using their chest X-ray images. With a deep neural network (DNN) at its core, the CoviLearn model is implemented on a server for ubiquitous deployment. The hyperparameters of the DNN have been tuned to make it reliable, accurate, and specific. Once the X-ray images are uploaded, the model automatically identifies the symptoms and reports unbiased results. CoviLearn augmented with H-CPS brings patients, doctors, and test labs onto a single smart healthcare platform, as illustrated in Fig. 2. The reported results can be uploaded to the IoMT platform, from which they may be transferred to nearby COVID-care hospitals, the Centers for Disease Control and Prevention (CDC), and state and local health bureaus. Hospitals could subsequently offer online health consultations based on the patient’s condition and monitor vital equipment and quarantine requirements. Therefore, the proposed H-CPS allows people to dynamically monitor their disease status, receive proper medical care, and eventually help curb the spread of the virus.

Fig. 2 Schematic representation of the Healthcare Cyber-Physical System (H-CPS) ecosystem concept for combating COVID-19

Novel Contributions of CoviLearn

The major contributions of the work are:

  • An H-CPS framework architecture, augmented with a next-generation smart X-ray machine at its interface, is proposed to combat the spread of COVID-19.

  • An efficient heuristic search technique is incorporated which automatically finds an optimal feature subset present in the input chest X-ray images.

  • An end-to-end, automatically functioning DNN model that extracts features from X-ray images is incorporated.

  • The CNN blocks are reliable, accurate, and highly specific, which makes the overall model very effective. Furthermore, the model can be easily integrated into embedded and mobile devices, thereby helping health practitioners diagnose COVID-19 effectively.

Fig. 3 The proposed next-generation X-ray device of CoviLearn integrated with machine learning models

Proposed CoviLearn Model for Automatic Initial Screening of COVID-19

The CoviLearn Device for Next-Generation X-ray Screening

As discussed in the earlier sections, COVID-19 and other related pneumonia diseases can be screened and diagnosed by analyzing chest X-ray images. However, existing X-ray diagnosis suffers from limited access and a lack of experienced personnel. To address this issue, we propose a next-generation X-ray system from the H-CPS perspective. The H-CPS and IoMT together bring all the necessary agents of smart healthcare onto a universal communication and connectivity platform. This linking of technologies improves the efficiency of services such as telemedicine and teleconsultation, and promotes smart medical care.

Figure 3 shows the system-level block diagram of the next-generation X-ray machine integrated with CoviLearn for automatic screening of infectious diseases. Its main components are the X-ray apparatus (tube), flat panel detector, onboard memory, DICOM protocol converter, image processing unit, CoviLearn diagnosis unit, wired/wireless data communication, display or user interface, and system controller. In the proposed X-ray machine, the X-ray image is captured by a sensor array in the digital radiography flat panel detector, which also includes the communication devices that link it to the next stages. The image is then saved and converted to a DICOM X-ray image. Subsequently, the image is processed, and the exposure of the X-ray tube is adjusted based on the image quality and requirements. The captured image is stored temporarily in the local memory, after which the controller displays it on the monitor screen. Once a quality-assured image is acquired, it is transferred to the CoviLearn model, which automatically classifies it as either normal or COVID-19 affected. Classification is performed either locally, when sufficient resources are available, or in the cloud, by transmitting the images over the network. The test results are automatically synchronized with the H-CPS platform for the necessary medical and administrative actions. The controller unit coordinates the entire sequence of events.
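
The acquisition-to-diagnosis sequence coordinated by the controller can be sketched as follows. This is a minimal Python sketch under stated assumptions: the class `XRaySession`, its callbacks, the retry limit, and the 1.25 exposure step are hypothetical placeholders for the hardware and model interfaces, not part of the CoviLearn design, and DICOM conversion is represented only by a tag on a dictionary.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

@dataclass
class XRaySession:
    """Hypothetical controller for the Fig. 3 pipeline; every name here is illustrative."""
    acquire: Callable[[float], dict]       # flat-panel readout at a given tube exposure
    quality_ok: Callable[[dict], bool]     # image-processing quality check
    classify_local: Callable[[dict], str]  # on-device CoviLearn inference
    classify_cloud: Callable[[dict], str]  # inference after transfer over the network
    local_resources: bool                  # enough on-board compute for local inference?
    events: List[str] = field(default_factory=list)

    def run(self, exposure: float = 1.0, max_attempts: int = 3) -> str:
        image: Optional[dict] = None
        for _ in range(max_attempts):
            raw = self.acquire(exposure)
            candidate = {"format": "DICOM", **raw}  # stand-in for DICOM conversion
            self.events.append(f"captured@{exposure:.2f}")
            if self.quality_ok(candidate):
                image = candidate
                break
            exposure *= 1.25                        # adjust tube exposure and retry
        if image is None:
            raise RuntimeError("no quality-assured image after retries")
        self.events.append("stored+displayed")      # local memory, then monitor
        if self.local_resources:
            label, where = self.classify_local(image), "local"
        else:
            label, where = self.classify_cloud(image), "cloud"
        self.events.append(f"classified:{where}")
        self.events.append(f"synced:{label}")       # push result to the H-CPS platform
        return label

# Usage with dummy callbacks: the third exposure passes the quality check,
# and inference falls back to the cloud because local resources are absent.
session = XRaySession(
    acquire=lambda e: {"exposure": e},
    quality_ok=lambda img: img["exposure"] >= 1.5,
    classify_local=lambda img: "normal",
    classify_cloud=lambda img: "COVID-19",
    local_resources=False,
)
label = session.run()
```

Keeping the local-versus-cloud decision in the controller mirrors the text: the same quality-assured image reaches CoviLearn by whichever path the available resources allow.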

Dataset Used for Validating the Proposed CoviLearn System

To overcome the problem of class imbalance, we manually collected chest X-ray images of patients with COVID-19 from various sources, such as PyImageSearch, Radiopaedia, SIRM, and Eurorad. For normal chest X-rays, we used the chest X-ray dataset from the National Institutes of Health (NIH), USA [27]. The number of images from both sources was 250. The dataset is divided into two classes: patients diagnosed as COVID-19 positive and those diagnosed as negative. For training, 80\(\%\) of the dataset (\(\sim\) 200 images) is used, of which 30\(\%\) (\(\sim\) 60 images) is used for validation. The model is tested on the remaining 20\(\%\) (\(\sim\) 50 images) of the dataset. The loss and accuracy graphs are plotted on this validation set. All images are pre-processed and shuffled to prevent undue bias, as discussed in the following subsections.
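
The split described above can be sketched as follows, assuming a simple random (unstratified) shuffle with a fixed seed; the paper does not state how the split was drawn, and `split_indices` is a hypothetical helper. Note that holding out 30\(\%\) of the \(\sim\) 200 training images for validation leaves \(\sim\) 140 images for fitting.

```python
import numpy as np

def split_indices(n, test_frac=0.2, val_frac=0.3, seed=42):
    """Shuffle n sample indices, hold out test_frac for testing, then carve
    val_frac of the remaining training portion out for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = round(n * test_frac)
    test, train_full = idx[:n_test], idx[n_test:]
    n_val = round(len(train_full) * val_frac)
    val, train = train_full[:n_val], train_full[n_val:]
    return train, val, test

train, val, test = split_indices(250)
print(len(train), len(val), len(test))  # 140 60 50
```

With only two classes and a few hundred images, a stratified variant (splitting each class separately) would keep the COVID-19/normal ratio equal across the three subsets.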

Data Pre-processing

The collected images have different sizes, so pre-processing was essential before further analysis. Pre-processing is performed in three stages: first, each image is normalized by subtracting the mean RGB values; second, all pixel values are scaled to the range 0 to 1; finally, the tensor is reshaped to fit the model (here, \(224 \times 224\) pixels).
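
The three pre-processing stages can be sketched in NumPy as below. The sketch assumes that the mean is computed per image and per channel, that the 0 to 1 scaling is a min-max rescale applied after mean subtraction, and that resizing uses nearest-neighbour sampling; the paper does not specify these details, and `preprocess` is a hypothetical name.

```python
import numpy as np

def preprocess(img, size=224):
    """Three-stage pre-processing sketch for an H x W x 3 RGB image with values in [0, 255]."""
    x = img.astype(np.float32)
    # Stage 1: subtract the mean R, G, and B values (computed per image here).
    x = x - x.mean(axis=(0, 1), keepdims=True)
    # Stage 2: rescale all pixels into [0, 1] (min-max scaling).
    lo, hi = x.min(), x.max()
    x = (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)
    # Stage 3: nearest-neighbour resize so the tensor is size x size x 3.
    rows = np.arange(size) * x.shape[0] // size
    cols = np.arange(size) * x.shape[1] // size
    return x[rows][:, cols]

# Two random "X-rays" of a different size become one model-ready batch.
batch = np.stack([preprocess(np.random.default_rng(i).integers(0, 256, (300, 250, 3))) for i in range(2)])
print(batch.shape)  # (2, 224, 224, 3)
```

In practice, a library resize with bilinear interpolation is usually preferred; nearest-neighbour indexing is used here only to keep the sketch dependency-free.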

Data Augmentation

Deep learning models are data-hungry, and our dataset contains only around 250 images for each class, so the volume of data needs to be increased; this can be achieved through data augmentation. Therefore, similar to the process described in [28], the input images are augmented by random cropping, contrast adjustment, flipping, rotation, brightness adjustment, horizontal and vertical shifts, aspect-ratio changes, random shear, zoom, and pixel jitter. As a result of this augmentation, the proposed CoviLearn system became more efficient.
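
A NumPy sketch of a subset of these augmentations (flipping, random cropping, shifting, contrast and brightness adjustment, and pixel jitter) is given below. The probabilities and ranges are illustrative assumptions rather than the settings used in CoviLearn, and `augment` is a hypothetical name; arbitrary-angle rotation, shear, zoom, and aspect-ratio changes would typically be delegated to an image-processing library and are omitted here.

```python
import numpy as np

def augment(x, rng, max_shift=10, crop_frac=0.9):
    """Return a randomly augmented copy of x, an H x W x 3 image in [0, 1]."""
    h, w, _ = x.shape
    y = x[:, ::-1] if rng.random() < 0.5 else x          # random horizontal flip
    # Random crop covering crop_frac of each side, resized back (nearest neighbour).
    ch, cw = int(h * crop_frac), int(w * crop_frac)
    top = rng.integers(0, h - ch + 1)
    left = rng.integers(0, w - cw + 1)
    y = y[top:top + ch, left:left + cw]
    y = y[np.arange(h) * ch // h][:, np.arange(w) * cw // w]
    # Horizontal/vertical shift (pixels wrap around the border).
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    y = np.roll(y, shift=(int(dy), int(dx)), axis=(0, 1))
    # Contrast scaling about the mean, then a brightness offset.
    mean = y.mean()
    y = (y - mean) * rng.uniform(0.8, 1.2) + mean + rng.uniform(-0.1, 0.1)
    # Pixel jitter: small Gaussian noise on every pixel.
    y = y + rng.normal(0.0, 0.02, size=y.shape)
    return np.clip(y, 0.0, 1.0)

# Each call yields a different variant of the same pre-processed image.
rng = np.random.default_rng(1)
image = rng.random((224, 224, 3))
variants = [augment(image, rng) for _ in range(4)]
```

Augmentations are applied on the fly to training images only, so the validation and test images remain untouched.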

The Proposed Transfer Learning for Deep Neural Network in CoviLearn

CoviLearn uses transfer learning to predict the classification results. Transfer learning reduces the need for a large dataset and has been used in applications such as healthcare and manufacturing. It reuses the knowledge learned from training on a large dataset and transfers it to a different, smaller dataset. In the present work, four different DNNs (ResNet-50, ResNet-101, DenseNet-121, and DenseNet-169) are used, along with different blocks to train the individual networks. The hyperparameters have been tuned to achieve the highest accuracy. The detailed organization of the network layers is illustrated in Fig. 4, where each network is divided into phases: receiving an image input, training the model by sequentially passing the set of images through the convolutional networks, and finally predicting the results using a classification layer. The following subsection discusses the base classifiers and the differences between them.

Fig. 4 Organization of the DNN with classification layers

Deep Neural Base Classifiers

The CoviLearn model uses four deep neural networks as base classifiers. Two of these belong to the ResNet family [29] (ResNet-50 and ResNet-101), and the remaining two belong to the DenseNet family [30] (DenseNet-121 and DenseNet-169). As convolutional neural networks become deeper, the back-propagated error from any layer must traverse the entire depth, where repeated weight multiplications occur. As a result of these multiplications, the original error diminishes significantly and the network’s performance is adversely affected. To combat this, researchers have proposed many architectures, among which the DenseNet and ResNet models are the current state of the art.

DenseNet, or Dense Convolutional Network, addresses this problem using shorter connections between layers: inside a dense block, each layer is connected to all subsequent layers. Equation (1) gives the learning equation for a traditional CNN:

$$\begin{aligned} P_{l} = T_{l}(P_{l-1}), \end{aligned}$$
(1)

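where \(P_{l}\) denotes the output of the \(l\)th layer of the network and \(T_{l}\) denotes the transformation (typically a composite of batch normalization, ReLU, and convolution) applied by that layer. For a DenseNet, the equation becomes Eq. (2), in which \([P_{0}, P_{1}, \ldots , P_{l-1}]\) denotes the concatenation of the feature maps produced by all preceding layers: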

$$\begin{aligned} P_{l} = T_{l}([P_{0}, P_{1}, P_{2}, \ldots , P_{l-1}]). \end{aligned}$$
(2)

This arrangement allows features to be reused without traversing the entire depth of the network. Compared with a traditional CNN, DenseNet requires fewer parameters, because features learned in one layer are passed on to all subsequent layers, so redundant feature maps need not be relearned. A typical DenseNet architecture begins with a convolution layer followed by a pooling layer, which are followed by 4 dense blocks alternating with 3 transition layers. Each layer inside a dense block contains two convolutions with different filter sizes (\(1 \times 1\) and \(3 \times 3\)), while each transition layer contains a \(1 \times 1\) convolution followed by average pooling. DenseNet-121 and DenseNet-169 differ in depth: the numbers 121 and 169 denote the total number of weighted layers (convolutional and fully connected) in each network, most of which lie in the four dense blocks, and DenseNet-169 has more layers in its third and fourth dense blocks. Adding layers does not necessarily improve accuracy; the benefit depends on the particular task.
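
The layer counts in the two network names can be checked with simple arithmetic. The sketch below uses the dense-block configurations (6, 12, 24, 16) and (6, 12, 32, 32) from the original DenseNet paper [30] and counts one initial convolution, two convolutions per dense layer, one convolution per transition layer, and one fully connected layer; `densenet_depth` is a hypothetical helper.

```python
# Layers per dense block, as configured in the original DenseNet paper.
DENSENET_BLOCKS = {"DenseNet-121": (6, 12, 24, 16), "DenseNet-169": (6, 12, 32, 32)}

def densenet_depth(blocks):
    """Count weighted (convolutional + fully connected) layers in a DenseNet-BC."""
    stem = 1                          # initial 7 x 7 convolution
    dense_convs = 2 * sum(blocks)     # each dense layer: 1 x 1 bottleneck + 3 x 3 conv
    transitions = len(blocks) - 1     # one 1 x 1 convolution per transition layer
    classifier = 1                    # final fully connected layer
    return stem + dense_convs + transitions + classifier

for name, blocks in DENSENET_BLOCKS.items():
    print(name, densenet_depth(blocks), "weighted layers,", 2 * sum(blocks), "inside dense blocks")
# DenseNet-121 121 weighted layers, 116 inside dense blocks
# DenseNet-169 169 weighted layers, 164 inside dense blocks
```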

Residual Networks (ResNets) address the vanishing-gradient problem with skip (identity) connections that add a block’s input directly to the output of its convolutional layers. Bypassing the intermediate layers in this way provides an additional path for the back-propagated error to flow, thereby mitigating vanishing gradients. For a ResNet, the equation becomes Eq. (3):

$$\begin{aligned} P_{l} = T_{l}(P_{l-1}) + P_{l-1}. \end{aligned}$$
(3)

A typical ResNet architecture involves four stages. The first stage performs zero-padding on the input data. The second stage is made up of convolutional blocks that perform convolution along with batch normalization and max-pooling. The third stage consists of identity blocks augmented with filters, followed by the final stage, which comprises a global average pooling (GAP) layer, a fully connected dense layer, and a classifier function. All convolution layers use ReLU as the activation function. As with DenseNet, the two ResNet variants, ResNet-50 and ResNet-101, differ in network depth. It has been observed that certain ResNet variants have redundant layers that contribute little; their presence increases the number of parameters and weights that ResNet must handle. DenseNets, on the other hand, are relatively narrow (fewer filters per layer) and simply add new feature maps. Another difference between the two models is that DenseNet concatenates the output feature maps of the preceding layers, whereas ResNet sums them. This is evident from Eqs. (2) and (3).
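
To make the difference between Eqs. (1), (2), and (3) concrete, the NumPy sketch below stacks toy layers in the three connectivity patterns. Each \(T_{l}\) is assumed to be a random \(1 \times 1\) convolution (a channel-mixing matrix) followed by ReLU, standing in for the real convolutional blocks; the helper names are hypothetical. The dense variant returns the concatenation of all feature maps, so its channel count grows by the growth rate per layer, whereas the residual variant keeps the width fixed because it sums.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_layer(c_in, c_out):
    """Toy T_l: a random 1x1 convolution (channel-mixing matrix) followed by ReLU."""
    w = rng.normal(0.0, 0.1, size=(c_in, c_out))
    return lambda p: np.maximum(p @ w, 0.0)

def plain(p0, n_layers, width):
    """Eq. (1): P_l = T_l(P_{l-1})."""
    p = p0
    for _ in range(n_layers):
        p = make_layer(p.shape[-1], width)(p)
    return p

def residual(p0, n_layers):
    """Eq. (3): P_l = T_l(P_{l-1}) + P_{l-1}; T_l must preserve the channel count."""
    p = p0
    for _ in range(n_layers):
        p = make_layer(p.shape[-1], p.shape[-1])(p) + p
    return p

def dense(p0, n_layers, growth):
    """Eq. (2): P_l = T_l([P_0, ..., P_{l-1}]); returns all maps concatenated."""
    maps = [p0]
    for _ in range(n_layers):
        inputs = np.concatenate(maps, axis=-1)   # reuse every earlier feature map
        maps.append(make_layer(inputs.shape[-1], growth)(inputs))
    return np.concatenate(maps, axis=-1)

p0 = rng.random((8, 8, 16))       # an 8 x 8 feature map with 16 channels
print(plain(p0, 4, 32).shape)     # (8, 8, 32)
print(residual(p0, 4).shape)      # (8, 8, 16)
print(dense(p0, 4, 12).shape)     # (8, 8, 64), i.e. 16 + 4 * 12 channels
```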

Training and Testing of the Proposed Model

The CoviLearn model takes the input image, swaps the color channels, and resizes it to 224 \(\times\) 224 pixels. Afterwards, the data and label lists are converted into arrays, and the pixel intensities are normalized to the range 0 to 1 by dividing each input image by 255. Next, one-hot encoding is applied to the labels, after which the pre-trained models are loaded one at a time with a few upper layers frozen, and a base layer with dropout is created. Finally, the 224 \(\times\) 224 input tensor is loaded onto the model, which is compiled with the Adam optimizer and binary cross-entropy loss.
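
The head-training step can be illustrated with the NumPy sketch below, which follows the described sequence on synthetic data: one-hot labels, a frozen backbone, a trainable dense head with dropout, sigmoid outputs with binary cross-entropy, and Adam updates. It rests on clearly stated assumptions: a fixed random projection stands in for the frozen pre-trained ResNet/DenseNet layers, and `backbone`, `train_head`, and all hyperparameters are hypothetical rather than the authors’ configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen pre-trained backbone: a fixed projection that is never updated.
W_FROZEN = rng.normal(0.0, 0.05, size=(64, 16))

def backbone(x):
    return np.maximum(x @ W_FROZEN, 0.0)

def one_hot(labels, n_classes=2):
    return np.eye(n_classes)[labels]

def bce(p, y, eps=1e-7):
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def train_head(x, labels, epochs=200, lr=1e-2, drop=0.5, b1=0.9, b2=0.999):
    """Train only a dense sigmoid head on frozen features with dropout, BCE, and Adam."""
    y = one_hot(labels)
    feats = backbone(x)                      # computed once: the backbone is frozen
    w = np.zeros((feats.shape[1], 2))
    b = np.zeros(2)
    m = [np.zeros_like(w), np.zeros_like(b)]
    v = [np.zeros_like(w), np.zeros_like(b)]
    losses = []
    for t in range(1, epochs + 1):
        keep = (rng.random(feats.shape) > drop) / (1.0 - drop)  # inverted dropout
        f = feats * keep
        p = 1.0 / (1.0 + np.exp(-(f @ w + b)))                  # sigmoid outputs
        losses.append(bce(p, y))
        g = (p - y) / p.size               # gradient of mean BCE w.r.t. the logits
        grads = (f.T @ g, g.sum(axis=0))
        for i, (param, grad) in enumerate(zip((w, b), grads)):
            m[i] = b1 * m[i] + (1 - b1) * grad                  # Adam first moment
            v[i] = b2 * v[i] + (1 - b2) * grad ** 2             # Adam second moment
            m_hat = m[i] / (1 - b1 ** t)
            v_hat = v[i] / (1 - b2 ** t)
            param -= lr * m_hat / (np.sqrt(v_hat) + 1e-8)       # in-place update
    return w, b, losses

# Usage on synthetic data: 200 "images" already flattened to 64 values each.
x_demo = rng.normal(size=(200, 64))
labels_demo = (x_demo[:, :8].sum(axis=1) > 0).astype(int)
w, b, losses = train_head(x_demo, labels_demo)
```

Because only the head’s weights receive updates, the number of trainable parameters stays small, which is what makes training on a few hundred images feasible.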

Performance Evaluation

Experimental Setup

To compare the performance of the different models, three evaluation metrics are considered: accuracy, sensitivity, and specificity. After the test images are converted into 224 \(\times\) 224 tensors, the model’s predictions are used to compute these three metrics. Table 2 compares the results of the four models: DNN I (ResNet-50), DNN II (ResNet-101), DNN III (DenseNet-121), and DNN IV (DenseNet-169). A confusion matrix (see Fig. 5) compares the True Positive, True Negative, False Positive, and False Negative values. Moreover, loss and accuracy versus epoch graphs show how the training loss, validation loss, training accuracy, and validation accuracy vary with each epoch.

Result Analysis

In the context of coronavirus detection, a True Positive (TP) occurs when the patient has COVID-19 and the model detects it, and a True Negative (TN) occurs when the patient does not have COVID-19 and the model also predicts so. A False Positive (FP) occurs when the patient is not infected but the model predicts infection, while a False Negative (FN) occurs when the patient has COVID-19 but the model predicts otherwise. Accuracy is the fraction of correct predictions made by the CoviLearn model out of all patients, as given by Eq. (4). Sensitivity (the ability to identify COVID-19 patients correctly) and specificity (the ability to identify non-COVID-19 patients correctly) are defined by Eqs. (5) and (6), respectively:

$$\begin{aligned} {\text {Accuracy}}&= \frac{{{\text {TP}} + {\text {TN}}}}{{{\text {TN}} + {\text {TP}} + {\text {FP}} + {\text {FN}}}} \end{aligned}$$
(4)
$$\begin{aligned} {\text {Sensitivity}}&= \frac{{{\text {TP}}}}{{{\text {TP}} + {\text {FN}}}} \end{aligned}$$
(5)
$$\begin{aligned} {\text {Specificity}}&= \frac{{{\text {TN}}}}{{{\text {TN}} + {\text {FP}}}} . \end{aligned}$$
(6)
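
Equations (4) to (6) can be computed directly from predicted and true labels, as in the Python sketch below; `confusion_counts` and `metrics` are hypothetical helper names, and the labels are a made-up example with COVID-19 encoded as 1.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary task with the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, tn, fp, fn

def metrics(tp, tn, fp, fn):
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),  # Eq. (4)
        "sensitivity": tp / (tp + fn),                # Eq. (5)
        "specificity": tn / (tn + fp),                # Eq. (6)
    }

# Hypothetical labels: 4 COVID-19 cases (1) and 6 normal cases (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(metrics(*confusion_counts(y_true, y_pred)))
# {'accuracy': 0.8, 'sensitivity': 0.75, 'specificity': 0.8333333333333334}
```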

Table 2 summarizes the performance metrics of the different deep learning models. DNN III, which has the DenseNet-121 architecture, outperforms the other models, yielding an accuracy of 98.98\(\%\), sensitivity of 100\(\%\), and specificity of 98\(\%\). DNN I has the lowest performance, with an accuracy of 95.92\(\%\), sensitivity of 95.83\(\%\), and specificity of 96\(\%\).

Table 2 Performance metrics for different deep learning techniques

Figure 5 shows the confusion matrices of the COVID-19 and normal test results for the different pre-trained models. The training and validation curves show a well-defined pattern: accuracy increases and loss decreases with increasing epochs. Because of limited computational resources, the models were compared over 25 epochs only. Besides the confusion matrices, receiver-operating characteristic (ROC) curves and the corresponding areas under the curve (AUC) for each model are given in Fig. 6. The DNNs built on pre-trained DenseNet blocks achieve higher AUCs than those built on ResNet blocks, with DNN III having the highest AUC of 99\(\%\). An interesting finding is that the DenseNet-based DNNs achieve higher sensitivity and specificity, which reduces misclassifications in both the COVID-19 and the healthy classes. As the accuracy-versus-epoch curves show, DNN III achieves the highest accuracy, followed by DNN IV, DNN II, and DNN I; the accuracy increases with each epoch except at a few, as illustrated in Fig. 7. The loss curves follow the same ordering: the loss decreases with each epoch, and DNN III shows the lowest loss, followed by DNN IV, DNN II, and DNN I (see Fig. 8).

The results of the proposed CoviLearn model are compared with existing research works in Table 3. The method in [18] detects COVID-19 by classifying CT samples with CNN models, with an accuracy of 86.7\(\%\), sensitivity of 98.2\(\%\), and specificity of 92.2\(\%\). COVIDNet [15] reported an accuracy of 93.3\(\%\), and the CNN-based DarkCovidNet model [31] achieves an accuracy of 98.08\(\%\) in detecting COVID-19 from chest X-rays. In comparison, the proposed model has an accuracy of 98.98\(\%\), sensitivity of 0.984, and specificity of 0.965.
CoviLearn outperforms existing deep learning-based COVID-19 detection techniques such as [15, 18,19,20, 23]. The proposed model also outperforms existing models such as [15, 20, 23] in terms of both sensitivity and specificity. The works in [17, 21, 24] achieved similar accuracy; however, their test datasets are smaller than the one used in the current work. The deep neural architectures proposed in [25, 26] involved many hyperparameters, whose estimation increased the overall computational cost and hindered ubiquitous deployment. CoviLearn, on the other hand, owing to transfer learning and its choice of deep neural networks, avoids redundant parameters and thereby reduces the overall computational cost. Finally, none of these models offers a smart healthcare framework, which CoviLearn proposes and implements in the form of the H-CPS. The comparison with existing research works is summarized in Tables 3 and 4.
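
The AUC values quoted for Fig. 6 have a simple probabilistic reading: the AUC equals the probability that a randomly chosen COVID-19 image receives a higher score than a randomly chosen normal image, with ties counted as one half. The pure-Python sketch below uses this rank-based formulation (a pairwise O(nm) version, chosen for clarity rather than speed); `roc_auc` and the example scores are hypothetical.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve via its rank interpretation (Mann-Whitney U / (n * m))."""
    wins = 0.0
    for sp in pos_scores:
        for sn in neg_scores:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5           # ties count as half a win
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical predicted COVID-19 probabilities for 3 positive and 2 negative images.
print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3]))  # 0.8333333333333334, i.e. 5 of 6 pairs ranked correctly
```

An AUC of 1.0 means every COVID-19 image is scored above every normal image, so the 99\(\%\) AUC of DNN III indicates near-perfect ranking on the test set.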

Fig. 5 Confusion matrices for (a) DNN I, (b) DNN II, (c) DNN III, and (d) DNN IV

Fig. 6 Comparison of the receiver-operating characteristic (ROC) curves

Fig. 7 Classification accuracy in the deep learning system validation

Fig. 8 Binary cross-entropy loss in the deep learning system validation

Table 3 Comparison of results with existing recent similar works
Table 4 Comparison with existing deep learning-based COVID-19 detection models

Effectiveness of the Transfer-Learning Concept

When first trained, the initial neural network achieved accuracy, sensitivity, and specificity values of 0.5981, 0.6041, and 0.5923, respectively. To improve these substantially, we used transfer learning: state-of-the-art neural networks pre-trained on larger datasets were taken, their layers were frozen, and their final (classification) layer was replaced to perform the COVID-19 classification. This step improved the accuracy, sensitivity, and specificity to 0.9225, 0.9319, and 0.9135, respectively. Fine-tuning of the model’s hyperparameters then improved performance by a further \(\sim \,5\%\). Therefore, despite a small training dataset of 250 images, transfer learning improved the model’s classification performance significantly. Table 5 compares the metrics obtained at each stage.

Table 5 Performance metrics at different stages of training

Conclusions and Future Scope

The study presents CoviLearn, a DNN-based transfer-learning approach within a Healthcare Cyber-Physical System (H-CPS) framework for automatic initial screening of COVID-19 patients from their chest X-ray images. An architecture for a next-generation smart X-ray machine for automatic COVID-19 screening is proposed at the interface of the H-CPS. Four different DNNs (ResNet-50, ResNet-101, DenseNet-121, and DenseNet-169) are trained and tested to classify X-ray images from healthy and COVID-19-infected patients. DenseNet-121 showed the highest accuracy, 98.98\(\%\), followed by DenseNet-169, ResNet-50, and ResNet-101. Similarly, the sensitivity of DenseNet-121 and DenseNet-169 is 100\(\%\), while that of ResNet-50 and ResNet-101 is close to 97\(\%\). DNN III has the highest specificity, at 98\(\%\). Together, these results indicate the model’s ability to classify COVID-19 cases correctly.

The present CoviLearn platform can be a useful, economical, and automatic tool to help doctors diagnose coronavirus disease at lower cost. However, additional studies and medical trials are required to establish the features extracted by machine learning as reliable biomarkers for COVID-19. Furthermore, these machine learning models can be extended to diagnose other chest-related diseases, including tuberculosis and pneumonia. A limitation of the study is the limited number of COVID-19 X-ray images. Therefore, in the future, a larger dataset and a cloud-based system can be explored to make the model ubiquitous and more robust. In the meantime, the results can be used to identify patients who are highly likely to be COVID-19 positive, so that quarantine measures can be applied promptly until the rRT-PCR test results are obtained. The proposed CoviLearn can also be added to our healthcare CPS framework CoviChain for reliable information sharing from source to destination while accommodating various stakeholders [34].