The era of artificial intelligence has brought significant improvements to society [30]. Recent advances in deep learning have extended its reach to various applications such as healthcare, pixel restoration, visual recognition, and signal processing, among others [28]. In the healthcare domain, deep learning based image processing approaches for classification and segmentation are applied for faster, more efficient, and earlier diagnosis of deadly diseases, e.g. breast cancer and brain tumors, using imaging modalities such as X-ray, CT, and MRI [47] and fused modalities [37], along with their future possibilities. The success of these approaches depends on the availability of large amounts of data, which, however, is not the case for automated COVID-19 detection.
The main contributions of this work are organized into two components, data preprocessing and classification, as shown in Fig. 2. The COVID-19 image, RSNA, and NLM(MC) datasets are combined to generate the final working set, which contains CXR images of the following classes: coronavirus-caused diseases, pneumonia, other diseases, and normal cases. Binary classification (COVID-19 vs. others) and multi-class classification (COVID-19, other types of pneumonia, tuberculosis, and normal) are then performed by fine-tuning (transfer learning) various deep learning architectures, namely a baseline ResNet, Inception-v3, Inception-ResNet-v2, DenseNet169, and NASNetLarge [17, 52, 54, 63], with random oversampling and a weighted class loss function used to obtain unbiased learning. The trained models are then used to identify and classify COVID-19 in new samples. Finally, visualization techniques are used to interpret the basis of the classification results.
Imbalanced learning approach
Class balancing techniques are necessary when minority classes are more important. The dataset used in this research is highly imbalanced, which may lead to biased learning. The number of coronavirus-infected CXR images is very small compared with the other classes; hence, class balancing techniques must be applied to smooth the learning process. This section discusses two approaches to the class imbalance problem: the weighted class approach and random oversampling [21].
Weighted class approach
In this approach, the data are balanced by altering the weight that each class carries when the loss is computed. Normally, every class carries equal weight, but minority classes may need larger weights when they are more important, because their training examples should have a significant effect on the loss function. In the dataset used here, the coronavirus-infected class must be given larger weights because it is the most significant. In this article, the weight of each class is computed using Eq. (1).
$$ w(c) = C_{c} \cdot \frac{\sum_{j=1}^{N} n_{j}}{N \cdot n_{c}} \tag{1} $$
where \(C_{c}\) is a class constant for class c, N is the number of classes, and \(n_{c}\) is the number of samples in class c. The computed class weights are then incorporated into the objective (loss) function of the deep learning model so that false predictions on the minority class, which in this case is COVID-19, are penalized more heavily.
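A minimal NumPy sketch of Eq. (1) follows. It assumes \(C_{c} = 1\) for every class unless other constants are supplied, and the class counts are made-up numbers for illustration, not the values in Table 2. With all constants equal to 1, the product \(n_{c} \cdot w(c)\) is the same for every class, so each class contributes equally to the total loss; this coincides with scikit-learn's "balanced" heuristic, `n_samples / (n_classes * np.bincount(y))`.

```python
import numpy as np

def class_weights(counts, class_constants=None):
    """Eq. (1): w(c) = C_c * (sum_j n_j) / (N * n_c) for every class c."""
    counts = np.asarray(counts, dtype=float)
    n_classes = counts.size
    if class_constants is None:
        # Assumption: C_c = 1 for all classes (the constants are not given in the text).
        class_constants = np.ones(n_classes)
    return np.asarray(class_constants, dtype=float) * counts.sum() / (n_classes * counts)

# Hypothetical class counts (e.g. COVID-19, pneumonia, normal); not taken from Table 2.
counts = np.array([100, 300, 600])
weights = class_weights(counts)
print(weights)  # the rarest class receives the largest weight
```

If the models are trained with Keras (the framework is not stated here), a dictionary such as `dict(enumerate(weights))` could be passed as the `class_weight` argument of `Model.fit`.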
Random oversampling approach
In this approach, the number of minority samples is increased using the existing samples of the minority class until every class has the same number of samples. The procedure therefore starts by computing the difference between the number of samples in the majority class and in the minority class. To close this gap, new samples are generated from randomly selected minority-class samples by applying random transformations. In this work, there are fewer COVID-19-positive CXR images than images of the other classes; therefore, these minority-class images are randomly oversampled by means of rotation, scaling, and displacement to achieve an equal class distribution and support unbiased learning in the deep learning models.
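The oversampling procedure can be sketched in NumPy as below. It assumes grayscale images stored as an (N, H, W) array; the rotation (±10°), scaling (0.9 to 1.1), and displacement (±5% of the image size) ranges and the nearest-neighbour resampling are illustrative assumptions, not values reported in this work. Every class is topped up to the size of the largest class with randomly transformed copies of its own randomly chosen samples.

```python
import numpy as np

def random_affine(img, rng, max_deg=10.0, scale_range=(0.9, 1.1), max_shift=0.05):
    """Randomly rotate, scale and displace a 2-D image (nearest-neighbour resampling)."""
    h, w = img.shape
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    s = rng.uniform(*scale_range)
    ty, tx = rng.uniform(-max_shift, max_shift, size=2) * np.array([h, w])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    yy, xx = np.mgrid[0:h, 0:w]
    # Inverse mapping: locate the source pixel of every output pixel.
    dy, dx = yy - cy - ty, xx - cx - tx
    c, sn = np.cos(theta), np.sin(theta)
    src_y = np.rint((c * dy + sn * dx) / s + cy).astype(int)
    src_x = np.rint((-sn * dy + c * dx) / s + cx).astype(int)
    inside = (src_y >= 0) & (src_y < h) & (src_x >= 0) & (src_x < w)
    out = np.zeros_like(img)  # pixels mapped from outside the image stay 0
    out[inside] = img[src_y[inside], src_x[inside]]
    return out

def random_oversample(images, labels, seed=0):
    """Top up every class to the size of the largest class with augmented copies."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    out_images, out_labels = [images], [labels]
    for cls, n in zip(classes, counts):
        deficit = target - n  # gap between the majority class and this class
        if deficit == 0:
            continue
        picks = rng.choice(np.flatnonzero(labels == cls), size=deficit, replace=True)
        out_images.append(np.stack([random_affine(images[i], rng) for i in picks]))
        out_labels.append(np.full(deficit, cls, dtype=labels.dtype))
    return np.concatenate(out_images), np.concatenate(out_labels)

# Toy example: 7 "other" images (label 0) and 3 "COVID-19" images (label 1).
rng = np.random.default_rng(1)
images = rng.random((10, 32, 32))
labels = np.array([0] * 7 + [1] * 3)
X, y = random_oversample(images, labels)
print(X.shape, np.bincount(y))  # (14, 32, 32) [7 7]
```

The original samples are kept unchanged at the front of the output, and only the synthetic copies are transformed.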
Classification Strategy
Based on the available CXR data samples, COVID-19 classification is divided into the following two schemes:
- Binary Classification: the coronavirus-positive samples, labelled “1” (COVID-19), are distinguished from all remaining samples, labelled “0” (non-COVID-19), which include other cases such as chlamydophila, SARS, streptococcus, and tuberculosis, as well as normal cases.
- Multi-class Classification: the aim is to distinguish COVID-19 samples from other pneumonia cases, tuberculosis, and normal findings. Multi-class classification is performed with three and four classes. In the three-class setting, the labels are “0” for normal, “1” for COVID-19, and “2” for other pneumonia and tuberculosis cases; in the four-class setting, the labels are “0” for normal, “1” for COVID-19, “2” for other pneumonia, and “3” for tuberculosis.
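The three labelling schemes can be written as a small lookup. The finding strings below, and the set of findings treated as "other pneumonia", are hypothetical placeholders; the actual label names in the COVID-19 image, RSNA, and NLM(MC) metadata may differ.

```python
# Hypothetical finding names; the real dataset metadata may use other strings.
OTHER_PNEUMONIA = {"pneumonia", "sars", "streptococcus", "chlamydophila"}
SCHEMES = {"binary", "3-class", "4-class"}

def encode_label(finding: str, scheme: str) -> int:
    """Map a CXR finding to the integer label used by each classification scheme."""
    if scheme not in SCHEMES:
        raise ValueError(f"unknown scheme {scheme!r}")
    f = finding.strip().lower()
    if scheme == "binary":
        return 1 if f == "covid-19" else 0  # everything else, including normal, is 0
    if f == "normal":
        return 0
    if f == "covid-19":
        return 1
    if f in OTHER_PNEUMONIA:
        return 2
    if f == "tuberculosis":
        return 2 if scheme == "3-class" else 3  # merged with pneumonia in the 3-class scheme
    raise ValueError(f"unknown finding {finding!r}")

for finding in ["COVID-19", "SARS", "Tuberculosis", "Normal"]:
    print(finding, [encode_label(finding, s) for s in ("binary", "3-class", "4-class")])
```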
In both classification strategies, the deep learning models are trained with the imbalanced learning approaches discussed above, using the weighted categorical cross entropy (WCE) loss function given by Eqs. (2) and (3) [24]:
$$ f(s)_{i} = \frac{e^{s_{i}}}{\sum_{j=1}^{C} e^{s_{j}}} \tag{2} $$
$$ WCE = -\sum_{i=1}^{C} w(i) \cdot t_{i} \cdot \log\left(f(s)_{i}\right) \tag{3} $$
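where C is the number of classes (N in Eq. (1)), \(s_{i}\) is the output-layer activation (score) for class i, \(t_{i}\) is the i-th entry of the one-hot encoded target vector, and w(i) is the class weight from Eq. (1). Categorical cross entropy compares the distribution of the predictions (the softmax outputs, one per class) with the true distribution, in which the true class is represented as a one-hot encoded vector; the closer the model's outputs are to that vector, the lower the loss. The class weights w(i) scale each class's contribution, so errors on the minority class cost more.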
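Eqs. (2) and (3) translate directly into NumPy, as sketched below. Averaging the per-sample loss over a batch is an assumption (Eq. (3) is written for a single sample), and the scores, targets, and weights are toy values. Up-weighting class 1 by a factor of 4 multiplies the misclassified class-1 sample's contribution by 4, which is how the minority class is penalized more heavily.

```python
import numpy as np

def softmax(s):
    """Eq. (2), row-wise; subtracting the row max avoids overflow without changing the result."""
    z = s - s.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def weighted_cross_entropy(scores, targets, class_weights, eps=1e-12):
    """Eq. (3) per sample, averaged over the batch (the averaging is an assumption)."""
    p = softmax(np.asarray(scores, dtype=float))
    per_sample = -np.sum(class_weights * targets * np.log(p + eps), axis=-1)
    return per_sample.mean()

scores = np.array([[2.0, 1.0, 0.1],    # true class 0, predicted correctly
                   [0.5, 0.2, 1.5]])   # true class 1, misclassified as class 2
targets = np.array([[1, 0, 0],
                    [0, 1, 0]])
plain = weighted_cross_entropy(scores, targets, np.ones(3))
weighted = weighted_cross_entropy(scores, targets, np.array([1.0, 4.0, 1.0]))
print(plain, weighted)
```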
Data Preprocessing
In this article, because only a limited number of posteroanterior chest X-ray images of COVID-19-positive cases [8] are available, these samples are mixed with randomly selected CXR images from two other datasets, RSNA [50] and NLM(MC) [19]. The RSNA and NLM(MC) datasets consist of posteroanterior CXR images labelled as pneumonia and tuberculosis, respectively, along with normal samples. Table 2 describes the distribution of the training, validation, and test sets of the fused dataset for binary and multi-class classification under the two class imbalance strategies: a class-weighted loss function that penalizes false negative predictions on the minority class more heavily, and random oversampling [61] of the minority class, which in this case is COVID-19.
Table 2 Posteroanterior CXR image distribution into training, validation, and test sets from the fused datasets for different problem definitions
The CXR images in the aggregated dataset also contain unwanted artifacts such as bright text, symbols, varying resolutions, and pixel-level noise, which necessitate preprocessing. To suppress the textual and symbolic noise, the images are inpainted using an image mask generated by binary thresholding [34], as given by Eq. (4), and are then resized to a fixed resolution of 331 × 331 × 3.
$$ M(x,y) = \begin{cases} \mathrm{max}_{th}, & i(x,y) \geq \mathrm{min}_{th} \\ 0, & \text{otherwise} \end{cases} \tag{4} $$
where i(x,y) is the input image, and \(\mathrm{max}_{th}\) and \(\mathrm{min}_{th}\) are the maximum and minimum thresholds used to build the mask. Even after this filtering, uncertainty may remain in the fine pixel-level representation [15]. This uncertainty is removed (denoised) with the adaptive total variation method [53] while preserving the original distribution of pixel values.
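A NumPy sketch of the masking step in Eq. (4), followed by inpainting and resizing, is given below. The text does not name the inpainting algorithm, so simple diffusion (harmonic) inpainting, which repeatedly replaces each masked pixel by the mean of its four neighbours, is swapped in here; OpenCV's `cv2.inpaint` would be a common alternative. The threshold values, the nearest-neighbour resize, and replicating the grey channel to obtain 331 × 331 × 3 are assumptions.

```python
import numpy as np

def threshold_mask(img, min_th, max_th=255):
    """Eq. (4): M = max_th where i(x, y) >= min_th, and 0 otherwise."""
    return np.where(img >= min_th, max_th, 0).astype(np.uint8)

def diffusion_inpaint(img, mask, n_iter=500):
    """Stand-in inpainting: masked pixels repeatedly take the mean of their 4 neighbours."""
    out = img.astype(float)  # astype returns a copy, so the input is left untouched
    holes = mask > 0
    out[holes] = out[~holes].mean()  # neutral starting value for the holes
    for _ in range(n_iter):
        p = np.pad(out, 1, mode="edge")
        neighbour_mean = (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:]) / 4.0
        out[holes] = neighbour_mean[holes]
    return out

def resize_to_rgb(img, size=331):
    """Nearest-neighbour resize to size x size, then replicate the grey channel 3 times."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    resized = img[rows[:, None], cols[None, :]]
    return np.repeat(resized[:, :, None], 3, axis=2)

# Toy CXR-like image: uniform grey with a bright 8 x 8 "text" patch.
img = np.full((64, 64), 100.0)
img[10:18, 10:18] = 250.0
mask = threshold_mask(img, min_th=200)
clean = resize_to_rgb(diffusion_inpaint(img, mask))
print(np.count_nonzero(mask), clean.shape, clean.max())  # 64 (331, 331, 3) 100.0
```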
Given a grayscale image f defined on a bounded set \({\Omega } \subset {\mathbb {R}}^{2}\), the denoised image u that closely matches the observed image at pixel locations \(x = (x_{1}, x_{2}) \in {\Omega }\) is given by
$$ u = \arg\min_{u} \left( \int_{\Omega} (u - f \ln u)\, dx + \int_{\Omega} \omega(x)\, |\nabla u|\, dx \right) \tag{5} $$
where \(\omega (x) = \frac{1}{1 + k\,|G_{\sigma} * \nabla u|^{2}}\), \(G_{\sigma}\) is the Gaussian smoothing kernel with variance σ, k > 0 is a contrast parameter, and * is the convolution operator.
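Eq. (5) can be approximated with explicit gradient descent, as in the NumPy sketch below: the derivative of the fidelity term \(u - f \ln u\) is \(1 - f/u\), and the weighted TV term contributes \(-\nabla \cdot (\omega \nabla u / |\nabla u|)\), with ω recomputed from the current u at every step. This is only an illustrative solver, not necessarily the scheme of [53]; the step size, iteration count, k = 1, the small constant β that smooths \(|\nabla u|\), and using the code's `sigma` as the Gaussian's standard deviation are all assumptions.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian smoothing; `sigma` is used as the standard deviation (assumption)."""
    radius = max(1, int(np.ceil(3 * sigma)))
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-t**2 / (2.0 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="reflect")
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def grad(u):
    """Forward differences with a zero gradient at the far border."""
    gx, gy = np.zeros_like(u), np.zeros_like(u)
    gx[:, :-1] = u[:, 1:] - u[:, :-1]
    gy[:-1, :] = u[1:, :] - u[:-1, :]
    return gx, gy

def div(px, py):
    """Discrete divergence, the negative adjoint of grad()."""
    dx, dy = np.zeros_like(px), np.zeros_like(py)
    dx[:, 0] = px[:, 0]
    dx[:, 1:-1] = px[:, 1:-1] - px[:, :-2]
    dx[:, -1] = -px[:, -2]
    dy[0, :] = py[0, :]
    dy[1:-1, :] = py[1:-1, :] - py[:-2, :]
    dy[-1, :] = -py[-2, :]
    return dx + dy

def adaptive_tv_poisson(f, k=1.0, sigma=1.0, tau=0.02, n_iter=100, beta=0.1, floor=1e-3):
    """Explicit gradient descent on Eq. (5): Poisson-type fidelity plus edge-weighted TV."""
    f = np.asarray(f, dtype=float)
    u = np.clip(f, floor, None)  # u must stay positive because of ln u
    for _ in range(n_iter):
        # omega = 1 / (1 + k |G_sigma * grad u|^2); blurring before differencing equals
        # blurring the gradient up to boundary effects (both are linear, shift-invariant).
        sx, sy = grad(gaussian_blur(u, sigma))
        omega = 1.0 / (1.0 + k * (sx**2 + sy**2))
        gx, gy = grad(u)
        mag = np.sqrt(gx**2 + gy**2 + beta**2)  # smoothed |grad u|, avoids division by zero
        tv_grad = -div(omega * gx / mag, omega * gy / mag)
        fid_grad = 1.0 - f / u                  # derivative of (u - f ln u) w.r.t. u
        u = np.clip(u - tau * (fid_grad + tv_grad), floor, None)
    return u

# Toy two-region image with additive noise.
rng = np.random.default_rng(0)
clean = np.full((32, 32), 0.3)
clean[:, 16:] = 0.7
noisy = np.clip(clean + rng.normal(0.0, 0.05, clean.shape), 1e-3, None)
denoised = adaptive_tv_poisson(noisy)
print(np.std(noisy[:, 2:10]), np.std(denoised[:, 2:10]))
```

Because the scheme is explicit, `tau` has to stay small relative to `beta` (roughly `tau < beta / 4` here) for the iterations to remain stable.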
Figure 3 illustrates the data preprocessing stages for a COVID-19 case from the generated dataset that contains textual and symbolic artifacts. The pixel histograms at each preprocessing stage in Fig. 3 show that the preprocessing preserves the original shape of the pixel distribution while removing irregular intensities. The preprocessed images are then divided into training, validation, and test sets for training and evaluating the state-of-the-art deep learning classification models.
Deep learning models
This section describes the state-of-the-art deep learning models used in this work; Table 3 lists them along with their respective contributions, parameters, and performance on standard benchmark datasets. The Inception deep convolutional architectures, introduced with GoogLeNet, are considered state-of-the-art architectures for image analysis and object identification, with Inception-v1 as the basic model [5]. This base model was later refined by introducing batch normalization, resulting in Inception-v2 [54]. In further iterations, additional factorization was introduced and released as Inception-v3. Inception-v3 is a pre-trained model that performs two specific tasks: dimensionality reduction using convolutional layers and classification using fully connected and softmax layers. Since it was originally trained on over a million ImageNet images from 1,000 classes, its head layers can be retrained on the generated dataset while reusing this prior knowledge, which reduces the training effort and computational cost. Later, Inception-ResNet-v2 was proposed by Szegedy et al. [52]. This hybrid model combines residual connections with a recent version of the Inception architecture. Its additive merging of signals makes it possible to train very deep convolutional models for both image recognition and object detection, and the network is more robust and learns rich feature representations. Afterwards, DenseNet was proposed by Huang et al. [17]. It is based on the concept of feature reuse: each layer receives the feature maps of all preceding layers as input and passes its own feature maps to all subsequent layers, yielding a condensed model. This makes the network thinner and more compact, with fewer channels, while increasing the variation in the input of subsequent layers; as a result, it is easy to train and highly parameter efficient. The Google Brain research team proposed the NASNet model based on reinforcement learning search methods [63]. It creates a search space by factoring the network into cells, which are further divided into a number of blocks. Each block is built from a set of operations popular in CNN models with various kernel sizes, e.g. convolutions, max pooling, average pooling, dilated convolutions, and depth-wise separable convolutions.
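The difference between the additive merging used by residual connections (as in Inception-ResNet-v2) and the feature reuse of DenseNet shows up in the tensor shapes alone. In this NumPy sketch, a per-pixel linear map with ReLU (`conv1x1_relu`, a hypothetical helper) stands in for the real layers, which use batch normalization and 3 × 3 convolutions; the growth rate and layer count are arbitrary choices.

```python
import numpy as np

def conv1x1_relu(x, out_channels, rng):
    """Hypothetical stand-in for a conv layer: per-pixel linear map + ReLU on (H, W, C) maps."""
    w = rng.normal(0.0, 0.1, (x.shape[-1], out_channels))
    return np.maximum(x @ w, 0.0)

def residual_merge(x, rng):
    # Residual connection: F(x) is *added* to x, so the channel count stays fixed.
    return x + conv1x1_relu(x, x.shape[-1], rng)

def dense_block(x, n_layers, growth_rate, rng):
    # Dense connectivity: each layer sees the concatenation of all earlier feature maps
    # and appends `growth_rate` new channels.
    features = [x]
    for _ in range(n_layers):
        inp = np.concatenate(features, axis=-1)
        features.append(conv1x1_relu(inp, growth_rate, rng))
    return np.concatenate(features, axis=-1)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))
res = residual_merge(x, rng)
dense = dense_block(x, n_layers=4, growth_rate=12, rng=rng)
print(res.shape)    # (8, 8, 16)
print(dense.shape)  # (8, 8, 64): 16 input channels + 4 layers x 12 new channels
```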
Table 3 Recent deep learning architectures that are reported with top-5 error rate per year
Fine tuning
Training deep learning models such as Inception-v3 and Inception-ResNet-v2 requires extensive resources and time. Although these networks achieve excellent performance on ImageNet [9], training them on a CPU is impractical; they are often trained for a couple of weeks or more on arrays of GPUs to obtain good results on large, complex datasets. In most deep CNNs, the first few convolution layers learn low-level features (edges, curves, blobs), and deeper layers learn progressively more mid- and high-level features or patterns associated with the task at hand. In fine tuning, the aim is to keep (freeze) these trained low-level features and train only the high-level features needed for the new image classification problem.
In this article, training begins with a baseline residual network composed of five residual blocks [16]. Each block consists of two convolutions whose rectified linear unit (ReLU) activations are concatenated with the input of the first convolution layer, followed by max-pooling and instance normalization; the final output layer uses a softmax activation function. Transfer learning is then applied to the state-of-the-art architectures listed in Table 3: the pre-trained models are reused while their head layers are fine-tuned to classify COVID-19 samples, as illustrated in Fig. 4. Specifically, the four head layers of each network are allowed to adjust their trainable parameters, and fully connected layers are added: a layer with 128 neurons and an output layer with one neuron per class (2 for binary and 3 or 4 for multi-class classification), followed by a softmax activation function.
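The fine-tuning idea (a frozen feature extractor plus a trainable head with a 128-neuron fully connected layer and a softmax output) is sketched below in plain NumPy on synthetic data. The "backbone" is a hypothetical fixed random projection standing in for a pre-trained network; the ReLU after the 128-neuron layer, the learning rate, and the data are assumptions. Only the head's weights receive gradient updates, which is what freezing means. In Keras, for example, the same effect is usually obtained by setting `layer.trainable = False` on the frozen layers.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
n_classes, in_dim, feat_dim, hidden, lr = 3, 20, 64, 128, 0.01

# Hypothetical frozen backbone: a fixed projection standing in for pre-trained layers.
W_backbone = rng.normal(0.0, 1.0 / np.sqrt(in_dim), (in_dim, feat_dim))
W_backbone_before = W_backbone.copy()

# Trainable head: FC(128) + ReLU (assumed) -> FC(n_classes) -> softmax.
W1 = rng.normal(0.0, np.sqrt(2.0 / feat_dim), (feat_dim, hidden))
b1 = np.zeros(hidden)
W2 = rng.normal(0.0, np.sqrt(2.0 / hidden), (hidden, n_classes))
b2 = np.zeros(n_classes)

# Synthetic, well-separated data (illustration only).
y = rng.integers(0, n_classes, size=90)
x = rng.normal(size=(n_classes, in_dim))[y] + 0.5 * rng.normal(size=(90, in_dim))
t = np.eye(n_classes)[y]  # one-hot targets

feats = relu(x @ W_backbone)  # frozen layers: forward pass only, never updated
losses = []
for _ in range(300):
    h = relu(feats @ W1 + b1)
    p = softmax(h @ W2 + b2)
    losses.append(-np.mean(np.sum(t * np.log(p + 1e-12), axis=1)))
    dz = (p - t) / len(t)        # gradient of mean cross entropy w.r.t. the logits
    dh = (dz @ W2.T) * (h > 0)   # back-propagate through the ReLU (before W2 changes)
    W2 -= lr * (h.T @ dz)
    b2 -= lr * dz.sum(axis=0)
    W1 -= lr * (feats.T @ dh)
    b1 -= lr * dh.sum(axis=0)

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```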