1 Introduction

Effective healthcare management is essential to any country’s growth. One of the major challenges for any healthcare agency is to protect people from diseases that spread from one infected person to another through the air or another transmission medium. The novel coronavirus (N-CorV) causes one such infectious disease, which originated in Wuhan city (China). On January 30, 2020, the Emergency Committee of the World Health Organization (WHO) declared the coronavirus outbreak a Public Health Emergency of International Concern, as it spreads rapidly from person to person and most severely affects people with a weak immune system. On February 11, 2020, the WHO designated the 2019-nCoV epidemic disease as Coronavirus Disease 2019 (CorV) (Gorbalenya et al. 2020). Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) is a new virus of the coronavirus family (Lai et al. 2020; Stoecklin et al. 2020). N-CorV belongs to the coronavirus family, whose members can cause multiple diseases in humans as well as animals. To date, the N-CorV pandemic has 306,173,517 confirmed cases with more than a thousand deaths. Coronavirus symptoms are similar to those of moderate Middle East Respiratory Syndrome (MERS) or Severe Acute Respiratory Syndrome (SARS). N-CorV is characterized as a serious threat owing to its higher death rate and the medical problems it causes. Prevailing computing and emerging technologies can help build a powerful real-time health monitoring system to slow down its spread. In 2003, the SARS viral respiratory disease associated with a coronavirus was found in southern China. In 2015, the MERS outbreak was detected in Saudi Arabia and caused 858 deaths. A CorV patient may have a variety of symptoms and signs of infection, including fever, dry cough, kidney failure, respiratory disease, and throat infection that results in severe acute respiratory distress (Salman and Salem 2020).
In severe circumstances, the infection can lead to pneumonia, breathing problems, multi-organ failure, and death (Mahase 2020). The health systems of several advanced countries are on the verge of collapse as a result of the increasing number of CorV patients. Many countries throughout the world lack a sufficient number of testing kits. Many nations have announced a total lockdown and ordered citizens to stay home.

The main transmission routes of airborne diseases are air droplets and human contact (Chang et al. 2020). To combat CorV, effective and accurate screening is very important (Wang et al. 2020). An infected patient can be isolated and quarantined at home to reduce transmission through contact with that person (Li et al. 2020). The core screening method for early detection of CorV is the real-time reverse transcription-polymerase chain reaction (RT-PCR) (Corman et al. 2020). The test is performed on the patient, and the report can be obtained within 6–48 h (Xie et al. 2020). An alternative to RT-PCR is chest computed tomography (CT) imaging (Adair and Ledermann 2020). A Deep Learning (DL)-based diagnostic approach can help experts reach an accurate and rapid decision when diagnosing CorV (Shibly et al. 2020).

1.1 State-of-the-art contributions

Based on the aforementioned aspects, numerous studies on the diagnosis of CorV have been published in the literature. The current study presents a model that separates CorV, viral pneumonia, bacterial pneumonia, and healthy (normal) cases as a 4-class problem. The major contributions are listed below:

  1. A SQueezeNet model incorporating a DL approach is presented for the rapid diagnosis of CorV.

  2. Offline augmentation is performed to address the class imbalance problem of the open-source dataset. In addition, the benefits of transfer learning and fine-tuning are used to reduce overfitting and to speed up convergence.

  3. To enhance classification results, the current research focuses on binary and multi-class classification using a pre-trained, fine-tuned CNN model.

  4. The proposed framework outperforms state-of-the-art decision-making techniques for extracting features by using learned weights from pre-trained CNN models.

Many studies have stated that DL approaches have significant potential for CorV detection and classification. The current study aims to present a deep neural network-based model for the early detection of CorV. Inspired by the benefits of CNN-based deep neural networks, a SQueezeNet model that incorporates dynamic graph-based spectral clustering with a multi-feature extraction strategy is used for the accurate detection of CorV. Additionally, the proposed model applies a fine-tuned DL technique to a limited dataset to avoid the overfitting problem. Pre-trained CNN models of Inception-ResNetV2, GoogleNet, VGG2, ResNet152, AlexNet, and DenseNet512 were applied as transfer learning models to detect CorV from chest X-ray images. Hence, the suggested model can be further used for large datasets with the benefits of transfer learning and fine-tuning.

The following are the key aspects of the proposed study:

  1. The suggested model has an end-to-end structure for selection and classification that requires no manual intervention.

  2. The present work has been performed on a large dataset to overcome the problem of overfitting.

  3. The proposed model uses six different pre-trained CNN models to improve the results, and its performance is significantly higher.

  4. The presented model identifies suspected CorV patients with maximal accuracy so that the necessary treatment can be provided in a timely manner.

  5. Experiments have shown enhanced results in terms of statistical parameters.

1.2 Paper organization

The rest of the paper is structured as follows: Sect. 2 reviews the literature in the current domain of study. Section 3 discusses the proposed model for CorV detection. Section 4 presents the experimental simulation for performance assessment. Finally, Sect. 5 concludes the paper with future research directions.

2 Related work

Table 1 Comparative analysis with state-of-the-art work

2.1 DL-based CorV identification techniques

DL techniques are a subset of Machine Learning (ML) with significant potential for the automatic detection of disease from medical images in support of professional experts. As a result, the medical community has emphasized the need to focus on the advancement of diagnostic technology (Liu et al. 2019). Jain et al. (2021) used medical images with DL techniques for diagnosing lung-related problems. Furthermore, the authors compared 3 DL models for performance assessment. Afshar et al. (2020) suggested a framework based on Capsule Networks that uses X-ray images to diagnose CorV disease. Several convolution layers and capsules are used in that work to address the class imbalance problem. Further, the authors showed experimentally that CorV-CAPS performs satisfactorily with fewer trainable parameters. Apostolopoulos and Mpesiana (2020) used a convolutional neural network to identify CorV automatically. In particular, transfer learning was applied with the neural network to distinguish common pneumonia from CorV-induced pneumonia, achieving an accuracy of 93.4%. Hemdan et al. (2020) created the COVIDX-Net model, which operates on X-ray images. The COVIDX-Net model was trained using 50 X-ray images with seven distinct CNN models. Zhang et al. (2020) presented a general DL approach for automatically extracting and analyzing areas with a high risk of CorV infection. The authors used a DL-based method to perform a segmentation stage. The infected areas were then analyzed and quantified in the CT scan using specified metrics. Xu et al. (2020) proposed a pre-trained Convolutional Neural Network to extract potentially infected regions from computed tomography images. These regions are used to classify cases into three categories: CorV, Influenza-A viral pneumonia, and cases unrelated to infection. The experimental results showed an overall accuracy of 86.7%. Nayak et al. (2021) compared eight pre-trained convolutional neural network models. The findings reveal that ResNet-34 outperformed the other state-of-the-art models in distinguishing CorV from normal instances, with an accuracy of 98.33%. Recent research has shown the capability of CNNs to solve challenging tasks such as classification, segmentation, and object detection.

2.2 Smart medical data analysis

Smart medical analysis is an important research domain. Numerous researchers have contributed significantly to healthcare data analysis for real-time services, medical report generation (Huang et al. 2021; Pahwa et al. 2021), and recommender systems. Chan, Zeng, Wu and Sun (2018) described the detection of numerous abnormalities in small medical images to detect pneumothorax. The authors used the SVM method to diagnose the features of lung diseases using pattern analysis methods, and employed texture segmentation to represent the defective lungs in the proposed detection model. Bai et al. (2020) discussed the CorV Intelligent Diagnosis and Treatment Assistant Program based on the Internet of Things (IoT). The main goal was to enable different levels of CorV diagnosis and treatment by using medical technology. Yildirim et al. (2019) proposed a framework for the diagnosis of diabetes mellitus using the transfer learning approach. Dorj et al. (2018) proposed a model for the classification of skin cancer using SVM and a deep convolutional neural network. Brunetti et al. (2019) discussed a computer-aided system that can support physicians in classifying different kinds of breast cancer, liver cancer, and blood tumors revealed by images. Furthermore, the authors proposed a framework for classifying three kinds of diseases from images acquired via Computer Tomography, Magnetic Resonance, and Blood Smear systems. In the context of computer vision, Zhou et al. (2019) proposed a multi-feature extraction approach with an adaptive graph learning model for unsupervised re-identification on non-textual data. Furthermore, the authors incorporated multi-feature dictionary learning and adaptive graph learning into a unified learning model. The convergence of the optimization algorithm is proved. Experiments were performed on four datasets to show the superiority and effectiveness of the proposed method. Alqudah, Qazan and Alqudah (2020) used the Xception architecture to detect CorV infection by classifying images into three categories. Ucar and Korkmaz (2020) proposed an intelligent and efficient decision-making system for CorV. The authors used an AI-based model incorporating a Bayesian network that achieves strong performance in identifying CorV. Li et al. (2018) proposed an optimization algorithm that effectively solves the clustering problem by using one parameter in the learning process. Chang et al. (2015) proposed a novel compound rank-k projection algorithm for bilinear analysis. In the proposed algorithm, the authors used multiple rank-k projection models to find the best optimization solution. Luo et al. (2017) proposed an unsupervised feature selection method that produces a faithful subset of the feature space by accurately maintaining its intrinsic structure. To obtain the subset, optimal graph reconstruction and selective matrix techniques are used. Furthermore, various results have shown the effectiveness of the proposed algorithm. Ren et al. (2021) discussed the characteristics and open issues of neural search for medical data and provided a comparative analysis for performance assessment. Yu et al. (2018) proposed Adaptive Semi-supervised Feature Selection (ASFS) for cross-modal retrieval. In the proposed model, the authors used an efficient joint optimization algorithm to update the mapping matrices and the label matrix for non-labeled data. Experimental results have shown the reliability and efficiency of the model. Luo et al. (2017) proposed a novel semi-supervised feature analysis framework for video semantic identification. In that paper, the authors included adaptive optimal similarity matrix learning in the feature selection procedure. Li et al. (2019) likewise created a semi-supervised feature analysis framework for video semantic identification by including adaptive optimum similarity matrix learning in the feature selection technique. Moreover, the authors fine-tuned the parameters and compared against other methods to demonstrate higher accuracy. Table 1 summarizes the strengths and weaknesses of some of the relevant models presented by various researchers.

3 Proposed model

DL models have been effectively employed in a wide range of fields, including medical data classification, segmentation, and lesion identification (Brunetti et al. 2019). DL models are used to analyze medical imaging modalities such as magnetic resonance imaging (MRI), computed tomography (CT), and X-ray images (Gaál et al. 2020). Consequently, studies on the identification of illnesses such as diabetes, brain tumors, and cancer are available. Nowadays, Convolutional Neural Network (CNN) models are used to solve image identification and classification problems (Albawi, Mohammed and Al-Zawi 2017). The X-ray images are preprocessed by augmentation and normalization before being passed to the layers of the CNN model. Specifically, images are converted to matrix format. Based on differences between images and matrices, the system decides which image corresponds to which label. During the training phase, it learns the effects of variations on the labeled data and then predicts the label of a new image. Figure 1 depicts the outlined architecture of the proposed model. For this purpose, a deep CNN is employed containing 3 different types of layers: Convolution layer, Pooling layer, and Fully Connected layer. Both convolution and pooling layers are used in the feature extraction process. These layers explicitly assume that their input is an image, which helps to increase efficiency. Moreover, such layers are effective at extracting spatial and temporal features of an image by incorporating filters. Unlike traditional feed-forward networks, these layers contain considerably fewer parameters and employ weight sharing and data augmentation to reduce the computing requirements (Ahmed, Hossain and Noor 2021). In the proposed work, a Spectral Clustering (SC) approach is used with the proposed algorithm to extract cluster values from the input datasets (Li et al. 2018). Pre-processing and feature extraction are performed on all database images for indexing purposes. Each of the layers is detailed below.

3.1 Convolution layer

The first layer of the presented CNN technique is responsible for identifying numerous attributes of the input. The learnable parameters of each layer consist of a 3 \(\times \) 3 or 5 \(\times \) 5 kernel matrix that is used to transform the input matrix. When an image is passed through a filter, the filter responses form the feature map of that filter. By applying the filters, such layers extract low- and high-level pattern characteristics. The stride parameter specifies the number of steps by which the filter shifts across the input matrix. The output is computed as:

$$\begin{aligned} Z_{j}^{k}=f\left( \sum _{m=1}^{N}{w_{j}^{k-1}*{y}}_{m}^{k-1}+ {b}_{j}^{k}\right) \end{aligned}$$
(1)

where \(Z_{j}^{k}\) is the \(j^{th}\) feature map in the \(k^{th}\) layer and f is the activation function.

\({w_{j}^{k-1}}\) indicates the \(j^{th}\) filter in the \((k-1)^{th}\) layer.

\({y}_{m}^{k-1}\) represents the \(m^{th}\) feature map in the \((k-1)^{th}\) layer.

\({b}_{j}^{k} \) indicates the bias of the \(j^{th}\) feature map in the \(k^{th}\) layer. N is the total number of feature maps in the \((k-1)^{th}\) layer.

(*) represents the vector convolution process. The complexity of the proposed algorithm per time step can be expressed as the sum of the complexities of the convolution layers and the pooling layer over the whole training process. Asymptotically, it is represented as:

$$\begin{aligned} O\left( \left( \sum _{k=1}^{d} \left( {s}_{k}^{2} * {n}_{k}*{m}_{k}^{2}+w\right) \right) *i*e \right) \end{aligned}$$

where i is the input length, d is the number of convolutional layers, w indicates the number of weights, and e denotes the number of epochs.
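As an illustration of Eq. (1), the following minimal sketch computes one output feature map from a list of input feature maps. It is not the authors' implementation: the function names `correlate2d` and `feature_map` are hypothetical, one kernel slice per input map is assumed, no padding is used, f is assumed to be ReLU, and the kernel is not flipped (the usual CNN convention).

```python
def relu(z):
    """ReLU activation: keep positive values, clamp negatives to zero."""
    return z if z > 0 else 0.0


def correlate2d(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [
        [
            sum(
                kernel[i][j] * image[r * stride + i][c * stride + j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def feature_map(inputs, kernels, bias=0.0, stride=1):
    """Sum the responses over all input maps, add the bias, apply f = ReLU."""
    responses = [correlate2d(y, w, stride) for y, w in zip(inputs, kernels)]
    rows, cols = len(responses[0]), len(responses[0][0])
    return [
        [relu(sum(resp[r][c] for resp in responses) + bias) for c in range(cols)]
        for r in range(rows)
    ]


image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, 1]]
print(feature_map([image], [kernel]))  # [[6.0, 8.0], [12.0, 14.0]]
```

A larger stride shrinks the output: a 4 \(\times \) 4 input with a 2 \(\times \) 2 kernel and stride 2 gives a 2 \(\times \) 2 map.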

Fig. 1 Conceptual perspective of proposed model

3.2 Pooling layers (PLs)

After the initial layer, the next layers are PLs, which diminish the spatial size of the representation, the number of feature-map attributes, and the number of network parameters. This is done by applying a mathematical operation to the feature maps generated by the preceding convolution kernels. PLs are also responsible for extracting dominant features (Bailer, Habtegebrial and Stricker 2018). The current study uses max-pooling (MXP) and Global Average Pooling (GAP) to reduce dimensionality and complexity. The MXP algorithm selects the highest value within each pooling window of a feature map, which yields fewer output neurons. An MXP layer is generally placed between two convolution layers; it partitions its input into windows and outputs the highest value of each. In addition, the GAP layer averages each feature map to reduce the data to a single dimension. The flattening layer gathers the data into a single vector and forwards it to the fully connected layer.
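The two pooling operations described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and a 2 \(\times \) 2 window with stride 2 is an assumed default rather than a setting reported in this study.

```python
def max_pool(feature_map, size=2, stride=2):
    """Return the maximum of each size x size window of a 2-D feature map."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[r * stride + i][c * stride + j]
                for i in range(size)
                for j in range(size)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def global_average_pool(feature_maps):
    """Reduce every feature map to its mean, giving one value per map."""
    return [
        sum(sum(row) for row in fm) / (len(fm) * len(fm[0]))
        for fm in feature_maps
    ]


fm = [[1, 3, 2, 0], [4, 2, 1, 5], [0, 1, 7, 2], [3, 2, 1, 1]]
print(max_pool(fm))                            # [[4, 5], [3, 7]]
print(global_average_pool([[[1, 3], [5, 7]]]))  # [4.0]
```

MXP keeps the spatial layout at a lower resolution, whereas GAP discards it entirely and returns a vector with one entry per feature map.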

3.3 Fully connected and activation function layer (FCL)

FCL is the final layer of the presented CNN model. The Rectified Linear Unit (ReLU) activation function is used in the FCL, which is similar to a multilayer perceptron. Moreover, the softmax activation function is utilized to predict the output class of the image. Both functions are mathematically represented as:

$$\begin{aligned} ReLU(z)= {\left\{ \begin{array}{ll} 0, &{} \text {if} \quad z<0.\\ z, &{} \text {if} \quad z\ge 0. \end{array}\right. } \end{aligned}$$
(2)
$$\begin{aligned} Softmax(x_{i})= \frac{{e}^{{x}_{i}}}{\sum _{j=1}^{c}{e}^{x_{j}}} \end{aligned}$$
(3)

where, \(x_{i}\) represents the inputs and c indicates the number of classes.
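Equations (2) and (3) can be sketched in a few lines. The subtraction of the maximum inside `softmax` is a common numerical-stability step added here as an assumption; it does not change the result of Eq. (3).

```python
import math


def relu(z):
    """Eq. (2): 0 for negative inputs, the input itself otherwise."""
    return max(0.0, z)


def softmax(x):
    """Eq. (3): exponentiate each score and normalise over the c classes."""
    shift = max(x)  # stability shift; cancels in the ratio
    exps = [math.exp(v - shift) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5, -1.0]  # hypothetical scores for the 4 classes
probabilities = softmax(scores)
predicted_class = probabilities.index(max(probabilities))  # index 0 here
```

The outputs are non-negative and sum to one, so they can be read as class probabilities for the 4-class problem.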

Definition 1

Transfer learning (TL) TL is the process of creating a new model from a pre-trained model by reusing its computed network layer weights (Swati et al. 2019).

The TL technique is particularly useful in medical applications. The main advantage of adopting the TL approach is that it allows training with smaller datasets and requires less computational cost. The TL approach in conjunction with a deep CNN transfers the information learned by pre-defined models on a large dataset to the model to be trained (Chouhan et al. 2020). Therefore, TL is used to leverage knowledge, such as features and weights, learned in the source domain for training newer models in the destination domain.

3.4 Stepwise approach

Figure 2 depicts the step-by-step layered architecture of the proposed model. The main steps of the presented technique are as follows.

  1. Convolution is a gradual process, so it is applied repeatedly to the input image to extract its various features.

  2. Pooling is applied repeatedly to all the images to reduce dimensionality and make the features robust against noise.

  3. The image is classified according to the labeled training classes.

Fig. 2 Layered architecture of proposed work

Furthermore, a transfer learning technique is used with the presented CNN model to overcome insufficient data and training delay.

3.5 Model development

A convolutional network consists of a chain of convolutional layers, ReLU activation functions, pooling layers, and a fully connected layer with a softmax activation function. In a multilayer perceptron, every neuron of each layer is linked to all neurons in the following layer; a CNN is a regularized form of the multilayer perceptron. Convolutional filters enlarge the receptive field as successive layers convolve their input with kernels. This hierarchical process provides high-level feature maps and variation in subsequent input layers, and it enhances performance because the condensed network is easier to train and reduces computational complexity. The last layer of the proposed model is fully connected, and its size equals the number of output classes. The feature maps are used to represent a global state of the network. Pre-trained deep neural network models are initially trained on the labeled ImageNet dataset. Therefore, by considering such characteristics, a fine-tuned DL (FTDL) model of SQueezeNet along with CNN is proposed to accomplish the classification process.

3.6 Decision visualization

The proposed technique takes into account the visualization of internal representations, feature extraction in convolutional layers, and computational performance as major aspects. The main aim of the presented work is to provide a visualization of each convolutional kernel, to find multiple features without redundancy, and to learn to extract features from the visual input when trained with pre-defined CNN models. For such purposes, feature extraction and classification techniques such as spectral clustering can be used for accurate classification. Li et al. (2018) proposed rank-constrained spectral clustering, which is an explicit clustering strategy. The approach utilizes an adaptive neighborhood learning process to find the diagonal affinity matrix of an ideal graph. In the process, the similarity matrix is learned simultaneously. Based on a rank restriction on the matrix, the clusters are guaranteed to converge. The limitation of explicit clustering is its complexity and computational time, as it uses a predefined representation. Another approach in computer vision is zero-shot event detection via event-adaptive concepts. It uses the correlation between an event and pre-trained concepts from external sources (Li et al. 2019). Such an event-based concept is more useful for video-based event data. In the present work, the proposed algorithm uses dynamic affinity graph construction for spectral clustering, with multiple features incorporated into the CNN for CorV classification (Li et al. 2018). To make CorV detection more transparent, the current study employs Graph-based spectral clustering (GBSC) with multiple features to detect the regions where the proposed model achieves enhanced classification efficacy.

3.6.1 Multifeature spectral clustering (SC)

The data are transformed into a set of points in Euclidean space by mapping them into a high-dimensional feature space, where each coordinate corresponds to one of the data items. SC usually constructs an affinity matrix to measure the relationship among data points in this space. Such a mapping is called the kernel approach, and it operates on the CNN layers of the proposed model. The main purpose of SC is to transform the representations of the data points into the indicator space, in which the cluster characteristics are more prominent. SC is mainly used in computer vision applications to construct similarity matrices for data partitioning. Consider a data set \({\left\{ x\right\} }_{i}^{n}\in {R}^{d_{m}*n} \), where \(d_{m}\) represents the dimensionality of the \(m^{th}\) feature. All data points are grouped into u clusters \(U_{i}\), i=1,2,...,u, such that data points from the same cluster are kept close to one another. Let \({\left\{ y\right\} }_{i}^{n} \in {R}^{d_{m}*u}\) represent the cluster indicator matrix, where \(y_{i}\) is the cluster indicator corresponding to \(x_{i}\). The scaled cluster matrix is represented as:

$$\begin{aligned} G = Y(Y^{T}Y)^{-1/2} \end{aligned}$$
(4)

The neighbors of \(x_{i}\) will be the k-closest samples in a given data set. The weighted affinity matrix can be represented by using the Euclidean distance as:

$$\begin{aligned} p_{{ \text{ ij }}} \!=\!{\left\{ \begin{array}{ll} \!\exp \!\left( {-\frac{\Vert x_{i} - x_{j}\Vert ^{2}}{\delta ^{2}}\!}\right) \!, &{}\!\!\text{ if }~ x_{i}~\text{, } ~x_{j} ~\text{ are } ~k ~\text{ nearest } \text{ neighbors }\\ \!0, &{}\!\!\text{ otherwise } \end{array}\right. } \end{aligned}$$
(5)

\(\delta ^{2}\) is used as a scaling parameter that defines the size of the neighborhood. The graph Laplacian L is calculated as L = D-P, where D is the diagonal degree matrix with \(D_{ii}={\Sigma }_{j}{P}_{ij}.\) The normalized Laplacian matrix is written as:

$$\begin{aligned} L_{P } = I-D^{-1/2}PD^{-1/2} \end{aligned}$$
(6)
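The construction of the k-nearest-neighbour affinity matrix of Eq. (5) and of the normalized Laplacian in its standard form \(I-D^{-1/2}PD^{-1/2}\) can be sketched as below. The function names are hypothetical, and two choices are assumptions rather than details stated in the text: the affinity is made symmetric (an edge is kept if either point is among the other's k nearest neighbours), and every point is assumed to have a non-zero degree.

```python
import math


def knn_affinity(points, k, delta):
    """Gaussian affinity between each point and its k nearest neighbours."""
    n = len(points)
    d2 = [
        [sum((a - b) ** 2 for a, b in zip(points[i], points[j])) for j in range(n)]
        for i in range(n)
    ]
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nearest = sorted((j for j in range(n) if j != i), key=lambda j: d2[i][j])[:k]
        for j in nearest:
            weight = math.exp(-d2[i][j] / delta ** 2)
            P[i][j] = weight
            P[j][i] = weight  # assumption: symmetrise the graph
    return P


def normalized_laplacian(P):
    """Return I - D^{-1/2} P D^{-1/2}, with D_ii the row sums of P."""
    n = len(P)
    degree = [sum(row) for row in P]
    return [
        [
            (1.0 if i == j else 0.0) - P[i][j] / math.sqrt(degree[i] * degree[j])
            for j in range(n)
        ]
        for i in range(n)
    ]


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
P = knn_affinity(points, k=1, delta=1.0)
L = normalized_laplacian(P)
```

For these four points the graph splits into two disconnected pairs, so the Laplacian is block diagonal, which is the structure spectral clustering exploits.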

The learned matrix is denoted by \(W^{m}\in {R}^{d_{m}*r_{m}}\), where \(r_{m}\) denotes the reduced dimensionality of the \(m^{th}\) feature at the pooling layer. The low-dimensional subspace data with constraint \( (W^{m})^{T} Z_{t}^{m} W^{m} = I\) can be defined as:

$$\begin{aligned} Z_{t}^{m}= (X^{m})^{T}EX^{m} \end{aligned}$$
(7)

\(Z_{t}^{m}\) is the total scatter matrix.

\(X^{m}\) defines the \(m^{th}\) feature of X.

\((X^{m})^{T} \) represents the transpose of \(X^{m}\).

$$\begin{aligned} E = I-\left( 1/n\right) RR^{T} \end{aligned}$$
(8)

where R is a vector whose elements are all 1, so that E is the centering matrix.

All the data points \(x_{i}\) are connected to their neighbours with a probability \(P_{ij}\). The distance between \(x_{i}\) and \(x_{j}\) in the low-dimensional subspace is then computed as:

$$\begin{aligned} \sum _{m} \alpha _{m} \big \Vert ({\varvec{W}}^{m})^{T} x_{i}^{m} - ({\varvec{W}}^{m})^{T} x_{j}^{m} \big \Vert _{2}^{2} \end{aligned}$$
(9)

\(\alpha _{m}\) denotes the weight of \(m^{th}\) feature.
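A minimal sketch of the weighted multi-feature distance of Eq. (9) follows. The helper names are hypothetical; each \(W^{m}\) is stored as a list of \(d_{m}\) rows with \(r_{m}\) columns, and the projection matrices are assumed to be given rather than learned here.

```python
def project(W, x):
    """Compute W^T x for W stored as d rows of r columns."""
    return [sum(W[a][b] * x[a] for a in range(len(x))) for b in range(len(W[0]))]


def multi_feature_distance(Ws, xi, xj, alphas):
    """Sum over features m of alpha_m * ||W_m^T x_i^m - W_m^T x_j^m||^2."""
    total = 0.0
    for W, a, b, alpha in zip(Ws, xi, xj, alphas):
        pa, pb = project(W, a), project(W, b)
        total += alpha * sum((u - v) ** 2 for u, v in zip(pa, pb))
    return total


# Two hypothetical feature types: an identity projection and a 2 -> 1 projection.
Ws = [[[1, 0], [0, 1]], [[1], [1]]]
xi = [[1, 2], [3, 4]]
xj = [[0, 0], [1, 1]]
print(multi_feature_distance(Ws, xi, xj, alphas=[0.5, 2.0]))  # 52.5
```

In the example the first feature contributes 0.5 * 5 and the second 2.0 * 25, which shows how \(\alpha _{m}\) controls the influence of each feature type.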

As a result, the optimal data similarity matrix is obtained by local learning to generate the optimal affinity graph.

Fig. 3 Architecture of proposed model

3.6.2 SQueezeNet (SQNet) model

SQNet is a convolutional network that achieves AlexNet-level accuracy while using 50x fewer parameters (Iandola et al. 2016). SQNet has eighteen layers, including convolution layers, MXP layers, a GAP layer, fire layers, and a softmax output layer. Figure 3 represents the architecture of the proposed model. K \(\times \) K, s, and l specify the filter’s receptive field size, the stride, and the length of the feature map, respectively. The input is a 227 \(\times \) 227 image with RGB channels. Convolution and max pooling are used to generalize the input images. In the first layer of the CNN, small regions of the input are convolved with the weights of 3 \(\times \) 3 kernels. Each convolution layer applies an element-wise activation that keeps the positive part of its argument. Between the convolution layers, SQNet makes use of fire layers, which are made up of squeeze and expansion stages. Filters of size 1 \(\times \) 1 are used in the squeeze phase, whereas filters of size 1 \(\times \) 1 and 3 \(\times \) 3 are used in the expansion phase. The input tensor H \(\times \) W \(\times \) C is squeezed so that the number of squeeze-convolution channels equals one quarter of the number of input channels, i.e. C/4.

After the initial step, the data goes through the expansions, and its depth is increased to the output tensor. Both the squeezing and expanding stages involve the ReLU. The squeezing process reduces the depth, whereas the expanded stage increases it while maintaining the same feature size. Finally, using the merging operation, the expanded outputs are stacked in the dimension of the input tensor. Figure 4 depicts all operations of the SQNet architecture.
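The channel bookkeeping of a fire layer can be illustrated with a short calculation. This is a sketch under stated assumptions, not the configuration of the proposed model: the squeeze stage uses C/4 channels as described above, each expansion branch is assumed to use C/2 filters so that the merged output has the depth of the input, the 3 \(\times \) 3 branch is assumed to be padded so that H and W are preserved, and the function names are hypothetical.

```python
def fire_module(h, w, c, squeeze_ratio=4):
    """Return the output shape and per-stage parameter counts of a fire layer."""
    squeeze_c = c // squeeze_ratio  # squeeze stage: C/4 channels of 1x1 filters
    expand_c = c // 2               # ASSUMPTION: C/2 filters per expansion branch
    params = {
        "squeeze_1x1": (1 * 1 * c + 1) * squeeze_c,  # weights + biases
        "expand_1x1": (1 * 1 * squeeze_c + 1) * expand_c,
        "expand_3x3": (3 * 3 * squeeze_c + 1) * expand_c,
    }
    output_shape = (h, w, 2 * expand_c)  # the two branches are concatenated
    return output_shape, params


def plain_conv3x3_params(c_in, c_out):
    """Parameter count of an ordinary 3x3 convolution, for comparison."""
    return (3 * 3 * c_in + 1) * c_out


shape, params = fire_module(56, 56, 128)
print(shape)                           # (56, 56, 128)
print(sum(params.values()))            # 24736
print(plain_conv3x3_params(128, 128))  # 147584
```

Under these assumptions the fire layer needs roughly one sixth of the parameters of a plain 3 \(\times \) 3 convolution with the same input and output depth, which is the source of the compactness of SQNet.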

Fig. 4 Summarized structure of fire layer

Finally, the output f(y) of the squeezing process, where W denotes the kernel, FM the number of feature maps, and C the number of channels, can be mathematically represented as:

$$\begin{aligned} f(y) = \sum _{fm=1}^{FM}\sum _{c=1}^{C} W_{c}^{f} * x_{c}^{fm} \end{aligned}$$
(10)

where f(y) is an output \(\in R^{N}. \) Let \(X_{i}\) be the input of layer i with a size of \((W_{i},H_{i},C_{i})\in R^{N}\), where \(W_{i}\) denotes the width, \(H_{i}\) the height, and \(C_{i}\) the number of channels. The MXP layers down-sample the network along the spatial dimensions, and the GAP layer converts each class feature map into a single value. In the last layer, the Softmax activation function produces multi-class probability distributions. SQNet serves as the base model, together with dropout and FC layers. Table 2 presents the detailed layers and output shapes of the model. Multiclass classifiers, which are also known as output layers in neural networks, are often used to classify images into a collection of categories. The classifier requires individual features in order to conduct its calculations. As a result, the output of the feature extractor is transformed into a 1-dimensional feature vector for the classifier. Converting the result of the convolution operation into one long feature vector for the final classification step of SQNet is known as flattening. A flatten layer, a dropout layer with rate 0.5, convolution layers, a ReLU, and a softmax activation function perform the classification tasks in the classification layer.

Table 2 Detailed configuration of the model

The SQNet architecture was adopted for CorV diagnosis for the following reasons:

  1. As it employs fewer parameters, the network is more efficient;

  2. Applications based on such a model are easier to deploy and need less communication;

  3. In terms of memory, it is easier to integrate into embedded systems, as it requires less than 5 MB; and

  4. To initialize the parameters, the current study uses TL, which addresses the problem of overfitting.

4 Experimental simulation

DL approaches deployed in recent years continue to perform admirably in the domain of medical image processing, as well as in many other fields. Applying DL techniques to medical data is an endeavor to extract relevant information from it. The study proposes automated identification of CorV by utilizing deep CNN-based transfer models on chest X-ray images. Transfer learning is a promising technique when data are inadequate. For this purpose, Inception-ResNetV2, GoogleNet, VGG2, ResNet152, AlexNet, and DenseNet512 are used as pre-trained models to extract learned features. In the current study, to achieve better predictive outcomes, three different datasets comprising X-ray images of non-infected, infected, bacterial pneumonia, and viral pneumonia patients are used (Yadav and Jadhav 2019). Furthermore, CNNs have proven to be efficient in transfer learning when trained on large-scale datasets such as ImageNet. The collection contains chest X-ray images, mostly from patients suffering from SARS, CorV, Middle East respiratory syndrome (MERS), and pneumonia, collected from the GitHub repository shared by Cohen et al. (2020). In addition, 165 images are selected from the “COVID-chest-ray-dataset” dataset. The experiments have been performed on three datasets (Cov-Dataset1, Cov-Dataset2, Cov-Dataset3) of X-ray images. The distribution of images per dataset is given in Table 3. The collected datasets consist of a total of 8731 images with 7662 patients of disease-infected cases. The data set includes 2860 Normal, 2228 Bacterial pneumonia, 3517 Viral, and 126 CorV disease cases.

Table 3 Number of images per CorV-Dataset

In the training dataset, the data augmentation approach is applied with the scaling (1/255), shear range (1/5), zoom (1/5), and horizontal flipping. All images are resized to 227 \(\times \) 227 pixels in CorV-datasets.
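Two of the augmentation steps listed above, rescaling by 1/255 and horizontal flipping, can be sketched on an image stored as a list of pixel rows. This is an illustrative sketch with hypothetical function names; shear and zoom need interpolation and are omitted, and the flip is passed in explicitly instead of being drawn at random.

```python
def rescale(image, scale=255.0):
    """Map 8-bit pixel values to the range [0, 1] (the 1/255 scaling)."""
    return [[pixel / scale for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right by reversing every row."""
    return [row[::-1] for row in image]


def augment(image, flip):
    """Rescale the image and optionally flip it horizontally."""
    scaled = rescale(image)
    return horizontal_flip(scaled) if flip else scaled


image = [[0, 255], [51, 102]]
print(augment(image, flip=True))  # [[1.0, 0.0], [0.4, 0.2]]
```

In practice the flip decision would be sampled per image during training so that each epoch sees a slightly different version of the training set.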

4.1 Experimental setup

The current section describes the experimental setup and performance assessment of the SQNet model. The suggested method’s influence is explored in terms of accuracy, evaluation metrics, and computing efficiency. Training and testing are performed in a Matlab environment running on a workstation with dual 3.3 GHz Intel Xeon E5 CPUs, a Quadro M4000 GPU with 8 GB GDDR6 memory, and 512 GB of RAM. Furthermore, the evaluation metrics of the proposed network are compared with state-of-the-art techniques.

4.1.1 Performance evaluation metrics

The current subsection describes the quantitative performance measures used for the proposed approach: accuracy (ACR), truthfulness (TR, i.e., precision), faultlessness (FLT, i.e., recall), specificity (SPFC), F-measure, and the Matthews correlation coefficient (MCR), a statistical measure computed from the confusion matrix. These metrics are described below:

  1. ACR measures the overall share of images that the suggested model classifies correctly.

  2. TR specifies the rate of true classification among the images predicted as positive.

  3. FLT specifies the share of positive (diseased) images that are correctly detected.

  4. F-measure is the harmonic mean of TR and FLT and so combines the two.

  5. MCR specifies the overall quality of the classification.

The evaluation metrics derived from the confusion matrix can be expressed as:

$$\begin{aligned} ACR = \frac{{N}_{TRP}+{N}_{TRN}}{{N}_{TRP}+{N}_{TRN}+{N}_{FLP}+{N}_{FLN}} \end{aligned}$$
$$\begin{aligned} TR = \frac{{N}_{TRP}}{{N}_{TRP}+{N}_{FLP}} \end{aligned}$$
$$\begin{aligned} FLT = \frac{{N}_{TRP}}{{N}_{TRP}+{N}_{FLN}} \end{aligned}$$
$$\begin{aligned} SPFC = \frac{{N}_{TRN}}{{N}_{TRN}+{N}_{FLP}} \end{aligned}$$
$$\begin{aligned} MCR = \frac{({N}_{TRP}*{N}_{TRN})-({N}_{FLP} * {N}_{FLN})}{\sqrt{({N}_{TRP}+{N}_{FLP})*({N}_{TRP}+{N}_{FLN})*({N}_{TRN}+{N}_{FLP})*({N}_{TRN}+{N}_{FLN})}} \end{aligned}$$
$$\begin{aligned} F\text{-}measure = 2* \frac{TR*FLT}{FLT+TR} \end{aligned}$$
(11)

Here \({N}_{TRP},{N}_{TRN}, {N}_{FLP}, {N}_{FLN}\) denote, respectively, the number of images correctly classified as having the disease (true positives), the number correctly classified as having another class (true negatives), the number incorrectly classified as having the disease (false positives), and the number incorrectly classified as having another class (false negatives). These evaluation metrics are used to assess the performance and effectiveness of the proposed model in determining the disease.
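As a worked illustration of these definitions, the following Python sketch computes all six metrics from the four counts, with the accuracy denominator taken as the sum of all four counts. The function name and the example counts are hypothetical and are not results from this study. The sketch assumes every denominator is non-zero.

```python
import math

def confusion_metrics(n_trp, n_trn, n_flp, n_fln):
    """Return ACR, TR, FLT, SPFC, F-measure and MCR from the four counts.

    Assumes every denominator is non-zero, i.e. no class or prediction
    group is empty.
    """
    total = n_trp + n_trn + n_flp + n_fln
    acr = (n_trp + n_trn) / total          # accuracy
    tr = n_trp / (n_trp + n_flp)           # truthfulness (precision)
    flt = n_trp / (n_trp + n_fln)          # faultlessness (recall)
    spfc = n_trn / (n_trn + n_flp)         # specificity
    f_measure = 2 * tr * flt / (tr + flt)  # harmonic mean of TR and FLT
    mcr_den = math.sqrt((n_trp + n_flp) * (n_trp + n_fln)
                        * (n_trn + n_flp) * (n_trn + n_fln))
    mcr = (n_trp * n_trn - n_flp * n_fln) / mcr_den
    return {"ACR": acr, "TR": tr, "FLT": flt,
            "SPFC": spfc, "F-measure": f_measure, "MCR": mcr}

# Hypothetical counts chosen only for illustration.
metrics = confusion_metrics(n_trp=40, n_trn=45, n_flp=5, n_fln=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

MCR uses all four counts at once, so it remains informative when the classes are unbalanced, whereas ACR alone can look high simply because one class dominates.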

4.1.2 Training and testing implementation

First, data augmentation is applied to the raw dataset for both fine-tuning and end-to-end training. Five-fold cross-validation is used to obtain a robust validation result. The augmented dataset is then split into three parts: 80% for training, 10% for validation, and 10% for testing. The DL SQNet network is trained using the training and validation sets. The adaptive moment estimation (ADAM) optimizer and the cross-entropy loss function are used for training, with a starting learning rate of 0.001 that is decreased by 1 after two epochs. Training accuracy is the share of training images that are labeled correctly. The cross-entropy loss measures how well the predicted class probabilities match the true labels. Validation accuracy is the corresponding share of correctly labeled images in the validation set. The SQNet has been trained for 100 iterations at a learning rate of 0.001 and subsequently trained at a lower learning rate of 0.0001. Figure 8 shows the training and testing accuracies and Fig. 9 depicts the loss values for both training and testing. Through validation, the objective function error is minimized, and the resulting optimal network model is used in the testing phase. The derived best model is evaluated using a separate test dataset. All input images are scaled to a resolution of 227 \(\times \) 227 pixels.
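The data partitioning described above can be sketched in Python as follows. The text does not state how the five-fold cross-validation is combined with the 80/10/10 split, so the two are shown independently. The function names, the fixed seed, and the use of integer image identifiers are assumptions made for illustration.

```python
import random

def split_80_10_10(items, seed=0):
    """Shuffle the items and split them into training, validation and test parts."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = n * 8 // 10   # 80% for training
    n_val = n // 10         # 10% for validation, remainder for testing
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

def k_fold_indices(n, k=5):
    """Yield (training indices, held-out indices) for each of k contiguous folds."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        held_out = list(range(start, start + size))
        training = list(range(start)) + list(range(start + size, n))
        yield training, held_out
        start += size

# 8731 is the total image count reported for the collected datasets.
train, val, test = split_80_10_10(range(8731))
print(len(train), len(val), len(test))

for fold, (training, held_out) in enumerate(k_fold_indices(len(train))):
    print(fold, len(training), len(held_out))
```

Shuffling before the split keeps the class mix of the three parts roughly similar. A stratified split would guarantee this per class, but it is not described in the text.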

To counter the adverse effect of overfitting, all of the dataset packages are reorganized. As a result, reliable and effective decision-making performance is achieved when classifying instances of infected people. The mini-batch size is set to 36 in the training phase, and all images are normalized by mean subtraction. The average computational time of the proposed SQNet model is 0.347 s on the CPU and 0.152 s on the GPU. Cumulatively, there are 1502 healthy cases, 1463 bacterial (non-CorV) cases, 2513 viral cases, and 84 CorV patient cases. For pneumonia, both viral and bacterial samples are considered. Table 4 shows the raw and augmented datasets.

Table 4 Raw and augmented datasets class distribution

The proposed deep SQNet model incorporates optimization in the training stage as well as a validation step. Figure 5 shows the objective function of the optimization process. Because of model saturation, the function evaluations terminate after 30 iterations. The minimum observed objective, which yields the best model, is reached at the end of the 9th iteration. The augmentation approach enhances the performance of the proposed model by approximately 20 times.

Fig. 5 Minimum objective vs function evaluation

The best model parameters obtained during the training procedure are used in the proposed model. The pre-training and testing operations use the relevant data from the training and test packages. The current study presents results on these datasets to evaluate the performance of the augmentation approach.

4.1.3 Results

The confusion matrix for the test procedure of the proposed model on the four classes of Cov-dataset2 is depicted in Fig. 6. In the confusion matrix, the columns and rows index the classes, so that each cell counts the images of one actual class that were assigned to one predicted class. The average ACR, TR, FLT, SPFC, F-measure, and MCR are given in Table 5. In this case, the SQNet model achieves an accuracy of 97.8%, a specificity of 91.0%, and an F-measure of 92%. Figure 10a–d gives a pictorial representation of these results for the proposed model.
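To make the link between a multi-class confusion matrix and the per-class counts concrete, the sketch below derives one-vs-rest true/false positives and negatives for each class. The orientation (rows are predicted classes, columns are actual classes) is an assumption, the function names are hypothetical, and the 3 x 3 matrix is invented for illustration rather than taken from Fig. 6.

```python
def one_vs_rest_counts(cm, k):
    """Return (TP, TN, FP, FN) for class k of a square confusion matrix.

    Assumed orientation: cm[i][j] is the number of images whose actual
    class is j and whose predicted class is i.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fp = sum(cm[k]) - tp                  # predicted k, actually another class
    fn = sum(row[k] for row in cm) - tp   # actually k, predicted another class
    tn = total - tp - fp - fn
    return tp, tn, fp, fn

def overall_accuracy(cm):
    """Fraction of images on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total

# Invented 3-class matrix, for illustration only.
example = [[50, 2, 1],
           [3, 40, 4],
           [2, 3, 45]]

for k in range(len(example)):
    tp, tn, fp, fn = one_vs_rest_counts(example, k)
    print(k, tp, tn, fp, fn, round(tp / (tp + fn), 3))
print(round(overall_accuracy(example), 3))
```

Averaging a per-class quantity such as `tp / (tp + fn)` over the classes gives a macro-averaged value. Whether the averages reported in Table 5 were computed this way is not stated in the text.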

Fig. 6 Confusion matrix for 4-class classification

Table 5 Average accuracy, precision, recall, F-measure, specificity (in %)

The performance for the pneumonia classes (bacterial and viral) is lower than for the other classes. When the two are combined into a single class, by slightly modifying the 4-class SQNet model and fine-tuning it, the average accuracy increases. After the fine-tuning process, the model was tested on 45 CorV, 76 Normal, and 128 pneumonia cases. The confusion matrix for the three-class case is shown in Fig. 7. After combining the pneumonia classes, the average accuracy of the SQNet model increases from 97.8% to 98.4%, as shown in Table 6.
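One reason merging classes can raise accuracy is that confusions between bacterial and viral pneumonia no longer count as errors. The sketch below shows only this bookkeeping effect on a confusion matrix. It does not reproduce the fine-tuning of the model described above, and the function name and the 4 x 4 matrix are hypothetical.

```python
def merge_classes(cm, labels, group, merged_label):
    """Collapse the classes in `group` into one class by summing rows and columns."""
    new_labels = [lab for lab in labels if lab not in group] + [merged_label]
    index = {lab: i for i, lab in enumerate(new_labels)}

    def target(label):
        # Classes in the group map to the merged class; the others stay as they are.
        return index[merged_label if label in group else label]

    merged = [[0] * len(new_labels) for _ in new_labels]
    for i, row_label in enumerate(labels):
        for j, col_label in enumerate(labels):
            merged[target(row_label)][target(col_label)] += cm[i][j]
    return new_labels, merged

def accuracy(cm):
    """Fraction of entries on the diagonal."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)

# Invented 4-class matrix, for illustration only.
labels4 = ["CorV", "Normal", "Bacterial", "Viral"]
cm4 = [[9, 0, 1, 0],
       [0, 8, 0, 1],
       [1, 1, 6, 3],
       [0, 1, 3, 6]]

labels3, cm3 = merge_classes(cm4, labels4, {"Bacterial", "Viral"}, "Pneumonia")
print(labels3)
print(cm3)
print(accuracy(cm4), accuracy(cm3))
```

The merged matrix has the same total number of images, and its diagonal gains exactly the cells that previously recorded bacterial and viral samples being mistaken for each other.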

Table 6 Average accuracy, precision, recall, F-measure, specificity (in %) (three-class classification)

Fig. 7 Confusion matrix after fine-tuning into three classes

Figure 8 plots the accuracy of the proposed model.

Fig. 8 Accuracy of the proposed model

Table 6 shows that SQNet has a precision of 92.7%. The F-measure and MCR values also show that a stable classification has been achieved. The high recall value indicates a low number of false negatives (FLN), which is an encouraging outcome.

Fig. 9 Loss value for training and testing

Fig. 10 a Accuracy, b Specificity, c F-measure, d Matthews correlation coefficient (MCR)

4.2 Comparative analysis

On February 11, 2020, the WHO declared the CorV illness to be an epidemic. Since CorV was declared a pandemic, its identification has become a global challenge. The use of DL techniques in image classification can aid in the early detection of illness. As a deep neural network, the CNN performs better than traditional diagnosis methods for efficient classification. Deep SQNet is presented in the current study as a fast, reliable, and efficient CorV diagnostic technique. The suggested approach classifies chest X-ray images into healthy, bacterial pneumonia, viral pneumonia, and CorV categories. Li et al. (2020) present a COVID-Xpert architecture based on DenseNet that uses medical images to identify CorV patients. Their experimental results show an overall accuracy of 89.8% via transfer learning. Wang et al. (2020) present the COVID-Net architecture for CorV diagnosis, whose primary model is based on a customized CNN. The model architecture is improved via machine-driven design. Their experimental results give overall accuracy, COM, and COR scores of 0.923, 0.887, and 0.913, respectively. The COVIDx dataset used was also collected and shared by the authors. Farooq and Hafeez (2020) proposed a ResNet-based system for the identification of CorV, with a reported accuracy of 0.962. Table 7 shows that the proposed SQNet model achieves a higher accuracy, 98.4%, than the other CNN-based models.

Table 7 Comparative analysis

4.2.1 Discussion

Since the publication of the dataset generated by Cohen, researchers have been studying chest X-ray images for precise prediction of CorV infection (Cohen et al. 2020). Numerous studies have attempted to create an accurate diagnostic model using DL methods, and the TL method has been widely used in CNN-based networks. However, most researchers have performed experiments with a restricted amount of data, and in certain situations the datasets are also unbalanced. For this reason, pre-defined CNN models such as AlexNet, VGG2, ResNet52, GoogleNet, and InceptionV3 are evaluated for the prediction of CorV infections. The datasets are collected from two sources: Cohen and the “COVIDchest Xray-dataset” (Cohen et al. 2020). The experiments are evaluated on a large dataset, taking several factors into account, to enhance the performance of the proposed model for automated CorV diagnosis. Furthermore, an advantage of the fine-tuned SQNet model is that it uses fewer parameters. In addition, transfer learning is used to address the overfitting problem. The experimental results show that the SQNet model achieved 98.4% accuracy, higher than the other models compared. The present approach is cost-effective and can assist in making timely decisions. The main goal of the present study is to support effective treatment decisions for quarantined patients, which will help to reduce the spread of CorV infection. In the future, the current research intends to validate the proposed model by incorporating more images.

5 Conclusion

In the present work, a DL-based model is proposed for effectively classifying CorV infection cases apart from healthy, viral pneumonia, and bacterial pneumonia cases using chest X-ray images. The model is developed to provide accurate diagnostics for both binary and multi-class classification. The proposed model produced 97.8% and 98.4% accuracy for four-class and three-class classification, respectively. A limitation of the current research is that patients in a severe condition may be unable to undergo X-ray scanning. In the future, the presented model will be deployed in the cloud to provide instant diagnosis and to help affected patients immediately. Future work also aims to make the model more robust and accurate. The current research contributes to the development of cost-effective approaches for combating the disease. Such approaches might be explored in further research to demonstrate real-world implementation.