1 Introduction

Coronavirus disease 2019 (COVID-19) is a viral infection caused by the SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2) virus, which was first identified in Wuhan, China, in December 2019 and quickly spread around the world (Singhal 2020; He et al. 2020). According to World Health Organization (WHO) statistics, more than 500 million people have been infected worldwide, with roughly 6 million confirmed deaths (WHO 2022). COVID-19 is classified as a respiratory disease because its symptoms include high fever, myalgia, sore throat with dry cough, headache, and chest pain. It spreads through minute liquid particles expelled from an infected person's mouth or nose when they cough, sneeze, speak, sing, or breathe. The majority of those infected have mild to moderate respiratory symptoms and recover without medical attention, but some become seriously ill and need medical help. Vaccination protects against the virus, although vaccinated people remain susceptible to infection. Furthermore, because of limited vaccine supply and the geographical distribution of the population, vaccinating the whole global population is a time-consuming process. Despite the rapid development of vaccines, the disease has spread to over 200 nations and locations.

The most widely used approach for diagnosing COVID-19 is reverse transcription-polymerase chain reaction (RT-PCR) (Corman et al. 2020). Infected people may transmit the virus to close contacts if insufficient resources are deployed to isolate positive patients from other suspected cases while waiting for RT-PCR confirmation of SARS-CoV-2. In clinical practice, chest X-ray (CXR) radiography and computed tomography (CT) (Kassania et al. 2021), examined routinely by radiologists, are used to detect COVID-19, describe its severity, and track its prognosis or response to treatment. Although CT has higher detection sensitivity, CXR is more widely used in clinical practice because of its low cost, low radiation dose, ease of use, and widespread availability in general and community hospitals. Figure 1 shows sample CXR images of COVID and non-COVID cases.

Fig. 1

CXR images depicting (a) a COVID-19 case and (b) a normal case

Diagnosing this disease manually takes time, is prone to human error, and requires the assistance of skilled radiologists. An expert radiologist is essential because anomalies seen early in COVID-19 may resemble those seen in other pulmonary syndromes such as SARS or viral pneumonia (VP), as shown in Fig. 2. Because of the disease's recent origin and its similarity to other respiratory disorders such as pneumonia, interpreting the images effectively presents a number of difficulties. CXR images are used to assess a variety of pulmonary disorders; therefore, any automated system created to detect COVID-19 should also consider other respiratory illnesses in order to provide a more comprehensive and robust diagnostic system. COVID-19 can infect the lungs and cause pneumonia, a potentially fatal lung illness that is particularly dangerous for the elderly and for those with existing respiratory illnesses. Figure 2 shows various types of pneumonia.

Fig. 2

Various types of pneumonia predicted from CXR images

Machine learning (ML) is becoming more popular in medical imaging applications such as computer-aided diagnosis (CAD) (Nemoto et al. 2016), radiomics (Leger et al. 2017), and medical image analysis. One of the key advantages of ML in medical imaging is its ability to automatically extract relevant features and patterns from large datasets, which can help improve diagnostic accuracy and reduce the time and cost of analysis. For example, in CAD systems, ML algorithms can be trained on large datasets of medical images and associated clinical data to identify patterns and predict the presence of disease or abnormalities. Similarly, in radiomics, ML techniques can be used to extract quantitative features from medical images, such as texture, shape, and intensity, and to use these features to develop predictive models for disease diagnosis, prognosis, and treatment response. Finally, ML algorithms are also being applied to medical image analysis tasks such as segmentation, registration, and classification, which can help clinicians identify and quantify the location and extent of disease within medical images more accurately. Overall, the use of ML in medical imaging holds great promise for improving diagnostic accuracy, reducing healthcare costs, and advancing our understanding of disease.

Deep learning (DL) is a subset of machine learning that uses artificial neural networks with multiple layers to learn complex representations of data. DL has gained prominence in various industries, including medical diagnosis, where deep learning methodologies are used to improve image processing substantially (Litjens et al. 2017; Altaf et al. 2019). Practical applications of DL in medical imaging and diagnosis include image registration with localization, recognition of skeletal and cellular structures, and computer-aided disease prognosis (Shen et al. 2017; Chen et al. 2022). Within DL, the convolutional neural network (CNN) is commonly used for medical imaging (Shin et al. 2016; Anwar et al. 2018), and it comes in a variety of architectures. CNNs have been successfully applied to a variety of problems, including skin cancer detection (Estava et al. 2017), arrhythmia classification (Yoon et al. 2020), brain disease prediction (Wang et al. 2019), breast cancer detection (Talo et al. 2019), fundus image segmentation (Tan et al. 2017), pneumonia detection from X-ray images (Rajpurkar et al. 2017), lung segmentation (Gaál et al. 2020), and white blood cell classification (Gothai et al. 2022). CNNs are also used to differentiate COVID images from others: the successive layers of a CNN extract COVID-19-related information from chest X-ray images, which can be used to separate COVID images from normal ones. Because of the automatic feature learning capabilities of CNNs, COVID-19 classification based on deep neural networks is becoming more popular (Babu et al. 2022; Ganesh Babu et al. 2022; Kavinkumar and Meeradevi 2021).

The detection of COVID-19 from chest X-rays is crucial in controlling the spread of the pandemic, but accurately interpreting these images can be challenging due to the subtle and complex patterns associated with the virus. To address this issue, researchers have turned to optimization algorithms, which are mathematical methods used to find the best solution to a given problem. Optimization algorithms can be used to extract features from the images that are indicative of the virus, such as ground-glass opacities or consolidation. By optimizing these features, the algorithm's ability to identify COVID-19 in chest X-rays can be improved, leading to earlier diagnosis and treatment. The use of optimization algorithms in the detection of COVID-19 from chest X-rays has become increasingly important in the context of the pandemic, as chest X-rays are a common diagnostic tool for COVID-19. Overall, the use of optimization algorithms represents an important advancement in the fight against the pandemic, as it can improve the accuracy and efficiency of COVID-19 detection, ultimately saving lives (Devi and Maheswaran 2018; Rakkiannan et al. 2023).

This review tabulates the available DL technologies, highlights the obstacles, and suggests necessary future investigations. It examines preprints and published studies on COVID-19 diagnosis using CXR images that have become available since 2020 (Batcha et al. 2023; Bennet et al. 2023; Kathamuthu et al. 2023). The papers were retrieved from research databases including Science Direct, Springer, Springer Nature, MDPI, Hindawi, and IEEE. We screened the abstracts to determine which studies used traditional machine learning techniques and which investigated deep learning for chest X-ray images. The study's primary contributions are as follows:

  • It provides an overview of the several common DL-based methodologies that have been used in related research;

  • It describes the commonly used COVID datasets that are available publicly;

  • It provides an overview of preprocessing techniques and image augmentation utilized in DL approaches;

  • It gives a high-level overview of several optimization techniques for fine-tuning various hyperparameters;

  • It details the performance of the different DL models.

This review is structured as follows. Section 2 gives a comprehensive examination of DL strategies for COVID input image analysis, comprising preprocessing approaches, methodologies, and CXR dataset repositories. Section 3 reviews the evaluation approach and compares the performance of several deep learning models. Section 4 discusses the findings of this investigation. Finally, Section 5 brings the review to a close.

2 Literature survey

We examined 44 studies that used DL strategies to analyze CXR images showing signs of SARS-CoV-2 infection. Most of the studies used CNN architectures pre-trained on the ImageNet dataset to implement transfer learning; some, however, departed from this and used custom designs. The subsections below give a general description of the CNN approach for COVID-19 detection and examine the datasets used in the reviewed studies.

2.1 CNN-based model for COVID-19 prediction

The reviewed systems preprocess a given chest X-ray image and then classify it using pre-trained CNN architectures. A thorough understanding of the problem, the data obtained, and the production environment is required to determine the best preprocessing and augmentation processes for improving model performance. Figure 3 shows a DL-based COVID-19 diagnosis system, whose steps are described in detail in the following sections.

Fig. 3

CNN-based architecture for COVID-19 detection

2.1.1 Preprocessing

Preprocessing is an important step in enhancing image quality and consequently model performance. Image preprocessing refers to the steps taken to format images before they are used in model training and inference; it entails resizing, normalizing, and, on occasion, grayscale conversion (Rahman et al. 2021). Both training and testing images are subjected to this preprocessing phase. Preprocessing is necessary to clean image data for model input: convolutional neural networks, for example, require all images to be arrays of the same size. Image preprocessing may also reduce model training time and increase inference speed. If the input images are very large, shrinking them can substantially reduce training time, often with little loss in model performance. This section describes in detail the procedures employed during the preprocessing stage in the reviewed studies. Table 1 lists the different preprocessing techniques used in these studies.

Table 1 Various techniques for preprocessing CXR images

Normalization and image resizing are crucial steps in CNN systems that keep the inputs consistent while also improving model performance. With normalized inputs, a CNN model learns faster and gradient descent tends to be more stable. Normalization changes the range of pixel values (Islam et al. 2020); in the reviewed studies, input image pixels were normalized to between 0 and 1 (Saha et al. 2021; Haque and Abdelgawad 2020). Image resizing is the process of scaling images, and both image processing and machine learning applications benefit from it. The majority of deep learning model architectures assume that all input images are the same size (Asnaoui and Chawki 2021; Showkat and Qureshi 2022; Jia et al. 2021). Almost a third of the studies in our review resize the images to 224 × 224 pixels. Resizing reduces the number of pixels in an image, which shortens the time it takes to train a neural network: the more pixels in an image, the more input nodes there are, and the greater the model's complexity (Islam et al. 2020; Li et al. 2021). In addition, by normalizing input features to a similar range, the model can become more stable, converge faster, and become more efficient while reducing the impact of outliers (Chollet 2018).
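The two steps above can be illustrated with a minimal sketch. It assumes an 8-bit grayscale image stored as a plain list of rows and uses nearest-neighbour sampling for the resize; the function names (`resize_nearest`, `normalize_01`, `preprocess`) are hypothetical and are not taken from any of the reviewed studies, which typically rely on an imaging library instead.

```python
def resize_nearest(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) by nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize_01(image, max_value=255):
    """Rescale pixel intensities from [0, max_value] to [0, 1]."""
    return [[pixel / max_value for pixel in row] for row in image]


def preprocess(image, size=224):
    """Resize to size x size, then normalize to [0, 1]."""
    return normalize_01(resize_nearest(image, size, size))


# Synthetic 480 x 512 "radiograph" with 8-bit intensities (illustrative data only).
raw = [[(r + c) % 256 for c in range(512)] for r in range(480)]
x = preprocess(raw)
print(len(x), len(x[0]))
```

Nearest-neighbour sampling is chosen here only because it needs no dependencies; bilinear or bicubic interpolation is the more common choice in practice.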

2.1.1.1 Augmentation technique

Image augmentation builds additional training images using a variety of processing techniques or a mix of techniques. Histogram equalization, a related image processing technique, uses the image intensity histogram to increase the global contrast of an image (Heidari et al. 2020; Mostafiz et al. 2020). Flipping mirrors an image horizontally or vertically (Sousa et al. 2022). Vertical flips are not supported by all frameworks; however, a vertical flip is equivalent to rotating an image by 180 degrees and then flipping it horizontally. Rotation may not preserve the image dimensions. A square image keeps its size when rotated by multiples of 90 degrees, and a rectangular image keeps its size when rotated by 180 degrees, but the final image size changes when the image is rotated by finer angles (Karakanis and Leontidis 2021; Aslan et al. 2022). Random cropping takes a random section of an image and then resizes that section back to the original image size (Reshi et al. 2021).
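The geometric relationships described above can be checked with a small sketch. It assumes images stored as lists of rows; the helper names are hypothetical, and the resize step that normally follows a random crop is left out for brevity.

```python
import random


def hflip(img):
    """Mirror left-right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror top-bottom."""
    return [row[:] for row in img[::-1]]


def rot180(img):
    """Rotate by 180 degrees."""
    return [row[::-1] for row in img[::-1]]


def rot90(img):
    """Rotate by 90 degrees clockwise; swaps height and width."""
    return [list(col) for col in zip(*img[::-1])]


def random_crop(img, crop_h, crop_w, rng=random):
    """Cut a crop_h x crop_w window at a random position."""
    top = rng.randint(0, len(img) - crop_h)
    left = rng.randint(0, len(img[0]) - crop_w)
    return [row[left:left + crop_w] for row in img[top:top + crop_h]]


rect = [[1, 2, 3], [4, 5, 6]]          # 2 x 3 rectangle
print(vflip(rect) == hflip(rot180(rect)))   # vertical flip via rotate + horizontal flip
print(len(rot90(rect)), len(rot90(rect)[0]))  # shape after a 90-degree rotation
```

A framework without a vertical flip can therefore still produce one by composing the two operations it does support.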

Shear is an image distortion technique used to produce or correct perspective angles, typically to enhance images for computer vision applications (Khan et al. 2022; Asnaoui and Chawki 2021). In addition to shear, various other techniques are used in image augmentation to increase the diversity of the training data, such as zoom augmentation, height shift, translation, and Gaussian filtering. Zoom augmentation randomly zooms in on the image or adds pixels around it to enlarge it (Asnaoui and Chawki 2021), while the height shift range function shifts the pixels vertically, toward the top or the bottom, by a random amount (Sharifrazi et al. 2021). Translation moves the image in the X or Y direction (or both) during augmentation, which encourages the convolutional neural network to look for features in all regions of the image (Abbas et al. 2021; Kiziloluk and Sert 2022). Finally, Gaussian filtering is often used to reduce noise and blur image regions (Jain et al. 2020; Shankar and Perumal 2021).
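As an illustration of translation and height shift, the sketch below moves an image by a whole number of pixels and fills the vacated area with a constant. The function name `translate`, the list-of-rows representation, and the zero fill value are assumptions made for this example; libraries usually offer several fill modes.

```python
def translate(img, dx, dy, fill=0):
    """Shift an image dx pixels right and dy pixels down (negative values
    shift left/up); pixels that leave the frame are dropped."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            nr, nc = r + dy, c + dx
            if 0 <= nr < h and 0 <= nc < w:
                out[nr][nc] = img[r][c]
    return out


img = [[1, 2], [3, 4]]
shifted_right = translate(img, dx=1, dy=0)
shifted_up = translate(img, dx=0, dy=-1)   # a height shift toward the top
print(shifted_right, shifted_up)
```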

Data augmentation with Generative Adversarial Networks (GANs) has been used to improve CNN training by producing fresh data without any predetermined augmentation procedure (Rasheed et al. 2021). Conditional GANs can transform an image from one domain into an image in another domain. The main disadvantage of this procedure is that the result is more artistic than practical (Karakanis and Leontidis 2021). Anisotropic diffusion is a technique for improving image quality. It is a nonlinear method that "extracts" the key visual information by removing noise and extraneous details while preserving the edges (Mostafiz et al. 2020). Image data augmentation is used to increase the size of the training dataset in order to improve the model's performance and generalization capacity (Hasan et al. 2021). The DARI method generates synthetic chest X-ray images using a GAN and generic data augmentation techniques, which are then combined with the remaining original radiograph images to create a strong training dataset (Sakib et al. 2020).

2.1.2 Traditional CNN architecture

CNN architectures can now attain expert-level, human-comparable performance in a variety of complicated visual tasks, such as medical image analysis and pathology detection. Since the first successful CNN in 1998, a plethora of CNN architectures have been proposed in the literature. That first network, LeNet, was developed by Yann LeCun and was widely used for handwritten digit recognition. LeNet has three convolutional, two average pooling, and two fully connected layers, making it a shallow design in comparison to current models. In a CNN, the convolutional layers are used for feature extraction. Figure 3 shows a schematic depiction of a typical CNN for COVID-19 prediction, which is detailed further below.

  1. Convolution layer. Learnable filters (also called kernels) are convolved with the input images to produce the output of the convolutional layer. Each feature map element is computed as an element-wise product of the filter weights with a patch of the input, followed by a sum. Two important aspects of convolution are local connectivity, in which the filter weights are multiplied with a small area of the input image at a time, and weight sharing, in which the same filter weights are applied at every grid point of the input image. The initial layers retrieve low-level features, whereas the subsequent layers recover high-level features.

  2. Pooling layer. Along with standard convolutional layers, convolutional networks can have local and/or global pooling layers. By merging the outputs of neuron clusters from one layer into a single neuron in the next layer, pooling layers reduce the dimensions of the data. Local pooling combines small clusters, generally with tiling sizes of 2 × 2, whereas global pooling acts on all the neurons of the feature map. Maximum and average pooling are the two most used types of pooling: max pooling takes the maximum value of each local cluster of neurons in the feature map, while average pooling takes the average value.

  3. Fully connected layer. This layer, also known as a dense layer, is commonly used to perform the classification task. It takes the output of the previous layer, which is a high-dimensional feature map, and converts it into a probability vector indicating the likelihood of the input belonging to each class. The number of neurons in the final fully connected layer is typically set to match the number of classes in the dataset. During training, the network learns the optimal weights for each neuron through backpropagation, minimizing the loss function. Once trained, the CNN can classify new data by passing it through the network and selecting the class with the highest probability in the output vector.
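The three layer types above can be sketched as a tiny forward pass. This is a dependency-free illustration under stated assumptions, not the architecture of any reviewed study: the image, the 2 × 2 kernel, and the dense weights are made-up values, all function names are hypothetical, and, as in most DL frameworks, the "convolution" is implemented as a cross-correlation (the kernel is not flipped).

```python
import math


def conv2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1); each output
    element is the sum of element-wise products over one local patch."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [sum(image[r + i][c + j] * kernel[i][j]
             for i in range(kh) for j in range(kw))
         for c in range(out_w)]
        for r in range(out_h)
    ]


def relu(feature_map):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Local max pooling over non-overlapping 2 x 2 tiles."""
    return [
        [max(feature_map[r][c], feature_map[r][c + 1],
             feature_map[r + 1][c], feature_map[r + 1][c + 1])
         for c in range(0, len(feature_map[0]) - 1, 2)]
        for r in range(0, len(feature_map) - 1, 2)
    ]


def dense_softmax(features, weights, biases):
    """Fully connected layer followed by softmax: one probability per class."""
    logits = [sum(w * x for w, x in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    top = max(logits)                      # subtract max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


image = [[float((r * 5 + c) % 7) for c in range(5)] for r in range(5)]
edge_kernel = [[1.0, -1.0], [1.0, -1.0]]   # the same weights are shared at every position
pooled = max_pool2x2(relu(conv2d_valid(image, edge_kernel)))   # 5x5 -> 4x4 -> 2x2
features = [v for row in pooled for v in row]                   # flatten to 4 values
weights = [[0.5, -0.2, 0.1, 0.3], [-0.4, 0.6, 0.2, -0.1]]       # 2 classes x 4 features
probs = dense_softmax(features, weights, biases=[0.0, 0.0])
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```

In a trained network the kernel and dense weights would be learned by backpropagation rather than set by hand.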

2.1.3 Pre-trained CNN and methodologies

A pre-trained model is one that has been trained on a large benchmark dataset to solve a problem comparable to the one at hand. Because training such models from scratch is computationally expensive, it is common practice to import and reuse models from the literature. In such situations, the use of pre-trained models based on the notion of transfer learning (TL) can be beneficial. In TL, the information gained by a DL model trained on a large dataset is applied to a task with a smaller dataset. This reduces the need for a large dataset and a long training period, both of which are required by DL algorithms that are trained from scratch. To separate COVID-19 from normal, viral pneumonia, and bacterial pneumonia cases, several pre-trained models were used in the reviewed studies: AlexNet, VGG-16, ResNet-101, MobileNet, Se-ResNeXt-50, DenseNet-161, SqueezeNet, and Inception-V3. Table 2 summarizes the approaches used in the reviewed studies.

Table 2 COVID-19 detection methodologies and their limitations

AlexNet was one of the first convolutional networks to solve large-scale image classification problems, paving the way for deep learning applications. Its architecture comprises 8 layers: 5 convolutional layers and 3 fully connected layers. The input images are processed by the convolutional layers, which use filters and pooling to extract image features, and the fully connected layers then classify the images based on these extracted features. A notable advancement of AlexNet was the use of ReLU activation functions, which helped address the issue of vanishing gradients that had previously limited neural network depth. Abbas et al. (2021) presented the DeTraC system, which uses AlexNet with class decomposition to distinguish COVID-19 X-ray images from normal and severe respiratory disease cases. AlexNet was used by Shukla et al. (2021) to provide a framework for diagnosing COVID-19 patients using chest X-ray images.

VGG is named after the Visual Geometry Group at Oxford University. This model is simple in design yet quite effective in terms of performance. The VGG16 and VGG19 architectures have 16 and 19 weight layers, respectively: 13 or 16 convolutional layers followed by three fully connected layers. VGG16 contains a cascade of five convolutional blocks with fixed kernel sizes of 3 × 3, with the first two blocks each containing two convolutional operations and the last three blocks each containing three convolutional operations. It is important to note that a new convolution block, as well as process enhancement techniques (batch normalization and dropout), can be readily added to the conventional model, allowing for the learning of finer features and enhanced learning speed and stability. A few studies used the VGG16 pre-trained model for three-class COVID-19 classification with CXR images (Heidari et al. 2020; Hasan et al. 2021). Some studies added a few layers to the VGG16 pre-trained model, or fine-tuned some of its layers, for COVID-19 CXR image classification (2 classes (Shibly et al. 2020); 3 classes (Bayoudh et al. 2020; Das et al. 2021)). Asnaoui and Chawki (2021) also made use of VGG19 to classify CXR images.

ResNet is the best-known pre-trained model for COVID-19 classification, and it has been frequently used. ResNet is made up of many residual blocks, in each of which the input of the block is added to the output of its convolutional layers through a skip connection, which makes deeper networks easier to train. Using X-ray imaging, Jain et al. (2020) were able to detect COVID-19 cases while distinguishing them from bacterial pneumonia, viral pneumonia, and healthy normal persons. Several researchers have used ResNet50 to detect COVID-19 from CXR images (Mostafiz et al. 2020; Das et al. 2021; Asnaoui and Chawki 2021). For COVID-19 image classification, Showkat and Qureshi (2022) and Hossain et al. (2022) used customized versions of ResNet. Table 2 provides additional information on these sources.
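The skip connection of a residual block can be written compactly as follows. This is a sketch in notation introduced here for illustration (it is not taken from the reviewed studies): $\mathbf{x}$ and $\mathbf{y}$ denote the input and output of one residual block, and $\mathcal{F}$ denotes the mapping computed by the block's stacked convolutional layers, assuming $\mathcal{F}(\mathbf{x})$ and $\mathbf{x}$ have the same dimensions.

$$ \mathbf{y} = \mathcal{F}(\mathbf{x}) + \mathbf{x} $$

Under this formulation the layers only need to learn the residual $\mathcal{F}(\mathbf{x}) = \mathbf{y} - \mathbf{x}$, and the identity term gives gradients a direct path through the network.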

In contrast to standard residual models, which use expanded representations at the input and output of each block, the MobileNetV2 architecture is built on an inverted residual structure, where the input and output of the residual block are thin bottleneck layers. MobileNetV2 filters features in the intermediate expansion layer with lightweight depthwise convolutions. MobileNetV2 was used in a few studies to classify CXR images into three categories: COVID, Other Pneumonia, and Normal (Kiziloluk and Sert 2022; Asnaoui and Chawki 2021). Se-ResNeXt is built by adding a Squeeze and Excitation (SE) block to ResNeXt. SE blocks allow a network to perform dynamic channel-wise feature recalibration, which increases its representational capacity. Hira et al. (2021) used AlexNet, GoogleNet, ResNet-50, Se-ResNet-50, DenseNet121, Inception V4, ResNet V2, ResNeXt-50, and Se-ResNeXt-50 to examine the performance of COVID-19 classification from CXR images; Se-ResNeXt-50 outperformed the others.

DenseNet extends the skip-connection idea of ResNet: each layer receives additional input from all preceding layers, instead of a single previous layer's skip connection, and passes its own output to all of the following convolutional layers, where the feature maps are combined by concatenation. As a result, each convolutional layer is said to receive "collective knowledge" from the ones before it. DenseNet has shown good performance in a few studies (Kiziloluk and Sert 2022; Ortiz et al. 2022; Alhudhaif et al. 2021). SqueezeNet is a convolutional neural network that uses design strategies to minimize the number of parameters, particularly fire modules, which "squeeze" the parameters using 1 × 1 convolutions. SqueezeNet was used by Gupta et al. (2021b) to extract features for COVID-19 prediction from CXR images. SqueezeNet was also used as the training architecture to classify chest X-ray images into COVID-19 pneumonia and other pneumonia classes (Alhudhaif et al. 2021).
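The dense connectivity of DenseNet described above can be summarized in one line. The notation is introduced here for illustration only: $\mathbf{x}_{\ell}$ is the output of layer $\ell$, $H_{\ell}$ is the transformation applied by that layer, and the square brackets denote channel-wise concatenation of the feature maps of all earlier layers.

$$ \mathbf{x}_{\ell} = H_{\ell}\left( [\mathbf{x}_{0}, \mathbf{x}_{1}, \ldots, \mathbf{x}_{\ell - 1}] \right) $$

The contrast with a residual block is that earlier feature maps are concatenated rather than summed, so each layer sees them unchanged.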

For better model adaptation, the Inception V3 model uses numerous strategies to optimize the network. It is deeper than the Inception V1 and V2 models without a loss of speed, and it is computationally efficient. It employs auxiliary classifiers as regularizers. Gupta et al. (2021b) used the Inception V3 model to extract features and classify input images into COVID and non-COVID images. Other frameworks used in COVID-19 detection include Capsule Network-based frameworks such as COVID-wideNet (Gupta et al. 2022) and COVID-CAPS (Afshar et al. 2020).

2.1.4 Classification task

COVID-19 prediction was achieved by categorizing CXR images into two (binary) or multiple classes, as illustrated in Fig. 3. The class labels used include normal, bacterial pneumonia, viral pneumonia, and COVID-19. Binary classification uses two labels: COVID-19 and non-COVID. The three labels in three-class prediction are: (1) COVID-19, (2) Normal, and (3) pneumonia. The four labels in four-class prediction are: (1) COVID-19, (2) Normal, (3) bacterial pneumonia, and (4) viral pneumonia. The vast majority of works predict two or three classes. Table 3 shows the number of reviewed publications grouped by the number of classification labels used. One of the reviewed works (Jia et al. 2021) uses five classes: "COVID-19," "Tuberculosis," "Viral Pneumonia," "Bacterial Pneumonia," and "Normal." That study therefore also assesses whether the CXR image shows tuberculosis.

Table 3 Study distribution based on classification task formulation

2.1.5 Dataset details

Different datasets were used in the peer-reviewed studies. Table 4 provides a summary of these datasets; each row lists the reference number, the dataset description or URL, and the number of images used in each work. Some of these datasets contain COVID-19 CXR images, while others comprise images of healthy people and of people with various pulmonary illnesses. CXR is frequently used as a first-line imaging technique for COVID-19 patients and has been studied in several COVID-19 diagnosis studies. The technique is inexpensive compared with other medical imaging approaches and poses a lower danger to human health because it is a low-radiation technique. Table 4 also summarizes the most recent and relevant studies in this subject, as well as the datasets' quantitative parameters, such as the number of records.

Table 4 COVID-19 datasets utilized in the reviewed studies

3 Performance evaluation

Evaluation metrics are used to assess the performance of the overall pipeline. For an experiment, the data are usually partitioned into training, validation, and testing sets. The training data are used to fit the model, while overfitting and underfitting are monitored on the validation data during training. Finally, the trained model's performance is evaluated on previously unseen test data. Classification accuracy is the number of correctly predicted images divided by the total number of predictions made. Accuracy is a reliable indicator only when the classes are balanced; with two classes of images, for example, each class should contain a similar number of images. Accuracy can be written as

$$ {\text{Accuracy}} = \frac{{{\text{No}}{.}\;{\text{of}}\;{\text{correctly}}\;{\text{predicted}}\;{\text{images}}}}{{{\text{Total}}\;{\text{No}}{.}\;{\text{of}}\;{\text{predictions}}\;{\text{actually}}\;{\text{made}}}} $$
(1)

The sensitivity of a test (also called the true positive rate, recall, or, in a clinical setting, the detection rate) is the probability of a positive test result given that the individual actually has the condition, that is, the percentage of people who test positive among those who have the condition. This is expressed mathematically as

$$ {\text{Sensitivity}} = \frac{{\text{No. of true positives}}}{{\text{Total No. of sick individuals in population}}} $$
(2)

The specificity of a test (also called the true negative rate) is the probability of a negative test result given that the individual actually does not have the condition, that is, the fraction of people who test negative among those who do not have the condition. This can be written mathematically as:

$$ {\text{Specificity}} = \frac{{\text{No. of true negatives}}}{{\text{Total No. of well individuals in population}}} $$
(3)

Precision is calculated by dividing the number of individuals correctly identified as positive by the total number of individuals labeled positive, that is, the correctly identified individuals plus those incorrectly labeled as positive.

$$ {\text{Precision}} = \frac{{\text{No. of true positives}}}{{\text{No. of true positives} + \text{No. of false positives}}} $$
(4)

The F-score can be used as a single indicator of test performance on the positive class. It is the harmonic mean of precision and recall, where recall is another name for sensitivity. The F-score is written in mathematical notation as:

$$ F1\;{\text{score}} = 2*\frac{{{\text{Precision}}*{\text{Sensitivity}}}}{{{\text{Precision}} + {\text{Sensitivity }}}} $$
(5)

If both the precision and sensitivity values are high, the F1 score of the model is also high; if both are low, the F1 score is also low. If one of the two is low and the other is high, the F1 score is pulled toward the lower value, because the harmonic mean penalizes an imbalance between precision and sensitivity. The F1 score therefore summarizes the model's performance on the positive class in a single number.
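As a worked example of the five metrics, the sketch below computes them from the four confusion-matrix counts of a binary classifier (true positives, false positives, true negatives, false negatives). The function name `classification_metrics` and all counts are hypothetical and chosen only for illustration; a zero denominator is reported as 0.0, which is a convention assumed here.

```python
def _ratio(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity, precision and F1 from binary counts."""
    precision = _ratio(tp, tp + fp)
    sensitivity = _ratio(tp, tp + fn)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "specificity": _ratio(tn, tn + fp),
        "precision": precision,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Roughly balanced test set (made-up counts).
balanced = classification_metrics(tp=90, fp=10, tn=80, fn=20)

# Imbalanced test set: 990 normal and 10 COVID images, and a model that
# labels every image "normal". Accuracy looks excellent, sensitivity is zero.
always_negative = classification_metrics(tp=0, fp=0, tn=990, fn=10)

# High precision with low sensitivity: F1 stays close to the lower value.
lopsided = classification_metrics(tp=9, fp=1, tn=900, fn=81)

for name, result in [("balanced", balanced),
                     ("always_negative", always_negative),
                     ("lopsided", lopsided)]:
    print(name, {k: round(v, 3) for k, v in result.items()})
```

The second case shows why accuracy alone can mislead on imbalanced data, and the third shows the harmonic mean being dominated by the weaker of precision and sensitivity.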

It was not possible to compare the studies included in this review directly because of discrepancies in the size of the testing sets and the lack of uniform performance evaluations, which makes identifying the most efficient DL models for recognizing COVID-19 from CXR images even more difficult. Most authors evaluated their DL models using accuracy, sensitivity, and specificity. However, when non-standard metrics and data from several sources are used, comparing alternative techniques becomes more difficult. As a result, a public COVID-19 dataset that is both complete and accessible to researchers is necessary, and performance standards for prediction models must also be established. Table 5 lists the outcomes of the reviewed articles in terms of classification metrics such as accuracy, precision, recall (sensitivity), specificity, and F1-score.

Table 5 Performance metrics of the methodologies used in the reviewed studies

Accuracy measures the overall performance of a model and is calculated as the percentage of data samples that the model classifies correctly. By integrating a CNN with a long short-term memory (LSTM) network as a classifier, Islam et al. (2020) achieved an accuracy of 99.40%. Using the Adam optimizer, Hira et al. (2021) achieved 99.32% with Se-ResNeXt-50. Also using Adam as the optimizer, Gupta et al. (2021a) proposed InstaCovNet-19, achieving an accuracy of 99.08% for multi-class and 99.53% for binary classification. Reshi et al. (2021) employed a deep CNN trained with the Adam optimization technique, which achieved 99.5%. With CNN and DWT-optimized features, Mostafiz et al. (2020) achieved a binary classification accuracy of 99.45%. Shukla et al. (2021) created a COVID-19 diagnosis framework that uses a multiobjective genetic algorithm and a convolutional neural network to achieve 99.15%.

Modified MobileNet and ResNet were used by Jia et al. (Rahman et al. 2021), with average accuracies of 99.6% and 99.3%, respectively. SqueezeNet with the ReLU activation function, by Gupta et al. (Das et al. 2021), achieved 99.4%, while Inception V3 with the sigmoid function achieved 99.5%. Transfer learning with a fine-tuned deep CNN ResNet50 model for classification, by Hossain et al. (Sakib et al. 2020), achieved an accuracy of 99.95%; the Adam optimizer was used in that model. Sharifrazi et al. (Gayathri et al. 2022) created a model that combines a convolutional neural network, a support vector machine, and a Sobel filter to reach 99.02%. Fig. 4 gives an overview of the various optimization techniques used in the reviewed works.

Fig. 4 Overview of various optimization techniques

In the reviewed studies, various optimization algorithms have been used to train deep learning models, including stochastic gradient descent (SGD) (Khan et al. 2022), adaptive moment estimation (Adam) (Heidari et al. 2020; Hira et al. 2021; Nour et al. 2020; Karakanis and Leontidis 2021; Gupta et al. 2021a; Khan et al. 2022; Reshi et al. 2021; Bayoudh et al. 2020; Sousa et al. 2022), Adagrad (Sakib et al. 2020), and RMSprop (Ortiz et al. 2022). These algorithms have been compared based on their convergence speed, ability to escape local minima, and robustness to hyperparameter settings. While SGD is a commonly used optimization algorithm, it can be slow and sensitive to the choice of learning rate. Adam is a popular choice due to its fast convergence and adaptive learning rate, but it may struggle with high-dimensional problems. Adagrad is effective at handling sparse data and adapts the learning rate of each parameter, but its accumulated squared gradients shrink the learning rate over time, so training can stall early and it can be less effective on non-convex problems. RMSprop is similar to Adagrad but addresses this issue by using a decaying average of squared gradients, making it a good choice for non-convex optimization problems. The choice of optimization algorithm depends on the problem being addressed and the resources available for training.
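To make the differences between these optimizers concrete, the following minimal sketch applies the standard SGD, Adagrad, RMSprop, and Adam update rules to a one-dimensional toy objective. The objective f(w) = (w - 3)^2, the starting point, the step count, and all hyperparameter values are assumptions chosen for illustration; they do not come from the reviewed studies, where framework-provided optimizers would normally be used.

```python
import math


def grad(w):
    # Gradient of the toy objective f(w) = (w - 3)^2, minimized at w = 3.
    return 2.0 * (w - 3.0)


def sgd(w, steps, lr=0.1):
    # Plain gradient descent: a fixed learning rate scales every step.
    for _ in range(steps):
        w -= lr * grad(w)
    return w


def adagrad(w, steps, lr=0.5, eps=1e-8):
    # Accumulates all past squared gradients, so the effective
    # learning rate can only shrink as training proceeds.
    g2_sum = 0.0
    for _ in range(steps):
        g = grad(w)
        g2_sum += g * g
        w -= lr * g / (math.sqrt(g2_sum) + eps)
    return w


def rmsprop(w, steps, lr=0.01, decay=0.9, eps=1e-8):
    # Replaces Adagrad's running sum with an exponentially decaying
    # average, so old gradients are gradually forgotten.
    g2_avg = 0.0
    for _ in range(steps):
        g = grad(w)
        g2_avg = decay * g2_avg + (1.0 - decay) * g * g
        w -= lr * g / (math.sqrt(g2_avg) + eps)
    return w


def adam(w, steps, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Combines a momentum-like first-moment estimate with an
    # RMSprop-like second-moment estimate, both bias-corrected.
    m = v = 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        m_hat = m / (1.0 - beta1 ** t)
        v_hat = v / (1.0 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Run each optimizer from the same (arbitrary) starting point.
results = {opt.__name__: opt(0.0, 2000) for opt in (sgd, adagrad, rmsprop, adam)}
for name, w_final in results.items():
    print(f"{name}: w = {w_final:.4f}")
```

On this simple convex problem all four rules approach the minimum at w = 3; RMSprop with a constant learning rate keeps taking steps of roughly that size and therefore hovers near the minimum rather than settling exactly on it. The sketch is meant to show the mechanics of each update, not to rank the optimizers.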

Additionally, the literature survey found that the batch sizes used in training ranged from 4 to 128, while the number of epochs ranged from 10 to 5000. Some models used convolution layers, max pooling layers, fully connected (FC) layers, and activation function layers, while others used LSTM and GRU layers. Overfitting is a common problem that authors encounter when training their models. To address it, authors typically use techniques such as early stopping, regularization, and data augmentation. Early stopping involves monitoring the validation loss during training and stopping the training process once it stops improving. Regularization techniques, such as L1 and L2 regularization (Kiziloluk and Sert 2022), penalize large weights and help prevent overfitting. Data augmentation involves generating additional training examples by applying various transformations to the original data. Alternative optimization algorithms have also been used for hyperparameter optimization of deep learning models, such as the Modified Competitive Swarm Optimizer (MCSO) (Jalali et al. 2022), the Salp Swarm Algorithm (Ahmadian et al. 2021), and the modified deer hunting optimization algorithm (M-DHOA) (Kuzhali and Pushpa 2022). The hyperparameters considered in these studies include convolution filter size, number of filters, number of convolutional layers, activation function type, dropout rate, max-pooling size, learning rate, momentum rate, optimizer type, number of epochs, and batch size. These optimization algorithms were found to be effective in improving model performance, demonstrating their potential in hyperparameter optimization for deep learning.
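The early stopping rule described above can be sketched as a small patience-based check over per-epoch validation losses. The function name, the `patience` and `min_delta` parameters, and the loss values are hypothetical and serve only as an illustration; deep learning frameworks typically provide their own early stopping utilities.

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 0-indexed.

    Training stops once the validation loss has failed to improve by
    more than min_delta for `patience` consecutive epochs. If that
    never happens, the last epoch is returned as the stop epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical validation losses: improvement stalls after epoch 3.
val_losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.51, 0.53, 0.60]
stop_epoch, best_epoch = early_stopping_epoch(val_losses, patience=3)
print(f"stop at epoch {stop_epoch}, restore weights from epoch {best_epoch}")
```

In practice the weights saved at the best epoch would be restored after stopping, so the final model is the one with the lowest validation loss rather than the last one trained.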

4 Conclusion

This article explores different deep learning (DL) techniques that can be used to identify COVID-19 from chest X-ray images, including the current state of research in this area. The paper explains pre-trained CNN models and the various datasets used in prior studies. Although DL approaches show promise for automatic COVID-19 diagnosis, cooperation between medical experts and computer scientists is necessary to create more reliable and effective DL models. Despite the excellent outcomes, there is still considerable room for improvement. To enhance model performance, it is crucial to establish datasets that are public, broad, diverse, and validated and labeled by experts with related lung disease lesions. Combining sign detection with the classification output could improve both prediction accuracy and model performance. Techniques such as cross-validation, data augmentation, and transfer learning have been identified as ways to enhance the adaptability and generalization of DL models. It is also crucial to train COVID-19 detection models on a substantial amount of real-world data. Nevertheless, there have been few studies on multi-class classification (i.e., with more than three classes), and future research can evaluate the effectiveness of proposed models on this problem. Moreover, incorporating optimization algorithms with DL models can lead to the development of more dependable models.