1 Introduction

Several cases of COVID-19 (COrona VIrus Disease - 2019) were identified in Wuhan City (China) at the end of December 2019 [1]. On January 30th, 2020, the World Health Organization (WHO) declared a Public Health Emergency of International Concern, as the virus is highly contagious and may be transmitted between people through the airways. Less than three months later, on March 11th, 2020, it was declared a global pandemic [2]. According to the latest reports, there have been 606,459,140 confirmed cases of COVID-19, with 6,495,110 deaths worldwide (Footnote 1).

Coronavirus disease is caused by viruses of the Coronaviridae family. This family is divided into four genera: Alphacoronavirus, Betacoronavirus, Deltacoronavirus, and Gammacoronavirus [3]. The virus responsible for COVID-19 appears to be the seventh coronavirus to be transmitted among humans; the predecessors are 229E, NL63, OC43, HKU1, MERS-CoV, and SARS-CoV.

The COVID-19 virus is made up of four structural proteins: S (spike), E (envelope), M (membrane), and N (nucleocapsid). Its RNA genome contains 29,891 nucleotides and is associated with the N protein [4]. Glycoprotein S (“spike”) is responsible for the bulbous projections seen on the outside of the coronavirus. Three S glycoproteins join to form a trimer; the trimers combine to form structures resembling a corona surrounding the virion. The membrane protein (M) crosses the envelope and interacts with the RNA-protein complex within the virion. Protein E aids glycoprotein S (and thus the virus) in attaching to the membrane of the target cells. Finally, the coronavirus genome comprises a sizeable single strand of positive-sense RNA (ranging from 27 to 32 kb in different viruses); the RNA encodes seven viral proteins and is linked to the N protein, which increases its stability.

The COVID-19 virus can be detected using gold-standard methods: PCR, molecular, and antigen tests, also known as swabs. The swabs are based on identifying the virus’s genetic material using a sample of the patient’s respiratory tract taken via nasopharyngeal swab and a saliva sample. The molecular test (RT-PCR) explicitly detects the presence of the genetic material of the virus [5]. In contrast, the antigen test looks for specific proteins known as antigens on the virus’s surface. The antigenic swab provides a faster result but is less reliable. Indeed, the rapid swab has a 30% false negative rate compared to the RT-PCR, which turns out to be very reliable in detection with a specificity of 96%, an AUC of 98%, and a sensitivity of 86% [6].

COVID-19 has had a significant impact on cardiovascular health; in particular, it can directly affect the heart and cardiovascular system. It may cause myocarditis (inflammation of the heart muscle), arrhythmias (irregular heartbeats), blood clot formation, and heart damage [7]. This fact led to the formulation of the following initial working hypothesis (WH): “If COVID-19 is indeed linked to changes in the cardiovascular system, can these changes be detected by examining correlations in electrocardiogram (ECG) signals among patients with COVID-19?”.

In this context, the ECG analysis might be an alternative detection method. Early detection of COVID-19 through ECG analysis could be crucial for prompt intervention, reducing transmission risks, and monitoring potential cardiovascular complications associated with the virus, as with many other pathologies [8]. Moreover, the proactive approach supports healthcare systems in efficiently managing resources based on case severity.

The ECG is an outpatient diagnostic test that records and graphically displays the heart’s electrical activity; it is the primary diagnostic test in the cardiovascular field and necessarily requires the placement of ten electrodes (six for precordial leads and four for peripheral leads). An ECG has 12 leads representing different angles from which the heart’s electrical activity can be observed.

A succession of deflections characterizes a single heartbeat: the P wave, the QRS complex, and the T wave, which are separated by two isoelectric sections, the PR interval and the ST segment [9].

This research explores the potential of Deep Learning (DL) approaches, training various neural networks such as ResNet-50, ResNet-18, SqueezeNet, AlexNet, MobileNetV2, and our proposed network to analyze ECG images and detect the COVID-19 virus. The dataset was gathered in Pakistan, and it is made up of a total of 1937 ECG images that have been divided into five groups: Normal (N), Abnormal (ABN), Myocardial Infarction (MI), History of MI (HMI), and COVID-19. In the pre-processing stage, cleaning, resizing, and binarization techniques were used to enhance the image quality. The results suggest the possible detection of the COVID-19 class through the ECGs with an accuracy of 98.9% and a low presence of false negatives.

The confusion matrices support our working hypothesis, suggesting that almost all experiments correctly classify the COVID-19 class without false negatives.

The research questions (RQs) of this work can be described as the following:

  • RQ1: How can advancements in medical imaging and diagnostic technologies contribute to the development of effective methods for the early diagnosis of COVID-19 through ECG analysis, facilitating preventive intervention and improved patient outcomes?

  • RQ2: How effective are various DL approaches in detecting COVID-19 from ECG images?

This paper is organized as follows. First, the works in the literature are analyzed in the Related Works section. Then, the Methods section describes the methodology and the techniques used; the obtained results are presented and discussed in the Results and Discussion section. Finally, the last section discusses the conclusions, limitations, and future developments.

2 Related works

Several pioneering studies have revealed that cardiac damage and inflammation are commonly associated with hospitalized COVID-19 patients. In this section, we focus on studies that implemented binary and multiclass classification approaches.

Different mechanisms may be involved in myocardial tissue damage. Acute myocarditis is one of them. Histological studies (Footnote 2) using an optical microscope confirmed the evidence of direct viral damage to the myocardium with acute myocarditis [10]. Fatih et al. found a prolongation of the QT interval (> 500 ms) in patients hospitalized due to COVID-19, particularly in patients who had taken the active ingredient azithromycin [11]. Abnormalities were also discovered in the ST-T segment (possibly indicating myocarditis) [12], ventricular tachycardia [13], atrial fibrillation (A-fib) [14], and the Brugada pattern [15].

The use of images and other data types for the early identification of many pathologies has become possible thanks to the technological advancement promoted by machine learning (ML) and DL [16,17,18]. The utilization of patient data through ML and DL techniques in disease management offers numerous substantial benefits. However, the intrinsic nature of patient data poses several challenges due to their irregularity, temporality, missingness, and sparsity [19]. Furthermore, recent studies have shown how implementing a hybrid architecture based on Edge, Fog, and Cloud can offer a diagnostic support system that reconciles data privacy, through anonymization strategies, with data utilization [20, 21].

ECG images and signals are commonly analyzed using DL techniques, as noted in a study by Rana et al. [14, 22]. A classification system for COVID patients’ ECGs was proposed based on the Hexaxial Reference System, which presents the direction of the 12 leads of the electrocardiogram. The authors selected the ResNet-50, AlexNet, ResNet-8, and SqueezeNet neural networks. AlexNet had the highest accuracy rate, at 95%. In addition, they created an ad hoc network based on the AlexNet structure that reached 99.6% accuracy.

Recent work by Rahman et al. [23] compared six deep CNN models using gamma correction and pre-processing techniques for ECG images to detect COVID-19. DenseNet-201 achieved the highest accuracy (99.1%) for two-class classification, while InceptionV3 performed best (97.83%) for five-class classification. The work of Attallah et al. [24] introduced ECG-BiCoNet, utilizing five neural networks for binary and multiclass classification, employing feature fusion and ensemble classification. The method achieved 91.73% accuracy for multiclass and 98.80% for binary classification.

3 Methods

The methodology adopted consists of three key phases: (i) analysis and pre-processing of the images to improve their quality and informational value; (ii) classification employing various CNNs, specifically ResNet-18, ResNet-50, MobileNetV2 and SqueezeNet (chosen for their high performance as demonstrated in [25]), and an ad hoc network; (iii) performance assessment using the gold-standard metrics from the existing literature, together with an examination of the confusion matrices to analyze images identified as false negatives.

3.1 The dataset

The dataset contains a total of 1937 ECG images in RGB format of various categories, including COVID-19-positive patients. Each image contains a 12-lead ECG. The dataset was collected throughout a three-month study conducted in 2020 in Pakistan [26]. The device used to acquire the ECGs is the EDAN SERIES-3. All recordings were acquired at a sampling rate of 500 Hz and bandpass filtered. As a result, the frequency content of the traces ranges from 0.5 Hz to 100 Hz in some cases and from 0.67 Hz to 25 Hz in others.

Medical experts examined and labeled the images using the Telehealth ECG Diagnostic System. The Telehealth system is a group of personnel and technologies allowing remote patient monitoring, assistance, and data transfer. The ECG images have no private patient information [27]. There are five main categories, for a total of 1931 images:

  • COVID-19 Patients: 249 ECGs of people affected by SARS-CoV-2;

  • Myocardial Infarction Patient: 74 ECGs presenting myocardial infarction;

  • Normal Person ECG Images: a total of 859 ECGs of healthy patients;

  • Patients with Previous History of Myocardial Infarction: 203 ECGs belonging to patients who had previously experienced an MI but did not exhibit signs of a new MI during the recordings;

  • Patients with Abnormal Heartbeat: this class contains 546 ECGs of patients with an abnormal heartbeat. It should be noted that these patients have reported full recovery from both myocardial infarction and COVID-19.

In the following, for brevity, these classes are referred to as COVID-19, MI, N, HMI and ABN, respectively.

3.2 Pre-processing

The dataset was pre-processed before being analyzed by the neural networks.

3.2.1 Resizing and conversion to a binary image

All images have been resized, and all unnecessary information, including the outer margin, has been removed. Then the images were cleaned, and the resized image was converted into a binary image. Resizing the image is essential to capture the pertinent information from the ECG. Furthermore, we eliminated the background from normal ECG traces, resulting in a white background with distinctive black waves, characteristic of an ECG. The green channel, with a threshold value of less than 50, is expressly used because it has less noise than the other channels in the RGB image. Thresholding is a quick but effective technique that separates an image into its background and foreground by transforming it from grayscale to a binary format [28]. To eliminate noise from the image, we employed a function that isolates objects corresponding to a specified value. This value, determined by the number of waveforms in the image, is applied to both the blue and red channels to restore all three RGB channels.
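The green-channel thresholding step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `binarize_green` and the nested-list image representation are hypothetical, and only the threshold of 50 on the green channel comes from the text.

```python
def binarize_green(rgb_image, threshold=50):
    """Binarize an RGB image using only its green channel.

    rgb_image is a list of rows, each row a list of (r, g, b) tuples
    (a hypothetical representation chosen to keep the sketch dependency-free).
    Pixels whose green value is below the threshold are treated as ECG trace
    (black, 0); all other pixels become background (white, 255).
    """
    return [
        [0 if g < threshold else 255 for (_r, g, _b) in row]
        for row in rgb_image
    ]


# Tiny synthetic example: dark pixels stand for the trace, light ones for grid/background.
image = [
    [(255, 255, 255), (20, 30, 25), (250, 180, 180)],
    [(10, 49, 12), (0, 50, 0), (255, 255, 255)],
]
binary = binarize_green(image)
```

The result is a white background with black trace pixels, matching the output described in the text; the subsequent noise-removal step is not covered by this sketch.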

3.2.2 Division into leads

A separate image was created for each lead by dividing the pre-processed image into its component portions. Consequently, the original dataset was divided into three different types, depending on how the leads were organized within the ECG image:

  • Patients with Abnormal Heartbeat: type-1, type-2, type-3;

  • COVID-19 Patients: type-1, type-2, type-3;

  • Myocardial Infarction Patient: type-1, type-2;

  • Normal Person ECG Images: type-1;

  • Patients with Previous History of Myocardial Infarction: type-1, type-2, type-3

Here, type-1 includes the images in which the leads have a total of 3 waves + 1 of stability, for a total of 4 waveforms. Type-2 includes the images in which the leads have a total of 6 waves + 1 of stability, for a total of 7 waveforms. Finally, type-3 contains images in which the leads have a total of 6 waves without a stability wave. Fig. 1 shows an example of type-1, type-2, type-3, and a single lead.

Fig. 1 Example of type-1 (a), type-2 (b), and type-3 (c) images, and a single lead (d). The normal waveform is shown in red, and the stable waveform is shown in blue

The pre-processed dataset was split randomly using the following ratios: 70% training set, 15% for the validation set and 15% for the test set. No cross-validation was applied.
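A random 70/15/15 hold-out split of this kind can be sketched as below. The function name `split_dataset`, the fixed seed, and the use of integer item identifiers are assumptions made for illustration; the text specifies only the ratios and that the split is random.

```python
import random


def split_dataset(items, train_pct=70, val_pct=15, seed=0):
    """Randomly split items into training, validation, and test subsets.

    The test subset receives whatever remains after the training and
    validation subsets are taken, so every item is used exactly once.
    The seed is a hypothetical choice that makes the sketch reproducible.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


# Example with 100 placeholder image identifiers.
train, val, test = split_dataset(range(100))
```

Integer arithmetic is used for the subset sizes so that rounding never drops or duplicates an item.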

3.3 Convolutional neural networks

Several networks have been involved in our experiments with transfer learning techniques: MobileNetV2, AlexNet, ResNet-18, ResNet-50, and SqueezeNet. Additionally, we implemented a custom network from scratch (ourCNN). The networks were trained through fine-tuning, involving modifications to both the first and last layers to adapt to the used multiclass and binary data.

Fig. 2 The layers composing the architecture of ourCNN trained from scratch

AlexNet was introduced by [29]. The network has eight layers with learnable parameters: five convolutional layers, some of which are followed by max pooling, and three fully connected layers. ReLU is the activation function in all layers except the output layer. ReLU reduces the vanishing gradient issue and enhances performance: with this function, training is reported to be at least six times faster than with saturating activation functions such as tanh. Moreover, this network uses dropout layers to avoid overfitting.

MobileNetV2 is a CNN that is 53 layers deep, developed by [30] and based on the previous version, MobileNetV1. This network was specifically designed for mobile platforms and resource-restricted environments. The model has two types of blocks: a residual block with a stride of 1 and a downsizing block with a stride of 2. Both types of blocks have three layers: the first is a 1x1 convolutional layer with ReLU6, the second is a depthwise convolutional layer, and the third is another 1x1 convolutional layer without any non-linearity.

As network depth increases, convergence problems affect the majority of CNNs. Residual Networks (ResNet) try to avoid gradient problems. Specifically, ResNet-18, proposed by [31], has an architecture consisting of 72 layers in total and is 18 layers deep. The architecture of this network is designed to manage a large number of convolutional layers efficiently. Like ResNet-18, ResNet-50 is a variant of the ResNet model; it has 48 convolutional layers accompanied by a MaxPool layer and an AveragePool layer.

SqueezeNet was developed to reduce model complexity without compromising performance. As a result, SqueezeNet reaches accuracy comparable to AlexNet, and occasionally even better, but it employs a model with far fewer parameters, enhancing portability and scalability [32].

ourCNN was developed and trained entirely from scratch. The network has a simple architecture (see Fig. 2), consisting of only 15 layers. The size of the network input is 227x227x3. The convolutional layers have 16, 32, and 64 filters, respectively, and each filter has a size of 3x3. The padding value is 1 for all convolution2dLayer layers. For the maxPooling2dLayer layers, the pool size is 2x2 with a stride of 2 in both cases. Softmax has been chosen as the classifier layer.
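To make the effect of these hyperparameters concrete, the following sketch traces the feature-map sizes through the convolutional and pooling stages. It is illustrative only and rests on assumptions not stated in the text: a convolution stride of 1, and a layer ordering in which max pooling follows the first two convolutional stages but not the third. The function names are hypothetical.

```python
def conv_out(size, kernel=3, padding=1, stride=1):
    """Spatial output size of a convolution (stride 1 is an assumption)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, kernel=2, stride=2):
    """Spatial output size of a max pooling layer (2x2, stride 2, as in the text)."""
    return (size - kernel) // stride + 1


def trace_shapes(input_size=227, filters=(16, 32, 64)):
    """Return the (height, width, channels) shape after each conv/pool stage.

    Assumes pooling after every convolutional stage except the last,
    which is a hypothetical ordering consistent with two pooling layers.
    """
    shapes = [(input_size, input_size, 3)]
    size = input_size
    for i, n_filters in enumerate(filters):
        size = conv_out(size)
        shapes.append((size, size, n_filters))
        if i < len(filters) - 1:
            size = pool_out(size)
            shapes.append((size, size, n_filters))
    return shapes


shapes = trace_shapes()
```

With a 3x3 filter and padding of 1, each convolution preserves the spatial size, so only the two pooling layers reduce it (227 to 113 to 56 under these assumptions).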

3.4 Performance measures

The network’s output was analyzed and measured using the following metrics: accuracy, sensitivity, specificity, precision, F1_measure, and gmean, reported in (1)-(6), together with the AUC.

$$\begin{aligned} Accuracy (ACC) = \frac{(TP+TN)}{P+N} \end{aligned}$$
(1)
$$\begin{aligned} Sensitivity (RE) = \frac{TP}{P} \end{aligned}$$
(2)
$$\begin{aligned} Specificity (SP) = \frac{TN}{N} \end{aligned}$$
(3)
$$\begin{aligned} Precision (PRE) = \frac{TP}{(TP+FP)} \end{aligned}$$
(4)
$$\begin{aligned} F1\_measure (F1) = 2*\left( \frac{(precision*recall)}{(precision + recall)}\right) \end{aligned}$$
(5)
$$\begin{aligned} gmean = \sqrt{\frac{TP}{P}*\frac{TN}{N}} \end{aligned}$$
(6)

where P = TP + FN and N = TN + FP denote the total numbers of positive and negative samples. True Positive (TP) and True Negative (TN) refer to the number of predictions in which the classifier correctly predicts the positive class as positive and the negative class as negative, respectively. False Positive (FP) refers to the number of negative samples incorrectly predicted as positive, and False Negative (FN) to the number of positive samples incorrectly predicted as negative. The Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) indicate how well the model separates the classes across all possible thresholds. Since ROC curves and AUC scores are defined for binary classification, in this work the classes are compared using the One-vs-Rest (OvR) strategy [33]. In this scheme, the class under consideration (P, Positive) is compared with all the remaining classes taken together (N, Negative). OvR thus reduces a multiclass classification problem to multiple binary classification problems.
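As an illustration of how (1)-(6) can be evaluated per class under the OvR scheme, the sketch below derives TP, FN, FP, and TN from a multiclass confusion matrix. The function name `ovr_metrics`, the row/column convention (rows are true classes, columns are predicted classes), and the example matrix are assumptions for illustration and do not reproduce results from this study.

```python
import math


def ovr_metrics(cm, k):
    """Compute metrics (1)-(6) for class k treated as positive (One-vs-Rest).

    cm[i][j] is the number of samples of true class i predicted as class j
    (an assumed convention). The sketch assumes class k has at least one
    true sample and at least one predicted sample, so no division by zero.
    """
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # true class k, predicted as another class
    fp = sum(row[k] for row in cm) - tp   # other classes predicted as k
    tn = total - tp - fn - fp
    p, n = tp + fn, tn + fp
    recall = tp / p
    specificity = tn / n
    precision = tp / (tp + fp)
    return {
        "ACC": (tp + tn) / (p + n),
        "RE": recall,
        "SP": specificity,
        "PRE": precision,
        "F1": 2 * precision * recall / (precision + recall),
        "gmean": math.sqrt(recall * specificity),
    }


# Synthetic 3-class confusion matrix (made-up numbers, for illustration only).
cm = [
    [5, 0, 0],
    [1, 3, 1],
    [0, 2, 8],
]
metrics = ovr_metrics(cm, 0)
```

In this example, class 0 has no false negatives (its row has no off-diagonal entries), so its sensitivity is 1, while the single false positive lowers precision and specificity.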

4 Results and discussion

This section focuses on analyzing the outcomes obtained from the experiments conducted on the pre-processed dataset. Across all evaluated networks, we established the parameters as follows: utilizing the Adam solver, with a learning rate of 0.0001, a maximum of 15 epochs, a mini-batch size of 128, and the dataset labelled as Output-lead.

In Table 1, we summarize the overall results based on the best performance across the four experiments. In particular, the best performance was achieved for the binary classification (COVID-19 vs. N). The best neural network was ResNet-18, with 98.9% accuracy (highlighted in bold).

Table 1 Comparison between the used networks

One of the most intriguing findings, notably from the confusion matrix analysis, is the consistently accurate discrimination of the COVID-19 class. This implies an exceedingly rare occurrence of false negatives. The networks demonstrate a remarkable ability to discern differences among the different data classes, especially within the COVID-19 category. Furthermore, in alignment with our initial hypothesis, these results suggest the presence of a discernible pattern within ECG signals. This discovery holds promise for improving COVID-19 detection methods and potentially developing a novel, rapid, reliable, and non-invasive approach.

Figure 3 shows some examples of confusion matrices from various networks and experiments, with the first column displaying the perfect classification of the COVID-19 class and the subsequent absence of false negatives.

Fig. 3 Various confusion matrices, showcasing perfect COVID-19 classification

4.1 Comparison with the literature

In Table 2, we report the comparison with state-of-the-art multiclass classification approaches that use the same dataset presented in this work. Then, in Table 3, we summarize our results compared to the studies in the literature for binary classification.

ECGs were used to diagnose COVID-19 for the first time in [14]. To convert the 12-lead ECG to 2D colour images, the authors proposed a new method called hexaxial feature mapping (HexaxialFM). Then, the Gray-Level Co-Occurrence Matrix (GLCM) approach is employed to extract features and generate hexaxial mapping images, which are the input for an architecture modified from the AlexNet model. The results for multiclass classification were average ACC, PRE, RE, SP, AUC and F1 of 93%, 90.58%, 96%, 90%, 94.98% and 93.20%, respectively. For discrimination between COVID-19 and normal ECGs, the average results via the proposed architecture were an ACC of 96.20%, a PRE of 94.33%, a RE of 98.40%, a SP of 94%, an AUC of 99.15%, and an F1 of 96.30%.

Recently, [23] compared six different deep CNN models (ResNet18, ResNet50, ResNet101, InceptionV3, DenseNet201, and MobileNetv2) using gamma correction and pre-processing techniques for the ECG images in order to detect COVID-19, evaluating the performance of the networks for two-class, three-class and five-class classification. They achieved the best results for two-class classification with the DenseNet-201 neural network and for five-class classification with InceptionV3. DenseNet-201 achieved an ACC of 99.1% ± 0.44, a PRE of 99.11% ± 0.43, a RE of 99.1% ± 0.44, an F1 of 99.09% ± 0.44 and a SP of 96.9% ± 0.8. InceptionV3 reached an ACC of 97.83% ± 0.67, a PRE of 97.82% ± 0.67, a RE of 97.83% ± 0.67, an F1 of 97.82% ± 0.67 and a SP of 98.86% ± 0.49.

Table 2 Comparison of our best network with existing literature on multiclass classification

A new pipeline called ECG-BiCoNet was introduced by [24], employing five neural networks (ResNet-50, InceptionV3, Xception, Inception-ResNet, and DenseNet-201) for binary and multiclass classification, using deep features extracted from different layers of each CNN. Higher-layer features are fused using the discrete wavelet transform and combined with lower-layer features. The symmetrical uncertainty (SU) is used as a feature selection approach to reduce the dimension of the features. Finally, an ensemble classification system is constructed by combining the predictions of three ML classifiers: LDA, RF, and SVM. Their method reached an ACC of 91.73%, a RE of 91.7%, a PRE of 91.9%, and a SP of 95.9% for multiclass classification. For binary classification, they obtained an ACC, RE, PRE, and SP of 98.80%. ResNet-18 outperforms the existing methods in detecting COVID-19 via ECG traces in the binary classification task between normal and affected samples.

Table 3 Comparison of our best network with existing literature on binary classification

Our work and the examined studies utilized a diverse range of neural networks, including MobileNetV2, ResNet-18, ResNet-50, AlexNet, SqueezeNet, and a custom CNN, offering a comprehensive evaluation. The exploration of different DL models addresses the research questions concerning the feasibility of early COVID-19 detection using deep learning methods on ECG images. In the literature, few methods address both the binary and the multiclass classification of ECG features. A limitation of this work is the limited comparison with other state-of-the-art methods and architectures, which may overlook alternative approaches that could further improve performance.

5 Conclusions

The COVID-19 virus propagated rapidly and infected millions of people in every country on earth, resulting in millions of deaths. Contagion occurs easily, through liquid particles in the air; as a result, various prevention methods have been applied, including the use of masks, frequent hand washing, and social distancing. Molecular and rapid swabs are the two primary techniques for viral identification. Thanks to their excellent dependability and precision, these technologies are the “Gold Standard” used worldwide. However, they have several limitations, such as their cost in terms of time and resources, and they can be somewhat invasive.

Following the WH that suggests the link between cardiac symptoms and COVID-19 patients, exploring new approaches to detect the virus was necessary. This study aimed to utilize DL techniques to identify any correlations between the ECG signal images of COVID-19 patients and those unaffected by the virus.

The dataset consists of 12-lead RGB images in five categories: Normal, Abnormal, Myocardial infarction, previous history of Myocardial infarction, and COVID-19. The images were pre-processed in two phases: first, the ECG signal was extracted from the figure, and then the images were divided by lead (each image was one lead). Subsequently, several NNs were utilized, selected based on their composition and overall performance. MobileNetV2, ResNet-18, ResNet-50, AlexNet, SqueezeNet, and ourCNN have been selected for this research. Four experiments were carried out: first by comparing all five classes, then by comparing four classes, then by comparing three classes, and finally by performing a binary classification, while always keeping the COVID-19 category. ResNet-18 and SqueezeNet produced the most significant outcomes with 98.9% and 98.8% accuracy, respectively. The findings presented in this study address RQ1 and RQ2, which specifically pertain to the feasibility of early detection of COVID-19 through the application of DL approaches on ECG images. The outcomes shed light on the potential of DL methodologies to successfully identify early manifestations of COVID-19 within ECG data.

The findings suggest the existence of a correlation between the ECG signals of individuals who have contracted COVID-19. As a result, the NNs proposed in this study have the potential to be integrated into healthcare monitoring systems and could assist clinicians in the real-time detection of COVID-19. Additionally, the analysis could further explore the use of signals in 12 leads and search for any patterns associated with COVID-19 and correlations between the different categories in the dataset. Such research could contribute to developing more accurate and efficient diagnostic tools for COVID-19 and other heart diseases.