1 Introduction

The qualitative processing and classification of biomedical signals are very important for diagnosis and therapy. Many methods are used to process biomedical signals; important ones include the discrete Fourier transform (DFT), short-time Fourier transform (STFT), continuous wavelet transform (CWT), and discrete wavelet transform (DWT). The Fourier transform provides very good frequency resolution for stationary signals (Haberl et al. 1989). However, it carries almost no time-domain information, which can lead to serious problems, especially if time-dependent characteristics are to be inferred. In contrast, when signals are transformed with the wavelet transform, both frequency and time information are distinguishable (Li et al. 1995). In other words, the wavelet transform (WT) is a technique that splits a signal into different frequency components and analyzes each component in the time domain at its respective scale. In this study, we focus on electrocardiogram (ECG) signals, that is, the signals resulting from the electrical activity of the heart, the main vital organ in the human body. Sudden deaths from heart disease associated with coronavirus (COVID-19) are currently on the rise (https://www.chss.org.uk/media-release/new-nhs-figures-show-dangerous-domino-effect-of-pandemic-on-progress-made-with-strokes-and-heart-disease/). For this reason, the processing and analysis of the signals received from the heart are very important for rapid diagnosis and treatment. In conventional methods, a suitable sampling method is used in the pre-processing phase of ECG signals, and the signals are cleaned of noise. Then, the manual feature extraction phase begins, in which expert opinion is essential. This phase is critical because incorrect feature extraction can lead to misclassification of signals and serious errors in diagnosis and treatment. After all these phases are completed, classification is performed using traditional classification algorithms. However, recent studies show that this situation has changed with deep learning algorithms (Ozaltin et al. 2022; Özaltın and Yeniay 2021; Koc et al. 2022). Thanks to deep learning algorithms, successful classifications can be made automatically. In this way, the state of health of patients can be monitored with smartphones, watches, etc., even without an expert opinion.

The aim of this study is to recognize ECG types efficiently via a deep learning algorithm. First, we collect the dataset from PhysioNet databases (Physionet 2020). The dataset consists of three different types: arrhythmia (ARR), congestive heart failure (CHF), and normal sinus rhythm (NSR). A novel convolutional neural network (CNN) architecture, which is one of the deep learning algorithms, is proposed for automatic ECG signal classification. This newly proposed 34-layer CNN architecture is designed for two-dimensional images. In fact, it is intended not only for ECG classification but also for the classification of other biomedical signals and images. In this context, the ECG signals are transformed from one-dimensional signals into images by using a continuous wavelet transform (CWT) in the pre-processing phase. Three commonly used mother wavelet functions are applied in this transform: Amor, Bump, and Morse. The impact of these functions on classification performance is also examined. In addition, sample lengths of 360 Hz, 500 Hz, and 1000 Hz are examined to see whether the wave characteristics become more evident. Figure 1 shows the images (scalograms) obtained with the different sample lengths of ECG signals, 360 Hz, 500 Hz, and 1000 Hz, respectively. A total of nine different datasets are therefore obtained under these conditions. These datasets are classified separately with the same training options using the proposed CNN, AlexNet, and SqueezeNet. After identifying the best wavelet function, sample length, and architecture, we additionally investigate another pre-processing method, STFT, and measure ECG classification performance with different split methods: training and testing, and cross-validation. Finally, the proposed CNN is used as a deep feature extractor from images and combined with support vector machines (SVM) to obtain reliable results.

Fig. 1 Images of size 227 × 227 × 3 obtained with different sample lengths

In this study, a hybrid algorithm is proposed to detect ECG types from acquired images based on a deep learning algorithm and a machine learning algorithm. The main contributions and novelties of this study are as follows:

  • When using CWT, 500 Hz is found to be an efficient sample length for the conversion.

  • The Amor wavelet function yields higher performance than the others when applying CWT.

  • A new CNN architecture, referred to as the proposed CNN, is presented and compared with AlexNet and SqueezeNet; the proposed CNN achieves the highest performance.

  • To measure the performance of the proposed CNN, STFT is also used as a pre-processing method with different splitting methods: training and testing (80:20, 70:30) and k-fold cross-validation (k = 5, 10). CWT outperforms STFT, and cross-validation is the best splitting method.

  • To improve classification performance, the proposed CNN is used as a feature extractor, with features taken from both the fully connected layer and the maximum pooling layer.

  • The reduced features are classified using SVM.

  • Consequently, the highest performance in recognizing ECG types is achieved by the proposed CNN–SVM hybrid algorithm.

1.1 Related studies

Artificial intelligence is evolving day by day, and many studies have been conducted to classify ECG signals and other biomedical signals using CNN architectures. Khorrami and Moavenian (2010) applied the CWT, discrete wavelet transform (DWT), and discrete cosine transform (DCT) to ECG signals. In addition, they compared SVM with multi-layer perceptron (MLP) algorithms in the classification phase. In particular, they found that the combinations with MLP (CWT-MLP, DWT-MLP, DCT-MLP) are superior to those with SVM. Al Rahhal et al. (2018) transformed signals from different datasets using CWT to identify arrhythmias in ECG signals. They used the CNN algorithm and achieved an accuracy of 99% in the classification phase. Huang et al. (2019) converted ECG signals with STFT and obtained two-dimensional spectrograms. They used a CNN architecture to classify these spectrograms and achieved an accuracy of 99%. In addition, they classified the one-dimensional ECG signals using CNN and found an accuracy of 90.93%. Krak et al. (2020) transformed ECG signals into images using CWT and DWT. They classified the images using a CNN architecture and obtained an accuracy of 96% in the classification phase. Baloglu et al. (2019) designed a 10-layer end-to-end CNN architecture for the classification of multiclass one-dimensional ECG data and achieved an accuracy of 99.78%. Mahmud et al. (2020) created a CNN architecture for multiclass one-dimensional ECG data and obtained an accuracy of 99.28%. Salem et al. (2018) utilized the DenseNet architecture to classify transformed two-dimensional ECG data and achieved an accuracy of 97.23%. Zhao et al. (2020) proposed a CNN containing 24 layers for classifying transformed ECG data and achieved an accuracy of 87.1%. Xu and Liu (2020) created a CNN architecture to analyze ECG data recorded from a Holter device and achieved an accuracy of 99.4%. Rajkumar et al. (2019) suggested a CNN architecture for one-dimensional ECG data using exponential linear unit (ELU) activation layers and achieved an accuracy of 93.6%. Hua et al. (2020) developed a CNN architecture for one-dimensional ECG signals and achieved an accuracy of 97.45%. Kiranyaz et al. (2015) proposed a CNN architecture for patient-specific real-time one-dimensional ECG classification and achieved an accuracy of 96.4%. Chen et al. (2020) suggested a CNN + long short-term memory (LSTM) model that can classify six kinds of ECG fragments. They classified two ECG databases, the MIT-BIH arrhythmia database and the MIT-BIH arrhythmia database + Challenge2017, and achieved accuracies of 99.32% and 97.15%, respectively, using CNN + LSTM. Sandeep et al. (2019) utilized a CNN architecture to classify ECG data and achieved an accuracy of 90.63%. Furthermore, machine learning algorithms such as support vector machine (SVM), K-nearest neighbors (KNN), decision tree (DT), extreme learning machine (ELM), ensemble learning, and multi-layer perceptron (MLP) have been used to classify ECG signals by many other researchers (Alickovic and Subasi 2015; Qaisar and Subasi 2020; Tuncer et al. 2022; Ceylan and Özbay 2007; Pławiak and Acharya 2020). Additionally, Table 1 shows recent studies on ECG signal classification.

Table 1 Recent Studies on ECG signals classification

The rest of the study is organized as follows: In Section 2, we present the materials and methods. Then, we explain the dataset, experimental setup, performance metrics, and experimental results in Section 3. Next, we discuss the results in Section 4. Finally, we conclude the study and outline future work.

2 Materials and methods

In this section, we first present the pre-processing methods. Next, we introduce CNN, the proposed CNN, and the pre-trained architectures AlexNet (Krizhevsky et al. 2012) and SqueezeNet (Iandola et al. 2016). Finally, we present SVM and the proposed CNN–SVM architecture for classifying the ECG dataset. Figure 2 shows the framework of this study.

Fig. 2 Flowchart of this study

2.1 Pre-processing methods

In this study, we propose a novel CNN that requires images as input; therefore, we transform the one-dimensional signals into two-dimensional image datasets via the continuous wavelet transform (CWT) and the short-time Fourier transform (STFT).

2.1.1 Max–min normalization

We first normalize the raw one-dimensional ECG signals using the min–max normalization method given in Eq. (1):

$$ X = \frac{{{\text{signal}} - \min ({\text{signal}})}}{{\max ({\text{signal}}) - \min ({\text{signal}})}} $$
(1)

where \(X\) denotes the normalized ECG signal, and \(\min (.)\) and \(\max (.)\) return the minimum and maximum values of the signal, respectively.
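
As an illustration of Eq. (1), the following minimal Python sketch rescales a signal to the range [0, 1]. The function name `min_max_normalize`, the handling of a constant signal, and the toy samples are assumptions made for this example; they are not taken from the study.

```python
def min_max_normalize(signal):
    """Scale a 1-D signal to [0, 1] following Eq. (1)."""
    lo, hi = min(signal), max(signal)
    if hi == lo:
        # Assumption: a constant signal has no range to scale, so map it to zeros.
        return [0.0 for _ in signal]
    return [(s - lo) / (hi - lo) for s in signal]


# Toy samples (made-up values, not taken from the PhysioNet recordings).
toy_ecg = [-0.5, 0.0, 1.5, 0.5]
normalized = min_max_normalize(toy_ecg)
print(normalized)  # [0.0, 0.25, 1.0, 0.5]
```

The smallest sample maps to 0 and the largest to 1, so signals recorded with different amplitudes become comparable before the transform step.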

2.1.2 Continuous wavelet transform

The continuous wavelet transform (CWT) allows simple analysis of the frequency components of a signal and, because it also maps the signal along the time axis, can transform a one-dimensional signal into a two-dimensional scalogram. The mathematical formulations of the CWT and of the wavelet family are given in Eq. (2) and Eq. (3), respectively,

$$ {\text{CWT}}\left( {a,b} \right) = \left\langle {f,\psi_{a,b} } \right\rangle = \int_{ - \infty }^{ + \infty } {f\left( t \right)\psi_{a,b}^{*} (t){\text{d}}t} $$
(2)
$$ \psi_{a,b} \left( t \right) = \frac{1}{\sqrt a }\psi \left( {\frac{t - b}{a}} \right) $$
(3)

where \(f(t)\) is a continuous signal, here the ECG signal, \(\psi (t)\) is the mother wavelet function and \(\psi_{a,b} (t)\) is its scaled and shifted version, \(a\) is the scale parameter, \(b\) is the shift (translation) parameter, and the symbol * denotes the complex conjugate (Lee and Choi 2019). Besides, \(\left\langle {f,\psi_{a,b} } \right\rangle\) denotes the inner product in Eq. (2). Substituting Eq. (3) into Eq. (2), \({\text{CWT}}\left( {a,b} \right)\) takes the form given in Eq. (4):

$$ {\text{CWT}}\left( {a,b} \right) = \frac{1}{\sqrt a }\int\limits_{ - \infty }^{ + \infty } {f(t)\psi^{*} \left( {\frac{t - b}{a}} \right){\text{d}}t} $$
(4)

The signal \(f(t)\) can be recovered from \({\text{CWT}}\left( {a,b} \right)\) by the inverse transform, as follows:

$$ f\left( t \right) = \frac{1}{C}\int\limits_{ - \infty }^{ + \infty } {\int\limits_{ - \infty }^{ + \infty } {{\text{CWT}}\left( {a,b} \right)\frac{{\psi_{a,b} \left( t \right)}}{{\left| a \right|^{3/2} }}} } \,{\text{d}}a\,{\text{d}}b $$
(5)

where C indicates the normalization constant depending on the choice of the mother wavelet function in Eq. (5) (Lee and Choi 2019).

Some commonly used mother wavelet functions are given in Eqs. (6–8):

$$ \psi_{{{\text{Morl}}}} \left( t \right) = e^{2\pi it} e^{{ - \frac{{t^{2} }}{{2\sigma^{2} }}}} = \left( {\cos 2\pi t + i\sin 2\pi t} \right)e^{{ - \frac{{t^{2} }}{{2\sigma^{2} }}}} $$
(6)
$$ \psi_{{{\text{Mexh}}}} \left( t \right) = \left( {1 - \frac{{t^{2} }}{{\sigma^{2} }}} \right)e^{{ - \frac{{t^{2} }}{{2\sigma^{2} }}}} $$
(7)
$$ \psi_{{{\text{Bump}}}} \left( {ab} \right) = e^{{\left( {1 - \frac{1}{{1 - \left( {ab - \mu } \right)/\sigma^{2} }}} \right)}} \chi \left[ {\mu - \sigma ,\mu + \sigma } \right] $$
(8)

Here, \(\psi_{{{\text{Morl}}}} \left( t \right)\), \(\psi_{{{\text{Mexh}}}} \left( t \right)\), and \(\psi_{{{\text{Bump}}}} \left( {ab} \right)\) denote the Morlet, Mexican hat, and Bump mother wavelet functions, respectively (Lee and Choi 2019).
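
To make the transform concrete, the sketch below approximates the CWT integral by a Riemann sum, using the Morlet wavelet of Eq. (6) and the complex conjugate that appears in Eq. (2). It is a plain-Python illustration under stated assumptions, not the toolbox routine used in this study: the function names, \(\sigma = 1\), the 50 Hz sampling rate, and the 5 Hz test tone are all chosen only for the example. Because the Morlet wavelet of Eq. (6) oscillates once per unit time, a scale \(a\) responds most strongly to frequencies near \(1/a\).

```python
import cmath
import math


def morlet(t, sigma=1.0):
    """Morlet mother wavelet of Eq. (6)."""
    return cmath.exp(2j * math.pi * t) * math.exp(-t * t / (2.0 * sigma ** 2))


def cwt(signal, fs, scales, sigma=1.0):
    """Riemann-sum approximation of the CWT integral.

    Returns rows[i][k] = |CWT(scales[i], k / fs)|, i.e. a scalogram.
    """
    dt = 1.0 / fs
    times = [n * dt for n in range(len(signal))]
    rows = []
    for a in scales:
        row = []
        for b in times:
            acc = 0j
            for t, f_t in zip(times, signal):
                # Conjugated, scaled, and shifted wavelet.
                acc += f_t * morlet((t - b) / a, sigma).conjugate()
            row.append(abs(acc) * dt / math.sqrt(a))
        rows.append(row)
    return rows


# Assumed test signal: one second of a 5 Hz sine sampled at 50 Hz.
fs = 50
signal = [math.sin(2.0 * math.pi * 5.0 * n / fs) for n in range(fs)]
scales = [0.4, 0.2, 0.1]  # roughly 2.5 Hz, 5 Hz, and 10 Hz
scalogram = cwt(signal, fs, scales)

center = len(signal) // 2
strongest = max(range(len(scales)), key=lambda i: scalogram[i][center])
print(scales[strongest])  # 0.2, the scale matching the 5 Hz tone
```

Plotting the rows of `scalogram` against time gives the kind of two-dimensional image that is later fed to the CNN.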

2.1.3 Short-time Fourier transform (STFT)

Short-time Fourier transform (STFT) is also a transformation method. The STFT is derived from the discrete Fourier transform (DFT) and is used to find the instantaneous frequency and instantaneous amplitude of localized waves with time-varying characteristics (Huang et al. 2019; Haykin and Veen 1999). The STFT uses a window function to extract time-domain information (Toma and Choi 2022). The window function is defined on a certain interval, and its value outside this interval is zero (Toma and Choi 2022). To calculate the frequency-domain information, the window function is shifted over the entire non-stationary signal and multiplied with the signal at each position (Haykin and Veen 1999; Toma and Choi 2022). The time–frequency spectrogram of a discretized non-stationary signal can then be computed as given in Eq. (9) (Toma and Choi 2022),

$$ {\text{STFT}}\left\{ {x\left[ n \right]} \right\} = X\left( {m,\omega } \right) = \sum\limits_{n = - \infty }^{\infty } {x\left[ n \right]w\left[ {n - m} \right]e^{ - j\omega n} } $$
(9)

where \(x\left[ n \right]\) denotes the signal and \(w\left[ n \right]\) is the window function. In this study, we utilize the Kaiser window function with a window size of 500 Hz. Thus, we convert the ECG signals into ECG spectrogram images with dimensions of 227 × 227 × 3.
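
The following plain-Python sketch evaluates Eq. (9) directly on a grid of frame positions \(m\) and DFT frequencies \(\omega = 2\pi k/N\), with a Kaiser window. It is only an illustration: the function names, the window length of 32 samples, the shape parameter `beta = 5.0`, the hop size, and the 8 Hz test tone are assumptions for the example and are not the settings of this study.

```python
import cmath
import math


def bessel_i0(x, terms=25):
    """Zeroth-order modified Bessel function via its power series."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kaiser(length, beta):
    """Kaiser window w[0..length-1]; beta is an assumed shape parameter."""
    denom = bessel_i0(beta)
    window = []
    for n in range(length):
        ratio = 2.0 * n / (length - 1) - 1.0
        window.append(bessel_i0(beta * math.sqrt(max(0.0, 1.0 - ratio * ratio))) / denom)
    return window


def stft(x, window, hop):
    """Magnitude of Eq. (9) for frame starts m and bins k = 0..N/2."""
    n_win = len(window)
    frames = []
    for m in range(0, len(x) - n_win + 1, hop):
        spectrum = []
        for k in range(n_win // 2 + 1):
            omega = 2.0 * math.pi * k / n_win
            acc = sum(
                x[m + i] * window[i] * cmath.exp(-1j * omega * (m + i))
                for i in range(n_win)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


# Assumed test signal: an 8 Hz tone sampled at 64 Hz for two seconds.
fs, n_win = 64, 32
x = [math.sin(2.0 * math.pi * 8.0 * n / fs) for n in range(2 * fs)]
spectrogram = stft(x, kaiser(n_win, beta=5.0), hop=16)
peak_bins = [max(range(len(f)), key=f.__getitem__) for f in spectrogram]
print(peak_bins)  # every frame peaks at bin 4, i.e. 4 * fs / n_win = 8 Hz
```

Stacking the frames side by side, with time on one axis and frequency bin on the other, produces the spectrogram image.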

2.2 Convolutional neural network (CNN)

The convolutional neural network (CNN) is a specialized deep learning approach for analyzing two-dimensional data. It is a preferred algorithm not only for the analysis of multidimensional data but also for one-dimensional data. Other classification and clustering algorithms are difficult to apply to real-time data due to their computational complexity (Narin 2020). For this reason, deep learning technology, which can overcome this complexity, evolves day by day. Moreover, CNN can perform feature extraction and classification automatically using raw data, so deep learning algorithms are very popular in the field of artificial intelligence. Further, researchers have found that it gives very good results in classification studies involving both big data and small data. Thanks to the CNN algorithm, ECG signals can be analyzed and observed on smartphones, watches, Holter monitoring devices, etc. (Huang et al. 2019).

The CNN processes an image through different layers and extracts its features. The most commonly used layers are the following (Baloglu et al. 2019; Lee and Choi 2019; Acharya et al. 2017):

  1. Convolution layer,

  2. Nonlinear layer,

  3. Pooling layer,

  4. Flattening layer,

  5. Fully connected layer.

These layers work as follows:

  1. Convolutional Layer: This is the layer where the features of the image are determined. To determine more than one feature, the number of convolutional layers increases in the same proportion. This layer is the main building block of CNN.

  2. Nonlinear Layer: This layer is also known as the activation layer. It activates the system with nonlinear functions. The rectified linear unit (ReLU) function has been preferred in recent years and is widely used because it is faster than the alternatives.

  3. Pooling Layer: Smaller matrices are obtained while preserving the properties of the existing input. In this way, the computational complexity is reduced.

  4. Flattening Layer: The matrix-format data obtained from the previous step are prepared for the fully connected layer.

  5. Fully Connected Layer: It is the most important layer of a convolutional neural network. The data are taken from the flattening layer and passed to the neural network, where the learning process is performed.
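
The five layer types listed above can be sketched as a single forward pass in plain Python. This toy example is only meant to show how data flow from one layer to the next; the function names, the 5 × 5 input, the hand-picked 2 × 2 kernel, and the fully connected weights are assumptions for the illustration and have nothing to do with the trained networks in this study.

```python
def conv2d(image, kernel):
    """Valid 2-D convolution (cross-correlation, as in most CNN libraries)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Nonlinear layer: negative responses are set to zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Pooling layer: keep the maximum of each non-overlapping size x size block."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


def flatten(fmap):
    """Flattening layer: matrix to vector."""
    return [v for row in fmap for v in row]


def fully_connected(vector, weights, bias):
    """Fully connected layer: one weighted sum per output unit."""
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


# Toy 5 x 5 image with a vertical edge between columns 1 and 2.
image = [[0.0, 0.0, 1.0, 1.0, 1.0] for _ in range(5)]
kernel = [[-1.0, 1.0], [-1.0, 1.0]]  # responds to dark-to-bright edges

features = flatten(max_pool(relu(conv2d(image, kernel))))
scores = fully_connected(features, [[0.5, 0.0, 0.5, 0.0], [0.0, 1.0, 0.0, 1.0]], [0.0, 0.0])
print(features)  # [2.0, 0.0, 2.0, 0.0]
print(scores)    # [2.0, 0.0]
```

In a real network the kernel and the fully connected weights are learned during training rather than chosen by hand.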

2.3 Pre-trained architectures: AlexNet and SqueezeNet

AlexNet (Krizhevsky et al. 2012) has five convolution layers combined with max-pooling layers and three fully connected layers. It also includes a dropout layer and a softmax layer. Moreover, each layer is activated with the ReLU activation function. When it was introduced in 2012, it used the ReLU activation function in place of the tanh function (Abdelmalek et al. 2019), which was seen to accelerate the architecture. The total number of parameters is 62.3 million, and the input image size is 227 × 227.

SqueezeNet (Iandola et al. 2016) starts with a standalone convolutional layer (conv1), followed by eight fire modules, and ends with a final convolutional layer (conv10). In total, the recently presented version consists of ten convolutional layers, some max-pooling layers, and a SoftMax layer.

In this study, a novel CNN architecture is presented in the next section, and it is compared with AlexNet and SqueezeNet on the different datasets created.

2.4 Novel proposed CNN architecture

A CNN architecture usually consists of an input layer, some convolutional layers, some pooling layers, and a fully connected layer (Krak et al. 2020). In this study, we introduce a novel CNN architecture. It has seven convolutional layers, seven batch normalization layers, seven activation layers (ReLU), seven maximum pooling layers, and two fully connected layers with one dropout layer. Additionally, a SoftMax layer and a classification layer with an entropy approach are used. The convolution layers are utilized for feature extraction from the ECG image datasets. This is important because good feature extraction leads to accurate classification. Essentially, these layers act as filters that enhance the features of the primary signal while reducing the noise (Hua et al. 2020; Li et al. 2018). The pooling layers reduce the dimensions of the input images and prepare them for the next layer. Finally, the extensive features in the fully connected layers are reduced by the dropout layer with a probability of 0.5 and transferred to the SoftMax layer for classification. Details of the parameters of the proposed CNN are given in Table 2.

Table 2 Proposed CNN architecture details

The proposed CNN is a novel architecture with its own filter sizes, numbers of filters, strides, and padding. Fundamentally, we develop the architecture for biomedical image classification. However, like other CNN architectures, it has also been tested on well-known classical datasets such as CIFAR-10. Additionally, it has been applied to the Physikalisch-Technische Bundesanstalt (PTB) Diagnostic ECG Database (Özaltın and Yeniay 2021; Goldberger et al. 2000). The proposed CNN has been applied not only to signals but also to brain computed tomography images, as detailed in Ozaltin et al. (2022), where it is named OzNet. The architecture achieves successful performance on these datasets (Fig. 3).

Fig. 3 Transformed images using STFT

In this study, the proposed CNN is compared with AlexNet and SqueezeNet using the same fine-tuning parameters. Stochastic gradient descent with momentum (sgdm) is used as the optimization algorithm, with a momentum parameter of 0.95 and a constant learning rate of 0.0001. Figure 4 shows the scheme of the proposed CNN.
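
For readers unfamiliar with the optimizer, the sketch below shows one common formulation of the momentum update with the momentum (0.95) and learning rate (0.0001) stated above. The function name `sgdm_step` and the toy loss \(L(\theta) = \theta^{2}\) are assumptions for the illustration; the training in this study relies on the toolbox implementation, whose internal details may differ.

```python
def sgdm_step(params, grads, velocity, lr=1e-4, momentum=0.95):
    """One stochastic-gradient-descent-with-momentum update.

    velocity <- momentum * velocity - lr * gradient
    params   <- params + velocity
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


# Toy problem (an assumption for illustration): minimize L(theta) = theta^2,
# whose gradient is 2 * theta.
theta, velocity = [1.0], [0.0]
history = [theta[0]]
for _ in range(2):
    grads = [2.0 * theta[0]]
    theta, velocity = sgdm_step(theta, grads, velocity)
    history.append(theta[0])
print(history)  # decreasing values: the second step is larger than the first
```

The momentum term reuses 95% of the previous step, so consecutive gradients that point in the same direction produce progressively larger updates.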

Fig. 4 Proposed CNN architecture

2.5 Deep feature extraction

In this study, the proposed CNN can extract features from images effectively. Therefore, we use it as both a classifier and a deep feature extractor. Although the results are quite good when it is used as a classifier, we aim to improve them further. To this end, we design a hybrid algorithm that combines the proposed CNN and SVM: the proposed CNN serves as an automatic feature extractor for ECG images, and SVM is employed as the classifier. In brief, the steps are as follows: (i) The proposed CNN is first trained on ECG images. (ii) Reduced features are obtained from the fully connected layer of the proposed CNN, and 4096 features are collected for each image. (iii) To classify with these features, the dataset is split into a 30% training set and a 70% testing set. This is because we want to obtain trustworthy classification results, as the dropout layer would not have much influence (Elleuch et al. 2016; Srivastava et al. 2014). Then, the trained network is run to obtain the feature activations. (iv) The SVM classifier is employed to detect the ECG type. The same steps are followed when the reduced features are taken from the maximum pooling (Max-Pooling 7) layer. Figure 5 shows the scheme of the proposed CNN–SVM.

Fig. 5 Proposed CNN–SVM algorithm

2.6 Support vector machine (SVM)

Support vector machine (SVM) is a machine learning algorithm that uses a kernel-based method to separate datasets effectively for classification or regression (Koklu and Ozkan 2020). It was developed by Cortes and Vapnik (1995) for two classes. The algorithm was later extended and generalized to multiclass and nonlinear datasets. In general, the dataset can be separated in a high-dimensional feature space with a kernel function. Also, SVM can cope with datasets that are hard to separate and with overfitting. The most common representation of the SVM function is \(f(x) = w^{T} \phi \left( x \right) + b\), where \(w \in R^{n}\), \(b \in R\), and \(\phi \left( x \right)\) is a feature map.
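
In practice, the feature map \(\phi\) is rarely computed explicitly. With a kernel \(K(x, z) = \phi (x)^{T} \phi (z)\), the decision function is commonly evaluated in its dual form as a weighted sum of kernel values over the support vectors. The sketch below illustrates this for a two-class problem with the Gaussian kernel that is used later in the experimental setup. The support vectors, dual coefficients, bias, and `gamma` are hand-picked assumptions for the example, not values learned from the ECG features.

```python
import math


def gaussian_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def decision_function(x, support_vectors, dual_coefs, bias, gamma=0.5):
    """Dual-form SVM output: sum_i coef_i * K(sv_i, x) + bias.

    Each dual coefficient is the product of a Lagrange multiplier and the
    class label (+1 or -1) of the corresponding support vector.
    """
    return sum(
        c * gaussian_kernel(sv, x, gamma) for sv, c in zip(support_vectors, dual_coefs)
    ) + bias


# Hand-picked toy model (assumption): one support vector per class.
support_vectors = [[0.0, 0.0], [2.0, 2.0]]
dual_coefs = [1.0, -1.0]
bias = 0.0

near_positive = decision_function([0.0, 0.0], support_vectors, dual_coefs, bias)
near_negative = decision_function([2.0, 2.0], support_vectors, dual_coefs, bias)
midpoint = decision_function([1.0, 1.0], support_vectors, dual_coefs, bias)
print(near_positive > 0, near_negative < 0)  # True True
```

The sign of the output gives the predicted class; a point equidistant from both support vectors, such as the midpoint here, lies on the decision boundary. Multiclass problems are usually handled by combining several such two-class machines.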

3 Results

3.1 ECG dataset

In this study, we use three different ECG datasets from PhysioNet databases (Physionet 2020). Each raw ECG dataset is taken with a signal length of 1 h and sampled at 128 Hz. The first ECG dataset consists of the ECG recordings from 48 patients, which contain two leads. It is taken from the MIT-BIH Arrhythmia Database and referred to as ARR (Goldberger et al. 2000; Moody and Mark 2001). The next ECG dataset consists of the ECG recordings from 15 patients, which contain two leads. It comes from the BIDMC Congestive Heart Failure Database and is named CHF (Goldberger et al. 2000; Baim et al. 1986). The final ECG dataset consists of the ECG recordings from 18 patients, containing two leads. It is obtained from the MIT-BIH Normal Sinus Rhythm Database and referred to as NSR (Goldberger et al. 2000). There are a total of 96 ARR, 30 CHF, and 36 NSR signals in the ECG dataset. In its raw one-dimensional form, this dataset is not suitable for the proposed convolutional neural network, which requires images as input. That is why we convert the signals into images. First, we normalize the dataset using the max–min normalization method. Next, the one-dimensional ECG signals are transformed into images using CWT with different sample lengths of the signals: 360 Hz, 500 Hz, and 1000 Hz. This allows us to compare which sample length better reveals the differences. Besides, three different mother wavelet functions, Amor, Bump, and Morse, are applied to each sample length to compare which mother wavelet function better detects the differences. Each image is resized to 227 × 227 × 3 and saved in .jpg format. In this way, we create nine different balanced datasets, one for each combination of mother wavelet function and sample length. Each dataset contains 900 images, and each class (ARR, CHF, and NSR) includes 300 images. After that, to compare the results, we also use the STFT method to turn the signals into images. This dataset also consists of 900 images, and each class contains 300 images.

3.2 Experimental setup

In this study, we run AlexNet, SqueezeNet, and the proposed CNN to classify the ECG datasets. We use two kinds of splitting methods, training and testing sets and cross-validation, to compare classification performance. First, the dataset is split conventionally into training and testing sets with 80:20 and 70:30 percentages. Next, k-fold cross-validation is performed with k values of 5 and 10. Further, we use the proposed CNN to automatically extract deep features. They are taken from the fully connected layer (FC-8) and the maximum pooling layer (Max-Pooling 7), respectively. To classify these reduced features, we use an SVM with a Gaussian kernel function to detect the ECG type from images. Therefore, we present a comprehensive study that effectively determines the ECG type.
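
The two splitting strategies can be sketched on image indices as follows. This is a minimal plain-Python illustration: the function names and the fixed random seed are assumptions, and the sketch shuffles indices without stratifying by class, since the text does not state how the splits were drawn.

```python
import random


def train_test_split(indices, train_fraction, seed=0):
    """Shuffle the indices and cut them into a training and a testing part."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def k_fold(indices, k, seed=0):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


images = range(900)  # one index per image in a 900-image dataset

train_80, test_20 = train_test_split(images, 0.8)
train_70, test_30 = train_test_split(images, 0.7)
print(len(train_80), len(test_20))  # 720 180
print(len(train_70), len(test_30))  # 630 270

fold_sizes = {k: [len(test) for _, test in k_fold(images, k)] for k in (5, 10)}
print(fold_sizes[5])   # [180, 180, 180, 180, 180]
print(fold_sizes[10])  # ten folds of 90 images each
```

With cross-validation every image is used for testing once, which is why its averaged metrics are generally considered less dependent on a single lucky or unlucky split.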

3.3 Performance metrics

In this study, we evaluate the CNN architectures with the following performance metrics: accuracy, sensitivity, specificity, precision, and F1-score, given in Eqs. (10–14) (Xu and Liu 2020; Abdelmalek et al. 2019):

$$ {\text{Accuracy}} = \frac{{{\text{TP}} + {\text{TN}}}}{{{\text{TP}} + {\text{TN}} + {\text{FP}} + {\text{FN}}}} \times 100\% $$
(10)
$$ {\text{Sensitivity}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FN}}}} \times 100\% $$
(11)
$$ {\text{Specificity}} = \frac{{{\text{TN}}}}{{{\text{TN}} + {\text{FP}}}} \times 100\% $$
(12)
$$ {\text{Precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}} \times 100\% $$
(13)
$$ {\text{F1 - Score}} = \frac{2 \times {\text{Precision}} \times {\text {Sensitivity}}}{{\text{Precision} + {\text{Sensitivity}}}} \times 100\% $$
(14)

where TP, FP, TN, and FN denote the numbers of true positives, false positives, true negatives, and false negatives, respectively.
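
For a three-class problem such as ARR, CHF, and NSR, these quantities are typically computed per class in a one-vs-rest manner from the confusion matrix. The sketch below illustrates Eqs. (10–14) under that assumption. The function name and the confusion matrix are made up for the example and are not results of this study; the sketch also assumes that no denominator is zero.

```python
def one_vs_rest_metrics(confusion, cls):
    """Per-class metrics in percent.

    confusion[i][j] = number of samples of true class i predicted as class j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp
    fp = sum(confusion[r][cls] for r in range(n)) - tp
    tn = total - tp - fn - fp
    sensitivity = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * sensitivity,
        "specificity": 100.0 * tn / (tn + fp),
        "precision": 100.0 * precision,
        "f1": 100.0 * 2.0 * precision * sensitivity / (precision + sensitivity),
    }


# Made-up confusion matrix (rows: true class, columns: predicted class).
toy_confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
class0 = one_vs_rest_metrics(toy_confusion, 0)
print(class0["accuracy"], class0["sensitivity"], class0["specificity"])  # 90.0 80.0 95.0
```

For class 0 in this toy matrix, TP = 8, FN = 2, FP = 1, and TN = 19, which gives a precision of 8/9 and an F1-score of 16/19.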

3.4 Experimental results

This study is conducted in a MATLAB 2021b environment with an Intel Core i7-7500U CPU, an NVIDIA GeForce GTX 950M GPU, 16 GB RAM, and a 64-bit operating system. The aim of this study is to identify ECG types via CNN architectures and the designed hybrid algorithm. First, nine different ECG image datasets are created using CWT, and each is classified using AlexNet, SqueezeNet, and the proposed CNN with the same training options and an 80:20 training and testing split. In addition, the obtained results are tested with the Wilcoxon signed rank test. Tables 3 and 4 show both the performance results and the paired comparisons for statistical significance. Besides, all comparisons are illustrated in Fig. 6.

Table 3 Wilcoxon signed rank test for proposed CNN and AlexNet
Table 4 Wilcoxon signed rank test for proposed CNN and SqueezeNet

When Table 3 is examined with respect to the sample length of the ECG, AlexNet reaches its maximum accuracy of 94.67% at a sample length of 500 Hz, and the proposed CNN achieves its maximum accuracy of 98.00% at the same sample length. SqueezeNet also achieves its maximum accuracy of 94.67% at a sample length of 500 Hz, as given in Table 4. Therefore, we can conclude that 500 Hz is the best sample length for the ECG.

When Tables 3 and 4 are examined in terms of the mother wavelet function, Amor and Morse provide similar classification results for AlexNet and the proposed CNN. However, this does not apply to SqueezeNet, for which Bump is found to be the best mother wavelet function. So, researchers who want to use SqueezeNet can choose the Bump wavelet function when performing CWT. For the proposed CNN, Tables 3 and 4 show that Amor is the best choice for classifying the images.

Although the results are quite good, we test their reliability using a nonparametric method, the Wilcoxon signed rank test, at a significance level of 0.05. The null hypothesis is that there is no difference between AlexNet and the proposed CNN, and the alternative hypothesis is that there is a difference between them. The obtained p value is 0.018 < 0.05; hence, the null hypothesis is rejected. Therefore, we can say that there is a statistically significant difference between AlexNet and the proposed CNN.

Similarly, although the results given in Table 4 are rather good, we test their reliability using the Wilcoxon signed rank test. The null hypothesis is that there is no difference between SqueezeNet and the proposed CNN, and the alternative hypothesis is that there is a difference between them. The obtained p value is 0.024 < 0.05; hence, the null hypothesis is rejected. Thus, we can say that there is a statistically significant difference between SqueezeNet and the proposed CNN.
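
To illustrate how such a paired test works, the sketch below computes the Wilcoxon signed rank statistic and an exact two-sided p value by enumerating all sign patterns, which is feasible for a small number of pairs such as nine datasets. It is an illustrative plain-Python version under stated assumptions (zero differences are dropped, ties receive midranks) and is not the statistical routine used to obtain the p values reported above. The accuracy values in the example are made up.

```python
from itertools import product


def wilcoxon_signed_rank(x, y):
    """Exact two-sided Wilcoxon signed rank test for paired samples.

    Returns (W, p) where W = min(W+, W-). Assumption: zero differences are
    dropped and tied absolute differences receive midranks.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0  # midrank of positions i..j
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)
    # Under the null hypothesis every sign pattern is equally likely.
    total = sum(ranks)
    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        s = sum(r for r, keep in zip(ranks, signs) if keep)
        if min(s, total - s) <= w:
            extreme += 1
    return w, extreme / 2 ** len(ranks)


# Made-up paired scores for nine datasets (not the values of Tables 3 and 4).
model_a = [2, 4, 6, 8, 10, 12, 14, 16, 18]
model_b = [1, 2, 3, 4, 5, 6, 7, 8, 9]
w_stat, p_value = wilcoxon_signed_rank(model_a, model_b)
print(w_stat, p_value)  # 0 0.00390625
```

In this toy case the first model is better on all nine datasets, so only 2 of the 512 sign patterns are as extreme as the observed one, giving p = 2/512. A p value below the chosen significance level leads to rejecting the null hypothesis of no difference.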

As a result, the proposed CNN is the best choice to classify ECG datasets while using CWT and 80:20 training and testing percentages. Figures 6 and 7 display performance graphs for classification. In addition, Table 5 details the results with other performance metrics for each class.

Fig. 6 Performance comparison of different sample lengths and mother wavelet functions using CNN architectures

Fig. 7 Performance comparison of different sample lengths and mother wavelet functions using the proposed CNN

Table 5 Performance metrics of proposed CNN, AlexNet, and SqueezeNet architectures

When all of the performance metrics in Table 5 are examined, the metrics of the proposed CNN are all above 96%. Specifically, for NSR, the specificity and precision reach 100%. In addition, its other metrics are also above 98%.

When the metrics on which the classifiers performed well are examined, it is noticeable in Table 5 that the F1-score of the proposed CNN is superior to those of the others. Therefore, the proposed CNN is determined to be the best classifier in terms of performance metrics.

As a result of this part, the best signal length, mother wavelet function, and architecture are determined to be 500 Hz, Amor, and the proposed CNN, respectively. Thus, based on these findings, only the corresponding ECG dataset is classified in the following experiments. In addition, Fig. 8 shows the accuracy and loss curves of the proposed CNN when the signal length is 500 Hz and the wavelet function is Amor.

Fig. 8 Accuracy rate and loss graph of training progress using the proposed CNN

Having determined the proposed CNN as the best architecture for classifying ECG images, we examine the impact of other split methods on performance. First, the ECG image dataset created with 500 Hz sample length and Amor wavelet function using CWT is divided into 80:20 and 70:30 training and test sets, respectively, and then, we use a fivefold and tenfold cross-validation. The results are shown in Tables 6, 7, and 8.

Table 6 Proposed CNN performance metrics over five training sessions with an 80:20 training and testing split using CWT
Table 7 Proposed CNN performance metrics over five training sessions with a 70:30 training and testing split using CWT
Table 8 Proposed CNN performance metrics with fivefold and tenfold cross-validation using CWT

According to Table 6, all mean performance metrics are above 96.52%, and the maximum standard deviation (Std) is 0.0173. These are the results when the proposed architecture is trained and tested with the conventional 80:20 split to classify images.

According to Table 7, all mean performance metrics are above 95.3%, and the maximum standard deviation (Std) is 0.01705. Thus, of the two conventional splits, the 80:20 training and testing split performs better for classifying ECG images obtained with CWT.

According to Table 8, the maximum average accuracy of 97.22% is obtained through tenfold cross-validation. For CWT, the results indicate that cross-validation performs better than the hold-out split for training and testing. Very high performance for classifying ECG images is thus achieved using CWT and the proposed CNN. However, we would also like to see how other pre-processing methods affect the performance of the proposed CNN under the same splitting methods. Therefore, we use the STFT method, which is widely used. Its performance is shown in Tables 9, 10, and 11.

Table 9 Proposed CNN performance metrics over five training sessions with an 80:20 training and testing split using STFT
Table 10 Proposed CNN performance metrics over five training sessions with a 70:30 training and testing split using STFT
Table 11 Proposed CNN performance metrics with fivefold and tenfold cross-validation using STFT

According to Table 9, all average performance metrics are above 89.3% and the maximum standard deviation (Std) is 0.013438. Here, the proposed architecture is trained and tested in the conventional hold-out manner, with an 80:20 split, to classify images generated using STFT.

According to Table 10, all average performance metrics are above 89.2% and the maximum standard deviation (Std) is 0.0169. Therefore, when the proposed architecture is trained and tested with a 70:30 split to classify images generated using STFT, the performance results are similar to those of the 80:20 training and testing split.

According to Table 11, the maximum average accuracy of 91.11% is achieved through fivefold cross-validation. For STFT, the results indicate that cross-validation performs better than the hold-out split for training and testing. Compared with CWT, STFT is not preferred for creating ECG images, as it yields lower performance than CWT with the proposed CNN. In general, however, the proposed CNN in this study achieves quite good classification performance for recognizing ECG types.
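To illustrate the two time-frequency representations compared above, the sketch below computes an STFT magnitude (spectrogram) and a CWT magnitude (scalogram) for one segment using NumPy only. Several points are assumptions made for illustration: the segment is a synthetic sinusoid of 500 samples rather than an ECG recording, the frame length, hop size, and scale grid are arbitrary, and a complex Morlet wavelet with centre frequency parameter `w0 = 6` stands in for the "Amor" (analytic Morlet) wavelet, whose exact parameterization in the original toolbox may differ.

```python
import numpy as np


def stft_magnitude(x, frame_len=64, hop=16):
    """Magnitude STFT with a Hann window; returns (frequency bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def morlet_cwt_magnitude(x, scales, w0=6.0):
    """Magnitude CWT with a complex Morlet wavelet; returns (scales, time).

    Each row correlates the signal with a scaled wavelet, implemented as a
    convolution with the conjugated, time-reversed wavelet.
    """
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        half = int(np.ceil(4 * s))            # truncate the wavelet at +/- 4 scales
        t = np.arange(-half, half + 1) / s
        wavelet = np.pi ** -0.25 * np.exp(1j * w0 * t - 0.5 * t ** 2) / np.sqrt(s)
        out[i] = np.abs(np.convolve(x, np.conj(wavelet[::-1]), mode="same"))
    return out


# Synthetic stand-in for one 500-sample segment (assumption, not ECG data).
n = np.arange(500)
segment = np.sin(2 * np.pi * 0.05 * n)

scales = np.geomspace(2.0, 48.0, num=32)      # hypothetical scale grid
spectrogram = stft_magnitude(segment)
scalogram = morlet_cwt_magnitude(segment, scales)
```

The STFT uses one fixed window length for all frequencies, whereas the CWT stretches the wavelet with the scale, so its time resolution is finer at small scales (high frequencies). Either array can be rendered as an image and passed to a CNN.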

Indeed, the main aim of this study is to find the best algorithm for detecting ECG types. Thus, the proposed CNN is used as a deep feature extractor from images. After the proposed CNN is trained on the ECG images generated using CWT with the 80:20 splitting method, chosen because of the highest accuracy rate, reduced features are obtained from the fully connected layer (FC-8) and the maximum pooling layer (Max-Pooling 7) in turn. These features are classified using an SVM classifier. In this way, we design a novel hybrid algorithm that combines the proposed CNN and SVM. Table 12 presents the performance results.

Table 12 Performance metrics of proposed CNN–SVM algorithm
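The second stage of the hybrid approach, in which an SVM classifies features taken from a CNN layer, can be sketched with scikit-learn as follows. This is not the authors' implementation: the feature matrix is synthetic (random clusters standing in for activations extracted from a layer such as Max-Pooling 7 or FC-8), and the feature dimension, RBF kernel, standardization step, and hyperparameters are all assumptions, since the text does not specify them.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for deep features: one row per image, one column per
# activation of the chosen CNN layer (sizes are hypothetical).
n_per_class, n_features = 100, 64
centers = rng.normal(0.0, 3.0, size=(3, n_features))
features = np.vstack([
    center + rng.normal(0.0, 1.0, size=(n_per_class, n_features))
    for center in centers
])
labels = np.repeat(np.arange(3), n_per_class)  # 0 = ARR, 1 = CHF, 2 = NSR

# 80:20 hold-out split, stratified so each class keeps the same proportion.
X_train, X_test, y_train, y_test = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0
)

# Standardize the features, then fit a multi-class SVM (kernel is an assumption).
classifier = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
classifier.fit(X_train, y_train)
accuracy = classifier.score(X_test, y_test)
```

In practice, the rows of `features` would be the activations recorded at the chosen layer for each ECG image after the CNN has been trained, and repeating the procedure for different layers allows the comparison reported in Table 12.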

According to Table 12, all performance metrics increase for both feature sources. The highest accuracy, 99.21%, is achieved when features are retrieved from the Max-Pooling 7 layer. In this study, when CWT is used, the proposed CNN–SVM is the best algorithm for recognizing ECG types. Additionally, Fig. 9 displays the confusion matrix of the proposed CNN–SVM with the highest accuracy.

Fig. 9 Confusion matrix of the proposed CNN–SVM

This study is conducted not only with a CNN but also with an SVM classifier, which is very successful in image classification. The combination of these two methods, each very successful individually, performs very well. Table 13 shows a comparison of all methods in terms of performance metrics when using CWT.

Table 13 Comparison of all methods in terms of performance metrics when using CWT

4 Discussion

In this study, we investigate whether ECG types can be distinguished from images created from ECG signals using deep learning structures, and which type of ECG image (CWT or STFT) is more efficient for recognizing ECG types with deep learning. Our study has the following advantages and disadvantages.

The advantages of this study are as follows: (i) Different sample lengths (360 Hz, 500 Hz, and 1000 Hz) are investigated while using CWT, and 500 Hz is found to be an efficient sample length when one-dimensional signals are converted into images. (ii) Different mother wavelet functions (Amor, Morse, and Bump) are examined to determine which is more efficient for the classification performance of CNN architectures while performing CWT. (iii) This study presents a novel CNN architecture, called the proposed CNN, and compares it with AlexNet and SqueezeNet. (iv) The Amor wavelet function is found to be successful with AlexNet and the proposed CNN, while the Bump wavelet function gives high performance with SqueezeNet. (v) The proposed CNN has the highest performance on the generated ECG datasets, and the differences are tested for significance via the Wilcoxon signed rank test. (vi) CWT is compared with the STFT method using the proposed CNN. (vii) Performance is measured with different splitting methods: training and testing splits (80:20, 70:30) and k-fold cross-validation (k = 5, 10). (viii) The proposed CNN is used as a deep feature extractor, providing features from the fully connected and maximum pooling layers. (ix) As a result, a new hybrid algorithm combining the proposed CNN and SVM is designed. In this stage, SVM is used as a classifier to improve the distinguishability of ECG types. The disadvantages of this study are the limited number of ECG types examined (ARR, CHF, and NSR) and the limited number of individuals.

Many approaches are used for the classification of arrhythmia (ARR), congestive heart failure (CHF), and normal sinus rhythm (NSR) datasets. Successful classification is very important for diagnosis and treatment. Therefore, in this study, we propose a novel 34-layer deep learning algorithm, called the proposed CNN. Besides this ECG dataset, other datasets have also been classified using our proposed CNN, such as the PTB ECG dataset, CT images of brain hemorrhages, and the CIFAR-10 dataset. As is known, pre-trained CNN architectures are commonly tested on traditional benchmark datasets. Accordingly, the proposed CNN architecture is also tested on the CIFAR-10 dataset in this study to examine whether it can make a successful classification. The CIFAR-10 dataset consists of 10 classes and 60,000 images. Similarly, this huge dataset is split 80% for training and 20% for testing, as shown in the study. In this way, 50,000 images are used for training and 10,000 images for testing. Also, the same option parameters are applied to both sets of data. Table 14 shows the performance of the proposed CNN on different datasets. In addition, Fig. 10 displays the confusion matrix for the CIFAR-10 dataset.

Table 14 The proposed CNN performance on different datasets
Fig. 10 Confusion matrix of the proposed CNN for the CIFAR-10 dataset

As can be seen, the performance of the proposed CNN is very good. However, as mentioned earlier, this CNN must be excellent for classifying biomedical signals or images. Therefore, the proposed CNN is merged with SVM to improve classification further. In general, if a CNN architecture has a fully connected layer, that layer is used for obtaining features and is combined with SVM. This method is advantageous because of the quality of the extracted features. However, deep learning algorithms, including CNNs, are complex nonlinear models and are referred to as black boxes (Guidotti et al. 2018). Accordingly, it has to be investigated which of the last layers yield good features. For this reason, the features in Max-Pooling 7 (just before the FC-8 layer) are also examined in the present study. According to the knowledge gained in this study, it is necessary to examine the features in the last layers for a more sensitive analysis, and the corresponding results are listed in Table 12. Apart from this, when the literature on ECG datasets with the same properties is searched, the proposed CNN–SVM ranks highest in terms of accuracy rate, as detailed in Table 15.

5 Conclusion

Sudden deaths from heart disease continue to increase with the coronavirus (COVID-19). For this reason, the automatic classification of the signals received from the heart is of great importance for diagnosis and treatment. In this study, we classify ECG types using our proposed CNN, which overcomes overfitting with a dropout layer. This CNN is also applied to other datasets, as shown in Table 14. In addition, the proposed CNN is compared with AlexNet and SqueezeNet on nine different ECG image datasets processed via CWT using three different wavelet functions and three different sample lengths. All results show that the best sample length is 500 Hz and the best mother wavelet function is “Amor.” The overall accuracy rates of the proposed CNN, AlexNet, and SqueezeNet are 98%, 94.67%, and 94.67%, respectively. Therefore, the proposed CNN architecture performs the best classification on the ECG image dataset generated with the Amor wavelet function and the 500 Hz sample length by using CWT. We also investigate how another pre-processing method affects classification success, and so we generate new ECG images using STFT with a 500 Hz sample length. On the two created datasets, we use not only training and testing splits (80:20, 70:30) but also cross-validation. For the ECG image dataset generated via CWT, when the dataset is split into training and testing sets at 80:20, all mean performance metrics are over 96.5% and the maximum standard deviation (Std) is 0.0173 on the test set. When the dataset is split at 70:30, all average performance metrics are over 95.3% and the highest Std is 0.01705. Further, with fivefold and tenfold cross-validation, the average accuracies are 96.44% and 97.22%, respectively, so the maximum average accuracy is obtained through tenfold cross-validation. For CWT, the results indicate that cross-validation is better than the training and testing split. For the ECG image dataset created via STFT, when the dataset is split into training and testing sets at 80:20, all average performance metrics are above 89.3% and the maximum Std is 0.013438. When the dataset is split at 70:30, all mean performance metrics are above 89.2% and the maximum Std is 0.0169. Besides, with fivefold and tenfold cross-validation, the average accuracies are 91.11% and 87.66%, respectively. All these results show that CWT is better than STFT for detecting ECG types.

The main purpose of the study is to find an excellent classification algorithm for recognizing ECG types. Therefore, the proposed CNN is merged with SVM. In this stage of the study, the proposed CNN is employed as a deep feature extractor from ECG images generated with CWT. In general, if a CNN architecture has a fully connected layer, that layer is used for obtaining features. This study highlights that it can be advantageous to also examine features from other late layers of the CNN, such as the max-pooling layer. To improve the performance of the proposed CNN, the Max-Pooling 7 and FC-8 layers are used to obtain reduced features, and the results are detailed in Table 12. As a result, the highest accuracy, 99.21%, is achieved with the Max-Pooling 7 layer. Compared with other studies on similar ECG datasets, the proposed CNN–SVM is the best performing for classification, as detailed in Table 15.

Table 15 The comparison of classification performances for different studies on ECG signals

This study applies deep learning algorithms to ECG-type detection as an assistive decision support system. As such, clinicians will spend less time identifying ECG types, and the proposed pipeline will help physicians and professionals better identify ECG types in a hospital setting. In future work, we will continue to investigate the detection of various diseases from signals or images using deep learning algorithms.