Image Acquisition
The ECG records were available in paper form; the first step was to convert them into images by scanning or camera capture. We received data from Saidhan Hospital and STEMI Global. The database contains ECG images captured with a camera (Samsung Galaxy S7) as well as 12-lead ECG records in the form of PDF files (Model: ECG600G).
Pre-processing
The 12-lead ECG records were available as PDF files. To process them further, e.g., to split a 12-lead record into single leads, they first had to be converted into an image format (JPEG or PNG). This was done using the pdf2image Python library.
After obtaining the images of the 12-lead ECG, 12 separate images were generated, one lead per image. Some files contained continuous 12-lead recordings; these were split using a semi-manual algorithm written with the OpenCV library, in which left and right mouse clicks were used to draw horizontal and vertical lines on the 12-lead ECG image, producing a grid in which each box contains a single lead.
These boxes were saved as separate images, yielding 12 images from a single 12-lead ECG image. (Code is given in the Supplementary Information.) To reduce computational complexity and improve accuracy, it was important to study every lead of the ECG. For automated extraction, rectangular boxes were located by shape detection and cropped to obtain 12 separate image files. (The code for automated single-lead extraction is also given in the Supplementary Information.)
Binary Image Extraction
We find all possible threshold values for the ECG signal by brute force; candidate thresholds range from 0 to 255. Applying these thresholds can produce four types of images: fully black images; fully white images; images containing both the grid and the signal; and images containing only the signal, without the grid. We only wanted to extract the signal without the grid. The process was repeated manually for all images, generating a 1-D characteristic curve and a single threshold value per image to build the training set for the deep learning-based model. Once trained, this model provided an automated threshold for any given input image. Once we obtained the desired image containing only the ECG signal, we noted the level of binarization (LOB) for that image. For thresholding, we recorded the level of binarization at every threshold from 0 to 255, and this was repeated for 66 images. The global thresholding concept of the LOB characteristic curve used for image binarization is shown in Fig. 2. Binarization of a sample grayscale image is shown in Fig. 2a–c. Figure 2d–h illustrates the normalized sum (NS), which is zero for a completely black image and one for a completely white image. From Fig. 2i–n, we can observe that as the LOB value decreases, the NS increases because the number of white pixels increases. At LOB = 50, white pixels occupy most of the image and NS = 0.997; at LOB = 200, the picture is mostly black and hence NS = 0.073. Figure 2o shows a chessboard image with its corresponding characteristic curve, which exhibits a step transition from black to white. Similarly, an image with eight gray levels (Fig. 2q) has a characteristic curve with eight steps. The characteristic curve was also plotted for a single-lead ECG paper record, showing two slopes, one for the ECG signal and the other for the background grid. This curve is used as the input to the deep learning model for automated threshold calculation.
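The normalized-sum characteristic curve can be computed directly from a grayscale image; a minimal NumPy sketch (counting a pixel as white when it exceeds the threshold) is:

```python
import numpy as np

def normalized_sum_curve(gray):
    """Characteristic curve: fraction of white pixels after binarizing a
    grayscale image at every threshold from 0 to 255.
    NS = 0 for a fully black result and NS = 1 for a fully white result."""
    return np.array([(gray > t).mean() for t in range(256)])
```

Because raising the threshold can only turn white pixels black, the resulting curve is monotonically non-increasing, which is what produces the step and slope shapes seen in Fig. 2.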
Deep Learning-based Binarization
After extracting the LOB characteristic curve in the previous step, we built a deep learning model to predict threshold values for automated binarization. For this, we prepared a dataset containing the level of binarization at every threshold value (0 to 255) for each image. The delta of the characteristic curve was calculated to avoid overfitting during the DL training phase: we subtracted each value from the next value to obtain the delta, and then took the inverse of the data to obtain the required features. We then applied a sequential model using Dense layers with the ReLU activation function, with one input layer, two hidden layers, and one output layer. We also applied the EarlyStopping, ModelCheckpoint, and ReduceLROnPlateau functions. EarlyStopping halts training when the monitored metric stops improving; ModelCheckpoint saves the best models so that they can later be used for prediction; and ReduceLROnPlateau decreases the learning rate (LR) when the accuracy of the model stops increasing. The best model was then chosen and applied to obtain the predicted threshold values from the DL model. The entire dataset was exported to an Excel sheet.
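One reading of the delta-and-inverse preprocessing can be sketched in NumPy; interpreting "inverse" as sign negation (which makes the non-positive deltas of the non-increasing curve non-negative) is our assumption, not a detail stated in the text.

```python
import numpy as np

def preprocess_curve(curve):
    """Delta of the characteristic curve: next value minus current value.
    The curve is non-increasing, so the delta is <= 0; negating ("inverting")
    it yields non-negative features for training. The negation is an assumption."""
    delta = np.diff(curve)   # delta[i] = curve[i+1] - curve[i]
    return -delta
```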
Figure 3a shows the deep learning model for determining the correct threshold value for binarizing the input ECG report. The image was first binarized and then, during vertical scanning, converted into a 1-D array whose size equals the width of the image. The characteristic curve is first passed as input to a Dense layer of size \(1 \times 253\), followed by two Dense hidden layers of size \(1 \times 253\) and a dropout layer, again of size \(1 \times 253\), whose output is passed through a Dense layer of dimension \(1 \times 1\). The resulting number is the LOB, i.e., the threshold value at which the image should be binarized to obtain the correct binarized output.
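With the layer sizes described above, the model could be sketched in Keras as follows; the dropout rate, optimizer, loss, callback parameters, and checkpoint file name are illustrative assumptions.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.callbacks import (EarlyStopping, ModelCheckpoint,
                                        ReduceLROnPlateau)

# Input: the 1 x 253 characteristic-curve features; output: one LOB threshold.
model = Sequential([
    Input(shape=(253,)),
    Dense(253, activation="relu"),   # hidden layer 1
    Dense(253, activation="relu"),   # hidden layer 2
    Dropout(0.2),                    # dropout rate is an assumption
    Dense(1),                        # predicted threshold (LOB)
])
model.compile(optimizer="adam", loss="mse")

callbacks = [
    EarlyStopping(patience=10, restore_best_weights=True),
    ModelCheckpoint("best_model.keras", save_best_only=True),
    ReduceLROnPlateau(factor=0.5, patience=5),
]
```

These callbacks would be passed to `model.fit(..., callbacks=callbacks)` during training.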
Post-processing
Once the binary signal was obtained, we checked for discontinuities, i.e., parts of the signal broken during binarization. Since an ECG is an analog quantity, it changes smoothly rather than abruptly; therefore, to reconstruct the broken signal, we used a 1-D signal reconstruction algorithm that joins the broken parts of the ECG and avoids sudden changes. First, dilation and skeletonization were performed using OpenCV operations: the broken signal was dilated, which increased the thickness of the ECG trace and filled the broken gaps, and skeletonization was then applied to reduce the thickness again. By dilating and later skeletonizing, the broken signal was made continuous.

The next step was removal of the lead name printed on every lead image. We scanned every column of the image array vertically, from bottom to top. The first white pixel encountered from the bottom of a column belongs to the ECG trace, so it was preserved; all white pixels above it in that column belong to the lead-name characters and were converted to black. These lead names act as a disturbance in the image, so all names, characters, and printed values were removed, leaving only the signal.

To remove shadow impressions, the image was split into its RGB channels; dilation was applied, which reduced the black shadow, followed by a median blur and normalization to remove salt-and-pepper noise. This procedure was applied to all three channels. Finally, the results of all three parts were merged to produce the final result without shadow impressions.
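The column-wise lead-name removal can be sketched as follows; this simplified version keeps only the bottom-most white pixel per column, which assumes a one-pixel-thick skeletonized trace.

```python
import numpy as np

def remove_lead_names(binary):
    """Vertical scan of each column from the bottom: the lowest white pixel
    is kept as the ECG trace, and all white pixels above it (assumed to be
    lead-name characters) are set to black."""
    cleaned = binary.copy()
    for col in range(binary.shape[1]):
        rows = np.flatnonzero(binary[:, col])   # row indices of white pixels
        if rows.size > 1:
            cleaned[rows[:-1], col] = 0         # blank everything above the trace
    return cleaned
```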
Signal Extraction
The objective was to recover the ECG values in terms of voltage and time. The dimension of a single box on hard-copy ECG paper is 1 cm \(\times\) 1 cm, but an image is an array of pixels, and the size of a single box in pixels does not correspond directly to 1 cm \(\times\) 1 cm. Our approach is therefore to measure the size of a single grid box in pixels, measure the ECG peak values in pixels, and remap them back to voltage. After post-processing, the 2-D image is vertically scanned to identify the ECG signal pixels, which are stored as an array. To obtain the time and voltage scales, we performed a separate detection in which the red grid squares were preserved and the ECG signal was removed. Each red square corresponds to 0.2 seconds in the time domain and 0.5 mV in the voltage domain, giving the scales for the X-axis and Y-axis. These scales were converted into pixels, so that the X and Y pixel coordinates could be converted into time and voltage, from which we obtained the corresponding 1-D signal with the determined time and voltage values.
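The pixel-to-physical-units remapping can be sketched as a short helper (the function name, the box size in pixels, and the baseline row are hypothetical inputs; 0.2 s and 0.5 mV per box are as stated above):

```python
def pixels_to_signal(xs_px, ys_px, box_w_px, box_h_px, baseline_y_px):
    """Map pixel coordinates of the trace to (time in s, voltage in mV),
    using one grid box = 0.2 s horizontally and 0.5 mV vertically.
    Image y-coordinates grow downward, so voltage is measured up from the
    baseline row."""
    times = [x * 0.2 / box_w_px for x in xs_px]
    volts = [(baseline_y_px - y) * 0.5 / box_h_px for y in ys_px]
    return times, volts
```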
Deep Learning Model for Diagnosis
The most significant waves seen in a normal ECG are the P wave, the QRS complex, and the T wave. Different types of abnormalities, such as ST-segment elevation myocardial infarction (STEMI), Left Bundle Branch Block (LBBB), Right Bundle Branch Block (RBBB), and T-wave abnormality, can be identified from the ECG records of a patient. STEMI is a type of myocardial infarction in which a part of the heart muscle dies due to obstruction of its blood supply. LBBB is an abnormality in the conduction system of the heart in which the left ventricle contracts later than the right ventricle. In RBBB, the right bundle branch is not directly activated by impulses, whereas the left ventricle functions normally. Sinus rhythm is necessary for the conduction of electrical impulses within the heart; a strangely shaped T wave may signify a disruption in repolarization of the heartbeat and may be identified as a sinus-rhythm T-wave abnormality. The deep learning-based ECG diagnosis algorithm classifies the given ECG images into five classes (normal, STEMI, LBBB, RBBB, and T-wave abnormality).
The deep learning model for diagnosing heart abnormalities from the digitized ECG consists of several layers. We used a 400-point sample: the input is a \(400 \times 1\) 1-D signal, which is passed to a 2-D convolutional layer with a filter size of 3 and 16 filters. To construct this input, after obtaining the binary image free of noise and lead characters, we vertically scan the image array starting from the bottom-left of the image. When we encounter the first white pixel in a column, we know it belongs to the ECG signal and store its corresponding y-axis value; this is iterated for the remaining 399 columns. In this way we obtain the 400-point training data, and the corresponding disease value is stored as the label. Any remaining space is filled by padding with the same numbers. A ReLU layer is then applied, followed by two fully connected layers acting as hidden layers. A 5-neuron output layer feeds the softmax and classification layers, which classify the abnormalities into 5 classes.
Figure 3b shows this model, in which the binarized ECG signal of dimension \(400 \times 1 \times 1\) is provided as input. This is passed through a 2-D convolutional network together with a ReLU layer, of dimension \(3 \times 16\). After this, we use a series of fully connected layers of sizes 384 and 2, which also have hidden layers with them. Finally, the Softmax layer gives the diagnosed output, classified into the following categories: normal, STEMI, LBBB, RBBB, and T-wave abnormality.
In this model, we take an input (the 1-D ECG signal) of size (400, 1, 1) and pass it through the model, which comprises a convolutional layer with filter size = 3 and number of filters = 16, followed by a ReLU layer. We used 3 fully connected layers followed by a SoftMax layer for classification into the different labeled classes. The (400, 1, 1) input format allowed us to use a 2-D convolution layer. If the 'Padding' option is set to 'same', the output size during training is the same as the input size: the software adds the same amount of padding to the top and bottom, and to the left and right. If the padding that must be added vertically is odd, the extra padding is added to the bottom; if the padding that must be added horizontally is odd, the extra padding is added to the right.
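The asymmetric split of 'same' padding can be checked with a short calculation (stride 1; the helper name is ours):

```python
def same_padding_split(filter_len):
    """For 'same' padding with stride 1, the total padding per dimension is
    filter_len - 1; when that total is odd, the extra pixel goes to the
    bottom (vertically) or the right (horizontally)."""
    total = filter_len - 1
    before = total // 2            # top / left
    after = total - before         # bottom / right gets the extra pixel
    return before, after
```

For the filter size of 3 used here, the padding is symmetric (1, 1); an even filter size such as 4 would yield the asymmetric split (1, 2).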