FormalPara What’s new?
  • An in-hospital all-cause mortality prediction model based on the 12-lead electrocardiogram (ECG) upon admission was developed and externally validated in a large cohort of COVID-19 patients.

  • ECG-based prediction models might assist in risk stratification of COVID-19 patients, although with limited value above age and sex.

  • Age, low QRS voltage, ST depression, premature atrial complexes, sex, increased ventricular rate, and right bundle branch block were predictors for mortality in COVID-19 patients.

  • Transfer learning of a deep neural network to raw ECGs from a very heterogeneous source performed similarly to the time-consuming human interpretation of the ECG for mortality prediction in COVID-19 patients.

Introduction

Since December 2019, Coronavirus Disease 2019 (COVID-19), spread around the world and caused a global pandemic. Several (cardiovascular) risk factors have been identified to be associated with increased morbidity and mortality [1]. Although vaccination has started in late 2020, the long-term protection of the vaccines is still unknown and the virus has already shown to mutate and a new variant could prove to be resistant to current vaccines [2]. Therefore, it is expected that COVID-19 will be a major ongoing burden to public health for the foreseeable future.

The electrocardiogram (ECG) is an easy to assess, widely available and inexpensive tool that is frequently used during the work-up of hospitalised COVID-19 patients. As the ECG can show signs of both cardiovascular risk factors, such as pathological Q‑waves, as well as current cardiac and pulmonary disease, it could be a useful tool for prognostication. Previous studies have already shown that individual ECG features—e.g. the presence of sinus tachycardia, atrial fibrillation, QRS prolongation, pathologic T‑wave inversion, non-specific repolarisation changes and signs of right ventricular strain at time of admission—yield value to predict in-hospital and out-of-hospital mortality in COVID-19 patients [3,4,5]. However, these studies only looked at individual risk factors and did not assess prediction models solely based on raw ECGs, a process that can be made autonomous without requiring human interaction.

Human interpretation of the ECG is a time-consuming task that suffers from high inter-observer variability [6]. Recently, machine learning and specifically the subclass of deep neural networks (DNNs) has gained much attention in the field of ECG analysis and has the advantage to learn complex patterns in the ECG signal, only based on the raw waveform as input. In short, DNNs can be described as algorithms using multiple layers which are able to gradually extract features from a raw input signal. For example, the first layers would extract only simple features, whilst later layers combine these into more complex and detailed features, which are then used for classification of a specific problem. This way both pattern discovery and classification are optimised jointly, which is a great advantage over the two-step approach when manually annotating ECGs [7]. Several studies have presented DNN models, analysing very large datasets of raw ECGs to diagnose a cardiac disease or to predict outcome of cardiac patients, often outperforming experienced cardiologists in these tasks [8,9,10,11,12]. Previous studies suggested that transfer learning, where a previously trained model is transferred to a new dataset or task, can improve performance of ECG-based DNNs [12, 13].

With this study, we aim to evaluate the value of using the ECG to predict in-hospital all-cause mortality of COVID-19 patients by specifically analysing the ECG at hospital admission. Furthermore, we aim to evaluate whether a transferred DNN can be trained on this new, relatively small multicentre dataset and complex task, in comparison with a logistic regression-based approach with manually interpreted ECG features and only age and sex. Secondly, we aim to identify specific ECG features associated with mortality in patients diagnosed with COVID-19.

Methods

For this retrospective multicentre cohort study, we used data of seven hospitals in the Netherlands participating in the CAPACITY-COVID registry, which has been described previously [1, 14, 15]. In short, CAPACITY-COVID is an international consortium established to evaluate the role of cardiovascular disease in the COVID-19 pandemic. To date, data from 74 centres, originating from 13 different countries, leading to a dataset of >17,000 patient records. From a subset of patients included in the CAPACITY-COVID registry that were admitted to the emergency ward between March 1, 2020, and August 28, 2020, clinical data and raw-format 12-lead resting ECGs (XML) were collected. After collection, the ECGs were evaluated by multiple experienced investigators and ECG features were manually extracted, if present. Data was split into a development dataset and a validation dataset. Three mortality prediction models were developed: (a) a logistic regression baseline model using age and sex, (b) a least absolute shrinkage and selection operator (LASSO) model using age, sex and human annotated ECG features, and (c) a pre-trained deep neural network (DNN) using age, sex and the raw ECG waveforms. To better evaluate generalisability, we used data from two separate hospitals (n = 248) for external validation. A complete overview of the methods, including an extensive description of patient selection, data collection, ECG evaluation, data pre-processing and model development processes can be found in the Supplementary Methods section in the Electronic Supplementary Material (ESM).

Results

From a total of 1396 COVID-19 patients from seven centres participating in the CAPACITY-COVID registry, 971 (69.5%) had a raw format ECG available, and 882 (63.2%) were found eligible for inclusion in this study (Fig. S1 in ESM). Baseline characteristics of the included patients are presented in Tab. 1, stratified by outcome. Overall, included patients had an age of 67 ± 14 years and 35% were female. Cardiovascular risk factors were common in this cohort, with hypertension in 51%, diabetes in 26%, and coronary artery disease in 21% of patients. Baseline characteristics, stratified per participating centre, are shown in Table S1 (in electronic supplementary material [ESM]). The final development dataset consisted of 634 patients, of which 179 (28%) died, and the validation dataset consisted of 248 patients, of which 57 (23%) died.

Table 1 Baseline characteristics of all patients included in this study stratified for mortality

Baseline age and sex model

The baseline age and sex logistic regression model had a mean c‑statistic in the development dataset of 0.73 (95% confidence interval [CI] 0.69–0.77). In the validation dataset, the c‑statistic was 0.73 (95% CI 0.65–0.79). Age and female sex were both important predictors with coefficients of 0.90 (odds ratio [OR] 2.5 per one standard deviation [SD] or 14-year age increase) and −0.51 (OR 0.60) respectively.

LASSO model

The manually derived ECG features used by the LASSO model are shown in Tab. 2, stratified by outcome. Of the 882 manually annotated ECGs, disagreement was resolved by consensus in a panel discussion in 74 (8.4%). The ventricular rate in the overall study was 90 ± 20 beats per minute, the majority was in sinus rhythm (85%), and one third had non-specific ST‑T abnormalities (34%). The final LASSO model consisted of 12 ECG features that were predictive for the outcome in the development dataset. The c‑statistic for the LASSO model was 0.76 (95% CI 0.72–0.80) in the development dataset and 0.76 (95% CI 0.69–0.82) in the validation dataset. The most important features for predicting mortality were age, low QRS voltage, ST depression, premature atrial complexes, ventricular rate and right bundle branch block (RBBB). Female sex and left bundle branch block were predictive for a lower risk of mortality. Coefficients of the ECG features included in the final model are shown in Fig. S2 (in ESM).

Table 2 ECG characteristics used in the logistic regression model, stratified for mortality

Deep neural network

The c‑statistic for the deep neural network was 0.86 (95% CI 0.83–0.89) in the development dataset and 0.77 (95% CI 0.70–0.83) in the validation dataset. Calibration slope and intercept, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) of all three methods for the validation dataset are presented in Tab. 3. An overview of both methods is shown in Fig. 1. Fig. 2 shows the differences in predicted probabilities between the DNN model and the LASSO model. Correlation between the two models was high (Pearson correlation r = 0.86, p < 0.001). Table S2 (in ESM) shows the performance for all three models in the different hospitals separately.

Table 3 Prognostic performance of the three models in the validation dataset
Fig. 1
figure 1

Variable importance for the manually extracted ECG features in the LASSO model. Coefficients represent the beta coefficients of the normalised variables in the LASSO model and can there be interpreted as importance values. Negative values point at a lower risk of mortality

Fig. 2
figure 2

Predicted probability for the deep neural network compared with the LASSO algorithm with manually extracted ECG features, with the probability cut-offs of 30 and 25%, respectively. Inspection of the ECGs in the right lower corner (i.e. correct predictions by the DNN and not the LASSO) showed frequent tachycardia and low QRS voltage that did not meet the criteria, while the age was mostly below 70 years. Inspection of the left upper corner showed that these ECGs were normal, but patients had a high age of up to 93. (DNN deep neural network, ECG electrocardiogram, LASSO least absolute shrinkage and selection operator)

Subgroup analyses

Performance was similar for males and females for all three models investigated. For males, c‑statistics ranged from 0.72 (95% CI 0.63–0.81) to 0.77 (95% CI 0.68–0.85) and 0.77 (95% CI 0.69–0.86) for the age and sex model, the LASSO model and the DNN respectively. For females, c‑statistics ranged from 0.74 (95% CI 0.63–0.85) to 0.76 (95% CI 0.65–0.86) and 0.76 (95% CI 0.65–0.86) respectively.

Discussion

This study is the first to suggest that machine learning models are able to specifically predict in-hospital all-cause mortality in COVID-19 patients with the ECG during admission as input, although the added value to a model with only age and sex is limited. Two approaches were evaluated and externally validated in a unique multicentre cohort of COVID-19 patients: a classical LASSO logistic regression on human annotated ECG features and a fine-tuned DNN, which was previously pre-trained on raw ECG signals of almost 300,000 patients. This is one of the first studies to show that a pre-trained DNN can be easily transferred to other datasets, with other ECG machines and a completely different task, with similar accuracy as the very time-consuming manual extraction of ECG features. External validation was performed in different centres, which is uncommon in the current research on DNN-based ECG interpretation. This is a major finding, as it opens up possibilities for the highly dimensional DNNs in smaller datasets with complex prediction tasks. Finally, multiple new ECG abnormalities where detected that are predictive of mortality in COVID-19 patients.

In our dataset, we have demonstrated that the ECG alone, a simple and widely performed test, includes features with predictive value in risk stratification of COVID-19 patients. A previous large multicentre study has demonstrated that a prediction model based on clinical variables (such as age, sex, comorbidities, laboratory values and lab tests) performs well in predicting mortality in COVID-19 patients with c‑statistics up to 0.77 [16]. Our study suggests that the ECG performs similarly, with the advantage that it uses just one modality and can be automated. An important finding in the current study is the limited added predictive value above age and sex. This is in line with earlier studies that developed prediction models for in-hospital mortality and showed that, in multivariable models, prediction is mostly driven by age and sex. These studies did not evaluate age and sex alone, they included several other, mostly clinical, parameters, which might also have limited value above age and sex, similar to our findings [15,16,17,18]. Within our current study, we did not include any clinical data besides age and sex, because we aimed to evaluate only data derived from the ECG file itself. Age and sex are always present in the metadata of an ECG and therefore these could be included. The main reason for this was to use only one data modality, which is time efficient, instead of having to derive data from different sources, such as medical records or laboratory results.

The DNN used in the study, which was previously trained to perform comprehensive ECG interpretation on a large non-COVID-19 ECG dataset from four different countries, showed to be transferrable to the complex mortality prediction task of this study with minimal fine-tuning [13]. Even after external validation in different hospitals with different ECG devices, something which is rarely performed in machine learning studies, the transferred DNN remains of predictive value. It is especially important to note that while the DNN is pre-trained and fine-tuned on datasets mostly using GE MAC ECG devices, it also performs very well in external hospital 6, which uses the Welch Allyn Mortara device (Table S2 in ESM). Moreover, we observe that performance aligns with the human annotated model, which indicates that the ECG device (with different filters and electrodes) has very limited influence on the performance of DNNs. These are very promising findings, as it could allow the use of pre-trained ECG-DNNs for smaller datasets in the process of diagnosis or prediction using the raw format ECG. More conventional analysis techniques, such as multivariable logistical regression, require manually selected features as input, for which collection and labelling of these features is a time-consuming process when analysing large datasets. In our study, the performance of our deep learning model was similar to that of the LASSO logistic regression, illustrating the potential of deep learning as a time-efficient raw ECG data analysis technique. This is especially important in settings where expert knowledge in ECG interpretation is not readily available. Although the performance of the DNN model was still similar to the performance of the logistic regression model in the external validation set, the c‑statistic did decrease from 0.86 to 0.77. This indicates that the DNN model, despite measures taken to prevent overfitting (i.e. pre-training, freezing of layers and dropout), is still overconfident in the training dataset due to the larger number of trainable parameters. It remains important to use rigorous validation methods in deep learning research to obtain reliable estimates of predictive performance.

The current evaluation of ECG features in 882 COVID-19 patients is the largest to date, containing data collected from seven different hospitals across the Netherlands [1, 15]. Previous studies have described ECG features associated with poor outcome of COVID-19 patients. For example, Raad et al. identified ECG abnormalities associated with right ventricular strain (such as SIQIIITIII pattern and incomplete/complete RBBB) to be associated with the need for mechanical ventilation and mortality in patients admitted with COVID-19 [4]. With our data, we reproduced that the presence of RBBB is predictive of in-hospital mortality, but we were not able to do the same for the other features of right ventricular strain. A possible explanation for this is that the patients included in the study of Raad et al. were more severely affected by COVID-19, e.g. right ventricular overload due to severe pulmonary disease and potentially pulmonary embolisms, than the patients in our study [4]. Another study by McCullough et al. evaluating 756 COVID-19 patients, showed the presence of premature atrial contractions, RBBB and ischaemic T‑wave inversion to be associated with mortality, which corresponds to our findings [3]. This study is the first to identify ST depression and low QRS voltages as risk factors for mortality in COVID-19 patients. While ST depression may be a sign of myocardial hypoxia due to malperfusion, low QRS voltages may either represent emphysema and obesity, which are both previously identified as important risk factors for adverse events in COVID-19 patients. Low QRS voltages may also represent pericardial effusion, although pericarditis was only diagnosed in one patient in the entire CAPACITY cohort [1, 15].

Limitations

There are several limitations to acknowledge. No follow-up data was available to us after patients were discharged from the hospital, therefore we could only assess the in-hospital mortality and mortality after discharge was not included in this study. The multicentric nature of this study is a strength. However, since not all hospitals included in this study use the same ECG machine and different machines use different filtering settings. This could possibly have introduced bias when analysing the raw ECG data using deep learning. To minimise this effect, we applied several pre-processing steps to normalise the input data per hospital. Furthermore, it is difficult to understand what features DNNs learn for their prediction and also in this study, we do not know the ECG features used by the DNN. Some techniques have been developed for this purpose, however visualisation of DNN remains a topic of ongoing research and therefore could be considered in near future. Finally, the models were evaluated in an external dataset with two Dutch hospitals, which might limit the generalisability to other countries or settings.

Conclusion

This study shows that ECG-based prediction models could be helpful for initial risk stratification of patients diagnosed with COVID-19, although the added value above age and sex is limited. Secondarily, several ECG abnormalities are associated with in-hospital all-cause mortality of COVID-19 patients. Furthermore, this proof-of-principle study shows that the use of pre-trained DNNs for ECG analysis does not underperform compared with multivariable regression analysis of ECG features derived using time-consuming human interpretation.