Introduction

Sepsis is a life-threatening organ dysfunction caused by dysregulation of the host response to infection and is a major healthcare problem worldwide [1, 2]. In 2017, a total of 48.9 million cases of sepsis were recorded worldwide, and 11.0 million sepsis-related deaths were reported, representing 19.7% of all the global deaths [2, 3]. Although the mortality rate associated with sepsis has decreased by 52.8% over the past 20 years, the incidence has increased, likely reflecting the aging population with more comorbidities [3]. Because sepsis is a medical emergency that requires immediate treatment and resuscitation, early recognition is a cornerstone for preventing disease progression and death [2].

Vital signs and blood tests are required to screen and diagnose sepsis [1]. Vital signs are measured by the medical staff at intervals, and blood tests require infrastructure for blood sampling and analysis. Therefore, it is difficult to monitor the occurrence of sepsis in real time in hospitals. Sepsis has its highest burden in areas with a lower sociodemographic index as these areas lack medical resources for screening, diagnosis, and treatment of sepsis [4]. Furthermore, home monitoring for the deterioration of infected patients and screening for sepsis are critical for appropriate allocation of scarce medical resources in a pandemic such as the ongoing severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic. However, the existing method for the screening of sepsis using vital signs and laboratory examinations is limited in daily living situations and remote monitoring.

A low-cost and widely available method for screening patients with sepsis has important therapeutic implications. Electrocardiography (ECG) is a noninvasive test that can be monitored in real time, and diverse wearable and lifestyle devices have been developed for remote monitoring and data transfer. During the SARS-CoV-2 pandemic, an ECG monitoring device was used to monitor patients [5]. In previous studies, approximately 50% of patients who were diagnosed with sepsis exhibited signs of myocardial dysfunction; furthermore, a prolonged duration and decreased amplitude of the QRS complex have been reported in patients with sepsis [6,7,8,9]. Artificial intelligence (AI) technologies based on deep learning have been used in diverse medical domains, and deep-learning models (DLMs) have been applied to ECG for the diagnosis of heart failure, pulmonary hypertension, valvular heart disease, electrolyte imbalance, and anemia [10,11,12,13,14]. In contrast to conventional statistical methods, a DLM can diagnose or predict diseases by extracting features from data and capturing nonlinear and subtle changes in an ECG [15]. In this study, we developed and validated a DLM for sepsis screening using ECG. We also evaluated its performance with 12-, 6-, and single-lead ECGs to assess whether sepsis can be detected with diverse ECG devices.

Methods

Study design and population

We conducted a retrospective multicenter cohort study in two hospitals. The study population included adult patients who were admitted to either hospital and underwent at least one standard 10-s 12-lead ECG during the study period. We excluded individuals with missing ECG data. Data from Sejong General Hospital (SGH) were used to develop and internally validate the DLM. The patients admitted to SGH during the study period (October 2016 to November 2020) were randomly split into development (70%) and internal-validation (30%) datasets (Fig. 1). Data from Mediplex Sejong Hospital (MSH) during the study period (March 2017 to November 2020) were used only for external validation, to assess whether the developed DLM was robust across different hospitals. No patient had been treated at both SGH and MSH, so the patient populations of the two hospitals did not overlap. As the purpose of the validation datasets was to assess the accuracy of the DLM, we used only one ECG per patient in the internal and external validation datasets: the ECG recorded closest to the time of sepsis, as confirmed by critical care medicine physicians.
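The patient-level random split described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `split_patients`, the fixed seed, and the integer patient identifiers are all hypothetical.

```python
import random

def split_patients(patient_ids, dev_fraction=0.7, seed=0):
    """Randomly assign each patient to exactly one of two groups.

    Splitting by patient (not by ECG) keeps all ECGs of one patient
    in the same dataset. The seed value is a hypothetical choice.
    """
    ids = sorted(set(patient_ids))        # one entry per patient
    rng = random.Random(seed)             # local generator for reproducibility
    rng.shuffle(ids)
    n_dev = round(len(ids) * dev_fraction)
    return set(ids[:n_dev]), set(ids[n_dev:])

# Hypothetical example with 1,000 synthetic patient identifiers.
dev_ids, val_ids = split_patients(range(1000))
```

Because the split is made on patient identifiers, no patient can contribute ECGs to both the development and the internal-validation datasets.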

Fig. 1 Study flowchart

This study was approved by the Institutional Review Board (IRB) of SGH (2020–0541) and MSH (2020–149). Clinical data, including ECG, age, sex, admission note, vital signs, and laboratory examination results, were extracted from the electronic health records of both hospitals after anonymization. The IRBs of both hospitals waived the need for informed consent because this was a retrospective study using fully anonymized data, and thus, the possibility of harm to patients was unlikely.

Predictor variable

ECG was the only predictor variable. Digitally stored 12-lead ECG data had 500 data points per second (500 Hz) at each lead for 10 s; in other words, one 12-lead ECG comprised 60,000 values (12 leads × 5000 samples). We preprocessed the ECGs through sampling, normalization, and noise filtering. Because there were more artifacts at the beginning and end of each ECG, we removed 1 s of data from both the beginning and the end and used the remaining 8 s of data from each lead. We normalized the signal using z-scores based on the mean and standard deviation, and we applied a band-pass filter to reduce noise and artifacts. We also normalized the value of age and one-hot encoded sex. For data augmentation, we added linear and nonlinear noise that produced baseline changes. We created 12-, 6-, and single-lead ECG datasets. The 12-lead ECG dataset used all 12 leads (12 × 4000 values per ECG). The 6- and single-lead ECG datasets were subsets of the 12-lead ECG: the 6-lead dataset was created from the six limb leads (I, II, III, aVL, aVR, and aVF), and the single-lead dataset was created from lead I. We selected these leads because they can be measured using diverse wearable and lifestyle ECG devices.
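The preprocessing steps above can be sketched in NumPy as follows. This is an illustration only and not the authors' pipeline. Several details are assumptions because the text does not specify them: the band-pass cut-offs (0.5 to 40 Hz), the FFT-based filter design, the per-lead z-score, the order of filtering and normalization, and the lead ordering in `LEADS`. The function names are hypothetical.

```python
import numpy as np

FS = 500  # sampling rate in Hz (from the text)
# Assumed lead order; the stored order in the real data may differ.
LEADS = ["I", "II", "III", "aVR", "aVL", "aVF",
         "V1", "V2", "V3", "V4", "V5", "V6"]

def bandpass_fft(x, fs, low_hz, high_hz):
    """Zero all frequency components outside [low_hz, high_hz] per lead."""
    spectrum = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    spectrum[..., (freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=x.shape[-1], axis=-1)

def preprocess(ecg, fs=FS, low_hz=0.5, high_hz=40.0):
    """ecg: array of shape (12, 5000). Returns an array of shape (12, 4000)."""
    trimmed = ecg[:, fs:-fs]                       # drop first and last second
    filtered = bandpass_fft(trimmed, fs, low_hz, high_hz)
    mean = filtered.mean(axis=1, keepdims=True)
    std = filtered.std(axis=1, keepdims=True)
    return (filtered - mean) / (std + 1e-8)        # z-score (per lead, assumed)

def select_leads(ecg, names):
    """Pick a subset of leads (rows) by name."""
    return ecg[[LEADS.index(name) for name in names]]

# Synthetic stand-in for one 10-s 12-lead recording.
rng = np.random.default_rng(0)
raw = rng.normal(size=(12, 10 * FS))
x12 = preprocess(raw)                                             # 12 x 4000
x6 = select_leads(x12, ["I", "II", "III", "aVL", "aVR", "aVF"])   # 6 x 4000
x1 = select_leads(x12, ["I"])                                     # 1 x 4000
```

Deriving the 6- and single-lead inputs by row selection mirrors the statement that these datasets are partial datasets of the 12-lead ECG.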

Endpoints

The primary endpoint of this study was the presence of sepsis. Sepsis was defined as per the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3). Three critical care medicine physicians reviewed the medical records of the study population, including admission notes, laboratory examination results, vital signs, drug administration data, and rapid response team progress notes, to label the presence and time range of sepsis. Septic shock was the secondary endpoint and was also defined based on Sepsis-3.

For the primary endpoint—sepsis in patients with suspected infectious disease—we labeled the ECG within and outside the time range of sepsis as sepsis and non-sepsis, respectively. Further, in patients who had no history of infectious diseases during hospitalization, we labeled all ECGs as non-sepsis. Similarly, for the secondary endpoint, namely septic shock, we labeled the ECG within the time range of septic shock as septic shock and the other ECG as non-septic shock.
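The labeling rule above can be expressed as a short function. This is a sketch under assumptions: the function name `label_ecg`, the representation of a time range as a `(start, end)` pair, and the inclusive boundaries are hypothetical, and the timestamps are invented for illustration.

```python
from datetime import datetime

def label_ecg(ecg_time, event_ranges):
    """Return 1 if ecg_time lies inside any labeled time range, else 0.

    event_ranges is a list of (start, end) pairs marked by the reviewers.
    A patient without an infectious disease has an empty list, so every
    ECG of that patient is labeled 0. Inclusive bounds are an assumption.
    """
    return int(any(start <= ecg_time <= end for start, end in event_ranges))

# Invented example: one sepsis episode from 2 to 5 January 2020.
sepsis_ranges = [(datetime(2020, 1, 2, 8, 0), datetime(2020, 1, 5, 18, 0))]
inside = label_ecg(datetime(2020, 1, 3, 12, 0), sepsis_ranges)    # within range
outside = label_ecg(datetime(2020, 1, 1, 9, 0), sepsis_ranges)    # before range
no_infection = label_ecg(datetime(2020, 1, 3, 12, 0), [])         # no ranges
```

The same function applies to the secondary endpoint if the ranges passed in are the time ranges of septic shock.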

Development of DLM for detecting sepsis using ECG

We developed a DLM based on a convolutional neural network, specifically a residual neural network (Fig. 2). The residual neural network contained skip connections to avoid the problem of vanishing gradients. In each residual block, which had four stages, two convolutional layers and two batch normalization layers were repeated. Furthermore, there were two flattened layers in the DLM. The last layer of the seventh residual block was connected to a flattened layer that was fully connected to a one-dimensional (1D) layer composed of neural nodes. The second fully connected 1D layer was connected to the output layer, which was composed of two nodes. The values of the output nodes represent the probabilities of the endpoints, and the output layer uses a softmax function as its activation function.
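To make the skip connection and the two-node softmax output concrete, the following NumPy sketch implements a single toy residual block and an output layer. It is not the published architecture: the kernel size, channel counts, ReLU placement, random weights, and the use of per-sample channel standardization as a stand-in for batch normalization are all assumptions, and the function names are hypothetical.

```python
import numpy as np

def conv1d_same(x, w):
    """x: (c_in, n); w: (c_out, c_in, k) with odd k. Zero-padded, same length.

    Computes cross-correlation, as deep-learning frameworks do.
    """
    c_out, _, k = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad)))
    out = np.zeros((c_out, x.shape[1]))
    for j in range(k):
        # contribution of kernel tap j, summed over input channels
        out += w[:, :, j] @ xp[:, j:j + x.shape[1]]
    return out

def channel_norm(x, eps=1e-5):
    """Stand-in for batch normalization: standardize each channel."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def residual_block(x, w1, w2):
    """conv -> norm -> ReLU -> conv -> norm, then add the input back."""
    h = np.maximum(channel_norm(conv1d_same(x, w1)), 0.0)
    h = channel_norm(conv1d_same(h, w2))
    return np.maximum(x + h, 0.0)          # skip connection

def softmax(z):
    e = np.exp(z - z.max())                # subtract max for stability
    return e / e.sum()

rng = np.random.default_rng(0)
x = rng.normal(size=(12, 4000))            # synthetic 12-lead input
w1 = rng.normal(scale=0.1, size=(12, 12, 3))
w2 = rng.normal(scale=0.1, size=(12, 12, 3))
h = residual_block(x, w1, w2)
w_out = rng.normal(scale=0.01, size=(2, h.size))
probs = softmax(w_out @ h.ravel())         # two output nodes
```

With all convolution weights set to zero, the block reduces to a ReLU of its input, which shows how the skip connection preserves a direct path for the signal and its gradient.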

Fig. 2 Architecture of DLM to screen sepsis using ECG. conv convolutional neural network layer; DLM deep-learning model; ECG electrocardiography; FC fully connected layer

Statistical analysis

For each input ECG in the validation data, the DLM calculated the probability of sepsis within the range of 0 (non-sepsis) to 1 (sepsis). To assess the accuracy of the DLM, we compared the probability calculated by the DLM with the presence of sepsis (ground truth) in the internal and external validation datasets. As performance measures, we used the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). We determined the cut-off point using Youden’s J statistic in the development dataset and then applied this cut-off point when evaluating performance in the internal and external validation datasets [16]. As comparators, we used C-reactive protein (CRP) and body temperature abnormality (the difference between the measured body temperature and 36.5 °C) to screen for sepsis and septic shock.
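The evaluation procedure above (choose the cut-off on the development data, then apply it unchanged to the validation data) can be sketched in plain Python. The scores and labels below are invented toy values, and the function names are hypothetical; this is an illustration, not the analysis code of the study, which used R.

```python
def confusion(scores, labels, cutoff):
    """Counts of TP, FP, FN, TN when score >= cutoff is called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(s >= cutoff and y == 1 for s, y in pairs)
    fp = sum(s >= cutoff and y == 0 for s, y in pairs)
    fn = sum(s < cutoff and y == 1 for s, y in pairs)
    tn = sum(s < cutoff and y == 0 for s, y in pairs)
    return tp, fp, fn, tn

def metrics(scores, labels, cutoff):
    tp, fp, fn, tn = confusion(scores, labels, cutoff)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }

def youden_cutoff(scores, labels):
    """Cut-off that maximizes J = sensitivity + specificity - 1."""
    def j(c):
        m = metrics(scores, labels, c)
        return m["sensitivity"] + m["specificity"] - 1
    return max(sorted(set(scores)), key=j)

def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy development data: the cut-off is chosen here only.
dev_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
dev_labels = [0, 0, 0, 1, 1, 0, 1, 1]
cutoff = youden_cutoff(dev_scores, dev_labels)

# Toy validation data: the development cut-off is applied unchanged.
val_scores = [0.05, 0.35, 0.45, 0.5, 0.95]
val_labels = [0, 0, 1, 0, 1]
val_metrics = metrics(val_scores, val_labels, cutoff)
val_auc = auc(val_scores, val_labels)
```

Fixing the cut-off on the development dataset avoids tuning the threshold on the data used to report sensitivity, specificity, PPV, and NPV.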

Continuous variables were presented as means (standard deviation, SD) and compared using the unpaired Student’s t-test or the Mann–Whitney U test. Categorical variables were expressed as frequencies and percentages and were compared using the χ² test. Exact 95% confidence intervals (CIs) were used for all measures of diagnostic performance except the AUC. The CIs for the AUC were determined using the Sun and Xu optimization of the DeLong method, as implemented in the pROC package in R (R Foundation for Statistical Computing, Vienna, Austria). A significant difference in patient characteristics was defined as a two-sided p value < 0.05. Statistical analyses were performed using R software, version 3.4. In addition, the open-source PyTorch library was used as the deep-learning backend, with Python (version 3.6) [17].
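An exact CI for a proportion such as sensitivity or specificity is commonly computed with the Clopper-Pearson method. The text does not name the method, so treating "exact" as Clopper-Pearson is an assumption here. The sketch below solves the defining binomial tail equations by bisection in plain Python; the function names are hypothetical, and the direct summation is only suitable for small sample sizes.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p). Direct sum, intended for small n."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def bisect_root(f, lo, hi, tol=1e-10):
    """Root of a monotone function f that changes sign on [lo, hi]."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided (1 - alpha) CI for k successes out of n trials."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else bisect_root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else bisect_root(
        lambda p: binom_cdf(k, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper

# Invented example: 8 true positives among 10 patients with the condition.
low, high = clopper_pearson(8, 10)
```

For the sample sizes reported in this study, a library routine based on the beta distribution would be preferable to the direct binomial sum used in this sketch.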

Visualizing the developed DLM for interpretation

To compare the findings from the developed DLM with current medical knowledge, we used a sensitivity map based on the saliency method [18, 19]. The map shows the regions with a significant effect on the decision of the DLM. The sensitivity map was computed from the first-order gradients of the classifier probabilities with respect to the input signals; if the probability of a classifier is sensitive to a specific region of the signal, that region is considered important in the decision of the DLM. Using this method, we identified the regions of the ECG that were associated with sepsis. We used a gradient-weighted class activation map (Grad-CAM) as the sensitivity map. Because we could not determine a definite decision process for the developed DLM, we instead calculated the importance of selected variables. We also assessed the variable importance of the ECG features in a conventional statistical method (logistic regression) and in machine-learning methods (random forest and deep learning). We calculated the variable importance for logistic regression, random forest, and deep learning based on the difference in deviance, mean decrease in Gini, and Garson’s relative importance, respectively.
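The idea of a gradient-based sensitivity map can be illustrated with a toy classifier whose gradient is known in closed form. The sketch below uses a single-lead logistic model, not the DLM and not Grad-CAM; the weights, the "important" sample range (1000 to 1199), and the function names are invented for illustration.

```python
import numpy as np

def predict_proba(x, w, b):
    """Toy classifier: logistic model on a single-lead signal."""
    return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

def saliency(x, w, b):
    """Absolute first-order gradient of the probability w.r.t. each sample.

    For p = sigmoid(w.x + b), dp/dx_i = p * (1 - p) * w_i.
    """
    p = predict_proba(x, w, b)
    return np.abs(p * (1.0 - p) * w)

rng = np.random.default_rng(0)
x = rng.normal(size=4000)        # synthetic 8-s signal at 500 Hz
w = np.zeros(4000)
w[1000:1200] = 0.01              # the toy model only "looks at" this region
b = 0.0
sal = saliency(x, w, b)          # nonzero only where the model is sensitive
```

In this toy case the map is nonzero only in the region the model uses, which is the property that lets a sensitivity map point to the ECG segments driving a prediction. For a deep network, the same gradient would be obtained by automatic differentiation rather than a closed-form expression.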

Verifying DLM performance to predict in-hospital mortality among infectious disease patients

We hypothesized that ECGs could reflect the severity of infectious diseases and that the developed DLM would predict in-hospital mortality of patients with infectious diseases. In other words, we hypothesized that a high DLM score is correlated with a severe infectious disease. We conducted a subgroup analysis of patients with suspected infectious diseases in the internal and external validation datasets and verified the in-hospital mortality prediction performance of the DLM in these patients. To assess the accuracy of the DLM, we compared the score calculated by the DLM with the occurrence of in-hospital mortality in the subgroup datasets. For comparison, we used the sequential organ failure assessment (SOFA) score, quick SOFA (qSOFA) score, National Early Warning Score (NEWS), Modified Early Warning Score (MEWS), lactate, white blood cell (WBC) count, and CRP to predict in-hospital mortality among patients with infectious diseases [20,21,22,23].

Results

The eligible study population included patients admitted to SGH and MSH. As shown in Fig. 1, we excluded eight patients because of missing clinical information, including ECGs, admission notes, and laboratory examination results. The study involved 46,017 patients, of whom 1,548 and 639 had sepsis and septic shock, respectively. The DLM was developed using a development dataset of 73,727 ECGs from 18,142 patients from SGH. Internal validation was conducted using 7,774 ECGs from 7,774 SGH patients, and external validation was conducted using 20,101 ECGs from 20,101 MSH patients. In patients with sepsis, the ECG showed rightward P-, R-, and T-wave axes, prolonged QTc, and tachycardia (Table 1).

Table 1 Baseline characteristics

During the internal and external validations, the AUC of the DLM for detecting sepsis, the primary outcome, using a 12-lead ECG was 0.901 (95% CI = 0.882–0.920) and 0.863 (95% CI = 0.846–0.879), respectively (Fig. 3 and Table 2). The AUC of the DLM for detecting septic shock using 12-lead ECGs during internal and external validations was 0.906 (95% CI = 0.877–0.936) and 0.899 (95% CI = 0.872–0.925), respectively. The AUC of the DLM for detecting sepsis using 6-lead and single-lead ECGs was 0.845–0.882, and the AUC of the DLM for detecting septic shock using 6-lead and single-lead ECGs was 0.881–0.906.

Fig. 3 Performance of DLM for screening sepsis and septic shock using electrocardiography. AUC area under the receiver operating characteristic curve; ECG electrocardiography; NPV negative predictive value; PPV positive predictive value; SEN sensitivity; SPE specificity

Table 2 Performance of DLM for screening sepsis and septic shock using electrocardiography

A sensitivity map showed that the QT interval and T wave were associated with sepsis, and the variable importance of deep learning confirmed that prolonged QTc was associated with sepsis (Fig. 4). The logistic regression and random forest had different variable importance and showed that prolonged QTc, T axis, and QRS duration were important variables (Table 3).

Fig. 4 Sensitivity map of septic shock patients

Table 3 Variable importance for detecting sepsis

Subgroup analysis was conducted using ECGs from the 4,609 patients with infectious diseases in the validation datasets. There were 256 in-hospital deaths in the subgroup study population. For predicting in-hospital mortality, the AUC was 0.817 (0.793–0.840), 0.815 (0.794–0.836), and 0.802 (0.780–0.825) for the DLM using 12-, 6-, and single-lead ECG, respectively, compared with 0.817 (0.786–0.847) for SOFA, 0.797 (0.767–0.828) for qSOFA, 0.808 (0.777–0.839) for NEWS, 0.778 (0.747–0.808) for MEWS, 0.801 (0.758–0.844) for lactate, 0.591 (0.552–0.630) for WBC, and 0.541 (0.499–0.583) for CRP (Fig. 5 and Table 4). The DLM using 12-lead ECG thus had an AUC equal to that of SOFA and higher than those of the other scores and laboratory markers.

Fig. 5 Performance of DLM for predicting in-hospital mortality of patients with infectious disease. AUC area under the receiver operating characteristic curve; ECG electrocardiography; MEWS Modified Early Warning Score; NEWS National Early Warning Score; NPV negative predictive value; PPV positive predictive value; SEN sensitivity; SOFA sequential organ failure assessment; SPE specificity

Table 4 Performance of DLM for predicting in-hospital mortality of patients with infectious diseases

As shown in Fig. 6, there was a significant difference in the prediction score of the DLM using ECG according to the presence of infection in the validation dataset (0.277 vs. 0.574, p < 0.001). In patients with SARS-CoV-2, the same trend was observed in the prediction score of DLM using ECG before and after SARS-CoV-2 infection (0.260 vs. 0.725, p = 0.018).

Fig. 6 Change of DLM’s prediction score according to infection. SARS-CoV-2 Severe acute respiratory syndrome coronavirus 2

Discussion

We developed a DLM for screening sepsis and septic shock using 12-, 6-, and single-lead ECGs and demonstrated reasonable accuracies for internal and external validations. We confirmed the performance of predicting in-hospital mortality in a subgroup analysis of patients with infectious diseases. We also identified the ECG regions and features associated with sepsis. To the best of our knowledge, this study is the first to develop a DLM for sepsis screening using ECG.

Approximately 50% of patients with sepsis have cardiac dysfunction, which is a well-known risk factor associated with a significantly increased mortality rate of 20–50% [24]. Sepsis leads to cardiac dysfunction by decreasing beta-adrenergic receptor components, an effect mediated by inflammatory substances such as cytokines and nitric oxide [25]. Direct cardiomyocyte injury or death is caused by toxins and complications of sepsis. Cardiomyocyte apoptosis is the leading cause of cardiac dysfunction, followed by the downregulation of beta-adrenoreceptors and impairment of myofibril function owing to the disruption of calcium liberation. Because sepsis affects cardiac function through direct and indirect pathophysiological mechanisms, we hypothesized that an ECG contains information for sepsis detection. Previously, Rich et al. showed that the QRS amplitude in patients with sepsis was smaller than that in normal individuals [9]. However, conventional statistical methods, such as logistic regression, cannot derive diagnostic criteria from these subtle changes and nonlinear correlations. ECG is affected not only by cardiac function but also by other patient factors. For example, a patient with more body fat and a higher body mass index has a lower ECG amplitude [26]. Madias et al. reported that the loss of QRS amplitude in the ECGs of patients with sepsis is due not to cardiac dysfunction but to an extracardiac cause, such as a reduction in the transfer impedance of the body volume conductor owing to water accumulation [27]. Recent studies have highlighted the possibility of using AI for interpreting ECGs. Using AI technologies based on DLMs, conditions that could not be diagnosed from ECG based on previous medical knowledge, such as heart failure, valvular heart disease, pulmonary hypertension, anemia, and hyperkalemia, can now be detected [11,12,13,14, 28,29,30].

The most important aspect of deep learning is its ability to extract features and develop an algorithm using various types of data such as images, 2D data, and waveforms [15]. In this study, we developed a DLM for detecting sepsis and confirmed its performance through external validation. The DLM prediction score can also be used to detect septic shock. Previous studies have shown that inflammatory markers and infection are closely correlated with cardiac disease and ECG [31].

There has been enormous development in diverse wearable and lifestyle devices worldwide, and a foundation for remote diagnosis and treatment based on diverse biosensors and internet technologies already exists. However, the interpretation of biosignals from wearable devices remains limited. ECG is an important biosignal for remote medical monitoring and treatment because it can be measured using diverse wearable devices and transferred to remote medical sites in real time. Owing to the limitations of conventional statistical methods, ECG has been used mainly for the diagnosis of arrhythmia and myocardial infarction. Recent studies show that AI technologies have enabled the diagnosis and prediction of diverse diseases using ECG. In the ongoing SARS-CoV-2 pandemic, such technologies are important for screening infectious diseases, monitoring patient status, and capturing the deterioration of patients. In this study, we highlighted the possibility of using DLMs for screening infectious diseases, including SARS-CoV-2 infection, as shown in Fig. 6. These results are not definitive evidence that SARS-CoV-2 infection can be screened via ECG; rather, we wanted to demonstrate to other researchers the possibility of developing deep learning for SARS-CoV-2. Further studies on the use of AI for screening sepsis and septic shock are needed; nevertheless, this study highlights the possibility of applying ECG to detect and monitor patients with infectious diseases. We also confirmed that performance was maintained with 6- and single-lead ECGs, which suggests that the DLM could be applied to various lifestyle and patch-type ECG devices.

This study had some limitations. First, we validated the DLM using retrospective data; it is necessary to validate the DLM using prospective studies and real-time data. Studies on the clinical significance of this new technology are required before its application in clinical practice. In our next study, we intend to verify the performance and significance of the DLM as a screening and prognostic method through a prospective study in daily clinical practice. We also plan to conduct research on DLMs for predicting the development and resolution of sepsis using ECG. Second, this study was conducted in only two hospitals in Korea, and it would be helpful to validate the DLM in patients from other countries. Third, deep learning has a black-box limitation, owing to which we could not determine the exact decision-making process. Therefore, we could not confirm whether our findings represented correlation or causality. For the same reason, we could not identify the exact ECG features that were used by the DLM. As technologies for explainable deep learning that can identify the reasons and features behind a decision are being developed, we intend to use them in our next study to examine the decision process and causality of the DLM. Fourth, because this was a retrospective study, there could be confounders. A prospective randomized controlled study is needed to exclude hidden confounders and confirm the exact clinical implications of DLMs for sepsis.

Conclusion

The DLM demonstrated accurate performance in detecting sepsis and septic shock using ECG. The results of the present study indicate that the application of AI technologies based on a DLM to an ECG could predict the development of sepsis in patients and enable the screening of diverse infectious diseases.