Introduction

Cardiogenic shock (CS), a state of circulatory failure, can occur due to acute ischaemic or non-ischaemic cardiac events, or from the progression of longstanding underlying heart disease [2,3,4]. Unfortunately, despite recent advances in pharmacological intervention or mechanical support, CS mortality remains unacceptably high and highly varied, with the 30-day mortality ranging from 50 to 90% [5]. The disparity in the mortality rates might imply CS patients are a heterogeneous population, and some phenotypes of CS are so different in clinical features and prognoses that they cannot be regarded as a whole population, both in clinical practice and research. Additionally, most attempts at staging CS have been based on expert opinions and consensus [6,7,8,9]. To avoid complexity, some of these classification systems only use few variables and depend on specific, although arbitrary, cutoffs, which may introduce bias and fail to capture the full variability of patient profiles [10]. Furthermore, the traditional logistic regression (LR) method has been used to develop most of these classifications, despite the fact that the predictors do not interact linearly and additively [11]. Therefore, a more accurate and granular classification of the CS spectrum is urgently needed to aid in the urgent and critical task of selecting proper management, including targeting the most appropriate candidates for advanced therapies.

Machine learning (ML) algorithms have become more commonly utilized in individualized medicine to support clinical decision-making as electronic medical records and artificial intelligence have advanced [12, 13]. Consensus clustering, an unsupervised ML approach, is used to find similarities and differences among numerous variables, and then allocate them into distinct phenotypes. Consensus clustering generates multiple clustering results by multiple iterations and merges these results to arrive at the final clustering result [14]. Additionally, consensus clustering can provide a visual display of multiple clustering results to help understand the clustering process and results, enhancing the interpretability of the algorithm. Recent studies have reported that ML consensus clustering approach may distinguish clinically distinct disease phenotypes such as cardiovascular diseases [9, 15].

Given the heterogeneity of patients with CS on admission [16], we aimed to identify clinically meaningful phenotypes of patients with CS using an unsupervised ML approach and to assess mortality risks among these distinct clusters.

Methods

Study design and data resource

We conducted a retrospective multi-center analysis using all the relevant data extracted from the electronic Intensive Care Unit (eICU) Collaborative Research database. The Database was a comprehensive ICU database for more than 200,000 admissions from over 200 hospitals across the USA between 2014 and 2015 [17]. We finished the “Protecting Human Research Participants” curriculum and obtained permission to access the dataset (authorization codes: 33,281,932). The establishment of the eICU database was approved by the Institutional Review Boards of the Massachusetts Institute of Technology (Cambridge, Massachusetts, USA). All the data were anonymized prior to research analyses by the eICU program, and hence the requirement for informed consent was waived. The study adhered to the ethical standards set forth in the 1964 Declaration of Helsinki and its later amendments.

Patient selection

We included all critically ill patients with a primary diagnosis of CS using International Classification of Diseases, Ninth Revision (ICD-9) diagnosis codes from the eICU database (ICD-9 codes:758.51 and R57.0). Patients were excluded if they had: [1] the age of less than 18; [2] multiple ICU admissions; [3] a length of stay in the ICU less than 24 h; and [4] incomplete information about study outcomes (all-cause in-hospital mortality, all-cause ICU mortality, and the incidence of acute kidney injury [AKI] after admission). For patients with multiple admissions, we retained information only on the patient’s first admission to the ICU.

Data extraction and processing

Demographic data, medical history, vital signs, laboratory data, scoring systems, treatment information, and others were retrieved from the eICU database using structured query language with PostgreSQL (version 13.6, www.postgresql.org). We only used data that was present within 24 h of ICU admission for clustering analysis since our aim was to phenotype CS patients based on available data at the time of ICU admission. If patients received vital signs measurement or laboratory tests more than once on the first day of admission, only the initial test results were considered for subsequent analyses.

The extracted variable included: [1] demographics: age, gender, ethnicity and body mass index (BMI); [2] ICU type: cardiac intensive care unit (CICU), cardiac surgery intensive care unit (CSICU), medical intensive care unit (MICU), surgery intensive care unit (SICU), cardiac care unit-cardiac trauma/surgical intensive care unit (CCU-CTICU), neuro intensive care unit (NICU), cardiac trauma intensive care unit (CTICU); [3] Medical history: myocardial infarction, coronary artery bypass grafting (CABG), percutaneous coronary intervention (PCI), pacemaker, congestive heart failure, cardiac arrhythmias, hypertension, peripheral vascular disease, chronic obstructive pulmonary disease (COPD), respiratory failure, stroke, neurologic disorders, diabetes, anemia, lymphoma, liver disease, peptic ulcer, metastatic cancer, rheumatoid arthritis, hypothyroidism, and acquired immunodeficiency syndrome (AIDS); [4] vital signs: systolic blood pressure (SBP), diastolic blood pressure (DBP), mean blood pressure (MBP), heart rate, respiratory rate, temperature, and oxygen saturation measured by pulse oximetry (SpO2); [5] laboratory findings: white blood cell (WBC) count, red blood cell (RBC) count, platelet count, red blood cell distribution width (RDW), blood urea nitrogen (BUN), creatinine, estimated glomerular filtration rate (eGFR), glucose, total protein, albumin, bilirubin, total calcium, potassium, sodium, chloride, and bicarbonate; The eGFR was calculated using the modification of diet in renal disease (MDRD) formula [18]; [6] prognostic scoring system: systemic inflammatory response syndrome (SIRS) score, Sequential Organ Failure Assessment (SOFA) score, acute physiology score III (APS III) and Acute Physiology and Chronic Health Evaluation IV (APACHE IV) sccore; [7] Treatment information: PCI, CABG, intraaortic balloon pump (IABP), mechanical ventilation, renal replacement treatment (RRT), and vasopressor use (dopamine, epinephrine, norepinephrine, or vasopressin). The definition of vasopressor use was at least one vasopressor was used during the first 24 h after admission.

Endpoints

The study endpoints of our study included all-cause in-hospital mortality, all-cause ICU mortality, and the incidence of AKI after admission. KDIGO (Kidney Disease: Improving Global Outcomes) criteria were taken as the definition of AKI [19]. KDIGO criteria are as follows: increase in serum creatinine to ≥ 1.5 times baseline must have occurred within the prior 7 days, or a ≥ 0.3 mg/dl increase in serum creatinine occurred within 48 h, or urine volume < 0.5ml/kg/h for 6 h or more. The baseline serum creatinine was determined by using the minimum serum creatinine values available within the 7 days before admission. If the pre-admission serum creatinine was not available in the eICU database, the first serum creatinine measured at admission was used as the baseline serum creatinine.

Management of missing data

Variables with more than 20% missing values were excluded since large amounts of missing data might cause bias. Correspondingly, for variables with fewer than 20% missing values, multivariable imputation was applied, which was based on 5 replications and a chained equation approach method. Additionally, the extreme values were not omitted and treated as missing data for imputation [20].

Cluster analysis

We applied an unsupervised ML approach to consensus clustering to identify clinical phenotypes of ICU patients with CS. To prevent producing an excessive number of clusters that would not be clinically helpful, we employed a pre-specified subsampling parameter of 80% with 100 iterations and assigned the number of potential clusters (k) to vary from 2 to 8 in sequence with the K-means clustering algorithm. The optimal number of clusters was determined by cumulative distribution function (CDF) plot, delta area plot, consensus matrix (CM) heat map, cluster-consensus plot in the within-cluster consensus scores, and the proportion of ambiguously clustered pairs (PAC) analysis [21]. Pairwise consensus values, defined as ‘the proportion of clustering runs in which two items are grouped together’, are calculated and stored in a CM for each k. Then for each k, a final agglomerative hierarchical consensus clustering using distance of 1 − consensus values is completed and pruned to k groups, which are called consensus clusters. The within-cluster consensus score, ranging from 0 to 1, is defined as the average consensus value for all pairs of individuals within the same cluster [22]. A value closer to 1 indicates better cluster stability [22]. PAC is calculated as the proportion of all sample pairs with consensus values falling within the predetermined boundaries [21, 22]. A value closer to zero indicates better cluster stability [21].

Statistical analysis

After we identified the clusters of CS patients, we performed analyses to test the differences among the clusters. Data were presented as mean ± standardized differences (SD) and compared between groups using a Student’s t test if the measurement data were normally distributed and the variance was homogeneous. If the requirements were not satisfied, data were expressed as median interquartile range (IQR), and the Kruskal Wallis rank test was used for comparisons between groups. Numeration data were reported as absolute numbers and percentages, with statistical analysis using Pearson’s χ2 test or Fisher’s exact test as appropriate.

We determined the clusters’ key features using an absolute standardized mean difference (SMD) of > 0.3 in reference to Thongprayoon’s studies [23,24,25]. We then compared outcomes among the identified clusters. We assessed the association of clusters with CS and in-hospital mortality, ICU mortality, and the incidence of AKI after admission using the LR model. Cluster 1 is taken as the reference group in the further analysis. The extracted variables were not incorporated into the LR analysis because these characteristics were used to identify clusters through unsupervised ML. We performed all analyses using R, version 4.0.5 (RStudio, Inc., Boston, MA, USA; http://www.rstudio.com/), with the package of Consensus ClusterPlus (version 1.54.0) for consensus clustering analysis [22].

Result

Identification of the optimal number of clusters

Out of 34,682 ICU admissions, 21,925 patients with the diagnosis of CS were enrolled in the final cohort (Fig. 1). The CDF plot shows each cluster’s consensus distributions (Fig. 2A). The delta area plot displays the relative change in the area under the CDF curve, and the largest changes in area occurred between k = 2 and k = 5 (Fig. 2B). The CM heat map demonstrated that the ML algorithm identified cluster 2 and cluster 3 with clear boundaries, indicating good cluster stability over repeated iterations (Fig. S2A). As shown in the cluster-consensus plot, k = 2 and k = 3 had high stability given their high mean cluster consensus score (Fig. S2B). Additionally, favorable low PACs were demonstrated for 2 clusters (Fig. S1). Summarily, the ML consensus clustering approach from baseline characteristics on admission identified 2 clusters that best represented the data.

Fig. 1
figure 1

Flow diagram of patient inclusion and overview of the statistical analysis. Abbreviation: CS: cardiogenic shock; ML: machine learning; eICU: electronic Intensive Care Unit; ICD-9: International Classification of Diseases, Ninth Revision; CDF: cumulative distribution function; CM: consensus matrix; PAC: proportion of ambiguously clustered pairs; SMD, standardized mean differences; LR: logistic regression

Fig. 2
figure 2

(A) CDF plot; (B) Delta area plot. Abbreviation: CDF: cumulative distribution function

Selection of Key features of the clusters

Figure S3 shows missing rate for clinical and laboratory variables extracted from the database. Cluster 1 had 9,848 (44.9%) patients, while Cluster 2 had 12,077(55.1%) patients. As shown in Table 1, the clinical characteristics of the two identified clusters in the CS cohort were significantly different. Ages at presentation were 70 (59–80) years for Cluster 1 cohort, and 65 (63–76) years for Cluster 2, whereas male sex represented 53.8% and 51.2% of patients in these two cohorts, respectively. On the basis of the |SMD|>0.3, the key features of patients in Cluster 1, compared with Cluster 2, included: lower SBP, lower MBP, lower DBP, lower eGFR, higher BUN, higher creatinine, lower albumin, higher potassium, lower bicarbonate, lower RBC count, higher RDW, higher SOFA, higher APS III score, and higher APACHE IV score (Fig. 3 and Table 2).

Table 1 Baseline characteristics of the clusters
Fig. 3
figure 3

The SMD for each of baseline characteristics across clusters. Abbreviation: SMD: standardized mean differences; BMI: body mass index; CICU: cardiac cardiac intensive care unit; CSICU: cardiac surgery intensive care unit; MICU: medical intensive care unit; SICU: surgery intensive care unit; CCU-CTICU: cardiac care unit-cardiac trauma/surgical intensive care unit; NICU: neuro intensive care unit; CTICU: cardiac trauma intensive care unit; CABG: coronary artery bypass grafting; PCI: percutaneous coronary intervention; COPD: chronic obstructive pulmonary disease; AIDS: acquired immunodeficiency syndrome; SBP: systolic blood pressure; DBP: diastolic blood pressure; MBP: mean blood pressure; SpO2: oxygen saturation measured by pulse oximetry; WBC: white blood cell; RBC: red blood cell; RDW: red blood cell distribution width; BUN: blood urea nitrogen; eGFR: estimated glomerular filtration rate; SIRS: systemic inflammatory response syndrome; SOFA: Sequential Organ Failure Assessment; APS III: acute physiology score III, APACHE IV: Acute Physiology and Chronic Health Evaluation IV; IABP: intraaortic balloon pump; RRT: renal replacement treatment

Table 2 Selection of key characteristics of the clusters

Association of clusters with endpoints

The in-hospital mortality, ICU mortality, and incidence of AKI for Cluster 1 patients were 25%, 16%, and 66% respectively, and were 11%, 6%, and 49% for Cluster 2 patients, respectively (P < 0.001 for all) (Fig. 4 and Table S1). The results of LR analysis showed that the cluster 2 was associated with lower in-hospital mortality (odds ratio [OR]: 0.374; 95% confidence interval [CI]: 0.347–0.402; P < 0.001), ICU mortality (OR: 0.349; 95% CI: 0.318–0.382; P < 0.001), and incidence of AKI after admission (OR: 0.478; 95% CI: 0.452–0.505; P < 0.001) (Fig. 5).

Fig. 4
figure 4

(A) In-hospital mortality, (B) ICU mortality, and (C) Incidence of AKI after admission among different clusters. Abbreviation: AKI: acute kidney injury; ICU:intensive care unit

Fig. 5
figure 5

The forest plot of OR (95% CI) for (A) In-hospital mortality, (B) ICU mortality, and (C) Incidence of AKI after admission. Abbreviation: OR: odds ratio; CI: confidence interval; AKI: acute kidney injury; ICU:intensive care unit

Discussion

Major findings

The unsupervised ML consensus clustering approach provides the ability to more efficiently analyze, identify, and classify phenotypes of patients on admission based on large amounts of data. In this study, two distinct clusters of critically ill patients with CS were determined by applying the consensus clustering algorithm. Blood pressure (SBP, MBP, and DBP), kidney function (creatinine, BUN, and eGFR), electrolytes and acid-base compounds (potassium and bicarbonate), liver function (albumin), RBC-associated indicators (RBC count and RDW), and some scoring systems (SOFA, APS III, and APACHE IV) were the key features used to differentiate the phenotypes of CS. In addition, these two clusters were also associated with multiple study endpoints including in-hospital mortality, ICU mortality, and the incidence of AKI after admission. A more accurate and granular classification could deepen our understanding of CS pathophysiology, be introduced into clinical practice as a risk assessment tool, and provide participant selection information for clinical trials.

Relation to other works

Several prognostic classifications or risk stratifications of CS have been reported. For example, with regard to hemodynamic phenotypes of CS, patients are generally classified into 4 phenotypes based on cardiac output (i.e., insufficient [cold] versus sufficient [warm]) and volume status (i.e., overloaded [wet] versus euvolemic [dry]) which reflect tissue perfusion and congestion, respectively [26, 27]. This classic “cold and wet” profile is the most frequent CS phenotype, accounting for nearly two-thirds of patients with MI-associated CS [28]. Based on 6 variables with a maximum of 9 points, there are three risk categories in the IABP-SHOCK II score [29]. Patients in the low, intermediate, and high risk categories have an in-hospital mortality risk of 20–30%, 40–60%, and 70–90%, respectively. The recently proposed Society of Cardiovascular Angiography and Interventions (SCAI) staging, describing stages of CS from A to E, provides discriminatory potential for morbidity and mortality [6]. It can be used to track the severity of shock over the course of a hospital stay. However, it was noted that some of these classification tools are based on expert consensus and theoretical considerations rather than on clinical evidence. To avoid complexity, some of these classifications contain only a few characteristics and depend on specific, although arbitrary, cutoff values that could result in bias and fail to capture the full variability of patient profiles. Additionally, some continuous variables in the classification were changed into categorized variable, which might cause a loss of information on between-subject variability. Furthermore, most of these classifications, using the traditional LR method, were developed assuming that the predictors interact in a linear and additive way, despite the reality that the interactions are often non-linear and multifactorial [11].

To address these limitations above, clustering analysis was used in this study to capture the natural structure of multivariate data without a priori knowledge and it has been applied extensively in medical science, for example, to identify clinical phenotypes [30]. It can also treat multiple variables independently and continuous variables as continuous. Zweck et al. [9] used machine learning, and identified 3 distinct CS phenotypes (“Noncongested” CS, “Cardiorenal” CS and “Cardiometabolic” CS), with specific and reproducible associations with mortality. However, their study and ours differ in terms of the study cohort, sample size, and statistical methods. Additionally, multiple endpoints (in-hospital mortality, all-cause ICU mortality, and the incidence of AKI) were set in our study. AKI, which is reflected by a rise in serum creatinine and a potential reduction in urinary output, may indicate renal hypoperfusion in the setting of CS and is associated with poor outcomes [31]. In the current study, 2 phenotypes were identified. Compared with those in Cluster 2, patients in Cluster 1 had worse hemodynamic and metabolic parameters, lower scoring systems, and worse clinical outcomes, which indicated they were more likely to suffer from multisystem organ failure [4, 29].

Through calculating the SMD of each variable, we determined that SBP, MBP, DBP, eGFR, BUN, creatinine, albumin, potassium, bicarbonate, RBC, RDW, SOFA, APS III, and APACHE IV were the key features between clusters. Some of these indicators have been found to be associated with risk of mortality in CS. A creatinine of greater than 1.33 had significantly higher mortality in the Intra-aortic Balloon Pump in CS (IABP-SHOCK II) trial [32]. Serum bicarbonate, especially when evaluated in the early-stage course of CS patients, could offer information regarding prognosis. Wigger et al. [33] found that serum bicarbonate decreased prior to significant elevation of lactate. A low bicarbonate level shows the better ability to predict 30-day mortality than the highest recorded lactate level. One recent study has reported that higher RDW is associated with an increased risk of all-cause mortality in critically ill patients with CS [34]. There is mounting evidence that the development of SIRS plays an important role in the pathogenesis of CS. Pierce et al. [35] found that inflammatory cytokines might cause an increase in RDW by affecting iron metabolism and inhibited bone marrow. Additionally, CS can cause activation of the renin-angiotensin system, which leads to an increase in RDW with erythropoiesis [36]. Multiple scoring systems derived from the ICU population have been proposed to predict clinical outcomes in CS. A small study comparing the APACHE-II, APACHE-III, SAPS-II, and SOFA scoring systems in CS reported that APACHE-III and SAPS-II had the best mortality discrimination [10]. The latest version of APACHE-IV is calculated based on 129 variables derived within the first 24 h of ICU admission, which was assessed from over 110,588 patients admitted to more than 104 ICUs across the USA [37, 38]. The application of the APACHE IV score is limited due to its complexity. However, as data science advances, the complexity of these scores could be overcome by electronic recording techniques and computing power.

Clinical implications

The strengths of our study include innovative findings via an unsupervised ML consensus clustering approach derived from a large sample size consisting of a multi-center population of ICU patients with CS covering a broad spectrum of etiologies. The identified clusters of CS may be used by clinicians in the ICU to quickly assess patients with CS, as the key features identified in this study are rapid, easy, and inexpensive laboratory tests. These clusters may enhance clinical trials by developing treatment strategies tailored to a shock phenotype instead of aiming for a one-size-fits-all solution, thereby paving the way for more individualized health care. This new classification system of different shock states will also help to make different trials of CS better comparable and may also trigger new randomized trials on the pre-shock state.

Limitations

There were several limitations to our current study. First, due to the retrospective nature of this study, future studies may collect comprehensive data in a prospective manner and allow for enhanced, even more nuanced examination of the CS phenotypes. Second, in the eICU database, values for some important variables, including lactate, brain natriuretic peptide, and some advanced hemodynamic monitoring parameters, were documented incompletely and not included for currentanalysis. Third, restricted by the eICU database, the etiology of CS has not been identified accurately. Future studies should attempt to conduct subgroup analyses based on different causes of CS. Fourth, consensus clustering was performed on hospital admission and did not include data before or during hospitalization, which could affect hospitalization-related outcomes. Lastly, our classification tool only enrolled the variable of CS at the early stage, and cannot evaluate the severity and progression of CS dynamically. Therefore, in the future, we will study the association between ML-derived phenotypes and endpoints within individual SCAI stages, whose aim is to characterize disease severity as it evolves over the course of a hospital stay.

Conclusions

ML consensus clustering analysis identified distinct clusters of hospitalized CS. Blood pressure (SBP, MBP, and DBP), kidney function (creatinine, BUN, and eGFR), electrolytes (potassium and bicarbonate), liver function (total protein and albumin), RBC-associated indicators (RBC count and RDW), and some scoring systems (SOFA, APS III, and APACHE IV) were the key features used to differentiate the phenotypes of CS upon admission. Furthermore, the distinct phenotypes of CS have differing in-hospital mortality, ICU mortality, and the incidence of AKI after admission. Accordingly, these findings may help with risk classification and the design of treatment algorithms tailored to each phenotype of CS, as well as inform participant selection in future clinical trials.