Background

Infective endocarditis (IE) is a disease in which pathogenic microorganisms such as bacteria or fungi attach to the endocardium, causing inflammation and damage to the endocardium or valve tissue. The incidence of IE is 3–10 per 100,000 people, with an increasing trend in some places [1, 2]. Despite the development of various diagnostic tools and therapeutic measures such as antibiotics and surgery, the in-hospital mortality rate of IE remains approximately 20% [3,4,5]. Individual patient characteristics, cardiac and non-cardiac comorbidities, the infecting microbial species and the echocardiographic disease status are the primary factors contributing to the poor prognosis of IE [6]. Early diagnosis is now recognized as critical to improving patient outcomes and reducing mortality associated with IE [7].

Due to the complex and unpredictable clinical characteristics and disease progression of IE, timely and accurate diagnosis is imperative to ensure that patients receive appropriate treatment during critical stages of the disease. However, there is currently limited evidence regarding the effective intensive care management of IE, including specific clinical features that are indicative of a poor prognosis, as well as predictive tools that are effective in identifying high-risk patients [8].

A nomogram is a common statistical tool used in medicine to predict the probability or risk of a certain outcome based on multiple patient factors. This tool is especially useful in clinical predictive modeling, as it visualizes the impact of each predictor on the outcome event. Clinicians can use this tool to make prognostic predictions more accurately [9, 10]. The objective of this study was to construct a nomogram utilizing data sourced from the public database MIMIC-IV, subsequently assessing its prognostic value in predicting the risk of in-hospital mortality among patients with IE who were admitted to the ICU.

Data and method

Data source

The data used in this study were obtained from MIMIC-IV 2.0, a publicly available critical care database established by the Massachusetts Institute of Technology (Cambridge, MA). The database contains hospitalization data for patients treated at Beth Israel Deaconess Medical Center (Boston, MA) between 2008 and 2019, including vital signs, laboratory test results, organ failure scores, disease severity scores, comorbidities, diagnoses, treatments, length of hospital stay, survival data and demographic information. Because the patient information in the database has been de-identified, informed consent from patients is not required. The authors of this study have completed the requisite training and certification to access the database.

Study population

From the MIMIC-IV database, a cohort of 925 patients diagnosed with IE and admitted to the Intensive Care Unit (ICU) was identified. Among them, 237 cases with repeated ICU admissions in a single hospitalization were excluded from the analysis. Patients over 90 years of age, those without basic laboratory test results, and individuals with hospitalization times exceeding 100 days were also excluded from the study. The final analysis consisted of 676 patients deemed eligible according to the pre-determined criteria (Fig. 1).
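The exclusion criteria above can be expressed as a simple filter. The following is a minimal sketch, not the authors' extraction code: the record type `IcuStay` and all of its field names are hypothetical, and it assumes that hospitalizations with repeated ICU admissions were dropped entirely.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class IcuStay:
    # All field names are hypothetical; they are not MIMIC-IV column names.
    hadm_id: int                 # hospital admission identifier
    age: float                   # age in years at admission
    icu_stays_in_admission: int  # ICU stays during this hospitalization
    has_basic_labs: bool         # True if basic laboratory results exist
    hospital_days: float         # length of hospital stay in days


def is_eligible(stay: IcuStay) -> bool:
    """Apply the four exclusion criteria described in the text."""
    if stay.icu_stays_in_admission > 1:  # repeated ICU admission (assumed dropped)
        return False
    if stay.age > 90:
        return False
    if not stay.has_basic_labs:
        return False
    if stay.hospital_days > 100:
        return False
    return True


def select_cohort(stays: List[IcuStay]) -> List[IcuStay]:
    """Keep only the stays that pass every criterion."""
    return [s for s in stays if is_eligible(s)]
```

Applied to the 925 identified patients, a filter of this kind would yield the 676 eligible patients reported here, provided the underlying flags are derived as the authors derived them.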

Fig. 1

Flow diagram of patient selection (MIMIC IV, Medical Information Mart for Intensive Care IV)

Clinical variables

The raw data of the 676 selected patients were extracted using Structured Query Language (SQL) with PostgreSQL (version 9.6) through Navicat Premium software, based on their unique HADM_ID and ICUSTAY_ID. The extracted raw data comprised demographic characteristics (age, gender, race and weight), vital signs (heart rate, blood pressure, and temperature), blood gas analysis (lactate, SpO2, SpCO2 and anion gap), laboratory tests (e.g., white blood cell [WBC] and red blood cell [RBC] counts, hemoglobin, prothrombin time [PT], calcium, bicarbonate, glucose, lymphocytes), comorbidities (e.g., atrial fibrillation, myocardial infarction, chronic lung disease, heart valve disease, cerebrovascular disease), scoring systems (Acute Physiology Score III [APSIII], Simplified Acute Physiology Score II [SAPSII] and Sequential Organ Failure Assessment [SOFA]), and clinical risk factors associated with IE (continuous renal replacement therapy [CRRT], blood culture results, the presence of embolism, history of prior cardiac surgery).

Statistical analysis

Variables with more than 20% missing values were excluded, and the remaining missing data were handled by multiple imputation using the R packages “lattice (0.21-8)” and “mice (3.16.0)”. Participants were randomly allocated to the training and validation cohorts in a 7:3 ratio: 70% of the subjects were assigned to the training cohort for model development, and the remaining 30% formed the validation cohort used to test the model. Continuous variables were expressed as the median and interquartile range and compared between two groups with the Wilcoxon rank-sum test. Categorical variables were expressed as proportions and compared between groups with the Chi-square test or Fisher’s exact test, as appropriate. Variables in the training cohort were screened for mortality predictors using the Least Absolute Shrinkage and Selection Operator (LASSO) approach, and the variable set for the logistic regression models was determined by selecting lambda.1se through cross-validation [11, 12]. Using the selected variables, we constructed multiple logistic regression models and created nomograms with the ‘regplot (1.1)’ R package. The nomogram was validated using data from the validation cohort. Model discrimination was evaluated by the area under the receiver operating characteristic (ROC) curve (AUC), and the AUC of our model was compared with those of the APSIII and SAPSII scoring systems.
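The lambda.1se rule used for variable selection picks the largest penalty whose mean cross-validated error lies within one standard error of the minimum error, which favors a sparser model than the error-minimizing penalty. The sketch below illustrates that rule only; the function name `lambda_1se` and the cross-validation numbers are made up for illustration and are not results from this study.

```python
from typing import Sequence


def lambda_1se(lambdas: Sequence[float],
               cv_mean: Sequence[float],
               cv_se: Sequence[float]) -> float:
    """Return the largest lambda whose mean CV error is within one
    standard error of the minimum mean CV error."""
    if not lambdas or not (len(lambdas) == len(cv_mean) == len(cv_se)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Index of the penalty with the lowest mean CV error (lambda.min).
    best = min(range(len(lambdas)), key=lambda i: cv_mean[i])
    threshold = cv_mean[best] + cv_se[best]
    # Largest penalty that is still within one SE of the minimum.
    return max(lam for lam, err in zip(lambdas, cv_mean) if err <= threshold)


# Made-up cross-validation results, for illustration only.
lambdas = [0.001, 0.01, 0.05, 0.10, 0.20]
cv_mean = [0.175, 0.165, 0.168, 0.172, 0.190]
cv_se = [0.006, 0.005, 0.005, 0.006, 0.007]
chosen = lambda_1se(lambdas, cv_mean, cv_se)
```

In this toy example the minimum error occurs at 0.01, while the rule selects the larger penalty 0.05, which shrinks more coefficients to zero.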

Calibration curves were used to evaluate the agreement between the probabilities predicted by the model and the observed outcomes. The Hosmer-Lemeshow goodness-of-fit test (HL test) was employed to determine whether the predicted probabilities were consistent with the observed event rates. The Integrated Discrimination Improvement (IDI) was used to evaluate the overall improvement in prediction accuracy relative to the existing scoring systems. Furthermore, Decision Curve Analysis (DCA) was employed to assess the clinical usefulness of the model. Statistical analyses were performed in R (4.3.0), and a p-value of less than 0.05 was considered statistically significant.
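Two of the measures named above have compact definitions that may help when reading the results. The sketch below uses the standard formulas (net benefit = TP/n − FP/n × pt/(1 − pt) at threshold probability pt; IDI = difference in discrimination slopes between two models). The function names are illustrative, and the sketch assumes that both models supply predicted probabilities; it is not the authors' R code.

```python
from typing import Sequence


def net_benefit(probs: Sequence[float], outcomes: Sequence[int],
                threshold: float) -> float:
    """Net benefit of treating everyone whose predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def _discrimination_slope(probs: Sequence[float],
                          outcomes: Sequence[int]) -> float:
    """Mean predicted risk in events minus mean predicted risk in non-events."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    non_events = [p for p, y in zip(probs, outcomes) if y == 0]
    return sum(events) / len(events) - sum(non_events) / len(non_events)


def idi(new_probs: Sequence[float], old_probs: Sequence[float],
        outcomes: Sequence[int]) -> float:
    """Integrated Discrimination Improvement of the new model over the old."""
    return (_discrimination_slope(new_probs, outcomes)
            - _discrimination_slope(old_probs, outcomes))
```

A positive IDI means the new model separates patients who died from survivors better than the reference model, and a decision curve plots `net_benefit` over a range of thresholds.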

Results

Patient characteristics

A total of 676 patients with IE met the eligibility criteria and were randomly divided into the training cohort (n = 473) and the validation cohort (n = 203). Table 1 summarizes the demographic and clinical features of the two cohorts. In the training cohort, the median age was 58 years (43, 68), with 310 (65.54%) male and 163 (34.46%) female patients. The validation cohort had a median age of 57 years (40, 66), with 132 (65.02%) male and 71 (34.98%) female patients. The majority of patients in both cohorts were white (> 67%). The median weight was 77.3 kg (67.5, 92.0) in the training cohort and 77.9 kg (66.2, 94.2) in the validation cohort. Apart from the prevalence of liver disease (p = 0.044) (Supplementary materials), baseline clinicopathological data did not differ significantly between the cohorts (p = 0.077–1).
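A random 7:3 allocation of 676 patients gives cohorts of exactly these sizes. The following is a minimal sketch of such a split; the function name `split_7_3` and the seed are hypothetical, and the authors' actual sampling procedure in R is not described in detail.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_7_3(items: Sequence[T], seed: int = 2023) -> Tuple[List[T], List[T]]:
    """Shuffle the items reproducibly and split them 70% / 30%."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seed value is an arbitrary choice
    n_train = round(0.7 * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Placeholder identifiers standing in for the 676 eligible patients.
patient_ids = list(range(676))
train, valid = split_7_3(patient_ids)
```

Fixing the seed makes the allocation reproducible, which matters because cohort membership affects every downstream estimate.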

Table 1 Baseline characteristics of the patients

Variable analysis and selection

During cross-validation of the LASSO regression, we used the mean square error (MSE) as the evaluation index and screened seven independent variables from the initial pool of 54. At lambda.1se, the coefficients of these seven variables were non-zero, meaning that they were retained by the penalized model as predictors (Fig. 2). These seven variables were then entered into the multivariate logistic regression analysis: lactate (OR: 1.182; 95% CI 1.084–1.297), bicarbonate (OR: 0.913; 95% CI 0.847–0.984), white blood cells (OR: 1.213; 95% CI 1.126–1.317), platelet (OR: 0.996; 95% CI 0.993–0.997), PT (OR: 1.074; 95% CI 1.009–1.163), APSIII score (OR: 1.014; 95% CI 1.000–1.029), and SAPSII score (OR: 1.020; 95% CI 0.994–1.048) (Fig. 3).
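To make the odds ratios concrete: in a logistic model, the coefficient of a predictor is ln(OR), and each one-unit increase multiplies the odds of death by the OR when the other predictors are held fixed. The sketch below works through the lactate OR reported above; the 20% baseline risk and the two-unit increase are hypothetical values chosen for illustration.

```python
import math


def risk_after_change(baseline_risk: float, odds_ratio: float,
                      delta: float) -> float:
    """Risk after a predictor changes by `delta` units, all else fixed."""
    odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = odds * odds_ratio ** delta  # each unit multiplies the odds by OR
    return new_odds / (1.0 + new_odds)


LACTATE_OR = 1.182            # odds ratio reported in the text
beta = math.log(LACTATE_OR)   # corresponding logistic regression coefficient

# Hypothetical patient: 20% baseline risk, lactate two units higher.
risk = risk_after_change(0.20, LACTATE_OR, 2.0)
```

Under these assumptions the predicted risk rises from 20% to roughly 26%.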

Fig. 2

Selection of clinical variables using the LASSO logistic regression model. (A) Tuning parameter (λ) selection using LASSO penalized logistic regression with 10-fold cross-validation. (B) Cross-validation plot for the penalty term

Fig. 3

Results of the multiple logistic regression model based on LASSO regression. WBC white blood cell; PT prothrombin time; APSIII acute physiology score III; SAPSII simplified acute physiology score

Development of the nomogram

Because the APSIII and SAPSII scores overlap with other independent variables obtained through LASSO regression, they were excluded to mitigate the potential impact of multicollinearity on model accuracy, and a new nomogram model was constructed from the remaining five variables (Fig. 4). Each factor in the nomogram was assigned a score based on its value, and the total score was calculated by summing these scores. The total score was then used to predict the in-hospital mortality of patients with IE admitted to the ICU.
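The point-scoring step can be sketched as follows, using the common convention that the predictor with the largest possible effect spans 0 to 100 points and the others are scaled relative to it. This is an illustration under loud assumptions: the coefficients are ln(OR) from the seven-variable model reported earlier (the five-variable nomogram's own coefficients are not given in the text), the axis ranges are hypothetical, and the 'regplot' package may scale its axes differently.

```python
import math
from typing import Dict

# ln(OR) from the seven-variable model; an assumption, since the
# five-variable nomogram's coefficients are not reported in the text.
BETAS: Dict[str, float] = {
    "lactate": math.log(1.182),
    "bicarbonate": math.log(0.913),
    "wbc": math.log(1.213),
    "platelet": math.log(0.996),
    "pt": math.log(1.074),
}

# HYPOTHETICAL axis ranges (low, high), chosen only for illustration.
RANGES = {
    "lactate": (0.5, 20.0),
    "bicarbonate": (5.0, 45.0),
    "wbc": (1.0, 50.0),
    "platelet": (10.0, 800.0),
    "pt": (9.0, 60.0),
}


def _scale() -> float:
    """Largest effect any single predictor can have across its range."""
    return max(abs(BETAS[k]) * (hi - lo) for k, (lo, hi) in RANGES.items())


def points(name: str, value: float) -> float:
    """Points for one predictor; the lowest-risk end of its axis scores 0."""
    lo, hi = RANGES[name]
    beta = BETAS[name]
    ref = lo if beta > 0 else hi  # protective predictors score 0 at the high end
    return 100.0 * beta * (value - ref) / _scale()


def total_points(patient: Dict[str, float]) -> float:
    """Sum the points of all predictors for one patient."""
    return sum(points(name, value) for name, value in patient.items())
```

Converting total points to a predicted probability additionally requires the model intercept, which is not reported in the text, so that step is left out.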

Fig. 4

Nomogram for predicting in-hospital mortality in patients with infective endocarditis. Each variable is represented by a vertical line drawn to its corresponding score. The scores for each variable are summed to obtain a total score, which corresponds to the predicted probability of in-hospital mortality at the bottom of the nomogram. WBC white blood cell; PT prothrombin time; *** means p < 0.001; * means p < 0.05

Evaluation of nomogram performance

In the training cohort, our nomogram model achieved an AUC of 0.843 (95% CI: 0.792–0.893), which is higher than those of both APSIII (0.764; 95% CI: 0.711–0.817) and SAPSII (0.764; 95% CI: 0.712–0.816). In the validation cohort, the AUC was 0.891 (95% CI: 0.837–0.946), again higher than those of APSIII (0.782; 95% CI: 0.698–0.867) and SAPSII (0.706; 95% CI: 0.617–0.796) (Fig. 5). DeLong’s test showed that the differences in AUC between our nomogram model and the APSIII and SAPSII scoring systems were statistically significant (Table 2). The IDI values also indicate that our nomogram has better classification accuracy than the commonly used APSIII and SAPSII scores (Table 3). In the training and validation cohorts, the calibration curve closely approximates the ideal curve (45-degree diagonal line), indicating good agreement between the predicted and observed values (Fig. 6). The HL test (subgroups = 10) yielded a χ2 of 13.97 (p = 0.12) for the training cohort and 3.68 (p = 0.93) for the validation cohort, so neither cohort showed a significant lack of fit. Finally, the DCA curve indicates that our model has good clinical validity in predicting mortality, as it provides a net benefit (Fig. 7). The DCA curve for the validation cohort indicated a net clinical gain of 8.37% when the prediction probability threshold was set to 20%. These results suggest that our model predicts in-hospital mortality in ICU patients with IE more successfully than the APSIII and SAPSII scores.
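For readers who want to see what the discrimination and calibration statistics measure, the sketch below computes the AUC as the probability that a randomly chosen non-survivor receives a higher predicted risk than a randomly chosen survivor, and the Hosmer-Lemeshow statistic over risk-ordered groups. The function names are illustrative and this is not the authors' R code; the p-value is omitted because it needs a chi-square distribution function that the Python standard library does not provide.

```python
from typing import Sequence


def auc(scores: Sequence[float], outcomes: Sequence[int]) -> float:
    """AUC as the fraction of (event, non-event) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correctly ranked pair
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(probs: Sequence[float], outcomes: Sequence[int],
                    groups: int = 10) -> float:
    """Hosmer-Lemeshow chi-square statistic over risk-ordered groups."""
    pairs = sorted(zip(probs, outcomes))  # order patients by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed deaths in the group
        expected = sum(p for p, _ in chunk)   # expected deaths in the group
        denom = expected * (1.0 - expected / size)
        if denom > 0:  # groups with degenerate predictions are skipped
            stat += (observed - expected) ** 2 / denom
    return stat
```

A larger Hosmer-Lemeshow statistic indicates worse agreement between predicted and observed deaths, while an AUC of 0.5 corresponds to chance-level ranking.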

Fig. 5

The ROC curves and AUCs of APSIII, SAPSII and nomogram in the training cohort (A) and validation cohort (B). AUC, area under curve; APSIII acute physiology score III; SAPSII simplified acute physiology score

Table 2 The AUC of Nomogram, APSIII and SAPSII in the training and validation cohorts

Table 3 The IDI of Nomogram, APSIII and SAPSII in the training and validation cohorts

Fig. 6

Calibration curves of the training cohort (A) and validation cohort (B) for the nomogram. The x-axis represents the nomogram-predicted probability, and the y-axis represents the actual (observed) probability. The diagonal line represents the perfect prediction of the ideal model. The dashed line represents the nonparametric calibration curve, and the solid line represents the calibration curve of logistic regression

Fig. 7

The DCA curve of nomogram, APSIII and SAPSII in the training cohort (A) and validation cohort (B). The y-axis measures the net benefit. DCA, decision curve analysis; APSIII acute physiology score III; SAPSII simplified acute physiology score

Discussion

The outcome of patients with IE is generally unfavorable, with a high mortality rate [13]. Thus, there is a pressing need for accurate prediction models to identify high-risk patients at an early stage. In this study, we developed a nomogram model for the prediction of in-hospital mortality in IE patients admitted to the ICU and assessed its performance. Using LASSO regression analysis of data from the MIMIC-IV database, we identified several indicators, namely lactate, bicarbonate, white blood cell count (WBC), platelet count, prothrombin time (PT), APSIII score, and SAPSII score, that were associated with adverse outcomes in this patient cohort.

The APS score is a component of the Acute Physiology and Chronic Health Evaluation (APACHE) score. Within APS III, there are 17 physiological variables assessed, including temperature, mean arterial pressure, heart rate, respiratory rate, PaO2 or A-aDO2, pH or HCO3, Na+, K+, Cr, hematocrit, WBC, blood urea nitrogen, urine output, serum albumin, bilirubin, glucose, and Glasgow Coma Scale (GCS) score [14]. On the other hand, the SAPS II is a tool that incorporates age, admission type (planned surgery, unplanned surgery, or medical), underlying disease variables (acquired immune deficiency syndrome, metastatic cancer, and hematological malignancy), and 12 physiological variables (such as heart rate, systolic blood pressure) into its scoring system [15]. Both APS III and SAPS II scores are commonly used in assessing the risk of mortality in patients [16, 17]. However, due to the overlap of these scoring systems and other variables in this study, the APS III and SAPS II scores were excluded to maintain model accuracy.

Based on these findings, a nomogram prediction model was developed with the aim of providing clinicians with a practical tool to predict in-hospital mortality in patients with IE who are receiving intensive care. By employing this nomogram, physicians can gain a comprehensive understanding of the patient’s condition and identify individuals at high risk during the early stages, enabling the formulation of more personalized treatment plans. Additionally, nomograms can serve as initial assessment tools for patient admission, establish baseline data for patients, and provide references for subsequent treatment and care. To facilitate clinical use, we developed a dynamic nomogram based on this model and made it available on the website, enhancing the user-friendliness of the nomogram prediction model.

Our study found an association between elevated lactate and patient mortality. Elevated lactate is a crucial prognostic indicator in the evaluation of patients [18], and hyperlactatemia is strongly associated with poorer prognosis [19]. The mechanisms by which hyperlactatemia occurs have been debated, but irrespective of these mechanisms, numerous studies have demonstrated lactate as a marker for disease severity [20], and there is a positive correlation between lactate concentration, disease severity and mortality [21, 22]. Bicarbonate serves as a vital regulator of body fluids and acid-base homeostasis, supporting essential physiological processes. However, metabolic acidosis is often observed in patients admitted to Intensive Care Units (ICUs), leading to reduced levels of bicarbonate [23, 24]. Such alteration in bicarbonate levels has been linked to a higher incidence of adverse patient outcomes, including increased morbidity and mortality rates [25].

Coagulation is at the core of IE, with inflammation capable of disrupting the proper balance between the coagulation and immune systems [26]. This disruption leads to the generation of thrombin, which activates platelets and generates fibrin. Fibrin seals infected tissue to prevent further spread of bacteria [27], while platelets with immunoglobulin receptors and pattern recognition receptors can also kill bacteria [28]. Research has revealed that the depletion of platelets can worsen outcomes in animal models of IE [29,30,31]. Additionally, thrombocytopenia, which is the reduction of platelet counts, is correlated with longer hospital stays, an increased incidence of major bleeding events, and a higher in-hospital mortality rate among individuals with septic shock [32]. While platelets could contribute to the bacterial adhesion to heart valves that results in vegetation, their involvement in clearing bacteria at a later stage may be more beneficial. The inflammatory response that accompanies IE can cause activated endothelial cells and leukocytes to release several tissue-active factors, which can activate the extrinsic coagulation pathway and prolong PT [26]. Patients with severe infections commonly exhibit elevated white blood cell (WBC) counts, and the severity of their illness correlates with the degree of elevation of their WBC counts [33].

The study analyzed several known risk factors that could affect the prognosis of IE patients, such as CRRT, blood culture results, embolism symptoms and prior history of cardiac surgery [34], but none of these variables was retained in the model. This reflects our aim of producing a practical and reliable model: lambda.1se was used as the cut-off point for variable selection during LASSO regression, which yields a sparser model and reduces the risk of overfitting with too many selected variables.

In clinical practice, several scoring systems, including ANCLA, PALSUSE, DeFeo, RISK-E and EndoSCORE, have been employed to evaluate the prognosis of patients with IE [35,36,37,38,39]. These scoring systems mainly focus on the postoperative prognosis of patients undergoing surgery for IE. This approach is inadequate, as only approximately 51% of IE patients undergo surgical intervention, as shown in EURO-ENDO 2019 registry data [40]. A significant advantage of our study is that it includes data from all patients diagnosed with IE who were admitted to the ICU, rather than only those requiring surgical intervention. In addition, the variables in our model are objective indicators widely used in routine clinical practice, which facilitates data acquisition.

Our study also has several limitations. First, echocardiographic findings could not be included in the analysis because the MIMIC-IV 2.0 database does not contain such imaging data. Second, the data were gathered from a single medical center, which may limit the generalizability of the findings to a broader population. Moreover, we performed only internal validation of the model, so further research with external validation is needed to confirm its performance in other populations.

Conclusions

We developed a practical nomogram model based on laboratory results, comprising lactate, bicarbonate, WBC, platelet count and PT. Our model demonstrates accurate estimation of in-hospital mortality among patients with IE in the ICU setting. The objective of the model is to assist physicians in making reasonable assessments and treatment decisions, with the aim of improving in-hospital survival.