Introduction

Pyogenic liver abscess is a purulent lesion caused by the invasion of pathogenic bacteria into the liver. The incidence of PLA varies slightly in various regions of the world and is increasing yearly [1]. The incidence rate in European and American countries was about (1.0 ~ 4.1)/100,000, the annual incidence rate in some Asian countries was (12 ~ 18)/100,000, and the annual incidence rate in mainland China was about (1.1 ~ 5.4)/100,000 [1,2,3,4]. Incidence is higher in males than females (3.3/100,000 vs. 1.3/100,000) [5]. Although the diagnosis and cure rates of PLA had improved significantly with the development of medical technology, the mortality rate was still around 10% [6]. In China and throughout the Asia–Pacific region, the primary pathogen of PLA is Klebsiella pneumoniae [7], which is prevalent in diabetic patients [4]. Diabetic patients with Klebsiella pneumoniae liver abscess are likelier to develop the invasive syndrome, IKPLAS [8]. IKPLAS refers to Klebsiella pneumoniae liver abscess with metastatic infection such as lung abscess, endophthalmitis, meningitis, necrotizing fasciitis, etc.IKPLAS has the characteristics of acute onset, rapid disease progression, and lack of specific clinical manifestations. If patients are not diagnosed and treated in time, the prognosis is generally poor [9]. Although there have been some studies on IKPLAS in the past, most of them are case reports [10,11,12,13], and there is no relevant literature report on its clinical prediction model.

Compared with traditional statistical methods, machine learning, as a branch of artificial intelligence, can analyze and obtain rules from existing data and continuously improve and build models based on algorithms and data [14]. Furthermore, it shows apparent advantages in clinical diagnosis and prognosis prediction [15, 16]. This study compared the performance of seven different machine learning methods in predicting the progression of the invasive Klebsiella pneumoniae liver abscess syndrome. Then, a model that can effectively identify high-risk patients is selected, which can help clinical decision-making and provide new perspectives for research in this field.

Materials and methods

Patients pre‑processing

This study included patients with diabetes and Klebsiella pneumoniae liver abscesses admitted to Changzhou First People's Hospital from January 1, 20,15 to December 31, 2021. The inclusion criteria were (1) Imaging showed liver abscess, and the puncture fluid or microbial blood culture was Klebsiella pneumoniae. (2) Diabetes diagnosis was based on the "Chinese Guidelines for the Prevention and Treatment of Type 2 Diabetes, 2020 Edition". The exclusion criteria were (1) Patients who died on admission. (2) Patients are automatically discharged or referred midway through. (3) Liver abscess secondary to primary or metastatic liver tumors. (4) Patients with abnormal coagulation function, platelet count, or dysfunction in the past. (5) The age is less than 18 years old. The primary observation was a diagnosis of IKPLAS during hospitalization. The diagnostic criteria of IKPLAS were liver abscess caused by Klebsiella pneumoniae and metastatic infection such as lung abscess, endophthalmitis, meningitis, necrotizing fasciitis, etc. The diagnosis of IKPLAS was judged by two physicians with senior professional titles in the clinic. Both physicians needed to be diagnosed with IKPLAS before establishing the diagnosis. Secondary observation indicators include general information (such as age, gender, comorbidities, etc.), the first laboratory (blood routine, liver and kidney function, etc.) and imaging (abdominal B-ultrasound) related indicators, treatment plans, etc. after admission. Among them, the medical history collection and routine blood test were collected on the day of admission, and the results of the first examination after admission by abdominal B-ultrasound, the treatment plan, and the prognosis were collected retrospectively after the patients were discharged from the hospital.

Data pre-processing

Statistical analysis was performed using EmpowerStats software and Python 3.9, and the procalcitonin with too many missing values (number of missing values ≥ 30%) was deleted. Multiple imputations were performed for C-reactive protein, triglyceride, and cholesterol with a few missing values (number of missing values ≤ 30%) using the miceforest package in Python. Since different indicators are not comparable due to their different dimensions, we use the Z-score method to standardize continuous variables. The formula is:

$$z=\frac{\chi -u}{\sigma }$$

Where μ is the average of the continuous variable across all samples, and α is the standard deviation. The influence of dimensions on the data can be eliminated after data standardization. K-S-L test and Q-Q plot were used to test the normality of measurement data. The binary variables were described as counts, and percentages were evaluated using the Chi-square test or Fisher’s exact test. If the continuous variables conformed to a normal distribution, they were compared using a t-test and expressed as means ± SEM. For a non-normal distribution, the Mann–Whitney U test was used. P < 0.05 was considered statistically significant.

Model training and evaluation

This research uses the python3.9 version, anaconda3 integrated development environment. Based on the train_test_split module, the parameter is set to test_size = 0.3, and the complete data is divided into a training set of 149 cases and a test set of 64 cases by stratified random sampling in a ratio of 7:3. This study used recursive feature elimination (RFE) for feature screening [17]. RFE can effectively eliminate the redundancy between features and select the optimal feature combination. It takes the prediction accuracy as the evaluation standard and eliminates the minimum relevant variables through each iteration. Then cross-validation is used to find the optimal number of features. In this study, random forest (RF) was used as the primary classifier for RFE, and feature selection was performed on the training set. The Scikit-learn python software package was used to build seven machine learning prediction models. The logistic regression model(LR) [18] was selected for the linear model. The Multilayer Perceptron (MLP) [19] model, also called artificial neural network (ANN), was chosen as an essential nonlinear prediction model. For the kernel-based model, Support Vector Machine (SVM) [20] with Gaussian kernel (RBF) was selected.For the decision tree approach, the random forest(RF) [21] model,the Decision Tree (DT) [22]model and the XGBoost(XGB) [23]model have also been used in clinical research. Finally, we chose a basic prediction model, the K-Nearest Neighbor algorithm (KNN) [24]. After the model was established, Bayesian optimization algorithm was used to find the maximum model Area Under Curve(AUC) value according to the Settings for parameter optimization. The specific optimized parameters were the C value of LR model, max_depth, min_samples_split, min_samples_leaf, min_weight_fraction_leaf of DT model, and max_depth, min_samples_leaf of LR model, n_estimators, max_features, max_depth, min_weight_fraction_leaf of RF model, n_estimators, max_leaves, max_depth, max_bin of XGB model, C-value and gamma of SVM model, hidden1, hidden2, learning_rate_int of ANN model and n_neighbors of KNN model.A fivefold cross-validation method was used to evaluate the model's generality in the training set. The model performance was evaluated using the test set, and the evaluation indicators were accuracy, precision, specificity, sensitivity (recall), F1-score, confusion matrix and AUC. A schematic overview of the study design and model development is depicted in Fig. 1.

Fig. 1
figure 1

Overview of study design and model development

Results

Patients and variables

After screening by inclusion and exclusion criteria, 213 patients were included in this study, all in line with the diagnosis of type 2 diabetes mellitus and Klebsiella pneumoniae liver abscess. Patients were grouped by the occurrence of IKPLAS, with 25 cases progressing to IKPLAS as the IKPLAS group and 188 cases as the NIKPLAS group. There were 60 females and 153 males, as shown in Table 1. Through stratified random sampling, the data set was divided into the training set and test set. As shown in Table S1, there was no statistically significant difference between training set and test set(P ≥ 0.05). Clinical findings, Symptom at presentation, Admission data, and Radiologic findings in Table S1 will all be screened as variables. As shown in Fig. 2, When the number of feature variables is four, the recursive feature elimination method with random forest as classifier has the highest cross validation score. These four variables are hemoglobin, platelets, D-dimer, and SOFA score.Spearman correlation analysis was performed on these four features, as shown in Fig. 3, indicating no highly correlated redundant features.

Table 1 Baseline statistics for 213 patients Line 143
Fig. 2
figure 2

Recursive feature elimination method variable selection diagram

Fig. 3
figure 3

Spearman Correlation Analysis Heatmap

Tuning of parameters

The four variables selected from the training set were put into the machine learning classifier to construct the prediction model. Through Bayesian algorithm optimization, the parameters were adjusted with the average optimal AUC value, and the specific parameter Settings are shown in Table S2. The five-fold cross-validation ROC curve of the training set can be seen in Fig. 4, where it can be seen that the SVM model and LR model have better performance.

Fig. 4
figure 4

Five-fold cross-validation ROC curve for the training set. A ANN model. B DT model. C KNN model. D LR model. E RF model. F SVM model. G XGB model

Evaluation of prediction models

The ROC curve of the test set can be seen in Fig. 5. The AUC values of most models are higher than 0,850, among which SVM (0.969) and LR (0.967) rank the top two, but the AUC values of XGB (0.799) and DT (0.800) are lower. Studies have shown that precision recall curve (PRC) has advantages over ROC in evaluating imbalanced datasets [25]. The dataset included in this study is also imbalanced, so PRC is also a valuable indicator. Figure 6 shows the PRC of the test set, and the Average Precision(AP) value was used as a criterion to evaluate the PR curve [26]. The APs of the LR,SVM models were all above 0.800. The confusion matrix was also calculated for all seven models (Table 2), and the DT model generated a large number of FPs (n = 19) during the prediction process, while the other models were relatively few. DT, LR, and SVM models produced the least FNs (n = 1), and the KNN model produced the least FPs (n = 0). Table 3 shows each model evaluation result's sensitivity (recall), specificity, accuracy, precision, f1, AP and AUC.

Fig. 5
figure 5

ROC curves of seven models in the test set

Fig. 6
figure 6

Precision Recall Curves for the seven models in the test set

Table 2 Confusion matrices of 7 models
Table 3 Performance summary in terms of sensitivity (recall), specificity, accuracy, precision,F1-score,AUC

As shown in Table 3, there were significant performance differences between the models. The AUC (0.969), F1-Score(0.737) and AP(0.890) of the SVM model were the highest among the seven models, and the all-around performance was the best. At the same time, its sensitivity (0.875) is the highest and can effectively identify the occurrence of IKPLAS in the early stage. The KNN model had the best specificity (1.000) and could be used to reduce the occurrence of overdiagnosis and treatment.

Figure 7 shows the calibration curves of the seven models. Except that the XGB and DT models over-estimates the occurrence of IKPLAS risk, the other models' calibration curves are a good fit with the actual observed results.

Fig. 7
figure 7

Seven machine learning model calibration curves

Figure 8 shows the Decision Curve Analysis of the seven models, which was first proposed in 2006 and has been used for prognostic decision analysis in cancer [27] and other fields [23]. The DCA curve shows a model compared to the Net Benefit situation under different High-Risk Thresholds between the two strategies of intervention in all patients (ALL) and no intervention in all patients (NONE). As shown in Fig. 6, there is no significant difference in the benefits of treatment intervention based on SVM and LR model between the risk threshold of 0.0 and 0.4. However, when the risk threshold was between 0.4 and 0.8, the SVM model's net intervention rate was significantly higher than that of other models and the overall benefit rate was high. Model explanation.

Fig. 8
figure 8

Decision curve analysis of seven machine learning models

To explain the output of our models, we used the SHapley Additive exPlanations(SHAP) algorithm to help us understand how a single feature affects the output of the models [28]. Its most significant advantage is that it can reflect the influence of the features in each sample, and it also shows the positive and negative effects of the influence. Each row represents a feature, sorted by feature importance from top to bottom. The abscissa is the SHAP value. A point represents a sample, and the color represents the eigenvalue (red for high, blue for low). The SVM prediction model with the best all-around performance was selected to interpret the feature importance.

As shown in Fig. 9, the SOFA score ranks first in the feature importance of SVM model, and the higher the value, the higher the probability of the patient progressing to IKPLAS. Platelet and hemoglobin were the second and third most important predictors of the SVM model, and both were negatively correlated with the outcome. D-dimer ranked last and was positively associated with the risk of IKPLAS.

Fig. 9
figure 9

SHAP feature analysis of the SVM model

Discussion

The high incidence of IKPLAS is mainly in the Asian population, which may be related to the fact that the Asian population is more likely to colonize the intestine with K1/K2 serotype Klebsiella pneumoniae [29, 30]. Diabetes is considered a significant risk factor for IKPLAS, and up to 63% of patients with a bacterial liver abscess in Taiwan have diabetes. This may be related to the impaired phagocytosis of K1/K2 Klebsiella pneumoniae in diabetic patients [31] and the more excellent vascular permeability in diabetic patients, which is conducive to bacterial invasion [11]. The above two serotypes of Klebsiella pneumoniae are also highly virulent Klebsiella pneumoniae, which show high viscosity in the String test [9]. Although the highly virulent Klebsiella pneumoniae is sensitive to most antibiotics, patients often have a poor prognosis if they are not recognized and treated early [32].

This study screened four characteristic variables: hemoglobin, platelets, D-dimer, and SOFA score. We interpreted the importance of the model characteristic variables by using the SHAP package, in which the SOFA score ranked first among all four models.

The SOFA score is a scoring system that measures the degree of impairment of significant organ function in patients with sepsis or suspected sepsis to determine prognosis [33]. Several studies have confirmed its predictive value in the prognosis of infected patients [34, 35]. This study also suggests that the SOFA score is a significant predictor of diabetes complicated by IKPLAS. As can be seen from the SHAP plot, the higher the SOFA score, the greater the risk of progression to IKPLAS. Although the pathogenesis of IKPLAS is currently unclear, the study by Chen-Guang Zhang et al. shows that most diabetic patients with IKPLAS are prone to sepsis [11]. Blood-borne transmission may be one of the more important ways.

In the feature importance ranking, platelets' influence on SVM model ranked second.Jai Hoon Yoon et al. showed that thrombocytopenia is an independent risk factor for invasive syndrome in diabetic patients with Klebsiella pneumoniae liver abscess [10]. This is also consistent with the conclusions about platelets in the SVM model established in this study. The mechanism of platelet reduction in diabetes combined with IKPLAS may be that when the body is infected, platelets are stimulated and activated to participate in the body's inflammatory response by inducing the expression of membrane proteins and the production of mediators and play the role of anti-infection and pathogen removal. Activated platelets produce and release pro-inflammatory, anti-inflammatory, chemokines, antimicrobial, and other mediators to regulate the body's innate immune or adaptive immune response [36]. The interaction between platelets and pathogens or their products, endothelial cells, and immune cells promotes endothelial cell damage and leukocyte activation. As a result, the adhesion of platelets to it is enhanced, platelets are continuously activated in the circulation, and the body continuously produces anti-platelet antibodies and macrophage-colony stimulating factors, which accelerates the destruction and consumption of platelets [37].

The SHAP plot shows that hemoglobin is the third most important characteristic variable after the SOFA score, and the lower its value, the higher the risk of progression to IKPLAS. It has been shown that hemoglobin can be an indicator to assess the severity of the disease in infected patients, probably due to a systemic inflammatory response leading to decreased erythropoiesis, increased destruction of erythrocytes due to hemolysis, and hemorrhage, which leads to a reduced ability of blood to transport oxygen and carbon dioxide and insufficient oxygen supply to the body, resulting in multi-organ damage [38].

D-dimer is a specific molecular marker for secondary hyperfibrinolysis in vivo and is an effective indicator to reflect the coagulation state of the body. The coagulation and fibrinolytic systems are usually closely linked to the development of inflammation. Infection can lead to damage of vascular endothelial cells and alveolar epithelial cells, which stimulates the coagulation system, resulting in impairment of coagulation function and abnormal coagulation indexes in patients, further aggravated by elevated D-dimer along with infection [39, 40]. The above two promote each other, forming a vicious circle. The autoimmune function of diabetic patients is weakened, and the inflammatory response is enhanced after infection. Patients with diabetes complicated with IKPLAS can have noticeable D-dimer changes in the early stage. In the SVM model, D-dimer was positively associated with the risk of developing diabetes with IKPLAS, which is consistent with the above findings.

In the field of IKPLAS, more studies are focused on the risk factors of IKPLAS.The study by Shixiao Li et al. [41] showed that patients with IKPLAS were more likely to develop chronic renal insufficiency, thrombocytopenia, and increased total bilirubin than patients with non-IKPLAS. Hairui Wang et al. [42]. A logistic regression prediction model was used to predict the incidence of IKPLAS by incorporating clinical and CT features, with an AUC value of 0.842 in the validation set, and did not compare other prediction models.Unlike many studies, we first used seven machine learning models for prediction. Through parameter adjustment and verification, the SVM model with the best performance was selected, with an AUC value of 0.969 and an AP value of 0.890, indicating that it was a reliable IKPLAS prediction model. At the same time, the variables included in this model are clinical indicators, which are easy to collect and can be used by clinicians to conveniently judge the possibility of IKPLAS in patients with diabetes mellitus complicated with Klebsiella pneumoniae liver abscess.

Machine learning algorithms can build complex models that perform satisfactorily enough when the amount of data is sufficient. However, in specific applications, the amount of data is often insufficient, so it is essential to analyze these machine learning algorithms and obtain good results with relatively small sample sizes. In this study, the Power analysis was satisfied by calculating a power value of > 0.80, although we only used a small data set of 213 patients. The main reason for the excellent performance of the SVM model in this study is that it is a nonlinear learner that is more suitable for small samples, can ideally separate samples, and has better generalization.

There are still some limitations in this study. First, this is a single-center regression study, and some potential biases cannot be avoided. Secondly, for machine learning, the sample size of this study is insufficient. In order to further improve the accuracy of the model, we will collect more clinical data and further optimize the parameters.

Conclusion

In this study, we established and compared seven models to compare the performance of predicting the progression of diabetes with Klebsiella pneumoniae to IKPLAS and found that the SVM model had the highest overall predictive power. We also found that SOFA score, platelets, hemoglobin, and D-dimer significantly affected the model's predictions. In the future, we will expand the dataset to improve further the model's accuracy and better plan diagnosis and treatment for clinicians.