
1 Introduction

Serum lactate is traditionally considered a biomarker of tissue hypoxia and is often elevated in sepsis [1]. Measuring and monitoring blood lactate concentration in sepsis and septic shock can reflect the severity of the illness and the response to therapeutic interventions [2,3,4]. It has been shown that a decrease over time in blood lactate values measured during the first hours of admission to the intensive care unit (ICU) is associated with better survival [5]. Persistently elevated or increasing lactate levels, indicating inadequate blood flow to organs and tissues (hypoperfusion), are associated with a higher risk of complications and death [6].

For adults with sepsis or septic shock, international guidelines suggest using serum lactate levels to guide resuscitation. This approach helps ensure patients with high initial lactate levels receive targeted treatment aimed at lowering lactate levels [7]. Recently, several randomized controlled trials demonstrated that early lactate clearance-directed therapy is associated with decreased mortality compared with usual care [8]. Because lactate measurement relies on time-consuming laboratory analysis, technologies that can predict lactate trends quickly, accurately, and noninvasively could be of significant help to clinicians. Despite extensive efforts over the years, there are currently no commercially available intravenous (IV) chemical sensors (i.e., in the bloodstream) for continuous real-time monitoring of lactate levels in ICU patients [9]. Frequent blood draws for serum lactate testing expose patients to risks such as infection from venipuncture or central line use and potential anemia from repeated sampling [10, 11]. A non-invasive method that predicts a patient’s lactate trend would allow clinicians to focus confirmatory testing on patients likely to deteriorate. In addition, it may avoid unnecessary blood sampling and repetitive lactate measurements. Machine learning algorithms may be helpful to clinicians in this regard [12].

We performed this retrospective study with the hypothesis that a machine learning approach can predict lactate trends from non-invasive clinical variables of patients with sepsis.

2 Methods

2.1 Data Sources

MIMIC-IV is a database containing de-identified health data from over 60,000 ICU patients at Beth Israel Deaconess Medical Center (BIDMC). This database, maintained by MIT’s Laboratory for Computational Physiology, is a valuable resource for medical research [13]. We obtained permission to use the anonymized MIMIC-IV dataset and followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) guidelines [14] for reporting our findings. While STROBE focuses on observational studies, we additionally considered the recommendations offered by Stevens et al. [15] when preparing our manuscript, specifically for reporting machine learning analyses in clinical research. That work recommends statistical methods for machine learning analyses in clinical research, provides an overview of the machine learning analysis workflow, and lists key reporting elements for different study designs.

The study received ethical approval from both institutions involved (MIT and BIDMC), which waived the need for individual patient consent because the study used completely anonymized, publicly available data. Our research adhered to all relevant data privacy guidelines and regulations.

2.2 Study Design

Our retrospective study examined a subgroup of adult sepsis patients from the MIMIC-IV dataset. Sepsis was defined using the Sepsis-3 criteria: suspected infection and an acute increase in the SOFA score of at least 2 [16]. The SOFA score, reflecting organ dysfunction, was calculated using hourly clinical and laboratory data from the first day of each patient’s ICU stay. The sepsis criteria were satisfied at the earliest time at which a patient had a SOFA score ≥ 2 together with suspicion of infection (the time of suspected infection being the culture time, if before the antibiotic, or the antibiotic time, if before the culture). According to these diagnostic criteria, we enrolled adult patients (age ≥ 18 years) with at least two serum lactate measurements recorded (within 12 h, starting 6 h before the initial sepsis diagnosis) and an ICU stay of ≥24 h.

2.3 Definition of Outcomes

To perform the lactate trend analysis, we first had to define the trends. Three trend states were constructed according to the change in blood lactate value. Within the 12-h observation period, a change of 1 mmol/L or more was considered a trend indicator, calculated as the difference between two lactate values taken at most 6 h apart. According to this setup, all samples in the data cohort were labeled as increase, decrease, or constant. The trend definition is illustrated in Fig. 1, and a labeling sketch is shown after the figure.

Fig. 1. Trend definition of lactate values.
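As a minimal illustration, the sketch below reproduces the labeling rule described above (a change of at least 1 mmol/L between two lactate values taken at most 6 h apart); the function name and threshold parameter are ours, not part of the original study code.

```python
# Hedged sketch of the trend-labeling rule described above; names are illustrative only.
def label_lactate_trend(first_lactate: float, last_lactate: float,
                        threshold: float = 1.0) -> str:
    """Label the change between two lactate values (mmol/L) measured at most 6 h apart."""
    delta = last_lactate - first_lactate
    if delta >= threshold:
        return "increase"
    if delta <= -threshold:
        return "decrease"
    return "constant"

# Example: a rise from 2.1 to 3.4 mmol/L within 6 h is labeled as an increasing trend.
print(label_lactate_trend(2.1, 3.4))  # -> "increase"
```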

2.4 Variable Selection

Based on the clinical literature, we identified nine variables that are most relevant to lactate trend analysis. These variables are age, the initial lactate value, the last lactate value, the time interval between the two lactate measurements, and the averages of the hemodynamic and respiratory monitoring parameters measured in this interval (heart rate, systolic blood pressure, diastolic blood pressure, mean blood pressure, oxygen saturation, and PaO2/FiO2 ratio) (Table 1).

These variables were selected to reduce the dependence of lactate trend analysis on laboratory testing and thereby keep the approach minimally invasive. Preprocessing is a vital step in building robust machine learning models: it helps reduce noise, remove redundant data, and produce consistent data, and thus increases the performance of prediction models. We applied several preprocessing steps to the data cohort to improve data quality. Outliers were removed to obtain consistency between data points, and unity-based (min-max) normalization was applied so that all values were scaled into the range between 0 and 1. After these preprocessing steps, 18653 data samples remained (a minimal preprocessing sketch is shown below).
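The sketch below illustrates these two steps with pandas and scikit-learn; the IQR-based outlier rule is our assumption, since the paper does not specify the exact outlier criterion, and the column names are illustrative.

```python
# Hedged preprocessing sketch: outlier removal (assumed 1.5*IQR rule) followed by
# unity-based (min-max) normalization into [0, 1]. Column names are illustrative.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def preprocess(df: pd.DataFrame, feature_cols: list) -> pd.DataFrame:
    # Drop rows with any feature outside 1.5 * IQR (assumption; the paper's criterion is unspecified).
    q1, q3 = df[feature_cols].quantile(0.25), df[feature_cols].quantile(0.75)
    iqr = q3 - q1
    outlier = ((df[feature_cols] < q1 - 1.5 * iqr) |
               (df[feature_cols] > q3 + 1.5 * iqr)).any(axis=1)
    clean = df.loc[~outlier].copy()

    # Scale every retained feature into [0, 1] (unity-based normalization).
    clean[feature_cols] = MinMaxScaler().fit_transform(clean[feature_cols])
    return clean
```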

Feature selection strategies on clinical data help identify the parameters relevant to a given disease, reduce treatment costs, and reduce the computational burden [17]. To achieve these goals, we investigated the variable space further. We used the correlation-based feature selection (CFS) algorithm as the feature selector. The CFS algorithm identifies important and pertinent features using the intrinsic characteristics of the data rather than a machine learning model [18]. In many cases, some features are highly correlated with others; such features introduce redundancy and thus reduce the performance of prediction models. The CFS algorithm evaluates the correlations among features and discards those that are highly correlated [18]. Using the CFS algorithm, we identified four variables that are less correlated than the others and can be used to predict lactate trends in sepsis patients: heart rate, oxygen saturation, initial lactate value, and time interval. The overall ranking of features can be seen in Table 2. In this table, feature ranks were assigned according to their average merit value; a higher average merit value represents a lower correlation and a higher rank among the feature set [19] (a simplified merit computation is sketched after Table 2).

Table 1. Patient characteristics (N = 18653).
Table 2. Ranking of features according to correlation analysis.
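The study used the CFS implementation available in Weka; the sketch below only illustrates the underlying idea of a CFS-style merit score (rewarding feature-class correlation and penalizing feature-feature correlation) with pandas and NumPy, assuming the trend label has been encoded numerically. It is not the authors' implementation.

```python
# Hedged sketch of a simplified CFS-style merit score (Hall's formulation):
# merit = k * mean(feature-class corr) / sqrt(k + k*(k-1) * mean(feature-feature corr)).
import numpy as np
import pandas as pd

def cfs_merit(df: pd.DataFrame, features: list, target: str) -> float:
    """Merit of a candidate feature subset; `target` is a numeric encoding of the trend label."""
    k = len(features)
    r_cf = df[features].corrwith(df[target]).abs().mean()        # mean feature-class correlation
    corr = df[features].corr().abs().to_numpy()
    r_ff = (corr.sum() - k) / (k * (k - 1)) if k > 1 else 0.0     # mean feature-feature correlation
    return (k * r_cf) / np.sqrt(k + k * (k - 1) * r_ff)
```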

2.5 Proposed Machine Learning Framework

Our proposed machine learning-based framework uses clinical and demographic data and feeds these data to a classifier to monitor the lactate trend in the ICU setting. We utilized a traditional supervised classification workflow consisting of a training phase and a test/evaluation phase.

First, training data consisting of annotated data samples are acquired from the MIMIC-IV dataset. They then go through a data preprocessing stage that increases data quality for the classification model. Every sample in the training data has a lactate trend label (increase/constant/decrease). These samples are used to train a classifier and construct a machine learning model. In the test stage, a preprocessed test sample is fed to the classifier, which predicts its lactate trend label. Finally, classification performance is reported in the evaluation phase (Fig. 2; a minimal end-to-end sketch is given after the figure).

Fig. 2. Proposed machine learning framework for lactate trend prediction.
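A minimal end-to-end sketch of this workflow with scikit-learn is shown below; the random forest is only a stand-in classifier, and the synthetic data merely stands in for the preprocessed MIMIC-IV feature matrix and trend labels.

```python
# Hedged sketch of the Fig. 2 workflow: preprocessed, labeled training samples feed a
# classifier; held-out samples are then used for evaluation. All names are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the preprocessed feature matrix and the three trend labels.
X, y = make_classification(n_samples=1000, n_features=9, n_informative=5,
                           n_classes=3, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
model = make_pipeline(MinMaxScaler(), RandomForestClassifier(random_state=0))
model.fit(X_train, y_train)          # training phase
print(model.score(X_test, y_test))   # evaluation phase (accuracy on held-out data)
```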

2.6 Selected Classifiers for Proposed Framework

We evaluated various classifiers on the MIMIC-IV dataset to predict lactate trends in sepsis patients. These classifiers are Naïve Bayes (NB), the J48 decision tree, Logistic Regression (LR), Random Forest (RF), and the Logistic Model Tree (LMT). Naïve Bayes is a traditional and simple machine learning approach that treats dataset attributes as independent [17]. Its outputs are interpreted as class probabilities. Naïve Bayes is based on Bayes’ theorem, which gives the probability of an event given that another event has occurred; the class with the highest probability is selected as the outcome. It became immensely popular in the machine learning area because it handles the overfitting problem well and the classification process can be parallelized [20].

The J48 decision tree algorithm is an updated version of the popular ID3 decision tree algorithm [21]. It can be used on both numerical and categorical data. J48 aims to find a specific attribute that best partitions the training data; this attribute has the highest information gain in the dataset [22]. By evaluating the possible values of this attribute, a branch pruning process starts, and J48 defines target values. Meanwhile, J48 searches for other attributes with high information gain. This process continues until an explicit decision is reached on the combination of attributes that gives a definite rule for determining the target value. At the end of the algorithm, all features have been evaluated, and all samples therefore have a target value [22]. J48 has become a popular machine learning tool in many areas due to its easily implementable and robust nature [21, 23, 24].

Random Forest (RF) belongs to the family of decision tree methods that employ a supervised ensemble learning strategy [25]. It has gained popularity in classification and regression problems due to its robustness against overfitting and its low computational load [25,26,27]. RF builds many decision trees, each trained on a random bootstrap sample of the data and a random subset of the variables. Other decision tree learners aim to find the best variable available at each split, whereas RF chooses among randomly selected variables. The primary motivation for this approach is to reduce the correlation between the candidate random trees, because combining highly correlated trees harms prediction performance. The predictions from all random trees are combined (e.g., by majority vote) to obtain the final result [26].

The logistic regression algorithm is mainly used for tackling classification problems and modeling class probabilities [28]. It aims to fit the data to a logistic curve to predict the occurrence probability of events [29]. It can handle nonlinear dataset effects.

The LMT algorithm is a hybrid decision tree approach that combines logistic regression with decision tree learning [30]. The leaves of the tree contain piecewise linear regression models constructed from logistic regression functions, which are built with the LogitBoost algorithm [31]. As in other decision tree classifiers, the tree is pruned, and splitting is based on a logistic variant of information gain. The algorithm has many positive aspects: it can map linear relationships, overfitting can be easily avoided, and it is easy to implement. Because of these advantages, it has been used in many different research areas in recent years [30,31,32].
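For orientation, the sketch below lists scikit-learn counterparts or approximations of these classifiers; J48 (C4.5) and LMT are Weka algorithms with no exact scikit-learn equivalent, so those entries are assumed approximations for illustration only.

```python
# Hedged sketch: approximate scikit-learn counterparts of the evaluated classifiers.
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

classifiers = {
    "NB": GaussianNB(),
    # CART with an entropy criterion, not true C4.5/J48 (which is Weka-specific).
    "J48 (approx.)": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "LR": LogisticRegression(max_iter=1000),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    # LMT has no scikit-learn implementation; the study used the Weka LMT classifier.
}
```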

2.7 Evaluation Criteria

Experiments on predicting the lactate trend were evaluated with ten-fold cross-validation (CV). In this CV approach, the dataset is split into ten parts with an equal number of samples. One part is selected for testing, and the remaining parts are used for training; the process is repeated until every part has been used once for testing. For the three-class classification problem, a one-versus-all evaluation setting was used (a cross-validation sketch is shown below).
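The sketch below illustrates this set-up with scikit-learn’s stratified ten-fold cross-validation, collecting out-of-fold class probabilities for later one-versus-all scoring; the synthetic data and the logistic regression classifier are stand-ins, not the study’s actual data or model.

```python
# Hedged sketch of ten-fold cross-validation with out-of-fold predicted probabilities.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for the four selected features and the three trend labels.
X, y = make_classification(n_samples=500, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=3, random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
# Each sample is used exactly once for testing across the ten folds.
y_proba = cross_val_predict(LogisticRegression(max_iter=1000), X, y,
                            cv=cv, method="predict_proba")
print(y_proba.shape)  # (500, 3): one predicted probability per class per sample
```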

The Area Under the Curve (AUC) and the Area Under the Precision-Recall Curve (AUPRC) were used to assess the classification performance of the machine learning algorithms. The AUC is obtained by plotting the True Positive Rate (sensitivity) against the False Positive Rate (1 − specificity) and computing the area under the resulting curve. The AUC ranges between 0 and 1; a score of 1 means that the classification model can distinguish all samples, so values close to 1 indicate better prediction performance. Compared with the AUC, the AUPRC emphasizes the model’s ability to identify positive samples. In addition, the AUPRC is preferred over the AUC for unbalanced datasets because it is more sensitive and less prone to exaggerating model performance.
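Continuing the cross-validation sketch above, the per-class (one-versus-all) AUC and AUPRC can be computed from the out-of-fold probabilities as follows; the mapping of class indices to the increase/constant/decrease labels is assumed for illustration.

```python
# Hedged sketch: one-versus-all AUC and AUPRC per class, using y and y_proba from
# the cross-validation sketch above. The class-index-to-label mapping is assumed.
from sklearn.metrics import average_precision_score, roc_auc_score

for k, name in enumerate(["increase", "constant", "decrease"]):
    y_true_k = (y == k).astype(int)                            # one-versus-all ground truth
    auc = roc_auc_score(y_true_k, y_proba[:, k])               # area under the ROC curve
    auprc = average_precision_score(y_true_k, y_proba[:, k])   # area under the PR curve
    print(f"{name}: AUC={auc:.3f}, AUPRC={auprc:.3f}")
```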

3 Results

We evaluated the RF, NB, J48, LR, and LMT classifiers on the lactate trend prediction task. We conducted our experiments based on three scenarios: the sepsis patient’s lactate value has an increasing trend, the patient has a steady (constant) lactate trend, or the patient’s lactate value has a decreasing trend.

Table 3 shows the classification results for the increasing lactate trend scenario. As can be seen from Table 3, the LMT and LR algorithms outperformed the other classifiers and achieved an AUC of 0.647, whereas RF performed best in terms of AUPRC, with NB second. The J48 decision tree performed worst when predicting an increasing lactate trend. The LMT algorithm with four features (heart rate, oxygen saturation, lactate value before sepsis diagnosis, and time interval) achieved an AUC of 0.630 (AUPRC, 0.113).

Table 3. Classification results for the increasing trend of lactate.

Table 4 shows the classification results for the constant lactate trend scenario. As can be seen from Table 4, the LMT algorithm outperformed the other classifiers (AUC, 0.803; AUPRC, 0.921). RF comes second and LR third. The J48 decision tree performed worst when predicting a constant lactate trend. The LMT algorithm with four features achieved an AUPRC of 0.921.

Table 4. Classification results for the constant trend of lactate.

Table 5 shows the classification results for the decreasing lactate trend scenario. As can be seen from Table 5, the LMT algorithm outperformed the other classifiers (AUC, 0.847; AUPRC, 0.502). RF comes second and LR third in terms of AUC. The J48 decision tree performed worst when predicting a decreasing lactate trend. The LMT algorithm with four features achieved an AUC of 0.844 and an AUPRC of 0.493. According to the experimental results, machine learning models that employ logistic regression architectures achieved good overall results in the lactate trend prediction tasks. Moreover, the LMT algorithm with only four variables achieved a noteworthy prediction performance compared with the LMT algorithm using all of the variables; in particular, for the constant and decreasing lactate trends, LMT with four features achieved results similar to those of the full-variable LMT. We can therefore say that the LMT algorithm with the heart rate, oxygen saturation, lactate value before sepsis diagnosis, and time interval variables can be used effectively to assess whether a patient’s state is stable or improving.

Table 5. Classification results for decreasing trend of lactate.

4 Discussion

Of the machine learning approaches evaluated, the LMT models were the most accurate in predicting serum lactate trends from non-invasive clinical variables of patients with sepsis. With this method, the AUCs for increasing, constant, and decreasing lactate trends were 0.647 (95% CI 0.633–0.661), 0.803 (95% CI 0.798–0.808), and 0.847 (95% CI 0.841–0.853), respectively.

We observed different rankings of the importance of the variables for predicting lactate trends. For example, initial serum lactate measurement was a significant predictor of change in serum lactate values, followed by oxygen saturation, the time interval between lactate measurements, heart rate, SBP, P/F ratio, age, MBP, and DBP.

Multiple studies have been conducted on reducing the fatality rate associated with sepsis. Quickly identifying patients likely to experience severe sepsis or septic shock is essential for effective treatment. While lab tests (such as procalcitonin, C-reactive protein, and lactate) help predict sepsis, they can be time-consuming. This delay in diagnosis and treatment initiation highlights the need for faster prediction methods [33,34,35,36,37].

Signs of poor blood flow to tissues caused by sepsis can be both general and specific. General signs include low blood pressure, fast heart rate, decreased urine output, slow capillary refill, confusion, high blood lactate levels, and low blood oxygen saturation. Specific signs vary depending on the affected tissue. Notably, changes in vital signs, such as heart rate, blood pressure, breathing rate, oxygen saturation, and body temperature, can appear several hours before serious complications or worsening of the patient’s condition, providing valuable time for early intervention [38]. The Systemic Inflammatory Response Syndrome (SIRS) criteria and the qSOFA (quick Sequential [Sepsis-related] Organ Failure Assessment) score are primarily based on identifying changes in vital signs. These criteria remain an essential clinical tool for assessing the host’s systemic response to inflammation, despite the discovery of several biomarkers [39].

Studies suggest that analyzing trends in intermittent vital signs could lead to earlier detection of clinical deterioration in patients, potentially improving outcomes in both general wards and emergency departments [40]. According to a study by Barfod et al. [41], abnormal vital signs (SpO2, RR, BP, HR, GCS), especially abnormal RR, SpO2, and GCS, are strong predictors of intensive care unit admission from the emergency department and of in-hospital mortality. Today, some researchers are developing sepsis diagnosis and mortality prediction models by analyzing changes in vital signs with machine learning techniques. A machine learning-based sepsis prediction algorithm (InSight) developed by Mao et al. [42] provides high sensitivity and specificity for detecting and predicting sepsis, severe sepsis, and septic shock using only six commonly measured vital signs acquired in the emergency department, general ward, and ICU.

In clinical conditions, the circulatory disorder may be characterized by abnormal hemodynamic parameters such as hypotension and tachycardia, abnormal tissue organ perfusion findings such as decreased urine output and changes in consciousness, and abnormal metabolic parameters such as increased lactate and metabolic acidosis [43].

Hyperlactatemia is common in patients with sepsis and is a marker of disease severity and a strong predictor of mortality. Sepsis-associated hyperlactatemia may reflect the degree of activation of the stress response (and epinephrine release) [1]. In daily clinical practice, it is accepted that an increase in lactate levels over time primarily reflects increased production, decreased utilization, or both. As hyperlactatemia is often associated with poor circulation, we usually see a decrease in lactate levels when circulation improves, and we hypothesize (but cannot prove) decreased production [44]. However, since clearance is significantly reduced in stable septic shock states, continued hyperlactatemia may reflect decreased clearance rather than increased lactate production [5]. Lactate levels can help clinicians predict a patient’s risk of death, allowing them to determine the appropriate level of care. A high lactate level indicates an increased risk of mortality and can help identify patients who need additional investigation and monitoring [45].

In our study, all models underperformed in predicting a lactate increase, which is the more helpful indicator of disease severity. All the selected cases were patients diagnosed with sepsis who already had high lactate levels. Therefore, we evaluated whether the upward lactate trend could predict a further increase from an already elevated state rather than from the baseline level, which may have affected the predictive power of the models. In addition, the low performance can be attributed to the uneven distribution of samples among the groups in the cohort: the number of samples with an increasing trend is low compared to the others (increase: 1328, constant: 14012, decrease: 3313 samples). Because of this imbalance, the models’ predictive power may be affected. Future work can address this situation, for example by increasing the number of patients with increased lactate in the dataset or by using methods such as synthetic data generation at the training stage [46] (a sketch of one such approach is shown below). Though our current model offers valuable insights, we believe its performance can be significantly enhanced with access to a larger dataset. This would allow us to develop a model with improved discriminatory power, ultimately providing clinicians with more precise guidance on when to utilize serum lactate testing. Predicting a decreasing or stable lactate trend can also reduce unnecessary testing. In addition, incorporating further parameters (such as the focus of infection and the medical treatments administered) could improve the model’s performance and facilitate the prediction of serum lactate trends.
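As one illustrative option for such synthetic data generation, the sketch below oversamples the minority class with SMOTE from the imbalanced-learn package; the synthetic data, class proportions, and label encoding are assumptions for illustration, and in practice oversampling should be applied only to the training folds.

```python
# Hedged sketch: oversampling the minority "increase" class with SMOTE (imbalanced-learn).
# The data, class proportions, and label encoding are illustrative, not the study's.
from collections import Counter

from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Imbalanced synthetic stand-in roughly mimicking increase/constant/decrease proportions.
X, y = make_classification(n_samples=2000, n_features=4, n_informative=4,
                           n_redundant=0, n_classes=3,
                           weights=[0.07, 0.75, 0.18], random_state=0)

# Apply SMOTE to training data only in a real pipeline, never to the test fold.
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), Counter(y_res))  # minority classes are oversampled to parity
```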

Empirical results reveal that machine learning approaches that utilize logistic regression functions achieved higher AUC values than the others. Owing to its robustness to overfitting, the LMT algorithm achieved high AUC values even with just four features. The experimental results also show that the LMT algorithm can be combined with easily acquirable, routinely collected parameters to predict the lactate trend of sepsis patients. The LMT model has high computational complexity due to its hybrid logistic tree structure; training it on high-dimensional data can lead to high CPU and memory consumption [47]. To overcome this issue, our approach uses only the high-importance parameters in the lactate trend prediction task. With this proposed approach, a quick and accurate solution based on easily acquirable wearable parameters can be implemented in the ICU setting to assess the trend of the lactate value.

5 Conclusion

Lactate metabolism is affected by many factors; thus, predicting its level using machine learning is not easy. Treatment can be tailored according to the predicted lactate trend rather than a single predicted value. Our study suggests that lactate change can be predicted, albeit with suboptimal performance, by machine learning models that use patients’ hemodynamic and respiratory parameters. Further clinical studies will help determine the full potential of this tool within a clinical context.

By adding more lactate-related parameters to the dataset, the performance of deep learning methods, a branch of machine learning, can be examined. Deep learning architectures with reliable performance, such as LSTM (long short-term memory) networks and CNNs (convolutional neural networks), can be combined with LMT to form a hybrid system for predicting lactate trends. Lastly, synthetic samples can also be used in the models’ training phase to increase the prediction capability of machine learning models.

The need to find better ways of predicting patients’ survival is ongoing. Machine learning is gaining importance and attention as clinical outcomes correlate well with the systems’ predictions. Clinicians prefer noninvasive and less costly approaches that provide accurate estimates of a patient’s state. The lactate trend, and thus whether a sepsis patient in the ICU is stable or improving, can be predicted effectively with the LMT algorithm using the heart rate, oxygen saturation, lactate value before sepsis diagnosis, and time interval variables.