Introduction

Sepsis, a severe and life-threatening condition triggered by an overwhelming immune response to infection, poses a significant global health challenge [1]. It is responsible for an estimated 31.5 million cases of sepsis and 19.4 million cases of severe sepsis annually, leading to approximately 5.3 million deaths worldwide [2]. The critical nature of timely sepsis identification in clinical practice is underscored by findings that even a brief delay in initiating treatment can substantially increase mortality rates, owing to irreversible organ damage[3]. This urgency has catalyzed research into advanced predictive methodologies, notably the application of Machine Learning (ML) techniques aimed at the early detection of sepsis[4]. Such research endeavors have largely concentrated on the development of ML models and tools with a focus on improving prognosis, diagnosis, and the integration of clinical workflows, thereby highlighting the potential for constructing sophisticated computerized decision support systems[5].

Despite these advancements, traditional sepsis prediction methodologies, including the Sequential Organ Failure Assessment (SOFA), Systemic Inflammatory Response Syndrome (SIRS), and quick SOFA (qSOFA), exhibit significant limitations[6]. These methods often rely heavily on clinical judgment and are subject to variability in interpretation across different levels of clinical expertise, which can lead to inconsistencies in early sepsis detection. Moreover, traditional approaches tend to identify sepsis at a more advanced stage, missing the crucial window for early intervention [6, 7]. Conversely, ML models offer a dynamic and continuous analysis of real-time patient data, enabling early detection and providing dynamic risk assessments. By harnessing data analysis and pattern recognition capabilities, ML models aim to enhance patient outcomes and alleviate the burden on healthcare systems[7].

Feature engineering emerges as a critical component in the optimization of ML models for sepsis prediction. This process entails the selection, transformation, and creation of relevant features from raw data, aiming to improve the predictive accuracy of models. While the significance of identifying and utilizing critical features in constructing robust and precise predictive models is well-recognized, the field continues to grapple with uncertainties surrounding the effectiveness of specific features and the methodologies for feature selection and extraction[8]. Variability in the approaches to feature engineering and their impact on model performance necessitates a comprehensive scoping review to evaluate the evidence and discern which strategies yield the most significant benefits in terms of prediction accuracy.

In the context of feature engineering for sepsis prediction, the current literature has focused on the importance of selecting and extracting the most relevant patient-related variables to enhance the model accuracy [9]. Though some studies have explored clinical and laboratory features to improve sepsis prediction models, such as vital signs (e.g., temperature, heart rate, respiration rate, blood pressure), laboratory values (e.g., white blood cell count, lactate levels), patient demographics, and clinical history to improve sepsis prediction models; but still there is uncertainty regarding the effectiveness of specific features and feature selection/extraction methods in sepsis prediction[10]. Our current study employed different approaches to feature engineering, and their impact on model performance varies. This variability highlights the need for a scoping review to comprehensively evaluate the available evidence and provide insights into which feature-engineering strategies offer the greatest benefits in terms of sepsis prediction accuracy.

This scoping review seeks to address these gaps by assessing the critical features that enhance sepsis prediction and by providing insights into identifying patterns that may lead to improved clinical outcomes. The primary objective of this review is twofold: firstly, to explore the feature engineering strategies utilized in ML models for sepsis prediction, thereby offering valuable insights for future research and model development; and secondly, to evaluate the performance of these models through a critical analysis of existing studies, focusing on metrics such as the Area Under the Receiver Operating Characteristic curve (AUROC), Sensitivity, and Specificity.

Through an exhaustive evaluation of 29 selected studies, this review aims to analyze and synthesize various feature engineering techniques applied in sepsis prediction models, assess their impact on model performance, and evaluate the overall effectiveness of ML models in predicting sepsis. By adopting a systematic approach, the review intends to provide a comprehensive understanding of the role of feature engineering in enhancing sepsis prediction models, ultimately contributing to more effective clinical decision-making and patient care.

Methods

Search strategy

The search strategy for this scoping review was meticulously devised in alignment with the Preferred Reporting Items for Scoping Reviews (PRISMA) guidelines. PRISMA represents a rigorously developed framework, outlining a comprehensive set of standards for reporting scoping reviews. This methodology ensures transparency and reproducibility in the review process, as illustrated in Figure 1.[11]

Fig. 1
figure 1

Preferred Reporting Items for Scoping reviews and Meta-Analysis (PRISMA) flow diagram for the conducted study

On 13th March 2023, a comprehensive literature search was conducted across PubMed, Embase, and Scopus, targeting publications from the past five years (13 March 2018 to 13 March 2023). This search employed a detailed strategy, utilizing Boolean operators "AND" and "OR" to combine key phrases, specifically: "Machine Learning" and "Prediction" along with "Sepsis" and "Septic Shock". Each database was queried with these terms to ensure a thorough retrieval of relevant studies. Subsequently, the titles and abstracts of the retrieved studies were meticulously reviewed by an investigator (SB) to ascertain their suitability for inclusion in the review.

Inclusion and exclusion criteria

Inclusion criteria:

  • Research articles published in the English language.

  • Studies appearing in peer-reviewed journals.

  • Research focusing on the prediction of sepsis and associated outcomes.

  • Studies investigating ML (Machine Learning) models for sepsis prediction, emphasizing significant features for model optimization.

Exclusion criteria:

  • Conference abstracts and preliminary proof of concept studies.

  • Research studies exclusively predicting mortality related to sepsis.

  • Research studies published in the subscribed journals

These criteria ensured a comprehensive and focused review of the literature on ML models for sepsis prediction by excluding preliminary findings and studies that were not directly aligned with the core objectives of efficient prediction and feature analysis.

Data extraction and quality assessment

Data extraction was meticulously carried out by a primary reviewers (SB) and (JP), who cataloged essential details such as Title, Publication year, First author, Study objectives, Clinical setting, Patient cohort size, ML model utilized, Feature count, Sepsis classification, Observation period, Gender distribution, AUROC, Innovation, Model evaluation criteria,Training-test split, Data source, Sensitivity, and Specificity, in addition to the criteria used for sepsis diagnosis.

To ensure the accuracy and integrity of the data extraction process, two additional reviewers (ED and UU) collaboratively worked with the primary reviewer (SB) to scrutinize and validate the extracted information. Studies failing to align with the predetermined inclusion criteria were systematically excluded. Discrepancies encountered during the review process were resolved through comprehensive mutual discussion and further literature consultation, facilitating consensus among the reviewers.

Results

Characteristics of studies

The scoping review included 29 studies (See in Table 1), encompassing a total patient cohort of 1,147,202 (909,462 cases and 237,740 controls). The majority of the studies, numbering 20, (3,4,12,13,17,18,19,20,21,5,23,24,25,28,30,31,32,33,34,36) were conducted in Intensive Care Units (ICUs), while four were based in Emergency Departments (EDs) 16,26,29,35, and one was carried out in a general hospital setting[14]. These studies varied in patient demographics, prediction timeframes, and sepsis types, utilizing diverse database sources. For instance, the research by Meicheng Yang et al. [3]focused on hourly sepsis risk prediction in ICU settings with the EASP model, emphasizing interpretability. Maximiliano Mollura et al. [12] and Xin Zhao et al. [13] explored ICU data and PhysioNet/Clinic Challenge 2019 data, respectively, each applying distinct approaches to sepsis prediction and addressing specific challenges.

Table 1 Characteristics of included studies

Figure 2's pie chart illustrates the data source distribution, revealing a nearly equal split between private (55%) and public databases (45%), such as MIMIC, indicating the varied origins of data in these studies. This comprehensive review underscored the array of strategies and methodologies employed to enhance sepsis prediction across different clinical environments, contributing to the ongoing advancement in the field.

Fig. 2
figure 2

Database sources used in the studies

Feature engineering techniques

Feature selection is a process of selecting feature subsets which are applied to the model construction. It is used in areas where there any many features and relatively few samples. On the other hand, feature extraction generates new features from the original features, which means that the new features after feature extraction is a mapping of the original features (See Figure 3). [35]

Fig. 3
figure 3

Classification of feature selection methods

Feature selection methods: Filter methods

Filter methods employ variable ranking techniques as their core criterion for feature selection, arranging variables based on their relevance. This relevance, termed feature relevance, measures a feature's utility in distinguishing between different classes within the data. Utilizing methods like Info Gain, GINI, and Relief, Jevier Enrique Camacho-Cogollo et al.[20] applied the filter approach to score and rank features according to their class label relevance, selecting features above a specified relevance threshold (0.0020). This process identified 31 medically relevant features and 88 statistical features, with Info Gain, GINI, and Relief methods selecting 75, 47, and 76 relevant features respectively. Similarly, Donghun Yang et al. [15] employed the filter method to narrow down from 1,738 initial features to the 50 most critical features, encompassing both laboratory data and drug interactions, thereby underscoring their significance in enhancing the accuracy of their predictive model.

Feature selection methods: Wrapper methods

Wrapper methods optimize feature selection by treating the prediction model as a “black box," utilizing the model's performance metrics as the objective function for evaluating subsets of variables. [2]This approach typically yields higher performance subsets than filter methods by leveraging actual modeling algorithms for evaluation.[38] Yash Veer Singh et al.[31] applied backward elimination, a wrapper method, effectively removing non-contributory features to identify 11 critical features, achieving a model accuracy of 0.96 with their Ensemble model. Meicheng Yang et al. employed forward feature selection, another wrapper strategy, categorizing their 168 selected features into raw features, information missingness, time series, and empiric categories, showcasing the adaptability of wrapper methods in refining feature sets for predictive modeling.

Feature selection methods: Embedded methods

Embedded methods integrate the feature selection process directly within the training phase of ML models, offering a nuanced approach that inculcates the complexity of model training with the simplicity of feature optimization. These methods, such as Lasso and Elastic Net, operate on the principle of regularization, which aims to minimize overfitting by penalizing the magnitude of feature coefficients, effectively shrinking some to zero. [38] This not only aids in identifying features that have little to no predictive value but also enhances model generalizability.

In the realm of sepsis prediction, embedded methods have shown considerable promise. For instance, the use of Random Forest importance as an embedded method highlights its capability to discern the relative value of each feature within a dataset. By analyzing feature importance, researchers can pinpoint which variables most significantly impact the model's predictions, particularly in the context of sepsis where timely and accurate prediction can save lives. Dong Wang et al.'s [5] application of this method led to the selection of a concise set of 20 features critical for sepsis prediction in ICU patients, underscoring the method's efficiency in distilling a dataset to its most informative components.

Further exploration by Cesario et al. [20]into Mean Decrease Accuracy and Mean Decrease GINI as embedded methods provides insights into the multifaceted nature of feature selection. These techniques evaluate the impact of each feature on the model's accuracy and the overall reduction in data impurity, respectively, offering a comprehensive view of feature significance. Such methodologies have elucidated the paramount importance of certain predictors, like age, which exhibited a profound influence on the model's predictive capabilities.

Rishikesan Kamaleswaran et al.'s [25] study stands out for its broad application of feature selection methods, spanning both embedded and wrapper techniques. By employing a wide array of methods, including parametric and non-parametric tests, Ridge, Lasso, and Recursive Feature Elimination (RFE), alongside Random Forest-based variable importance, the study showcases the depth of possible analysis when integrating feature selection with model development. The adoption of Recursive Feature Elimination, in particular, highlighted its effectiveness in isolating 22 highly predictive features, demonstrating the potential of embedded methods to refine and enhance model performance through targeted feature selection.

This comprehensive approach to feature selection, particularly within the scope of embedded methods, exemplifies the dynamic interplay between algorithmic complexity and model optimization. By embedding feature selection within the model training process, these methods provide a robust framework for developing highly accurate and generalizable predictive models, essential for advancing sepsis prediction and improving patient outcomes.

Feature extraction methods

Zhengling He et al. [19] and colleagues explored the potential of LSTM (Long Short-Term Memory networks) for deriving features from sequential data, employing an ablation study to gauge the impact of individual features. Their findings highlight the ICU Length Of Stay (LOS) as a pivotal predictor, alongside other significant LSTM-derived features like the pseudo SOFA score and body temperature. These insights underscore the value of deep learning in identifying nuanced indicators for sepsis onset prediction.

Further innovation in feature engineering was demonstrated through the development of second-order derived features and aggregate features [17], capturing complex relationships and condensing data into insightful metrics. This approach yielded a comprehensive set of 672 features, with 192 identified as unique, revealing the synergistic effect of body temperature and heart rate, among others, on sepsis prediction accuracy and lead time.

Table 2 consolidates various feature selection and extraction methods, ranging from wrapper and embedded methods to unsupervised techniques, highlighting their effectiveness in distilling critical predictors from a broad spectrum of clinical and demographic data. This table illustrates the evolution from initial feature identification to the final selection, emphasizing the top ten features across studies, and showcasing the diversity and impact of feature selection and extraction strategies on enhancing model performance.

Table 2 Feature Selection Methods and Important Features in the Included Studies

The analysis of the top 10 features, as depicted in Figure 4, highlights the most critical physiological markers for sepsis prediction. These indicators are crucial for recognizing the onset of sepsis, emphasizing the necessity of vigilant monitoring of such parameters. The graphical representation serves to underline the significant role these features play in the early detection and prediction of sepsis, pointing to the potential changes in these parameters as early signs of sepsis. This insight is vital for the development of effective early diagnosis and intervention strategies in sepsis management, illustrating the clinical importance of these markers.

Fig. 4
figure 4

Frequency of features identified in studies

To comprehensively assess the influence of various feature selection and extraction methodologies on the predictive accuracy of sepsis models, a meticulous analysis was carried out. This scrutiny was confined to investigations leveraging publicly accessible databases, namely MIMIC and PhysioNet, to ensure an unbiased comparison across diverse studies. By filtering through an expansive array of research, significant contributions from each category—Filter, Wrapper, Embedded, and Feature Extraction—were identified and their optimal results meticulously synthesized.

The graphical representation, depicted in Figure 5, elucidates the differential efficacy of these methodologies, with a particular spotlight on the Feature Extraction technique. This method emerged as notably superior, showcasing enhanced sensitivity and AUROC metrics, thereby suggesting its unparalleled effectiveness in sepsis prediction. Such findings are instrumental, indicating that feature extraction methods when specifically adapted for the nuances of sepsis prediction, are capable of significantly elevating the predictive precision of models. This detailed comparative analysis not only highlights the distinct advantages of tailored feature extraction techniques but also serves as a critical resource, offering insights into the optimization of sepsis prediction models through strategic feature selection and extraction.

Fig. 5
figure 5

Performance analysis of different feature selection and extraction methods across open database

Model performance

Table 3 synthesizes outcomes from a spectrum of studies dedicated to sepsis prediction, encapsulating the application of Machine Learning (ML) and Deep Learning (DL) strategies. It meticulously outlines the top-performing models, highlighting their Area Under the Receiver Operating Characteristic Curve (AUROC) values, Sensitivity, Specificity, and the Distribution of data for training, testing, and validation phases. The compilation reveals a broad array of algorithmic approaches, underscoring the dynamic potential of different ML models in accurately predicting sepsis. For instance, Kim et al.'s [30] bespoke model showcases exemplary performance metrics, whereas Yang et al.'s [13] study presents a contrasting scenario with their Random Forest model. This diversity in model efficacy and algorithmic application illustrates the ongoing evolution and complexity in the quest for improved sepsis prediction methodologies, aiming to significantly uplift patient care standards through enhanced diagnostic accuracy.

Table 3 Evaluation Measures for ML Models in Different Studies

Impact of prediction time window on model performance

The prediction time window is crucial as it plays an important role in clinical intervention, resource allocation, treatment planning, false positive rates, clinical workflow, and model evaluation. Figure 6 depicts the impact of the different prediction time windows on the model performance.

Fig. 6
figure 6

Predicting the performance of multi-time window

Some studies including [18, 21, 34] showed that the ML model gave dependable results (higher AUROC while minimizing false positive and false negative rates) when predicting sepsis at different time intervals 12, 24 and 48 h. These results have clinical significance as they demonstrate that the model’s predictive power remains consistent across the crucial time windows. It’s also worth noting that some studies [12,13,14, 32,33,34, 36] have shown consistent results across early time points (1 to 6 h) which indicates that they can identify septic cases in their early stages.

In the course of this investigation in a study [4], a novel Smart Sepsis Predictor (SSP) model was meticulously developed, employing a Recurrent Neural Network (RNN) architecture. The SSP model was thoughtfully designed to operate in two distinct modes, each harnessing crucial inputs encompassing a spectrum of patient data, including vital signs, demographics, and laboratory values. What sets this model apart is its remarkable proficiency in achieving higher Area Under the Curve (AUC) scores when applied to a 12-h prediction window. This capability arises from its unique capacity to discern intricate and nuanced relationships within vital sign data, thereby facilitating timely alerts to healthcare practitioners. It is noteworthy that the findings reported herein align consistently with the outcomes observed in prior studies, specifically references [34] and [36].

This study [31] introduced a novel early warning model known as the Double Fusion Sepsis Predictor (DFSP), which stands as a hybrid deep-learning framework amalgamating deep features with meticulously engineered attributes encompassing statistical metrics and clinical scores. The outcomes of this investigation present compelling evidence for the superior performance of DFSP when juxtaposed with a pure deep learning model. Specifically, DFSP demonstrates a substantial enhancement in the Area Under the Receiver Operating Characteristic (AUROC) curve across 6, 12, and 24-h prediction horizons. This improvement is attributed to the utilization of fusion strategies, which not only enhance predictive capabilities but also significantly elevate the AUROC scores.

In this study [19], an advanced Sepsis Early Risk Assessment (SERA) algorithm was devised, incorporating both structured and unstructured clinical notes. Through data mining techniques, the SERA algorithm demonstrated enhanced predictive accuracy compared to utilizing solely clinical metrics. The Receiver Operating Characteristic (ROC) analysis of the SERA algorithm consistently surpassed predictions made by physicians across all examined time intervals, exhibiting notably high Area Under the ROC Curve (AUROC) scores even up to 48, 24, 12, 6, and 4 h preceding the onset of sepsis.

Discussion

From the list of 29 included studies, almost all of them, ICU-based studies (68%) and ED-based studies (13%), were conducted in critical care settings in the hospitals, thus showing the significance of ML in critical care data analytics. Early diagnosis and treatment play an important role in reducing the mortality due to sepsis, but advanced and accurate detection of sepsis is still a challenge in the clinical domain. When we discuss the electronic monitoring of sepsis patients for predicting and detecting early symptoms of complications, that's where the ML algorithms come in and play their role by identifying patterns and relationships from vast / big patients' datasets to solve this complex problem [37]. While reviewing the related literature, we found several studies using ML models and algorithms for sepsis prediction as mentioned in the above results section. Linked with the subject of our current study, we found three important scoping reviews / meta-analysis that focused on the potentials of ML for sepsis prediction [2, 38,39]. The review from Deng et al. included 21 studies focusing on early sepsis detection, prediction and mortality. It concluded that no model could be adopted widely yet in general due to the lack of unified validation standards / procedures and the heterogeneity in patients’ cohort, though it referred Deep Neural Networks (DNNs) as more suitable tool as compared to the other traditional tools for high-dimensional and highly heterogeneous patients' sepsis data. Interestingly, it recommended using ML as a feature engineering tool, which reflects the need for and importance of our conducted study in this field; and suggested AUROC as evaluation standard for model performance as in our results. The review and meta-analysis from Fleuren et al. [38] showed ML models prediction for Sepsis ahead of time using retrospective data by examining 28 included studies out of which 24 reported AUROC as their performance metric in critical care settings. and focused on AUROC to analyze model performance whereas we looked at the other metrics like sensitivity and specificity in addition to AUROC. Though the results of this review showed that individual models outperformed the traditional scoring tools, the authors suggested the need of development of reporting guidelines for ML models in critical/intensive care settings and their implementation with diverse patient populations to see the clinical impact. Similarly, another meta-analysis study from Islam et al. [39] included seven observational studies to quantify the performance of ML models for Sepsis prediction. The outcomes showed that.

ML prediction models performed well as compared to existing sepsis scoring systems, such as SIRS, MEWS, SOFA, and qSOFA for identification and prediction of sepsis patients; and suggested for more multi-centered studies with more precise clinical variables for sepsis prediction in the future. In contrast to these review / meta-analysis studies, our review dedicatedly focused on different critical features and feature extraction methods. The results of our study showed the key dynamic features that are pivotal in early sepsis prediction; demonstrated the critical role of feature selection methods in enhancing the efficacy of predictive models in sepsis; and proved the effectiveness of feature extraction models—Random Forest and XG Boost with high sensitivity and AUROC in facilitating the sepsis prediction, and DL showing excellent AUROC values for different predicting time windows (12–48 h.). Concisely, the increased accuracy of sepsis prediction using these ML models can lead to minimizing the hospital mortality rate, reducing the LOS, improving the patient safety, and at the same time saving millions of dollars of investment in large clinical settings, hence proving the potentials and importance of these models in this domain.

This scoping review has several strengths. It followed a comprehensive and systematic approach to assess the landscape of sepsis prediction using ML techniques. It offered a thorough compilation of feature selection and extraction techniques used in the sepsis prediction and identified the top features for sepsis prediction across all studies. Additionally, by categorizing the studies based on features, prediction time and model performance, this study provided a clear comparison of different approaches that can support researchers and healthcare professionals in informed decision making.

There were certain limitations related to features. Firstly, the feature variability, the studies examined in this review utilize a wide array of features reflecting the diversity of clinical data sources and methodologies. However, the variability in selected features across studies can hinder direct comparisons and the identification of universally impactful features. Secondly, the features that prove influential in one clinical context may not necessarily generalize to other healthcare settings or patient populations. Thirdly, many studies identified critical features, but not all provided in-depth insights into the clinical significance or mechanistic explanations of these features. Lastly, this review was constrained by the availability of data in the studies analyzed, as incomplete or restricted datasets can lead to incomplete representation of potentially critical features.

We hope that this review will help clarify which features and methods are most promising for improving the accuracy of sepsis prediction models for future research studies. Ultimately, the findings from this review will be valuable not only for researchers but also for healthcare professionals, as they seek to enhance early sepsis detection and patient care.

Conclusion

To our best knowledge, this is the first study of its kind that reviewed critical features and feature extraction methods for sepsis prediction. Spanning diverse studies, it encompassed over 18,841 features and explored techniques like wrapper, filter, and embedded extraction to assess their impact on sepsis prediction models. The findings of this study highlighted the pivotal role of dynamic features, notably encompassing vital signs, such as Temperature, Heart Rate, and Blood Pressure, alongside critical laboratory parameters including White Blood Cell count (WBC), Creatinine, Bilirubin, Platelet count, and Lactate levels in sepsis prediction. These dynamic features have shown consistent and substantial prominence in prognosticating the onset of sepsis, exhibiting remarkable discriminatory power and pivotal utility in the early detection of septic conditions. In contrast, the demographic variables have evinced comparatively diminished influence in effectively predicting sepsis. For enhancing the predictive efficacy of sepsis models, the strategic implementation of feature selection methodologies has emerged as a crucial factor. The judicious identification and integration of key predictors via Filter, Wrapper, and Feature extraction techniques, these methodologies have effectively mitigated data dimensionality issues and conferred enhanced model stability, thereby facilitating the development of accurate and refined predictive models. In terms of model efficacy, the Random Forest and XG Boost models have exhibited superior performance, with commendable AUROC, sensitivity, and specificity scores. Additionally, Deep Learning models have demonstrated consistent and profound insights into the correlation between features and model predictions, an aspect that conventional Machine Learning models have not been able to fully elucidate yet. These Deep Learning models have demonstrated remarkable AUROC values across different prediction time windows, ranging from 12 to 48 h.

We recommend standardization of feature engineering methods used in sepsis prediction models which will facilitate comparison across studies and will foster consistency. Also, researchers should provide detailed description of the feature engineering process in their publication, including the rationale behind selecting methods and detailed data preprocessing steps.

In summary, this study reaffirmed the crucial role of features in sepsis prediction. The careful choice of feature extraction methods can significantly impact the model’s performance and provide clinicians with valuable insights into complex interrelationships.