Background

Gastric cancer (GC) remains a significant contributor to the global burden of cancer [1]. According to the GLOBOCAN 2020 estimates of cancer incidence and mortality produced by the International Agency for Research on Cancer, gastric cancer is the fifth most commonly diagnosed malignancy worldwide, with more than 1 million incident cases annually, accounting for 5.6% of all cancer diagnoses [2]. Hotspots of incidence and mortality exist in East Asia, Eastern Europe, and South America, with East Asia being the most affected region [3]. Clinicians in China are expected to encounter a growing number of gastric cancer cases in the future [4]. Because it is frequently diagnosed at an advanced stage, GC has a poor prognosis [5, 6]. In China, the financial burden on families and society resulting from the treatment of gastric cancer patients is considerable, with an average cost of approximately $10,000 [7].

Conventional methods for detecting gastric cancer include imaging examination, pathological diagnosis, and gastroscopy [8]; however, these commonly used clinical tests are typically costly. Examinations such as gastroscopy are time-consuming and may cause discomfort to the patient. The interpretation of the results also depends on the experience of the examining physician and is therefore subjective. The American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) staging system has been a vital evaluation system for guiding clinical treatment and assessing prognosis [9], but the prognostic information it provides is useful yet imprecise. In clinical practice, prognosis differs significantly even among patients with the same pathological classification [10]. Consequently, there is an urgent need to establish a cost-effective model for predicting adverse overall survival (OS), so that clinicians can pursue early intervention and improve the prognosis of GC.

Tumor-related inflammation, which is regarded as the seventh hallmark of cancer [11], plays a decisive role in different stages of tumor development [12]. As a result, there has been an increased focus on systemic inflammatory parameters, especially those measured by simple laboratory tests (e.g., platelet, leukocyte, neutrophil, lymphocyte, and albumin analyses). Recent studies have confirmed that many inflammatory indices, such as the peripheral blood neutrophil-to-lymphocyte ratio (NLR) [13], platelet-to-lymphocyte ratio (PLR) [14], lymphocyte-to-monocyte ratio (LMR) [15], and systemic immune-inflammation index (SII) [16], are closely associated with the prognosis of patients with gastric cancer. However, most current studies focus on the prognostic value of a single inflammatory factor in GC, and few have attempted to establish a prognostic model from different combinations of inflammatory factors. Therefore, this study aimed to establish a prognostic scoring system by combining common inflammatory factors with the basic clinical characteristics of gastric cancer patients, and to predict their survival using nomograms.

Materials and methods

Study design and population

The population and data of this study were drawn from a nationwide program, the Investigation on Nutrition Status and its Clinical Outcome of Common Cancers (INSCOC). INSCOC is a multi-center prospective cohort study conducted in China and was registered in the Chinese Clinical Trial Registry (ChiCTR) on December 24, 2018 (registration number ChiCTR1800020329). The complete protocol of this project has been described in previous studies [17, 18]. The detailed inclusion and exclusion criteria can be found in Supplemental Table 1. Ultimately, 1,140 patients were included in the final analysis (Fig. 1 shows the flowchart of patient inclusion), all of whom had complete blood biochemistry test results available to support the calculation of inflammatory markers. The study was approved by the Ethics Committee of the First Affiliated Hospital of Sun Yat-sen University. Written informed consent was obtained from all participants after the nature of the study had been explained. All data were analyzed anonymously after removal of all identifying information. The study followed the principles of the Declaration of Helsinki.

Fig. 1

Workflow of patient inclusion

Collection of data and definition of variables

Patient data were collected within 48 h of admission from electronic medical records and through interviews conducted by experienced medical staff using questionnaires developed in previous studies [17, 19]. The data included age, gender, lifestyle (smoking; alcohol intake; tea intake), tumor-related data (tumor stage; therapy), nutrition-related metrics [the Nutrition Risk Screening 2002 (NRS 2002) score and the Scored Patient-Generated Subjective Global Assessment (PG-SGA) tool], body mass index (BMI), quality of life and performance status assessments [the European Organisation for Research and Treatment of Cancer Quality of Life Questionnaire-Core 30 (EORTC QLQ-C30) and the Karnofsky Performance Status (KPS)], and laboratory blood test indicators.

Clinical features recorded during hospitalization, including family history, clinicopathologic staging, and blood biochemistry tests, were retrospectively collected from electronic medical records. Laboratory blood test indicators included pre-albumin, total protein, C-reactive protein, albumin, total cholesterol, hemoglobin, and blood glucose levels, as well as blood cell counts (leukocytes, erythrocytes, lymphocytes, neutrophils, and platelets). All blood tests were performed within 48 h of initial hospitalization, prior to antineoplastic therapy, and participants fasted for at least 9 h before blood collection. Information on lifestyle habits (smoking, alcohol intake, and tea intake) was obtained through a lifestyle questionnaire. The American Joint Committee on Cancer TNM staging system (8th edition) was used for pathologic staging [9]. Supplemental Table 2 shows the formulas used to calculate the required indices. A unified endpoint determination committee was established for the primary study endpoints, and all study endpoints were reviewed and determined by this committee. The members of the committee were blinded to the specific tasks of the research group.
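As an illustration of how such indices are derived from routine blood tests, the following Python sketch computes six of the indices named in this article. It is a minimal sketch under stated assumptions: the formulas are the conventional literature definitions, not necessarily those listed in Supplemental Table 2, the function name and units are hypothetical, and the study itself performed its analyses in R.

```python
def inflammatory_indices(neutrophil, lymphocyte, platelet,
                         albumin, total_protein, bmi):
    """Compute common inflammation-based indices (illustrative only).

    Assumed units (not taken from the article): cell counts in 10^9/L,
    albumin and total protein in g/L, BMI in kg/m^2.
    """
    if lymphocyte <= 0:
        raise ValueError("lymphocyte count must be positive")
    globulin = total_protein - albumin  # globulin estimated by subtraction
    if globulin <= 0:
        raise ValueError("total protein must exceed albumin")

    nlr = neutrophil / lymphocyte
    return {
        "NLR": nlr,                                  # neutrophil-to-lymphocyte ratio
        "PLR": platelet / lymphocyte,                # platelet-to-lymphocyte ratio
        "SII": platelet * neutrophil / lymphocyte,   # systemic immune-inflammation index
        "AGR": albumin / globulin,                   # albumin-to-globulin ratio
        "ALI": bmi * (albumin / 10.0) / nlr,         # albumin converted to g/dL
        "PNI": albumin + 5.0 * lymphocyte,           # prognostic nutritional index
    }


example = inflammatory_indices(neutrophil=4.0, lymphocyte=2.0, platelet=250.0,
                               albumin=40.0, total_protein=70.0, bmi=22.0)
print(example)
```

Because every index is a ratio or weighted sum of routinely measured values, no additional testing beyond a standard blood panel and BMI is needed.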

Outcomes

The primary endpoint of this study was overall survival (OS), defined as the time from tumor diagnosis to death, loss to follow-up, or the last confirmed follow-up date. All patients were followed until June 2022.

Machine learning models

The total population in this study was 1,140. We shuffled the data and divided the population into two sets in a 70%:30% ratio. We developed the models in the training set (n = 798) and used the validation set (n = 342) to estimate model performance. Baseline characteristics and biochemical indices were defined as input variables, and survival status (survival or death) was the binary response variable. Several established machine learning (ML) algorithms were independently used to predict the survival status of GC patients, including decision tree (DT), adaptive boosting (ADA), random forest (RF), logistic regression (LR), support vector machine (SVM), and neural network (NNET). The algorithms were selected for their accessibility and prevalence in cancer studies [18, 20, 21], especially in gastric cancer [22, 23].
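The shuffle-and-split step can be sketched as follows. This is an illustration in Python rather than the R code used in the study; the function name and the random seed are hypothetical, and the sketch splits patient indices only.

```python
import random


def split_indices(n, train_fraction=0.7, seed=2023):
    """Shuffle the indices 0..n-1 and split them into training and validation sets.

    The seed is an arbitrary, hypothetical value used only for reproducibility.
    """
    indices = list(range(n))
    rng = random.Random(seed)
    rng.shuffle(indices)
    n_train = round(n * train_fraction)
    return indices[:n_train], indices[n_train:]


train_idx, valid_idx = split_indices(1140)
print(len(train_idx), len(valid_idx))  # 798 342
```

Fixing the seed makes the partition reproducible, which matters when several algorithms are to be compared on exactly the same training and validation patients.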

Initially, we trained the models using all available variables. The major parameters and program details for these models are given in Supplemental Table 3. To derive optimized models with fewer features and to facilitate clinical application, we used the R package “caret” for feature selection [18], which was performed with 10-fold cross-validation. In a subsequent step, the 13 most important variables were selected to train simplified models. The performance of the models was evaluated using receiver operating characteristic (ROC) curves. We calculated the area under the ROC curve (AUC) and its 95% confidence interval (CI), and then statistically compared the AUCs to identify the best machine learning algorithm. If multiple algorithms exhibited comparable optimal performance, the “white box” algorithm was preferred, given its interpretability and ease of implementation [18]. Harrell’s concordance index (C-index), decision curve analysis (DCA), net reclassification improvement (NRI), and integrated discrimination improvement (IDI) were used for model assessment to select the appropriate combination of variables.
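For readers less familiar with the AUC, the following sketch shows what the statistic measures: the probability that a randomly chosen patient with the event receives a higher predicted score than a randomly chosen patient without it. It is a plain-Python illustration with made-up data, not the study's R code, and it does not reproduce the confidence intervals or the DeLong comparison.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for the event (e.g., death), 0 otherwise.
    scores: predicted risk, higher meaning more likely to have the event.
    Tied scores count as half a concordant pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 8 of the 9 event/non-event pairs are ranked correctly.
print(auc([1, 1, 0, 0, 1, 0], [0.9, 0.6, 0.4, 0.7, 0.8, 0.2]))
```

The quadratic loop is adequate for a validation set of a few hundred patients; rank-based formulas give the same value more efficiently on larger data.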

Statistical analysis

The predictors selected by the model were entered into a multivariate Cox regression analysis. Variables that met the proportional hazards assumption were retained, and those that did not were replaced, in order, by the next most important variables (starting with the variable ranked 14th) until all 13 variables were consistent with the assumption. The risk score was calculated from the Cox model, and its cut-off value was determined using the R package “survminer”. Patients were divided into a low-risk group and a high-risk group based on the risk score. We used the log-rank test to compare survival rates and Kaplan-Meier analysis to plot the survival curves. We performed univariate and multivariate analyses with Cox proportional hazards models to identify independent prognostic features. Hazard ratios (HRs) and 95% CIs were used to estimate the risk of mortality in GC patients. We fitted multivariable-adjusted models and then conducted sensitivity analyses.
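The Kaplan-Meier (product-limit) estimate used to draw such survival curves can be sketched as follows. This is an illustrative Python implementation with hypothetical follow-up data; in practice it would be applied separately to the low-risk and high-risk groups, and the study itself used R.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.

    times:  follow-up time for each patient.
    events: 1 if death was observed at that time, 0 if censored.
    Returns a list of (time, survival_probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients leave the risk set
    return curve


# Hypothetical follow-up times in months; 0 marks a censored patient.
print(kaplan_meier([5, 8, 8, 12, 15], [1, 1, 0, 1, 0]))
```

Censored patients contribute to the risk set until their last follow-up but never cause a drop in the curve, which is why loss to follow-up does not bias the estimate as long as censoring is unrelated to prognosis.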

Based on the results obtained from the machine learning algorithms, we constructed a nomogram with the chosen variables to predict the individual survival of gastric cancer patients. By comparing the survival probabilities predicted by the nomogram with the observed survival probabilities, we performed a calibration curve analysis with internal bootstrap correction to verify the nomogram’s discrimination and calibration. The concordance index was also calculated to quantify the discriminatory performance of the nomogram. To make the model easier for clinicians to apply, we further simplified it, developed a nomogram from the selected features, and exported the best model as a Predictive Model Markup Language (PMML) file to support cross-platform deployment [18].
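The concordance index can be illustrated with a short Python sketch. It is a simplified version of Harrell's C-index, assuming that pairs with identical follow-up times are skipped; the data are hypothetical, and the study's own values were computed in R with bootstrap correction, which is not reproduced here.

```python
def harrell_c_index(times, events, risk_scores):
    """Simplified Harrell's concordance index for right-censored data.

    A pair (i, j) is comparable when patient i died and patient j was
    followed for longer. The pair is concordant when the patient who died
    earlier has the higher risk score; tied scores count as 0.5.
    Pairs with equal follow-up times are ignored (a simplifying assumption).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: 4 of the 5 comparable pairs are ordered correctly.
print(harrell_c_index([5, 8, 12, 15], [1, 1, 0, 1], [0.9, 0.4, 0.6, 0.1]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.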

Continuous variables are presented as medians with lower and upper quartiles and were compared using the nonparametric Wilcoxon rank-sum test. Categorical variables are presented as numbers (%) and were compared using the χ² test. We used the DeLong test to compare the AUCs of different models. All tests were two-sided, and a P-value of less than 0.05 was regarded as statistically significant. All analyses were performed using the open-source software R, version 4.2.2 (The R Foundation: https://www.r-project.org/).

Results

Basic characteristics

The baseline characteristics of the patients are presented in Table 1. A total of 1,140 patients were included in this study, with a median age of 65.0 years; 798 were male and 342 were female. The predominant clinical stages were III (44.65%) and IV (22.90%). A family history of tumor was found in 177 patients. All participants were randomly assigned to a training cohort (n = 798) or a validation cohort (n = 342) to investigate the predictive value of machine learning-based models in GC patients. No significant differences in clinical and demographic characteristics were observed between the two cohorts (Table 1).

Table 1 Baseline characteristics of the study population

Feature selection and model development

We trained each model separately in the training cohort, defining all available baseline features as the input variables and survival status as the response variable. The complete models were then evaluated in the validation cohort. As illustrated in Fig. 2A, the RF model performed best, with an AUC of 0.752 (95% CI: 0.697, 0.807). Setting the RF model as the reference, we statistically compared the models’ performance. The performance of ADA and SVM was similar to that of RF (P = 0.150; P = 0.087), while the other models performed significantly worse (P < 0.05).

Fig. 2

Model performance (AUCs were compared using DeLong’s test). (A) Performance of the full ML models. (B) Performance of the simplified ML models after feature selection. (C) Comparison of the ADA and RF models in the training set. (D) Comparison of the ADA and RF models in the validation set. (E) Comparison of the full ADA and simplified ADA models

The caret framework provides a built-in function for assessing variable importance, which we applied to our models. DT, SVM, and ADA used the AUC as the importance metric, whereas the other models used different metrics (LR, t statistic; RF, mean decrease in accuracy; NNET, connection weights). Supplemental Fig. 1 depicts the relative importance of the input variables in the complete models. The input variables for each model were ranked in order of importance, and the top 13 variables (top 30%; Supplemental Table 4) for each model were selected for model reconstruction. In the validation set, the RF model still performed best, with an AUC of 0.763 (95% CI: 0.711, 0.815). With RF set as the reference model, ADA (P = 0.137) and DT (P = 0.099) were statistically comparable with RF, while the other models displayed inferior performance (P < 0.05) (Supplemental Fig. 2).

Since the purpose of this study was to assess the prognosis of gastric cancer using inflammatory indicators, the variables in the simplified model were evaluated individually against the proportional hazards assumption (PH assumption) before the Cox regression model was built. LCR (P < 0.0001), LCS (P = 0.007), GLR (P = 0.021), and PLR (P = 0.012) did not satisfy the PH assumption. After these variables were replaced, all variables in the new simplified model were consistent with the PH assumption; the 13 variables used to construct the new model are listed in Supplemental Table 5.

Among the retrained streamlined models, the ADA model displayed the best performance, with an AUC of 0.751 (95% CI: 0.698, 0.803). The DT (P = 0.224), SVM (P = 0.093), and RF (P = 0.921) models were statistically comparable with ADA, while the other streamlined models performed worse (P < 0.05) (Fig. 2B). We evaluated the influence of feature selection on the performance of the different ML algorithms (Table 2). Reducing the number of input variables from 43 to 13 significantly decreased the performance of DT and NNET (all P < 0.05), while the effects on ADA, RF, SVM, and LR were not significant (all P > 0.05).

Table 2 Comparison of the model performance before and after feature selection

We then compared the better-performing ADA and RF models to select one of them for future use. The combinations of variables in the ADA and RF models were incorporated into Cox regression models separately. The final model was selected by comparing the C-index, IDI, NRI, and DCA curves in the training and validation datasets (Fig. 2C and D). The ADA model had a higher C-index than the RF model in both the training and validation sets, but the differences were not statistically significant. The NRI and IDI showed that ADA offered little improvement in discrimination over RF. The DCA curves showed that the ADA and RF models had comparable performance when the threshold probability for predicting 1-, 3-, or 5-year survival in GC patients was > 0.05 or 0.10. Clinicians who used either model to predict the probability of survival would gain more net benefit than with the strategies of treating all patients or treating none. ADA performed slightly better than RF over some threshold intervals. Therefore, we selected ADA for future use.
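The net benefit plotted on a decision curve can be sketched as below. This Python illustration uses the standard binary-outcome definition, net benefit = TP/n − FP/n × pt/(1 − pt), where pt is the threshold probability. It ignores censoring, uses hypothetical data, and is not the time-dependent R analysis performed in the study.

```python
def net_benefit(labels, predicted_probs, threshold):
    """Net benefit of intervening on patients with predicted risk >= threshold.

    labels: 1 if the patient had the event, 0 otherwise.
    """
    n = len(labels)
    pairs = list(zip(labels, predicted_probs))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Net benefit of the reference strategy of intervening on every patient."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical outcomes and predicted risks for eight patients.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
probs = [0.8, 0.3, 0.6, 0.1, 0.2, 0.5, 0.05, 0.15]
print(net_benefit(labels, probs, 0.25))       # model-guided strategy
print(net_benefit_treat_all(labels, 0.25))    # treat-all strategy
# The treat-none strategy has a net benefit of 0 by definition.
```

A model is clinically useful at a given threshold when its net benefit exceeds both the treat-all value and zero, which is the comparison described in the paragraph above.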

Subsequently, we compared the complete ADA model with 43 variables and the simplified ADA model with 13 variables in terms of clinical usefulness and discrimination (Fig. 2E). The NRI and IDI showed that the complete model offered little improvement in discrimination over the streamlined model. Although the differences between the two models were not significant, the DCA curves showed that using either the complete or the streamlined ADA model still yielded a net benefit in predicting the probability of survival.

Validation of the survival prediction ability of the prognostic model

The predictors in the simplified ADA model were included in the final prognostic Cox model. Supplemental Table 6 displays the HRs (95% CIs) and P-values of the factors in the prognostic model, and the complete equation of the model is given in the footnote of Supplemental Table 6. According to the calculated cut-off value of the risk score (0.75), the participants were divided into a high-risk group and a low-risk group (Supplemental Table 7), and Kaplan-Meier analysis was used to plot the survival curves (Supplemental Fig. 3A-3B). The sensitivity analyses gave consistent results (Supplemental Table 8). Furthermore, the ROC curves demonstrated good performance for 1-, 3-, and 5-year survival in both the training and validation datasets (Supplemental Fig. 3C-3D). The calibration curve showed excellent agreement between the observed probability and the OS predicted by the prognostic model in the validation set. Next, we internally validated this model using 1,000 bootstrap resamples. Supplemental Fig. 3E shows the C-index at 1, 3, and 5 years.

Nomogram model for clinicians after simplification

We first retained the predictors that were significant or of borderline significance in the multivariate Cox analysis, namely TNM stage (P < 0.001), ALI (P = 0.009), and AGR (P = 0.058). A literature review showed that NLR and PNI, two inflammatory indicators, are also commonly used to predict the prognosis of patients with gastric cancer, and in this study NLR and PNI also had relatively small P values (P < 0.2) in the multivariate Cox analysis. After this two-step simplification process, the indicators used to establish the prognostic model were reduced to five predictors: TNM stage, ALI, AGR, NLR, and PNI. The final nomogram model incorporated these five predictors to predict patient survival (Fig. 3). The complete equation of the prognostic model is given in the footnote of Table 3. The risk of death was higher for tumor stages II, III, and IV (all HRs > 1, all P < 0.05), whereas it was lower with higher AGR and ALI (all HRs < 1, all P < 0.05).

Fig. 3

The nomogram for overall survival prediction in GC patients

Table 3 Cox analysis results of Nomogram Model

The Nomogram Model achieved higher AUC values than the classical American Joint Committee on Cancer TNM staging system, although the difference did not reach statistical significance (P = 0.094), with AUC values of 0.753, 0.774, and 0.755 at 1, 3, and 5 years, respectively (Fig. 4A–C). In the subgroups of patients receiving different treatment modalities, the nomogram model also outperformed the classical TNM staging system (Supplemental Figs. 4–6). In particular, for patients receiving surgical treatment and those receiving chemotherapy, the AUC of the nomogram for predicting 3-year prognosis was higher than that of the TNM system (P = 0.121; P = 0.200), with AUC values of 0.769 (Supplemental Fig. 4B) and 0.726 (Supplemental Fig. 5B), respectively. We performed a predictive analysis of the risk score derived from the Nomogram Model, and survival analyses showed that the high-risk group had significantly worse OS than the low-risk group (Table 4). The sensitivity analysis showed a similar result (Supplemental Table 9). Patients were stratified based on the risk score cut-off value (0.42), and Kaplan-Meier survival curves were then generated (Fig. 4D). The Cox model was subsequently implemented as a web-based risk calculator (https://gcnomogram2023.shinyapps.io/dynnomapp/) as well as an offline nomogram for risk calculation.

Fig. 4

Nomogram Model performance in total patients (AUCs were compared using DeLong’s test). (A) Comparison of 1-year prognostic ROC for Risk Score calculated by Nomogram Model, Cox Model and TNM Model in total patients. (B) Comparison of 3-year prognostic ROC for Risk Score calculated by Nomogram Model, Cox Model and TNM Model in total patients. (C) Comparison of 5-year prognostic ROC for Risk Score calculated by Nomogram Model, Cox Model and TNM Model in total patients. (D) The Kaplan-Meier survival curves of Risk Score calculated by Cox Model in total patients

Table 4 The univariate and multivariate analysis of risk score in total patients

Discussion

This study was part of a prospective multi-center cohort study that included patients with gastric cancer from several regions of China. It aimed to establish a new inflammation-related scoring system that can predict the OS of gastric cancer patients more precisely. We addressed this challenging problem with a machine learning approach based on inflammatory indicators and traditional clinical characteristics. The results may help clinicians decide how to treat high-risk patients and guide them in developing management strategies to improve patient outcomes.

Gastric cancer is a malignant tumor characterized by high morbidity and mortality and has been widely reported in East Asia [3]. TNM staging, although an important evaluation system for guiding clinical treatment and assessing prognosis, is not yet adequate for the individualized and accurate treatment of GC patients [22]. Some studies have demonstrated that prognostic models constructed by machine learning methods and Cox regression analysis perform significantly better than TNM staging [24,25,26]. Therefore, selecting more representative and more accessible metrics for constructing models that accurately assess the prognosis of gastric cancer patients is a major concern for researchers.

In the prognostic model constructed by the machine learning method, the prognosis of gastric cancer patients was closely related to intermediate and advanced cancer stage, and inflammatory factors ranked high in relative importance. The results of the multivariate Cox analysis showed that AGR, ALI, and TNM stage were independent prognostic indicators for GC patients. Several studies have shown that reduced pretreatment ALI is an independent risk factor for OS in cancer patients, particularly GC patients [27,28,29,30]. These findings are consistent with the present study. In the Nomogram Model, the HR of ALI was 0.994, indicating that high levels of ALI have a protective effect on the prognosis of patients with GC. AGR combines two predictive indices, albumin and globulin. Previous meta-analyses have shown that a lower AGR is related to lower survival rates in digestive system cancers [31]. Investigators have concluded that AGR can be a valid prognostic indicator for GC, which may help clinicians identify high-risk GC patients who require pre-treatment interventions [32,33,34]. In this study, AGR was significant in the multivariate Cox analysis (P < 0.05), with an HR of 0.643. Taken together with previous studies, AGR can be considered a protective factor for GC prognosis. Previous studies have indicated that other inflammatory factors such as NLR and PNI can also be used as independent prognostic factors in gastric cancer patients [35,36,37]. However, in this study they were included as continuous variables in the multivariate Cox analysis, and their P-values all exceeded 0.05; further studies could identify appropriate cut-off values and analyze them as categorical variables.

Despite the extensive global adoption of the TNM staging system, it still has some limitations. To date, TNM staging has been based solely on anatomical factors. However, tumor diagnosis and treatment have entered the era of precision therapy, and there is a particular need to incorporate markers related to the inflammatory response into prognostic assessment [38, 39]. To address these limitations, several researchers have derived new tools or nomograms [40, 41]. Many studies have correlated systemic inflammation with the development and progression of malignancy and with patient prognosis [42, 43]. The tumor inflammatory microenvironment is complex and dynamic, involving crosstalk between various immune cells and tumor cells [44]. This crosstalk not only promotes the development and progression of cancer but also significantly affects patient prognosis [45]. Compared with other assays, blood biochemical tests are simpler, more convenient, and relatively inexpensive; therefore, inflammatory indices obtained in this way are attracting increasing attention. Inflammatory prognostic scores calculated from inflammatory factors and related parameters have shown promise in a variety of tumor types. Accordingly, a nomogram based on systemic immune and inflammation indicators can outperform the available systems for predicting survival in patients with gastric cancer. When multiple metrics are available, choosing the right combination to build a model is also a key challenge. ML algorithms have become a first-line approach to this problem [46,47,48]. Several studies have demonstrated the efficacy of machine learning algorithms in selecting indicators to construct predictive models [24, 25, 49]. Turkki R [50] developed support vector machines and artificial neural networks to predict breast cancer prognosis and obtained good model performance. Wang et al. [51] used five classifiers to rank the postoperative clinical characteristics of colon cancer patients by importance. In a similar way, survival prediction models suited to gastric cancer patients in China can be developed.

A distinguishing characteristic of the prognostic model used in this study is the incorporation of various inflammatory factors and the associated inflammatory prognostic scores. We selected variables from the best-performing machine learning model as predictors to construct the nomogram model and used the training set to predict OS in GC patients. Validation with 1,000 bootstrap replicates showed good model accuracy, as evidenced by a C-index of 0.724. Compared with the TNM Model, the 1-, 3-, and 5-year OS AUC values indicated the good predictive performance of our model in GC patients. To mitigate the potential confounding effects of different treatment modalities on the prognosis of patients with GC, we also compared the performance of our Nomogram Model and the TNM Model within subgroups receiving different treatments. The Nomogram Model outperformed the traditional TNM staging system, with higher AUC values in predicting 3- and 5-year prognosis, particularly among patients undergoing surgery and chemotherapy. The simplified Nomogram Model, which was designed to be easier for clinicians to use, also performed better than the TNM Model. We divided patients into high-risk and low-risk groups according to the risk score calculated by the simplified model and found that this risk-based stratification also significantly differentiated patient OS.

The primary strengths of this study are its prospective multicenter design and its use of multiple machine learning approaches to screen metrics and to compare and validate the constructed models. The metrics used to build the Nomogram Model are readily available in clinical practice and offer greater convenience for stratifying clinical prognosis and optimizing treatment strategies. Currently, patients with gastric cancer are most likely to undergo surgery or chemotherapy, and our Nomogram Model performed better than TNM staging in these two groups, supporting its clinical application.

This study has several potential limitations. First, because the clinical data were collected retrospectively from medical records, the sample may suffer from selection bias. Second, the prognosis of gastric cancer patients is complex and susceptible to physical and environmental factors, and other confounders that may influence prognosis need to be considered. Third, although the DCA results support the clinical utility of the final model, further assessment with treatment-specific data is needed to confirm our findings. Fourth, among the 1,140 participants included in the study, only 7 received immunotherapy; because of this small number, a subgroup analysis of prognosis and efficacy in immunotherapy patients was not possible. Fifth, the inflammatory indices and their dynamic changes with treatment response also need to be explored, but such data are not currently available; we will consider incorporating this analysis if additional data become available in the future. Finally, although we internally validated the predictive value of our model, the results were not verified in independent datasets, so external validity could not be confirmed. Future studies with larger sample sizes and broader clinical characteristics of gastric cancer patients are warranted to address these issues.

Conclusion

In summary, a prognostic model based on inflammatory indicators is effective for gastric cancer patients. In this study, we used a machine learning approach to develop a clinical prognostic model that combines conventional clinical features with inflammatory indicators and performs well in predicting the prognosis of gastric cancer patients. The model was implemented as an online tool and a nomogram, which can help clinicians make decisions and guide management strategies for better patient outcomes.