Background

Primary gastric lymphoma (PGL) is a malignant extranodal lymphoma originating from gastric submucosal lymphoid tissue [1,2,3]. PGL accounts for 30-40% of extranodal lymphomas and approximately 5% of all gastric malignancies [1,2,3]. The two predominant histological subtypes are extranodal marginal zone lymphoma of mucosa-associated lymphoid tissue (MALT) and diffuse large B cell lymphoma (DLBCL), which together constitute around 90% of all cases [1,2,3]. There is still no consensus on the optimal treatment, especially for H. pylori-negative MALT lymphoma, DLBCL, and advanced PGL [4, 5]. Given the rarity of this disease, treatment recommendations are mostly based on case-series data rather than large randomized clinical trials [6]. The Ann Arbor staging system is commonly used by oncologists to predict disease progression and design therapeutic strategies for lymphoma, and it takes the location of lymph node spread as the basis for staging. However, it does not include other factors that may affect survival, such as personal and cancer treatment information [1, 3]. Given the variety of factors affecting the etiology of lymphoma, a prognosis based on the Ann Arbor staging system alone is unreliable [7]. A nomogram is a reliable and convenient prognostic tool that has been widely used in clinical oncology to predict the probability of specific outcomes by incorporating multiple prognostic factors [8,9,10,11,12,13]. It uses known, important prognostic factors to quantify the prognosis of an individual patient and expresses the clinical outcome as a numerical probability [8, 9]. A prognostic tool [14] has recently been proposed for primary gastric DLBCL. However, a complete nomogram for predicting survival in patients with PGL has not been reported. The purpose of this study is to develop a comprehensive and effective nomogram, based on data retrieved from the SEER database and a Chinese cohort, to better predict the survival of PGL patients.

Methods

Study population

The Surveillance, Epidemiology, and End Results (SEER) program, sponsored by the National Cancer Institute (NCI), is a free public database (https://seer.cancer.gov/). It consists of 18 cancer registries and currently covers approximately 30% of the United States population. We obtained access to the SEER Research Data. SEER*Stat software (Version 8.3.9.2; NCI; Bethesda, MD) was used to extract clinical data of patients diagnosed with PGL in 2000-2018 from SEER (November 2020 Submission). Patients with PGL were identified by the International Classification of Diseases for Oncology, third edition (ICD-O-3) histologic codes 9590-9599, 9650-9669, 9670-9699, 9700-9719, 9720-9729 for lymphoma and primary site codes C16.0-C16.9 for the stomach. Patients with more than one primary tumor site, or without pathological confirmation, race records, marital status, stage records, survival months, or active follow-up, were excluded. The following demographic and clinical characteristics were obtained from the database: age at diagnosis, sex, race, marital status, primary tumor site, histology, cancer stage, surgery, radiotherapy, chemotherapy, and survival months until death or last follow-up. Cancer stages were assigned according to the Ann Arbor staging system, which can be found in the American Joint Committee on Cancer (AJCC) Cancer Staging Manual (7th edition) or the Union for International Cancer Control (UICC) staging manual. Overall survival (OS) was calculated from the date of diagnosis to the date of death from any cause, or to the date of censoring for patients alive at the last follow-up. Cancer-specific survival (CSS) was calculated from the date of diagnosis to the date of death from PGL, or to the date of censoring for patients who were alive or had died from another cause.
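The difference between the OS and CSS endpoints comes down to which deaths count as events. The following Python sketch illustrates that coding under stated assumptions: the `Record` fields and the string codes ("alive", "dead", "PGL", "other") are hypothetical names chosen for illustration, not SEER variable names, and the three patients are made-up examples.

```python
from dataclasses import dataclass


@dataclass
class Record:
    # Hypothetical field names and codes, used only for illustration.
    survival_months: int   # months from diagnosis to death or last follow-up
    vital_status: str      # "alive" or "dead"
    cause_of_death: str    # "PGL", "other", or "" when the patient is alive


def os_endpoint(r: Record) -> tuple[int, int]:
    """Overall survival: any death is an event; living patients are censored."""
    return r.survival_months, int(r.vital_status == "dead")


def css_endpoint(r: Record) -> tuple[int, int]:
    """Cancer-specific survival: only death from PGL is an event.

    Patients who are alive or who died from another cause are censored.
    """
    event = r.vital_status == "dead" and r.cause_of_death == "PGL"
    return r.survival_months, int(event)


# Made-up example patients.
patients = [
    Record(60, "alive", ""),
    Record(14, "dead", "PGL"),
    Record(33, "dead", "other"),
]
os_data = [os_endpoint(p) for p in patients]    # (time, event) pairs for OS
css_data = [css_endpoint(p) for p in patients]  # (time, event) pairs for CSS
```

In this sketch the third patient contributes an event to OS but is censored for CSS, which is why the two endpoints can yield different survival estimates from the same cohort.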

For the clinical cohort, patients initially diagnosed with PGL from May 2015 to September 2021 at our Cancer Center were included in this retrospective study; the last follow-up was in November 2021. The inclusion and exclusion criteria were the same as those for the SEER cohort above. However, no cancer-specific deaths were observed in the Chinese cohort. This study was approved by the Ethical Committees of Union Hospital, Tongji Medical College, Huazhong University of Science and Technology and was conducted in accordance with the Helsinki Declaration.

Statistical analysis

In the SEER database, we identified 8898 eligible PGL patients, who formed the training cohort used to build the OS and CSS nomograms. One hundred and twenty-seven patients initially diagnosed with PGL at our center were assigned to the validation cohort. A Cox proportional hazards model was used for univariate and multivariate analyses to identify the variables that significantly influenced OS and CSS in the training cohort. OS and CSS nomograms based on the independent prognostic factors were then constructed in the training cohort. The OS nomogram was validated in both the training and the validation cohorts. The predictive performance of the nomograms was evaluated with discrimination and calibration tests. The concordance index (C-index) and the receiver operating characteristic (ROC) curve were used to assess discrimination, and calibration curves were used to compare observed outcomes with the survival probabilities predicted by the nomogram. For this purpose, bootstraps with 1000 resamples were used [15]. The web-based OS and CSS probability calculators were built using the packages “DynNom” and “shiny” in R software. Statistical analysis was performed using SPSS Statistics software (version 23.0; IBM Corporation; Armonk, NY) and R software (version 4.0.3). Hazard ratios and 95% confidence intervals (95% CI) were calculated. A two-sided P value less than 0.05 was considered statistically significant.
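To make the discrimination measure concrete, the sketch below computes Harrell's C-index in plain Python and attaches a percentile bootstrap interval. It is an illustration only: the study itself used R, the percentile interval is an assumed way of using the resamples rather than the authors' documented procedure, and the five patients and their risk scores are made-up values.

```python
import random


def c_index(times, events, risks):
    """Harrell's C: share of comparable pairs ranked correctly by risk.

    A pair (i, j) is comparable when patient i had an event strictly
    before patient j's observed time. It is concordant when patient i
    also has the higher predicted risk; tied risks count as one half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] == 1 and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


def bootstrap_ci(times, events, risks, n_boot=1000, seed=0):
    """Percentile 95% interval for the C-index (an assumed, simple scheme)."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        c = c_index([times[k] for k in idx],
                    [events[k] for k in idx],
                    [risks[k] for k in idx])
        if c == c:  # drop resamples with no comparable pair (NaN)
            stats.append(c)
    stats.sort()
    lower = stats[int(0.025 * len(stats))]
    upper = stats[min(len(stats) - 1, int(0.975 * len(stats)))]
    return lower, upper


# Made-up data: follow-up in months, event flag, predicted risk score.
times = [5, 10, 15, 20, 25]
events = [1, 1, 0, 1, 0]
risks = [0.9, 0.4, 0.5, 0.6, 0.1]

c = c_index(times, events, risks)
ci = bootstrap_ci(times, events, risks)
```

A C-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the value summarizes how well the predicted risks order patients by survival time.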

Results

Demographic and clinical characteristics

A total of 8898 patients in the SEER cohort and 127 patients in the Chinese cohort met the inclusion criteria. Table 1 summarizes the demographic and clinical characteristics of patients in the training and validation cohorts. In the SEER training cohort, nearly half of the patients were aged 60 to 79 years (48.57%). Over half of eligible patients were male (54.57%) and married (58.65%). The majority of patients were white (80.22%), followed by black (9.53%). The primary tumor site was the cardia and gastric fundus in 12.13% of patients, the gastric body in 13.49%, the antrum and pylorus in 16.10%, and the lesser or greater curvature in 9.59%. The DLBCL and MALT histologic subtypes accounted for 42.23% and 44.83% of cases, respectively. About 61.77% of patients were diagnosed with Ann Arbor stage I disease. Of the PGL patients, 50.24% received chemotherapy, 24.53% received radiotherapy, and only 10.29% underwent surgery.

Table 1 Demographic and clinical characteristics in the training and validation cohorts

Identifying independent prognostic factors for OS and CSS

The results of univariate and multivariate Cox proportional hazards analyses of OS and CSS in patients with PGL in the training cohort are listed in Tables 2 and 3. Univariate analyses showed that age at diagnosis, sex, race, marital status, histology, stage, radiotherapy and chemotherapy were associated with OS. Multivariate analyses identified the same eight variables (age at diagnosis, sex, race, marital status, histology, stage, radiotherapy and chemotherapy) as significantly associated with OS. In univariate analyses, age at diagnosis, sex, marital status, primary tumor site, histology, stage, surgery, radiotherapy and chemotherapy were associated with CSS. In multivariate analyses, age at diagnosis, sex, marital status, primary tumor site, histology, stage, radiotherapy and chemotherapy were independent risk factors for CSS.

Table 2 Cox proportional hazard regression analysis of OS in patients with primary gastric lymphoma in the training cohort (n=8898)
Table 3 Cox proportional hazard regression analysis of CSS in patients with primary gastric lymphoma in the training cohort (n=8898)

Nomogram construction

The prognostic nomograms for predicting 3-year and 5-year OS and CSS in the training cohort, based on the significant independent factors identified in multivariate analysis, are shown in Fig. 1. Each level of each variable is assigned a score on the point scale. Summing the scores of all selected variables gives a total score, which corresponds to the probability of the event for each patient.
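The point-scoring logic of a Cox-based nomogram can be sketched as follows. This is a generic illustration, not the fitted model from this study: every coefficient, variable level, and the baseline 5-year survival value below is HYPOTHETICAL, and the rescaling (the variable with the widest effect range spans 0 to 100 points) is the usual nomogram convention rather than a detail reported by the authors.

```python
import math

# HYPOTHETICAL log hazard ratios per variable level (reference level = 0.0).
COEFS = {
    "age": {"<40": 0.0, "40-59": 0.4, "60-79": 1.0, ">=80": 1.8},
    "stage": {"I": 0.0, "II": 0.2, "III": 0.5, "IV": 0.7},
    "chemotherapy": {"yes": 0.0, "no": 0.3},
}
# HYPOTHETICAL baseline 5-year survival for a patient at all reference levels.
BASELINE_SURV_5Y = 0.90


def level_points(coefs):
    """Rescale coefficients so the widest-ranging variable spans 0-100 points."""
    max_range = max(max(lv.values()) - min(lv.values()) for lv in coefs.values())
    return {
        var: {lvl: 100.0 * (beta - min(levels.values())) / max_range
              for lvl, beta in levels.items()}
        for var, levels in coefs.items()
    }


def total_points(patient, points):
    """Sum the points of the patient's level on each variable."""
    return sum(points[var][lvl] for var, lvl in patient.items())


def survival_probability(patient, coefs, baseline):
    """Cox model prediction: S(t | x) = S0(t) ** exp(linear predictor)."""
    lp = sum(coefs[var][lvl] for var, lvl in patient.items())
    return baseline ** math.exp(lp)


points = level_points(COEFS)
patient = {"age": "60-79", "stage": "IV", "chemotherapy": "yes"}
score = total_points(patient, points)
prob_5y = survival_probability(patient, COEFS, BASELINE_SURV_5Y)
```

Because the total score is a linear rescaling of the Cox linear predictor, reading a survival probability off the nomogram's bottom axis is equivalent to evaluating the model directly, as `survival_probability` does here.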

Fig. 1 OS and CSS nomograms for PGL patients. A OS nomogram for 3-year and 5-year survival; B CSS nomogram for 3-year and 5-year survival.

Nomogram validation

The OS nomogram was validated externally. In the training cohort, the C-index of the OS and CSS nomograms was 0.716 (95% CI: 0.708–0.794) and 0.767 (95% CI: 0.757–0.777), respectively. In the validation cohort, the C-index of the OS nomogram was 0.948 (95% CI: 0.901–0.995). Because cancer-specific survival information was lacking, the C-index of the CSS nomogram could not be obtained in the validation cohort. The calibration plots of the training and validation cohorts for 3-year and 5-year OS showed close agreement between nomogram predictions and observed outcomes (Fig. 2). A high area under the ROC curve (AUC) was also observed in both the training and validation sets (Fig. 3).

Fig. 2 ROC curves for the nomograms. The ROC curve for the nomogram with 3-year OS A and 3-year CSS C in the training cohort and 3-year OS E in the validation cohort; with 5-year OS B and 5-year CSS D in the training cohort and 5-year OS F in the validation cohort.

Fig. 3 Calibration plots for the nomograms. The calibration plots for the nomogram with 3-year OS A and 3-year CSS C in the training cohort and 3-year OS E in the validation cohort; with 5-year OS B and 5-year CSS D in the training cohort and 5-year OS F in the validation cohort.

Web-based probability calculators

Based on the multivariate results above, a dynamic nomogram was created to predict OS probability in patients with PGL, providing a convenient and intuitive way to make individual prognosis predictions from a patient's personal characteristics (https://yangjinru.shinyapps.io/DynNomapp/). For instance, the 5-year survival rate was approximately 86.0% (95% CI: 84.0%-87.0%) for a patient with DLBCL who was 40-59 years old, male, white, married, and stage I, and who received radiotherapy and chemotherapy without surgery (Fig. 4). For a patient with MALT lymphoma who was more than 80 years old, female, black, separated/divorced, and stage IV, and who received chemotherapy without surgery or radiotherapy, the estimated probability of surviving 60 months was 33.0% (95% CI: 27.6–40.0%) (Additional File 1: Fig. S1).

Fig. 4 A web-based PGL probability calculator. The 5-year survival probability shown in the dynamic nomogram for a patient with DLBCL who was 40–59 years old, male, white, married, and stage I, and who received radiotherapy and chemotherapy without surgery. A The estimated survival probability. B Numerical summary showing the probability and its 95% CI.

Discussion

PGL is the most common extranodal non-Hodgkin lymphoma (NHL) and ranks as the second most common tumour of the stomach [1,2,3]. It is nevertheless a relatively rare cancer and is easily misdiagnosed because of its non-specific symptoms [16,17,18]. Treatment options include gastrectomy, radiotherapy, chemotherapy, immunotherapy, and observation; gastrectomy remains controversial because its considerably favorable prognosis must be weighed against quality of life [19,20,21,22,23,24,25]. A previous study by Wang et al. [2] showed that advanced stage and malignant pathological type are significantly associated with poor overall survival. Most studies demonstrated that female gender, low-grade histology, good performance status (PS), and surgical resection were associated with better overall survival [2, 18, 21]. Our findings are consistent with these previous reports except with regard to gastrectomy. The balance between the efficacy of gastrectomy and quality of life requires further study.

Although the Ann Arbor staging system is widely used and recognized for predicting the prognosis of PGL, it neglects some significant risk factors such as age, race and marital status [26, 27]. In addition, the International Prognostic Index (IPI) and related indices can divide patients with lymphoma into prognostic risk groups. Their evaluation criteria include age, stage, ECOG score, extranodal lesions and LDH level, but do not include cancer-related treatment information. It is therefore necessary to develop a more systematic model for predicting PGL risk and for making better therapeutic strategies for individual patients. Nomograms have been used to predict survival in various diseases [28]. We therefore constructed a comprehensive nomogram based on different risk factors to better predict the prognosis of PGL patients. Although the clinical treatment of indolent lymphoma, such as MALT lymphoma, differs from that of aggressive lymphoma, the nomogram distinguishes PGL by pathological type well and can better guide treatment and predict prognosis. These nomograms provided accurate evaluation and prediction in patients with PGL in both the training and validation cohorts, and the C-index and calibration curves showed that the models were repeatable and reliable. As far as we know, this is a comprehensive, in-depth, large population study aimed at building nomograms for PGL patients. A web-based dynamic nomogram can directly help clinicians quantify survival probability, provide personalized and accurate treatment for patients, and determine the best follow-up time according to disease progression and recurrence rate. CSS is an epidemiological measure that excludes non-cancer deaths. It is an idealized endpoint that differs from the actual situation, but it better reflects the mortality attributable to the cancer and can help design precise treatment strategies for patients.

However, our study has several limitations. First, our nomogram has been externally validated in only a single cancer center, and verification in multi-center, large-sample studies is required. Second, some potentially vital prognostic information was not included in the SEER database, such as surgical details, surgical margin status, vascular invasion, hematological indicators, molecular pathologic characteristics, cell of origin (COO), immunohistochemical results, Helicobacter pylori infection status, and family history; incorporating these could improve predictive ability. Third, the majority of patients in the SEER cohort were white and all patients in the Chinese cohort were Asian, so the study may be subject to racial heterogeneity. Finally, we excluded patients with missing data, which may introduce selection bias.

Conclusion

To sum up, we constructed a web-based dynamic nomogram based on the SEER database to better predict the prognosis of PGL patients, and preliminarily validated it in a Chinese cohort. The nomogram requires only basic information and therefore has a wide range of application.