Introduction

Gastric cancer (GC) is among the most common malignancies globally, with particularly high incidence in China, Korea, and Japan [1, 2]. In China, GC caused approximately 498,000 deaths in 2015, accounting for nearly half of GC-related deaths worldwide [3, 4]. Late detection and delays in diagnosis and treatment are thought to be strongly associated with the high mortality of GC, whereas early detection is critical for reducing mortality and substantially improving the survival of GC patients [5,6,7]. Unfortunately, the majority of GC cases in China are diagnosed at an advanced stage, which poses tremendous challenges for effective treatment and leads to poor prognosis and high mortality nationwide [5, 7]. Currently, there is no established screening program for GC in China, and early GC (EGC) is mainly detected through opportunistic screening [4, 5, 8, 9]. Given the high incidence and mortality of GC in China, an effective and feasible screening program is urgently needed, especially for the population in high-risk areas, where the incidence of GC is greater than 30/100,000.

Gastroscopy, along with histological examination of gastric mucosal biopsies, is considered the gold standard for screening, detection, and early diagnosis of GC. In a mass screening program for GC, initial risk stratification is usually conducted to identify individuals at high risk prior to gastroscopy. The past decade has witnessed substantial progress in the development of models that predict GC risk, so that individuals at higher risk can be identified and referred for gastroscopy for detection and early diagnosis of GC. Of these models, the ABC method was originally developed by Miki et al.; it uses assays of serum pepsinogens (PG), including PGI and PGII, in combination with anti-Helicobacter pylori IgG antibody for initial risk assessment of GC, and has been shown to be beneficial for GC screening in the Japanese population [10]. Several previous studies tested the ABC method for GC screening in the Chinese population and identified a limitation in how atrophic PG status is defined [9, 11]. Thus, a new ABC method was recommended [12]. Subsequently, Tu and colleagues assessed the value of five serum markers, namely PGI, PGII, the ratio of PGI to PGII (PGR), anti-H. pylori IgG antibody, and gastrin-17 (G-17), for stratifying an individual’s risk of developing GC and identifying individuals at higher risk who need further gastroscopy [9]. In comparison with the ABC method, Tu’s prediction model showed better performance in predicting GC risk and in identifying individuals who were at higher risk of developing GC and needed diagnostic gastroscopy [9, 10]. Of note, the participants in that study presented with upper gastrointestinal (UGI) symptoms or had a family history of GC [9], and the performance of Tu’s prediction model in a population without UGI symptoms remains to be investigated.
Most recently, a new model for predicting the risk of GC in screening (Li’s prediction model) was developed in an asymptomatic Chinese population in a cross-sectional study [8]. Compared with the three previously published models (i.e. the ABC method, the new ABC method, and Tu’s prediction model), this model showed better performance in identifying individuals with GC and has the potential to detect GC at early stages [8]. Although the strength of Li’s prediction model has been demonstrated, the model has not been externally validated in a large population. To date, a direct comparison of the efficacy of these proposed models in a large population in a Chinese high-incidence area is lacking. In addition, gastric precancerous lesions, if left untreated, can progress to GC; prediction models for precancerous lesions are therefore also needed to effectively identify high-risk patients.

The incidence of GC was reportedly 34.71/100,000 between 2010 and 2014 in the registration areas of Zhejiang Province, China [13], which is higher than the threshold of 30/100,000 used to define high-risk areas for GC in China. In an effort to reduce the high mortality and improve the 5-year survival rate of GC, a provincial program for GC screening and early diagnosis was initiated in Zhejiang Province in 2016. The aim of the present study, conducted as part of this program, was to evaluate the performance of the prediction models described above for GC screening in a large, high-risk population in China.

Study subjects and methods

Study population

In this retrospective study, data were obtained from the Provincial Gastric Cancer Screening Program (Zhejiang, China) for the period from October 2016 to April 2019. A total of 97,541 individuals from the urban areas of 10 cities in Zhejiang Province, China, agreed to participate in this program. All participants had no gastrointestinal symptoms or only occasional mild symptoms. The demographic and clinicopathological characteristics of all participants are listed in Table 1. Detailed information is presented in the supplementary material.

Table 1 Demographic and serological characteristics of the study population, and clinicopathological features of individuals undergoing further gastroscopic examination

This retrospective study was reviewed and approved by the Ethics Committee of the First Affiliated Hospital of Zhejiang Chinese Medical University (2020-KL-020-01) and registered at www.chictr.org.cn (Chinese Clinical Trial Registry, ChiCTR2100043363). All study procedures involving humans were conducted in compliance with the 1964 Helsinki Declaration. All participants provided signed informed consent.

Data collection, serological tests, upper endoscopy and histopathology, and the four prediction models

The details of data collection, serological tests, upper endoscopy and histopathology, and the four prediction models are described in the Supplementary Material.

Statistical analysis

Statistical analysis was carried out using SPSS software (IBM, Armonk, NY, USA). Demographic and clinical characteristics of the study population were described as frequencies for categorical variables. Continuous variables were expressed as mean ± standard deviation (SD) or median (interquartile range). A risk score was assigned to each participant by summing all score values. Receiver-operating characteristic (ROC) curves were plotted, and the area under the ROC curve (AUC), with 95% confidence intervals (CI), was calculated. The Youden index of each model was calculated as follows: Youden index = (Sensitivity + Specificity) - 1. The AUC and Youden index values were used to evaluate the performance of the four models. A two-sided P value < 0.05 was considered statistically significant. All authors had access to the study data and reviewed and approved the final manuscript.
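As an illustration of the two summary measures defined above, the following minimal Python sketch computes the AUC (through its rank-based interpretation as the probability that a case scores higher than a non-case) and the maximum Youden index over all score cut-offs. It is not the SPSS procedure used in this study; the function names and the toy scores are hypothetical and serve only to show the arithmetic.

```python
def auc(scores, labels):
    """AUC as the probability that a case scores higher than a non-case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_youden(scores, labels):
    """Return (maximum Youden index, cut-off), treating score >= cut-off as test-positive."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = (-1.0, None)
    for cut in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cut)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cut)
        j = tp / n_pos + tn / n_neg - 1  # sensitivity + specificity - 1
        if j > best[0]:
            best = (j, cut)
    return best


# Hypothetical risk scores (NOT study data): label 1 = GC, 0 = no GC
toy_scores = [3, 8, 12, 14, 17, 20, 9, 18]
toy_labels = [0, 0, 0, 1, 1, 1, 0, 0]

print("AUC:", auc(toy_scores, toy_labels))
print("Youden index and cut-off:", best_youden(toy_scores, toy_labels))
```

A Youden index of 0 corresponds to a test with no discriminative value and 1 to perfect separation, which is why it is read alongside the AUC when models are compared.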

Results

Demographic, serological, and clinicopathological characteristics of the study population

A total of 97,541 individuals from the urban areas of the participating cities who fulfilled the criteria participated in this study, and the four models were used for risk assessment of GC. Among these participants with initial serological analyses for GC screening, 6005 individuals (6.16% of all participants) voluntarily underwent further gastroscopic examination in the screening program. The demographic and serological characteristics of the study population and the clinicopathological features of the gastric mucosal biopsies of individuals who underwent gastroscopy are summarized in Table 1. Among those undergoing gastroscopy, 72 GC cases (1.20%) were detected, of which 30 (41.67%) were diagnosed as early GC and 42 (58.33%) as advanced GC. Moreover, precancerous lesions, including atrophic gastritis (AG), intestinal metaplasia (IM), and dysplasia/intraepithelial neoplasia (IEN), were detected in 2006 participants undergoing gastroscopy (270 AG, 1634 IM, and 102 dysplasia/IEN) (Table 1). In addition, five cases of esophageal squamous cell carcinoma (ESCC) and six cases of gastroesophageal junction adenocarcinoma (GEJAC) were identified as malignancies other than true gastric adenocarcinoma (GAC).

The detected GC patients were treated with either endoscopic submucosal dissection (ESD) for early GC or surgery for advanced GC. Individuals who were identified as having precancerous lesions and tested positive for H. pylori infection were treated with H. pylori eradication therapy, and then with [14].

Performance of various models for predicting GC

The four prediction models were assessed for their performance in screening for GC. The numbers and proportions of GC cases, including early and advanced GC, among the subgroups of each prediction model are summarized in Table 2. The detection rates of GC differed significantly among the subgroups of Li’s prediction model (P < 0.001) and the new ABC method (P = 0.002), whereas the ABC method and Tu’s prediction model did not show a significant difference in GC detection rates among their subgroups (Table 2). The detection rates of early GC differed significantly among the subgroups of Li’s prediction model (P < 0.001) and Tu’s prediction model (P = 0.039), but not among those of the ABC method or the new ABC method (Table 2). The detection rates of advanced GC differed significantly among the subgroups of Li’s prediction model (P < 0.001) and the new ABC method (P = 0.001), but not among those of the ABC method or Tu’s prediction model (Table 2). Notably, with Li’s prediction model the detection rates differed significantly among the subgroups for both early and advanced GC, and the model detected 75.0% (54/72) of GC cases: 76.7% (23/30) of early GC cases and 73.8% (31/42) of advanced GC cases (Table 2).

Table 2 Comparison of detected gastric cancer cases and their proportion by risk-stratified subgroups in the four evaluated models

To further assess the performance of the four models, ROC curves were plotted (Fig. 1), and the AUCs and Youden indexes were calculated (Table 3). The AUC was 0.708 (95% CI 0.643–0.773) for Li’s prediction model (P < 0.001), which was higher than the AUC of 0.566 (95% CI 0.496–0.636) for the ABC method (P = 0.054), 0.496 (95% CI 0.420–0.572) for the new ABC method (P = 0.905), and 0.534 (95% CI 0.463–0.605) for Tu’s prediction model (P = 0.318) (Table 3, Fig. 1). Li’s prediction model had significantly better discrimination ability for predicting GC than the ABC method, the new ABC method, or Tu’s prediction model (all P < 0.001). In addition, Li’s prediction model had a Youden index of 0.319, which was higher than that of the original ABC method (0.103), the new ABC method (0.103), or Tu’s prediction model (0.079) (Table 3). Thus, Li’s prediction model had the highest AUC and Youden index values among the evaluated prediction models.

Fig. 1

Receiver-operating characteristic (ROC) curves and Youden indexes for the evaluation of the discrimination ability of the four prediction models. ROC curves were generated for the four prediction models: the original ABC method, the new ABC method, Tu’s prediction model, and Li’s prediction model. The area under the ROC curve (AUC) and the Youden index were used to evaluate the performance of the four models. The AUC and Youden index values are the highest for Li’s prediction model among the four models.

Table 3 The area under the receiver-operating characteristic curve and Youden index for each of the four models

Performance of various models for predicting precancerous lesions

Overall, precancerous lesions, including 270 AG, 1634 IM, and 102 dysplasia/IEN, were detected in 2006 of the 6005 participants undergoing gastroscopic examination, some of whom had more than one precancerous lesion. The performance of the four models in predicting precancerous lesions, including AG, IM, and dysplasia/IEN, was evaluated. The numbers of detected cases with precancerous lesions and their proportions, as classified by the ABC method, new ABC method, Tu’s prediction model, and Li’s prediction model, are shown in Table 4. With Li’s prediction model and Tu’s prediction model, the detection rates of all these precancerous lesions differed significantly among the three risk-stratified subgroups (P < 0.001 for AG, IM, and dysplasia/IEN).

Table 4 Comparison of detected precancerous lesions and their proportions by subgroups applied to gastroscopies in the four evaluated models

Discussion

To the best of our knowledge, this is the first study in which the four previously published prediction models were assessed in parallel for their performance in GC screening in a large population in a Chinese high-incidence area. The screening program enrolled 97,541 individuals, of whom 6005 voluntarily underwent further gastroscopy. The major novel findings are summarized as follows: (1) GC was detected in 72 (1.20%) of the 6005 participants: 30 (41.67%) with early and 42 (58.33%) with advanced GC; (2) one or more precancerous lesions were identified in 2006 cases: 270 with AG, 1634 with IM, and 102 with dysplasia/IEN; (3) Li’s prediction model was the most accurate model for risk stratification of GC, as evidenced mainly by its discriminatory detection rates for overall, early, and advanced GC among subgroups and by the highest AUC and Youden index values among the models, while the performance of the other three models was similar; and (4) Li’s prediction model showed better performance in risk stratification of precancerous lesions (AG, IM, and dysplasia/IEN) than the other three models. These findings are of particular importance, as they confirm that Li’s prediction model is an accurate and reliable method for the initial identification of individuals at higher risk who should undergo gastroscopy for further diagnostic examination in high-incidence areas in China.

A nationwide screening program for GC is not yet available in China, even in the high-incidence areas (i.e. incidence of GC > 30/100,000), where the majority of GC cases are initially detected or diagnosed at a late stage. The present study was conducted to evaluate, in a high-risk Chinese population, the efficacy of the four published prediction models, which were derived from different populations on the basis of various serological markers and other risk factors, and to identify a reliable model for identifying individuals at higher risk who need further diagnostic gastroscopy. This is particularly important, as both endoscopists and facilities are too limited to serve the very large population in China, especially in the high-incidence areas, where dietary habits and local foods are likely contributing factors in the transformation from precancerous lesions to gastric malignancies [15]. In the present study, the participants came from geographically diverse, high-incidence areas (GC incidence of 34.71/100,000). A total of 72 GC cases were detected and pathologically confirmed in this study. Of these cases, 30 patients diagnosed as having early GC subsequently received ESD, which is expected to lead to good clinical outcomes. In addition to the 72 GC cases, gastric precancerous lesions were identified in a larger number of cases, i.e. 2006 in total (270 AG, 1634 IM, and 102 dysplasia/IEN). H. pylori infection is known to be a major cause of precancerous lesions. Therefore, these patients with precancerous lesions were scheduled for a further test for H. pylori infection. Patients confirmed positive for H. pylori infection were treated with an eradication regimen and followed up regularly, according to the consensus and guideline in China (Shanghai, 2017; Changsha, 2014), not only to decrease the risk of GC development but also to detect GC early.

Most recently, Li’s team performed a multicenter cross-sectional study of the Chinese high-risk population, including 14,929 eligible individuals aged 40–80 years, and developed a prediction rule for the initial identification of higher-risk individuals for further gastroscopy [8]. When diagnostic gastroscopy was applied to the medium-risk (scores 12–16) and high-risk (scores 17–25) groups defined by Li’s prediction model, 70.3% of early GC cases and 70.8% of total GC cases regardless of stage were detected [8]. In the present study, Li’s prediction model detected 75.0% of GC cases, 76.7% of early GC cases, and 73.8% of advanced GC cases according to the recommendation of this model; these proportions were higher than those obtained with the ABC method, the new ABC method, and Tu’s prediction model. Our results were broadly consistent with those reported by Li’s team [8]. These findings obtained using Li’s prediction model are encouraging and helpful for GC screening, and demonstrate that Li’s prediction model is a reliable screening tool for identifying individuals at higher risk who require further diagnostic gastroscopy. Moreover, based on the detection rates obtained with Li’s prediction model in the individuals who voluntarily underwent gastroscopy, the total number of GC cases among the 97,541 individuals was estimated to be 954, including 341 (64,331 × 0.53%), 368 (28,317 × 1.3%), and 245 (4893 × 5.01%) in those at low, medium, and high risk, respectively (Tables 1 and 5). Thus, using Li’s prediction model, approximately two-thirds of GC cases (64.3% (613/954) in total, 64.5% (245/380) of early GC, and 64.1% (368/574) of advanced GC) would be detected among the 97,541 individuals in our initial screening program, while about two-thirds (66.0%, 64,331/97,541) of endoscopic procedures would be avoided (Table 5).
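The extrapolation in the preceding paragraph can be retraced with a few lines of arithmetic. The short Python sketch below uses only the group sizes and detection rates quoted in the text; the variable names are illustrative, and rounding each group to whole cases is an assumption made here to match the reported counts.

```python
# Participants per risk group under Li's prediction model and the GC detection
# rate observed at gastroscopy in that group, as quoted in the text.
groups = {
    "low": (64_331, 0.0053),
    "medium": (28_317, 0.013),
    "high": (4_893, 0.0501),
}

# Estimated GC cases per group, rounded to whole cases (assumption)
estimated = {name: round(n * rate) for name, (n, rate) in groups.items()}
total_people = sum(n for n, _ in groups.values())
total_cases = sum(estimated.values())

# Cases reachable if only medium- and high-risk individuals undergo gastroscopy
detected = estimated["medium"] + estimated["high"]
share_detected = detected / total_cases
share_avoided = groups["low"][0] / total_people

print(estimated, total_cases)
print(f"{share_detected:.1%} of estimated cases detected")
print(f"{share_avoided:.1%} of gastroscopies avoided")
```

The calculation makes the trade-off explicit: restricting gastroscopy to the medium- and high-risk groups spares the low-risk group the procedure at the cost of the cases estimated to arise in that group.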

Table 5 Estimated gastric cancer cases among the 97,541 individuals using Li’s prediction model

Our study has potential limitations. First, the performance of the prediction models was evaluated in a population from diverse geographical areas of Zhejiang Province in Southeastern China. Owing to geographic variations in risk factors for GC, the performance and efficacy of the prediction models may vary in other regions of China. Second, there was a selection bias between individuals with and without gastroscopy, as those who underwent gastroscopy were older and had a lower PG ratio than those who did not, which may influence the prevalence of GC. To reduce this bias in this real-world study, the GC detection rates were expressed as percentages with 95% CI, and the estimated number of cases within the corresponding CI was calculated (Table 5). Third, some other prediction models have been reported but were not evaluated in the present study. For instance, in a recent study by Zhan’s team, serological tests of H. pylori IgG antibody and PGs, in combination with a family history of GC, were used for risk stratification and identification of individuals for further gastroscopy and gastric mucosal biopsies [16]. In that study, all individuals in Groups C and D were mandatorily referred for further gastroscopy, while a proportion of individuals in Groups A and B, consisting of individuals with a family history of GC in their first-degree relatives (FDR) and those selected at random from the remaining individuals, were referred for gastroscopy [16]. Of note, the detection rate of GC in that study was 1.6% (GC cases per 100 gastroscopies) and 1.8% (GC cases per 1000 serological tests) [16]. In the present study, the detection rate was 1.20/100 gastroscopies (72/6005 gastroscopies). Indeed, a number of previous studies have identified a history of GC in FDRs as a risk factor for both GC and gastric precancerous lesions [17,18,19,20]. Fourth, serology for anti-H. pylori IgG antibody was used to evaluate H. pylori status in the present study, as the test is less costly and more feasible than the 13C-urea breath test, although the latter seems to be more accurate than serology. Fifth, it has to be pointed out that 25.0% of GC cases would be missed in the study population (n = 6005) on the basis of risk stratification using Li’s prediction model in the present study. As such, other risk factors should be considered to improve the performance. For instance, genetic variations, such as those in the CYP1A1 and cadherin 1 genes, have been reported to be related to the development of GC or GC susceptibility [21]; integrating genetic factors into the prediction model might further improve its predictive ability.

In conclusion, Li’s prediction model is an accurate and reliable model with the best performance among the four evaluated prediction models in stratifying the risk of GC and identifying higher-risk individuals for further gastroscopy in the screening, detection, and early diagnosis of GC. In addition, its discrimination power for GC and precancerous lesions is greater than that of the other three models. Thus, Li’s prediction model is a feasible and reliable tool for GC screening in the Chinese population. As screening gastroscopy for GC in the general population is not covered by medical insurance nationwide in China, and endoscopists and facilities are limited, Li’s prediction model should be recommended for risk stratification to identify individuals who are at higher risk of GC and need further diagnostic gastroscopy and gastric mucosal biopsy.