Introduction

Gastric cancer is the sixth most common cancer and the third most common cause of cancer deaths worldwide [1]. Screening and prevention strategies are urgently needed to reduce the significant burden of gastric cancer. Helicobacter pylori infection is the initiator for gastric cancer carcinogenesis [2], so active screening and eradication of this bacterium is considered one of the best strategies [3]. However, the benefit of such an approach depends on the magnitude of pre-existing molecular damages in the gastric mucosae [4]. For patients who already harbor premalignant gastric lesions after long-term inflammation, such as atrophic gastritis and intestinal metaplasia, they can retain irreversible changes after the eradication treatment [5, 6]. Therefore, when H. pylori eradication is increasingly adopted as a healthcare policy, a non-invasive test is needed to accurately identify candidates most likely to benefit from endoscopy, in order to allocate the limited resources.

Serum pepsinogen (PG), a proenzyme of pepsin secreted from gastric mucosae, is released into the systematic circulation and thus becomes measurable, making it a good choice [7]. PG consists mainly of two subtypes: PG-I and PG-II; the former is secreted by the fundic glands and the latter is secreted by the pyloric and Brunner’s glands [8]. When long standing H. pylori infection leads to the loss of secretory glands, both PG-I and PG-II will decline, providing quantitative measures for the presence and severity of mucosal damages [9, 10]. However, some barriers exist to its clinical application. First, PG testing isn’t routinely used as a tool for mass screening, so real-world evidence of its effectiveness, in terms of population coverage, acceptance among patients, and diagnostic accuracy, is limited. Second, in the current free-market system, different brands of PG testing may be chosen. Although results are generated from the same antibody-antigen reaction, their methods to measure the concentration of antibody-antigen complexes differ. This may include colorimetric methods, radioimmunoassay, enzyme-linked immunosorbent assay (ELISA), and chemiluminescence [11,12,13]. As a result, the numerical measures and the cutoff criteria for a positive test are fraught with significant variations, leading to uncertainty about the interchangeability of results. Third, after H. pylori eradication, the back diffusion of PG-I and PG-II from stomach to the systematic circulation is reduced because the mucosal integrity is improved through the attenuation of mucosal inflammation [14]. However, our understanding of the performance of PG tests under these circumstances is limited.

In Taiwan, gastric cancer preventative programs have been implemented, allowing us a large enough high-risk cohort to study these questions [3, 14]. We therefore used this cohort to evaluate the performance of PG testing in a post-eradication population as well as determine if two different brands of PG testing, with different measurement methods and cutoff criteria, would perform equivalently in the prediction of the premalignant gastric lesions.

Methods

Study population

Our eligible population comprised 7616 inhabitants of an offshore archipelago (Matsu Islands, Taiwan), who were aged 30 years or more and registered in the population registry. Because of the high gastric cancer incidence in this population, a mass screening and eradication program of H. pylori has been implemented since 2004 [3]. All eligible inhabitants were invited by mail, telephone, or announcement in social media and newspapers to attend the screening program. Participants’ demographic data, social habits, and medical histories were recorded in a structured questionnaire. H. pylori infection was determined using the 13C-urea breath test, those testing positive received eradication treatment, and those who failed initial treatment were retreated [15, 16]. By the end of 2018, six rounds of mass eradication have been conducted. As a result, the prevalence rate of H. pylori infection in the Matsu Islands declined from 64.2% before the mass eradication program to about 10% recently, accompanied by a 53% reduction in gastric cancer cases [14].

Study design

Since scattered cases of gastric cancer occurred during the mass eradication program, a two-stage endoscopic screening program was added to the chemoprevention program in 2015, with the main purpose to identify participants who were still at high risk for gastric cancer after H. pylori eradication and provide them with endoscopic screening. This program was done using the serological test with PG, in which the test positives would be referred for endoscopic examination and histological evaluation. Preliminary results were reported previously [17].

In 2018, because various brands of PG were available in the market, we tested whether or not the performance of two brands, the GastroPanel® (Biohit HealthCare, Helsinki, Finland) and the LZ-Test® (Eiken Chemical Co., Ltd, Tokyo, Japan), was equivalent. We applied a parallel test design [18], in which participants with positive PG results for either one of the two tests would be referred for endoscopic examination. We assumed the possibility of false negatives could be lower than those based on a single test.

These tests also included measurements of the anti-H. pylori IgG or gastrin-17; however, since our initial experience indicated that the values of these two measures had been reduced after H. pylori eradication [17], we only analyzed the results of the PG-I, PG-II, and PG-I/II ratio in this study. The study protocol was approved by the Institutional. Review Board (IRB) of National Taiwan University Hospital (IRB No: 201406021RINA), and written informed consent was obtained from all participants before they entered the study.

Serological tests

After an overnight fast, participants received blood sampling for both GastroPanel® and LZ-Test® at the same time. Participants were requested to stop taking acid suppressing drugs, particularly proton-pump inhibitors, at least one week before testing. After the blood sample was taken by venipuncture, it was placed into the serum tube without any additives. The tube was placed in an upright position for at least 30 min at room temperature to allow for blood clot formation. Then, after the blood sample was centrifuged for 10 min, it was stored at – 20 ℃ and prepared for measurements of PG.

The GastroPanel® test applied the ELISA method to measure the serum concentration of antigen–antibody complexes based on color intensity. The PG-specific antibodies were coated onto the surface of a microplate. The blood sample was then added to the wells, creating the antibody-antigen reaction. A secondary antibody linked to an enzyme was used to bind the antigen–antibody complexes. Finally, color-producing substrates were added to react with the enzyme and color was produced. The PG concentration in the serum sample could be quantified according to the intensity of color. The cutoff concentrations recommended by the manufacturer were set at PG-I < 30 ng/mL or PG-I/II ratio < 3 [19]. The sensitivity, specificity, and positive likelihood ratio in the prediction of premalignant gastric lesions had been reported as about 40%, 93%, and 6.0, respectively [20].

The LZ-Test® applied the latex-enhanced turbidimetric immunoassay (L-TIA) that measured the concentration of antigen–antibody complexes based on the optical turbidity. The latex reagent was prepared through the binding between the PG-specific antibodies and the latex particles. After the latex particles reacted with the PG presented in the blood samples, the agglutination reaction would change the turbidity, which could be therefore quantified. The cutoff concentrations recommended by the manufacturer were set at PG-I ≤ 70 ng/mL and PG-I/II ratio ≤ 3 [21]. The sensitivity, specificity, and positive likelihood ratio in the prediction of premalignant gastric lesions had been reported as about 59%, 89%, and 5.5, respectively [22].

Endoscopic and histological evaluations

During the endoscopic examination, biopsy specimens were taken from gastric mucosa in the antrum (from the greater and lesser curvatures 2–3 cm from the pylorus) and corpus (one each from the lesser and greater curvature at the middle corpus) according to the modified Sydney protocol [23]. Sampling was done at the same locations in all participants to maintain consistency. All specimens were fixed in formalin for histological evaluation. A senior histopathologist (Dr. H Chiang), blinded to participants’ clinical status and test results, performed all histological assessments. Specimens were graded, using the Updated Sydney Classification, as acute inflammation (polymorphonuclear infiltrates), chronic inflammation (lymphoplasmacytic infiltrates), atrophic gastritis (loss of glandular tissue and fibrous replacement), or intestinal metaplasia (presence of goblet cells and absorptive cells) [23]; the severity of each category was rated as none, mild, moderate, or marked. And the severity of premalignant lesions were classified using the criteria of Operative Link for Gastritis Assessment of Atrophic Gastritis (OLGA) and Operative Link for Gastritis Assessment of Intestinal Metaplasia (OLGIM) [24, 25].

Statistical analysis

For the baseline characteristics, categorical data were expressed as a percentage and continuous data were expressed as a mean, with standard deviation. The baseline characteristics of participants were compared using the Student t-test or chi-square test. The performance of the whole program was evaluated according to the consecutive steps of an organized screening program, including the invitation, participation, testing, referral, and the final diagnosis. The indicators included the test positivity rates (the number of positive cases divided by the number of participants), referral rate for endoscopic examinations (the number of endoscopies divided by the number of positive cases), positive predictive value (PPV: the number of participants diagnosed with premalignant gastric lesions divided by the number of diagnostic endoscopies), and the detection rate (the number participants diagnosed with premalignant gastric lesions divided by the number of participants).

Two brands of PG tests were then compared. First, the Pearson’s correlation coefficient (a level of 0.7 or more indicated a strong correlation) was applied for the numerical measures of PG-I, PG-II, and PG-I/II ratio, and the Kappa statistics (a level of 0.6 or more indicated substantial agreement) was applied for the positivity rates. Second, the positivity rate, referral rate, PPV, and detection rate were compared between two tests using the two sample proportional test. To adjust for differences in baseline characteristics, logistic regression analyses were performed, with the results expressed as the crude and adjusted odds ratios and the corresponding 95% confidence intervals (CIs). Third, the positivity rates between two tests in patients with premalignant lesions were compared using the two-sample proportion test according to the OLGA and OLGIM stages.

All statistical analyses were performed using Stata 14 software (StataCorp LLC, College Station, TX, USA). All P-values were two-sided, with P < 0.05 considered statistically significant.

Results

Baseline characteristics

The study flowchart is shown in Fig. 1. Among 7616 eligible participants, 5117 received the PG tests, leading to a coverage rate of 67.2%. Among these participants, 284 (5.6%) had positive results. Among participants tested positive, 51 participants tested positive for GastroPanel® only, 92 tested positive for LZ-Test® only, and 141 tested positive for both tests. A total of 105 (37%) participants tested positive and underwent endoscopic examinations: 13 were diagnosed with atrophic gastritis, 19 diagnosed with intestinal metaplasia, and 7 diagnosed with both atrophic gastritis and intestinal metaplasia.

Fig. 1
figure 1

Study flowchart

Baseline characteristics of participants are shown in Table 1. Participants who tested positive for either GastroPanel® or LZ-Test® were older than those who tested negative (58.1 vs. 54.8 years, P < 0.001). A history of hypertension was more common in those who tested positive than those who tested negative (38.7% vs. 29.9%, P = 0.002), which was probably because the subjects who tested positive were older than those who tested negative. The mean value of body mass index was similar between those who tested positive and negative. Also, the proportion of male sex, cigarette smokers, alcohol drinkers, betel nut chewers, and other medical conditions, including diabetes mellitus, hyperlipidemia, cardiovascular disease, stroke, chronic hepatitis B, chronic hepatitis C, and chronic kidney disease, were all comparable between participants with positive and negative results.

Table 1 Baseline characteristics of the screening participants stratified by the results of the PG tests

The comparison of baseline data between endoscopic receivers and refusers is shown in Additional file 1: Table S1. No significant differences were noted except that the proportion of hyperlipidemia and cardiovascular disease were higher in the endoscopic receivers.

Performance of the whole program

The number of positive tests, positivity rates, number of endoscopies, and the referral rate of endoscopic examinations, which are stratified by age, sex, and testing brand, are shown in Table 2. The positivity rates were 3.8%, 4.6%, and 5.6% for GastroPanel®, LZ-Test®, and the parallel combination (i.e. either one was positive), respectively. The positivity rate of the parallel combination was higher than that of GastroPanel® (5.6% vs. 3.8%, P < 0.001) and LZ-Test® (5.6% vs. 4.6%, P = 0.021). The positivity rates of the parallel combination were slightly higher for males (5.8% vs. 5.3%, P = 0.43), but were significantly higher in the older-age group (≥ 50 years) than the younger-age group (30–49 years) (6.1% vs. 4.5%, P = 0.016).

Table 2 The process indicators of screening, stratified by the age, gender, and the brands of PG testing

Results of the PPV and detection rate are shown in Table 3. In addition to the overall results, the results were also stratified by age, sex, and test brand. The overall PPV (i.e., two-test combination) for atrophic gastritis was slightly lower than that of the individual test with either GastroPanel® or LZ-Test® (12.4% vs. 16.3% and 16.3%, both P = 0.45). Findings for intestinal metaplasia were similar (18.9% vs. 23.8% and 21.3%, P = 0.42 and 0.69, respectively). Taking into consideration the referral rates, the detection rate for atrophic gastritis was the same for overall and the individual tests (2.5 per 1000). The detection rate for intestinal metaplasia was also comparable (3.9 vs. 3.7 and 3.3 per 1000, P = 0.60 and 0.10, respectively).

Table 3 The performance indicators of screening, stratified by the age, sex, and the brands of tests

The PPVs for either atrophic gastritis or intestinal metaplasia was slightly higher in males than females, and was significantly higher in the older-age than the younger-age group. The detection rates for atrophic gastritis and intestinal metaplasia were both higher in males than females, and were both higher in the older-age than the younger-age group.

Comparisons of the measures between two tests

As shown in Fig. 2, the quantitative measures of PG-I, PG-II, and PG-I/II ratio for GastroPanel® were about 40% higher, 26% lower, and 2.3-fold higher than those for LZ-Test®, respectively. However, the correlation coefficients between these two tests were 0.96 (95% CI 0.95–0.96), 0.91 (95% CI 0.90–0.91), and 0.76 (95% CI 0.75–0.77) for the PG-I, PG-II and the PG-I/II ratio, respectively, indicating a strong correlation (all P < 0.001). For the positivity rate, the kappa value for agreement was 0.65 (95% CI 0.60–0.70), also indicating a substantial agreement (P < 0.001).

Fig. 2
figure 2

Correlations of numerical measures between two tests using scatter plots. The R indicates the correlation coefficient (Y axis: GastroPanel and X axis: LZ-Test). The linear regression lines for the PG-I, PG-II, and PG-I/II ratio are Y = 1.40X + 7.78, Y = 0.74X-0.99, and Y = 2.31X + 1.45, respectively

Comparisons of the screening indicators between two tests

In the comparison between two tests, the positivity rate was significantly lower in GastroPanel® than that in LZ-Test® (3.8% vs. 4.6%, P = 0.044), particularly in the younger-age group (2.2% vs. 3.5%, P = 0.018). The referral rate for diagnostic examination was slightly higher in GastroPanel® than LZ-Test® but the difference was not statistically significant (41.1% vs. 33.9%, P = 0.12). The PPVs for atrophic gastritis were the same at 16.3% (P = 1.00) and the results for intestinal metaplasia were 23.8% and 21.3% (P = 0.71) for GastroPanel® and LZ-Test®, respectively. Two tests had the same detection rate for atrophic gastritis, 2.5 per 1000, and their detection rates as for intestinal metaplasia were also comparable (3.7 vs. 3.3 per 1000, P = 0.27).

Multiple regression analyses

Results for the logistic regression analyses are shown in Table 4. Univariate analyses showed non-significant differences between the two test brands regarding the PPV for atrophic gastritis (OR 1.00, 95% CI 0.42–2.38, P = 1.00) or intestinal metaplasia (OR 1.16, 95% CI 0.54–2.48, P = 0.70). Taking into consideration the referral rate for endoscopy, the results for the detection rate were still similar between two tests for either atrophic gastritis (OR 1.00, 95% CI 0.45–2.23, P = 1.00) or intestinal metaplasia (OR 1.13, 95% CI 0.57–2.21, P = 0.73).

Table 4 Comparisons of the performance indicators between two brands of pepsinogen testing using the logistic regression models

When adjusted for age, sex, body mass index, social habits, and medical histories, the differences in PPVs for atrophic gastritis (adjusted OR 1.12, 95% CI 0.40–3.12, P = 0.83) (Additional file 1: Figure S1) and intestinal metaplasia (adjusted OR: 1.75, 95% CI 0.65–4.67, P = 0.27) (Additional file 1: Figure S2) between two tests were still non-significant. Results for the detection rates for atrophic gastritis (adjusted OR 1.00, 95% CI 0.45–2.24, P = 1.00) (Additional file 1: Figure S3) and intestinal metaplasia (adjusted OR 1.20, 95% CI 0.60–2.40, P = 0.53) (Additional file 1: Figure S4) were also non-significant.

Older age was borderline significantly associated with a higher PPV for atrophic gastritis (adjusted OR 3.68, 95% CI 0.67–20.16, P = 0.13) while older age was significantly associated with a higher PPV for intestinal metaplasia (adjusted OR 5.97, 95% CI 1.12–31.77, P = 0.036). Between sexes, differences in PPV were not statistically significant. The detection rate for atrophic gastritis was significantly higher in the older age group (adjusted OR 5.61; 95% CI 1.30–24.15, P = 0.021) and slightly higher for male sex (adjusted OR 1.37, 95% CI 0.58–3.25, P = 0.47). Regarding intestinal metaplasia, significantly higher detection rates were noted for either the older age group (adjusted OR 8.91; 95% CI 2.12–37.51, P = 0.003) or the male sex group (adjusted OR 2.23; 95% CI 1.01–4.96, P = 0.049).

Comparison of positivity rates according to OLGA and OLGIM stages

In subjects with histologically documented atrophic gastritis and intestinal metaplasia, 84.6% and 78.9% of them tested positive for both tests, respectively. The details are shown in Fig. 3; in patients with atrophic gastritis (all were graded with OLGA stage 1 diseases), both 92.3% yielded positive results for GastroPanel® and LZ-Test®. In patients with intestinal metaplasia, 94.7% and 84.2% yielded positive results for GastroPanel® and LZ-Test®, respectively; the difference was not significant (P = 0.29). In details, the positivity rates were remarkably similar in patients with either the OLGIM stage 1 (100% vs. 75.0%, P = 0.13), stage 2 (87.5% vs. 87.5%, P = 1.00), or the stage 3 diseases (100% vs. 100%, P = 1.00).

Fig. 3
figure 3

Comparisons of the positivity rates between two PG tests in patients with histologically documented premalignant lesions. Test 1 = GastroPanel (Biohit HealthCare, Helsinki, Finland) and Test 2 = LZ-Test (Eiken Chemical Co., Ltd, Tokyo, Japan)

Discussion

This community-based study not only confirmed the applicability of PG testing for mass screening, but also made extensive comparisons between two different brands of PG testing. We found that although their absolute measures of PG-I, PG-II, and the PG-I/II ratio were significantly different, the correlations between them were high and the agreements in test results were substantial. We also found that, although they applied different immunoassay methods for measurement and defined different cutoff criteria, the performance of the two tests was comparable in the prediction of premalignant gastric lesions, particularly in terms of PPV and detection rate. The findings were replicated in multiple regression analysis that adjusted for differences in the baseline characteristics of participants. The histological severities of the identified atrophic gastritis and intestinal metaplasia were also similar based on unified scoring systems. Collectively, the results indicate that two different brands of PG tests perform equivalently in predicting premalignant gastric lesions in the real-world environment, supporting their interchangeability.

We evaluated the real-world performance of PG testing according to the step-by-step principles of screening [26]. First, our high population participation rate of about 67% supported PG testing as a tool for mass screening. Second, regarding the referral for endoscopic diagnosis, the endoscopic rate of about one third was relatively low, which was due to the invasive nature and the lower population acceptability of endoscopic screening. Third, regarding the PPV, the value of about one fifth has indicated the moderate predictability of PG testing in a high-risk population [27]. Finally, the detection rate of about 3 cases per 1000 participants, which took into account the positivity rate, referral rate, and the PPV, replicated the results of population-wide screening programs for other types of cancers [28].

Our study demonstrated a high correlation of quantitative measures between the two test brands, which is supported by two previous studies. First, one Japanese study, based on 304 blood samples, showed the correlation coefficients of 0.98, 0.98, and 0.92, respectively, for the PG-I, PG-II, and PG-I//II ratio between GastroPanel® and LZ-Test® [11]. Another Latvian study, based on 805 blood samples, also showed the high correlation coefficients of 0.89, 0.90, and 0.86 for the PG-I, PG-II, and PG-I//II ratio, respectively, between these two tests [13]. However, whether these results could be interpreted as equivalent in the real-world setting remained unclear. It was primarily because their absolute measures and cutoff definitions were so different. In our study, the prevalence rate of the premalignant conditions has declined after mass eradication of H. pylori [14] so the PPVs of the PG tests also decreased. As a result, the overlap of positive results between two tests may appear lower (141/284, 49.6%). Nonetheless, when we focused on the subjects with histologically documented premalignant conditions, a large proportion of them tested positive for both tests. Therefore, when we interpreted the findings with the concept of sensitivity (the number of test positives divided by the number of patients diagnosed with premalignant lesions), which was not affected by the lower prevalence rate of the premalignant lesions, the test consistency was actually high.

In our study, we applied the two tests in parallel so the prevalence rates of premalignant gastric lesions were the same. The observed similarities in the PPVs (i.e., the prevalence rate of premalignant gastric lesions multiplied by the positive likelihood ratio of the screening test) would indicate that the positive likelihood ratios of two tests should be close to each other. Using histology as the reference standard, the positive likelihood ratio (i.e., the sensitivity divided by one minus specificity) was determined with the diagnostic accuracy study. Our findings supported the robustness of cutoff criteria selection from both companies because their criteria were defined based on the judicious tradeoffs between true positive and false positive results in the receiver operating characteristic curve analyses, which had been optimized according to the similar need for clinical practice [20, 22, 29].

Our previous study has estimated a prevalence rate of about 3% for premalignant gastric lesions in this population that had undergone six rounds of mass eradication [3, 14]. Given the PPVs of about 16% observed in this study, the positive likelihood ratio of PG testing could be estimated at about 5, which was consistent with the results from the previous studies investigating subjects without H. pylori eradication [20, 22, 29]. Given this test capability, we also evaluated how to improve the PPV on a real-world setting. First, we applied two tests in parallel, hoping to increase the test sensitivity so as to reduce the false-negative results. However, this hypothesis was rejected as the PPV and detection rate were comparable when the tests were either applied alone or in combination. It was likely because these two tests were highly equivalent in the test capability and their results largely overlapped. Second, we performed stratified analyses according to sex and age and conducted multiple regression analyses, hoping to identify useful indicators to focus on the subgroups with the highest risk to increase the pretest possibility. We found that these tests performed more accurately in older participants. For participants aged 50 years or more, there were 3.7- and 6.0-fold increases in the PPVs for atrophic gastritis and intestinal metaplasia, and 5.6- and 8.9-fold increases in the detection rates for atrophic gastritis and intestinal metaplasia, respectively, as compared with the younger age group. These findings lend support to the recommendations of several consensus statements about the utilization of surveillance endoscopy to detect gastric cancer after H. pylori eradication [30, 31].

The strengths of our study include the invitation of all candidates in a high-risk community, the step-by-step evaluation following pre-established screening principles, and the head-to-head comparisons of two widely used PG tests in a real-world environment. Our findings can not only be applied in a high-risk population, but can also be generalized to other high-risk populations living in lower-risk countries, such as first-generation immigrants from high-prevalence areas. However, certain limitations should be noted. First, although our study confirmed the consistent performance of PG testing in predicting the presence of premalignant gastric lesions, detecting gastric cancer would have required a longer follow-up of this population, as our previous study has done [10]. Second, research has shown that, after H. pylori eradication, the histology can be improved but genetic damage may persist, which would lead to a decreased capacity of PG testing in the prediction of gastric cancer risk [17, 32]. Our study demonstrated a low prevalence rate of advanced-stage pre-malignancies (i.e., stage III–IV OLGA and OLGIM), particularly atrophic gastritis, which was likely related to the effect of H. pylori eradication on gastritis regression [14, 17]. Other serological markers or molecular testing based on histological samples could be possible solutions and directions for future research [33, 34]. Finally, regarding the generalizability, although we have demonstrated the equivalent performance between these two brands of PG tests for mass screening, it still need further investigate the interchangeability for other methods to measure the PG concentration except ELISA and L-TIA.

Conclusions

We demonstrate high correlations of either the PG-I, PG-II, or PG-I/II measures, good agreements in the test positivity, and the equivalent performance between two brands of PG tests for mass screening, which support the interchangeability of their test results. We also show that targeting older participants will increase the predictability of PG testing in a high-risk population when H. pylori screening and treatment has been implemented as common practice. These findings provide important implications in identifying those who will retain gastric cancer risk to properly identify who will benefit most from endoscopic screening.