Introduction

Risk stratification is critically important for the early detection of gastric cancer, which can lead to improved curability and reduced mortality. In Japan, national health insurance approved the eradication therapy of Helicobacter pylori (H. pylori) for the indication of chronic gastritis to prevent gastric cancers in February 2013, based on clinical evidence [1]. This has led to an explosive increase in the number of healthy individuals who have undergone H. pylori eradication [2]. However, H. pylori eradication does not completely prevent gastric cancer development [1]. Therefore, periodic surveillance for all such people is currently recommended and has emerged as a burden on both individuals and medical service providers. Risk stratification can optimize the surveillance interval for each individual, which helps to manage this social burden. Therefore, the establishment of a risk stratification system for H. pylori-eradicated healthy individuals is an urgent issue.

To this end, epigenetic risk markers have been highlighted as promising biomarkers for cancer risk stratification. Epigenetic alterations, namely aberrant DNA methylation, are induced in gastric mucosa with normal appearance by H. pylori infection-triggered chronic inflammation [3,4,5]. After H. pylori eradication, methylation levels decrease, depending upon individual marker genes, and the decreased levels persist for a long time [6, 7]. Persistent methylation levels after eradication are considered to reflect methylation in stem cells [8] and are closely correlated with gastric cancer risk [9, 10]. Methylation markers are considered to be a reflection of the overall epigenomic damage in gastric mucosa that leads to gastric cancer [11]. Using previously isolated methylation markers, such as miR-124a-3 [12, 13], we conducted a multicenter prospective cohort study for the prediction of the risk of metachronous gastric cancers and found epigenetic cancer risk diagnosis to be a promising strategy [14, 15].

However, our previous markers were isolated from relatively small sample sets and by methylated DNA immunoprecipitation (MeDIP)-CpG island microarray, which is a low-resolution technique [13]. The recent development of a bisulfite-based BeadArray has enabled high-throughput epigenetic analysis with a much more comprehensive coverage of the genome. In particular, the BeadArray achieves a higher resolution using more than 450 thousand probes and can measure DNA methylation levels accurately enough to detect slight differences. Therefore, using this new technology, it is likely that we can establish novel methylation markers with a higher accuracy than that of the previous markers.

In this study, we aimed to establish novel methylation markers for the risk stratification of gastric cancer optimized for H. pylori-eradicated individuals using the BeadArray technology.

Materials and methods

Tissue sample collection

A total of 232 normal or non-cancerous gastric mucosa samples were collected from 169 subjects in two previous studies [14, 16]. In addition, eight normal gastric mucosa samples were collected from healthy volunteers without a history of H. pylori infection and three peripheral blood samples from healthy volunteers for screening. Further, four gastric mucosa samples from healthy volunteers with current H. pylori infection and four without a history of H. pylori infection were collected for expression microarray analysis. All gastric samples were endoscopically biopsied from the fixed antral region (2 cm from the pyloric ring on the lesser curvature), and stored in RNAlater (Thermo Fisher Scientific, MA, USA) at −80 °C. For H. pylori-eradicated individuals, the samples were collected 6 months or more after eradication. Genomic DNA was extracted using the phenol/chloroform method.

All participants were classified into three groups (Groups 1, 2, and 3) according to their cancer risk (low, intermediate, and high, respectively) defined by the history of H. pylori infection and gastric cancer status (Supplementary Table 1). Group 1 (G1) consisted of four young (mean age ± SD, 28.8 ± 4.2 years) and four older (mean age ± SD, 71.0 ± 3.1 years) healthy volunteers without a history of H. pylori infection, and was considered a low-risk group. Group 2 (G2) consisted of 75 healthy individuals who underwent cancer screening at the Research Center for Cancer Prevention and Screening at the National Cancer Center, Japan, between April 2009 and September 2009 and experienced the successful eradication of H. pylori [16]. All the G2 participants were accompanied by gastric mucosal atrophy, and G2 was considered an intermediate-risk group. G2 samples were consecutively collected both before and after H. pylori eradication, and all the samples had sufficient quality and quantity for use in this study. Group 3 (G3) consisted of 94 gastric cancer patients who underwent the first endoscopic submucosal dissection at the National Cancer Center Hospital [14]. Such gastric cancer patients are known to have a much higher incidence of subsequent (metachronous) gastric cancer even after H. pylori eradication [17], reaching 3.0% per year, while H. pylori-eradicated healthy individuals (G2) have a low incidence of 0.23% per year [18]. All G3 patients had a history of H. pylori infection accompanied by gastric mucosal atrophy.

Gastric mucosal atrophy was determined before successful H. pylori eradication in accordance with the Kimura–Takemoto classification [19], which was correlated with the degree of histological atrophy. The extent of gastric mucosal atrophy was classified into three grades: mild (closed types I and II), moderate (closed type III and open type I), and severe (open types II and III).
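The three-grade grouping described above can be written as a simple lookup. The following is a minimal Python sketch; the type labels (such as `C-I` and `O-II`) and the function name `atrophy_grade` are illustrative shorthand assumed here, not identifiers from the study's data.

```python
# Kimura-Takemoto endoscopic types grouped into the three grades used in the text.
# The string labels ("C-I", "O-II", ...) are an assumed shorthand for
# closed (C) and open (O) types.
ATROPHY_GRADE = {
    "C-I": "mild",
    "C-II": "mild",
    "C-III": "moderate",
    "O-I": "moderate",
    "O-II": "severe",
    "O-III": "severe",
}


def atrophy_grade(kimura_takemoto_type: str) -> str:
    """Return the atrophy grade for a Kimura-Takemoto type label."""
    key = kimura_takemoto_type.strip().upper()
    if key not in ATROPHY_GRADE:
        raise ValueError(f"unknown Kimura-Takemoto type: {kimura_takemoto_type!r}")
    return ATROPHY_GRADE[key]
```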

All previous studies and the current study were approved by the relevant Institutional Review Boards, and written informed consent was obtained from all participants.

Preparation of a screening set and a validation set

To prepare a screening set, 12 samples were randomly selected from G2 and G3, respectively (Supplementary Table 2). In the screening set, the age and extent of gastric atrophy were not significantly different between G2 and G3 (Supplementary Table 1). In addition, G1 samples (Supplementary Table 1) were used for the selection of unmethylated probes in normal gastric mucosa. The remaining participants were used as the validation set (Supplementary Table 3). In the validation set, age distribution was comparable between G2 and G3, but G3 included significantly more males and individuals with a more severe extent of gastric atrophy than G2 (Supplementary Table 1).

Table 1 Candidates for epigenetic markers

Genome-wide DNA methylation analysis

A comprehensive genome-wide screening of differentially methylated CpG sites was conducted using an Infinium HumanMethylation450 BeadChip array (Illumina, CA, USA) as described previously [20]. Among the total 485,512 probes, the probes on the sex chromosomes were excluded and the remaining 473,864 probes on autosomes were analyzed. The methylation level of each probe was represented by a β value, which was in the range from 0 (completely unmethylated) to 1 (completely methylated). To adjust for probe design biases, the intra-array normalization was conducted using a peak-based correction method, beta-mixture quantile dilation (BMIQ) [21].

iEVORA-based screening

The Epigenetic Variable Outliers for Risk prediction Algorithm (iEVORA) is a novel statistical algorithm based on the hypothesis that differentially variable (DV) and methylated (DM) CpGs (DVMCs) are more likely to indicate cancer risk [22, 23]. In the iEVORA algorithm, significant DV CpGs are selected by Bartlett’s test FDRs and the significant DVMCs are ranked by the DM t statistic. The iEVORA algorithm was run using the R script ‘iEVORA.R’, and probes with a Bartlett’s test FDR of less than 0.001 and an unadjusted P value of less than 0.05, based on the t test, were selected.
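The two-step logic described above (select differentially variable probes, then rank them by differential methylation) can be illustrated with a short Python sketch. This is not the ‘iEVORA.R’ script: the function names and the input layout are hypothetical, Benjamini-Hochberg is assumed as the FDR procedure, Welch's form of the t statistic is assumed, and the additional t test P < 0.05 filter is left out because it needs a t distribution that the standard library does not provide.

```python
import math
from statistics import mean, variance


def bartlett_p_two_groups(a, b):
    """Bartlett's test P value for equal variances in two groups.

    Assumes both groups have non-zero sample variance.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)
    dof = na + nb - 2
    pooled = ((na - 1) * va + (nb - 1) * vb) / dof
    numerator = dof * math.log(pooled) - (na - 1) * math.log(va) - (nb - 1) * math.log(vb)
    correction = 1 + (1 / (na - 1) + 1 / (nb - 1) - 1 / dof) / 3
    statistic = max(numerator / correction, 0.0)
    # With two groups the statistic is chi-squared with 1 degree of freedom,
    # whose upper tail probability is erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(statistic / 2))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def ievora_like_screen(beta_by_probe, fdr_cutoff=0.001):
    """Select DV probes by Bartlett FDR, then rank them by |t|.

    beta_by_probe maps a probe ID to (lower_risk_betas, higher_risk_betas).
    """
    probes = list(beta_by_probe)
    q_values = benjamini_hochberg([bartlett_p_two_groups(*beta_by_probe[p]) for p in probes])
    selected = [p for p, q in zip(probes, q_values) if q < fdr_cutoff]
    return sorted(
        selected,
        key=lambda p: abs(welch_t(beta_by_probe[p][1], beta_by_probe[p][0])),
        reverse=True,
    )
```

With only 12 samples per group, as in the screening set, the Bartlett step is sensitive to single outlying samples, which is the behaviour the algorithm is designed to exploit.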

Quantitative methylation analysis

DNA (1 μg) was modified by sodium bisulfite using an innuCONVERT Bisulfite Basic kit (Analytik Jena AG, Germany) and EZ DNA Methylation Kit (Zymo Research, CA, USA). The modified DNA was suspended in 40 μL elution buffer and a 1 μL aliquot was used for quantitative methylation analysis. Quantitative methylation-specific polymerase chain reaction (qMSP) was performed using an iCycler Thermal Cycler (Bio-Rad Laboratories, CA, USA), as previously described [12, 13]. The primer sets specific to methylated and unmethylated sequences and PCR conditions are listed in Supplementary Table 4. Standard DNAs for methylated and unmethylated sequences were prepared by cloning the PCR products from fully methylated and unmethylated control DNAs, respectively, into pGEM-T Easy vector (Promega, WI, USA). The number of molecules in a sample was determined through the comparison of its amplification with those of standard DNA with defined numbers of molecules (10^1 to 10^6 molecules). The methylation levels were defined as the fraction of methylated molecules in the total number of DNA molecules (the number of methylated and unmethylated molecules).
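The quantification described above can be sketched in Python as follows. The sketch assumes a conventional log-linear standard curve relating threshold cycle (Ct) to the number of standard molecules; the text only states that sample amplification was compared with standards, so the Ct-based fit and all function names here are illustrative assumptions.

```python
import math


def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = intercept + slope * log10(copies)."""
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ct_values) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ct_values))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the number of molecules."""
    return 10 ** ((ct - intercept) / slope)


def methylation_level(methylated_copies, unmethylated_copies):
    """Fraction of methylated molecules among all (methylated + unmethylated)."""
    total = methylated_copies + unmethylated_copies
    if total <= 0:
        raise ValueError("no DNA molecules detected")
    return methylated_copies / total
```

One curve would be fitted for the methylated primer set and one for the unmethylated set, and the two estimated molecule numbers passed to `methylation_level`.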

Bisulfite pyrosequencing was performed on bisulfite-converted DNA using the PSQ 96 Pyrosequencing System (Qiagen, CA, USA) as previously described [24]. Data analysis was conducted using PyroMark Q96 ID software (Qiagen, version 2.5.8). The primer sequences and PCR conditions are listed in Supplementary Table 4.

Gene expression analysis by microarray

Expression analysis was conducted with a GeneChip Human Genome U133 Plus 2.0 expression microarray (Affymetrix, CA, USA), using a pool of 4 samples from healthy individuals with current H. pylori infection, and another pool of 4 samples from individuals without a history of H. pylori infection. Data processing was conducted using GeneChip operating software (ver. 1.4). The signal intensity of each probe was normalized so that the average signal intensity of all the probes on a microarray would be 500. Mean signal intensity of all probes for a gene was used as the gene expression level. Genes were classified into those with high (> 1000), moderate (250–1000), and low (< 250) transcription, according to their signal intensities as previously described [25].
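The normalization and the three-way classification described above amount to the following arithmetic, shown here as a minimal Python sketch. The function names are hypothetical, and the handling of values exactly at 250 or 1000 (both counted as moderate) follows the ranges quoted in the text.

```python
def normalize_to_mean(signals, target_mean=500.0):
    """Scale signal intensities so that their average equals target_mean."""
    scale = target_mean / (sum(signals) / len(signals))
    return [s * scale for s in signals]


def transcription_class(signal):
    """Classify a normalized signal as high (> 1000), moderate (250-1000), or low (< 250)."""
    if signal > 1000:
        return "high"
    if signal >= 250:
        return "moderate"
    return "low"
```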

Statistical analysis

The mean methylation levels were compared between groups by Welch’s t test, and G2 samples before and after eradication were compared by a paired t test. The clinical factors between G2 and G3 were compared by Welch’s t test or the Chi-squared test. These analyses were computed using GraphPad Prism software (GraphPad Software, CA, USA). Pearson’s correlation coefficients and the P values were calculated by Excel software. The receiver-operating characteristic (ROC) curves and AUCs (area under the curve) were computed using the R package, ROCR. The P values were obtained by a two-sided test and considered as significant if less than 0.05.
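For readers unfamiliar with the AUC, the quantity can be computed directly as the probability that a randomly chosen higher-risk sample has a higher methylation level than a randomly chosen lower-risk sample. The Python sketch below uses this rank-based definition; it is an illustration only and does not reproduce the ROCR package, and the function name is hypothetical.

```python
def roc_auc(negatives, positives):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative value count as half.
    """
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Here `negatives` would hold the methylation levels of the lower-risk group (G2) and `positives` those of the higher-risk group (G3).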

Results

Isolation of candidate novel markers by two algorithms

To isolate the methylation cancer risk markers, two screening algorithms (the large difference and iEVORA-based methods) were adopted (Fig. 1). In the large difference method, to minimize the influence of contamination of inflammatory cells, probes with high methylation in three peripheral blood cell samples were first removed from the 473,864 probes on autosomes to obtain 86,596 probes. Then, 28,513 probes unmethylated in G1 (healthy volunteers never infected with H. pylori) were selected and 555 probes with a large difference (Δβ ≥ 0.2) between G2 (n = 12; H. pylori-eradicated healthy individuals with gastric mucosal atrophy) and G3 (n = 12; gastric cancer patients after endoscopic treatment and H. pylori-eradication) were isolated. Finally, to avoid the isolation of an outlier signal in a region, 49 regions were selected in which five consecutive probes showed consistent values and the value for the central probe was used. From these 49 candidates, ten candidates selected based on their P values and another ten candidates (of which six overlapped) selected based on mean differences of the beta values were used for validation (Table 1).
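The first three filters of the large difference method can be sketched in Python as below. Several details are assumptions made for illustration: the unmethylated threshold (β < 0.2) is taken from the legend of Fig. 1, a probe is treated as unmethylated only if every sample in the group is below that threshold, Δβ is computed as mean G3 minus mean G2, and the input layout and function name are hypothetical. The final step (consecutive probes with consistent values) is not reproduced because its criterion is not specified in the text.

```python
from statistics import mean

UNMETHYLATED_BETA = 0.2  # threshold quoted in the Fig. 1 legend
MIN_DELTA_BETA = 0.2     # large-difference cutoff quoted in the text


def large_difference_screen(probes):
    """Return (probe_id, delta_beta) pairs passing the filters, largest first.

    probes maps a probe ID to a dict with beta-value lists under the keys
    "blood", "g1", "g2", and "g3" (an assumed layout).
    """
    selected = []
    for probe_id, betas in probes.items():
        # Remove probes methylated in peripheral blood cells.
        if max(betas["blood"]) >= UNMETHYLATED_BETA:
            continue
        # Keep only probes unmethylated in never-infected mucosa (G1).
        if max(betas["g1"]) >= UNMETHYLATED_BETA:
            continue
        delta = mean(betas["g3"]) - mean(betas["g2"])
        if delta >= MIN_DELTA_BETA:
            selected.append((probe_id, delta))
    return sorted(selected, key=lambda item: item[1], reverse=True)
```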

Fig. 1

The isolation of candidate novel epigenetic (DNA methylation) risk markers by two algorithms from 473,864 probes on autosomes. Left: the large difference algorithm. The probes unmethylated (β value < 0.2) in blood samples and G1 were first selected. Then, 555 probes with a large difference in methylation levels (Δβ ≥ 0.2) between G3 and G2 were selected. Finally, 49 regions in which five consecutive probes showed consistent values were obtained. Right: iEVORA-based algorithm. First, 462 probes were isolated by iEVORA. Subsequently, 446 unmethylated probes in the blood samples were selected. After consideration of both large difference and consecutive probes with consistent values, two and six probes were isolated, respectively

In the iEVORA-based method, 462 probes with high variances in G3 (n = 12) were first selected. Then, similar to the large difference method, 446 probes unmethylated in the peripheral blood cells were selected. Among the 446 probes, two candidates were isolated based on the large mean differences between G2 (n = 12) and G3 (Δβ ≥ 0.2). Additionally, six regions in which three consecutive probes showed consistent values were selected and the value for the central probe was used. Collectively, eight candidates were isolated by the iEVORA-based method and used for validation (Table 1).

All candidate markers analyzed were validated by a different set

Among the 14 (large difference method) and eight (iEVORA-based method) candidates, we successfully designed pyrosequencing primers for nine regions, FLT3, LINC00643, RPRM, JAM2, ELMO1, BHLHE22, RIMS1, GUSBP5, and ZNF93 (Supplementary Table 4). First, to analyze the effect of H. pylori eradication on their methylation levels, we analyzed 63 G2 samples taken before and after H. pylori eradication. Consistent with previous reports [6, 7], all candidate markers showed comparable or decreased methylation levels after eradication compared to those before eradication (Supplementary Fig. 1).

Next, the validation set of 63 G2 and 82 G3 samples was analyzed by the nine candidate markers and a previous marker (miR-124a-3) [12]. All the candidate markers and miR-124a-3 had much higher methylation levels in G3 than in G2 (P < 0.0001; Fig. 2, Supplementary Table 5). Therefore, we were able to validate all the nine novel methylation markers for gastric cancer risk.

Fig. 2

The methylation levels of the nine candidate markers and a previous marker, miR-124a-3, in the validation set. The horizontal line represents the mean methylation level in each group. The methylation levels of all nine candidate markers, as well as miR-124a-3, were significantly higher in the high-risk group (G3, n = 82) than in the intermediate-risk group (G2, n = 63). *P < 0.0001

High correlations among novel methylation markers

To explore whether the novel methylation markers reflected the overall epigenomic damage that resulted from past exposure to environmental factors and host responses to them [26, 27], correlations among the markers were analyzed. As expected, high correlation coefficients (R = 0.74–0.97; Table 2) were observed between any two of the novel markers. In particular, the novel markers identified by the large difference method showed very strong correlations (R = 0.82–0.97) (Table 2, Supplementary Fig. 2). The known marker, miR-124a-3, was also highly correlated (R = 0.66–0.93) with the novel markers. These results suggested that our novel markers represented one shared entity, namely epigenomic damage as a result of H. pylori infection.
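The pairwise coefficients summarized above are Pearson correlations between the methylation levels of two markers across the same samples. A minimal Python sketch of that calculation is given here for reference; the study reports using Excel, and the function names and input layout are hypothetical.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def pairwise_correlations(levels_by_marker):
    """Return {(marker_a, marker_b): r} for every pair of markers.

    levels_by_marker maps a marker name to its methylation levels,
    listed in the same sample order for every marker.
    """
    return {
        (a, b): pearson_r(levels_by_marker[a], levels_by_marker[b])
        for a, b in combinations(levels_by_marker, 2)
    }
```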

Table 2 Correlation coefficient between methylation markers

High-performance novel methylation markers

The clinical performance of the novel methylation cancer risk markers was assessed by ROC curve analysis (Fig. 3, Supplementary Table 6). All the novel markers showed high AUCs (0.70–0.80), odds ratios (5.43–23.41), high-to-moderate sensitivity (0.59–0.94), and high-to-moderate specificity (0.59–0.83) (Supplementary Table 6). Compared with miR-124a-3 (AUC, 0.74; odds ratio, 8.01), some novel markers had higher AUCs and better sensitivity or specificity. Thus, we were able to establish novel methylation markers for bisulfite pyrosequencing for the estimation of gastric cancer risk that were potentially superior to miR-124a-3.
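Sensitivity, specificity, and the odds ratio all follow from the 2 × 2 table obtained by applying a methylation cutoff to the two groups. The Python sketch below shows that arithmetic; how the cutoff for each marker was chosen and how the odds ratios were estimated are not stated in this section, so the cutoff argument and the cross-product odds ratio used here are assumptions, and the function name is hypothetical.

```python
def diagnostic_summary(g2_levels, g3_levels, cutoff):
    """Summarize marker performance at a given methylation cutoff.

    Samples at or above the cutoff are called high risk; G3 is treated
    as the true high-risk group and G2 as the comparison group.
    """
    tp = sum(level >= cutoff for level in g3_levels)
    fn = len(g3_levels) - tp
    fp = sum(level >= cutoff for level in g2_levels)
    tn = len(g2_levels) - fp
    # The cross-product odds ratio is undefined when a cell is zero.
    odds_ratio = (tp * tn) / (fp * fn) if fp and fn else float("inf")
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "odds_ratio": odds_ratio,
    }
```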

Fig. 3

The ROCs of nine novel methylation markers and miR-124a-3. The AUC values are shown in each box. The AUCs of some of the novel methylation markers were higher than that of miR-124a-3. AUC: area under the curve

Passenger methylation of novel marker genes in gastric carcinogenesis

To explore the possible roles of our novel marker genes in gastric carcinogenesis, we analyzed their expression levels in normal gastric mucosa using an expression microarray (Supplementary Table 7). Most of the marker genes showed very low expression levels irrespective of H. pylori infection statuses, which agreed with the data in the Genotype-Tissue Expression (GTEx, URL: https://www.gtexportal.org/home/) database (data not shown). This finding indicated that these genes are unlikely to play any biological roles in the stomach, and that their aberrant methylation is unlikely to be involved in gastric carcinogenesis, being passenger methylation. Nevertheless, JAM2 and ELMO1, whose target probes were located in CpG sites near their transcription start sites (TSSs), showed moderate expression levels.

Relationships between methylation levels and the extent of gastric atrophy/gender/aging

Gastric atrophy and gender are also strong risk factors for gastric cancer [28, 29], and the efficacy of the novel methylation risk markers irrespective of these confounding factors was examined by subgroup analysis. First, the methylation levels of the markers did not differ according to the extent of gastric atrophy within G2 or G3 (Supplementary Fig. 3a and 3b). Among the individuals with moderate gastric atrophy, significantly higher methylation levels were observed in G3 (n = 42) than in G2 (n = 26) for all novel markers and miR-124a-3 (Supplementary Fig. 4, Supplementary Table 8a).

The subgroup analysis for men confirmed significantly higher methylation levels in G3 (n = 73) than in G2 (n = 41) for all markers. For women, the subgroup analysis also revealed higher methylation levels in G3 (n = 9) than in G2 (n = 22) for all markers, although some of the differences were not significant owing to the limited sample size (Supplementary Table 8b).

In addition, age-related methylation is known to contribute to predisposition to carcinogenesis by inactivating specific genes [30]. We compared methylation levels between four young and four older healthy individuals never infected with H. pylori (G1). No candidate novel methylation markers showed significant age-related differences (Table 1). This suggests that the novel methylation markers are unlikely to be affected by aging. On the other hand, regarding H. pylori-eradicated individuals (G2 and G3), the methylation levels of some markers showed weak positive correlations with age (Supplementary Fig. 5).

Taken together, these results suggested that our novel markers have the potential to be applied to individuals after H. pylori eradication, irrespective of the extent of gastric atrophy, gender, and age.

Discussion

In the current study, we established nine novel epigenetic markers able to identify cancer patients even among H. pylori-eradicated individuals using a genome-wide screening based on two different algorithms. Our novel markers showed sufficiently high odds ratios, even among individuals with gastric atrophy. Some markers showed superior performance compared with our previous marker, miR-124a-3. We believe that BeadArray technology-based comprehensive methylation analysis and validation in a large number of samples provided us with promising methylation markers. Importantly, our epigenetic markers are not likely to be affected by possible contamination of blood cells in the biopsy sample because we eliminated such probes during our screening.

The novel methylation markers showed comparable or decreased methylation levels after eradication, compared to those before eradication, in agreement with previous reports [6, 7]. Our novel markers were selected as those showing very low methylation levels in blood cells. Therefore, the decrease in methylation levels after eradication is unlikely to have resulted from decreased infiltration of inflammatory blood cells after eradication, but likely resulted from the supply of new progenitor cells from unmethylated stem cells. Additionally, before eradication, a large fraction of inflammatory blood cells can reduce the fraction of epithelial cells. After eradication, an increased fraction of epithelial cells may result in a trade-off with decreased methylation levels in epithelial cells for some markers, leading to methylation levels comparable to those before eradication.

In general, genes with low expression are known to be susceptible to aberrant DNA methylation [25, 31, 32]. Indeed, seven marker genes had very low expression levels in the gastric mucosa, indicating that aberrant DNA methylation of these marker genes was passenger methylation. On the other hand, JAM2 and ELMO1 had moderate expression levels, and their target probes were located in CpG islands near the TSSs. Therefore, there remains a possibility that these two genes were methylation-silenced, being drivers. Methylation silencing of JAM2, involved in cell adhesion, and ELMO1, involved in cell motility, is reported in colorectal cancer [33,34,35]. More importantly, however, considering the high correlations of methylation levels between the marker genes, the good performance of these novel marker genes in risk prediction is best explained by their high susceptibility to aberrant methylation due to H. pylori infection, irrespective of their functions in gastric carcinogenesis.

We also analyzed the effect of age-related methylation. However, no difference in methylation levels was observed between young and older healthy individuals never infected with H. pylori (G1), and the effect of aging was considered to be minimal. At the same time, in H. pylori-eradicated individuals (G2 and G3), some markers were slightly associated with age. Importantly, methylation levels are known to increase over time to reflect the infection period [5]; therefore, the slight increase correlated with age may be related to the infection period.

The potential limitations of our study include possible bias arising from the use of samples collected in two separate studies [14, 16]. Additionally, the H. pylori-eradicated individuals enrolled as G2 may include a small number of individuals with undiagnosed gastric cancer, who would properly belong to G3. This might have resulted in a lower sensitivity and specificity of our markers than expected. In addition, differences in the extent of gastric atrophy and in gender distribution were observed between G3 and G2. However, the subgroup analysis suggested a high potential applicability of our methylation markers.

One of the prominent applications of the novel markers is a large-scale multicenter prospective cohort study for the risk stratification of primary gastric cancer in healthy individuals with extensive gastric atrophy after H. pylori eradication (UMIN000016894). This nationwide, ongoing prospective study is expected to optimize the surveillance system based on stratified individual risk, which will contribute to precision medicine.

In conclusion, we established nine novel methylation markers that identify H. pylori-eradicated individuals with high risk of gastric cancer.