Background

Esophagogastric junction (EGJ) adenocarcinoma is a global health burden, and its incidence is increasing worldwide, especially in Western countries [1,2,3]. EGJ adenocarcinomas are classified into three types, Siewert types I, II, and III, based on the anatomic location of the tumor epicenter [4,5,6], and different treatment protocols are used for each subtype. Namely, Siewert type I adenocarcinomas, which mostly develop from Barrett’s esophagus, are treated as esophageal adenocarcinomas (EACs), and Siewert type III adenocarcinomas, which mostly develop from proximal gastric mucosae with H. pylori-triggered inflammation, are treated as gastric adenocarcinomas (GACs) [7,8,9]. In contrast, Siewert type II adenocarcinomas that originate from Barrett’s esophagus and those that originate from the stomach are morphologically indistinguishable, and no standard treatment strategy has been established [6, 10, 11]. Therefore, a marker that predicts the tissue origin of EGJ adenocarcinomas is needed, especially for Siewert type II.

DNA methylation profiles are known to reflect tissue types [12] and are mostly maintained during carcinogenesis [13,14,15], which makes them candidates for predicting tissue origin. For example, the tissue origin of circulating cell-free DNA has been identified using methylation profiles [16]. Methylation profiles have also been used to define subtypes of breast cancers, colorectal cancers, and central nervous system tumors [17,18,19], and to predict the tissue origins of cancers of unknown primary [14, 20]. Stratifying patients with a DNA methylation-based cancer type classifier and administering tumor type-specific therapy yields better survival than empiric therapy [14]. These findings suggest that DNA methylation may also be useful for predicting the tissue origin of EGJ adenocarcinomas, especially Siewert type II. A decade ago, a pioneering study of 74 genes revealed that DNA methylation levels of GATA5 could classify EGJ adenocarcinomas into two subgroups, although the distinction between GACs and EACs was unclear [21].

In this study, we aimed to develop a DNA methylation marker to predict the tissue origin of EGJ adenocarcinomas by analyzing DNA methylation levels of 470,870 CpG sites located on autosomes. Since our goal was to select, from EGJ adenocarcinomas that are currently treated as EACs (e.g., Siewert type II cases), those that should be treated as GACs, we placed emphasis on minimizing false-positive cases.

Materials and methods

Tissue samples and cancer cell lines

Formalin-fixed paraffin-embedded (FFPE) tissue samples of 37 GACs in the gastric body (thus excluding Siewert type III) and 18 EACs (including Siewert type I, or Nishi type E) were obtained, with written informed consent, at the National Cancer Center Hospital (GAC, n = 34; EAC, n = 3), the University of Tokyo Hospital (GAC, n = 3; EAC, n = 4), the Nippon Medical School Tama-Nagayama Hospital (Nippon Medical School) (EAC, n = 3), and the Cancer Institute Hospital of Japanese Foundation for Cancer Research (JFCR) (EAC, n = 8) [22, 23]. This study was approved by the Institutional Review Boards of the National Cancer Center (2018-024), the University of Tokyo [G3521-(21)], Nippon Medical School (A-2020-051), and the Cancer Institute Hospital of JFCR (2021-GA-1103).

MKN1, SW837, DLD1, and LoVo cell lines were purchased from the Japanese Collection of Research Bioresources (Osaka, Japan). The LNCaP cell line was purchased from the American Type Culture Collection (Manassas, VA, USA). The HSC64 cell line was provided by Dr. K Yanagihara (National Cancer Center Research Institute, Tokyo, Japan). For the HSC64, MKN1, SW837, DLD1, LoVo, and LNCaP cell lines, DNA was extracted and methylation profiles were analyzed by Infinium HumanMethylation450 BeadChip array (Illumina, San Diego, CA) as described previously [24, 25]. Methylation profiles of the KYSE30, KYSE50, KYSE140, KYSE170, KYSE180, KYSE220, KYSE270, KYSE410, KYSE450, KYSE510, KYSE520, AGS, KATOIII, N87, HSC44, HSC59, 44As3, BT20, HCC1395, HCC1428, HCC1937, HCC1954, MCF7, and ZR-75-1 cell lines were obtained from our previous studies [24,25,26,27].

The cells were maintained in Roswell Park Memorial Institute (RPMI)-1640 medium (HSC64, MKN1, SW837, DLD1, and LNCaP) or Ham’s F12 medium (LoVo) supplemented with 10% fetal bovine serum and 1% penicillin/streptomycin at 37 °C in a humidified atmosphere with 5% CO2.

DNA extraction

Cells enriched for cancer cells were collected by macrodissection of 10-μm-thick histological sections from the 37 GAC and 18 EAC FFPE samples. Dissected tissues were treated with deparaffinization solution and proteinase K [EpiTect Plus bisulfite kit (Qiagen, Hilden, Germany)] and purified using a Zymo-Spin I column (Zymo Research, Irvine, CA). Genomic DNA was extracted from 4 to 10 FFPE sections or 10⁶–10⁷ cultured cells by the phenol/chloroform method and quantified based on absorbance at 260 nm. For FFPE samples, 10 ng of DNA was used to measure RPPH1 gene copy number by quantitative real-time PCR using SYBR Green I (BioWhittaker Molecular Applications, Rockland, MD), with the primers listed in Supplementary Table S1.

Genome-wide DNA methylation analysis

Genome-wide DNA methylation profiles obtained with the Infinium HumanMethylation450 BeadChip array were collected from our previous studies and public databases (Supplementary Table S2). Methylation profiles of 48 GACs were obtained from our previous studies [28, 29]. Profiles of 30 GACs, 48 EACs, and 46 EGJ adenocarcinomas were obtained from The Cancer Genome Atlas (TCGA) database. Profiles of 16 normal gastric mucosae [8 with Helicobacter pylori (H. pylori) infection and 8 without infection] and of normal esophagus were from our previous studies [30, 31]. Profiles of Barrett’s esophagus were obtained from the Gene Expression Omnibus (GEO) (accession # GSE81334). Profiles of peripheral leukocytes were from a previous study by Reinius et al. [32]. Of the 485,512 probes on the BeadChip array, the 470,870 CpG sites located on autosomes were used for the analysis. β values (DNA methylation levels), ranging from 0 (completely unmethylated) to 1 (completely methylated), were obtained, and differences between probe types were normalized using the web tool MACON [33].
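For readers unfamiliar with the array output, the sketch below illustrates the conventional Illumina definition of the β value, β = max(M, 0) / (max(M, 0) + max(U, 0) + 100), where M and U are the methylated and unmethylated signal intensities, together with restriction to autosomal probes. It is a minimal illustration under stated assumptions: the input layout (probe ID mapped to chromosome and M/U intensities) and the function names are hypothetical, and the MACON probe-type normalization used in this study is not reproduced.

```python
from typing import Dict, Tuple

# Hypothetical input layout: probe_id -> (chromosome, methylated intensity M, unmethylated intensity U)
ProbeTable = Dict[str, Tuple[str, float, float]]

AUTOSOMES = {str(i) for i in range(1, 23)}


def beta_value(meth: float, unmeth: float, offset: float = 100.0) -> float:
    """Conventional Illumina beta value; negative intensities are floored at 0."""
    m = max(meth, 0.0)
    u = max(unmeth, 0.0)
    return m / (m + u + offset)


def autosomal_betas(probes: ProbeTable) -> Dict[str, float]:
    """Return beta values only for probes on chromosomes 1-22."""
    result = {}
    for probe_id, (chrom, meth, unmeth) in probes.items():
        # Accept both "chr1"-style and "1"-style chromosome labels.
        label = chrom[3:] if chrom.lower().startswith("chr") else chrom
        if label not in AUTOSOMES:
            continue  # drop sex-chromosome (and any other non-autosomal) probes
        result[probe_id] = beta_value(meth, unmeth)
    return result
```

The offset of 100 keeps β defined when both intensities are near zero; the hypothetical probe names in any usage are placeholders, not real array IDs.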

Bisulfite pyrosequencing

Genomic DNA (1 µg) extracted from FFPE samples and cancer cell lines was treated with sodium bisulfite using an EZ DNA Methylation Kit (Zymo Research, Irvine, CA), and the bisulfite-treated DNA was eluted in 40 μl of elution buffer. Bisulfite-treated DNA from cancer cell lines, and from FFPE samples with 50 or more copies of RPPH1 (average = 137 copies; range = 50 to 628 copies), was analyzed by bisulfite pyrosequencing (PyroMark Q96 ID, Qiagen, Valencia, CA) using the primers listed in Supplementary Table S3. DNA methylation levels of target CpG sites were calculated using the PSQ Assay Design software (Qiagen), and the methylation level of each sequenced region was expressed as the average across its target CpG sites.
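The FFPE quality threshold and the per-region averaging described above can be sketched as follows. This is an illustrative sketch, not the authors' pipeline: the data layout, the function name, and the sample IDs are hypothetical, and only the rule "at least 50 RPPH1 copies" and the averaging across target CpG sites come from the text.

```python
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical layout: sample_id -> (RPPH1 copies in the bisulfite-treated DNA,
#                                    methylation levels of the target CpG sites in one region)
PyroData = Dict[str, Tuple[float, List[float]]]


def region_methylation(samples: PyroData, min_copies: float = 50) -> Dict[str, float]:
    """Average target-CpG methylation per FFPE sample, skipping samples below the copy threshold."""
    levels = {}
    for sample_id, (copies, cpg_levels) in samples.items():
        if copies < min_copies:
            continue  # too little amplifiable template; not analyzed
        if not cpg_levels:
            raise ValueError(f"no CpG values for sample {sample_id}")
        levels[sample_id] = mean(cpg_levels)
    return levels
```

For example, a sample with 137 copies and CpG levels of 10, 20, and 30% yields 20%, whereas a sample with 42 copies is excluded.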

Statistical analyses

Cut-off values were defined as the DNA methylation level giving the maximum Youden index (sensitivity + specificity − 1), calculated using R version 3.6.2 with the ROCR package. When several methylation levels gave the same maximum Youden index, the highest one was selected as the cut-off value, to emphasize specificity and minimize false-positive cases.
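The cut-off rule described above (maximum Youden index, ties broken toward the highest methylation level) can be sketched in plain Python. This is not the ROCR-based code used in the study; the function name, the use of observed values as candidate thresholds, and the convention that a sample is called positive when its β value is at or above the cut-off are assumptions.

```python
from typing import Sequence, Tuple


def youden_cutoff(values: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (cut_off, max_youden_index) for calling label 1 when value >= cut_off.

    Assumptions: candidate thresholds are the observed values, and ties in the
    Youden index are broken toward the highest threshold (favouring specificity).
    """
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative sample")
    best_cut, best_j = None, float("-inf")
    # Scan from the highest to the lowest candidate; a strict ">" keeps the
    # first (highest) threshold when several share the maximum index.
    for cut in sorted(set(values), reverse=True):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cut)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cut)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j
```

With hypothetical β values [0.01, 0.02, 0.05, 0.30, 0.04, 0.25] and labels [0, 0, 0, 1, 1, 1], thresholds 0.25 and 0.04 both give a Youden index of 2/3; the function returns 0.25, the more specific choice.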

The correlation between methylation levels measured by bisulfite pyrosequencing and by the Infinium BeadChip array was evaluated using Pearson’s correlation coefficient with a parametric test. Based on this correlation analysis, methylation levels obtained by pyrosequencing were scaled to those obtained by the Infinium BeadChip array. Differences in mean DNA methylation levels were evaluated by Student’s t test. All tests were two-sided, and P < 0.05 was considered statistically significant.
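The text does not specify how pyrosequencing values were scaled to the Infinium β scale. One plausible reading, assumed here, is an ordinary least-squares fit of Infinium β values on pyrosequencing values (in percent) measured in the same cell lines, followed by conversion of new pyrosequencing values with that line. The function names are hypothetical.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between paired measurements."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need at least three paired values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fit_scaling(pyro: Sequence[float], infinium: Sequence[float]) -> Tuple[float, float]:
    """Least-squares line: infinium_beta ~ slope * pyro_percent + intercept (assumed scaling)."""
    mx, my = mean(pyro), mean(infinium)
    sxy = sum((a - mx) * (b - my) for a, b in zip(pyro, infinium))
    sxx = sum((a - mx) ** 2 for a in pyro)
    slope = sxy / sxx
    return slope, my - slope * mx


def to_infinium_scale(value: float, slope: float, intercept: float) -> float:
    """Map a pyrosequencing value onto the beta scale, clipped to [0, 1]."""
    return min(1.0, max(0.0, slope * value + intercept))
```

Clipping to [0, 1] keeps converted values within the valid β range when a fitted line extrapolates beyond the calibration data.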

Results

Screening of GAC- and EAC-specific methylation markers

To isolate GAC-specifically methylated regions (GAC-specific markers), genome-wide DNA methylation profiles of 30 GACs and 30 EACs (the screening set) were used (Supplementary Table S2). From 145,841 CpG sites in CpG islands, we isolated 75,308 CpG sites that were unmethylated (average β value < 0.2) in seven types of leukocytes (Th cells, Tc cells, NK cells, B cells, monocytes, neutrophils, and eosinophils) [32], normal esophagus (n = 8), and Barrett’s esophagus (n = 8) (Fig. 1a). A volcano plot of these 75,308 CpG sites showed that 51 CpG sites had significantly higher methylation levels in the 30 GACs than in the 30 EACs [Δβ (GAC − EAC) > 0.2, and P < 0.001] (Fig. 1b). Among them, 2 CpG sites (SLC46A3 and cg09177106) were unmethylated (β value < 0.2) in all 30 EACs and were considered GAC-specific methylation markers (Supplementary Fig. S1a). Based on the maximum Youden index in the screening set, the cut-off values for SLC46A3 and cg09177106 were calculated as 0.03 and 0.061, respectively (Supplementary Fig. S1b).
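The three filtering steps above (unmethylated in all reference tissues, volcano-plot criteria, unmethylated in every tumor of the other type) can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the dictionary layout keyed by CpG ID, the function names, and the use of a pooled-variance Student's t test for the volcano-plot P value are assumptions. So that the block runs with the standard library alone, the two-sided P value is obtained from the t distribution through the regularized incomplete beta function, evaluated by the standard continued-fraction method.

```python
import math
from statistics import mean, variance
from typing import Dict, List, Sequence

TINY = 1e-300


def _betacf(a: float, b: float, x: float, max_iter: int = 300, eps: float = 3e-14) -> float:
    """Continued fraction for the regularized incomplete beta function (modified Lentz)."""
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > TINY else TINY)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))  # even step
        d = 1.0 / (1.0 + aa * d if abs(1.0 + aa * d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        h *= d * c
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))  # odd step
        d = 1.0 / (1.0 + aa * d if abs(1.0 + aa * d) > TINY else TINY)
        c = 1.0 + aa / c
        c = c if abs(c) > TINY else TINY
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _reg_inc_beta(a: float, b: float, x: float) -> float:
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_two_sided_p(t: float, df: int) -> float:
    """Two-sided P value of a t statistic with df degrees of freedom."""
    return _reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def student_t_p(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sided P value of Student's two-sample t test (pooled variance)."""
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    se = math.sqrt(pooled * (1.0 / nx + 1.0 / ny))
    diff = mean(x) - mean(y)
    if se == 0.0:
        return 0.0 if diff != 0.0 else 1.0
    return t_two_sided_p(diff / se, df)


def screen_markers(target: Dict[str, List[float]], other: Dict[str, List[float]],
                   reference_means: Dict[str, List[float]], min_delta: float = 0.2,
                   max_p: float = 1e-3, unmethylated: float = 0.2) -> List[str]:
    """CpG sites methylated in `target` tumors but unmethylated in every `other` tumor.

    reference_means maps each CpG site to its mean beta in each reference tissue.
    """
    markers = []
    for cpg, tissue_means in reference_means.items():
        # Step 1: unmethylated (mean beta < 0.2) in all reference tissues.
        if any(m >= unmethylated for m in tissue_means):
            continue
        t_vals, o_vals = target[cpg], other[cpg]
        # Step 2: volcano-plot criteria, delta beta > 0.2 and P < 0.001.
        if mean(t_vals) - mean(o_vals) <= min_delta or student_t_p(t_vals, o_vals) >= max_p:
            continue
        # Step 3: unmethylated (beta < 0.2) in every tumor of the other type.
        if all(v < unmethylated for v in o_vals):
            markers.append(cpg)
    return markers
```

Swapping the tumor groups and using gastric mucosae as reference tissues gives the EAC-specific screen in the next paragraph.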

Fig. 1

Genome-wide screening of gastric adenocarcinoma (GAC)-specific and esophageal adenocarcinoma (EAC)-specific methylation markers. a Screening scheme of GAC-specific methylation markers. 75,308 CpG sites unmethylated (average β value < 0.2) in leukocytes, normal esophagus, and Barrett’s esophagus were selected from 485,512 probes. Among them, 51 CpG sites had significantly higher methylation levels in 30 GACs than in 30 EACs [Δβ (GAC − EAC) > 0.2, and P < 0.001]. Two of the 51 CpG sites, SLC46A3 and cg09177106, were unmethylated in all the EACs. b Volcano plot analysis for identification of GAC-specific methylation. 51 CpG sites had significantly higher methylation levels in 30 GACs than in 30 EACs [Δβ (GAC − EAC) > 0.2, and P < 0.001]. c Screening scheme of EAC-specific methylation markers. 72,354 CpG sites unmethylated in leukocytes and in gastric mucosae with and without H. pylori infection were selected from 485,512 probes. Among them, 51 CpG sites had significantly higher methylation levels in 30 EACs than in 30 GACs [Δβ (EAC − GAC) > 0.2, and P < 0.001]. Three of the 51 CpG sites, MARCKSL1, RIC8B, and RAB11FIP3, were unmethylated in all the GACs. d Volcano plot analysis for identification of EAC-specific methylation. 51 CpG sites had significantly higher methylation levels in EACs than in GACs

EAC-specific markers were searched for by the same strategy. From the 145,841 CpG sites in CpG islands, 72,354 CpG sites unmethylated (average β value < 0.2) in leukocytes and in gastric mucosae without (n = 8) and with (n = 8) H. pylori infection were isolated (Fig. 1c). A volcano plot of these 72,354 CpG sites showed that 51 CpG sites had significantly higher methylation levels in the 30 EACs than in the 30 GACs [Δβ (EAC − GAC) > 0.2, and P < 0.001] (Fig. 1d). Among them, 3 CpG sites (MARCKSL1, RIC8B, and RAB11FIP3) were unmethylated in all 30 GACs and were considered EAC-specific methylation markers (Supplementary Fig. S2a). Based on the maximum Youden index in the screening set, the optimal cut-off values of MARCKSL1, RIC8B, and RAB11FIP3 were calculated as 0.084, 0.143, and 0.134, respectively (Supplementary Fig. S2b).

Validation of the GAC- and EAC-specific methylation markers

To validate the predictive power of the GAC-specific markers (SLC46A3 and cg09177106) and the EAC-specific markers (MARCKSL1, RIC8B, and RAB11FIP3), DNA methylation levels were analyzed by Infinium BeadChip array in an independent validation set of 18 GACs and 18 EACs. The two GAC-specific markers had significantly higher methylation levels in GACs than in EACs (both P < 0.001) (Fig. 2a). Using the cut-off values established in the screening set, SLC46A3 had a sensitivity of 77.8% and a specificity of 100%, and cg09177106 had a sensitivity of 83.3% and a specificity of 94.4% (Supplementary Table S4). Likewise, the three EAC-specific markers had significantly higher methylation levels in EACs than in GACs (all P < 0.001) (Fig. 2b). Using the cut-off values established in the screening set, MARCKSL1 had a sensitivity of 100% and a specificity of 88.9%, RIC8B a sensitivity of 100% and a specificity of 100%, and RAB11FIP3 a sensitivity of 94.4% and a specificity of 88.9% (Fig. 2b).
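Because each validation group contains 18 samples, the reported percentages correspond to whole-sample counts (77.8% = 14/18, 83.3% = 15/18, 94.4% = 17/18, 88.9% = 16/18). The sketch below computes sensitivity and specificity at a fixed cut-off; the β values in any usage are hypothetical, and calling a sample positive when β ≥ cut-off is an assumed convention.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(positives: Sequence[float], negatives: Sequence[float],
                            cut_off: float) -> Tuple[float, float]:
    """Sensitivity and specificity when a sample is called positive if beta >= cut_off."""
    if not positives or not negatives:
        raise ValueError("both groups must be non-empty")
    tp = sum(1 for v in positives if v >= cut_off)  # true positives among target tumors
    tn = sum(1 for v in negatives if v < cut_off)   # true negatives among the other tumors
    return tp / len(positives), tn / len(negatives)
```

For instance, 18 hypothetical GAC β values of which 14 exceed 0.03, together with 18 EAC values all below 0.03, reproduce the 77.8% sensitivity and 100% specificity reported for SLC46A3.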

Fig. 2

Validation of GAC-specific and EAC-specific methylation markers. a Methylation levels of GAC-specific markers, SLC46A3 and cg09177106, in the validation set. SLC46A3 and cg09177106 had significantly higher methylation levels in 18 GACs than in 18 EACs. b Methylation levels of EAC-specific markers, MARCKSL1, RIC8B, and RAB11FIP3, in the validation set. MARCKSL1, RIC8B, and RAB11FIP3 had significantly higher methylation levels in 18 EACs than in 18 GACs. Box plots represent mean methylation levels. Error bars represent SD. Two-sided Student’s t test was conducted

Application to FFPE samples

Implementation of a cost-effective method, such as pyrosequencing, is essential for clinical application. In addition, since most clinical samples are FFPE samples, we established a pyrosequencing system for the GAC- and EAC-specific markers in FFPE samples. We attempted to design pyrosequencing primers for all five candidate markers and succeeded for the two GAC-specific markers (SLC46A3 and cg09177106) and one of the three EAC-specific markers (MARCKSL1) (Supplementary Table S3). Consistency between the methylation levels obtained by pyrosequencing and those obtained by the Infinium BeadChip array was analyzed using DNA from 30 cell lines whose Infinium methylation levels were available from our previous studies [24,25,26,27] or obtained in this study; the two GAC-specific markers and one EAC-specific marker were measured by pyrosequencing in 17–18 of these cell lines. Correlation coefficients between the values obtained by the two methods were 0.947 for SLC46A3 (P < 0.001), 0.935 for cg09177106 (P < 0.001) (Supplementary Fig. S3a), and 0.934 for MARCKSL1 (P < 0.001) (Supplementary Fig. S3b).

DNA methylation levels of the GAC- and EAC-specific markers were measured in 37 GAC and 18 EAC FFPE samples by pyrosequencing. The two GAC-specific markers had higher methylation levels in GACs than in EACs (SLC46A3, P = 0.0001; cg09177106, P = 0.0028) (Fig. 3a). Analysis of the Cancer Cell Line Encyclopedia database (www.broadinstitute.org/ccle) showed that SLC46A3 methylation and mRNA expression were strongly negatively correlated in 831 cancer cell lines (R = −0.592, P = 9.8 × 10⁻⁸⁰) (Supplementary Fig. S4) [34]. In contrast, the EAC-specific marker MARCKSL1 did not differ between the GAC and EAC FFPE samples (Fig. 3b).

Fig. 3

Application of GAC-specific and EAC-specific methylation markers to FFPE samples. a Methylation levels of GAC-specific markers, SLC46A3 and cg09177106, in the FFPE samples. SLC46A3 and cg09177106 had significantly higher methylation levels in 20 GACs than in 7 EACs. b Methylation levels of one EAC-specific marker, MARCKSL1, in the FFPE samples. Methylation levels of MARCKSL1 were not different between FFPE GAC and EAC samples. Box plots represent mean methylation levels. Error bars represent SD. Two-sided Student’s t test was conducted

Prediction of tissue origin of EGJ adenocarcinoma by the GAC-specific markers

First, DNA methylation levels of the GAC-specific markers, SLC46A3 and cg09177106, were analyzed in GACs from Western countries. Both markers also had high methylation levels in Western GACs, similar to Japanese GACs (Supplementary Figure S5). Then, to further evaluate clinical utility, the fraction of GACs in a TCGA cohort of EGJ adenocarcinomas from Western countries (n = 46) was assessed using the GAC-specific markers. SLC46A3 and cg09177106 showed high methylation levels in six and four EGJ adenocarcinomas, respectively, with one EGJ adenocarcinoma positive for both. In total, 9 of 46 EGJ adenocarcinomas (19.6%) were predicted to be GACs (Fig. 4a, b). This result suggested that most EGJ adenocarcinomas in Western countries originate from the esophagus.
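The count of 9 predicted GACs follows from calling a tumor GAC when either marker exceeds its cut-off (6 + 4 − 1 overlapping case = 9; 9/46 = 19.6%). The sketch below shows this OR rule using the cut-off values reported for the screening set; the sample IDs, the β values in the test, and the "β ≥ cut-off" calling convention are hypothetical.

```python
from typing import Dict, Set

# Cut-off values from the screening set; the ">= cut-off" calling rule is an assumption.
CUT_OFFS = {"SLC46A3": 0.03, "cg09177106": 0.061}


def predicted_gacs(betas: Dict[str, Dict[str, float]],
                   cut_offs: Dict[str, float] = CUT_OFFS) -> Set[str]:
    """Samples called GAC by at least one marker (OR rule across markers)."""
    return {
        sample
        for sample, values in betas.items()
        if any(values[marker] >= cut for marker, cut in cut_offs.items())
    }
```

Counting the union rather than summing per-marker calls avoids double-counting the tumor that is positive for both markers.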

Fig. 4

Prediction of tissue origin of EGJ adenocarcinomas by two GAC-specific markers, SLC46A3 and cg09177106. a DNA methylation levels of SLC46A3 in 46 TCGA EGJ adenocarcinomas. Six of 46 (13.0%) TCGA EGJ adenocarcinomas were predicted to be GACs (fresh-frozen samples, cut-off value = 0.03). b DNA methylation levels of cg09177106 in 46 TCGA EGJ adenocarcinomas. Four of 46 (8.7%) TCGA EGJ adenocarcinomas were predicted to be GACs (fresh-frozen samples, cut-off value = 0.061)

Discussion

We predicted the tissue origin of EGJ adenocarcinomas using DNA methylation profiles. Although the choice of treatment protocol for Siewert type II adenocarcinomas is still controversial between Eastern and Western countries [35, 36], the predictive markers developed here might be useful for determining the treatment strategy. Since the two GAC-specific markers, SLC46A3 and cg09177106, have very high specificity for detecting GACs among EGJ adenocarcinomas, patients whose tumors are identified as GACs could benefit from treatment strategies for gastric adenocarcinoma. Before these markers can be used clinically, future studies will need to establish their sensitivity and specificity in FFPE samples.

The marker CpG site of SLC46A3 is located in the TSS200 region of the gene within a CpG island, so its methylation was expected to be associated with expression. The negative correlation between its methylation and expression in the Cancer Cell Line Encyclopedia database supported this expectation. In general, promoter CpG islands of genes with low expression are more likely to be methylated than those of genes with high expression [30]. SLC46A3 methylation in GACs, but not in EACs, may therefore reflect low SLC46A3 expression in the stomach and high expression in Barrett’s esophagus, which is considered the origin of EACs [37]. In contrast, cg09177106 is located in a CpG island outside of any genic region. Therefore, its methylation cannot be linked to the expression of a specific gene, and the mechanism by which it is specifically methylated in GACs remains unknown.

Based on the markers isolated, the TCGA EGJ adenocarcinomas were actually a mixture of GACs and EACs, supporting a previous finding that Siewert type II EGJ adenocarcinomas may be a mixture of type I and type III adenocarcinomas [38]. However, the fraction of GACs among TCGA EGJ adenocarcinomas was only 19.6%, suggesting that most EGJ adenocarcinomas in Western countries are derived from the esophagus. Considering the sensitivities of the two markers (SLC46A3, 77.8%; cg09177106, 83.3%), the actual GAC fraction in the TCGA cohort can be estimated at around 25%. It has been reported that patients with Siewert type II adenocarcinoma in South Korea have a prognosis similar to that of patients with type III, most of which are GACs [39]. The high incidence of GAC among EGJ adenocarcinomas in Eastern countries has been attributed to the high incidence of H. pylori infection [40, 41], and the high incidence of EAC in Western countries to the high incidence of gastroesophageal reflux disease and Barrett’s esophagus [42, 43].
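The text does not state how the corrected fraction of about 25% was derived. One possible reconstruction, assuming a specificity (Sp) of 1 and applying the Rogan–Gladen correction for imperfect sensitivity (Se) to the observed positive fraction p = 9/46, gives values close to that figure, using the validation-set sensitivities of cg09177106 (15/18) and SLC46A3 (14/18):

```latex
\hat{\pi} = \frac{p + \mathrm{Sp} - 1}{\mathrm{Se} + \mathrm{Sp} - 1}
\;\overset{\mathrm{Sp}=1}{=}\; \frac{p}{\mathrm{Se}}, \qquad
\frac{9/46}{15/18} = \frac{162}{690} \approx 0.235, \qquad
\frac{9/46}{14/18} = \frac{162}{644} \approx 0.252
```

Because calling a tumor GAC by either marker can only raise sensitivity above that of each marker alone, this single-marker correction probably errs on the high side.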

Although the methylation levels of one EAC-specific marker, MARCKSL1 (CHR1, 32802033), were highly correlated between pyrosequencing and the Infinium BeadChip array, they did not differ between EACs and GACs in the FFPE samples. This suggested that the marker CpG site is susceptible to DNA degradation in FFPE samples. To address this issue, we re-examined data from our previous study of paired fresh-frozen and FFPE samples from the same specimens, in which MARCKSL1 showed a significant difference between the two sample types. In contrast, the two GAC-specific markers, SLC46A3 and cg09177106, showed consistent values (Supplementary Table S5). Therefore, the MARCKSL1 probe (CHR1, 32802033) was considered susceptible to DNA degradation during formalin fixation.

As alternative EAC-specific markers, we also screened candidate CpG sites located outside of CpG islands. One CpG site (DAXX, cg03369465) was identified as a candidate EAC-specific marker (screening set, P = 7.2 × 10⁻⁶; validation set, P = 7.1 × 10⁻⁶) (Supplementary Fig. S6). However, the DNA methylation levels of its adjacent CpG sites were discordant with that of cg03369465. Such discordance between closely neighboring probes is often caused by unstable methylation values from an unreliable probe, and we therefore discarded this probe.

Although both GAC-specific markers, SLC46A3 and cg09177106, had high specificity in both screening and validation sets, the sensitivity is still insufficient to be applied as a routine standard to identify the tissue origin of EGJ adenocarcinomas. Therefore, further screening of additional markers, which can be combined with markers identified in this study, will be necessary. To conduct additional screening, an increase in sample number, especially for EACs, is critical. Once a sufficient number of EAC samples are collected in future studies, efforts to identify additional combination markers should be made.

In conclusion, two GAC-specific DNA methylation markers, SLC46A3 and cg09177106, had high specificity for predicting the tissue origin of EGJ adenocarcinomas. These markers are expected to help determine the therapeutic strategy for patients with EGJ adenocarcinoma, especially Siewert type II adenocarcinoma.