Introduction

Extensive genomic and epigenomic analyses of a variety of human cancers, including gastric cancers (GCs), have been and are being conducted [1–4]. However, these analyses are almost always affected by contamination from coexisting normal cells in primary cancer samples. Although genomic analyses are designed to detect mutations even in a small fraction of cells, they still fail to detect gene mutations in samples with a low fraction of cancer cells [5]. Moreover, epigenomic and gene expression analyses are heavily affected by the fraction of cancer cells [6]. To overcome the contamination from normal cells, laser-capture microdissection (LCM) is conducted [7, 8]. However, LCM is labor-intensive and time-consuming, and is practically impossible for diffuse-type GCs.

Even without purification of cancer cells, if the fraction of cancer cells in a sample can be assessed, a sample with an extremely low fraction can be excluded from subsequent analyses, or the data obtained can be corrected by that fraction. Such assessment has generally been conducted by an expert pathologist, which is time-consuming and almost impossible for diffuse-type GCs and for large numbers of samples. To overcome this limitation, efforts have been made to develop molecular markers. For example, cancer-cell-specific mutations identified by a single-nucleotide polymorphism microarray and next-generation sequencing can be used to assess the fraction of cancer cells [9, 10]. However, such mutations must be identified separately for each sample, and this approach carries a sizable research cost.

To overcome these issues, in our recent study, we successfully isolated CpG islands specifically methylated in esophageal squamous cell carcinoma (ESCC) cells [11]. Three genes were methylated in almost all ESCC cells, but were not methylated or were barely methylated in normal esophageal mucosae, and at least one of the three genes was methylated in virtually all of 28 ESCC cases analyzed. Therefore, a panel of the three genes was considered to be a DNA methylation marker for the fraction of cancer cells. Using the marker, we were able to correct the fraction of ESCC cells, and showed that tumor-suppressor genes were methylated in almost all cancer cells.

In this study, we aimed to isolate a DNA methylation marker that can be used to assess the fraction of cancer cells in GCs. Unlike in the esophagus, isolating such a marker in the stomach is far more difficult because gastric mucosae can have very high levels of DNA methylation owing to Helicobacter pylori infection [12–15], and GC samples are contaminated with such gastric mucosae. Therefore, we paid special attention to isolating marker genes not influenced by H. pylori infection.

Materials and methods

GC cell lines and tissue samples

Cell lines KATOIII, MKN45, NUGC3, MKN74, and MKN7 were purchased from the Japanese Collection of Research Bioresources (Tokyo, Japan), and the AGS cell line was purchased from the American Type Culture Collection (Manassas, VA, USA). Cell lines HSC39, HSC57, 44As3, and 58As9 were gifted by K. Yanagihara from the National Cancer Center, the TMK1 cell line was gifted by W. Yasui from Hiroshima University, and the GC2 cell line was established by M. Tatematsu at Aichi Cancer Center Research Institute.

A total of 56 primary GC samples (32 intestinal type and 24 diffuse type) were collected from surgical specimens of patients who had undergone gastrectomy, and 30 of the samples were used for our previous studies [1, 16]. Genome-wide DNA methylation and TP53 mutation data of the 30 GCs were obtained from one of the studies [1]. Peripheral leukocyte samples were collected from five healthy volunteers by a centrifugation method. Gastric mucosae were collected by endoscopic biopsy from 17 healthy volunteers (11 without and six with present H. pylori infection) and from noncancerous gastric mucosae of 27 GC patients. Among the 27 noncancerous gastric mucosae, 23 (nine without and 14 with present H. pylori infection) were used for our previous study [17]. H. pylori infection status was analyzed by a serum anti-H. pylori IgG antibody test (SRL, Tokyo, Japan), rapid urease test (Otsuka, Tokushima, Japan), or culture test (Eiken, Tokyo, Japan).

All of the samples, except for those used for LCM, were stored in RNAlater (Applied Biosystems, Foster City, CA, USA), and genomic DNA was extracted by the phenol–chloroform method. LCM was performed using formalin-fixed paraffin-embedded primary GCs by a Leica LMD7000 system [7, 18]. This study was conducted with the approval of the Institutional Review Board of the National Cancer Center. Written informed consent was obtained from all individuals.

Genome-wide DNA methylation analysis

Genome-wide DNA methylation analysis was performed using an Infinium HumanMethylation450 BeadChip array (Illumina, San Diego, CA, USA), which assessed the degree of methylation of 485,512 CpG sites. The methylation level of each CpG site was obtained as a β value, which ranged from 0 (completely unmethylated) to 1 (completely methylated). We excluded 11,551 CpG sites on the sex chromosomes, and the remaining 473,961 CpG sites were used for the analysis. Genomic blocks were defined as collections of CpG sites classified by their locations against transcription start sites and CpG islands [1].

Gene-specific DNA methylation analysis

Gene-specific DNA methylation levels were analyzed by quantitative methylation-specific PCR (qMSP). For DNA from surgical specimens in RNAlater, 1 µg was digested with BamHI, treated with bisulfite, purified, and suspended in 40 µl of Tris(hydroxymethyl)aminomethane–EDTA buffer, as described in [19, 20]. For formalin-fixed paraffin-embedded samples collected by LCM, DNA extraction and bisulfite treatment were conducted with an EpiTect Plus bisulfite kit (Qiagen, Hilden, Germany). qMSP was performed by real-time PCR using primers specific to methylated or unmethylated DNA (Table S1), the bisulfite-treated DNA, and SYBR Green I (BioWhittaker Molecular Applications, Rockland, ME, USA). The number of molecules in a sample was determined by comparing its amplification with that of standard DNA samples that contained known numbers of molecules (10^1 to 10^6 molecules). On the basis of the numbers of methylated and unmethylated molecules, the methylation level was calculated as the fraction of methylated molecules in the total number of DNA molecules (number of methylated molecules plus number of unmethylated molecules). As a fully methylated control, blood genomic DNA treated with SssI methylase (New England Biolabs, Beverly, MA, USA) was used. As a fully unmethylated control, blood genomic DNA amplified twice with Genomiphi (GE Healthcare, Piscataway, NJ, USA) was used [21].
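The quantification described above can be sketched as follows. This is a minimal illustration, assuming a log-linear standard curve (threshold cycle against log10 of the number of molecules), which is a common way to quantify real-time PCR data but is not specified in the text. The function names and the numbers used with them are hypothetical.

```python
import math

def fit_standard_curve(copies, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept.

    Assumption: the standards follow a log-linear curve; the fitting
    procedure is not described in the text.
    """
    xs = [math.log10(c) for c in copies]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_ct(ct, slope, intercept):
    """Convert a sample Ct into a number of molecules using the fitted curve."""
    return 10 ** ((ct - intercept) / slope)

def methylation_level(n_methylated, n_unmethylated):
    """Fraction of methylated molecules among all molecules (M / (M + U))."""
    total = n_methylated + n_unmethylated
    if total <= 0:
        raise ValueError("no DNA molecules detected")
    return n_methylated / total
```

For example, a sample with 300 methylated and 700 unmethylated molecules would have a methylation level of 0.3 under this definition.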

Gene expression analysis

Complementary DNA was synthesized from 1 µg of total RNA using SuperScript III (Invitrogen, Carlsbad, CA, USA). Quantitative reverse transcription PCR was performed using SYBR Green I and an iCycler thermal cycler. The measured number of complementary DNA molecules was normalized to that of GAPDH. The primers and PCR conditions are shown in Table S1.

Genomic DNA copy number analysis

Copy number alteration (CNA) of a specific genomic region was analyzed by quantitative real-time PCR using an iCycler thermal cycler and SYBR Green I. RPPH1 was used as a control gene located on a chromosomal region with infrequent CNA [22]. The number of DNA molecules in a sample was measured for the control gene and three regions flanking the target gene (Table S1). The number of DNA molecules of the target gene was normalized to that of the control gene, and the normalized number of DNA molecules in a sample was compared with that in human leukocyte DNA to obtain the CNA. All analyses were conducted in duplicate. A CNA was defined as a twofold or greater increase (gain) or a decrease to 0.5-fold or less (loss).
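The normalization and the gain/loss definition above can be sketched as follows. This is a minimal illustration with hypothetical function names; it treats a single target measurement, because the text does not state how the three flanking regions or the duplicate measurements were combined.

```python
def relative_copy_number(target_sample, control_sample,
                         target_leukocyte, control_leukocyte):
    """Fold change of the target region relative to leukocyte DNA.

    Each argument is a measured number of DNA molecules. The target is
    first normalized to the control gene (RPPH1 in the text), then the
    normalized value is compared with that of leukocyte DNA.
    """
    sample_ratio = target_sample / control_sample
    reference_ratio = target_leukocyte / control_leukocyte
    return sample_ratio / reference_ratio

def classify_cna(fold, gain_threshold=2.0, loss_threshold=0.5):
    """Apply the thresholds in the text: >= 2-fold is a gain, <= 0.5-fold is a loss."""
    if fold >= gain_threshold:
        return "gain"
    if fold <= loss_threshold:
        return "loss"
    return "none"
```

For instance, a sample with 4000 target and 1000 control molecules, compared with leukocyte DNA showing 2000 target and 1000 control molecules, gives a fold change of 2.0 and would be classified as a gain.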

Mutations of TP53 and mutant frequency

The TP53 mutation status and mutant frequency were obtained from our previous study [1]. Briefly, the mutation was analyzed by target sequencing using an Ion AmpliSeq cancer panel kit (Life Technologies, Carlsbad, CA, USA) and an Ion PGM next-generation sequencer.

Statistical analyses

The correlation was analyzed using Pearson’s product-moment correlation coefficients, and its P value was obtained by the parametric hypothesis test. A difference in the mean DNA methylation level was analyzed by Student’s t test. A result was considered significant when the P value was less than 0.05 by a two-sided test.
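For reference, the correlation coefficient named above can be computed as in the following sketch. The test statistic shown is the standard one for testing a Pearson correlation against zero (a t statistic with n − 2 degrees of freedom); that this is the exact parametric test used here is an assumption, and the function names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient of two samples."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two samples of equal length, n >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic_for_r(r, n):
    """t statistic for H0: correlation = 0, with n - 2 degrees of freedom.

    Undefined for |r| = 1 (division by zero).
    """
    return r * math.sqrt((n - 2) / (1.0 - r * r))
```

The two-sided P value would then be read from a t distribution with n − 2 degrees of freedom, for example with a statistics package.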

Results

Selection of regions specifically methylated in GCs by a genome-wide screening

To screen specific regions not methylated in normal cells and fully methylated in GC cells using the Infinium HumanMethylation450 BeadChip array, we searched for CpG sites (1) with β ≤ 0.2 in six samples of normal gastric mucosae, one sample of peripheral leukocytes, and four samples of noncancerous mucosae, and (2) with β ≥ 0.8 in at least six of 12 GC cell lines. A total of 1,006 CpG sites were isolated from 473,961 informative CpG sites on autosomes. Then, to screen regions frequently methylated in primary GCs, CpG sites for which β ≥ 0.3 in 20 or more of 30 primary GCs [1] were searched (Fig. 1a). From the 1,006 CpG sites, 18 CpG sites derived from 16 genomic regions were isolated (Table S2). From the 16 genomic regions, PRDM16 was excluded because its gene amplification was known [23], and five other regions were also excluded because they did not have neighboring CpG islands or known genes.
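The screening criteria above can be expressed as a simple filter. This is a minimal sketch, assuming that condition (1) must hold in every normal sample and that β values are available as per-site lists; the function name, data layout, and site identifiers are hypothetical.

```python
def screen_cpg_sites(beta_normal, beta_cell_lines, beta_primary,
                     unmeth_max=0.2, meth_min=0.8, min_cell_lines=6,
                     primary_min=0.3, min_primary=20):
    """Return CpG sites passing the three screening steps.

    Each argument maps a CpG site ID to a list of beta values:
      beta_normal     - normal mucosae, leukocytes, noncancerous mucosae
      beta_cell_lines - the GC cell lines
      beta_primary    - the primary GCs
    """
    selected = []
    for site, normals in beta_normal.items():
        # (1) unmethylated (beta <= 0.2) in all normal samples
        if any(b > unmeth_max for b in normals):
            continue
        # (2) fully methylated (beta >= 0.8) in at least six cell lines
        n_lines = sum(b >= meth_min for b in beta_cell_lines[site])
        if n_lines < min_cell_lines:
            continue
        # (3) methylated (beta >= 0.3) in at least 20 primary GCs
        n_primary = sum(b >= primary_min for b in beta_primary[site])
        if n_primary >= min_primary:
            selected.append(site)
    return selected
```

Keeping the thresholds as parameters makes it easy to see how the number of candidate sites changes if the criteria are relaxed or tightened.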

Fig. 1

Selection of specifically methylated regions by a genome-wide screening. a Specific genomic regions not methylated in normal cells and fully methylated in cancer cells were selected by a genome-wide screening using an Infinium HumanMethylation450 BeadChip array. Eighteen CpG sites derived from 16 genomic regions were isolated. b Five regions of five genes (OSR2, VAV3, PPFIA3, LTB4R2, and DIDO1) were selected because of their genomic structure and the availability of quantitative methylation-specific PCR (qMSP) primers. The genomic structure, including the location of a CpG island, transcription start site, introns, and exons, is shown at the top. The β values of the CpG sites analyzed using the bead array are shown in the middle, and the broken lines show the threshold used in the screening. A CpG map around the CpG site(s) is shown at the bottom. Vertical lines (solid or broken) show CpG sites, with broken lines showing CpG sites whose β values were measured by the bead array. Arrows show locations of primers for qMSP. M methylated, U unmethylated

For the remaining ten regions, we attempted to design primers for qMSP, and primers for both methylated and unmethylated DNA were successfully designed for five regions of five genes (OSR2, VAV3, PPFIA3, LTB4R2, and DIDO1) (Fig. 1b). To confirm the genome-wide DNA methylation data obtained by the bead array, qMSP was conducted using the 12 GC cell lines mentioned in “GC cell lines and tissue samples” and one sample of peripheral leukocytes. DIDO1 had slight methylation in the peripheral leukocytes, and was excluded from further analysis. The methylation levels of the other four genes (LTB4R2, OSR2, VAV3, and PPFIA3) obtained by qMSP were in good accordance with the bead array data (Fig. S1).

Isolation of genes not influenced by H. pylori infection

Gastric mucosae with H. pylori infection are known to have very high DNA methylation levels [12, 13]. To exclude genes influenced by H. pylori infection, the methylation levels of the four genes were analyzed in 23 gastric mucosa samples of H. pylori-positive (n = 14) and H. pylori-negative (n = 9) individuals, as well as four samples of peripheral leukocytes different from the one used for the initial screening. The LTB4R2 methylation level in the H. pylori-positive individuals was higher than that in the H. pylori-negative individuals and the four samples of peripheral leukocytes, showing that the LTB4R2 methylation level was affected by H. pylori infection. On the other hand, OSR2, VAV3, and PPFIA3 were almost unmethylated in the three groups (Fig. 2).

Fig. 2

Isolation of genes not influenced by Helicobacter pylori infection. Methylation levels of the four genes were analyzed by quantitative methylation-specific PCR in noncancerous gastric mucosae of H. pylori-positive (n = 14) and H. pylori-negative (n = 9) individuals, as well as four samples of peripheral leukocytes. LTB4R2 was excluded because its methylation level was higher in the H. pylori-positive individuals than in the H. pylori-negative individuals

We also analyzed the expression of OSR2, VAV3, and PPFIA3 using 17 normal gastric mucosa samples of H. pylori-positive (n = 11) and H. pylori-negative (n = 6) individuals. VAV3 was highly expressed in both H. pylori-positive and H. pylori-negative gastric mucosae, whereas OSR2 and PPFIA3 were only weakly expressed (Fig. S2).

High incidence of methylation of the three genes and their specificity using LCM-purified cells

To examine the incidence of methylation of the three genes in primary GCs, we performed qMSP using 26 independent primary GCs, and observed that at least one of the three genes was methylated in all of the 26 GCs (Fig. 3a). These data showed that, when used as a panel, the three genes would have a high coverage of primary GCs (100 % in this sample set).

Fig. 3

High incidence of methylation of the three genes and specificity of methylation using cells purified by laser-capture microdissection (LCM). a The incidence of hypermethylation of the three genes was analyzed in 26 independent primary gastric cancers (GCs) by quantitative methylation-specific PCR. At least one of the three genes was methylated in all of the 26 GCs. b Methylation levels of the three genes were analyzed in four primary GCs before LCM and four pairs of purified cancer and noncancer cells after LCM. At least one of the three genes was highly methylated in GC cells (more than 85 %), but all the three genes were barely methylated in noncancer cells (less than 5 %). Dotted rectangles show the panel of the three genes as a DNA methylation marker

To confirm that the three genes were highly methylated only in GC cells but not in coexisting noncancer cells, four pairs of cancer and noncancer cells were collected by LCM. We found that at least one of the three genes was highly methylated in GC cells (more than 85 %), but that all of them were barely methylated in noncancer cells (less than 5 %) (Fig. 3b). The highest methylation level of the three genes was considered to reflect the fraction of cancer cells, and we defined the panel of the three genes as a DNA methylation marker to estimate the cancer cell fraction in a GC sample.
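The definition of the marker above amounts to taking the maximum over the panel. A minimal sketch, with a hypothetical function name and made-up methylation levels:

```python
def estimate_cancer_cell_fraction(methylation_levels):
    """Estimate the cancer cell fraction as the highest methylation level
    among the panel genes (OSR2, VAV3, PPFIA3).

    methylation_levels maps gene name -> methylation level (0 to 1).
    Returns (fraction, gene), where gene is the panel gene with the
    highest level.
    """
    gene = max(methylation_levels, key=methylation_levels.get)
    return methylation_levels[gene], gene

# Illustrative values only, not data from the study.
sample = {"OSR2": 0.12, "VAV3": 0.55, "PPFIA3": 0.31}
fraction, gene = estimate_cancer_cell_fraction(sample)
```

Taking the maximum rather than the mean reflects the observation that a given GC may have only one of the three genes methylated, in which case the unmethylated genes would pull an average below the actual cancer cell fraction.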

Because DNA methylation levels of some genes can be influenced by age [24], we also analyzed the correlation between the methylation of the three genes and age. The methylation levels of the three genes were found to be independent of age (Fig. S3).

CNAs of the three genes

CNAs of a marker gene can affect the methylation level of its region in cancer samples [25]. Therefore, we analyzed CNAs of the three regions in the 20 GCs used for the bead array analysis (Fig. 4). VAV3 and PPFIA3 showed no CNAs of more than twofold or less than 0.5-fold. In contrast, OSR2 showed CNAs at low frequencies (more than twofold in one GC and less than 0.5-fold in two GCs). It was calculated that the deviation of the methylation level from the true cancer cell fraction would be 17.2 % when twofold or 0.5-fold CNA was present in cancer cells [11]. Therefore, the effect of the CNA of OSR2 was considered to be minimal in the estimation of the cancer cell fraction.
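The size of this effect can be explored with a simple mixing model. The sketch below is an illustration under stated assumptions, not necessarily the calculation used in [11]: the marker is assumed to be methylated on every copy in cancer cells and unmethylated in normal cells, and cancer cells are assumed to carry `fold` times the normal number of copies of the region. The function names are hypothetical.

```python
def measured_methylation(f, fold):
    """Methylation level expected for a true cancer cell fraction f when
    cancer cells carry `fold` times the normal copy number of the marker.

    Methylated molecules come from cancer cells (f * fold); unmethylated
    molecules come from normal cells (1 - f).
    """
    return f * fold / (f * fold + (1.0 - f))

def max_deviation(fold, steps=10000):
    """Largest absolute difference between the measured methylation level
    and the true fraction, scanned over f in [0, 1]."""
    best = 0.0
    for i in range(steps + 1):
        f = i / steps
        best = max(best, abs(measured_methylation(f, fold) - f))
    return best
```

Under these assumptions, the largest deviation is about 0.172 for both a twofold gain and a 0.5-fold loss, which matches the 17.2 % figure quoted above and suggests that it represents a worst case over all cancer cell fractions.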

Fig. 4

Copy number alterations (CNAs) of the three genes. CNAs of the three genes were analyzed by real-time PCR in the 20 gastric cancers (GCs) used for the bead array analysis. A significant CNA (gain or loss) was defined as a twofold or greater increase or a decrease to 0.5-fold or less, respectively. Only OSR2 showed CNAs, and at low frequency (twofold or greater in one GC; 0.5-fold or less in two GCs)

Correlation between the cancer cell fraction estimated by DNA methylation and that estimated by a genetic alteration

To evaluate the accuracy of the DNA methylation marker, 13 GCs with TP53 mutation were identified among the 30 GCs used for the bead array analysis, and the cancer cell fraction estimated by the marker was compared with the TP53 mutant frequency. A high correlation between the two methods was observed (r = 0.77, P < 0.001; Fig. 5). This result indicated that the cancer cell fraction estimated by the DNA methylation marker closely reflected the fraction of cancer cells in a tumor sample.

Fig. 5

Correlation between the cancer cell fraction estimated by DNA methylation and that estimated by a genetic alteration. The cancer cell fraction estimated by the DNA methylation marker was compared with the TP53 mutant frequency. A high correlation between the two methods was observed (r = 0.77, P < 0.001)

Application of the DNA methylation marker to correction of the bead array data

We applied the DNA methylation marker to correct the influence of contamination by normal cells in the data from the epigenomic analysis. For the 30 primary GCs used for the bead array analysis, we measured the fraction of cancer cells using the marker, and corrected the bead array data by division with the evaluated fraction. Unsupervised hierarchical clustering analysis was conducted using 263 genomic blocks selected because their downstream genes were silenced by aberrant methylation [1] (Fig. 6b). Compared with the heatmap before the correction (Fig. 6a), two samples, S20T and S22T, moved from the CpG island methylator phenotype (CIMP)-negative group to the CIMP-high group. The cancer cell fraction in these two samples was less than 20 % (Fig. 3a). After exclusion of these two samples and correction of the methylation levels, the clustering of the CIMP-high, CIMP-moderate, CIMP-low, and CIMP-negative GCs became much clearer (Fig. 6c). From these data, we concluded that the DNA methylation marker could be used to identify and exclude samples with an extremely low fraction of cancer cells, and to correct the molecular data.
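The correction described above can be sketched as follows. This is a minimal illustration with a hypothetical function name. Two details are assumptions not stated in the text: corrected values are capped at 1 so that they remain valid β values, and a cutoff of 0.2 is used for exclusion, chosen only because the two excluded samples had estimated fractions below 20 %.

```python
def correct_beta(beta, cancer_fraction, min_fraction=0.2):
    """Correct a bead array beta value by the estimated cancer cell fraction.

    Returns None when the sample should be excluded because its cancer
    cell fraction is below min_fraction (assumed cutoff). Otherwise
    returns beta / cancer_fraction, capped at 1.0 (assumed cap).
    """
    if cancer_fraction < min_fraction:
        return None
    return min(beta / cancer_fraction, 1.0)
```

For example, a β value of 0.3 in a sample estimated to contain 50 % cancer cells would be corrected to 0.6, on the premise that the methylation signal comes only from the cancer cells.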

Fig. 6

Application of the DNA methylation marker to the correction of the bead array data. a Unsupervised hierarchical clustering analysis of the 30 primary gastric cancers using DNA methylation profiles of 263 genomic blocks. b Two samples surrounded by a red square (S20T and S22T) moved from the CpG island methylator phenotype (CIMP)-negative group to the CIMP-high group after the Infinium HumanMethylation450 BeadChip array data had been corrected by the DNA methylation marker. c After exclusion of two samples with a low fraction of cancer cells, a heatmap using the corrected bead array data showed a much clearer clustering of CIMP-high, CIMP-moderate, CIMP-low, and CIMP-negative gastric cancers

Discussion

We successfully established a panel of three genes (OSR2, VAV3, and PPFIA3) as a marker to estimate the fraction of cancer cells in primary GCs. Using the DNA methylation marker, we were also able to identify and exclude samples with a low fraction of cancer cells, and to correct the methylation levels by the fraction of cancer cells. After this, the genome-wide DNA methylation profiles yielded clearer clustering of CIMP by unsupervised hierarchical clustering analysis. This is the first molecular marker for the cancer cell fraction in GC.

Compared with microscopic examination and analysis of genomic alterations, the DNA methylation marker has the advantage of simplicity: it requires neither an experienced pathologist nor paired normal samples. The marker is also likely to have broad coverage of primary GCs because it was methylated in 100 % of the 26 primary GCs used for validation. Further, we were easily able to use the marker to assess the cancer cell fraction even in diffuse-type GCs, for which even an expert pathologist has difficulty in estimating the cancer cell fraction. Finally, since the methylation levels of the three genes were independent of age, the marker was considered useful for estimating the cancer cell fraction irrespective of age.

The correlation of the cancer cell fraction estimated by the DNA methylation marker with TP53 mutant frequency was high (r = 0.77, P < 0.001). However, in two samples, the cancer cell fraction estimated by the marker was twice as large as that estimated by the TP53 mutant frequency. Since loss of heterozygosity can coexist with a mutation of TP53 in GCs, we speculated that the discrepancy between the two methods in the two GC samples might have been caused by the loss of heterozygosity of TP53.

Gastric mucosae, especially when infected with H. pylori, can have very high levels of DNA methylation, so we paid special attention to this point when isolating marker genes in this study. The panel of the three genes was not affected by H. pylori infection because the genes were barely methylated in H. pylori-positive mucosae. Only two samples from H. pylori-negative individuals showed high methylation, one of VAV3 and the other of PPFIA3. One possible reason for such high methylation levels in H. pylori-negative samples is that these two samples were contaminated with cancer cells, because they were taken from GC patients. Another possible reason is that the genes became methylated in noncancer cells during a past H. pylori infection.

A CNA can affect the methylation level of a marker gene. Therefore, we analyzed the CNAs of the three genes in 20 primary GCs used for the bead array analysis, and found CNAs of the three genes had little influence on the estimation of the cancer cell fraction. Regarding the expression of the three marker genes, only VAV3 was highly expressed in normal gastric mucosae. The region of VAV3, for which DNA methylation was analyzed, was outside the nucleosome-free region, suggesting that its transcription is not necessarily suppressed by the methylation.

In summary, a DNA methylation marker, namely the panel of the three genes, was isolated and shown to be suitable for estimating the cancer cell fraction in GCs. Application of the marker to correction of the bead array data showed promising results for improving the accuracy of molecular analysis. The DNA methylation marker is expected to be useful in many aspects of GC research.