Background

Cervical cancer is one of the leading causes of cancer-related mortality in women worldwide. Cervical intraepithelial neoplasia (CIN) is a premalignant transformation and abnormal growth (dysplasia) of the squamous cells of the cervix. Early screening together with timely treatment of precancerous lesions can substantially improve clinical outcomes, thus offering a unique opportunity for cervical cancer management. The widely used screening strategy, the cytology-based Pap smear, has been associated with a significant reduction in cancer incidence and mortality [1].

Besides detecting cervical carcinomas, cervical cancer screening aims to identify high-grade intraepithelial lesions (corresponding to histological grades CIN II and CIN III), which require surgical procedures to prevent further progression. Low-grade intraepithelial lesions (corresponding to histological grade CIN I), on the other hand, should not be overtreated with such procedures, as they have a high potential to regress spontaneously to normal [2]. However, the sensitivity of Pap smears for the detection of CIN II or higher grades is generally low [3,4]. Conversely, the highly sensitive high-risk human papillomavirus (HPV) DNA test tends to yield false positives [5-9]. A third strategy, direct colposcopy [10], requires interpretive expertise, is not amenable to high-throughput processing, and has low positive predictive value for low-grade squamous intraepithelial lesions [4,11]. Finally, even with histopathology specimens from cervical biopsies, objective CIN diagnosis can sometimes be challenging: the reproducibility of cervical histopathologic interpretations has been reported to be moderate and equivalent to that of monolayer cytologic interpretations [12]. Thus, an objective, high-throughput approach with high sensitivity and specificity is urgently needed for the early diagnosis of cervical cancer.

Numerous investigations have reported that gene-specific hypermethylation occurring in the pre-invasive and invasive phases of cervical cancer may provide promising biomarkers for early diagnosis [13-21]. However, a review of 51 published cervical cancer methylation studies involving 68 different genes concluded that no single methylation marker from these studies was suitable as a cervical cancer biomarker [13]. With a few exceptions [22,23], most identified biomarkers lacked sufficient independent validation. Currently, therefore, it is as important to validate existing candidates as to identify new ones. Another concern is the inconsistency of results across methylation studies. Most of these studies used methylation-specific PCR (MSP) or quantitative methylation-specific PCR (QMSP) methods [24], analyzing one or two CpGs per gene that were chosen essentially at random, according to the feasibility of primer/probe design, on the assumption that hypermethylation is uniform across the CpG island promoter and that the analyzed CpGs are representative. The measured methylation frequencies varied widely for the same gene, even between studies that used common specimen types or similar assays [13]. A recent study by Lai et al. [18] on a Chinese cohort with squamous cell carcinoma (SCC) identified six novel genes (SOX1, PAX1, NKX6-1, LMX1A, ONECUT1, and WT1) that were more frequently methylated in SCC tissues than in normal controls. Some of these markers were subsequently verified by the same laboratory using QMSP (MethyLight) [25,26]. However, two of these methylation markers performed differently in the two studies [18,25]. Moreover, an independent laboratory using QMSP on liquid-based cytology samples from a UK cohort found that only one of these genes, SOX1, discriminated between high-grade squamous intraepithelial lesions and controls [15].
Surprisingly, although such discrepancies cast considerable doubt on the validity of the identified biomarker candidates, few studies have examined their potential causes.

We suspected that the different CpGs assayed by different groups for the same genes might be a major source of this variability, and we therefore decided to systematically evaluate multiple CpGs from several genes as methylation biomarkers. For early detection of cervical cancer, it is clinically more useful to find epigenetic correlates that discriminate between histologically distinct CIN grades than between SCC and normal cervices, yet few studies have focused exclusively on CIN development. We set out to evaluate the utility of methylation biomarkers in distinguishing high-grade from low-grade CIN lesions. Our aims were twofold: (1) to evaluate the relative importance of different CpGs as methylation biomarkers, and thus to determine whether randomly selecting CpGs to assay, as practiced in most methylation biomarker studies, is justified; and (2) to find an optimal panel of candidate hypermethylated CpGs with high sensitivity and specificity for precancerous CIN II or CIN III.

To these ends, we evaluated 34 CpG units from five candidate genes, using definitively diagnosed formalin-fixed paraffin-embedded (FFPE) tissue specimens from an independent cohort of 100 Chinese patients with precancerous cervical lesions and normal controls, who shared a common genetic background with the subjects of the original gene-discovery study [18]. We used a DNA methylation quantification technology based on matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) (EpiTYPER, Sequenom) [27], which is fundamentally different from the commonly used MSP or QMSP methods. This method directly quantifies the percentage of methylated DNA in a CpG unit, with results highly concordant with bisulfite sequencing [28]. The technology has already been applied to evaluate methylation patterns in leukemia [29] and non-small cell lung cancer [30].

To rank CpG units by discriminating power, we used traditional nonparametric statistics as well as Random Forest, a method particularly well suited to analyzing mass spectrometry data in biomarker identification studies for cancer classification [31]. We then used a support vector machine (SVM) [32-35] with cross-validation and bootstrap resampling, which randomly partitioned the samples into training and validation sets, to construct a classification model and assess its predictive power. Our results showed that choosing the right CpG unit to assay is critical, and that a panel of specific CpG methylation markers constructed by a computational algorithm allowed us to separate high-grade CIN from low-grade CIN or healthy subjects with high accuracy, providing a candidate biomarker panel for the early detection of cervical cancer.

Results

Survey of CpG methylation of five genes by MALDI-TOF-based EpiTYPER assay

A total of 100 FFPE cervical samples with histopathological classifications of normal (N = 16), CIN I (N = 31), and CIN II or CIN III (N = 53, comprising 4 CIN II and 49 CIN III) were obtained retrospectively from a cohort of ethnic Han Chinese women. All CIN samples tested HPV positive (data not shown), and consensus histological diagnoses were provided independently by two pathologists, with confirmation by p16 and Ki-67 immunohistochemical staining (Figure 1). There was no significant age difference among the groups (Table 1).

Figure 1

CINs with p16 and Ki-67 immunostaining. Immunohistochemical examination of p16 (A–C) and Ki-67 (D–F) protein expression in histologically diagnosed CIN I (A, D), CIN II (B, E), and CIN III (C, F) tissues. Magnification ×40.

Table 1 Sample characteristics and number of samples whose CpG islands for each gene were successfully amplified for EpiTYPER analysis

To analyze the methylation status of PAX1, NKX6-1, SOX1, LMX1A, and ONECUT1, one CpG island (CGI) per gene was chosen for amplification (Figure 2 and Table S1 in Additional file 1). Each CGI contained five to eight CpG units that could be analyzed by EpiTYPER (Table 2).

Figure 2

The positions of CpGs analyzed by EpiTYPER. Drawings are schematic and not to scale. The orientation of each gene is indicated by the arrow at the end. Boxes indicate exons or UTRs; vertical lines indicate individual CpGs in the CGI regions, and the horizontal bars indicate the regions analyzed by EpiTYPER.

Table 2 The number of CpG units for each gene

After DNA extraction, bisulfite treatment, and PCR amplification, 72–94% of the samples, depending on the target gene, generated sufficient amplicons for subsequent EpiTYPER analysis (Table 1), suggesting that the assay design and sample processing protocol were suitable for archival FFPE samples. EpiTYPER simultaneously quantifies all applicable CpG units within a CGI amplicon in a single well. Quantitative methylation assessment of all 34 CpG units (Table 2) in the 100 samples was completed on two 384-well plates in a single day.

Unexpectedly, the mean methylation levels of the CGIs of four of the five genes did not differ significantly between the CIN II/III and CIN I/normal groups (Figure 3A), indicating that the overall methylation status of the examined CGIs of NKX6-1, SOX1, LMX1A, and ONECUT1 did not change during CIN development. However, when we examined individual CpG units, statistically significant differences in methylation between the CIN II/III and CIN I/normal groups emerged for eight CpG units (Figure 3B). PAX1 had the highest methylation level among the five genes, with four CpG units differing significantly between the two groups (P < 0.05). Although the overall methylation level of LMX1A was low, one CpG unit (L_CpG 28.29.30) was differentially methylated between the two groups (Figure 3B). NKX6-1 and SOX1 exhibited moderate methylation levels and contained one (N_CpG 9.10) and two (S_CpG 17.18 and S_CpG 34.35) significant CpG units, respectively. ONECUT1 contained no differentially methylated CpG units (Figure 3B). These data suggest that methylation within CGIs is not uniform during CIN development.

Figure 3

The methylation patterns of the five genes. (A) Dot plot of the methylation levels for each gene. Each dot represents a sample, with methylation level averaged over all CpG units analyzed for the gene. (B) Bar graph of the methylation levels of individual CpG units. Open and filled columns denote CIN I/normal and CIN II/III group, respectively. Each bar represents mean methylation of all samples in the group. Error bars indicate SEM; *P < 0.05 for Mann-Whitney U test.

Validation by bisulfite sequencing

To validate the EpiTYPER methylation results with an independent method, we performed bisulfite sequencing of three genes in eight samples (Figure 4). As expected, the sequencing results agreed with the quantitative EpiTYPER results. Moreover, sequencing confirmed that DNA methylation was not uniform, as some CpGs tended to be methylated more frequently than others (Figure 4).

Figure 4

Bisulfite sequencing (BS) of CpGs assayed by EpiTYPER. Three genes were bisulfite-sequenced in eight cervical samples of various stages. In each panel, sample ID is shown at the top, and EpiTYPER results are shown below the gene name as the average level for all measured CpGs. BS results are summarized as filled circles representing methylated CpGs and open circles representing unmethylated CpGs. Each line is an independently sequenced clone. Each column is a CpG of the gene.

Significance ranking of CpG units by Random Forest

To evaluate the contribution of the 34 CpG units to separating CIN II/III subjects from CIN I/normal subjects, we employed the Random Forest algorithm (see “Methods”) in addition to the standard nonparametric statistical test. Figure 5 shows the mean decrease in accuracy (MDA) of each of the 34 CpG units; a higher MDA indicates greater importance of a CpG unit as a predictor [36]. We then iteratively tested the performance of classifiers built from different combinations of features. Classifier performance was optimal when the selected features were PAX1_CpG12, SOX1_CpG34.35, LMX1A_CpG28.29.30, NKX6-1_CpG9.10, and PAX1_CpG6.7.8. Table 3 presents the Mann-Whitney U test results for these five CpG units.
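The text does not specify how the features were assembled. One simple reading, sketched below purely as an assumption, is to walk down the MDA ranking and evaluate nested top-k subsets, keeping the best-scoring one. `best_top_k` and the `evaluate` callback are hypothetical names; in practice `evaluate` might return the cross-validated accuracy of an SVM trained on the given CpG units.

```python
from typing import Callable, List, Sequence, Tuple


def best_top_k(
    ranked: Sequence[str],
    evaluate: Callable[[List[str]], float],
) -> Tuple[List[str], float]:
    """Return the best-scoring nested subset ranked[:k] and its score.

    `ranked` lists features from most to least important (e.g. by MDA).
    `evaluate` is a hypothetical scoring callback (higher is better).
    """
    if not ranked:
        raise ValueError("need at least one feature")
    best_subset: List[str] = []
    best_score = float("-inf")
    for k in range(1, len(ranked) + 1):
        subset = list(ranked[:k])
        score = evaluate(subset)
        # Strict '>' keeps the smaller subset when scores tie.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score
```

For example, `best_top_k(["a", "b", "c", "d"], lambda f: -abs(len(f) - 3))` returns `(["a", "b", "c"], 0)`. Preferring the smaller panel on ties favors parsimony; other search strategies, such as stepwise addition of any remaining feature, would also fit the description in the text.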

Figure 5

Feature importance for prediction of CIN II/III according to Random Forest algorithm. MDA: mean decrease in accuracy.

Table 3 Summary of nonparametric statistics for the selected CpG units in building classifier

Classification model built by SVM

To build an optimal classification model, the methylation levels of these five CpG units were entered into an SVM with a radial basis function (RBF) kernel (see “Methods”). The classifier showed optimal and robust performance with C = 410 and γ = 14. With 200 rounds of bootstrap resampling and 10-fold cross-validation, the classification model showed high predictive power, with sensitivity, specificity, and accuracy of 0.804 ± 0.028, 0.812 ± 0.008, and 0.808 ± 0.014 (mean ± SD), respectively (Table 4).
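For reference, the quantities named above can be written out as follows. This assumes the standard textbook forms: the RBF kernel as parameterized in common SVM software, the soft-margin SVM objective in which C weights the slack penalty, and the usual confusion-matrix definitions with CIN II/III as the positive class. The text does not state these formulas explicitly. Here x_i is the vector of the five CpG-unit methylation levels for sample i, y_i ∈ {−1, +1} is its class, and φ is the feature map implied by K.

```latex
\begin{aligned}
K(\mathbf{x}_i, \mathbf{x}_j) &= \exp\!\left(-\gamma \lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2\right) \\
\min_{\mathbf{w}, b, \boldsymbol{\xi}}\ & \tfrac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{n} \xi_i
  \quad \text{s.t. } y_i\left(\mathbf{w}^\top \phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i,\ \xi_i \ge 0 \\
\text{Sensitivity} &= \frac{TP}{TP + FN}, \quad
\text{Specificity} = \frac{TN}{TN + FP}, \quad
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\end{aligned}
```

A larger C penalizes training misclassifications more heavily, and a larger γ makes the kernel more local. The meaning of a given γ value therefore depends on how the methylation levels were scaled.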

Table 4 Evaluation parameters of SVM classifier model trained by the best parameter setting

Discussion

CpG hypermethylation of key genes involved in cervical cancer development may provide promising biomarkers for early diagnosis [13-18,23]. However, progress in cervical methylation biomarker discovery has been hampered by inconsistent results that defy validation [13]. Most previous studies used nonquantitative MSP [18,37], which is highly sensitive but cannot distinguish tumors with substantial methylation from those with biologically insignificant methylation levels [28]. Recently developed quantitative MSP (QMSP) methods such as MethyLight provide a better alternative and are increasingly used in methylation studies [24,38,39]. However, QMSP detects methylation at only a few CpGs (equivalent to one CpG unit in our assay, usually two to three CpGs) [24]. In CpG-rich regions it is difficult to design a probe that does not overlap flanking CpGs yet still has a sufficiently high annealing temperature for robust, specific hybridization to the highly AT-rich sequence produced by bisulfite conversion. This limits the applicability of QMSP and may explain the variable results sometimes obtained for the same gene with different primer/probe designs [15,18,25]. In contrast, the primers used in the EpiTYPER assay contained no CpGs (Table S1 in Additional file 1), resulting in more consistent results. The EpiTYPER assay has shown much better correlation with gold-standard sequencing results than MSP-based assays [28]. The technique can analyze almost all CpGs covered by a gene's amplicon, instead of the two to three randomly chosen CpGs assayed by QMSP. We used this quantitative platform to survey 5–8 CpG units (containing 13–35 CpGs) within a CGI of each of the five genes. We found that randomly selecting CpGs to assay gene methylation can be problematic, as CpG methylation is not uniform during CIN development (Figure 3B).
For all genes except PAX1, the overall methylation status, defined as the average level of all CpGs within the CGI, was similar between CIN II/III and CIN I/normal samples (Figure 3A). This contrasts with conclusions based on a much more limited number of CpGs assayed by MSP [18,25]. Only selected CpG units may serve as markers distinguishing CIN II/III from CIN I/normal samples (Figure 3B). Consistent with our observation, other reports have demonstrated that aberrant DNA methylation of only specific CpGs within a CGI is responsible for downregulation of gene expression [40-44], and more recently, a substantial number of studies have reported that a single specific CpG can serve as a strong prognostic or predictive indicator in various cancers [28,45-47].

Our findings highlight the importance of studying the detailed methylation pattern within a CGI: they reveal the temporal complexity of DNA methylation during cervical cancer development and emphasize the importance not only of methylated marker genes but also of specific CpGs for identifying high-grade CINs. Choosing the right CpG unit to assay is therefore critical, and previous inconsistencies among laboratories regarding the methylation status of the same genes may be due to differences in CpG choice.

We also note that CpG units significant for CIN development can reside beyond the promoter, in exonic or intronic regions (Figure 2), just as CpG methylation outside the promoter region can be responsible for tumor suppressor inactivation in breast cancer [48]. Although we did not evaluate all CpGs of these marker genes, our findings may provide diagnostically useful methylation biomarkers for high-grade CIN. Moreover, the MALDI-TOF-based technology gave consistent results when assaying these CpG markers in a multiplexed, high-throughput fashion suitable for clinical applications.

Diagnostic classifiers built on multigene methylation panels have shown better performance in predicting a wide variety of tumors [49]. However, such studies are commonly susceptible to overfitting [50,51]. To address this, we used an SVM to construct the classifier model [52-54] and coupled it with 10-fold cross-validation (in which the samples were randomly partitioned into ten subsets, each of which served once as the test set for a model trained on the other nine) and 200 rounds of bootstrap resampling (in which the random partitioning and cross-validation were repeated 200 times). Such procedures help guard against overfitting and provide a reliable estimate of model performance [55]. Compared with the classification methods used in previous studies [18,25], SVM is a statistical learning method with greater diagnostic accuracy [32,33,54,56] and more consistent performance at our sample size [57].

In previous studies, hypermethylated genes selected to predict invasive cervical cancer achieved a sensitivity of about 90%, whereas high-grade CIN lesions were predicted with much lower sensitivity (~70%) [20,58]. Our panel of CpG units achieved both sensitivity and specificity of ~80%, a valuable balance for identifying high-risk samples. The high specificity of our classifier may be particularly valuable in developing countries such as China, where cervical cancer prevalence remains relatively high.

Identification of a set of reliable CIN biomarkers provides a foundation for potential future applications such as quality assurance of histopathological classification and, if the markers are validated in exfoliated cells from cervical scrapings or Pap smears, noninvasive cervical cancer screening. Our panel of CpG units and the EpiTYPER platform could form part of an objective, high-throughput strategy for early cervical cancer detection.

Conclusions

Our findings highlight the significance of studying the detailed methylation pattern within a CGI and emphasize the importance not only of methylated marker genes but also of specific CpGs for identifying high-grade CINs. We demonstrated the value of MALDI-TOF technology in methylation biomarker identification and obtained a five-CpG panel with promising potential as a biomarker for the early detection of cervical cancer.

Methods

Samples

Formalin-fixed paraffin-embedded (FFPE) cervical biopsy samples were obtained from outpatients visiting the Beijing Aerospace Central Hospital from 2007 to 2012. All histological specimens were tested for HPV DNA (Hybrid Capture-2 kit; Qiagen, Gaithersburg, MD) and immunostained for p16 and Ki-67 (Beijing Zhong Shan Golden Bridge Biological Technology Co., Ltd.) [59]. The specimens were reviewed independently by two expert pathologists from the Departments of Pathology at Aerospace Central Hospital and Beijing Tiantan Hospital, and only specimens with concordant, unambiguous diagnoses were chosen for the study. Exclusion criteria included uncertain histopathological classification, pregnancy, chronic or acute systemic viral infection, presence of other cancers, skin or genital warts, and an immunocompromised state. Informed consent was obtained from all patients and controls. The study followed the ethical guidelines of the Institutional Review Board of the Aerospace Central Hospital.

DNA preparation and bisulfite treatment

Genomic DNA was extracted from archival FFPE blocks using an established protocol [60]. DNA was quantified using the NanoDrop 2000 Spectrophotometer (Thermo Fisher Scientific Inc, CA.). Only DNA samples exhibiting an A260/A280 ratio between 1.8 and 2.0 were considered for further testing.

The EZ DNA Methylation Kit (Zymo Research Corporation, CA) was used to bisulfite-convert the extracted genomic DNA according to the manufacturer’s protocol, with Sequenom recommendations.

MALDI-TOF-MS-based DNA methylation analysis

The MALDI-TOF-MS-based DNA methylation assay (EpiTYPER) was performed according to the manufacturer’s specifications [27] (Sequenom Inc., CA). Bisulfite-modified genomic DNA was used as the PCR template. PCR primers (Table S1 in Additional file 1), which contain no CpGs and amplify methylated and unmethylated sequences equally, were designed using EpiDesigner (http://www.epidesigner.com/) with the following constraints: (1) the amplicon was located in a CGI of the target gene, (2) the amplicon was shorter than 300 bp to increase the amplification success rate for FFPE samples, and (3) the amplicon covered as many CpGs as possible. The reverse primers included a T7 promoter tag [5′-cagtaatacgactcactataggg-3′] at the 5′ end. Only samples successfully amplified with a clear, specific PCR band of the expected size were included in further analysis. After PCR amplification, T7 RNA polymerase (Sequenom Inc.) was used to transcribe single-stranded RNA in vitro, which was then cleaved base-specifically by RNase A [27] (MassCLEAVE, Sequenom Inc.). The cleavage products, each containing either a single CpG or a short stretch of adjacent CpGs, were analyzed using a MALDI-TOF mass spectrometer (Sequenom Inc.). The peak areas of the mass signals derived from methylated and unmethylated template DNA were used to estimate the relative methylation level (ranging from 0 to 1, i.e., 0 to 100%). The methylation level of each CpG unit represents the average over the CpGs within the unit.
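The peak-area arithmetic can be illustrated with a minimal sketch. This is not Sequenom's EpiTYPER software, and it rests on an assumption: for a CpG unit containing n CpGs, the spectrum is taken to yield one peak area per fragment species carrying k = 0..n methylated CpGs, and the unit's average methylation is the area-weighted mean of k/n. For a single-CpG unit this reduces to methylated area divided by total area. The function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def unit_methylation(peak_areas: Sequence[float]) -> float:
    """Average methylation (0..1) of one CpG unit.

    peak_areas[k] is the (assumed) peak area of the fragment species with
    k methylated CpGs, for k = 0..n, where n = len(peak_areas) - 1.
    """
    n = len(peak_areas) - 1
    if n < 1:
        raise ValueError("need peaks for at least 0 and 1 methylated CpGs")
    total = sum(peak_areas)
    if total <= 0:
        raise ValueError("no signal for this CpG unit")
    # Each species contributes its fraction of methylated CpGs (k / n).
    return sum(k * area for k, area in enumerate(peak_areas)) / (n * total)


def gene_methylation(unit_levels: Sequence[float]) -> float:
    """Gene-level value: the mean over all CpG units analyzed for the gene."""
    return mean(unit_levels)
```

For instance, `unit_methylation([30.0, 70.0])` gives 0.7 for a one-CpG unit, and `unit_methylation([25.0, 50.0, 25.0])` gives 0.5 for a two-CpG unit.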

We used 100 and 0% methylated human DNA (EpiTect Control DNA Set, QIAGEN Inc. CA) as positive and negative controls, respectively, for the amplification and methylation determination. No-template controls were included for each amplicon to monitor PCR specificity.

Bisulfite sequencing

We cloned the EpiTYPER PCR products into pGEM-T Easy vectors (Promega, WI). For each sample, 10 randomly picked clones were Sanger-sequenced on a 3730 automated sequencer (Applied Biosystems, CA). Sequencing results were analyzed using the QUMA online software suite (http://quma.cdb.riken.jp/).
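Bisulfite-sequencing read-outs like those in Figure 4 can be summarized per CpG and overall. The per-CpG view makes non-uniform methylation easy to see, and the overall value is comparable to an amplicon-wide EpiTYPER average. A minimal sketch follows. It assumes each clone is encoded as a hypothetical string of '1' (methylated) and '0' (unmethylated) calls over the same CpGs; QUMA's own output format is not reproduced here.

```python
from typing import List


def _validate(clones: List[str]) -> int:
    """Check clones are non-empty, equal-length '0'/'1' strings; return width."""
    if not clones or not clones[0]:
        raise ValueError("no clones or no CpGs")
    width = len(clones[0])
    for clone in clones:
        if len(clone) != width or set(clone) - {"0", "1"}:
            raise ValueError("each clone must be a same-length string of '0'/'1'")
    return width


def per_cpg_frequency(clones: List[str]) -> List[float]:
    """Fraction of clones methylated at each CpG position (column)."""
    width = _validate(clones)
    return [sum(clone[i] == "1" for clone in clones) / len(clones) for i in range(width)]


def overall_level(clones: List[str]) -> float:
    """Fraction of all CpG calls that are methylated."""
    _validate(clones)
    calls = "".join(clones)
    return calls.count("1") / len(calls)
```

With four clones `["1100", "1000", "1110", "1000"]`, the per-CpG frequencies are `[1.0, 0.5, 0.25, 0.0]` while the overall level is 0.4375. A single averaged figure can therefore hide strong position-to-position differences.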

Statistical analysis

For all statistical analyses in this study, the normal and CIN I samples were grouped into one category, so that every sample was classified as either CIN II/III or CIN I/normal. The relative methylation of each CpG unit was analyzed as a continuous variable. Nonparametric comparisons were performed with the two-tailed Mann-Whitney U test for unpaired samples (GraphPad Prism 5.01), with statistical significance set at P < 0.05.
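The test itself is straightforward to sketch. The version below assigns midranks to ties, computes U for the first sample, and derives a two-tailed P value from the tie-corrected normal approximation without continuity correction. This is an assumption made for illustration: GraphPad Prism may report exact P values for small samples, so results can differ slightly from the software used in this study. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-tailed Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the statistic for sample x.
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    # Pool the samples, remembering group membership (0 = x, 1 = y).
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0  # sum of (t^3 - t) over groups of tied values
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = midrank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sum_x = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # every value tied: no evidence of a difference
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u, p
```

For example, `mann_whitney_u([1, 2, 3], [4, 5, 6])` returns U = 0 with P ≈ 0.05.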

Additionally, the importance of each CpG unit was assessed using the MDA calculated by the Random Forest feature selection algorithm [36] (https://code.google.com/p/randomforest-matlab). The two main Random Forest parameters, ntree (the number of trees in the forest) and mtry (the number of variables randomly chosen at each split in a tree), were set to 5000 and 6, respectively.
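The idea behind MDA is to shuffle one feature at a time and measure how much classification accuracy drops. The sketch below is a simplified, generic permutation-importance routine in plain Python, not the randomforest-matlab implementation. Random Forest computes MDA on each tree's out-of-bag samples and averages over trees, whereas this version permutes columns of a single supplied dataset. To keep the sketch self-contained, a hypothetical nearest-centroid classifier stands in for the forest.

```python
import random
from typing import Callable, Dict, List, Sequence

Row = List[float]
Predictor = Callable[[Row], int]


def accuracy(predict: Predictor, X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def permutation_mda(
    predict: Predictor,
    X: Sequence[Row],
    y: Sequence[int],
    n_repeats: int = 50,
    seed: int = 0,
) -> List[float]:
    """Mean decrease in accuracy when each feature column is randomly shuffled."""
    rng = random.Random(seed)
    base = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        total_drop = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the labels
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            total_drop += base - accuracy(predict, X_perm, y)
        importances.append(total_drop / n_repeats)
    return importances


def nearest_centroid(X: Sequence[Row], y: Sequence[int]) -> Predictor:
    """Hypothetical stand-in classifier (NOT a Random Forest): nearest class mean."""
    centroids: Dict[int, Row] = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row: Row) -> int:
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])),
        )

    return predict
```

An uninformative feature scores near zero because shuffling it leaves the predictions unchanged. Shuffling an informative feature degrades predictions and raises its MDA.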

We used SVMs with an RBF kernel as classifiers. The SVM parameters (penalty parameter C and kernel parameter γ) were optimized using a grid search [50]. To construct and evaluate the classification model, we used 10-fold cross-validation combined with 200 rounds of bootstrap resampling. Specifically, the samples were randomly partitioned into 10 equal-sized subsets; 9 were used as training data and the remaining one for validation testing. This process was repeated 10 times so that each subset was used exactly once as the test set, and the random partitioning and cross-validation were repeated 200 times in total. Classification performance was assessed using sensitivity, specificity, and accuracy [61]. All computations were carried out in MATLAB (version 8.1).
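The evaluation loop described above can be sketched as repeated k-fold cross-validation. This is a minimal plain-Python sketch under stated assumptions, not the authors' MATLAB code. `train` is a hypothetical callback that fits a classifier (for example, an RBF-kernel SVM with a given C and γ) and returns a predict function. Predictions from the ten folds are pooled before sensitivity, specificity, and accuracy are computed for each repetition; the text does not say whether metrics were pooled or averaged per fold. Label 1 denotes CIN II/III. A grid search over (C, γ) could wrap this function in a loop over candidate pairs and keep the best-scoring pair.

```python
import random
from statistics import mean, stdev
from typing import Callable, Dict, List, Sequence, Tuple

Row = List[float]
Predictor = Callable[[Row], int]
Trainer = Callable[[List[Row], List[int]], Predictor]


def confusion_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity and accuracy, treating label 1 as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


def repeated_kfold(
    train: Trainer,
    X: Sequence[Row],
    y: Sequence[int],
    k: int = 10,
    repeats: int = 200,
    seed: int = 0,
) -> Dict[str, Tuple[float, float]]:
    """Return (mean, SD) of each metric over `repeats` random k-fold partitions."""
    rng = random.Random(seed)
    n = len(y)
    scores: Dict[str, List[float]] = {"sensitivity": [], "specificity": [], "accuracy": []}
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)                       # new random partition each repeat
        folds = [order[i::k] for i in range(k)]  # k folds of near-equal size
        y_pred = [0] * n
        for test_idx in folds:
            if not test_idx:
                continue
            held_out = set(test_idx)
            train_idx = [i for i in order if i not in held_out]
            predict = train([list(X[i]) for i in train_idx], [y[i] for i in train_idx])
            for i in test_idx:                   # each sample is tested exactly once
                y_pred[i] = predict(list(X[i]))
        for name, value in confusion_metrics(y, y_pred).items():
            scores[name].append(value)
    return {
        name: (mean(vals), stdev(vals) if len(vals) > 1 else 0.0)
        for name, vals in scores.items()
    }
```

Building folds as `order[i::k]` gives fold sizes that differ by at most one when the sample count is not a multiple of k, which matches "equal-sized subsets" as closely as the data allow.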