Introduction

The importance of Barrett’s esophagus (BE) lies in its increasing prevalence and strong predisposition to esophageal adenocarcinoma (EAC) [1]. Western prevalence is estimated at 0.5–2.0 % [2] (a third of whom are asymptomatic [3]), conferring a 30- to 125-fold increased risk of developing EAC [4]. As EAC becomes more common [5], the benefits of predicting susceptibility to BE and detecting established metaplasia as soon as possible are numerous. Population screening overall is not presently feasible, but identifying both at-risk individuals and established metaplasia sooner provides windows for risk factor modification, chemoprevention, ablation, resection, and surveillance.

However, the natural history of BE is incompletely understood. Overall, the incidence of progression from metaplasia to high-grade dysplasia (HGD) or EAC is approximately 0.26–0.63 % per year [6], and that to EAC alone 0.12 % [7]. Only a minority of patients progress to low-grade dysplasia (LGD), and of those who do, only a minority ultimately progress to HGD or EAC [8]. These progressive grades of dysplasia can help identify patients at particular risk (notably those with HGD, in whom the risk is considerable) [9], but are insufficient in isolation.

There is therefore an immediate clinical need for biomarkers to predict both susceptibility to BE and progression. A major subtype is genetic variants, both germline and somatic. The potential of the latter, in particular, has been highlighted by a number of recent next-generation sequencing (NGS) studies of both EAC and BE, which have identified a number of candidate genes for further study [10]. However, no systematic reviews have been performed. The aims of this study were firstly to identify and evaluate all genetic markers tested in association with BE susceptibility and progression. Secondly, we aimed to identify markers with statistically significant associations and perform meta-analysis for those assessed by more than one study.

Methods

Inclusion Criteria

Studies testing associations between DNA markers [germline single nucleotide polymorphisms (SNP), somatic single nucleotide variants, insertions/deletions, copy number variants (CNV), loss of heterozygosity (LOH), microsatellite instability (MSI) or chromosomal instability (CIN)] and diagnosis/progression of BE were eligible. Diagnosis was defined as endoscopic and histopathological evidence with or without intestinal metaplasia (IM) [13], and progression as histopathological progression from metaplasia to LGD/HGD/EAC, from LGD to HGD/EAC, or from HGD to EAC during surveillance endoscopies.

Exclusion Criteria

Studies comparing grades of metaplasia/dysplasia/malignancy within samples at one time point were excluded, as were studies comparing grades between patients. Studies using cell line, epigenetic, or expression data were excluded unless patient or DNA-specific data were available.

Literature Search

A search was performed in May 2014 of the PubMed and EMBASE databases, using the MOOSE and PRISMA guidelines [14] and the following term: [((((esophageal OR esophagus OR gastro esophageal)) AND (Barrett’s OR metaplasia OR columnar)) AND (genomic OR genetic OR genome OR pharmacogenetic OR pharmacogenomic OR amplification OR copy OR mutation OR polymorphism OR polymorphic OR variant OR deletion OR insertion OR locus OR loci OR allele OR ploidy OR instability OR biomarker))]. The bibliographies of retrieved articles were also searched.

Study Data

The following data were extracted: study methodology; variants and genes assessed; endpoints; population; and effect size [odds ratio (OR) or hazard ratio (HR)] and variance [standard error or confidence interval (CI)]. For studies not presenting an OR, this was calculated from the allele/variant frequencies provided. If reference SNP identification numbers were not provided (http://www.ncbi.nlm.nih.gov/dbSNP), these were mapped by searching cited methodology and performing in silico polymerase chain reaction (http://genome.ucsc.edu), with nucleotide flank BLAST® (http://blast.ncbi.nlm.nih.gov).
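As an illustration of how an OR and its variance might be derived when a study reports only counts, the sketch below computes an odds ratio with a log-based (Woolf) confidence interval from a 2×2 table. The function name, the continuity correction for empty cells, and the example counts are assumptions for illustration and are not taken from any included study.

```python
import math

def odds_ratio_ci(case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed, z=1.96):
    """Odds ratio, SE of log(OR), and a Woolf-type CI from 2x2 counts."""
    cells = [case_exposed, case_unexposed, ctrl_exposed, ctrl_unexposed]
    if any(c == 0 for c in cells):
        # Assumed handling of empty cells: add 0.5 to every cell
        # (Haldane-Anscombe correction) so the ratio stays finite.
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, se_log_or, (lower, upper)

# Hypothetical counts: 30/100 cases and 20/100 controls carry the variant.
or_value, se_value, (ci_low, ci_high) = odds_ratio_ci(30, 70, 20, 80)
print(f"OR = {or_value:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The standard error is returned on the log scale because that is the scale on which effect sizes are usually pooled.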

Evidence Quality

Overall study and evidence quality were evaluated using the revised American Society of Clinical Oncology (ASCO) level of evidence (LOE) scale for biomarker research [15]. This stratifies study quality on the basis of trial design, patients and data, specimen collection, processing and archival, and statistical design and analysis from A to D (supplementary table 1), and uses this in conjunction with subsequent validation to stratify overall LOE for a marker from I to V (supplementary table 2). Methodological quality was appraised using the Reporting Recommendations for Tumor Marker Prognostic Studies (REMARK) guidelines modified by the authors, which scored study methodology in detail to generate a score from 0 to 17 (GWAS) and 0 to 18 (candidate studies; supplementary table 3). Reported associations were appraised for appropriate correction for multiple comparisons (Bonferroni method, false discovery rate, or multivariate analysis of all markers). If not undertaken, this was performed via post hoc Bonferroni correction. Significance was taken at p < 5 × 10−8 for genome-wide association studies (GWAS). Genomic quality criteria included reporting of genotyping call rate or providing data to allow its calculation, and assessment of Hardy–Weinberg equilibrium for germline variants.
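A post hoc Bonferroni correction of the kind described can be sketched as follows: each reported p value is multiplied by the number of tests in the study (capped at 1) and compared with the nominal threshold. The function name and the example p values are hypothetical. The genome-wide threshold of 5 × 10−8 is conventionally understood as a Bonferroni correction of 0.05 for roughly one million independent tests.

```python
def bonferroni(p_values, alpha=0.05):
    """Return (raw p, adjusted p, significant?) for each test in one study."""
    n_tests = len(p_values)
    results = []
    for p in p_values:
        adjusted = min(1.0, p * n_tests)  # adjusted p cannot exceed 1
        results.append((p, adjusted, adjusted < alpha))
    return results

# Conventional genome-wide significance threshold used for GWAS.
GENOME_WIDE_ALPHA = 5e-8

# Hypothetical study reporting four variants without correction.
example_p = [0.001, 0.02, 0.04, 0.30]
corrected = bonferroni(example_p)
for raw, adj, keep in corrected:
    print(f"raw p = {raw:.3f}, adjusted p = {adj:.3f}, significant: {keep}")
```

In this hypothetical example three variants are nominally significant at p < 0.05, but only the first survives correction, which mirrors how uncorrected associations can be excluded.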

Meta-Analysis

Meta-analysis was undertaken for markers assessed by more than one study, irrespective of correction for multiple comparisons (other than variants assessed by GWAS and non-GWAS due to major methodological differences) using RevMan v5.2 (Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration). Sensitivity analyses were performed on the basis of IM-only versus non-IM studies. In the case of studies reporting updated results using cohorts previously reported (i.e., population overlap), the most recent study was used.
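The pooling step can be illustrated with a generic inverse-variance fixed-effect model on the log scale. This is a minimal sketch only: it is not necessarily the method configured in RevMan for these analyses, and the study values shown are hypothetical.

```python
import math

def fixed_effect_pool(log_effects, std_errors, z=1.96):
    """Inverse-variance fixed-effect pooling of log effect sizes."""
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies weigh more
    total_weight = sum(weights)
    pooled_log = sum(w * y for w, y in zip(weights, log_effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    ci = (math.exp(pooled_log - z * pooled_se), math.exp(pooled_log + z * pooled_se))
    return math.exp(pooled_log), ci, pooled_se

# Hypothetical odds ratios and standard errors of log(OR) from three studies.
study_or = [1.8, 1.2, 1.5]
study_se = [0.40, 0.25, 0.30]
pooled_or, pooled_ci, pooled_se = fixed_effect_pool(
    [math.log(value) for value in study_or], study_se
)
print(f"Pooled OR = {pooled_or:.2f} (95% CI {pooled_ci[0]:.2f}-{pooled_ci[1]:.2f})")
```

Because weights are the reciprocal of each study's variance, the pooled estimate lies between the individual estimates and has a smaller standard error than any single study.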

Heterogeneity and Bias

Heterogeneity was estimated using I² and Chi-square statistics. For moderate heterogeneity (I² ≥ 50 %), a random-effects model was used. Funnel plots were reviewed for analyses of ≥5 studies [16]. Statistical assessment of these was not performed due to the low number of studies per variant. Publication bias was corrected using the “trim and fill” method [17].
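The heterogeneity statistics and the switch to a random-effects model can be sketched as below, using Cochran's Q, the I² statistic derived from it, and the DerSimonian-Laird estimator of between-study variance. The choice of DerSimonian-Laird is an assumption made for illustration, as are the function names and the example data. The “trim and fill” adjustment is not reproduced here.

```python
import math

def heterogeneity(log_effects, std_errors):
    """Cochran's Q, its degrees of freedom, and I-squared (as a percentage)."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_effects))
    df = len(log_effects) - 1
    i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, df, i_squared

def dersimonian_laird(log_effects, std_errors):
    """Random-effects pooled log effect, its SE, and tau-squared."""
    weights = [1.0 / se ** 2 for se in std_errors]
    q, df, _ = heterogeneity(log_effects, std_errors)
    scale = sum(weights) - sum(w ** 2 for w in weights) / sum(weights)
    tau_squared = max(0.0, (q - df) / scale)  # between-study variance
    re_weights = [1.0 / (se ** 2 + tau_squared) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, log_effects)) / sum(re_weights)
    return pooled, math.sqrt(1.0 / sum(re_weights)), tau_squared

# Hypothetical log odds ratios that disagree markedly between studies.
logs, ses = [0.1, 0.9, 1.6], [0.2, 0.2, 0.2]
q_stat, q_df, i2 = heterogeneity(logs, ses)
model = "random-effects" if i2 >= 50 else "fixed-effect"
re_pooled, re_se, tau2 = dersimonian_laird(logs, ses)
print(f"Q = {q_stat:.1f} on {q_df} df, I2 = {i2:.0f} %, model: {model}")
```

With these hypothetical inputs I² exceeds 50 %, so the rule stated above would select the random-effects model, whose standard error is wider than the fixed-effect one because it adds the between-study variance to each study's own variance.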

Results

Literature Search

A total of 1516 articles were identified, of which 218 were duplicates. Therefore, 1298 articles were appraised on the basis of their title and abstract. Then, 251 full-text articles were retrieved, of which 52 met the inclusion criteria (Fig. 1).

Fig. 1 PRISMA diagram

Studies Assessing Germline Markers of Susceptibility

A total of 32 studies were included: 2 GWAS, 29 candidate studies, and 1 meta-analysis of 4 candidate studies (supplementary tables 4 and 5). All original studies were LOE C (indicating prospective observational registries, without standardized treatment and follow-up), published between 1999 and 2014. Median modified REMARK scores were 15/17 (GWAS) and 13/18 (candidate studies; range 7–16; supplementary table 6). A number of frequent methodological issues were identified. The most common of these included failure to blind investigators (n = 18; 58.1 %), perform/report quality control procedures (19; 61.3 %), and match cases and controls (23; 74.2 %). Similarly, a number of recurrent reporting issues were identified. The most common of these were failure to compare markers with established risk factors (30; 96.8 %), adjust for multiple comparisons (22; 71.0 %) and confounding factors (20; 64.5 %), report power calculations (15; 48.4 %), and report multivariate effect sizes (30; 96.8 %). Of the 27 studies assessing multiple variants, 24 did not perform multivariate analysis (88.9 %). GWAS included a mean of 5507 cases and 14,159 controls, and candidate studies a mean of 134 cases and 196 controls. There were a number of cases of study population overlap, with 13 studies reporting on subjects drawn from one of three populations.

Variants Associated with BE Susceptibility

A total of 187 candidate variants/haplotypes were tested. Twenty-eight significant associations were reported, of which 16 were excluded (Table 2). Twelve were therefore associated with BE (Table 1). However, for 2 of these (rs6785049 and rs9344) precise p values were not provided to allow for Bonferroni correction.

Table 1 Reported germline markers of Barrett’s esophagus susceptibility
Table 2 Reported germline markers of Barrett’s esophagus excluded on the basis of multiple comparisons

rs1695 (GSTP1) was assessed by 4 studies, which underwent meta-analysis by Bull et al. [18]. This calculated an OR of 1.50 (95 % CI 1.16–1.95; p not presented; LOE II). While derived from a large total cohort (434 cases and 738 controls), none of the four studies adjusted for risk factors; indeed, the sole study finding a significant association had only 22 cases [19]. The GSTM1 null genotype was also assessed by 4 studies and underwent updated meta-analysis in this study. Overall, no association was demonstrated. On sensitivity analysis, a significant negative association was apparent for the 2 studies not requiring IM [17, 20], although the relevance of this is unclear. Meta-analysis was performed for 5 other variants, none of which demonstrated associations (Table 3).

Table 3 Meta-analyzed markers assessed in association with BE susceptibility

Of the 12 significant candidate associations reported, only rs1695 (GSTP1) and rs25487 (XRCC1) were assessed by more than one study. This notwithstanding, 5 appear relatively robust on the basis of adjustment for clinical covariates. These include 3 growth factor variants: rs444903 (EGFR [20]; notably associated with reflux esophagitis and EAC), rs6214 (IGF1 [21]), and rs2229765 (IGF1R) [22]. Two interleukin variants also appear plausible: rs3212227 (IL12B) and rs917997 (IL18RAP) [23–25], with the former demonstrated to be independent of all other tested genotypes [26]. A number of other associations were reported in the IL1 [27, 28], IL10 [28], IL18 [29], and IL23 [30] clusters. Of these, however, only wild-type rs917997 (IL18RAP) persisted following correction for multiple comparisons [29].

The remaining 5 candidate variants included associations with 3 caudal homeobox 1 (CDX1) variants: rs3776082, rs2237091, and rs717767. The authors demonstrated these variants to be significantly associated with established risk factors for BE: age, gender, and the presence of hiatus hernia. However, multivariate analysis was not performed to demonstrate whether the association of these variants with BE was independent of these. Of the remaining 2, an association was demonstrated for the rs6785049 (NR1I2) variant; however, similarly this was not adjusted for risk factors. This was performed for the rs9344 (CCND1 [31]) variant, although the p value was not published.

Of the 5 GWAS variants identified, 4 were identified by the Wellcome Trust Case Control Consortium (WTCCC). Two were identified by the initial report [32]: rs9257809 within the major histocompatibility complex (MHC; OR 1.21 [1.13–1.28]; p = 4.09 × 10−9) and rs9936833 (related to FOXF1; OR 1.14 [1.10–1.19]; p = 2.74 × 10−10). Subsequent replication identified rs3072 (related to GDF7: OR 1.14 [1.09–1.18]; p = 1.80 × 10−11) and rs2701108 (related to TBX5: OR 0.90 [0.86–0.93]; p = 7.50 × 10−9) [33]. All remained significant when meta-analyzed with data from the Barrett’s and Esophageal Adenocarcinoma Consortium (BEACON) GWAS. The first 2 were also associated with EAC in the BEACON dataset and an independent candidate study [34]. Mechanistically, GWAS arrays rely on linkage disequilibrium between SNPs, by which SNPs act as proxies for others (i.e., genotyping one SNP allows the genotypes of others to be inferred with confidence). This “tagging”, however, means that variants identified may be functional or may in fact be bystanders tagging other SNPs [35]. Nevertheless, the loci overall and related genes are consistent with the roles of immune-mediated inflammation (MHC) and thoracoembryogenesis (FOXF1, GDF7 and TBX5) in BE.
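The notion of one SNP acting as a proxy for another is usually quantified with the linkage disequilibrium coefficient D and the squared correlation r² between alleles at two loci. The sketch below computes both from haplotype and allele frequencies. The function name and the frequencies are hypothetical and do not correspond to any variant reported here.

```python
def linkage_disequilibrium(p_ab, p_a, p_b):
    """Return D and r-squared for alleles A and B at two loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # departure from independent assortment
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared

# Hypothetical perfect tag: allele A is always inherited with allele B.
d_tag, r2_tag = linkage_disequilibrium(p_ab=0.30, p_a=0.30, p_b=0.30)

# Hypothetical independent SNPs: haplotype frequency equals the product.
d_ind, r2_ind = linkage_disequilibrium(p_ab=0.09, p_a=0.30, p_b=0.30)

print(f"tagged pair: r2 = {r2_tag:.2f}; independent pair: r2 = {r2_ind:.2f}")
```

An r² close to 1 means the genotyped SNP predicts the other almost perfectly, which is why a significant array SNP cannot by itself be assumed to be the functional variant.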

The BEACON GWAS identified 3 loci associated with either BE or EAC, although none reached the genome-wide threshold for BE alone. However, on subsequent meta-analysis of both GWAS, rs2687201 (FOXP1, similarly involved in developmental regulation) was significantly associated with BE alone, in addition to a further variant associated with either BE/EAC [33].

Studies Assessing Germline Variants Associated with Progression

One study was identified (supplementary table 7). LOE was C; modified REMARK score was 13.5/19. This assessed 4 variants in the IGF axis; none were associated with progression.

Studies Assessing Somatic Variants Associated with Progression

Sixteen studies were identified, published between 1989 and 2012 (supplementary table 8). LOE was C for 12 and D for 4. Mean modified REMARK score was 13.2 (range 10.5–18.5/19; supplementary table 9). Five classified progression as HGD/EAC, 10 as EAC, and 1 as EAC/CIN. Again, a number of recurrent methodological issues were identified. These included failure to: blind investigators (12; 75.0 %), perform appropriate quality control/reproducibility (9; 56.3 %), match controls (14; 87.5 %), and perform appropriate power calculations (14; 87.5 %). Recurrent reporting issues included failure to: report univariate association effects (9; 60.0 %), adjust for risk factors (particularly the presence of dysplasia at baseline; 14; 87.5 %), and fully report coefficients of multivariate models (13; 81.3 %).

Variants Associated with Progression

Of 7 variants assessed, 5 associations were identified: CIN, CNV (>70 Mbp), TP53 LOH, p16 LOH, and mutant TP53 (Table 4). Meta-analysis was possible for CIN, which was assessed by 11 studies. These defined CIN variably as aneuploidy (4), tetraploidy (1), and aneuploidy/tetraploidy (6). Six of these studies were derived from independent sample archives. However, this was not clear for 4. All studies reported associations of CIN with progression. However, only two adjusted for confounding variables, including the presence of dysplasia [36, 37]. Another two, while not adjusting for dysplasia, did adjust for length of Barrett’s segment [38, 39], with CIN remaining significant.

Table 4 Reported somatic markers of Barrett’s esophagus progression to dysplasia/adenocarcinoma

Meta-analysis was performed for both OR and HR of progression to HGD/EAC. Significant associations were demonstrated for both (Table 4; Fig. 2). All studies included patients with IM only. Meta-analyzed OR was 5.98 (2.10–17.1; p = 8.00 × 10−4; n = 5 studies; following exclusion of overlapping studies and correction for publication bias). However, only one of these studies adjusted for the presence of dysplasia. Meta-analyzed HR was 1.36 (1.26–1.47; n = 2 studies; p < 1.00 × 10−5; following exclusion of one overlapping study). Overall LOE for CIN was II. While more than one study assessed mutant TP53 (n = 3) and LOH TP53 (n = 2), meta-analysis was not possible for either. Two studies for each were derived from the same populations, and the third TP53 study used a different measure of effect size.
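The relationship between a pooled estimate, its confidence interval, and its p value can be checked with a short calculation. The sketch below recovers the standard error from the interval width on the log scale and derives a two-sided p value. It assumes a 95 % interval that is symmetric on the log scale with a normal critical value of 1.96; the function name is hypothetical.

```python
import math

def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Back-calculate SE of the log estimate, z, and two-sided p from a CI."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# Pooled values as reported in the text for CIN and progression.
se_or, z_or, p_or = p_from_ci(5.98, 2.10, 17.1)
se_hr, z_hr, p_hr = p_from_ci(1.36, 1.26, 1.47)
print(f"OR: z = {z_or:.2f}, p = {p_or:.1e}")
print(f"HR: z = {z_hr:.2f}, p = {p_hr:.1e}")
```

Under these assumptions the back-calculated p value for the OR is of the order of 8 × 10−4 and that for the HR is far below 1 × 10−5, in line with the figures reported.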

Fig. 2 Uncorrected and corrected funnel plots for odds ratio of progression and chromosomal instability

Variants Associated with Risk of Progression Following Photodynamic Therapy

Three studies were identified (supplementary table 10). LOE was C for 2 and D for 1. Mean modified REMARK score was 16.2/20 (range 14.5–19; supplementary table 7). For 2 studies, while endoscopic mucosal resection was variably performed, this was only controlled for in 1 study. A total of 6 variants were assessed; while 5 were assessed by 2 studies, meta-analysis could not be performed due to population overlap. CIN at both 4 and 12 months was reported to be associated with risk of progression (Table 5).

Table 5 Reported somatic biomarkers of Barrett’s esophagus progression to dysplasia/adenocarcinoma following photodynamic therapy

Discussion

We believe this review to be the first to identify, synthesize, and evaluate the evidence for genetic markers of BE susceptibility and risk of progression. Thirty-three susceptibility markers were identified; however, just 17 remained significant after correcting for multiple comparisons. Five (rs9257809, rs9936833, and subsequently rs3072, rs2701108, and rs2687201) were derived from GWAS and are therefore most likely to be reproducible. Of the 6 candidate markers assessed by more than one study, meta-analysis was supportive for one (rs1695, GSTP1). Five non-meta-analyzed variants affecting either growth factors or inflammatory cytokines appear plausible and therefore represent priorities for validation: rs444903 (EGFR), rs6214 (IGF1), rs2229765 (IGF1R), rs3212227 (IL12B), and rs917997 (IL18RAP).

No germline markers of progression risk were identified. However, 5 somatic markers were reported, plus another of progression following photodynamic therapy. Meta-analysis was possible for CIN, demonstrating significant effects for both HR and OR after correction for publication bias. However, there was considerable heterogeneity in the definition of CIN, the duration and frequency of follow-up, minimum follow-up periods, and confounding risk factors (e.g., prevalence of HGD). Notably, however, both studies adjusting for HGD [36, 37] demonstrated convincing effects.

The robustness of the associations between CIN and progression suggests it to be of immediate clinical utility. CIN is a constituent of genomic instability, a state of erroneous progression through the cell cycle. Inaccurate DNA replication, repair, and chromosomal segregation result in accumulation of genomic errors and are a major factor driving tumorigenesis. CIN is associated with worse stage and prognosis in a range of tumors including esophageal [40] and has been demonstrated in a quarter of patients with BE [36]. Importantly, this subset appears to be at significantly higher risk of malignant progression, and CIN can be readily demonstrated by flow cytometry. This technique does, however, remain somewhat imprecise and is unable to distinguish between stable and unstable, or simple and complex, abnormalities. Despite this, the incorporation of CIN into a biomarker panel such as that reported by Bird-Lieberman et al. [36] (comprising age, CIN, dysplasia, TP53 and Cyclin A expression, sialyl Lewis antigens, Aspergillus oryzae lectin, and binding of wheat germ agglutinin) may provide invaluable information with which to personalize management of BE.

By contrast, the immediate benefits of germline susceptibility biomarkers are less tangible. As predicted by the “common disease-common variant” hypothesis of complex traits, both GWAS suggested many common variants of small effect to contribute to development of BE. Consequently, germline associations may be weaker and more complex. Whilst this gives the potential to identify novel biology, variants may have little utility in isolation.

A number of recurrent methodological issues were identified, limiting the generalizability of reported variants. Ultimately, validation studies for prioritized variants should be designed with these in mind, with particular emphasis on the interaction between genomic and clinical factors. Other issues to be addressed include disparity in diagnostic criteria: confirmation of IM is a prerequisite in the USA, yet is not required in the UK [41], although this did not alter the findings of this review.

A number of exploratory NGS studies have recently been performed for both BE and EAC. These have served to highlight the mutational complexity of both conditions, providing biological context for markers and their genes (which are often considered in isolation) as well as suggesting new variants and genes for study. Recently, Streppel et al. [11] performed whole genome sequencing (WGS) of one patient, comparing normal squamous epithelium with metaplastic and neoplastic epithelium. This identified somatic nonsense mutations in genes including AT-rich interactive domain 1A (ARID1A), a member of the SWI/SNF family involved in gene expression via chromatin remodeling, which has been independently identified as a driver gene of EAC by other NGS studies. The authors found ARID1A loss of expression to become progressively more common during the metaplasia-dysplasia-adenocarcinoma sequence, and to be associated with aberrant cellular proliferation and invasion in a knock-down model. NGS studies also provide valuable contextual information as to mutational spectra, as well as clonal and linear evolution, to better understand the development of somatic mutations and genomic instability. Recently, Weaver et al. [12] performed WGS of 112 EACs, similarly identifying a number of significantly and recurrently mutated genes. One hundred and seven BE samples were then genotyped, with the notable finding that most such mutations were already present in non-dysplastic epithelium; just TP53 and SMAD4 mutations occurred later, in HGD and EAC. The advent of third-generation sequencing, for example from single cells, will undoubtedly shed yet further light on this process. Additionally, germline rather than somatic mutations in EAC driver genes have been shown to predispose to EAC, although this has yet to be demonstrated in BE [42].

This review has a number of limitations. While we searched two databases using a comprehensive search term, it is possible that relevant publications (including non-English articles) were not identified. As discussed, meta-analysis for CIN was performed within the context of considerable heterogeneity and must be interpreted with caution. Only one meta-analysis (CIN and OR of progression) involved more than 5 studies, with considerable funnel plot asymmetry, largely due to small studies reporting large effect sizes with significant variance. This was interpreted as publication bias and was adjusted for, without altering statistical significance. In addition, the “trim and fill” method used to correct for possible bias, while widely used, does make assumptions regarding the necessity for plot symmetry, while not incorporating study methodology. Unfortunately, the limited number of studies prevented useful meta-regression to assess this further. There are also limitations in using the revised ASCO guidelines, as these do not fully represent the complexity of methodological quality or disagreement between studies. In particular, the ASCO guidelines do not allow for differences in methodology (for example, GWAS versus candidate studies). We therefore used modified REMARK guidelines to provide a further level of criticism. More generally, while genomic biomarkers are typically considered in isolation, in reality their utility depends on innumerable variables (including transcriptional, translational and proteomic regulation, and clinical and environmental factors). Consequently, establishing their true utility will require parallel processing and consideration of these contexts.

In conclusion, this review has identified, evaluated, and synthesized the evidence for genomic biomarkers of BE susceptibility and dysplastic/malignant progression. Seventeen germline markers of susceptibility, 5 somatic markers of progression, and 1 marker of relapse following photodynamic therapy were identified. Meta-analysis demonstrated CIN to be a particularly plausible and clinically useful marker of progression and one which can be demonstrated readily. However, the overall evidence base is characterized by widespread methodological issues, which limit the immediate clinical utility of these markers. Consequently, larger studies with more robust design are required to validate these markers, identify novel variants, and incorporate them into clinical practice.