Introduction

Esophageal cancer is the eighth most common cancer and the sixth most common cause of cancer death in the world [1]. In the Western world, the main histological subtype of this disease is esophageal adenocarcinoma, and its incidence has increased sevenfold over the past 3 decades [2]. It is a prime example of a cancer which presents late, since symptoms only manifest when the tumor has enlarged enough to obstruct the passage of food, by which stage spread to lymph nodes is almost inevitable. As a result of late presentation, the overall 5-year survival is < 15%, despite advances in oncology and surgical practices [3]. However, early detection is feasible since the majority of esophageal adenocarcinomas develop slowly from a metaplastic condition called Barrett’s esophagus, in which the normal stratified, squamous epithelium of the lower esophagus is replaced with a polarized, columnar-lined epithelium with intestinal-type differentiation. If Barrett’s-associated cancer can be diagnosed early, survival improves markedly, such that > 80% of patients with superficial (stage T1) disease survive beyond 5 years [4, 5]. This presents an ideal opportunity to intervene and prevent progression to advanced disease, especially since endoscopic therapy is now mainstream for superficial disease (high-grade dysplasia and T1a), involving a combination of endoscopic resection (EMR or ESD) and ablation therapy [6]. However, this strategy is flawed unless Barrett’s esophagus is diagnosed systematically in those individuals at risk. Herein lies a problem, since the vast majority of Barrett’s esophagus cases (> 80%) are currently undiagnosed, and as a result, > 90% of esophageal cancers present de novo [7]. This raises several questions, including which individuals should be offered a test and how this can be achieved in a high-volume and cost-effective way.

Identifying the Target Population for Esophageal Adenocarcinoma Screening

It is 40 years since the World Health Organization (WHO) commissioned a report on screening, which led to the Wilson and Jungner report, a seminal public health article [8]. A synopsis of the screening criteria, as adapted since the original criteria were published, is given in Table 1 [9].

Table 1 Synthesis of emerging screening criteria proposed over the past 40 years.

In favor of screening, it is clear that esophageal adenocarcinoma is a poor prognosis disease which is usually fatal within 5 years and even when cured comes at the cost of significant morbidity due to the toxicity of oncological therapy and the morbidity associated with resection of the esophagus. Furthermore, the availability and efficacy of endoscopic therapy for disease detected at an early stage is compelling for the minority of patients who currently benefit [10, 11].

On the other hand, esophageal adenocarcinoma has a low prevalence compared with other cancer types, with 1% of all new cancer cases in the USA attributed to esophageal cancer [12]. Furthermore, if the aim is to detect Barrett’s esophagus, then one needs to take several facts into account: firstly, not all esophageal adenocarcinoma arises from Barrett’s (although the exact proportion is difficult to define because tumor overgrowth confounds studies examining the prevalence of associated Barrett’s) [13]; secondly, the progression rate to adenocarcinoma is low (estimated 0.03–0.5% per annum) [14, 15]; and thirdly, it is not clear that surveillance programs, as currently practiced, have been successful in lowering cancer-related mortality [16,17,18].

For a low prevalence disease with a low conversion rate, any screening test would need to be applied to an enriched population in order to reduce the false-positive rate and improve the cost-effectiveness of the program. In the context of Barrett’s esophagus, the enrichment might be based on the known risk factors such as age, sex, reflux symptoms, BMI and family history. In a study in which positive factors of central obesity, smoking history and increasing age were added to a history of weekly symptoms of gastroesophageal reflux, the net reclassification index improved by up to 25% [19]. An extension of this model was constructed by Thrift et al. [20] using the additional factors of highest level of education, body mass index, smoking status, frequency of gastroesophageal reflux symptoms and/or use of acid-suppressant medications, and frequency of nonsteroidal anti-inflammatory drug (NSAID) use.

In the current societal guidelines, screening is recommended for individuals with such risk factors, but how precisely these are applied is not clear since there is no algorithm or risk score [21]. The updated Jungner screening criteria (Table 1) state that the enrichment algorithm should be easy to apply. For national screening programs that are implemented at the current time, eligibility is based on age (e.g., colon cancer screening) or age and sex combined (e.g., mammography) in order to increase the prevalence of the disorder in those tested. Identifying individuals with reflux symptoms is most easily done based on the requirement for acid suppression therapy rather than on symptoms per se, since these are subjective. However, it should be borne in mind that acid-suppressant medication can be obtained freely over the counter, and therefore, in order to achieve compliance, one might need to administer the program via non-traditional routes, which might include pharmacies and even supermarkets.

There is also the question of silent reflux, since up to 40% of individuals with esophageal adenocarcinoma present without a prior history of reflux symptoms [22]. However, if one accepts that there is significant enrichment in terms of risk in those with reflux symptoms, then population-based screening is probably not justified [12, 23,24,25,26,27,28,29,30]. The crude incidence rates in persons with and without chronic GERD symptoms yield a relative risk of esophageal adenocarcinoma of about six [25, 26]. This can be further appreciated from a modeling exercise based on the adult US population, in which it was demonstrated that the greatest impact for a screening test would be for individuals with a history of reflux symptoms, since this would account for 52% of the cancer burden and require screening 20% of the population [31]. From a terminology perspective, if one tests a symptomatic subgroup of the population, this might more accurately be referred to as a “diagnostic test” rather than “screening”, depending on whether one actively invites individuals to attend for testing or focuses on those presenting to their practitioner with symptoms.
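The enrichment implied by the modeling figures above (52% of the cancer burden arising in the 20% of the population with reflux symptoms) can be made explicit with a short calculation. The following is a minimal sketch rather than the method used in the cited model; the function name `relative_incidence` is an illustrative assumption, and the calculation assumes the two percentages describe the same population.

```python
def relative_incidence(pop_share: float, burden_share: float) -> float:
    """Ratio of cancer incidence inside a subgroup to incidence outside it.

    pop_share    -- fraction of the population in the subgroup (e.g. 0.20)
    burden_share -- fraction of all cancers arising in the subgroup (e.g. 0.52)
    Both names are illustrative, not taken from the cited model.
    """
    if not (0.0 < pop_share < 1.0 and 0.0 < burden_share < 1.0):
        raise ValueError("shares must lie strictly between 0 and 1")
    inside = burden_share / pop_share                    # incidence relative to average
    outside = (1.0 - burden_share) / (1.0 - pop_share)   # same, for everyone else
    return inside / outside


if __name__ == "__main__":
    ratio = relative_incidence(pop_share=0.20, burden_share=0.52)
    print(f"Incidence in the reflux group vs the rest: {ratio:.1f}-fold")
```

With the figures quoted, the reflux group carries roughly a fourfold higher incidence than the rest of the population. This back-of-envelope value is of the same order as the relative risk of about six derived from crude incidence rates, but the two come from different data and definitions and should not be expected to match.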

Screening Methods

Aside from determining the target population, the next main consideration is the screening method proposed, and as outlined by the WHO criteria, there should be scientific evidence of screening program effectiveness. There are many articles comparing the sensitivity and specificity of screening technologies, but the focus here is on implementation at high volume, bearing in mind that a given technology must be considered from the perspective of the health service (costs, skills required and ease of use, confidence in the technology and willingness to embrace it) as well as the end user (acceptability and experience). There should be quality assurance, with mechanisms to minimize potential risks of screening. Ultimately, the program should integrate education, testing, clinical services and program management. Many of the screening methods include the application of biomarkers which require systematic testing in order to cross the translational gap (Fig. 1).

Fig. 1

A roadmap to show the stages of biomarker discovery to clinical implementation. The common stage at which biomarkers fail is highlighted as a translational gap

In a numerical simulation, one can appreciate how the likelihood that a patient who tests positive is actually disease-free (regret, defined as 1 minus the positive predictive value [1-PPV]) changes as the prevalence and the specificity of the test change [32]. The sensitivity of the test has very little effect. Hence, it can be seen that enriching the population and using a test with high specificity are essential to reduce the burden of false positives (Fig. 2).

Fig. 2

A numerical simulation to show how false-positive rates vary with prevalence. Regret (1-PPV) in colored lines is shown as a function of prevalence Π and specificity assuming sensitivity is held at constant values of 0.7 (a) and 0.9 (b)
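The relationship shown in Fig. 2 follows directly from Bayes' rule and can be reproduced in a few lines. The sketch below is not the code behind the published simulation [32]; the function name `regret` and the grid of prevalence and specificity values are illustrative assumptions, while the two sensitivity values (0.7 and 0.9) are those given in the figure caption.

```python
def regret(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Return 1 - PPV: the probability that a positive result is a false positive."""
    true_pos = sensitivity * prevalence                   # diseased and test-positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # disease-free but test-positive
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive results are possible with these inputs")
    return false_pos / (true_pos + false_pos)


if __name__ == "__main__":
    prevalences = (0.01, 0.05, 0.10)      # assumed values, for illustration only
    specificities = (0.90, 0.95, 0.99)    # assumed values, for illustration only
    for sens in (0.7, 0.9):               # the two panels of Fig. 2
        print(f"sensitivity = {sens}")
        for spec in specificities:
            cells = ", ".join(
                f"prev {p:.2f}: {regret(p, sens, spec):.2f}" for p in prevalences
            )
            print(f"  specificity {spec:.2f} -> {cells}")
```

Running the sketch shows the pattern described in the text: at a given prevalence and specificity, moving the sensitivity from 0.7 to 0.9 changes the regret only slightly, whereas raising the prevalence or the specificity lowers it substantially.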

Imaging-Based Methods

White light endoscopy is the gold standard diagnostic tool for Barrett’s esophagus. Endoscopy requires a highly skilled operator as well as access to expensive equipment, and it is therefore restricted to specialist clinics, usually in the secondary care setting. From a patient perspective, having an endoscopy requires time off work, and although it is a routine procedure, there are very small risks of perforation and hemorrhage from multiple biopsies, as well as side effects (sore throat, sedation effects). Transnasal endoscopy (TNE) uses a slimmer instrument (< 6 mm) which can be introduced through the nose, improving tolerability since gagging is reduced. Since the procedure is performed without sedation and in the sitting position, it is also more suited to primary care. However, the equipment costs, expertise and operator time required still pertain, and so this cannot be regarded as high throughput [33]. A meta-analysis supports the accuracy, efficacy and patient acceptance of TNE over standard endoscopy [34]. However, from a user perspective, it is not clear whether transnasal endoscopy would be preferable, since increased participation rates for transnasal endoscopy were not observed when compared to invitations for standard per oral endoscopy in a prospective randomized controlled trial of over 400 patients in primary care [35]. It should also be borne in mind that the rate of successful biopsy acquisition was lower with transnasal endoscopy than in the standard EGD group because the investigators were unable to pass the larger diameter sheath (5.8 mm) that had a biopsy channel, and instead used a smaller diameter sheath (4.7 mm) with no biopsy channel [35]. Furthermore, some instruments do not facilitate biopsy sampling (EG-SCAN).
New-generation transnasal endoscopes are being developed with advanced imaging techniques that have the potential to distinguish dysplastic Barrett’s from metaplastic Barrett’s in the same examination [36], though this requires further study. In an analogous way, it is worth considering that colonoscopy is applied for colon cancer screening following an initial triage test using a stool sample in order to significantly enrich the population. The other distinction between colon and esophageal screening is that polyps can be removed in the same sitting which would not be feasible for Barrett’s metaplasia.

With the drawbacks of traditional endoscopy in mind, a number of imaging technologies are being developed which are more suited to primary care. Capsule-based imaging takes advantage of semiconductor technology and can be wireless, like PillCam, or tethered to increase the image-acquisition time [37]. Volume laser endomicroscopy (VLE) is a new generation of optical coherence tomography which has been integrated into a 6-cm balloon-based catheter, which allows rapid cross-sectional imaging, with an axial resolution of 7 μm and a depth of penetration of 3 mm [38,39,40,41,42,43]. Once a tethered capsule is withdrawn, it can be disinfected for reuse, making it potentially inexpensive and feasible for population screening. Other imaging modalities are being explored which exploit other dimensions of light to improve the resolution and so negate the requirement for taking biopsies. In order to deploy any of these devices on a large scale, there are a number of considerations, including the cost of producing the device and the expertise required for administration and for analysis of the images. Lowering the cost of the imaging systems could be achieved by incorporating real-time objective feedback using automated image analysis; this is being explored in a number of settings and could be applied to the assessment of Barrett’s [44]. Alternatively, a centralized image analysis service could report back to the clinician. It is likely that with any of these technologies, if a patient’s image is suggestive of Barrett’s or dysplasia, they would require an endoscopy to sample the esophagus. Biopsies make it possible to interrogate the cellular, and increasingly the molecular, properties of the tissue in order to determine the cancer risk, and would remain necessary unless the imaging technology were sensitive enough for this to be incorporated into the image analysis.

Esophageal Cell Collection Devices

An alternative approach would be to collect cells from the esophagus for analysis, without acquiring an image. This would generally require a pan-esophageal cell collection since the Barrett’s would not be visualized for purposeful sampling. Initial studies with non-endoscopic cell collection devices were disappointing due to a low cell yield, and the reliance on standard cytological analysis that is plagued by difficulties in interpreting cell atypia. However, there has been a resurgence in interest for this approach in view of the rapid expansion in biomarker technologies that improve the sensitivity for detection and can be readily applied in a high-throughput setting.

For example, an encapsulated sponge device (called Cytosponge™) has been developed with material properties that facilitate the collection of a large number of cells (circa 0.6–1 million), which can then be interrogated for the presence of a Barrett’s-specific biomarker called TFF3 [45]. This simple device has been evaluated in the primary care setting and found to be suitable from the perspective of safety and user acceptability (BEST1) [46]. It has subsequently been evaluated in a larger, enriched population to test the accuracy of this approach. In a prospective case–control study, there was a sensitivity of 79.9% for all comers (intention to treat), increasing to 87% for segments > 3 cm and to > 90% when excluding patients with an inadequate sample, meaning that no columnar cells (from stomach or Barrett’s) were present, indicating that the device may not have reached the gastroesophageal junction (BEST2) [45, 46]. The acceptability of the Cytosponge™-TFF3 test is high, with 82% of participants reporting low levels of anxiety before the test, and the Cytosponge™ was rated favorably compared to endoscopy (p < 0.001) [46]. In a qualitative study investigating the acceptability of the Cytosponge™ using interviews and focus groups, the acceptability was again found to be high, and participants perceived the test to be more comfortable and practical than endoscopy [47].

A microsimulation study was performed to compare the cost-effectiveness and health benefits of testing for Barrett’s by either Cytosponge™ or endoscopy compared with no systematic diagnostic test. The model suggested that the Cytosponge™ test was cost-effective when combined with endoscopic therapy [48]. Cost-effectiveness was further evaluated using two validated microsimulation models incorporating data from the BEST2 trial. This demonstrated that screening patients with reflux symptoms by Cytosponge™, with follow-up confirmation of positive results by endoscopy, would reduce the cost of screening in the range of 27–29% compared with screening by endoscopy, but led to 1.8–5.5 (per 1000 patients) fewer quality-adjusted life years. The incremental cost-effectiveness ratios (ICERs) for Cytosponge™ screening compared with no screening ranged from $26,358 to $33,307, whereas for screening patients by endoscopy compared with Cytosponge™, the ICERs ranged from $107,583 to $330,361, bearing in mind that these results were sensitive to Cytosponge™ cost within a plausible range of values [49]. High-throughput processing and reporting systems could reduce the costs further, and these are summarized in Fig. 3.
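For readers less familiar with health-economic terminology, an ICER is the extra cost of one strategy over another divided by the extra quality-adjusted life years (QALYs) it delivers. The sketch below illustrates that standard definition only; the function name `icer` and all cost and QALY figures are hypothetical and are not taken from the microsimulation models cited above.

```python
def icer(cost_new: float, qaly_new: float, cost_ref: float, qaly_ref: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per extra QALY gained."""
    delta_qaly = qaly_new - qaly_ref
    if delta_qaly == 0:
        raise ValueError("strategies yield identical QALYs; ICER is undefined")
    return (cost_new - cost_ref) / delta_qaly


if __name__ == "__main__":
    # Hypothetical totals per 1000 patients, for illustration only.
    no_screening = {"cost": 100_000, "qaly": 15_000}
    screening = {"cost": 400_000, "qaly": 15_010}
    ratio = icer(screening["cost"], screening["qaly"],
                 no_screening["cost"], no_screening["qaly"])
    print(f"ICER: ${ratio:,.0f} per QALY gained")
```

A strategy is usually judged cost-effective when its ICER falls below a willingness-to-pay threshold chosen by the health system. This is why the ICERs reported for Cytosponge™ versus no screening and for endoscopy versus Cytosponge™ lead to such different conclusions.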

Fig. 3

Considerations for upscaling the Cytosponge™

To further assess the suitability of the Cytosponge™-TFF3 test, a multicenter, randomized trial (Trial ID ISRCTN68382401) is now underway in primary care to assess whether invitation to a Cytosponge™-TFF3 test for patients with reflux symptoms will be effective in increasing the detection of BE in primary care. This trial will also provide further data on acceptability and cost-effectiveness as well as information on practical implementation of the device administration within the clinical care pathway, program management and quality assurance. The laboratory aspects of the quality assurance and implementation are discussed below.

Circulating Molecular Markers

A blood-based test would be an ideal screening platform, as such tests are safe, well tolerated and can be carried out in a primary care setting. Circulating miRNAs are small noncoding RNA molecules (approximately 21–25 nucleotides in length) that function in RNA silencing and posttranscriptional regulation of gene expression. They are quite stable in the circulation and can be detected using multiplexed assay platforms. A number of studies have shown that miRNAs can distinguish between Barrett’s and control patients, including those with esophagitis [50,51,52]. A combination of 4 circulating miRNAs (miRNA-95-3p, 136-5p, 194-5p and 451a) has been found to distinguish Barrett’s patients from controls with a sensitivity and specificity of 78 and 86%, respectively; however, this was an enriched population, and a higher specificity is required before such a test can be rolled out in a primary care population for a disease of low prevalence, in order to avoid a large number of false positives. Circulating cell-free tumor DNA (ctDNA) is another approach, and ctDNA can be detected at high allele fraction in cases with advanced esophageal cancer; however, the sensitivity in the context of early disease, including Barrett’s esophagus, is not yet known, and the technology is still evolving through a global research effort [53, 54]. If a pan-cancer test were developed, a positive result would likely need to be followed up by a body-wide imaging modality to locate the source, unless the circulating species detected had tissue-level specificity.

Volatiles Detected in Breath

The rationale for using exhaled breath analysis to detect cancer is that the composition of breath metabolites reflects the radical change in the nutrient utilization and metabolic requirements of cancerous cells, which enables them to meet the large demands on biosynthesis required for enhanced cell growth and division. The best-described origin of cancer-related breath metabolites is the Warburg effect, in which cancer cells have glycolytic rates up to 200 times those of normal cells even when sufficient oxygen is present [55]. Such processes have been associated directly with changes in the volatile fraction of the metabolites produced by cancer cells, which can be detected in exhaled breath [56, 57].

Detecting biomarkers in a breath sample is an attractive method for cancer screening since it is noninvasive, applicable to the primary care setting, and likely to be cost-effective, depending on the complexity of the analysis. One study identified a panel of breath volatile organic compounds that could be used to distinguish esophageal cancer from Barrett’s and benign conditions of the upper gastrointestinal tract [58], and another study used an e-nose device to diagnose Barrett’s in patients with a history of dysplastic Barrett’s esophagus [59]. As for serum assays, given the possibility of detecting a wide range of cancers on the breath, one question is whether the signature of metabolites detected can provide specificity for a given cancer site. In terms of implementation, this would likely require a centralized analysis of the exhaled metabolites, which can be captured in a collection tube and transported at room temperature.

Laboratory Requirements for Roll Out of a Biomarker Screening Test

Many of the technologies discussed above rely on a biomarker from tissue, blood or breath. There are a number of important, practical considerations for determining whether an assay can be deployed in a high-throughput manner for screening. These include the sample transport, case identifiers and booking systems to comply with patient governance, specimen processing, analysis and result reporting.

The transport of the specimen may be critical to the fidelity of the biomarker in question. In practical terms, a biomarker that can be transported at room temperature or stored in a fridge at 4 °C for up to 2 weeks is essential for large-scale implementation. Therefore, if a biomarker relies on interrogation of fresh frozen samples, this poses limitations. For example, sequencing technologies have tended to rely on fresh frozen tissue, but increasingly the platforms are becoming much more robust and applicable to smaller amounts of FFPE tissue and to blood for ctDNA analysis.

The software management system should allow booking in of samples and accompanying patient data, sample tracking, processing and quality control. There is also a need for assimilation, presentation and storage of data for sample reporting or further analysis. Barcode scanning at each step of the laboratory process improves efficiency, reduces the risk of sample mislabelling, allows sample progress to be tracked within the laboratory and generates an audit trail.

The processing system will depend on the nature of the sample and the biomarker platform. For blood, this is well understood. For technologies like breath biopsy assays or Cytosponge™, these systems have to be developed to a clinical standard. For Cytosponge™, the current BEST3 trial (Trial ID ISRCTN68382401) is being conducted in a clinically certified laboratory with adherence to a strict SOP that has been streamlined for ease of processing and to reduce costs. In this protocol, cellular material is removed from the Cytosponge™ using agitation methods, which can be automated. Centrifugation generates cell pellets, and a cell block for histological processing is then formed using either plasma and thrombin or agarose gel, to enable the application of histology-based systems rather than the generation of cytology preparations. This has a number of advantages, including ease of reporting by histopathologists without reliance on cytopathologists, availability of the cell block for additional stains or ancillary studies, and applicability to scanning technology. Slide scanner technology can improve laboratory efficiency by generating a permanent bank of stored images for histopathologist reporting, allowing remote pathologist reporting, and enabling automated antibody reading using intelligent software systems, so that histopathologists only report positive or difficult cases. For Cytosponge™, the binary (positive or negative) interpretation of TFF3 lends itself easily to the development of automated reading systems. A quantitative biomarker, such as a miRNA or methylation-based assay, requires a predefined cutoff, determined by running a calibration curve for each run [60]. Electronic reporting systems that deliver results directly to service users are ideal; these should be standardized and include recommendations for management.
All these reporting systems must ensure patient confidentiality, and the technical specification must be sufficiently robust to ensure compliance with data protection and other privacy legislation.

Comparisons Between Screening Technologies

A comparison of potential screening methods for Barrett’s esophagus is given in Table 2. It can be seen that there is a trade-off between endoscopic techniques which require high-level expertise and simpler approaches which are more clinically applicable and more effective but might be less accurate than the gold standard diagnostic test.

Table 2 High-throughput considerations and where each technology fits

There is a lot to be learned from the development of stool-based biomarker assays for colon cancer detection. A multi-target stool DNA (MT-sDNA) test, which combines both mutant and methylated DNA markers and a fecal immunochemical test (FIT), recently performed favorably in a large cross-sectional validation study and has been approved by the US Food and Drug Administration (FDA) for the screening of asymptomatic, average-risk individuals for colon cancer [61]. This test was shown to have superior sensitivity, although with lower specificity, to fecal hemoglobin by immunochemical testing for the detection of curable-stage CRC and advanced adenomas, and to have an overall cancer detection rate similar to colonoscopy. Furthermore, the software algorithmically integrates the results of the assays to calculate a dichotomous “Positive” or “Negative” result. The automated platform is operator independent and has been validated by blinded comparison to manual methods [62]. The test has high acceptability because the stool is collected in a container suspended from a toilet seat, and the patient is required only to swab the intact stool for FIT sampling and to cover the specimen with the included buffer solution. Survey data suggest that the simplicity of the collection process may be greater than that of guaiac-based FOBT [63]. This approach is an example of great success in the biomarker field, from discovery through to clinical implementation. Research is required to improve the performance, especially the specificity, of the test and to bring down the costs, but it is very reassuring to see that commercialization and reimbursement for such an approach are achievable.

Discussion

Barrett’s esophagus fulfills many of the Jungner criteria for screening now that endoscopic therapy is widely available and proven to be effective. However, it is a relatively uncommon disease compared to colon and breast cancer, for example, for which screening is in routine use, and therefore, patient selection is critical in order for the strategy to be cost-effective and acceptable to patients. It is therefore imperative that studies are performed in the relevant populations in order to avoid misleading estimates of sensitivity and specificity [64, 65]. It is reasonable to evaluate new technologies in surveillance populations with a high prevalence of dysplasia and early cancer, but this must then be followed by studies in the relevant primary care setting alongside an evaluation of acceptability and health economics.

Progress is being made in the development of imaging tests and cell collection devices coupled to biomarkers with relevance for the primary care setting. Studies on a large scale in the relevant population, such as BEST3, should enable a thorough evaluation of these technologies, followed by introduction into clinical practice. However, it is likely that implementation will be for an enriched population given the low disease prevalence of esophageal adenocarcinoma. Even then, the number of individuals for testing will be high, and therefore, it is imperative that once a promising approach is identified, consideration is given to the practical aspects to ensure that it is suitable for clinical implementation at the appropriate scale to lead to meaningful health benefit.

Key Messages

  • Improved methods for the early detection of esophageal adenocarcinoma are a priority in order to reduce mortality from this highly aggressive cancer.

  • One strategy is to detect the treatable, precursor lesion Barrett’s esophagus, but for a low prevalence disease with a low conversion rate, any Barrett’s screening test would need to be applied to an enriched population in order to reduce the false-positive rate and improve the cost-effectiveness of the program.

  • Prior to implementation of any screening technology, randomized trials in the target population are required to ensure that accuracy, acceptability and health economics are favorable.

  • Imaging-based methods for screening include transnasal endoscopy, capsule-based imaging and volume laser endomicroscopy, and for large-scale implementation, automated image analysis is an important area of research and development. Blood biomarkers and breath volatiles are also under evaluation and would be ideal due to the minimally invasive nature of the test.

  • Cell collection devices coupled with biomarkers, such as Cytosponge™-TFF3, have progressed step by step from proof-of-concept studies through to large-scale randomized trials in individuals in the primary care setting. The sample processing, analysis and reporting are being performed in real time in a clinically accredited laboratory to ensure that this technology could be rolled out into mainstream practice if the trial data are favorable.