Barrett’s Esophagus (BE) is a premalignant condition of the esophagus that results from chronic gastroesophageal reflux. BE has an annual risk of progression to adenocarcinoma of up to 0.63% [1], and survival after a diagnosis of adenocarcinoma is poor. Radiofrequency ablation (RFA) is used to eradicate flat BE tissue that has demonstrated transformation to early intra-epithelial neoplasia (i.e., low-grade or high-grade dysplasia). It delivers high-frequency energy through an external electrode array on a circumferential (3-cm balloon) or focal delivery device, with a reproducible depth of ablation [2]. RFA results in complete response of intestinal metaplasia (CRIM) in 54.3–100% [3]. In addition to incomplete treatment, approximately 7.7–11.6% of treated patients experience esophageal strictures [4, 5]. However, the reasons for these varying treatment outcomes are poorly understood. We hypothesize that variations in Barrett’s epithelial thickness (BET) contribute to these outcomes, with thicker BET associated with reduced RFA efficacy and thinner BET associated with strictures.

The recent development of volumetric laser endomicroscopy (VLE) allows high-resolution, detailed cross-sectional imaging of esophageal layers. VLE signal attenuation and the frequent lack of obvious layering in BE complicate the measurement of BET. In this study, we aimed to develop and validate a method to simplify BET measurements and improve measurement reliability. A simple and reproducible method to measure epithelial thickness is critical for studying the relationship between BET and outcomes. Assuming a correlation, BET could be used for real-time determination of whether fixed-dose RFA is optimal, or deeper therapies, such as cryoablation or endoscopic mucosal resection (EMR), should be employed.

Materials and Methods

Volumetric Laser Endomicroscopy

This second-generation VLE device uses a balloon-centered probe, with a balloon diameter of 14, 17, or 20 mm and a length of 6 cm. The balloon probe is passed through the working channel of the endoscope. When located in the distal esophagus, the balloon is inflated and the optics within the probe are helically scanned over the longitudinal extent of the balloon in 90 s, generating 1200 cross-sectional images of depth-resolved, backscattered infrared light. VLE images have an axial resolution of 7 µm, a lateral resolution of approximately 40 µm, and an imaging depth of approximately 3 mm [6]. These imaging characteristics potentially make VLE ideal for precise BET measurements.

Patient Selection

We performed a nested cohort study within the US VLE registry (NCT02215291), which contains scans of 1000 patients from eighteen sites. We included all patients with a baseline VLE scan followed by RFA and a Barrett’s length (Prague M) of at least 1 cm. Patients with prior ablative therapy or without a follow-up examination were excluded. All scans were performed from May 2015 to October 2016 using second-generation VLE probes. Informed consent was signed at the time of inclusion in the registry. The institutional review boards of all sites approved the registry study at its outset.

Selection of Measurement Locations

The GEJ was determined first; it was defined as the proximal end of both gastric folds and crypt architecture, a characteristic VLE finding described by Gupta et al. [7]. Target VLE images were then selected at 0.5-cm intervals (100 cross sections) from the GEJ up to the proximal-most extent of the Barrett’s segment, as visible on the VLE scan. In shorter BE segments, the median tomogram image was chosen. In case of severe dilation (balloon–surface contact < 50%) or an extremely decentered probe, the closest image that met the criteria was selected.
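The frame selection rule can be illustrated with a minimal sketch. It assumes that frame indices increase from the GEJ toward the proximal end of the segment and that the first target lies one interval above the GEJ; neither convention is stated in the text, and the function name is hypothetical.

```python
FRAMES_PER_SCAN = 1200   # cross sections per scan (from the text)
SCAN_LENGTH_CM = 6.0     # balloon length in cm (from the text)


def select_target_frames(gej_frame, proximal_frame, interval_cm=0.5):
    """Return target frame indices between the GEJ and the proximal BE extent.

    Assumption: indices increase from the GEJ toward the proximal end,
    and the first target lies one interval above the GEJ.
    """
    if proximal_frame < gej_frame:
        raise ValueError("proximal_frame must not lie below gej_frame")
    # 0.5 cm corresponds to 100 frames at 1200 frames per 6 cm.
    step = round(interval_cm * FRAMES_PER_SCAN / SCAN_LENGTH_CM)
    frames = list(range(gej_frame + step, proximal_frame + 1, step))
    if not frames:
        # Segment shorter than one interval: use the median frame instead.
        frames = [(gej_frame + proximal_frame) // 2]
    return frames
```

The fallback for images with poor balloon contact or a decentered probe (choosing the closest acceptable image) requires a visual quality judgment and is not modeled here.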

Contrast Enhancement Algorithm

Identification of the epithelial boundary in VLE images can be challenging due to signal attenuation that occurs as a function of depth and the fact that the images are displayed in logarithmic scale. To address this problem, a contrast enhancement algorithm [8] was developed to compensate for this signal attenuation and highlight the esophageal wall layers. The methodology also included a technique for flattening the tissue surface [9]. Surface flattening expedites BET measurements by allowing the user to delineate a single point corresponding to the mucosa–submucosa boundary at each image’s circumferential measurement location.

Initially, the target tomogram image stacks were opened in (two-dimensional) polar coordinates (Fig. 1b). Then, a previously described algorithm by Ughi et al. (2016) [9] was used to locate the esophageal wall and flatten the tissue surface (Fig. 1c). Following surface flattening, OCT attenuation compensation was applied using the method described in Teo et al. [8]. Attenuation-compensated frames were automatically contrast enhanced based on the histogram computed from a region of interest that spanned the entire image and was 400 pixels deep (Fig. 1d). After flattening, attenuation compensation, and contrast enhancement, eight equidistant lines (every 45°) were overlaid onto the images to denote the angular locations where BET measurements were to be made (Fig. 1e). All processing was performed in ImageJ (imagej.nih.gov/ij/;1.51p) using custom macros and Java plugins.
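Two of these steps, surface flattening and placement of the eight measurement lines, can be illustrated with a simplified sketch. It is written in Python rather than the ImageJ macros and Java plugins used in the study, it assumes the tissue surface has already been located in every A-line, and it omits attenuation compensation and contrast enhancement. The function names and the image layout (a list of A-lines, each a list of depth samples) are hypothetical.

```python
def flatten_surface(polar_image, surface_rows, fill=0):
    """Shift every A-line so that the tissue surface sits at depth index 0.

    polar_image: list of A-lines, each a list of depth samples.
    surface_rows: detected surface depth index for each A-line
                  (assumed to be supplied by a separate surface finder).
    """
    flattened = []
    for a_line, surface in zip(polar_image, surface_rows):
        # Drop the samples above the surface and pad the bottom,
        # so every A-line keeps its original length.
        flattened.append(a_line[surface:] + [fill] * surface)
    return flattened


def measurement_columns(n_a_lines, n_lines=8):
    """A-line indices of n_lines equidistant angular positions (45 degrees apart for 8)."""
    return [round(k * n_a_lines / n_lines) % n_a_lines for k in range(n_lines)]
```

After flattening, depth in every A-line is measured from the same row, which is why a single click per line is enough to record a thickness.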

Fig. 1

Graphical overview of the contrast enhancement algorithm. (a) Standard VLE tomogram image, opened in Cartesian coordinates; (b) VLE tomogram image opened in polar coordinates; (c) VLE image after surface tissue flattening; (d) VLE image after OCT attenuation compensation and contrast enhancement; (e) VLE image after running the complete algorithm and placement of eight equidistant lines denoting the BET measurement locations

Measurements of Thickness

BET was defined as the distance between the balloon–epithelium interface and the superficial edge of the deepest lamina propria. Since the surface was flattened, for each pre-determined circumferential location (red lines in Fig. 1e), measuring BET simply amounted to clicking on the most superficial portion of the deepest lamina propria, on every overlaid line (Supplementary Figure), using ImageJ’s point selection tool with automatic measurements (“on”). In each cross section (both contrast-enhanced and non-contrast-enhanced), the eight measurements were performed two times (“T1” and “T2”), 3–10 months apart, by a research fellow (I.L.), trained on VLE image interpretation, and a VLE expert (G.T.).

Pixel distances were calibrated to physical distances (µm) by using the known pixel dimensions (5.85 µm/pixel) and an estimate of the tissue refractive index (n = 1.4). Lines that crossed regions of the image that did not contain BE or where the surface flattening algorithm failed were excluded from the analysis.
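A minimal sketch of this calibration follows. It assumes that the stated pixel dimension refers to optical path length in air, so that the physical depth in tissue is obtained by dividing by the refractive index; the text does not spell out this convention, and the function name is hypothetical.

```python
PIXEL_SIZE_UM = 5.85            # pixel dimension in µm/pixel (from the text)
TISSUE_REFRACTIVE_INDEX = 1.4   # estimated tissue refractive index (from the text)


def pixels_to_micrometers(depth_pixels,
                          pixel_size_um=PIXEL_SIZE_UM,
                          refractive_index=TISSUE_REFRACTIVE_INDEX):
    """Convert a depth in pixels to a physical depth in µm.

    Assumption: pixel_size_um is an optical path length in air, so the
    physical distance in tissue is the optical distance divided by n.
    """
    return depth_pixels * pixel_size_um / refractive_index
```

Under this assumption, a click 100 pixels below the flattened surface corresponds to roughly 418 µm of tissue.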

Statistical Analysis

SPSS 24 software for Windows (IBM Corp, Armonk, NY) was used to perform the statistical analysis and to produce the graphs. We used standard descriptive statistics to analyze the baseline patient characteristics. We assessed the relation between both individual thickness measurements and mean thickness per patient with age, BMI, Prague length, and histology by calculating a Pearson’s, Spearman’s (in case of a non-normal distribution), or Kendall’s tau (in case of an ordinal variable) correlation coefficient.
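The choice of correlation coefficient can be sketched as follows. This is an illustration in plain Python, not the SPSS procedure used in the study; the function names are hypothetical, and Kendall’s tau is only named by the selector, not implemented.

```python
import math


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def choose_coefficient(is_ordinal, is_normal):
    """Mirror the selection rule described in the text."""
    if is_ordinal:
        return "kendall"
    return "pearson" if is_normal else "spearman"
```

For example, histology grade is ordinal and would be routed to Kendall’s tau, whereas a continuous variable such as BMI would use Pearson’s or Spearman’s coefficient depending on its distribution.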

We estimated the consistency between the research fellow and the VLE expert in the contrast-enhanced and the non-contrast-enhanced images, separately. Intra-observer agreements between the measurements made by the research fellow and VLE expert on both contrast-enhanced and non-contrast-enhanced images were calculated. This resulted in four intra-class correlation coefficients (ICC; two-way mixed model), including 95% CIs. The Mann–Whitney U and Chi-square tests were used to compare means and percentages, respectively.
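For illustration, a two-way mixed ICC can be computed from the ANOVA mean squares as sketched below. The sketch assumes the single-measure, consistency form, often written ICC(3,1); the text specifies only a two-way mixed model, so the exact variant is an assumption, and confidence intervals are omitted.

```python
def icc_consistency(ratings):
    """Single-measure, two-way mixed, consistency ICC (assumed ICC(3,1) form).

    ratings: list of rows, one per measurement location, each row holding
             the values from the k observers (or the k time points).
    """
    n = len(ratings)
    k = len(ratings[0])
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
```

Because this form measures consistency rather than absolute agreement, a constant offset between two observers does not lower the coefficient.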

The standard deviation of the intra-observer difference of the ‘T1’ and ‘T2’ measurements in the contrast-enhanced images was used to estimate the range of difference that is acceptable for BET measurements.
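A minimal sketch of this criterion is given below. It uses the sample standard deviation and a multiplier of 2, matching the 2 × SD threshold reported in the Results; the function names and the choice of sample rather than population standard deviation are assumptions.

```python
import statistics


def acceptable_difference(t1, t2, n_sd=2):
    """Threshold in µm: n_sd times the SD of the paired T1 - T2 differences."""
    differences = [a - b for a, b in zip(t1, t2)]
    return n_sd * statistics.stdev(differences)


def fraction_within(observer_a, observer_b, threshold_um):
    """Fraction of paired measurements that differ by less than threshold_um."""
    within = sum(1 for a, b in zip(observer_a, observer_b)
                 if abs(a - b) < threshold_um)
    return within / len(observer_a)
```

The threshold derived from one observer's repeated measurements can then be applied to paired measurements from two observers to report the share of locations with an acceptable difference.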

Results

Patient Characteristics and Measurement Selection

Seventy-seven patients from the US VLE registry met the inclusion criteria. Twenty scans were subsequently excluded because either visible BE was absent or the quality of the scan did not allow BET measurement (Fig. 2a). Patients were predominantly male (75.4%) and had a mean age of 63.4 years (SD 10.8). The median Prague (C circumferential and M maximal extent) length at baseline procedure was C1M3. Of all patients, 10.5% had non-dysplastic BE, 8.8% were indefinite for dysplasia, 49.1% had LGD, 24.6% HGD and 7% EAC (Table 1).

Fig. 2

Exclusion of the patients (a), followed by the exclusion of the measurement locations (b)

Table 1 Patient demographics

A total of 321 cross sections were selected randomly, comprising 2568 measurement locations. After an initial screening of the measurement locations, 1068 locations were excluded because of non-BE tissue (36%) or failure of the surface finding (flattening) algorithm (6%) (Fig. 2b). The mean number of measurements per patient was 26 (95% CI 21–31).
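The reported counts can be cross-checked with a few lines of arithmetic. The sketch assumes that each of the twenty excluded scans corresponds to one excluded patient, leaving 57 patients; this figure is inferred from the numbers above rather than stated.

```python
patients = 77 - 20                 # assumed: one excluded scan per excluded patient
cross_sections = 321
lines_per_cross_section = 8        # eight equidistant lines per image

candidate_locations = cross_sections * lines_per_cross_section
excluded_locations = 1068
included_locations = candidate_locations - excluded_locations

mean_per_patient = included_locations / patients
excluded_percent = 100 * excluded_locations / candidate_locations

print(candidate_locations)         # 2568
print(included_locations)          # 1500
print(round(mean_per_patient, 1))  # 26.3
print(round(excluded_percent, 1))  # 41.6
```

The results agree with the reported mean of 26 measurements per patient and with the combined exclusion rate of about 42% (36% plus 6%).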

Thickness per Patient

The mean BETs measured from the attenuation-compensated, contrast-enhanced images by VLE expert T1, VLE expert T2, research fellow T1, and research fellow T2 were 436.2 µm (IQR 200.6), 448.6 µm (IQR 183.9), 395.6 µm (IQR 175.6), 398.5 µm (IQR 178.9), respectively. The mean BET per patient ranged from 223.63 µm–257.07 µm to 669.78 µm–705.41 µm (Fig. 3) for the contrast-enhanced measurements. Correlations were found between BET and BMI, Prague C, and histology (Table 2). Thickness did not depend on gender (male 407.1 µm vs. female 376.5 µm, P > 0.05), Prague (M) length, or age.

Fig. 3

Mean BET (µm) per patient ± 2 times the standard deviation

Table 2 Relationship between Barrett’s epithelial thickness and gender, age, BMI, Prague length, and histopathologic diagnosis

Intra-observer Agreement

We compared the results of the measurements done by both the VLE expert and the research fellow. The consistency of the measurements performed was “very good” with an intra-class correlation coefficient of 0.818 (95% CI 0.798–0.839; Table 3, Fig. 4a) for the VLE expert and 0.890 (95% CI 0.878–0.900; Table 3, Fig. 4b) for the research fellow; mean differences were 74.6 µm (IQR 58.5) and 55.2 µm (IQR 54.8), respectively.

Table 3 Intra-observer variability in the contrast-enhanced images between T1 and T2
Fig. 4

Scatterplots displaying (a) the intra-observer variability of the VLE expert between T1 and T2 for the contrast-enhanced images; (b) the intra-observer variability of the research fellow between T1 and T2 for the contrast-enhanced images; (c) the interobserver variability between the VLE expert (T1) and the research fellow (T1) for the contrast-enhanced images; (d) the interobserver variability between the VLE expert and the research fellow for the “original” images

Interobserver Agreement

To validate these measurements, we compared the measurements of the research fellow with the measurements of the VLE expert for both the original images and the contrast-enhanced images. The consistency of the measurements between the research fellow and the VLE expert was “very good” for the contrast-enhanced images with an intra-class correlation coefficient of 0.880 (95% CI 0.867–0.891; Table 4, Fig. 4c). The intra-class correlation coefficient of the measurements made from the original images was significantly lower (0.778; 95% CI 0.754–0.799; Table 4, Fig. 4d). The percentage of individual measurements that differed less than 142 µm (2 × SD of intra-observer mean difference of the research fellow) in the contrast-enhanced and original images was 74.1% and 86.6%, respectively.

Table 4 Interobserver variability in the original images and the contrast-enhanced images

Discussion

In this paper, we report the use of a surface flattening and contrast enhancement algorithm for simplified and reliable BET measurements. Interobserver agreement using the contrast-enhanced images was significantly higher than for non-contrast-enhanced images, demonstrating the capability of this algorithm to clarify BE epithelial boundaries in VLE images. A wide range of BET was identified between patients, which is a potential explanation of the variable response to RFA.

Ganz et al. [2] were the first to ablate esophageal epithelium and found a uniform ablation depth in animal and post-esophagectomy specimens. When RFA was applied in human studies and subsequent standard clinical care, there were few empirical data on how to optimize RFA dosing to ensure sufficient depth of ablation while minimizing the chance of overtreatment and stricture formation. The optimal RFA dosage is still unclear [4].

Until recently, no available imaging technology had been capable of reliable BET measurement for the guidance of RFA treatment. Confocal laser endomicroscopy (CLE), endoscopic ultrasound (EUS), and VLE are three commonly used esophageal imaging technologies. CLE can provide real-time, transverse or en-face, microscopic imaging, showing cellular and subcellular details. CLE can detect BE neoplasia with a sensitivity and specificity of 90.4% (95% CI 71.9–97.2) and 92.7% (95% CI 87–96), respectively, which are superior to those of conventional endoscopy [10, 11]. However, since CLE is a transverse imaging modality, it is incapable of measuring mucosal thickness and therefore not suitable for BET assessment.

EUS is a cross-sectional imaging modality that can image at depths of up to 5–6 cm, yet has limited resolution (100 µm). Srivastava et al. [12] and Gill et al. [13] used EUS to measure wall thickness as a proxy for BET. Both studies defined thickness as the distance from the balloon–mucosa interface to the outermost hyperechoic line, histologically equal to the adventitia, and found a significantly greater esophageal wall thickness with columnar lined tissue. Because of its relatively low resolution, EUS cannot diagnose BE or precisely identify the epithelial boundaries. EUS is therefore unlikely to be suitable for BET measurement [13].

VLE is an imaging technique that can detect BE dysplasia with a sensitivity, specificity, and accuracy of 86%, 88%, and 87%, respectively, using the VLE-DA algorithm [14]. Tsai et al. [15] assessed mucosal thickness in thirty-three patients with the first-generation OCT device (lateral resolution 15 µm, axial resolution 5 µm, imaging depth of 1–2 mm) by measuring the vertical distance between the epithelial surface and the deepest edge of the lamina propria/muscularis mucosa layer at the location with the best balloon–surface contact. They found that a Barrett’s mucosal thickness of > 333 µm could predict the presence of BE at 6–8 weeks follow-up with an accuracy of 87.9% [15]. Despite the small sample size and their addition of the lamina propria and muscularis mucosa to the measurements, these findings support the hypothesis that BET is a factor that governs RFA response and confirm the need for further research that correlates epithelial thickness with RFA treatment response.

Given the large amount of image data produced by the second-generation OCT VLE technology, implementing epithelial thickness measurements in clinical practice would benefit from quick (i.e., < 30 s) real-time image analysis by a computer-aided system. Swager et al. [16] and Sommen et al. [17] have already shown superiority of a computer-aided system over expert opinion for the recognition of BE neoplasia and early cancer. This previously developed computer-aided system could be extended with a combined lamina propria recognition, contrast enhancement, and attenuation compensation algorithm for real-time BET measurements.

A potential limitation of our paper is that there is no direct corresponding histopathology gold standard available for the comparison of our measurements. As a result, we cannot assess the accuracy of our BET measurement algorithm. Other groups, however, have precisely correlated the VLE imaging landmarks used in this study with the same histological/anatomical landmarks (e.g., muscularis mucosa, lamina propria) [17]. A strength of our study was the high degree of intra- and interobserver reproducibility of our technique which suggests that these measurements will be consistent across studies and observers.

Interestingly, thickness appeared to vary along the Barrett’s segment of each patient (Fig. 3). In our opinion, this suggests that treatment may need to be personalized to each patient and even each segment. Once the system has been optimized, it could be integrated into the delivery catheter to give precisely the right dose of ablation according to location to achieve optimal ablation depth, increased treatment response, and reduced stricture formation.

The inclusion of the patients and the different cross sections was random. However, the exclusion of the measurement locations was done during the measurements, which could lead to a measurement-location selection bias. It is likely that this is not significant, as we implemented well-defined criteria for exclusion (no BE tissue, surface finding algorithm failed), and therefore, our exclusion metrics were likely reproducible and unbiased. Another limitation of our method is that it was manual and time-consuming. The mean number of measurements per patient was 26 (95% CI 21–31). Automation of the measurement of mean thickness per patient would be more precise and rapid, and such methods are being developed. BET was significantly correlated with BMI, Prague C length, and dysplasia grade. However, the limited clinical value of low correlation coefficients has to be taken into account.

In conclusion, we have developed an algorithm to distinguish different BE mucosal layers and measure BET. We showed natural variation in mean thickness between patients and improved interobserver consistency by performing measurements in attenuation-compensated, contrast-enhanced VLE images. Further research is needed to correlate epithelial thickness with treatment response and to automate BET measurements for real-time assessment and implementation into clinical practice.