Background

Eosinophilic esophagitis (EoE) is an inflammatory condition of the esophagus caused by an immune response to allergen exposure. This upper gastrointestinal (GI) tract disease is characterized by an influx of eosinophils into the esophageal mucosa triggered by ingestion of food antigens (Fig. 1). The resultant inflammation can lead to chronic dysphagia, food impaction, and sometimes even to esophageal perforation in adults, while young children tend to suffer from heartburn, abdominal pain, and vomiting – potentially resulting in failure to thrive due to associated feeding difficulties [1]. Untreated EoE may ultimately lead to fibrotic, hypertrophic tissue remodeling of the subepithelial wall, causing esophageal strictures [1,2,3,4] (Fig. 1). It is estimated that approximately 0.5-1 in 1000 people, many of whom are children, suffer from EoE, with reported incidence rates of 5–10/100,000/year that appear to be increasing [1]. EoE is found in 3–7% of all patients undergoing upper white light endoscopy (WLE) [1, 5]. Current treatment strategies include dietary restriction to avoid allergen exposure or topical (swallowed) corticosteroids, while in chronic untreated EoE cases, esophageal dilation of strictures may be performed [6]. Recently, biological therapies have demonstrated promising results in initial clinical trials [7].

Fig. 1

Schematics illustrating the pathophysiologic natural progression of EoE from healthy, to acute eosinophil infiltration during an atopic reaction (redness and red spherical cells indicate inflammation), to incipient fibrosis after prolonged inflammatory conditions (arising strictures due to increased development of underlying fibrosis), to chronic fibrotic constrictions of the esophageal wall (severe strictures due to advanced fibrotic remodeling). a Anatomic representation of distal esophagus and proximal stomach. b Cross-sectional view of esophagus such as that provided by WLE. c Histologic section of relevant cellular linings – epithelium and lamina propria (figure following [1])

Currently, the diagnosis of EoE is based on symptoms of esophageal dysfunction, the histopathologic analysis of multiple biopsies obtained during upper WLE, and the absence of other inflammation related disorders (e.g. gastroesophageal reflux disease (GERD)) [8]. WLE guided biopsy is the gold standard for the diagnosis and monitoring of a number of mucosal diseases affecting the upper GI tract, including EoE, Barrett’s esophagus (BE) [9], celiac disease [10], and Crohn’s disease [11]. However, it is also a diagnostic technique that suffers from several general and EoE-specific shortcomings:

(i) Missing depth information: Image visualization and biopsies obtained during WLE predominantly provide information about the epithelium, limiting insight regarding subepithelial pathologic changes. Even though some reports suggest that generous biopsies may capture sufficient lamina propria to enable assessment of subepithelial fibrosis [12], standard clinical biopsies generally lack depth information [13] and hence there is no possibility to routinely assess subepithelial remodeling and other features directly relevant to clinical manifestations and outcomes of EoE [14]. This issue is particularly problematic for the development of newer pharmacologic therapies aimed at suppressing or reversing subepithelial fibrosis [15,16,17] (Fig. 1).

(ii) High costs: WLE requires a gastroenterologist to perform the procedure and an anesthesiologist or nursing staff to administer medications and monitor for adverse reactions in a specialized setting. In addition, histopathologic analysis of multiple biopsies collected per subject further increases the costs. At more than $1,000 per procedure and overall costs that were found to be about $3,300/year/EoE-patient [18], WLE guided biopsy for EoE diagnosis and monitoring accounts for a substantial proportion of the annual excess healthcare costs attributed to EoE in the U.S. (up to $1.4 billion/year in total) [18].

(iii) Time inefficiency: Sedation increases procedure time and, in addition, usually requires the patients to take the full day off from school/work. Furthermore, current recommendations for EoE-specific stepwise food elimination/reintroduction protocols suggest intervals of eight weeks between follow-up WLEs. These empiric elimination/reintroduction diets can stretch over an entire year or more [19, 20].

(iv) Low-resolution imaging in vivo: Even though direct visualization of the esophageal mucosa by WLE is critical to rule out other potential pathologies causing similar clinical symptoms, WLE without mucosal biopsies cannot provide the high imaging resolution needed to detect the earliest cellular onset of disease development in EoE. Only grossly apparent features (rings, furrows, plaques, etc.) are visualized by WLE.

(v) High sampling variability: The definitive EoE diagnosis is made by finding a minimum of 15 eosinophils per microscopic high-power field [8]. Because EoE may be a patchy disease, it is recommended that at least six biopsies be taken at different esophageal locations [3]. Still, WLE guided biopsy is highly subject to sampling variability and offers no means of microscopically investigating large tissue segments, typically sampling < 1% of the tissue involved.

(vi) Missing intrinsic tissue-specific contrast: While conventional WLE provides true color surface reflectivity contrast only, two of its extensions, chromoendoscopy [21] and narrow band imaging (NBI) [22], aim to improve tissue-specific contrast to aid in detection of abnormal mucosal areas. However, chromoendoscopy relies on application of external contrast agents/dyes and NBI only incrementally improves visualization of certain tissue structures (mostly vascular ones).

(vii) Anesthesia related risks: Even though the risk of conscious sedation as commonly used in adults can be regarded as negligible, pediatric EoE patients usually require general anesthesia. Repeated WLE procedures (e.g. to assess disease status) considerably increase the risk of anesthesia related complications in one of the most dominant disease populations – young children [23].

These limitations of WLE guided biopsy indicate an important unmet clinical need for a (i) depth-resolved, (ii) cost-effective, (iii) time-efficient, (iv) high-resolution, (v) large area (i.e. high speed and 3D), (vi) intrinsic tissue-specific contrast, and (vii) anesthesia-free imaging tool for EoE disease diagnostics and monitoring in vivo. As a result, research groups around the globe have come up with different approaches to circumvent these shortcomings:

An ingestible gelatin capsule on a string incorporating a compressed mesh, named Cytosponge, originally developed for BE and esophageal adenocarcinoma (EAC) screening [24], has recently been successfully applied to the evaluation of EoE [25, 26]. This non-imaging screening tool, while representing a very inexpensive and time-efficient alternative, has a sensitivity and specificity of 75% and 86%, respectively, compared to conventional biopsies [26]. It is also unclear whether the Cytosponge can provide information about other important histopathologic features beyond eosinophil counts, such as basal zone hyperplasia, eosinophil microabscesses, and presence of dilated intercellular spaces [27], and it does not enable assessment of lamina propria fibrosis.

In vivo optical endomicroscopy modalities such as confocal laser endomicroscopy (CLE) [28] or multiphoton microscopy [29] provide highest resolution imaging in vivo and can potentially resolve individual eosinophils. These approaches have a limited field-of-view (FOV) and still require conventional endoscopy, since the imaging probes are guided through the accessory port of the scope.

EoE is of interest to the Tearney Lab for several reasons. It affects a large number of people and is a significant burden to our health care system [1, 18]. It also is a relatively new disease (first reported in the 1970s [30, 31]; first proposal as distinct clinicopathological disorder in 1993 [32]) that is poorly understood. Hence, there are significant opportunities to use advanced diagnostic technologies to study its natural history and improve patient care through enhanced diagnosis and treatment options.

This paper is not intended to be a comprehensive literature review on in vivo optical endomicroscopy, nor does it claim that all major achievements resulted from the research performed within one single research laboratory. Similar to previous translational sciences reports [33,34,35], it rather explores the most relevant research and development phases of a series of in vivo endomicroscopy modalities that, combined, result in designs for a novel diagnostic device for EoE (Fig. 2). The development of this manuscript is an integral part of the translational sciences training component of the SPIE-Franz Hillenkamp Postdoctoral Fellowship in Problem-Driven Biophotonics and Biomedical Optics, of which Dr. Andreas Wartak, member of the Tearney Lab, was one of the two recipients in 2019 (https://spie.org/membership/early-career-resources/spie-hillenkamp-fellowship?SSO=1).

Fig. 2

Key scientific, engineering, and clinical achievements throughout two decades of translational research. The eight milestones are indicated above the timeline in black. Below the timeline, the most fundamental literature reports (divided into first demonstration, first preclinical in vivo, and first-in-human) are listed in blue

Translational Story

This section provides the chronology of eight key milestones along the translational pathway (Fig. 2). The top section of the flow chart summarizes scientific, engineering, and clinical achievements. Detailed descriptions of the questions to be addressed, background, and challenges encountered along with the solutions developed are provided at each step. The bottom section of the flow chart lists peer-reviewed landmark journal articles along the way.

Endoscopic optical coherence tomography (OCT)

At the origin of our story, ongoing attempts to address the previously listed limitations of WLE guided biopsy for early upper GI tract disease diagnosis and monitoring yielded the first successful results in the mid-1990s. In particular, the issue of (i) missing depth information was circumvented by the introduction of a cross-sectional imaging modality for internal organ imaging. In addition to depth-resolved image acquisition, the benefits of this novel imaging modality included (ii) reduced procedure costs, (iii) better time-efficiency, as well as (iv) higher resolution imaging capabilities in vivo.

Optical coherence tomography (OCT) [36], an interferometric optical ranging modality for transparent and translucent living tissue imaging, has frequently been associated with key imaging adjectives such as non- to minimally invasive, high-speed, high-resolution, depth-resolved, and low cost. Until the mid-1990s, OCT had mainly been applied externally for ophthalmic or ex vivo tissue imaging [37]. As of today, OCT’s most prominent field of application has been ophthalmology, where, in particular, posterior segment imaging has been revolutionized through the introduction of histology-like cross-sectional imaging of retinal morphology in vivo [36, 38,39,40]. Besides ophthalmic and dermatologic imaging [41], which are both performed externally, OCT offers tremendous potential for the diagnosis and monitoring of internal diseases.

From a technology standpoint, OCT shares some of the fundamental imaging principles of ultrasound, with the major distinction that its tomograms arise from echo time delays of light instead of sound waves. With OCT, time delays are measured using a technique called low coherence interferometry, where light from a reference and light returned from the sample are combined, detected, and processed to obtain a depth resolved reflectance profile or A-scan. Recording successive A-scans while the sample beam is scanned across the tissue generates 2D cross-sectional images or B-scans. OCT instrumentation can achieve axial and transverse resolutions down to the 1-µm-range and, depending on wavelength and tissue composition, enable penetration depths of up to ~ 2–3 mm [42].

OCT’s full potential for internal disease diagnosis was primarily envisioned in the mid-1990s at the Massachusetts Institute of Technology (MIT), Cambridge, MA, USA, by the laboratory of Dr. James G. Fujimoto [43,44,45]. In combination with early contributions by two other research groups from the Institute of Applied Physics, Nizhny Novgorod, Russia [46, 47], and the Case Western Reserve University, Cleveland, OH, USA [48], the introduction of so-called endoscopic OCT [49,50,51] may be considered the first step towards a comprehensive solution to many of the outlined shortcomings of WLE guided biopsy.

In principle, OCT catheter-endoscopes present an extension of the sample arm of a conventional OCT instrument (Fig. 3(a)). The rest of the system, consisting of light source, reference arm, and detection unit usually acts independently, thus allowing for relatively simple exchange of different imaging probes. Light delivery is achieved by the integration of optical fibers into the catheter, while micro optical components such as a gradient index (GRIN) lens, a ball lens or a curved mirror enable precise beam focusing. In general, OCT endoscopes are divided into two groups with respect to their intended clinical application – forward-viewing [52] and side-viewing endoscopes [43] (Fig. 3(b), (c)). The latter may be further sub-divided into linear [45, 46] or helical scanning schemes [43, 48]. In this manuscript, we mostly focus on the side-viewing helical scanning technique, since this is the most common form of tissue scanning in luminal organs such as the esophagus.

Fig. 3

Milestone 1: Principles of endoscopic OCT. a Fiber-based Michelson interferometer endoscopic OCT setup sketch illustrating the different OCT configurations. Time-domain (TD) OCT (blue) vs. Fourier-domain (FD: spectral-domain (SD) OCT (green) vs. swept source (SS) OCT (yellow)) OCT. b and c illustrate the two fundamental types of endoscopic OCT imaging probes – forward-viewing (b), including a non-specified beam scanning unit (BSU), and side-viewing (c). d Exemplary WLE image of a human esophagus in vivo. e and f depict reprints from a first preclinical in vivo and a first-in-human report, respectively. e shows first circular OCT scans of a rabbit’s esophagus in vivo (figure and caption adapted and reprinted with permission from [44] © The American Association for the Advancement of Science). f shows a linear OCT scan of esophageal tissue of a first healthy subject in vivo, presenting the well-known layered structure (figure and caption adapted and reprinted with permission from [46] © The Optical Society)

Helical scanning incorporates a deflection element (mirror, prism) that redirects the beam towards the side, perpendicular to the catheter axis, thus enabling illumination of the inner wall of tubular organs. To obtain circumferential cross-sections, the side-directing optical element is rotated (Fig. 3(c)). Via a manual or automatic constant pullback along the main catheter axis, a 3D data set of adjacent circular cross-sectional images may be recorded (Fig. 4(a)).
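The geometry of such a helical scan can be illustrated with a minimal Python sketch. It assumes a constant rotation rate and a constant pullback speed, and maps each A-scan index to a rotation angle and an axial position along the catheter. The function name and all numerical values are hypothetical and are not taken from any of the cited reports.

```python
import math

def helical_scan_positions(n_ascans, ascans_per_rotation,
                           rotations_per_s, pullback_mm_per_s):
    """Return (angle_rad, z_mm) for each A-scan of an idealized helical scan."""
    # Pitch: axial distance travelled during one full rotation.
    pitch_mm = pullback_mm_per_s / rotations_per_s
    positions = []
    for i in range(n_ascans):
        turns = i / ascans_per_rotation           # rotations completed so far
        angle = 2.0 * math.pi * (turns % 1.0)     # angle within the current rotation
        z = turns * pitch_mm                      # axial position along the pullback
        positions.append((angle, z))
    return positions

# Hypothetical settings: 4 A-scans per rotation, 20 rotations/s, 1 mm/s pullback.
positions = helical_scan_positions(n_ascans=8, ascans_per_rotation=4,
                                   rotations_per_s=20.0, pullback_mm_per_s=1.0)
for angle, z in positions:
    print(f"angle = {math.degrees(angle):6.1f} deg, z = {z:.4f} mm")
```

In this idealization, each full rotation yields one circular cross-section, and consecutive cross-sections are separated by the pitch, so a slower pullback or a faster rotation samples the organ wall more densely along its length.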

Fig. 4

Milestone 2: FD-OCT enables comprehensive 3D endoscopic imaging results. a Illustration of helical scanning scheme (side-viewing rotation + pullback). b and c depict reprints from landmark papers of first preclinical demonstrations of endoscopic FD-OCT, in vivo. b shows a 3D rendering of data recorded in the esophagus of a living swine as well as cross sectional data details and a comparison to histology (figure and caption adapted and reprinted with permission from [69] © Springer Nature). c depicts a balloon-centered 3D dataset and a 2D cross-section of the distal esophagus of a swine in vivo (figure and caption adapted and reprinted with permission from [71] © American Society for Gastrointestinal Endoscopy)

Two different rotation approaches have been implemented for side-viewing OCT endoscopes: (i) proximal rotational scanning, that performs a combined rotation of the light delivery, focusing and side-directing optics, making use of an optical rotary junction [48, 53, 54]; (ii) distal rotational scanning, that features a micro-motor at the distal tip of the endoscope that only rotates the side-directing optical element [55, 56].

The first reported endoscopic OCT probe was designed to assess atherosclerotic lesions in the vascular system where biopsy cannot be performed safely [43]. The requirements for the catheter-endoscope included light delivery, focusing, scanning, and collection. The need to image tortuous coronary arteries of an inner diameter of ~ 1 mm necessitated a small-diameter and short rigid-length imaging probe. Owing to the size restrictions, the reported 1.1 mm diameter probe used a rotational scanning scheme with proximal rotation to image a postmortem human saphenous vein section in a first tissue study ex vivo [43]. This OCT probe was subsequently used to image a living rabbit’s esophagus and trachea [44] (Fig. 3(e)).

A first-in-human study of endoscopic OCT was published by the Institute of Applied Physics RAS, and aimed to investigate precancerous and cancerous human mucosal tissue [46]. The accessory port of a standard endoscope was used to guide a forward-viewing imaging probe, incorporating a miniaturized electro-mechanical linear scanning unit that obtained 2 mm long scans of the tissue of interest, i.e. mucosal tissues of the esophagus, larynx, stomach, urinary bladder, and uterine cervix (Fig. 3(f)).

Further human endoscopic OCT investigations, conducted in vivo, were published by the same group [47] and by Drs. Brett Bouma and Tearney from the Wellman Center for Photomedicine (WCP) at Massachusetts General Hospital (MGH), Boston, MA, USA [45]. Both reports made use of linear scanning schemes (forward-viewing and side-viewing, respectively) to image larynx, bladder, cervix, and abdominal tissue, as well as esophageal tissue, respectively. The first helical scanning images of human esophageal tissue were recorded by the group from Case Western Reserve University in vivo, just before the new millennium [48].

In the early 2000s, first clinical papers were published by the same three groups and their clinical collaborators, primarily assessing endoscopic OCT’s value for early diagnosis of BE and EAC [53, 57,58,59,60]. Also, a first comparison study between endoscopic OCT and endoscopic ultrasound (EUS) clearly demonstrated OCT’s imaging resolution superiority [61].

These proof-of-concept experiments, first-in-human investigations, and original clinical studies demonstrated that endoscopic OCT could detect epithelial and subepithelial pathologic tissue alterations in a cost- and time-efficient manner at high imaging resolutions. These advantages of OCT in part addressed some of WLE’s shortcomings, including: (i) missing depth information, (ii) high costs, (iii) time inefficiency, and (iv) low imaging resolutions in vivo.

Fourier domain optical coherence tomography (FD-OCT)

From its introduction in 1991 until the early 2000s, the reference arm length of an OCT instrument had to be scanned to map out tissue reflectivity over depth, i.e. to record a depth profile (A-scan) (Fig. 3(a)). This approach was retrospectively termed time domain (TD-) OCT. TD-OCT has been replaced for most endoscopic OCT applications by so-called Fourier domain (FD-) OCT. FD-OCT enables recording of an entire A-scan with the reference arm position fixed in place, while additionally offering a sensitivity advantage of more than two orders of magnitude in comparison to TD-OCT. This paradigm shift in terms of image quality, and thus imaging speed, increased (iii) time-efficiency and primarily enabled comprehensive 3D volumetric data acquisition, partially mitigating (v) the sampling error inherent to endoscopic biopsy.

The first clear demonstration of the sensitivity advantage of FD-OCT over TD-OCT took place in 2003, highlighted by three landmark papers from groups at the Medical University of Vienna, Austria, the WCP at MGH, and Duke University, NC, USA [62,63,64]. The reports provided experimental proof and outlined the respective theory for both FD-OCT sub-implementations – spectral domain (SD-) OCT and swept source (SS-) OCT (also called optical frequency domain imaging (OFDI)).

In SD-OCT, a broad band light source in combination with a spectrometer is used to simultaneously measure the depth dependent sample reflectivity (Fig. 3(a)). In SS-OCT, a rapidly tunable narrow band light source sweeps through the wavelength bandwidth over time and the interference fringes are recorded by a single balanced photodetection unit [65] (Fig. 3(a)). Imaging speeds have skyrocketed from several thousand A-scans per second in TD-OCT to now several million A-scans per second in FD-OCT [66]. FD-OCT imaging speeds are only limited by the spectrometer camera (in SD-OCT) or the tunable light source (in SS-OCT), and are thus constantly increasing through continued technological advances in both fields [67, 68].
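The principle common to both FD-OCT implementations, i.e. recovering a depth profile from spectral interference fringes by a Fourier transform, can be illustrated with a minimal Python sketch. It assumes ideal point reflectors in air, fringes sampled uniformly in wavenumber k, a removed DC term, and no noise or dispersion. All names and numerical values are hypothetical, and the plain discrete Fourier transform stands in for the FFT-based processing of a real system.

```python
import cmath
import math

def simulate_fringes(reflectors, k_values):
    """Spectral fringes (DC removed) for ideal reflectors given as (depth_um, amplitude)."""
    return [sum(a * math.cos(2.0 * k * z) for z, a in reflectors) for k in k_values]

def a_scan_from_fringes(fringes):
    """Depth profile: magnitude of the discrete Fourier transform (positive depths only)."""
    n = len(fringes)
    profile = []
    for m in range(n // 2):
        acc = sum(f * cmath.exp(-2j * math.pi * m * i / n)
                  for i, f in enumerate(fringes))
        profile.append(abs(acc) / n)
    return profile

# Hypothetical sampling: 64 spectral samples, uniformly spaced in wavenumber.
N = 64
k0 = 2.0 * math.pi / 1.3          # rad/um, assumed start of the sampled spectrum
dk = 0.005                        # rad/um, assumed wavenumber step
k_values = [k0 + i * dk for i in range(N)]

# One depth bin corresponds to pi / (N * dk) in path length (in air).
depth_per_bin_um = math.pi / (N * dk)

# Two hypothetical reflectors placed exactly on depth bins 5 and 12.
reflectors = [(5 * depth_per_bin_um, 1.0), (12 * depth_per_bin_um, 0.5)]

fringes = simulate_fringes(reflectors, k_values)
a_scan = a_scan_from_fringes(fringes)
peak_bin = max(range(len(a_scan)), key=lambda m: a_scan[m])
print("strongest reflector at bin", peak_bin,
      "=", round(peak_bin * depth_per_bin_um, 1), "um")
```

The whole depth profile is obtained from a single spectral acquisition, with no moving reference arm, which is why A-scan rates in FD-OCT are set by the camera or the swept source rather than by a mechanical scan.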

FD-OCT found increased potential utility in endoscopic OCT due to its capabilities to provide comprehensive volumetric imaging (Fig. 4). In contrast to previous cross-sectional TD-OCT investigations, large tissue areas could now be covered in reasonable procedure times, presenting endoscopic OCT with a third imaging dimension.

First pre-clinical [69,70,71] as well as clinical [72, 73] volumetric endoscopic OCT studies were reported by the WCP at MGH and by MIT (Fig. 4(b), (c)). A balloon-centered esophageal imaging catheter, developed at the WCP [71, 72], found its way into the market under the name of volumetric laser endomicroscopy (VLE), recently reporting a first 1000 patient registry study [74].

The step from TD- towards FD-OCT helped to overcome a major hurdle for endoscopic OCT to augment WLE guided biopsy for esophageal disease diagnosis by offering a (iii) time-efficient procedure that could (v) screen large tissue areas at high resolution in 3D.

Endoscopic polarization sensitive optical coherence tomography (PS-OCT)

Purely intensity-based OCT inherently lacks the ability to provide (vi) tissue-specific contrast, since the amplitude information of the detected interference pattern is solely dependent on refractive index (RI) changes within the imaged sample. Obtaining tissue-specific contrast (e.g. between fibrous and non-fibrous tissues) is helpful for differentiating between neighboring tissue types, enabling less ambiguous image interpretation.

Functional extensions that incorporate additional properties of the backscattered light such as Doppler OCT [75] and spectroscopic OCT [76], have been developed. Among these functional extensions, polarization sensitive OCT (PS-OCT) exploits the change of the light’s polarization as it propagates through orientationally aligned tissues (e.g. collagen). Besides providing an additional qualitative contrast mechanism, PS-OCT also allows quantitative information regarding tissue birefringence and depolarization to be obtained [77, 78] (Fig. 5(a)-(f)).

Fig. 5

Milestone 3: Cartoon illustration of the advantage of PS-OCT over conventional OCT using the example of subepithelial fibrosis of esophageal tissue (as in EoE). a and b depict histologic sections, c and d OCT intensity scans, and e and f OCT local phase retardation (a measure of birefringence) images of two pathologically different stages (healthy vs. fibrosis). While the OCT intensity scan of the fibrotic tissue region may not allow quantification of subepithelial collagen, the PS-OCT image clearly outlines the fibrotic area and even allows for quantitative birefringence evaluation. g and h depict swine esophagus images acquired by an endoscopic PS-OCT system ex vivo, showing the layered structures of epithelium and lamina propria/muscularis mucosa. Intensity (g) as well as accumulated phase retardation (h) are shown, respectively (figure and caption adapted and reprinted with permission from [83] © The Optical Society)

PS-OCT was first reported at MIT in 1992 [79]. Several different PS-OCT approaches have been published to date, many of which focus on techniques for conducting this form of imaging through optical fibers that are routinely used in endoscopic OCT. Commonly employed single-mode fibers (SMF) do not conserve the light’s polarization state during propagation under stress. Since polarization maintaining fibers (PMF) are challenging to employ for endoscopic PS-OCT, so-called multiple input state systems for standard SMFs have been developed.

The first multiple input PS-OCT was demonstrated in the early 2000s [80], while first endoscopic PS-OCT images [81] as well as first-in-human investigations [82] were presented by Dr. Johannes de Boer’s group at the WCP at MGH. Swine esophageal imaging ex vivo with an endoscopic PS-OCT system was reported in 2014 [83] (Fig. 5(g), (h)). Today, endoscopic PS-OCT is mainly applied intravascularly for characterization of atherosclerotic plaques [84,85,86], but has also gained traction for pulmonary endoscopic OCT [87,88,89,90].

By making use of the changes of light polarization caused by its propagation through the sample, additional quantitative information about oriented structures such as collagen can now be extracted from the recorded OCT signal. Thus, the lack of (vi) tissue-specific contrast pertaining to collagen deposition/remodeling in conventional OCT may be overcome through the use of PS-OCT.

1-micrometer optical coherence tomography (µOCT)

As a pathologist, Dr. Tearney was particularly interested in visualization of cellular details using OCT in vivo, in order to enable image interpretation on a cytopathological level. However, conventional endoscopic OCT primarily obtains images at resolutions of ~ 10 µm and ~ 30 µm in axial and lateral dimensions, respectively, which renders it unable to identify individual cells. The (iv) imaging resolutions of both WLE in vivo and conventional endoscopic OCT are too low for cellular level assessment.

In order to allow for tissue imaging at the cellular level, several research groups undertook efforts to push the resolution limits of OCT towards ≤ 1 µm axially and ≤ 2 µm laterally, corresponding to an improvement in voxel resolution of roughly three orders of magnitude [91,92,93,94] (Fig. 6(a)). Terms such as cellular-resolution OCT, high-resolution OCT or ultra-high-resolution OCT all characterize similar approaches towards the same goal. Within this manuscript we will refer to this form of highest resolution cross-sectional OCT imaging as micro-OCT (µOCT), a term coined in 2011 [95].
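The size of this voxel-resolution gain can be checked with a short calculation. The sketch below uses the resolutions quoted in the text (~ 10 µm axial and ~ 30 µm lateral for conventional endoscopic OCT, 1 µm axial and 2 µm lateral for µOCT) and makes the simplifying assumption that a resolution voxel is a box with a square lateral cross-section. The function name is hypothetical.

```python
import math

def voxel_volume_um3(axial_um, lateral_um):
    """Volume of an idealized box-shaped resolution voxel (axial x lateral x lateral)."""
    return axial_um * lateral_um * lateral_um

conventional = voxel_volume_um3(axial_um=10.0, lateral_um=30.0)   # 9000 um^3
micro_oct = voxel_volume_um3(axial_um=1.0, lateral_um=2.0)        # 4 um^3

ratio = conventional / micro_oct
print(f"voxel volume ratio: {ratio:.0f} (about 10^{math.log10(ratio):.1f})")
```

Under these assumptions the µOCT voxel is 2250 times smaller than the conventional one, i.e. a little more than three orders of magnitude.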

Fig. 6

Milestone 4: Direct comparison between conventional resolution endoscopic OCT and µOCT. a Voxel resolution comparison. b and c benchtop imaging comparison using esophageal biopsies. The cell nuclei in (c) were contrast enhanced by application of acetic acid just before image acquisition. Scale bars: 100 µm

Axial (or depth) and lateral (or transverse) resolutions are decoupled in OCT. While the latter is dependent on the focal spot size, and thus the focusing optics, axial resolution is defined only by the coherence length of the featured light source, rather than the depth-of-field (DOF) as in conventional microscopy. In approximation, the axial resolution is determined by the spectral bandwidth and the central wavelength of the light source. As a rule of thumb, the larger the bandwidth and the shorter the central wavelength, the higher the axial resolution.
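This rule of thumb can be made concrete with the standard textbook expression for a light source with a Gaussian spectrum, Δz = (2 ln 2 / π) · λ₀² / (n · Δλ), where λ₀ is the central wavelength, Δλ the full-width-at-half-maximum bandwidth, and n the refractive index of the medium. The sketch below is a minimal illustration under that Gaussian assumption. The two source configurations are hypothetical examples and do not describe any specific system discussed in this manuscript.

```python
import math

def axial_resolution_um(center_wavelength_nm, bandwidth_nm, refractive_index=1.0):
    """Axial resolution (FWHM, in um) for a Gaussian spectrum.

    Uses dz = (2 ln 2 / pi) * lambda0^2 / (n * dlambda).
    """
    dz_nm = (2.0 * math.log(2.0) / math.pi) * center_wavelength_nm ** 2 / (
        refractive_index * bandwidth_nm)
    return dz_nm / 1000.0

# Hypothetical source A: 1300 nm centre wavelength, 100 nm bandwidth.
narrow_band = axial_resolution_um(1300.0, 100.0)
# Hypothetical source B: 800 nm centre wavelength, 300 nm bandwidth.
broad_band = axial_resolution_um(800.0, 300.0)

print(f"1300 nm / 100 nm bandwidth: {narrow_band:.2f} um")
print(f" 800 nm / 300 nm bandwidth: {broad_band:.2f} um")
```

With these assumed values, the narrow-band source yields an axial resolution of several micrometers in air, while the broad-band, shorter-wavelength source falls just below 1 µm, consistent with the trend stated above.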

In order to achieve the highest axial resolutions of ≤ 1 µm, large spectral bandwidths (> 200 nm) in the visible or near-infrared (NIR) need to be employed. Naturally, this increases the demands on OCT instrumentation, in particular with regard to optical components that encounter enhanced chromatic dispersion as well as camera technology that enables high spectral resolutions. Previously, expensive solid-state lasers such as Cr4+:forsterite [91] or Ti:sapphire lasers [92,93,94] were used in order to obtain the required wide optical bandwidths. Recently, the advent of less expensive supercontinuum fiber-based light sources has made high axial resolution OCT more accessible to the field. These commercially available supercontinuum sources are more robust and compact than their bulky solid-state predecessors, and thus more easily adaptable for clinical instruments. Since commercially available swept sources offer only limited spectral bandwidths (usually < 140 nm), µOCT has been primarily realized using an SD-OCT approach.

The first µOCT investigation at the WCP at MGH examined human coronary artery disease (CAD) from cadavers, ex vivo. Clear delineation of cellular and sub-cellular features associated with atherogenesis, thrombosis, and responses to interventional therapy were reported [95]. In addition to cardiology, µOCT has been demonstrated in the fields of gastroenterology and pulmonology [96]. The latter featured the first animal µOCT study for airway cilia and mucus imaging in living swine using a linear scanning scheme and an interferometric common-path approach [97]. Nasal tissue µOCT imaging data was reported by a group at the University of Luebeck, Germany [98], while a collaboration between the WCP and the University of Alabama, Birmingham, AL, culminated in a first comprehensive human µOCT imaging study comparing nasal mucosal image findings from healthy subjects vs. those from cystic fibrosis patients [99].

The need for (iv) 1-µm resolution imaging to determine the extent of diseases at the cellular level has thus been addressed through the introduction of µOCT imaging (Fig. 6(b), (c)).

OCT-based tethered capsule endomicroscopy (OCT-TCE)

Although VLE proved effective, in particular for BE screening, as indicated by a number of clinical publications [100,101,102], this technique still requires WLE [51]. Conscious sedation in adults and general anesthesia in children are therefore needed while the endoscope is guided down the upper GI tract. This requirement makes the procedure a (ii) cost-inefficient screening tool and (vii) increases the risk of anesthesia related complications in pediatric patients [23, 103].

In 2013, the Tearney Lab demonstrated a swallowable tethered imaging pill to enable a minimally invasive alternative to endoscopic biopsy, not dependent on subject sedation [104, 105]. So-called OCT tethered capsule endomicroscopy (TCE) incorporated conventional resolution (10 µm axial; 30 µm lateral) OCT in an optomechanically-engineered pill on a string (Fig. 7(a)). In comparison to previous GI imaging pills, that used forward viewing white light video imaging or fiber piezo scanning [106, 107], OCT-based TCE provides cross-sectional, depth-resolved imaging.

Fig. 7

Milestone 5: OCT-based TCE for unsedated upper GI tract imaging. a Schematic of capsules using different rotational principles (RJ proximal scanning vs. micro motor distal scanning) and focusing components (GRIN lens vs. ball lens). b Illustration of imaging procedure. c OCT-based TCE image data from a patient diagnosed with BE, high-grade dysplasia, and intramucosal carcinoma. Cross-sectional zoom-ins as well as 3D renderings (figure and caption adapted and reprinted with permission from [104] © Springer Nature)

OCT-based TCE enables microscopic imaging of the entire esophagus wall in unsedated patients within a few minutes. Three-dimensional esophageal images are acquired as the capsule descends the organ via gravity and peristalsis and when it ascends, actuated by manual tether pullback (Fig. 7(b)). Following the procedure, the capsule is withdrawn by the tether and ready for reuse after disinfection.

After the first-in-human demonstration [104] (Fig. 7(c)), OCT-based TCE was also shown to allow for fast and minimally invasive BE screening in a primary care setting [108], because it does not require patient sedation. In addition to esophageal imaging, OCT-based TCE has now also been applied to imaging the stomach and small intestine [109].

Many different TCE modifications have since been explored, including different capsule sizes, various rotational scanning principles (e.g. micro motor distal scanning vs. RJ proximal scanning), and varying focusing components (e.g. ball lens vs. GRIN lens) [104, 108, 109] (Fig. 7(a)). Additional endoscopic tools have also been added to the capsule (e.g. laser marking [110]). The TCE community is growing; other leading research groups in the field of endoscopic OCT [111,112,113,114,115] and photo-acoustic imaging [116] have already integrated TCE into their scientific portfolios.

In comparison to previous techniques, OCT-based TCE has the clear potential to (ii) decrease procedure costs and (vii) enhance procedure tolerability by rendering patient sedation obsolete.

Spectrally encoded confocal microscopy based TCE (SECM-TCE)

As pointed out previously in section two, purely intensity-based OCT is usually described as not providing tissue-specific contrast, because it only maps backscattered light intensities over depth, which correspond to RI changes. To a first approximation, these backscattered intensities scale with the magnitude of the RI change: the higher the RI difference, the more light is backscattered. Therefore, the highest OCT image contrast is achieved when RI gradients in tissue are large. Reflectance microscopy has revealed that esophageal eosinophils scatter light significantly more than surrounding cells or tissues [117]. It has been postulated that the intracellular granules within eosinophils have a higher RI than other cytoplasmic constituents [118, 119]. Under specific circumstances, µOCT might thus be able to detect cellular, and as such (vi) intrinsic tissue-specific, contrast.
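The scaling of backscattered intensity with RI difference can be illustrated with the normal-incidence Fresnel reflectance of a planar interface. This is only a simplified sketch: real cells and granules scatter as small particles rather than as flat interfaces, and the RI values below are hypothetical numbers chosen for illustration, not measurements from the cited studies.

```python
def fresnel_reflectance(n1: float, n2: float) -> float:
    """Intensity reflectance at normal incidence for a planar interface
    between two media with refractive indices n1 and n2."""
    return ((n1 - n2) / (n1 + n2)) ** 2


# Hypothetical RI values, assumed purely for illustration.
n_cytoplasm = 1.37
for n_granule in (1.38, 1.40, 1.42):
    r = fresnel_reflectance(n_cytoplasm, n_granule)
    print(f"RI step {n_granule - n_cytoplasm:.2f} -> reflectance {r:.2e}")
```

In this simplified picture the reflectance grows roughly with the square of the RI step, which is consistent with the statement that larger RI gradients yield higher image contrast.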

Influx of eosinophils into the esophageal wall is the primary diagnostic indicator for EoE today. A new diagnostic that aims to replace endoscopic biopsy as the disease’s diagnostic gold standard must be able to quantify eosinophils in the esophageal wall, so that eosinophil counts can be compared with other potentially disease-relevant indicators.

In 2011, the Tearney Lab reported on increased backscattering of light by eosinophils with respect to surrounding esophageal epithelium [117]. Using an in-house developed high-speed reflectance confocal microscopy (RCM) technique called spectrally encoded confocal microscopy (SECM) [120] (Fig. 8(a), (b)), individual eosinophils (confirmed by histopathology) were visualized as high contrast structures within biopsies collected from EoE patients (Fig. 8(c)-(f)). Even though the cause for the hyperreflectivity of eosinophils is unknown, it may be attributed to their large size bilobed nuclei and the RI of specific crystalloid granules in the eosinophil cytoplasm.

Fig. 8

Milestone 6: SECM for quantification of individual eosinophils. The working principle of SECM is illustrated in (a) and (b). A spectrally dispersed line is scanned over a region of interest enabling highly parallelized confocal microscopy imaging. Next to the corresponding H&E histology (c), (d), imaging results from the first SECM EoE biopsy study that illustrate the increased light scattering by eosinophils are depicted (e), (f) (figure and caption adapted and reprinted with permission from [117] © American Society for Gastrointestinal Endoscopy)

In pursuit of human studies of SECM in vivo, the imaging technology was successfully integrated into a tethered capsule device, enabling the first pre-clinical SECM-based TCE investigations in vivo [121]. Recently, a first-in-human study was reported, clearly demonstrating SECM’s ability to visualize individual eosinophils in patients [122]. Delineation of highly scattering eosinophils, and thus (vi) cellular/tissue-specific contrast, has now been demonstrated in vivo. However, because SECM is a confocal microscopy derivative that acquires images in a single transverse plane, it is challenging for it to visualize cross-sections of the esophageal wall for simultaneous epithelial and subepithelial tissue assessment. Nevertheless, the demonstration of eosinophilic hyperreflectivity in SECM raised expectations of similar (vi) intrinsic cellular/tissue-specific contrast in µOCT images, which likewise derive their contrast from tissue reflectance.

Depth-of-focus (DOF) extended µOCT

As mentioned before, µOCT’s high lateral resolution (~ 2 µm) results from increasing the numerical aperture (NA) of its imaging optics. With conventional lenses, these high lateral resolutions can only be achieved over a certain depth range, called the DOF, which is inversely proportional to the square of the NA (Fig. 9(a)). Thus, high lateral resolutions may only be conserved over a limited axial depth, considerably diminishing OCT’s potential for obtaining (iv) highest lateral imaging resolutions in vivo while maintaining (i) depth-resolved, cross-sectional tissue diagnostics.
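The trade-off between lateral resolution and DOF can be sketched numerically for an ideal Gaussian beam in the paraxial approximation, where the spot size scales as 1/NA and the DOF (taken here as twice the Rayleigh range) scales as 1/NA². The center wavelength and NA values below are assumptions chosen for illustration only; they are not parameters reported for any specific probe.

```python
import math
from typing import Tuple


def gaussian_beam_focus(wavelength_um: float, na: float) -> Tuple[float, float]:
    """Return (1/e^2 spot diameter, depth of focus) in micrometers.

    Paraxial Gaussian-beam model:
      beam waist radius  w0  = wavelength / (pi * NA)
      depth of focus     DOF = 2 * z_R = 2 * pi * w0**2 / wavelength
    """
    w0 = wavelength_um / (math.pi * na)
    dof = 2.0 * math.pi * w0 ** 2 / wavelength_um
    return 2.0 * w0, dof


wavelength_um = 0.8  # assumed center wavelength, for illustration only
for na in (0.05, 0.10, 0.20):  # assumed NA values
    spot, dof = gaussian_beam_focus(wavelength_um, na)
    print(f"NA {na:.2f}: spot diameter {spot:.2f} um, DOF {dof:.1f} um")
```

In this model, doubling the NA halves the spot diameter but reduces the DOF by a factor of four, which is why high-NA imaging needs a dedicated DOF extension strategy.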

Fig. 9

Milestone 7: µOCT enables maintenance of high lateral imaging resolutions over an extended DOF. a low NA vs. high NA imaging with regards to lateral resolution (Δx) and DOF (b). b DOF extension using coaxially focused multi-mode (CAFM) beam. Simulation results of the intensity profile over depth as well as real transverse beam profiles of both a Gaussian and a CAFM beam are depicted. c µOCT images of esophageal swine tissue ex vivo compared to those obtained by conventional probe optics (figure and caption adapted and reprinted with permission from [133] © The Optical Society)

Several techniques to overcome this issue have been proposed and demonstrated. These techniques can be separated into software and hardware approaches. Besides diverse digital refocusing algorithms [123], hardware DOF extension (often referred to as extended depth-of-focus (EDOF) imaging) has been achieved using Bessel-like beam illumination [124,125,126], phase masks/beam apodization [127, 128], chromatic aberration [98, 114, 129], and apertures/beam multiplexing [130, 131].

In 2016, a novel waveguide-based DOF extension method, generating a multi-focal beam for OCT was introduced by the Tearney Lab [132]. This technique, termed coaxially focused multimode (CAFM) beam generation, makes use of a short segment of multi-mode fiber acting as a waveguide to create multiple annular modes that are focused at different locations within the sample (Fig. 9(b)). Using this technique, a high lateral resolution of ~ 2.5 µm can be conserved over a roughly 10-times longer depth range in comparison to that obtained using conventional optics.

First swine esophageal and coronary tissue imaging ex vivo demonstrated advanced cellular resolution imaging capabilities over an extended depth [133]. In 2019, the Tearney Lab also reported first rabbit intravascular EDOF µOCT in a preclinical study, in vivo [134] (Fig. 9(c)).

Without EDOF, µOCT imaging is limited to a shallow imaging range, withholding its true cross-sectional imaging potential. The conservation of (iv) highest lateral imaging resolutions over an (i) extended depth range has the potential to considerably benefit depth-resolved cellular tissue diagnostics in vivo.

Future of EoE diagnostics

Limitations of current methods for EoE diagnosis and assessment of therapeutic outcomes have hindered progress in the field. Building on all previously outlined achievements – endoscopic OCT, FD-OCT, PS-OCT, µOCT, OCT-based TCE, SECM-based TCE, and EDOF µOCT – the introduction of polarization sensitive micro optical coherence tomography based tethered capsule endomicroscopy (PS-µOCT-based TCE) illustrates the final step within a two-decade-long translational process. This cluster of imaging modalities results in a (i) depth-resolved, (ii) cost-effective, (iii) time-efficient, (iv) high-resolution, (v) large area (i.e. high speed and 3D), (vi) intrinsic tissue-specific contrast, and (vii) sedation/anesthesia-free imaging tool.

Regarding EoE, this technology has the potential to: (i) provide differentiation of individual eosinophils from the surrounding tissue (allowing eosinophil count assessment throughout the entire esophagus), as imaging results from a first EoE biopsy bench-top study demonstrate (Fig. 10(a)-(f); patient selection, consenting, biopsy collection, imaging, and processing were performed according to the MGH’s Institutional Review Board approved protocol 2015P000328) [135], (ii) allow for simultaneous assessment of subepithelial tissue alterations towards quantifying collagen and smooth muscle content (fibrotic and hypertrophic tissue remodeling), as indicated by a first swine esophageal tissue bench-top imaging study (Fig. 10(g)-(i); tissue imaging, ex vivo, was performed according to the MGH’s Institutional Animal Care and Use Committee approved protocol 2014N000300), and (iii) enable the time course and relationship between inflammation and subepithelial remodeling to be studied.

Fig. 10

Proposed EoE diagnostic technology: µOCT imaging (a)-(c) and H&E histology (d)-(f) results from an EoE biopsy study. Biopsies from EoE patients were imaged within 60 minutes after being obtained under routine WLE guidance. a and b depict exemplary EoE negative (no eosinophils) and EoE positive intensity scans, respectively. d and e depict corresponding histology results. The hyperreflective small spherical structures in (b) were confirmed to be eosinophils by histology (e). c and f present magnifications of the indicated areas in (b) and (e) showing individual eosinophils. g-i depict PS-µOCT imaging results from a first esophageal tissue study in swine, ex vivo. g depicts the squared intensity image clearly indicating the epithelium (EP), lamina propria (LP), muscularis mucosa (MM), sub-mucosa (SM) and muscularis externa (ME). h depicts the corresponding phase retardation image highlighting the lamina propria as the birefringence-expressing layer (natural collagen content). i depicts the corresponding H&E histology showing the same layered structure as the PS-µOCT images. Scale bars: 100 µm

An improved diagnostic modality for EoE has to aim for depth-resolved epithelial eosinophil quantification over a comprehensive tissue area in vivo, to improve the understanding of the underlying inflammatory processes in this disease. By lowering the threshold for repeat disease assessment, the less invasive nature of this technology should allow for more rapid testing with shorter follow-up intervals, which together will more accurately define the time course of eosinophil influx and disappearance in response to removal and reintroduction of dietary antigens. The process of identifying dietary triggers through empiric elimination diets (typically with 6–8 weeks between each WLE-based assessment after food removal or reintroduction), which may stretch over a year or more [19, 20], could potentially be shortened to several weeks or days.
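The effect of the follow-up interval on the overall duration of a dietary work-up is simple arithmetic and can be sketched as follows. The number of assessments is a hypothetical value (one assessment after elimination plus six single-food reintroductions), and the 1-week interval is likewise an assumption used only to illustrate the potential time saving; neither figure is taken from the cited studies.

```python
def diet_workup_weeks(n_assessments: int, interval_weeks: float) -> float:
    """Total duration of an elimination-diet work-up, assuming one fixed
    waiting interval before every assessment."""
    return n_assessments * interval_weeks


n_assessments = 7  # hypothetical: 1 post-elimination check + 6 reintroductions

# Interval range for WLE-based assessment as stated in the text (6-8 weeks).
for interval in (6, 8):
    total = diet_workup_weeks(n_assessments, interval)
    print(f"{interval}-week interval: {total:.0f} weeks in total")

# Assumed shorter interval enabled by a less invasive test (illustrative only).
print(f"1-week interval: {diet_workup_weeks(n_assessments, 1):.0f} weeks in total")
```

Under these assumptions, the work-up takes roughly a year with WLE-based follow-up, and shortening the interval reduces the total proportionally.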

Besides comprehensive, depth-resolved quantification of eosinophils, the EoE diagnostic of the future needs to provide quantitative information on subepithelial tissue remodeling in vivo. Assessment of the degree of fibrosis in the lamina propria, as well as of hypertrophic changes of the muscularis mucosa, may provide information on the state of disease progression and may also help to address critical gaps in the clinical understanding of the natural history of stricture development in EoE. Thus, ongoing studies on the detection of incipient fibrosis through edge assessment of generous biopsies that include fragments of lamina propria [12] might be complemented by quantitative PS-µOCT birefringence measurements in vivo.

Thanks to the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) and the 2019 SPIE-Franz Hillenkamp Postdoctoral Fellowship, the translational journey towards such a diagnostic tool is progressing at a good pace (Fig. 2). The goal of this project is to (i) design and fabricate a PS-µOCT-based TCE instrument, validate the concept (ii) ex vivo, (iii) in vivo in animals, and (iv) perform a first-in-human pilot study in healthy subjects. If successful, the plan is to move forward to launch clinical trials in EoE patients. The outcome and results of this project will be described in a future report.

Conclusions

The journey to conceiving this project began with an understanding of a clinical problem (EoE) and the deficiencies in the standard of care for these patients. The need for a minimally invasive technology for simultaneously assessing eosinophil burden and sub-epithelial remodeling is clear. The upcoming realization of this project is made possible by more than two decades of research to develop an ever-increasing toolset for in vivo optical endomicroscopy. The selection of the most appropriate tools (TCE, µOCT, PS-OCT) for solving this problem forms the basis for the proposed technology and, we hope, for a solution that will address the critical unmet needs in the field of EoE diagnostics.

Many technical challenges were overcome along the way. The nidus for this field has been the development of confocal and OCT technologies, which are ways of rejecting the majority of light remitted from tissue that has been multiply scattered and of collecting and analyzing only the small fraction that contains microscopic information. Increasing the speed of OCT through the introduction of FD-OCT has been another major advance, allowing 3D microscopic imaging in vivo in realistic procedure times while avoiding patient motion artifacts. The implementation of TCE has also been important, opening up an opportunity to rapidly scan the esophagus in unsedated patients and making the procedure potentially much less expensive than WLE and more tolerable for those who undergo the exam. New contrast mechanisms (e.g. PS-OCT) and enabling technologies for achieving high resolution, such as EDOF µOCT, are also critically important for this application.

Engineering advances aside, other important tasks were accomplished along the way. Intellectual property was protected through a robust collaboration with MGH’s licensing office (MGB Innovation), making it possible to commercialize these devices following clinical demonstration/validation. Funding for this research has been diverse, ranging from philanthropy and foundations to national agencies (e.g. NIH) and industry. In total, more than $20M in funding has been raised by the Tearney Lab to advance the field to the point where effective solutions for EoE diagnostics and monitoring can be developed.

It took about 20 years to discover, develop, and move these modalities along the translational pathway. Over 200 faculty, postdoctoral fellows, students, engineers, technicians, regulatory experts, patent attorneys, licensing and legal specialists, and administrators joined the Tearney Lab over this period of time and contributed. They brought in the multidisciplinary skills and expertise needed in research: physics (optics), engineering (mechanics, electronics, optics, computer science, and data science), telecommunication sciences, translational sciences, medicine (GI, pathology, primary care, allergy and immunology), regulatory sciences, business development, and commercialization.

As of today, several of the discussed building blocks are already in clinical use to diagnose, monitor, and screen for a diverse range of diseases and disorders. Endoscopic OCT is commercially available for cardiovascular and GI applications targeting, in particular, CAD and BE/EAC, respectively. Clinical studies in other areas of the human body that are only endoscopically accessible, such as the pulmonary tract, the urinary tract, and the gynecologic tract, are being performed at various institutions around the globe [49]. Endoscopic PS-OCT is not yet commercially available; however, clinical investigations in cardiology [136, 137] and pulmonology [90] are very promising. Finally, sedation-free OCT-based TCE is attracting increasing attention in the GI field, with more and more research groups making advances in this area [111,112,113,114,115].

There are still many challenges ahead before PS-µOCT-based TCE can potentially replace WLE-guided biopsy as the gold standard for diagnosis of EoE. One limitation of the current imaging probe design arises from EoE’s significant prevalence and incidence rate [1] in young children, who are unable to swallow the capsule. This vulnerable population also carries a higher risk of anesthesia-related complications in WLE [23], which has resulted in the recent adoption of unsedated transnasal endoscopy (TNE) [138] for EoE diagnosis in a growing number of pediatric GI clinics [139,140,141]. Unsedated TNE provides a considerable reduction in procedure costs and is in general better tolerated (both due to the lack of anesthesia); however, handling of the reduced-form-factor endoscope is slightly impeded and up to 5% of subjects experience epistaxis [138]. The Tearney Lab has recently explored the transnasal route for upper GI tract imaging using OCT [142]. Hence, even though the current-generation capsule probe, which requires transoral deployment, might exclude children below a certain age, future iterations of the PS-µOCT technology will enable transnasal deployment.

Thanks to the SPIE-Franz Hillenkamp Postdoctoral Fellowship and a Research Project Grant (R01) from the NIDDK, the first project phase of the PS-µOCT-based TCE technology research and instrument design has already been completed. Benchtop investigations of human biopsies and swine tissue ex vivo showed that the two key technological advances, (i) visualization of individual eosinophils in the entire epithelium and (ii) subepithelial birefringence quantification, perform as anticipated (Fig. 10). In the next phase, an animal feasibility study will be conducted in vivo in a swine model expressing esophageal strictures (and potentially esophageal eosinophilia [143]). A first-in-human study in healthy subjects and EoE patients will then conclude the second phase of the project. Later on, larger-scale clinical studies will be performed to validate PS-µOCT-based TCE against the standard of care. Once validated, this technology can be used to investigate current treatment options for EoE as well as food re-introduction during dietary exclusion, and to provide answers to key questions in the field, with the promise of enhancing EoE management strategies and improving drug development for these patients.