1 Introduction

In the field of cochlear mechanics, quantitative observation of vibration within the sensory epithelium is important because hair cells in the epithelium transduce sound vibration into the electrical signals that are essential for audition [1,2,3,4]. In mammals, the epithelium contains various cells and structures, including inner hair cells (IHCs), outer hair cells (OHCs), the basilar membrane (BM), and the reticular lamina (RL). Sound elicits vibration in all of these constituents. Among them, only the OHC cell bodies actively and periodically change length in association with the vibration [5, 6]. This unique mechanical activity of OHCs modifies the vibration in the epithelium through a feedback mechanism [7,8,9]. The phenomenon, in conjunction with the nonlinear mechanical properties of the sensory IHCs, is considered crucial for the high sensitivity and sharp frequency tuning associated with normal hearing [9,10,11]. Nevertheless, because the activity of OHCs has only been measured in isolated cells, their actual motion in situ remains uncertain [5, 6].

To examine nanoscale vibrations within the epithelium, optical coherence tomography (OCT)-based vibrometry systems have recently been developed [12,13,14,15,16,17,18,19,20,21,22,23,24,25]. These OCT systems have successfully analyzed the tissues and cells in the epithelium [15,16,17,18,19, 22,23,24,25,26]. Although their theoretical resolutions in inanimate materials are at the cellular level, the difference in vibration between the apical and basal regions of OHCs in live animals has not yet been fully addressed. To resolve these regions precisely during in vivo vibrometry, we consider two factors in this study: axial resolution and spatial accuracy.

The reflectance of the object affects the effective optical bandwidth, which is the key parameter determining the axial resolution of an OCT system. The reflectance of the sensory epithelium is 0.004–0.04%, which is 200–2000 times lower than that of the cornea in the human eye (~8%) [27]. To accumulate sufficient backscattered light with conventional OCT vibrometry systems, the total acquisition time for one session is typically more than 1 s [13, 19, 22, 24, 25]. However, background fluctuation arising from physiological motion, such as heartbeat and respiration, often exceeds several tens of micrometers over a period of more than 1 s. Because this fluctuation is much larger than the pixel size in an A-scan image, a long total acquisition time can smear the image and degrade spatial accuracy. To avoid the effect of physiological motion and accurately detect the vibrations inside the epithelium, we reduced the total acquisition time. However, the shorter total acquisition time, combined with the low reflectance of the sensory epithelium, reduces the power of the light accumulated on the sensor. In this situation, conventional spectral-domain OCT (SD-OCT) systems are insufficient because of the power and spectrum of their light sources; the power of the super-luminescent diodes (SLDs) typically used in SD-OCT systems is less than 5 mW [16, 19, 22, 24, 25, 28]. Moreover, because the source spectrum has a Gaussian distribution, the power at both ends of the spectrum is often weaker than that around the central wavelength. The effective optical bandwidth of the system is therefore narrowed, which degrades the axial resolution. We overcome this impediment by introducing a supercontinuum (SC) source into the SD-OCT system. Because supercontinuum generation relies on nonlinear optical effects, an SC source provides more power than an SLD and a smoother spectral continuum across the sensitive optical bandwidth of the sensor.
These modifications achieve high axial resolution and high spatial accuracy in imaging and vibrometry of transparent biological material, even in the presence of physiological movement. Consequently, we demonstrate the ability of our system to measure sub-nanometer motion in the apical and basal regions of OHCs simultaneously and accurately, relative to the motion in the BM region.

In this study, we first evaluate the performance of the SC-introduced SD-OCT system (SCSD-OCT) and then apply it to biological measurements in guinea pigs ex vivo and in vivo. Finally, we analyze the vibration data from live guinea pigs in comparison with those obtained post mortem, and discuss their physiological significance.

2 Methods

Details of the basic performance of the SCSD-OCT, the sound frequencies applied to the animals, the cochlear electrophysiological measurements, the laser interferometer used in this study, and the in vivo vibrometric data are provided in the Supplemental materials.

2.1 Animal preparation

All animal experiments, which were designed in accordance with the Japanese Animal Protection and Management Law, were carried out in compliance with the protocol approved by the Institutional Animal Care Committee and the President of Niigata University (Permission Number: Niigata Univ. Res. SA00407). Hartley guinea pigs (200–300 g, 2–4 weeks of age; SLC Inc., Hamamatsu, Japan) were housed at the animal facility on a 12-h light/12-h dark cycle, with food and water provided ad libitum. All animal handling and reporting complied with the Animal Research: Reporting of In Vivo Experiments (ARRIVE) guidelines [29]. For ex vivo preparations, animals were anesthetized by intraperitoneal injection of sodium pentobarbital (1 ml per guinea pig; Somnopentyl; Kyoritsu, Japan). Once deep anesthesia was confirmed, the left cochlea was surgically exposed, dissected from the temporal bone, and washed in chilled standard phosphate-buffered saline (PBS) consisting of 137 mM NaCl, 8.1 mM Na2HPO4, 2.68 mM KCl, and 1.47 mM KH2PO4 at pH 7.4. For in vivo preparations, atropine sulfate (0.05 mg/kg) and midazolam (4 mg/kg) were first injected intramuscularly as premedication. Animals were then deeply anesthetized with an intraperitoneal injection of urethane (1.5 g/kg). Supplemental doses of anesthetic were administered to ensure areflexia to toe pinches. After tracheotomy, each animal was artificially ventilated with room air using a respirator (SN-408-7; Shinano Manufacturing, Japan). Surgical access to the cochlea followed a method similar to those previously described for biophysical measurements [19, 22, 30]. The pinna was removed along with the muscles covering the bulla. The ventral and anterior portions of the bulla were opened using a ventrolateral approach to expose the round window in the basal turn of the cochlea.
The guinea pig’s head was fixed on an acrylic plate that was tightly connected to a base stage (SL20/M; Thorlabs, USA). The animal’s body was positioned on a heating pad atop a movable platform under the OCT system. The platform, which housed a speaker (T250D; Fostex, Japan) calibrated with an ultrasonic microphone (Sokolich ultrasonic probe microphone system, USA), could be moved within a plane. The tip of the speaker was inserted into the left ear canal of the guinea pig. To monitor the auditory function of the live animal, we measured auditory brainstem response (ABR) thresholds before and after the recording at four frequencies: 7000, 14,000, 21,000, and 28,000 Hz. Each experiment, including surgery, was completed within 4 h.

In total, 87 animals were used in this study: four for ex vivo experiments, four for evaluating the effect of SC light source power on in vivo measurements, twelve for assessing the effect of irradiation on ABR, and one for recording the vibration tuning curve with a laser interferometer. The remaining 66 animals underwent the surgery described above for in vivo recordings by SD-OCT vibrometry. In 34 of these animals, the preparation was successful and data were acquired before and 30 min after anoxia. Eleven of the 34 preparations showed minimal ABR threshold changes during the experiment; if the ABR threshold changed by more than 30 dB at any frequency, the data were excluded from the analysis. Of these 11 sensitive cochleae, four exhibited a clear nonlinear vibration response to sound stimulation at the BM or RL, a hallmark of a physiological response of the sensory epithelium. Only these four samples were selected for this study, because cochleae without a nonlinear response were likely to have been damaged by surgery or anesthesia. Accordingly, as in other groups' studies [16, 18, 20, 31, 32], we analyzed a limited number of samples that strictly met both the ABR threshold criterion stated above and the criterion of active vibration of the epithelium. Although several groups have performed this type of in vivo measurement in various animals, the yield has been very low [16, 20, 31,32,33,34]. Invasive surgery, low reflectance of the epithelium, the vulnerability of sensitive hearing, and time-consuming data collection are known to be the main reasons for this outcome. The number of samples in our study is comparable to those in previous studies [16, 18, 20, 31, 32].
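The animal allocation and the ABR exclusion rule above can be checked mechanically. The sketch below is a minimal illustration, not the authors' analysis code: the group counts, test frequencies, and 30 dB criterion come from this section, while the name `passes_abr_criterion`, the dictionary layout, the reading of "changed" as an absolute change, and the example thresholds are hypothetical.

```python
# Animal allocation reported in Sect. 2.1 (counts taken from the text).
ALLOCATION = {
    "ex vivo": 4,
    "SC power evaluation in vivo": 4,
    "irradiation effect on ABR": 12,
    "laser interferometer tuning curve": 1,
    "in vivo SD-OCT vibrometry (surgery)": 66,
}

ABR_FREQS_HZ = (7000, 14000, 21000, 28000)  # ABR test frequencies
MAX_SHIFT_DB = 30.0                          # exclusion threshold on the ABR change


def passes_abr_criterion(pre_db: dict, post_db: dict,
                         max_shift_db: float = MAX_SHIFT_DB) -> bool:
    """Hypothetical helper: keep an animal only if its ABR threshold changed by
    no more than max_shift_db (absolute change, an assumption) at every frequency."""
    return all(abs(post_db[f] - pre_db[f]) <= max_shift_db for f in ABR_FREQS_HZ)


if __name__ == "__main__":
    print("total animals:", sum(ALLOCATION.values()))  # 87
    # Hypothetical thresholds in dB SPL, for illustration only.
    pre = {7000: 30, 14000: 25, 21000: 25, 28000: 30}
    post_ok = {7000: 40, 14000: 35, 21000: 40, 28000: 45}
    post_bad = {7000: 40, 14000: 35, 21000: 40, 28000: 65}
    print(passes_abr_criterion(pre, post_ok))   # True
    print(passes_abr_criterion(pre, post_bad))  # False (35 dB shift at 28 kHz)
```

In the study, passing the ABR check was necessary but not sufficient: a clear nonlinear vibration response was also required.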

2.2 The basic performance of the SC-introduced SD-OCT

Figure 1a details the SCSD-OCT system constructed for this study. For this setup, we used the spectrometer and scanning system of a commercial SD-OCT (Ganymede GAN520; Thorlabs, US) together with an SC source (SuperK EXT-20; NKT Photonics, Denmark). The light source was connected to an optical bandpass filter (SuperK SPLIT; NKT Photonics, Denmark), a fiber delivery system (SuperK CONNECT; NKT Photonics, Denmark), and a single-mode broadband fiber (FD7; NKT Photonics, Denmark). We used a custom objective lens (a modified M Plan Apo NIR 10×; Mitutoyo, Japan) with improved transmission at the wavelengths employed in this study, a long focal length of 30.5 mm, and a depth of focus of 4.1 μm. The average light power delivered onto the sample was 117 mW. The effective bandwidth and central wavelength in the spectrometer were approximately 200 nm and 900 nm, respectively (Table 1).
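For a Gaussian source spectrum, the coherence-limited axial resolution (FWHM) is δz = (2 ln 2/π)·λ0²/(n·Δλ), which is why a narrowed effective bandwidth degrades axial resolution. The sketch below applies this standard formula to the ~900 nm central wavelength and ~200 nm effective bandwidth stated above. Treating the SC spectrum as Gaussian is our simplifying assumption, and the function name is hypothetical. The estimate, about 1.79 μm in air, is close to the measured 1.72 μm FWHM reported in Sect. 3.1.

```python
import math


def axial_resolution_um(center_nm: float, bandwidth_nm: float, n: float = 1.0) -> float:
    """Coherence-limited axial resolution (FWHM, micrometers) for a Gaussian spectrum.

    delta_z = (2 ln 2 / pi) * lambda0**2 / (n * delta_lambda)
    """
    dz_nm = (2.0 * math.log(2.0) / math.pi) * center_nm ** 2 / (n * bandwidth_nm)
    return dz_nm / 1000.0  # nm -> um


if __name__ == "__main__":
    # ~900 nm center and ~200 nm effective bandwidth, as stated for the spectrometer
    for label, n in [("air", 1.0), ("tissue, n = 1.35", 1.35)]:
        print(f"{label}: {axial_resolution_um(900.0, 200.0, n):.2f} um")
    # Halving the effective bandwidth doubles delta_z
    print(f"100 nm bandwidth in air: {axial_resolution_um(900.0, 100.0):.2f} um")
```

Halving Δλ doubles δz. This is the quantitative sense in which weak spectral wings, by narrowing the effective bandwidth, coarsen the axial resolution.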

Fig. 1

Depiction and performance of the supercontinuum-introduced spectral-domain optical coherence tomography (SCSD-OCT) system. a A supercontinuum (SC) source was used in the recording of the inanimate and biological materials. BS beam splitter. b 2D cross-sectional tomographic image of air/glass/air layers. From the top, the layers are air, glass of 50 ± 0.01 μm thickness, and air. The recorded line is highlighted in red. c Signal intensity (arbitrary units) on a linear scale. d Upper panel shows the CCD view of a test target. The selected line in the lower panel is highlighted in yellow. The dotted yellow line in the lower panel indicates the line shown in e. e OCT signal intensity on a logarithmic scale. 8-1 and 9-1 to 9-3 indicate the group number and element of the test target (color figure online)

Table 1 Performance on various reflective objects of the supercontinuum-introduced SD-OCT (SCSD-OCT) system

3 Results

3.1 The performance of the SC-introduced SD-OCT in low reflective objects

To analyze the interference signal acquired by the spectrometer and evaluate the axial resolution, we imaged a fused silica glass plate 50 ± 0.01 μm thick (Atock, Japan) (Fig. 1b). A line in a B-scan image of the glass was selected for the A-scan. The interference signal intensity was temporally averaged over 10,000 A-scans (Fig. 1c). The axial pixel size, calculated from the number of pixels between the interference peaks, was 1.25 ± 0.00 μm in the glass. Because the refractive indices of glass and air are 1.45 and 1, respectively, the pixel size in air was 0.86 ± 0.00 μm. For recordings of the sensory epithelium, we estimated the refractive index to be 1.35; the pixel size was therefore 1.16 ± 0.00 μm. The axial resolutions of the SCSD-OCT, calculated as the full width at half maximum (FWHM), were 1.72 ± 0.00 μm in air and 2.73 ± 0.00 μm in glass (see Supplementary text 1.1 and Fig. S2 (c)). We also assessed the lateral resolution by discriminating separate lines in an OCT image of a resolution test target (USAF 1951 target; Edmund Optics) (Fig. 1d). Because the peak intensity of the OCT signal was more than 5 dB higher than the background for group 8, element 1 (Fig. 1e), the lateral resolution was 1.95 μm. The performance of the SCSD-OCT is summarized in Table 1; a detailed description and a comparison with an SLD-based SD-OCT system are given in the Supplemental materials.
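The 1.95 μm lateral resolution follows from the standard USAF 1951 target definition. In that definition, group g, element e has a spatial frequency of 2^(g + (e − 1)/6) line pairs per mm, and each bar is half a line pair wide. The hypothetical helpers below reproduce that conversion for the elements labeled in Fig. 1e.

```python
def usaf1951_lp_per_mm(group: int, element: int) -> float:
    """Spatial frequency (line pairs per mm) of a USAF 1951 group/element."""
    return 2.0 ** (group + (element - 1) / 6.0)


def usaf1951_line_width_um(group: int, element: int) -> float:
    """Width of a single bar (half a line pair) in micrometers."""
    return 1000.0 / (2.0 * usaf1951_lp_per_mm(group, element))


if __name__ == "__main__":
    for g, e in [(8, 1), (9, 1), (9, 2), (9, 3)]:
        print(f"group {g}, element {e}: {usaf1951_lp_per_mm(g, e):6.1f} lp/mm, "
              f"bar width {usaf1951_line_width_um(g, e):.2f} um")
```

Group 8, element 1 corresponds to 256 lp/mm, that is, a bar width of about 1.95 μm. Group 9 elements would require sub-micrometer resolution.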

3.2 Doppler vibrometry

To analyze the vibration inside the sensory epithelium, we collected three parameters: the intensity signal of the epithelium, the vibration amplitude, and the vibration phase. We first selected a line of interest from a B-scan image. We then performed a spectral Fourier transform over wavenumber on the interference signal using ThorImageOCT software (Thorlabs, US). The software applied a Hamming window during processing and captured the time-domain interference signal of 10,000 scans in one recording session. For the intensity signal, which reflects the structural image of the epithelium, a custom-made LabVIEW program temporally averaged the intensity of the time-domain interference signal. The program also performed a Fourier transform over time on the captured phase of the time-domain interference signal. The magnitude of this transform was converted into the vibration amplitude, and its phase corresponded to the vibration phase. When a sample vibrates, the optical path of light backscattered from a position z′ within the sample changes over time t. This change alters the phase of the time-domain interference signal, φ(z′, t). Assuming that the sample vibration consists of a single frequency f with an initial phase θ, φ(z′, t) becomes:

$$\varphi \left( {z^{\prime}, t} \right) = A\left( {z^{\prime}} \right)\sin \left( {2\pi ft + \theta } \right) + \xi \left( t \right),$$
(1)

where A(z′) is the phase amplitude of the sinusoidally vibrating sample, and ξ(t) is the phase noise of the interference signal, mainly due to physiological motion such as heartbeat, respiration, and muscle contraction. The actual vibration amplitude in nanometers was calculated as A(z′) × λ/(4πn), where λ is the central wavelength of the light source and n is the refractive index of the medium. For air and intracochlear soft tissue, we assumed n to be 1.0 and 1.35, respectively [35]. To smooth the distributions of amplitude and phase along the depth, a triangular moving-average filter with a half-window of 3 pixels was applied in the spatial domain.
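The processing chain described above can be sketched for a single depth z′: take the Fourier transform of the phase time series, read out the stimulus bin, and scale by λ/(4πn). This is a minimal illustration, not the authors' LabVIEW implementation. The function names are hypothetical, and λ is taken as the ~900 nm central wavelength. No temporal window is applied because, with 100,000 Hz sampling and 10,000 scans, the FFT bins are 10 Hz wide and the stimulus falls exactly on a bin. The 3-pixel half-window of the triangular filter is interpreted here as a 7-tap kernel with weights 1, 2, 3, 4, 3, 2, 1.

```python
import numpy as np


def vibration_from_phase(phase_rad, fs_hz, f_stim_hz, wavelength_nm, n):
    """Estimate displacement amplitude (nm) and vibration phase (rad) at f_stim_hz
    from a phase time series phi(t) recorded at one depth z'."""
    phi = np.asarray(phase_rad, dtype=float)
    N = phi.size
    spec = np.fft.rfft(phi - phi.mean())            # remove the DC offset of phi
    freqs = np.fft.rfftfreq(N, d=1.0 / fs_hz)
    k = int(np.argmin(np.abs(freqs - f_stim_hz)))  # bin closest to the stimulus
    amp_rad = 2.0 * np.abs(spec[k]) / N             # single-sided amplitude A(z'), rad
    amp_nm = amp_rad * wavelength_nm / (4.0 * np.pi * n)  # A(z') * lambda / (4 pi n)
    # np.fft phase refers to a cosine; sin(x + theta) = cos(x + theta - pi/2)
    theta = np.angle(spec[k]) + np.pi / 2.0
    return float(amp_nm), float(np.angle(np.exp(1j * theta)))  # wrap to (-pi, pi]


def triangular_smooth(profile, half_window=3):
    """Triangular moving average along depth (weights 1..h+1..1, normalized)."""
    h = int(half_window)
    w = np.concatenate([np.arange(1, h + 2), np.arange(h, 0, -1)]).astype(float)
    w /= w.sum()
    return np.convolve(np.asarray(profile, dtype=float), w, mode="same")


if __name__ == "__main__":
    fs, N = 100_000.0, 10_000
    t = np.arange(N) / fs
    lam, n = 900.0, 1.35                   # assumed central wavelength (nm), tissue index
    a_rad = 1.0 * 4.0 * np.pi * n / lam    # phase amplitude of a 1 nm vibration
    rng = np.random.default_rng(0)
    phi = a_rad * np.sin(2 * np.pi * 24_000 * t + 0.5) + 0.01 * rng.standard_normal(N)
    amp, theta = vibration_from_phase(phi, fs, 24_000.0, lam, n)
    print(f"amplitude = {amp:.3f} nm, phase = {theta:.3f} rad")
```

`np.convolve` with `mode="same"` zero-pads the ends, so the first and last three smoothed pixels are biased low. Phase profiles may also need unwrapping before they are smoothed.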

To control the sample vibration and its recording, we generated two sinusoidal voltages with specified amplitudes, frequencies, and delay using a function generator (WF1948, NF Corporation, Japan). One voltage drove a piezo actuator or speaker for stimulation, and the other served as the external trigger input of the OCT device to control the sampling rate. To detect the amplitude and phase of steady-state vibration, the trigger voltage was started 200 ms after the stimulus voltage, so that no recording included transient responses. The vibration phase was referenced to the phase of the sinusoidal voltage driving the sample. To avoid micrometer-scale physiological motion during in vivo vibrometry, we reduced the total acquisition time for one session. Therefore, in this study, all total acquisition times were 100 ms, except for the 1 s M-scan imaging (see Sect. 3.4). During these 100 ms, the software collected 10,000 A-scans per session because the sampling rate of the line sensor was 100,000 Hz.
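The relationships among sampling rate, number of A-scans, total acquisition time, and FFT frequency resolution used throughout this section fit in a few lines. The sketch below uses the 100,000 Hz and 40,000 Hz sampling rates and the 10,000 scans per session given in this section. The `Acquisition` class is a hypothetical illustration and does not model the 200 ms trigger delay.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acquisition:
    """One recording session: A-scan (trigger) rate and number of A-scans."""
    fs_hz: float
    n_scans: int

    @property
    def duration_s(self) -> float:
        return self.n_scans / self.fs_hz      # total acquisition time

    @property
    def bin_width_hz(self) -> float:
        return self.fs_hz / self.n_scans      # FFT frequency resolution

    @property
    def nyquist_hz(self) -> float:
        return self.fs_hz / 2.0               # highest resolvable vibration frequency

    def cycles(self, f_hz: float) -> float:
        return f_hz * self.duration_s         # stimulus cycles in one session


if __name__ == "__main__":
    for acq, f in [(Acquisition(100_000.0, 10_000), 24_000.0),
                   (Acquisition(40_000.0, 10_000), 5_000.0)]:
        print(f"fs={acq.fs_hz:.0f} Hz: T={acq.duration_s * 1e3:.0f} ms, "
              f"bin={acq.bin_width_hz:.0f} Hz, Nyquist={acq.nyquist_hz:.0f} Hz, "
              f"{acq.cycles(f):.0f} cycles at {f:.0f} Hz")
```

With 10,000 scans, the 40,000 Hz rate lengthens the session to 250 ms and fits more cycles of a low-frequency stimulus into one record. It also keeps every stimulus up to 15,000 Hz below the 20,000 Hz Nyquist limit.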

The frequency response of the SCSD-OCT was assessed using a piezoelectric actuator (PZT). Figure 2a shows a B-scan image of the PZT. The PZT was driven sinusoidally with voltages of 0.5, 1.0, and 10 V at stimulus frequencies between 5000 and 30,000 Hz in 1000 Hz increments (Fig. 2b). To capture many cycles of low-frequency vibration in a recording for accuracy, the sampling frequency was 40,000 Hz for stimuli of 15,000 Hz or lower and 100,000 Hz for stimuli of 16,000 Hz or higher. We therefore used a total acquisition time of 100 ms for vibration frequencies from 16,000 to 30,000 Hz and 250 ms for those from 5000 to 15,000 Hz; in both cases, 10,000 scans were collected in one session. The amplitude was measured at the PZT surface (Fig. 2a, open red circle). The detected amplitude increased linearly with the applied voltage, and the values for 1 V stimuli were consistent with those acquired by laser interferometry. The laser interferometer is described in the Supplemental materials [30]. The average measurement error between the two methods was below ~0.72 nm. To confirm that our SCSD-OCT system permitted phase measurement, we drove the PZT with sinusoidal stimuli at 23,000 and 30,000 Hz with phase delays varied from 0° to 90° in 10° increments, and compared the measured vibration phase with the delay. Figure 2c plots the delay against the measured phase of the vibrating PZT; the measured phase increased in parallel with the delay. Figure 2d shows the vibration amplitude spectrum, obtained by fast Fourier transform (FFT), measured at the surface of the PZT vibrating at 23,000 Hz with 10 V stimuli. We evaluated a limit of detection (LOD) for the vibration amplitude in the frequency domain.
At all frequencies, the vibration amplitude was derived from the change in phase of the interference signal, φ(z′, t), as described above (Eq. 1). The LOD was defined from the noise floor (NF) and standard deviation (SD) as LOD = NF + 5SD, where NF is the mean and SD the standard deviation of the vibration amplitudes at frequencies from f − 1500 Hz to f − 500 Hz in the spectrum computed from the 10,000 scans (Fig. 2d, red square), and f is the vibration frequency of the target sample. We adopted this criterion for all recordings. For PZT vibration driven by a 24,000 Hz stimulus at 0.5 V, the LOD was 477 pm (Fig. 2b, gray line).
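The LOD criterion can be written directly from its definition. In the sketch below, NF and SD are the mean and standard deviation of the amplitude spectrum between f − 1500 Hz and f − 500 Hz. A response counts as detected when the amplitude at f exceeds NF + 5SD. The function names, the inclusive band edges, and the use of the sample standard deviation (ddof=1) are our assumptions. The noise spectrum in the usage example is synthetic.

```python
import numpy as np


def limit_of_detection(freqs_hz, amplitudes, f_stim_hz,
                       band=(1500.0, 500.0), k=5.0):
    """Return (NF, SD, LOD) with LOD = NF + k*SD, using the spectral bins from
    f_stim - band[0] to f_stim - band[1] (inclusive)."""
    freqs = np.asarray(freqs_hz, dtype=float)
    amps = np.asarray(amplitudes, dtype=float)
    lo, hi = f_stim_hz - band[0], f_stim_hz - band[1]
    mask = (freqs >= lo) & (freqs <= hi)
    if mask.sum() < 2:
        raise ValueError("not enough spectral bins in the noise band")
    nf = float(amps[mask].mean())
    sd = float(amps[mask].std(ddof=1))  # sample SD; the choice of ddof is an assumption
    return nf, sd, nf + k * sd


def is_detected(freqs_hz, amplitudes, f_stim_hz):
    """True if the amplitude at the bin nearest f_stim_hz exceeds the LOD."""
    freqs = np.asarray(freqs_hz, dtype=float)
    k_stim = int(np.argmin(np.abs(freqs - f_stim_hz)))
    _, _, lod = limit_of_detection(freqs, amplitudes, f_stim_hz)
    return bool(np.asarray(amplitudes, dtype=float)[k_stim] > lod)


if __name__ == "__main__":
    fs, n = 100_000.0, 10_000
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)               # 10 Hz bins
    rng = np.random.default_rng(1)
    amps = np.abs(rng.normal(0.05, 0.01, freqs.size))   # hypothetical noise (nm)
    amps[np.argmin(np.abs(freqs - 24_000))] = 1.0        # hypothetical 1 nm response
    nf, sd, lod = limit_of_detection(freqs, amps, 24_000)
    print(f"NF={nf:.3f} nm, SD={sd:.3f} nm, LOD={lod:.3f} nm")
    print("detected:", is_detected(freqs, amps, 24_000))
```

Taking the noise band entirely below f keeps the stimulus bin and its immediate neighbors out of the noise estimate.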

Fig. 2

Performance of the supercontinuum spectral-domain optical coherence tomography (SCSD-OCT) system during vibrometry. a 2D tomographic image of the surface of a piezo-electric actuator (PZT). The selected line is highlighted in red. The red open circle indicates the recording point on the PZT. b Acquired amplitudes of a vibrating PZT with the SCSD-OCT (blue) and a laser interferometer (IFM) (dotted red). The input voltage was varied from 0.5 to 10 V. c Phases of the vibrating PZT at 23,000 and 30,000 Hz stimuli, and 0.5 V (dotted red) and 1 V input voltage (blue). d The vibration amplitude of PZT using a 23,000 Hz stimulus, 100 ms duration, and 10 V input voltage. The limit of detection (LOD) was calculated from the signals from the red squared region in the inset and was plotted with a gray line in b (color figure online)

3.3 Ex vivo imaging of isolated cochlear partition

During the in/ex vivo recordings, we limited the power of the SC source to 429.0 mW. Under this condition, the power of the beam on the sample was 117 mW, almost 10 times higher than the 3–16 mW of previous OCT systems [12,13,14,15,16,17,18, 20,21,22,23,24,25, 36,37,38,39]. In addition to the low reflectance of the object and the short total acquisition time, there were two anatomical reasons why high-power irradiation was necessary for in vivo recordings in guinea pig cochleae: mechanical vignetting and reflection of the beam. These factors are likely to arise from two characteristic structures inside the temporal bone. First, compared with the cochleae of the mouse and gerbil, the guinea pig cochlea is located deeper inside the bone (Fig. 3a) [19, 24, 31], so the beam is likely to be mechanically vignetted by the narrow path through the mastoid air cells (Fig. 3b). Second, the round window membrane in guinea pig cochleae is angled more steeply relative to the surface of the cochlear partition (Fig. 3c). To demonstrate these effects on the beam power, we measured the light power along the pathway through the temporal bone and round window to the sensory epithelium in a dissected temporal bone (Fig. 3d). Almost 24% of the total power on the animal’s temporal bone was vignetted by the mastoid bone, whereas ~35% was scattered by the round window membrane (Fig. 3e). These results indicate that the light power reaching the epithelium with an SLD-based SD-OCT (SLDSD-OCT) can be estimated to be ~0.29 mW (Table S1). Accordingly, compared with other species [19, 24, 25], averaging 4,000,000 scans over a total acquisition time of ~27 s was likely to be required in guinea pigs [22].
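The loss figures above can be turned into a simple power budget. The text gives both losses as fractions of the total power on the temporal bone, which suggests subtracting them (additive model). A sequential model, in which the round window loss applies to the power remaining after vignetting, is shown for comparison. This is a hypothetical sketch, not the calculation behind Table S1.

```python
def power_on_epithelium_mw(p_sample_mw: float,
                           vignetted_frac: float = 0.24,
                           rwm_scatter_frac: float = 0.35,
                           sequential: bool = False) -> float:
    """Estimate the power reaching the sensory epithelium.

    additive (default): both losses are fractions of the power on the temporal bone.
    sequential: the round window loss applies to the power left after vignetting.
    """
    if sequential:
        return p_sample_mw * (1.0 - vignetted_frac) * (1.0 - rwm_scatter_frac)
    return p_sample_mw * (1.0 - vignetted_frac - rwm_scatter_frac)


if __name__ == "__main__":
    p = 117.0  # mW on the sample with the SCSD-OCT
    print(f"additive:   {power_on_epithelium_mw(p):.1f} mW")                   # 48.0
    print(f"sequential: {power_on_epithelium_mw(p, sequential=True):.1f} mW")  # 57.8
```

Under the additive reading, 117 mW on the sample leaves about 48 mW at the epithelium. This is close to the highest estimated intensity (47.6 mW) used in Sect. 3.5.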

Fig. 3

Effect of mechanical vignetting and reflection of the beam by anatomical structures in a guinea pig. a A lateral view of the guinea pig cochlea during an in vivo vibration recording. The thick yellow arrow indicates the direction of the view shown in b. b A view of the basal turn and round window membrane of a guinea pig cochlea through the mastoid air cells. The thick yellow arrow indicates the direction of the view shown in a. c Optical coherence tomography (OCT) image of the round window membrane (RWM) and cochlear partition in a live animal. d An inside view of the middle ear cavity in a dissected temporal bone. The cochlear bony wall, except around the RWM, has been removed, and a light path has been opened in the bulla. e Schematic of the beam power between the surface of the temporal bone and the sensory epithelium

Prior to in vivo analysis of the sensory epithelium, we recorded an image of ex vivo tissue using the SCSD-OCT. After dissection, the cochlea was positioned upside down in a PBS-filled Petri dish, and the epithelium was exposed by removing the round window membrane. In the schematic of the epithelium, the outer tunnel and the tunnel of Corti, which are major landmarks in the epithelium, are highlighted with a dotted pink circle and a triangle, respectively (Fig. 4a). The apical and basal regions of OHCs are highlighted in blue and purple, respectively. Figure 4b shows a camera image of the view including the sensory epithelium. A region approximately 300 µm wide in the sensory epithelium (red line) was B-scanned, and the unaveraged image is displayed in Fig. 4c. In this image, we identified the outer tunnel, the tunnel of Corti, and the OHCs based on the schematic in Fig. 4a and the literature [12, 15, 17, 20, 22, 38]. The OHC diameter and length at the basal turn of the guinea pig cochlea are ~7 μm and ~40 μm, respectively [16], whereas the ideal lateral resolution of our system was 1.95 μm. Nevertheless, the system did not resolve the three rows of OHCs; instead, it distinguished the apical and basal regions of OHCs and displayed the OHCs as a cluster (Fig. 4c, yellow dotted rectangle) [12, 17, 22]. This inconsistency may be due to strong speckle produced in the OCT image by the SC source. To further identify the apical and basal regions of OHCs, we held the beam at the center of the OHC image (Fig. 4c, red arrowheads). Figure 4d shows the intensity distribution averaged over 10,000 A-scans; the x-axis is the axial location, with upper positions to the left and lower positions to the right. The cochlear partition was visualized above the background signal of the lymph fluids with a signal-to-noise ratio (SNR) of more than 5 dB (Fig. 4d, bidirectional arrow). Because the BM directly faces the perilymph, the contrast adjacent to the weak signal from this fluid in Fig. 4d (pixels 211–220) reflected the border zone of the BM [13,14,15,16,17,18,19,20, 22,23,24, 36, 37]. In this context, the peak observed around pixel 220 corresponded to the inside of the BM (Fig. 4d, green shading). Similarly, the contrast around pixels 275–283 represented the border zone of the sensory epithelium, which is exposed to the endolymph. The peak at pixel 275 represented the RL containing the apical edge of the OHCs, consistent with observations in the literature [13,14,15,16,17,18,19,20, 22,23,24, 36, 37]. In addition, we observed a few contrasts inside the epithelium. Recent OCT studies have shown that OHCs provide continuously high intensity [12, 17, 22], whereas supporting cells elicit relatively low intensity [12, 17]. Therefore, in the cochlear partition shown in Fig. 4d, the area of minimal intensity (pixel ~232) is probably the location of the supporting cells (open arrowhead). From this perspective, the region of relatively high intensity from pixels 243 to 275 was likely to represent the inside of the OHCs, and the moderate contrast between pixels 232 and 243 corresponds to a border zone of the OHCs. In summary, the region from pixels 232 to 283 was presumed to be the OHC region (Fig. 4d, blue and purple shading). Note that the width of this region was 53.6 μm. This distance should result from the beam crossing the cluster of OHCs (one OHC: ~7 μm in diameter and ~40 μm in length) diagonally (see Fig. 4a, c). Similar results were recorded in two different dissected cochleae.

Fig. 4

Ex vivo imaging of the cochlear partition isolated from a guinea pig. a Schematic of a cochlear partition. Red lines show the direction of the laser. b View of the guinea pig epithelium during recording. The red line indicates the scanned line. c Optical coherence tomography (OCT) image of the epithelium in the basal turn of the cochlea with indicative landmarks. The outer tunnel, tunnel of Corti, and a cluster of outer hair cells (OHCs) are highlighted with a dotted yellow circle, triangle, and rectangle, respectively. Red arrowheads indicate the line selected for d. d Intensity collected along the line, plotted on a logarithmic scale. The top of the epithelium lies to the left. In this and subsequent illustrations, blue and purple shading indicates the presumed OHC region, and green shading corresponds to the BM. The bidirectional arrow and the blue, purple, green, and open arrowheads indicate the cochlear partition, apiOHCs, basoOHCs, BM, and supporting-cell region, respectively. RW round window, IHC inner hair cell, OHC outer hair cell, apiOHCs apical region of OHCs, basoOHCs basal region of OHCs, SCs supporting cells, BM basilar membrane, NF neuronal fiber, EL endolymph, PL perilymph (color figure online)

On the basis of these observations, in this and subsequent measurements, we denoted the peak region adjacent to the contrast with the endolymph as the apical region of OHCs (apiOHCs) (Fig. 4d, blue arrowhead), and the peak next to the contrast bordering the supporting cells as the basal region of OHCs (basoOHCs) (Fig. 4d, purple arrowhead). Histologically, the distance between these peaks should be less than roughly 50 μm [16]; in the data of Fig. 4d, it was 53.6 μm. In most recordings, more than 30 pixels represented this region. This reflects the high axial resolution of our system, which is a strong advantage for distinguishing the apical and basal regions of OHCs.

3.4 In vivo measurements of cochlear partition motion

The effect of the live animal's motion on the OCT signal has not been fully examined in previous studies [13,14,15,16,17,18,19,20,21,22, 24,25,26, 28, 32, 33, 36, 37, 40]. We therefore evaluated the effect of the total acquisition time on the spatial accuracy of vibrometry by M-scan, the time-motion display of the OCT image along a chosen line. Figure 5a displays the OCT image of the cochlear partition, including the sensory epithelium. We selected a line in the OCT image and recorded an M-scan image for 1 s without sound stimuli (Fig. 5b). The sampling rate was 10,000 Hz. Even without sound, the recording position appeared to shift laterally owing to background fluctuation arising from physiological motion. This fluctuation appeared to reduce the thickness of the cochlear partition by more than 10 pixels (Fig. 5b). Figure 5c–e show the intensity distributions of the partition along the depth axis. The distance between the two peaks at the edges of the partition at 30 ms was similar to that at 130 ms (Fig. 5c, d). By contrast, the distance at 700 ms had shrunk markedly (Fig. 5e); this deformation seems to arise from micrometer-scale physiological motion in the lateral and/or axial directions. In vibrometry with conventional OCT, the total acquisition time is more than 1 s, so physiological motion smears the averaged intensity distribution and degrades spatial accuracy. To avoid this effect, we set the total acquisition time to 100 ms in this study. Of note, in this series of experiments, the image of the cochlear cross section was obtained through the intact round window in a live guinea pig, from a location similar to that in Fig. 4.

Fig. 5

In vivo OCT and M-scan imaging of the cochlear partition without sound stimuli in a live guinea pig. a OCT image of the partition. The red line indicates the line selected for b. b M-scan image of the line in a. The yellow dotted lines mark the times of the A-scans shown in c–e. c–e Intensity distributions in the OCT image at 30 ms, 130 ms, and 700 ms, respectively. Filled arrowheads indicate the peaks at the edges of the partition (color figure online)

3.5 In vivo measurements of the motion in apical/basal region of OHCs and BM

We also examined the effect of the SC light source power on OCT images and vibration measurements in vivo. Data were recorded while varying the estimated light intensity on the epithelium (2.0–47.6 mW) (Fig. 6). Figure 6a shows the OCT images of the sensory epithelium. At 2.0 mW, the epithelium was insufficiently visualized; when the intensity was increased to 4.1 mW, an outline of the epithelium appeared, but the cluster of OHCs was not clearly distinguishable. At intensities greater than 14.2 mW, structures inside the epithelium were observed. To further test the vibrometry data, we applied a 24,000 Hz sound at 90 dB sound pressure level (SPL) to the animal. Figure 6b–e present the SNR of the signal intensity and the noise floor (NF), amplitude, and phase of the vibration under this stimulation, each averaged over 10,000 A-scans acquired in one recording session. Based on the criteria described for Fig. 4d, at 47.6 mW we detected two SNR peaks in the OHC region (Fig. 6b, light blue and purple arrowheads): one adjacent to the endolymph and the other next to the supporting cells. Because these landmarks corresponded to the border zones of the OHCs, these SNR peaks were denoted as apiOHCs and basoOHCs. Note that the SNR in the OHC region was enhanced by ~10 dB at 47.6 mW compared with the corresponding region at 4.1 mW (Fig. 6b). A similar improvement was observed in the NF; in most of the region inside the OHCs, the value at 47.6 mW was half of that at 4.1 mW (Fig. 6c). Because of this low NF at 47.6 mW, all amplitude data points between apiOHCs and basoOHCs exceeded the LOD (LOD = NF + 5SD) (Fig. 6d, red line), and their spatial distribution was relatively smooth.
In contrast, the amplitude data were unstable along the depth axis at 14.2 mW and markedly attenuated at intensities below 4.1 mW. These impairments may be attributed to the high NF arising from the low SNR (Fig. 6b, c), which increased the LOD to approximately 4 and 5 nm at 14.2 and 2.0 mW, respectively (Fig. 6d, gray line). As for the vibration phase, values inside the region where the vibration amplitude exceeded the LOD were almost flat, whereas those in the region where the amplitude was below the LOD fluctuated (Fig. 6e). At 47.6 mW, the vibration amplitudes exceeded the LOD throughout the OHC region; we therefore used this intensity and LOD criterion in subsequent in vivo assays.

Fig. 6

Analyses of the sensory epithelium at varying light powers with the optical coherence tomography (OCT) system. a OCT images of the epithelium in the basal turn of the cochlea through the round window membrane. Red arrowheads show the selected line in b–e. Scale bar in the right panel is 50 μm. b Signal-to-noise ratio (SNR) of the averaged A-scan intensity during sound stimulation. In the cochlear partition, blue, purple, green, and open arrowheads indicate apiOHCs, basoOHCs, the basilar membrane (BM), and the supporting-cell region, respectively. Blue and purple shading represents the OHC region, defined from the data at a light power of 47.6 mW. The intensity is referenced to the data from the fluid space above the epithelium, the endolymph. c Noise floor (NF) of the vibration amplitude on a linear scale. The NF decreases as the power increases. d NF, limit of detection (LOD), and vibration amplitudes in response to pure-tone stimulation at 24,000 Hz and 90 dB SPL on a logarithmic scale. Black and gray lines indicate NF and LOD (LOD = NF + 5SD), respectively. e Vibration phase in the epithelium. As the power increases, phase fluctuation is suppressed owing to the improved SNR. Blue, purple, green, and open arrowheads indicate apiOHCs, basoOHCs, the basilar membrane, and the supporting-cell region, respectively (color figure online)

A different guinea pig was also tested; the OCT image and vibration data obtained under control and post-mortem conditions are shown in Fig. 7. Figure 7a shows schematics of the beam, the applied sound, and the evoked vibration. The scanned range in the epithelium was 300 μm (Fig. 7b), and the focus of the objective lens was positioned at the depth of the OHCs. On the basis of the structural information from ex vivo experiments, we identified the common landmarks mentioned in Fig. 4 (Fig. S4 (a) and (b)). We also detected the tectorial membrane and the collagen-fiber layer of the BM (Fig. 7b, green and yellow arrowheads): the additional peak inside the endolymphatic region and the dip in the bimodal peak of the BM signal reflect the tectorial membrane and the collagen-fiber layer of the BM, respectively (Fig. 7c, green and yellow arrowheads). These observations indicate that the spatial resolution of our system was higher than that of conventional systems [15, 17,18,19, 21, 22, 25]. For vibrometry, the scanning beam was held at a location on the OHCs (Fig. 7b, red arrowheads), and the signal intensity and the amplitude and phase of the vibration were acquired while the ear canal was stimulated with pure tones. The sound frequency was 23,000 Hz, and the SPL ranged from 50 to 90 dB. In mammalian cochleae, the biomechanical properties of the organ of Corti tune each region to a specific sound frequency: the base of the cochlea resonates at higher frequencies and the apex at lower frequencies. Because our recording point, observed through the round window, was located in the base, high-frequency sounds elicited larger epithelial vibrations than low-frequency sounds. The sound frequency that maximally vibrates the epithelium at low SPLs at the recording point is known as the best frequency (BF) [33]. At and near the BF, the amplitude grows compressively with SPL [8].
Based on the frequency response of the epithelial vibration recorded with an interferometer, we defined frequencies from 22,000 to 24,000 Hz as near-BF and those below 22,000 Hz as sub-BF throughout this study, as previously reported [30]. In this trial, we chose 23,000 Hz as the near-BF stimulus.
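Compressive versus linear growth can be quantified as the slope of the amplitude level (dB re an arbitrary reference) against SPL: a slope of 1 dB/dB means the amplitude is proportional to sound pressure (linear growth), and a slope below 1 dB/dB indicates compression. The sketch below is illustrative only. It encodes the near-BF and sub-BF bands defined above, uses synthetic amplitudes, and labels frequencies above 24,000 Hz with a hypothetical "unclassified" placeholder because the study defines only two classes.

```python
import numpy as np

NEAR_BF_HZ = (22_000, 24_000)  # near-BF band defined in the text

def classify_stimulus(freq_hz):
    """Label a tone as near-BF or sub-BF using the band defined in the text."""
    lo, hi = NEAR_BF_HZ
    if lo <= freq_hz <= hi:
        return "near-BF"
    if freq_hz < lo:
        return "sub-BF"
    return "unclassified"  # hypothetical label: not defined in the study

def growth_slope_db_per_db(spl_db, amplitude_nm):
    """Least-squares slope of 20*log10(amplitude) versus SPL, in dB/dB."""
    amp_db = 20.0 * np.log10(np.asarray(amplitude_nm, dtype=float))
    slope, _intercept = np.polyfit(np.asarray(spl_db, dtype=float), amp_db, 1)
    return float(slope)

spl = np.array([50, 60, 70, 80, 90])
linear = 0.01 * 10 ** ((spl - 50) / 20)            # amplitude ∝ pressure
compressive = 0.5 * 10 ** (0.3 * (spl - 50) / 20)  # synthetic 0.3 dB/dB growth

print(classify_stimulus(23_000), classify_stimulus(16_000))
print(round(growth_slope_db_per_db(spl, linear), 3),
      round(growth_slope_db_per_db(spl, compressive), 3))
```

Fitting in the log domain makes the two regimes directly comparable: a passive (post-mortem) response would be expected to give a slope near 1, whereas an actively amplified near-BF response gives a slope below 1.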

Fig. 7

Optical coherence tomography (OCT) analyses of the sensory epithelium under near-best frequency (BF) stimulation. Left and right panels show the data under physiological and post-mortem conditions in one animal, respectively. a Depiction of the recording location and sound-elicited vibration. b OCT images of a guinea pig’s epithelium in the basal turn of the cochlea. The red, green, and yellow arrowheads show the selected line for c–e, the tectorial membrane, and the collagen-fiber layer of the basilar membrane (BM), respectively. c Intensity of the OCT image on the selected line. Results are plotted in 10 dB increments from 50 to 90 dB SPL under physiological and post-mortem conditions. In this and subsequent illustrations, levels are arranged in a chromatic sequence. Open circles mark the data used for the analysis in Fig. 8. The green, yellow, and open arrowheads indicate the tectorial membrane, the collagen-fiber layer of the BM, and the supporting-cell region, respectively. d Vibration amplitudes inside the epithelium in response to pure-tone stimulation at 23,000 Hz. The mild convergence of the peaks toward that at 90 dB indicates that the vibrations under physiological conditions are compressive (left panel). Amplitudes measured 30 min after anoxia scale linearly with sound pressure (right panel). Gray lines indicate the LOD. e Vibration phase inside the epithelium. Deviations toward positive values indicate that the phase leads that of other parts of the epithelium. Under post-mortem conditions, the phase responses overlap across sound pressures. Phases are plotted in 10 dB increments from 50 to 90 dB SPL under physiological and post-mortem conditions (color figure online)

The axial M-scan data recorded from the aforementioned regions were analyzed at different stimulus levels, and the results are displayed in Fig. 7c–e. Figure 7c shows the signal intensity on the axial M-scan including the OHCs, and Fig. 7d, e show the amplitude and phase of the vibrations, respectively. The signal intensity was obtained as the temporal average of 10,000 A-scans over the total acquisition time, rather than from a single A-scan selected from a B-scan. The intensities at apiOHCs, basoOHCs, and BM exceeded the background signal in the lymph fluids by more than 20 dB, whereas the intensity was minimal at the supporting cells (Fig. 7c, open arrowheads). These observations were consistent with those in the dissected cochlea (see Fig. 4d). In addition, under both control and post-mortem conditions, the signal intensity varied little throughout the recordings despite changes in sound pressure. This result indicates high spatial accuracy in the vibrometry; in other words, the beam did not drift appreciably from its initial target onto different cells or structures during the measurement. Of note, although conventional OCT systems visualize the apiOHCs and basoOHCs separately, vibrations in these regions have not yet been fully analyzed [15,16,17,18,19, 22, 23, 36, 37]. Under acoustic stimuli of ≤ 70 dB SPL, the vibration amplitudes of apiOHCs and BM were larger than those of basoOHCs, whereas for intense stimuli of ≥ 80 dB SPL, the amplitudes of apiOHCs, basoOHCs, and BM were all similar (Fig. 7d). Figure 7e illustrates the phases in the epithelium. Below 70 dB SPL, the phases of apiOHCs and BM markedly led that of basoOHCs, whereas under intense sound the phases were almost constant throughout the cochlear partition.
Under post-mortem conditions, the partition behaved completely differently. The vibration amplitudes at basoOHCs exceeded those at apiOHCs, whereas those at BM were almost equal to the apiOHC amplitudes at all stimulus intensities (Fig. 7d). In addition, at stimuli of ≤ 70 dB SPL, the phases at apiOHCs and BM never led that at basoOHCs (Fig. 7e).

Figure 8 summarizes the foregoing measurements at apiOHCs, basoOHCs, and BM as a function of stimulus level (for the plotted data, see the open circles in Fig. 7c–e). Under physiological conditions, apiOHCs displayed low sensitivity to intense sound above 80 dB and high sensitivity to small-to-moderate sound (Fig. 8a, solid lines), consistent with the compressive amplification reported in previous studies [8, 15, 17, 20]. The BM sensitivity was also high for moderate sound stimuli. In contrast, the sensitivity of basoOHCs changed little and remained almost equivalent to that under post-mortem conditions, indicating that basoOHC motion was not amplified. Under post-mortem conditions, the sensitivities were uniformly low across sound pressures, and the values at every data point were lower than those under control conditions (Fig. 8a, dotted lines). Figure 8b shows the phase plots referenced to the data at 90 dB SPL. Under physiological conditions, the phases of both apiOHCs and BM gradually shifted as the sound level decreased, reaching phase leads of 22.8° and 15.3°, respectively, at 50 dB. In contrast, the phase of basoOHCs changed minimally across all stimulus intensities. Post-mortem, the phases of apiOHCs and BM were similar to those of basoOHCs and showed no intensity dependence, indicating a linear, passive system (Fig. 8b, dotted lines). This trend was repeatedly confirmed in different animals (Fig. 8c, d). These results indicate that the basoOHC vibrations did not occur in synchrony with the mechanical activity of OHCs. To summarize the relative amplitude and phase of vibration at apiOHCs, basoOHCs, and BM under physiological and post-mortem conditions, Fig. 8e, f shows schematics of the vibration at different SPLs. Under physiological conditions at 60 dB SPL, the vibration amplitudes of apiOHCs and BM exceeded that of basoOHCs.
In addition, the phase of the apiOHC vibration led that of BM by ~ 15°, whereas the phase of the basoOHC vibration lagged that of BM by ~ 15°. These differences were not evident at 80 dB SPL or under post-mortem conditions. Taken together, the differences in the amplitudes and phases of apiOHCs and BM between control and post-mortem conditions indicate that OHC force contributed to the amplification of cellular vibration under physiological conditions. Note that we also evaluated the vibrations of apiOHCs, basoOHCs, and BM under sub-BF stimulation in the same animals used in Figs. 7 and 8 (Figs. S7 and S8). Under sub-BF stimulation, the amplitude and phase responses to different SPLs were linear, and OHC activity did not contribute to the epithelial vibration.
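Sensitivity in Fig. 8 is the vibration amplitude divided by the sound pressure, and the phases are referenced to the 90 dB SPL response. A minimal sketch of both conversions follows, assuming the standard 20 µPa reference pressure for dB SPL and wrapping phase differences into [-180°, 180°). The amplitudes and phases are hypothetical numbers, and the helper names are illustrative. The last lines show how the approximate ~15° lead of apiOHCs and ~15° lag of basoOHCs relative to BM combine into a ~30° lead of apiOHCs over basoOHCs.

```python
import numpy as np

P_REF_PA = 20e-6  # 0 dB SPL reference pressure in air (20 µPa)

def spl_to_pa(spl_db):
    """Convert dB SPL to sound pressure in pascals."""
    return P_REF_PA * 10 ** (np.asarray(spl_db, dtype=float) / 20)

def sensitivity_nm_per_pa(amplitude_nm, spl_db):
    """Vibration amplitude normalized by sound pressure (nm/Pa)."""
    return np.asarray(amplitude_nm, dtype=float) / spl_to_pa(spl_db)

def wrap_deg(phase_deg):
    """Wrap phase values into [-180, 180) degrees."""
    return (np.asarray(phase_deg, dtype=float) + 180.0) % 360.0 - 180.0

def phase_re_level(phase_deg, spl_db, ref_spl_db=90):
    """Phase relative to the response at the reference stimulus level."""
    phase = np.asarray(phase_deg, dtype=float)
    ref = phase[np.asarray(spl_db) == ref_spl_db][0]
    return wrap_deg(phase - ref)

spl = np.array([50, 70, 90])
amp_nm = np.array([1.0, 3.0, 10.0])         # hypothetical amplitudes
phase = np.array([-150.0, -160.0, -170.0])  # hypothetical phases (deg)
print(sensitivity_nm_per_pa(amp_nm, spl))   # nm/Pa, falls as level rises
print(phase_re_level(phase, spl))           # positive values = phase lead

# Approximate lead/lag relative to BM quoted in the text
api_re_bm, baso_re_bm = 15.0, -15.0
print(wrap_deg(api_re_bm - baso_re_bm))     # apiOHCs re basoOHCs
```

Normalizing by pressure makes a linear (passive) response appear as a flat sensitivity curve, so any level dependence in the sensitivity plot directly reflects nonlinearity.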

Fig. 8

Summary profile of the vibrations at the apical region of outer hair cells (apiOHCs), the basal region of OHCs (basoOHCs), and the basilar membrane (BM) in one guinea pig and across different guinea pigs (n = 4). Solid and dotted lines indicate data under physiological and post-mortem conditions, respectively. Level functions in a live animal relate the active motion of OHCs to the stimulus strength. In the phase plots, the phase is referenced to the data at 90 dB SPL. a Sensitivities under physiological and post-mortem conditions at apiOHCs, basoOHCs, and BM, calculated from the vibration amplitudes and sound pressures. The response after anoxia (dotted lines) reveals the behavior of the passive cochlea. At apiOHCs and BM, the physiological data (solid lines) show higher sensitivity to small-to-moderate stimuli than the post-mortem data. b Phases under physiological and post-mortem conditions at apiOHCs, basoOHCs, and BM. Where apiOHCs and BM are highly sensitive to small-to-moderate stimuli, their phases lead those at 90 dB SPL. At basoOHCs, neither sensitivity nor phase differs markedly between physiological and post-mortem conditions. c, d Grouped sensitivities and phases of apiOHCs, basoOHCs, and BM and their comparison. e, f Schematics of the apiOHC, basoOHC, and BM vibrations at 60 and 80 dB SPL under physiological and post-mortem conditions. At 60 dB, the vibration amplitudes at apiOHCs and BM are approximately twice those at basoOHCs. The phase at apiOHCs leads that at BM by ~ 15°, whereas the phase at basoOHCs lags that at BM by ~ 15°. At 80 dB or under post-mortem conditions, apiOHCs, basoOHCs, and BM vibrate in phase with similar amplitudes

4 Discussion and conclusion

The main achievement of this study is the optimization of an SD-OCT system for high-resolution in vivo imaging and vibrometry of biological tissue, specifically the sensory epithelium. We described the optical design of an SCSD-OCT system and demonstrated that it can image the interior of this extremely low-reflectivity tissue and measure nanometer-scale vibrations at both the apical and basal regions of OHCs, as well as at the BM, in live guinea pigs. To focus on system performance, we examined the sound-intensity dependence of the vibrations in three regions in a limited number of guinea pigs, without a detailed analysis of frequency dependence. Because the frequency dependence is likely to reveal the biophysical basis of the vibration, it should be examined in future experiments.

Conventional OCT vibrometry systems visualize structures inside the organ of Corti, including apiOHCs and basoOHCs, the tunnel of Corti, and the outer and inner tunnels, in B-scans. However, vibration data from the two regions of OHCs in vivo have not yet been fully analyzed owing to limited spatial resolution [13, 15,16,17,18,19,20,21,22, 24, 25]. One feature of our system is its high axial resolution of ~ 1.72 μm in air. In most conventional OCT vibrometry systems used to record cochlear vibrations, the spatial resolution exceeds 3.4 μm [15,16,17,18,19,20,21,22, 24, 25]. Because OHCs are approximately 40–50 μm long, fewer than 12 pixels are likely to fall inside the OHCs in those recordings. In comparison, the OHCs spanned almost twice as many pixels in this study (Fig. 4d). This axial resolution is equivalent to that of the high-resolution, ‘cellular-level’ OCT systems described in the literature [41,42,43].
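The pixel estimate above is simple arithmetic: dividing the OHC length by the axial resolution gives the number of resolvable depth samples across the cell. The sketch below treats one resolution length as one sample, which is an assumption (the actual pixel spacing depends on the spectrometer and processing), and uses the in-air resolutions quoted above without correcting for the refractive index of tissue.

```python
def samples_across(length_um, axial_res_um):
    """Number of axial resolution cells spanning a structure of given length."""
    return length_um / axial_res_um

for label, res_um in (("conventional", 3.4), ("this system", 1.72)):
    lo = samples_across(40, res_um)
    hi = samples_across(50, res_um)
    print(f"{label:>12}: {lo:.1f}-{hi:.1f} samples across a 40-50 um OHC")

print(f"resolution gain: {3.4 / 1.72:.2f}x")
```

For a 40 μm OHC, the conventional figure (~11.8) matches the estimate of fewer than 12 pixels, and the ratio 3.4/1.72 ≈ 1.98 corresponds to "almost twice".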

OCT performance depends on the detector sensitivity S, which can be described by the relation [39]:

$$S \propto \frac{\rho A R_{\text{s}}}{2}\,\Delta t \tag{2}$$

where ρ is the detector responsivity, A is the power of the light source, Rs is the reflectivity of the sample, and Δt is the camera integration time [44,45,46]. For the transparent sensory epithelium, Rs is less than 0.04% [27]. To obtain sufficient sensitivity, both Δt and A would therefore need to be increased. However, one of the most serious considerations for in vivo vibrometry is controlling the noise arising from physiological motion in the preparation. Physiological motions such as the heartbeat and respiration have large amplitudes and low frequencies. Thus, a long total acquisition time can smear the intensity distribution along the depth axis and degrade the spatial accuracy of vibrometry, as shown in Fig. 5. To overcome this shortcoming, we kept the total acquisition time as short as possible while increasing the light power.
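Eq. (2) can be used to compare configurations. Because it is a proportionality, only ratios are meaningful, so the sketch below fixes ρ = 1 (same detector) and a hypothetical integration time that cancels in every ratio, and it expresses ratios in dB as 10 log10. The values of 117 mW for the SC beam, 5 mW as the upper bound for a typical SLD, and 8% (cornea) versus 0.04% (epithelium) reflectivity are taken from this article; comparing the two sources at the same Δt is an illustrative assumption.

```python
import math

def relative_sensitivity(rho, power_mw, reflectivity, dt_s):
    """Proportional detector sensitivity from Eq. (2): rho * A * Rs / 2 * dt."""
    return rho * power_mw * reflectivity / 2 * dt_s

def ratio_db(s_a, s_b):
    """Ratio of two sensitivities expressed in dB."""
    return 10 * math.log10(s_a / s_b)

dt = 1e-5  # hypothetical integration time (s); cancels in the ratios below

# Cornea (~8%) versus sensory epithelium (0.04%) at equal power and dt
cornea = relative_sensitivity(1, 5, 0.08, dt)
epithelium = relative_sensitivity(1, 5, 0.0004, dt)
print(f"reflectivity penalty: {ratio_db(cornea, epithelium):.1f} dB")

# SC beam (117 mW) versus a 5 mW SLD on the epithelium at equal dt
sc = relative_sensitivity(1, 117, 0.0004, dt)
print(f"power gain: {ratio_db(sc, epithelium):.1f} dB")

# Doubling A while halving dt leaves S unchanged
print(math.isclose(relative_sensitivity(1, 234, 0.0004, dt / 2), sc))
```

The last line captures the trade-off used in this study: a shorter acquisition that limits motion artifacts can be compensated, in sensitivity terms, by a proportionally stronger source.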

Owing to these features and adjustments, the system we constructed captured high-resolution images and sub-nanometer vibrations in the apical and basal regions of OHCs in vivo. An ultra-high-resolution OCT with an SC source was previously used to image the guinea pig cochlea ex vivo [38]; however, its sampling rate of 40 Hz was insufficient for high-frequency vibrometry. The SCSD-OCT system described in the present study overcomes this limitation with a much higher sampling rate, which enabled the vibrometer to visualize previously unidentified mechanical properties of live OHCs.
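The sampling-rate argument follows from the Nyquist criterion: to capture a tone, the A-scan rate must exceed twice the stimulus frequency. Assuming that the 10,000 A-scans per session mentioned in the Results were acquired contiguously over the 0.1 s total acquisition time, the implied A-scan rate is 100 kHz; this inference and the helper names below are illustrative assumptions.

```python
def aline_rate_hz(n_ascans, total_time_s):
    """A-scan rate implied by n A-scans acquired contiguously in total_time_s."""
    return n_ascans / total_time_s

def satisfies_nyquist(sample_rate_hz, stim_hz):
    """True if the sampling rate exceeds twice the stimulus frequency."""
    return sample_rate_hz > 2 * stim_hz

rate = aline_rate_hz(10_000, 0.1)
print(rate)                             # A-scans per second
print(satisfies_nyquist(rate, 24_000))  # highest near-BF tone used here
print(satisfies_nyquist(40, 24_000))    # 40 Hz ex vivo SC system [38]
print(1 / 0.1)                          # spectral bin spacing (Hz) for 0.1 s
```

The same 0.1 s window also sets the frequency resolution of the vibration spectrum (about 10 Hz), which is fine enough to isolate a pure-tone response.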

The power of the beam on the sample in our system was 117 mW, substantially higher than in previous OCT systems [12,13,14,15,16,17,18, 20,21,22,23,24,25, 36,37,38,39]. Because the total acquisition time is 0.1 s, the total irradiation energy in one vibrometry session is ~ 11.7 mW·s, which is equivalent to or less than that in other recent studies [13, 18, 19, 24, 25]. For example, in a recording with a different, commercial SD-OCT system at the same location in the same animal, the total irradiation energy was ~ 100 mW·s (3.7 mW for a total acquisition time of 27.2 s). Together with the minor hearing threshold shifts before and after 2 h of irradiation (Fig. S3), these figures suggest that the light power used in our study is harmless to the animals.
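The energy comparison is power multiplied by exposure time (1 mW·s = 1 mJ). A minimal check of the two figures quoted above:

```python
def irradiation_energy_mj(power_mw, duration_s):
    """Total irradiation energy in mJ (equivalently mW*s)."""
    return power_mw * duration_s

ours = irradiation_energy_mj(117, 0.1)         # SCSD-OCT session
commercial = irradiation_energy_mj(3.7, 27.2)  # commercial SD-OCT recording
print(f"{ours:.1f} mJ vs {commercial:.1f} mJ ({commercial / ours:.1f}x)")
```

Despite a roughly 32-fold higher beam power (117 mW versus 3.7 mW), the short session delivers about 8.6 times less total energy.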

Our understanding of the biophysical function of OHCs ex vivo has expanded in recent decades [5, 8, 9]. Electrophysiological and molecular biological approaches have revealed two representative functions. First, deflecting the hair bundle atop the OHCs allows a nonlinear flow of electrical current through the mechanoelectrical transduction channel, which is mainly composed of transmembrane channel-like (TMC) proteins [47, 48]. Second, stimulated OHCs produce force through conformational changes of the motor protein prestin [5, 6, 10, 49]. OHC motility thus depends on this current and is known as electromotility [5, 8, 9]. As for OHC function in vivo, recent theoretical and experimental studies have suggested that the physical environment surrounding the OHCs can change their responses [50, 51]. Recording OHC electromotility in vivo is therefore one of the major challenges in auditory physiology. Our data from the cluster of three OHCs in Fig. 7 contain the motion of different OHCs. However, all three rows of OHCs are likely to vibrate as a single cluster in the basal turn of the guinea pig in vivo [17]. Consequently, our data clarify one aspect of OHC electromotility in vivo.

It has been reported that the vibration amplitude of OHCs is significantly amplified and that its phase leads that of the BM under near-BF stimuli in guinea pigs, gerbils, and mice [15, 18, 19, 31]. In our study as well, the vibration was amplified and its phase led at apiOHCs, consistent with these previous findings. By contrast, in our data the amplitude at basoOHCs was not actively amplified, and its phase lagged that of the BM (Fig. 8e). These results suggest that the electromotile force is unlikely to be transmitted directly to the BM via the basal region of OHCs. Inner and outer pillar cells, the phalangeal processes of Deiters’ cells, and other cellular structures are possible candidates for alternative pathways. The mechanism by which electromotile force is transmitted within the epithelium may also differ among species. Furthermore, relative to the phase of basoOHCs, apiOHC motion leads by ~ 30° under weak near-BF stimulation. This phase lead is consistent with the active phase lead of hair bundles recorded in vitro [3, 52, 53] and seems to contribute to active amplification through the negative stiffness of the bundle. As in our data, the phase lead of the bundle is not prominent under strong stimuli.