Participants
The study was approved by the Nottingham 1 Research Ethics Committee (REC reference: 12/EM/0016) and was sponsored by Nottingham University Hospitals NHS Trust (Research & Innovation reference: 11IH007). All participants were native English speakers with self-reported normal or corrected-to-normal vision and no known language, cognitive or motor disorder or previous brain injury. Three patients and two control subjects were left-handed. All participants gave written informed consent before taking part.
Seventeen adults with bilateral profound deafness who had consented to cochlear implantation were recruited through the Nottingham Auditory Implant Programme. All participants met UK national guidelines for cochlear implantation (NICE 2009). Namely, participants had unaided pure-tone air conduction thresholds of ≥ 90 dB hearing level at 2 and 4 kHz in both ears, a best-aided auditory word recognition score of ≤ 50 % on the Bamford-Kowal-Bench (BKB) sentence test (Bench et al. 1979), and had been deemed suitable CI candidates by the Nottingham Auditory Implant Programme. For clinical characteristics of the sample, see Table 1. All participants were implanted unilaterally with a Cochlear™ Nucleus® 6 device with CP910 sound processor that employed the advanced combination encoder (ACE™) stimulation strategy. None of the participants experienced any complications during their CI surgery and no abnormalities were identified on post-operative X-ray. Furthermore, for all participants, all implantable electrodes were situated within the cochlea and post-operative impedances were within normal range on all electrodes. All participants were stimulated in monopolar configuration, and comfort and threshold levels were estimated for each electrode position by the clinical team according to standard clinical protocols.
Table 1 Clinical characteristics of the sample

Seventeen normal-hearing (NH) adults were also recruited to serve as a control group. The group’s mean age (57 years, SD = 16.8) was approximately matched to that of the CI users (mean age 58 years, SD = 13.9). All participants had normal hearing thresholds, defined here as average pure-tone air conduction hearing thresholds of ≤ 20 dB hearing level across the frequencies 0.5, 1, 2 and 4 kHz in both ears.
Experimental Design
Pre-operative brain imaging using fNIRS was conducted at the participants’ earliest convenience after having consented to receive a CI, but before undergoing surgery (T0). At T0, CI users were tested in their best-aided condition, i.e. wearing their hearing aids if they used them in everyday life (see Table 1). Brain imaging was also conducted with NH control subjects to enable group comparisons of cortical activation. Behavioural measures of visual speechreading ability were also obtained at T0 for both groups. Post-operative behavioural measures of auditory speech understanding (CI outcome) were obtained in the same individuals approximately 6 months after activation of their CI device (T1, average duration of CI use = 6.13 months, SD = 0.4). At T1, CI users were tested in their best-aided condition wearing their preferred listening devices (i.e. CI and optional contralateral hearing aid). The mean retest interval between T0 and T1 for CI users was 8.2 months (SD = 1.2).
Testing Conditions
Testing was carried out in a double-walled sound-attenuated booth. Participants were seated in front of a visual display unit at a viewing distance of 1 m, with a centrally located Genelec 8030A loudspeaker mounted immediately above and behind the visual display unit. All stimuli were presented using the MATLAB® computing environment (Release 2014b, The MathWorks, Natick, MA). Visual components of the stimuli were presented on the visual display unit. To reflect the typical level of conversational speech, auditory components were presented through the loudspeaker at 65 dB SPL (A-weighted root-mean-square sound pressure level averaged over the duration of each sentence). This was measured at the listening position, with the participant absent, using a Brüel & Kjær 2250 sound level meter and free-field microphone (Type 4189). Prior to the commencement of each test, participants were given written instructions to ensure that the task was understood and that instructions were delivered consistently.
fNIRS Data Acquisition
At T0, cortical activation was measured using a continuous-wave fNIRS system (ETG-4000, Hitachi Medical Co., Japan). The ETG-4000 is a commercial system that emits a continuous beam of light into the cortex and samples at a rate of 10 Hz. The system measures simultaneously at two wavelengths, 695 nm and 830 nm, to allow for the separate measurement of changes in oxygenated haemoglobin (HbO) and deoxygenated haemoglobin (HbR) concentrations. This specific choice of wavelengths has been shown to minimise cross-talk error between the two chromophores (Sato et al. 2004). A dense sound-absorbing screen was placed between the fNIRS equipment and the participant to attenuate the fan noise generated by the equipment. This resulted in a steady ambient noise level of 38 dB SPL (A-weighted).
fNIRS Stimuli
The Institute of Hearing Research (IHR) Number Sentences (Hall et al. 2005) were presented as speech stimuli during the acquisition of fNIRS measurements. The corpus comprised digital audio-visual recordings of 90 sentences, each spoken by both a male and a female talker. Each of the sentences contained between four and seven words, three of which were designated keywords. For the purpose of this experiment, the speech material was presented in a visual-only condition (V-ONLY, i.e. speechreading) where the visual component of the recording was shown but the auditory component was muted. The speech material was also presented in an auditory (A-ONLY) and audio-visual (AV) condition that is reported and analysed elsewhere. Rest periods consisted of a uniform background with a fixation cross presented in place of the talker’s mouth.
fNIRS Paradigm
Thirty IHR number sentences were randomly selected without replacement for presentation in each of the conditions, with the restriction that an equal number were spoken by the male and female talker in each condition. The speech stimuli were presented in a block design paradigm interleaved with rest periods. Each block comprised six concatenated sentences, evenly spaced to fill a 24-s block duration. Five blocks were presented for each stimulus condition. During these blocks, the participants were instructed to attend to the talker and to always try to understand what the talker was saying. To encourage sustained attention throughout the experiment, an attentional trial was presented after two of the 15 stimulus blocks. These blocks were chosen at random, and therefore, the attentional trials occurred at unpredictable positions within the experimental run. Two seconds after the cessation of a chosen block, two alternative words were presented on either side of the fixation cross; in a two-alternative forced choice task, participants were asked to press one of two buttons to indicate which word had been spoken in the immediately preceding sentence. Following the participant’s response, an additional 5-s rest was added to the start of the ensuing rest period. Rest periods were included to allow the haemodynamic response elicited by the stimulation block to return to a baseline level. The durations of the rest periods were randomly varied between 20 and 40 s in 5 s increments.
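The block-design timing described above can be sketched as follows. This is an illustrative reconstruction in Python (the experiment itself was run in MATLAB), and the seeding, shuffling and sampling details are assumptions rather than the authors’ actual randomisation code; only the timing constraints (24-s blocks, 20–40-s rests in 5-s steps, two randomly placed attentional trials each adding 5 s of rest) are taken from the text.

```python
import random

def build_schedule(seed=0):
    """Sketch of the block-design timeline: 15 stimulus blocks
    (3 conditions x 5 blocks, shuffled), each 24 s, interleaved with
    rest periods of 20-40 s drawn in 5-s increments. Two randomly
    chosen blocks are followed by an attentional probe, which adds
    an extra 5 s to the start of the ensuing rest period."""
    rng = random.Random(seed)
    blocks = ["V-ONLY", "A-ONLY", "AV"] * 5
    rng.shuffle(blocks)
    probe_after = rng.sample(range(15), 2)  # which blocks get a probe
    schedule, t = [], 0.0
    for i, cond in enumerate(blocks):
        schedule.append((t, cond, 24.0))
        t += 24.0
        rest = float(rng.choice([20, 25, 30, 35, 40]))
        if i in probe_after:
            rest += 5.0  # extra rest after the two-alternative probe
        schedule.append((t, "REST", rest))
        t += rest
    return schedule
```

Each entry is a (onset, condition, duration) triple; the unpredictable rest durations help decorrelate the stimulus regressors from low-frequency physiological noise.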
Prior to fNIRS scanning, participants first completed a short familiarisation run to ensure that they understood the experimental procedure. During the familiarisation session, one block of each of the conditions was presented. In order to avoid pre-exposure to the experimental stimuli, the familiarisation blocks comprised speech material (BKB sentences (Bench et al. 1979)) that was different from the material presented during the fNIRS measurements and the subsequent behavioural testing. Following each stimulation block, an example of the attentional control task was also presented.
Optode Placement
Two 3 × 3 optode arrays were placed bilaterally over the participant’s temporal lobes. Together, these comprised ten emitter and eight detector optodes with a fixed inter-optode distance of 30 mm, providing a penetration depth into the cortex of approximately 15 mm (Strangman et al. 2014). This resulted in a total of 24 measurement channels (12 per hemisphere).
The optode arrays were positioned on the participant’s head so as to ensure good coverage of the STC. Optode positioning was guided by the International 10-20 System (Jasper 1958) to promote consistency across participants and test sessions. Specifically, on each side, the lowermost source optode was placed as close as possible to the pre-auricular point, with the uppermost source optode aligned towards Cz. Consistency of optode positioning across test sessions at the individual level was further ensured by reference to photographs taken during the initial testing session.
To evaluate the consistency of optode positioning across individuals, the procedure was piloted on six adult volunteers who did not take part in the main experiment. After positioning the arrays as described above, the optode positions, plus anatomical surface landmarks, were recorded using the Hitachi ETG-4000’s electromagnetic 3D Probe Positioning Unit. For each volunteer, the digitised optode positions were registered to a standard atlas brain, ‘Colin27’ (Collins et al. 1998), using the AtlasViewer tool (Aasted et al. 2015), allowing their locations to be visualised relative to underlying cortical anatomy. The standard deviation in the position of each optode was between 2.9 and 8.8 mm. Assessment of the mean optode positions suggested that the array provided good coverage of STC (Fig. 1).
Definition of Region of Interest
The region of interest (ROI) was the posterior portion of bilateral superior temporal cortex (STC), based on evidence that speech is processed in the temporal lobes bilaterally (Hickok and Poeppel 2007) and that fNIRS responses to speech are also expressed bilaterally in these regions (Wiggins et al. 2016). Examples of deafness-induced cross-modal plasticity have been reported in both hemispheres (Buckley and Tobey 2011; Chen et al. 2016; Doucet et al. 2006; Strelnikov et al. 2013); however, the precise role of plasticity in each hemisphere remains uncertain (Anderson et al. 2017a). Therefore, in the first instance, we examined activation bilaterally. However, recognising that each hemisphere has a different specialisation with regard to speech processing (Cardin et al. 2013; Hall et al. 2005; Lazard et al. 2012b; Zatorre and Belin 2001), in follow-up analyses, we examined each hemisphere separately.
In order to assess the sensitivity of our fNIRS measurements to the underlying cortical regions, a Monte Carlo simulation of the probabilistic path of photon migration through the head (‘tMCimg’; Boas et al. 2002) was run in the AtlasViewer tool (Aasted et al. 2015), with 1 × 10⁷ simulated photons launched from each optode position. The resultant sensitivity profiles suggested that channels #9, 10 and 12 (left hemisphere) and channels #20, 21 and 23 (right hemisphere) provided appropriate sensitivity to the posterior portion of STC (as reported in Anderson et al. 2017b; Wiggins et al. 2016).
Behavioural Test of Speech Understanding
The CUNY (City University of New York) Sentence Lists (Boothroyd et al. 1985) were employed to obtain a measure of speech understanding. The CUNY corpus was employed primarily due to its routine use as a clinical outcome measure by CI programmes across the UK. Additionally, this corpus was not presented during fNIRS scanning, thus helping to limit training effects within and across testing sessions. The CUNY Sentence Lists include 25 standardised lists, each comprising 12 sentences that vary in length and topic. Each list contains between 101 and 103 words spoken by a male talker. Two CUNY lists (i.e. 24 sentences) were randomly selected without replacement for presentation in each stimulation condition. Speech understanding was measured in A-ONLY, V-ONLY and AV conditions. However, for the purposes of the present study, we focus only on speechreading ability before implantation (T0) and auditory ability following 6 months of CI use (T1) as a measure of CI outcome. Whilst AV speech recognition is important to CI users in everyday life, both pre-operative CI candidacy and post-operative CI outcome are traditionally assessed by A-ONLY performance in UK clinics. Separate analysis of AV speech recognition using an additive model is fully reported in CAA’s doctoral thesis (Anderson 2016).
The 24 sentences were presented in random order. After each sentence presentation, the participant was instructed to repeat back all words that they were able to identify. All words correctly reported by the participant were recorded by the researcher on a scoring laptop before initiation of the next trial. The scoring method ignored errors of case or declension. Prior to commencement of speech understanding testing, all participants completed a short familiarisation run. BKB sentences were employed during the familiarisation run in order to avoid pre-exposure to the CUNY corpus.
Pre-processing of fNIRS Data
We used analysis methods similar to those used in a number of previous studies conducted in our laboratory (Dewey and Hartley 2015; Wiggins and Hartley 2015; Wiggins et al. 2016). Raw fNIRS recordings were exported from the Hitachi ETG-4000 into MATLAB for use with routines provided in the HOMER2 package (Huppert et al. 2009) and custom scripts. Raw light intensity measurements were first converted to change in optical density (Huppert et al. 2009). Wavelet motion correction was then performed to reduce the impact of motion artefacts on the fNIRS signal; wavelet filtering can enhance data yield and has emerged as a favourable approach for fNIRS data (Molavi and Dumont 2012). The HOMER2 hmrMotionCorrectWavelet function (based on Molavi and Dumont 2012) was used. This function assumes that the wavelet coefficients have a Gaussian probability distribution and applies a probability threshold to remove outlying coefficients that are assumed to correspond to motion artefacts. The threshold was set to exclude coefficients lying more than 1.5 inter-quartile ranges below the first quartile or above the third quartile.
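The principle of the inter-quartile-range thresholding step can be illustrated with a deliberately simplified one-level Haar decomposition in Python; the actual HOMER2 function operates on a multi-level wavelet decomposition in MATLAB, so this is a conceptual sketch, not the pipeline code.

```python
import numpy as np

def iqr_wavelet_correct(signal):
    """Illustrative one-level Haar-wavelet motion correction: detail
    coefficients lying more than 1.5 inter-quartile ranges below the
    first quartile or above the third quartile are treated as motion
    artefacts and zeroed before inverting the transform. (HOMER2's
    hmrMotionCorrectWavelet applies the same rule over a multi-level
    decomposition.)"""
    x = np.asarray(signal, dtype=float)
    n = len(x) - len(x) % 2                      # even length for pairwise Haar
    approx = (x[0:n:2] + x[1:n:2]) / np.sqrt(2)  # approximation coefficients
    detail = (x[0:n:2] - x[1:n:2]) / np.sqrt(2)  # detail coefficients
    q1, q3 = np.percentile(detail, [25, 75])
    iqr = q3 - q1
    outlier = (detail < q1 - 1.5 * iqr) | (detail > q3 + 1.5 * iqr)
    detail[outlier] = 0.0                        # suppress presumed artefacts
    out = x.copy()
    out[0:n:2] = (approx + detail) / np.sqrt(2)  # inverse Haar transform
    out[1:n:2] = (approx - detail) / np.sqrt(2)
    return out
```

A smooth haemodynamic signal passes through essentially unchanged, whereas a sharp motion spike produces an extreme detail coefficient that is removed.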
Following motion-artefact correction, a bandpass filter of 0.01–0.5 Hz was applied to reduce sources of physiological noise in the data including high-frequency cardiac oscillations, low-frequency respiration and blood pressure changes. The fNIRS signal was next converted into estimates of changes in HbO and HbR using the modified Beer-Lambert law with a default differential path-length factor of six (Huppert et al. 2009). As bandpass filtering is unable to remove all physiological noise from fNIRS recordings (Huppert et al. 2009), the haemodynamic signal separation method of Yamada et al. (Yamada et al. 2012) was also applied. This algorithm separates the fNIRS signal into estimates of the functional and systemic components, based on expected differences in the correlation between HbO and HbR in each component. Specifically, a positive correlation between changes in HbO and HbR is assumed in the systemic component, whereas a negative correlation is assumed in the functional component. The functional component of the signal was identified by the algorithm, extracted from the fNIRS signal and retained for further analysis.
In order to quantify the level of cortical activation, the pre-processed fNIRS signal was subjected to an ordinary least squares general linear model (GLM). The GLM design matrix included three boxcar regressors, one for each stimulation condition. The two response periods following the two attentional trials were also modelled in the design matrix as transient events occurring at the time the two words were presented on screen. All regressors were convolved with the canonical haemodynamic response function provided in SPM8 (http://www.fil.ion.ucl.ac.uk/spm). After completing the first-stage OLS estimation at the single-subject level, we used the Cochrane-Orcutt procedure (Cochrane and Orcutt 1949) to correct for serial correlation. Briefly, this involved fitting a first-order autoregressive process to the model residuals and transforming the original model according to the estimated autoregressive parameter (see Plichta et al. 2007). We then re-estimated the beta weights based on the transformed model (second stage).
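The two-stage estimation can be summarised in a short numpy sketch. The design-matrix construction (boxcars convolved with the SPM8 canonical HRF) is omitted here; the function below shows only the OLS fit followed by the Cochrane-Orcutt AR(1) correction and re-estimation, under the simplifying assumption of a single first-order autoregressive parameter.

```python
import numpy as np

def glm_cochrane_orcutt(y, X):
    """Two-stage GLM sketch: (1) ordinary least squares fit; (2) fit a
    first-order autoregressive process to the residuals, apply the
    Cochrane-Orcutt transform to model and data, and re-estimate the
    beta weights on the transformed model."""
    # Stage 1: ordinary least squares
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # AR(1) coefficient estimated from lag-1 autocovariance of residuals
    rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
    # Stage 2: quasi-difference the model (y_t - rho*y_{t-1}) and re-fit
    y_t = y[1:] - rho * y[:-1]
    X_t = X[1:] - rho * X[:-1]
    beta2, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)
    return beta2, rho
```

With serially correlated noise, the second-stage betas are estimated under an approximately whitened error structure, which is the purpose of the correction described above.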
The beta weights of the canonical HRF term were extracted for each stimulation condition, at each measurement channel, and for each participant. As described above, the haemodynamic signal separation method employed here (Yamada et al. 2012) assumes a fixed linear relationship between HbO and HbR in the functional response. Therefore, the results of all statistical analyses are identical regardless of whether conducted on the beta weights extracted for the HbO or HbR parameter. For simplicity, only results pertaining to the beta estimates of the HbO parameter of the functional component are presented here. These beta weights were used to quantify the amplitude of cortical activation to speech compared to rest. The resultant beta weights were averaged across the ROI measurement channels and were subjected to further statistical analysis as outlined below.
Pre-processing of Behavioural Data
Auditory speech understanding and speechreading ability, measured using the CUNY Sentence Lists, were quantified as the percentage of words reported correctly (% correct). In order to make the data more suitable for statistical analysis, the rationalised arcsine transform (Studebaker 1985) was applied in MATLAB. Firstly, the arcsine transform (T) was applied as follows:
$$ T=\arcsin\sqrt{\frac{X}{N+1}}+\arcsin\sqrt{\frac{X+1}{N+1}} $$
The ‘asin’ function in MATLAB was used to return the inverse sine (arcsine) for each value of X, where X represents the total number of words reported correctly and N represents the total number of words presented. This was then transformed linearly (Studebaker 1985):

$$ R=\frac{146.1}{\pi}T-23 $$

where R indicates the resulting rationalised arcsine-transformed score (rationalised arcsine unit, RAU). This transformation stretches the original percent-correct scale outwards in both directions from 50 %, creating larger differences as the extremes of the range are approached. Consequently, the rationalised arcsine scale is linear and additive in its proportions whilst producing values close to the original percentage scores between approximately 15 and 85 % (Studebaker 1985). The transformed scores were then subjected to statistical analysis.
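The two steps combine into a single function; the Python version below mirrors the MATLAB computation described above (Studebaker 1985).

```python
import math

def rau(X, N):
    """Rationalised arcsine transform (Studebaker 1985): X words
    reported correctly out of N presented -> rationalised arcsine
    units (RAU). T is computed in radians, then rescaled linearly."""
    T = (math.asin(math.sqrt(X / (N + 1)))
         + math.asin(math.sqrt((X + 1) / (N + 1))))
    return (146.1 / math.pi) * T - 23.0
```

Mid-range scores map close to the original percentages (e.g. 50/100 words correct gives approximately 50 RAU), while perfect and zero scores are pushed beyond 100 and below 0 respectively, linearising the scale at the extremes.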
Statistical Analysis
Following the pre-processing of neuroimaging and behavioural data, resultant data were analysed using IBM® SPSS® Statistics software (Release 22.0, Armonk, NY: IBM Corp.). Bivariate linear regression analysis was performed to test whether the bilateral STC response to visual speech before implantation was predictive of future CI outcome. Normality of the distribution of bilateral STC activation to visual speech was confirmed. Whilst the Kolmogorov-Smirnov test indicated that the distribution of CI outcome data did not significantly differ from normality, visual inspection of the histogram did indicate slight negative skew, despite the application of the rationalised arcsine transform to the raw performance data. This skew was somewhat anticipated given the significant benefits that cochlear implantation can provide, particularly within the first 6 months following implantation (Lenarz et al. 2012). However, post-hoc diagnostics verified that the assumptions of bivariate linear regression were met: a scatterplot indicated a linear relationship between the predictor and the dependent variable, and visual inspection of histograms and normal P-P (probability-probability) plots indicated that the standardised residuals of the regression model were normally distributed and that the assumption of homoscedasticity was met.
Multiple regression was conducted to examine whether pre-implant STC activation to visual speech provided incremental predictive value above that of influential clinical characteristics (covariates). For each regression model conducted, the covariate/s of interest was first entered as a predictor variable into block 1, with pre-implant STC activation to visual speech then entered as a predictor into block 2 of the model. For all models, histogram and scatterplots confirmed that the standardised residuals were normally distributed and the assumption of homoscedasticity was met. Furthermore, the Durbin-Watson statistic indicated that the assumption of independent errors was met, and the variance inflation factor indicated that multicollinearity was low between the predictor variables in block 2 of the models and was not problematic.
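The blockwise logic (covariates entered in block 1, pre-implant STC activation added in block 2) amounts to comparing the variance explained by nested OLS models. The sketch below illustrates this incremental-R² comparison with numpy; the study itself used SPSS, and the variable names are illustrative placeholders.

```python
import numpy as np

def r_squared(y, X):
    """R^2 of an OLS fit of y on X (X must include an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid.var() / y.var()

def incremental_r2(y, covariates, predictor):
    """Hierarchical (blockwise) regression sketch: the change in R^2
    when the block-2 predictor (here, pre-implant STC activation to
    visual speech) is added to a covariate-only model (block 1)."""
    n = len(y)
    block1 = np.column_stack([np.ones(n), covariates])
    block2 = np.column_stack([block1, predictor])
    r2_1 = r_squared(y, block1)
    r2_2 = r_squared(y, block2)
    return r2_1, r2_2, r2_2 - r2_1
```

A meaningful R² increase from block 1 to block 2 corresponds to the "incremental predictive value" tested here; in SPSS this is reported as the R² change statistic for block 2.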
All data are publicly available through the University of Nottingham’s Research Data Management Repository (https://doi.org/10.17639/nott.322).