Brain Structure and Function

Volume 220, Issue 2, pp 1109–1125

Effects of congruent and incongruent visual cues on speech perception and brain activity in cochlear implant users

Authors

  • Jae-Jin Song
    • Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Bundang Hospital
  • Hyo-Jeong Lee
    • Department of Otorhinolaryngology-Head and Neck Surgery, Hallym University College of Medicine
    • Sensory Organ Research Institute, Seoul National University Medical Research Center
  • Hyejin Kang
    • Department of Nuclear Medicine, Seoul National University Hospital
  • Dong Soo Lee
    • Department of Nuclear Medicine, Seoul National University Hospital
  • Sun O. Chang
    • Department of Otorhinolaryngology-Head and Neck Surgery, Seoul National University Hospital
    • Sensory Organ Research Institute, Seoul National University Medical Research Center
Original Article

DOI: 10.1007/s00429-013-0704-6

Cite this article as:
Song, J., Lee, H., Kang, H. et al. Brain Struct Funct (2015) 220: 1109. doi:10.1007/s00429-013-0704-6

Abstract

While deafness-induced plasticity has been investigated in the visual and auditory domains, not much is known about language processing in audiovisual multimodal environments for patients with restored hearing via cochlear implant (CI) devices. Here, we examined the effect of agreeing or conflicting visual inputs on auditory processing in deaf patients equipped with degraded artificial hearing. Ten post-lingually deafened CI users with good performance, along with matched control subjects, underwent H215O-positron emission tomography scans while carrying out a behavioral task requiring the extraction of speech information from unimodal auditory stimuli, bimodal audiovisual congruent stimuli, and incongruent stimuli. Regardless of congruency, the control subjects demonstrated activation of the auditory and visual sensory cortices, as well as the superior temporal sulcus, the classical multisensory integration area, indicating a bottom-up multisensory processing strategy. Compared to CI users, the control subjects exhibited activation of the right ventral premotor-supramarginal pathway. In contrast, CI users primarily activated the visual cortices more in the congruent audiovisual condition than in the null condition. In addition, compared to controls, CI users displayed an activation focus in the right amygdala for congruent audiovisual stimuli. The most notable difference between the two groups was an activation focus in the left inferior frontal gyrus in CI users confronted with incongruent audiovisual stimuli, suggesting top-down cognitive modulation for audiovisual conflict. Correlation analysis revealed that good speech performance was positively correlated with right amygdala activity for the congruent condition, but negatively correlated with activity in the bilateral visual cortices regardless of congruency.
Taken together, these results suggest that for multimodal inputs, cochlear implant users are more vision-reliant when processing congruent stimuli and are disturbed more by visual distractors when confronted with incongruent audiovisual stimuli. To cope with this multimodal conflict, CI users activate the left inferior frontal gyrus to adopt a top-down cognitive modulation pathway, whereas normal hearing individuals primarily adopt a bottom-up strategy.

Keywords

Cochlear implant · Deafness · Positron emission tomography · Audiovisual · Plasticity

Abbreviations

NH: Normal hearing
AV: Audiovisual
STS: Superior temporal sulcus
IFG: Inferior frontal gyrus
CI: Cochlear implant
PET: Positron emission tomography
MRI: Magnetic resonance imaging
CAP: Categories of auditory performance
MTG: Middle temporal gyrus
ITG: Inferior temporal gyrus
rCBF: Regional cerebral blood flow
vPMC: Ventral premotor cortex
SMG: Supramarginal gyrus
SFG: Superior frontal gyrus
MeFG: Medial frontal gyrus

Introduction

To function in multisensory environments, the human brain merges information from multiple sources into a coherent percept to direct attention and to coordinate behavioral responses (Corbetta and Shulman 2002; Werner and Noppeney 2010b; Strelnikov et al. 2011). For example, to process face-to-face communication between normal hearing (NH) individuals, the brain combines cues from the auditory (vocalizations) and visual (orofacial articulatory movements) modalities to reduce noise interference and increase accuracy (Ross et al. 2007; Nath and Beauchamp 2011). However, while congruent orofacial articulatory movements significantly contribute to speech comprehension (Sumby and Pollack 1954; van Wassenhove et al. 2005), incongruent audiovisual (AV) stimuli may lead to novel percepts that match neither the auditory nor the visual information (McGurk and MacDonald 1976). These observations have generated interest in neural substrates specifically involved in AV speech processing, and indeed, regions such as the left superior temporal sulcus (STS) and the inferior frontal gyrus (IFG) have been identified as possible substrates. In NH individuals, the STS has consistently been identified as an AV integrator of both speech and non-speech stimuli (Calvert et al. 2000; Beauchamp et al. 2004a; Miller and D’Esposito 2005; Werner and Noppeney 2010a). In addition, it has been suggested that the IFG has a specific role in the processing of incongruent AV stimuli, potentially reflecting increased cognitive demands (Hein et al. 2007; Jones and Callan 2003).

The STS in NH individuals shows additive or superadditive activation for AV congruence (Beauchamp 2005; Talsma et al. 2007), suggesting an important role of the STS in AV categorization (Werner and Noppeney 2010b). Recent observations of increasing superadditivity for degraded stimuli (original auditory or visual stimuli combined with noise phase spectra) in the STS (Werner and Noppeney 2010b) are consistent with the inverse effectiveness principle, which states that multisensory enhancement is maximal when the individual stimuli are least effective (Meredith and Stein 1983). A good example of inverse effectiveness is a deaf subject whose auditory modality has been restored by a cochlear implant (CI). Even for a proficient CI user, poorly represented temporal fine structure and limitations in encoding spectral cues yield impoverished input to the neocortex when compared to normal acoustic stimulation (Kral and O’Donoghue 2010). Consequently, when processing auditory cues, CI users benefit from congruent visual cues as an important compensatory mechanism, and they show enhanced congruent AV information fusion ability comparable to NH individuals (Doucet et al. 2006; Rouger et al. 2007; Tremblay et al. 2010). In addition, a recent functional imaging study revealed that the activity of the visual cortex is positively correlated with the proficiency level of auditory recovery (Strelnikov et al. 2013). Moreover, recent behavioral studies reveal that proficient CI users, like NH subjects, perform well on AV incongruent tasks, whereas non-proficient CI users demonstrate inferior results and rely predominantly on visual cues (Tremblay et al. 2010; Champoux et al. 2009). In this regard, proficient CI users show remarkable perception abilities for both congruent and incongruent AV stimuli, whereas AV conflict in non-proficient CI users can be a major obstacle to successful rehabilitation.

In profound deafness, the deprived auditory cortical regions are taken over by intact sensory modalities such as vision (Finney et al. 2001; Rauschecker 1999) as a result of competition for cortical space. In addition, unimodal auditory speech stimuli activate visual cortices more in CI users than in NH individuals (Giraud et al. 2001; Giraud and Truy 2002), indicating an AV coupling that is progressively tuned after CI. Moreover, unimodal visual speech cues activate auditory phonological regions of CI users more than those of NH individuals (Lee et al. 2007b). Based on these unique activation patterns of CI users for unimodal speech stimuli, we may surmise the presence of characteristic neural substrates involved in bimodal AV stimulus processing in CI users. However, despite recent advances in our understanding of CI users’ ability to process congruent as well as conflicting AV information, the hitherto available studies are based on behavioral approaches, and thus the neural substrates involved in the integration of AV stimuli in CI users remain unknown.

Following the literature reviewed above, we hypothesize that CI users may activate the visual cortex more than NH individuals when processing congruent AV stimuli. In addition, we surmise that this vision-reliant tendency may hinder auditory perception when confronted with incongruent visual cues, and that CI users may therefore recruit additional higher-order brain regions such as the prefrontal cortices to process incongruent AV information. Hence, the purpose of the current study was twofold. First, we sought to reveal in CI users the neural correlates associated with the processing of congruent AV stimuli, as well as with the extraction of target auditory cues in the milieu of reinforcing or distracting visual cues, by means of H215O-positron emission tomography (PET) (Song et al. 2012a, 2013a), the optimal method for investigating CI users because other methodologies such as functional magnetic resonance imaging and magnetoencephalography are not feasible in this population. By comparing these results with those of a matched NH control group, we further characterized differences in multimodal speech processing strategies between the CI users and the NH controls. Second, by correlation analyses with speech performance and deafness duration as covariates, we examined cortical regions of activation under AV stimuli that were modulated by deafness-induced plasticity before CI surgery, and cortical regions of activation that were related to CI speech outcome.

Materials and methods

Participants

Twelve post-lingually deafened adult CI users (8 males and 4 females; mean age 31.5 ± 8.0 years, range 19–47 years) and ten control participants (7 females, 3 males) with normal hearing and vision, matched for age and education level, were enrolled (Table 1). The study was approved by the institutional review board at Seoul National University Hospital. Nine of the subjects had a history of idiopathic progressive hearing loss, and the other three had histories of sudden sensorineural hearing loss, sudden hearing loss after febrile illness, and progressive sensorineural hearing loss due to chronic otitis media, respectively. All were right-handed (self-reported), had normal or corrected-to-normal visual acuity, and had no history of mental retardation or neurologic/psychiatric problems. Temporal bone computed tomography and brain magnetic resonance imaging (MRI) were available for all subjects, and no inner ear or cerebral anatomical abnormalities were found.
Table 1

Demographic data of CI user group and control group

| Subject code | Age (years) | Sex | CI side | Implanted device | Duration of deafness (years) | Duration of CI usage (months) | Cause of deafness | Word perception test score (%), auditory-only condition |
|---|---|---|---|---|---|---|---|---|
| P01 | 23 | M | L | Cochlear CI24RCA | 0.5 | 6 | PSNHL | 85 |
| P02 | 32 | F | R | Cochlear CI24RE(CA) | 1 | 15 | PSNHL | 77.8 |
| P03 | 28 | F | L | Cochlear CI24RCA | 6 | 45 | PSNHL | 100 |
| P04 | 19 | M | R | Cochlear CI24RE(CA) | 12 | 18 | PSNHL | 33 |
| P05 | 47 | F | R | Med-El Combi40+ | 17 | 79 | FI | 45 |
| P06 | 27 | F | R | Cochlear CI22M | 6 | 141 | SSNHL | 45 |
| P07 | 25 | M | R | AB Clarion HiRes 90K/HiFocus | 6 | 48 | PSNHL | 75 |
| P08 | 37 | M | L | Cochlear CI24RCA | 13 | 42 | PSNHL | 100 |
| P09 | 45 | F | L | Cochlear CI22M | 0.5 | 147 | COM | 80 |
| P10 | 35 | F | L | Cochlear CI24RE(CA) | 3 | 24 | PSNHL | 100 |
| P11 | 29 | F | R | Cochlear CI24R(CS) | 10 | 76 | PSNHL | 100 |
| P12 | 31 | F | L | Cochlear CI24RCA | 20 | 52 | PSNHL | 85 |
| C01 | 40 | F | N/A | N/A | N/A | N/A | N/A | N/A |
| C02 | 33 | M | N/A | N/A | N/A | N/A | N/A | N/A |
| C03 | 19 | M | N/A | N/A | N/A | N/A | N/A | N/A |
| C04 | 19 | M | N/A | N/A | N/A | N/A | N/A | N/A |
| C05 | 34 | M | N/A | N/A | N/A | N/A | N/A | N/A |
| C06 | 36 | M | N/A | N/A | N/A | N/A | N/A | N/A |
| C07 | 33 | M | N/A | N/A | N/A | N/A | N/A | N/A |
| C08 | 41 | F | N/A | N/A | N/A | N/A | N/A | N/A |
| C09 | 27 | F | N/A | N/A | N/A | N/A | N/A | N/A |
| C10 | 32 | M | N/A | N/A | N/A | N/A | N/A | N/A |

Two patients (P03 and P08) were excluded from both behavioral and PET image analyses due to poor image quality

CI cochlear implant, P CI patients, C controls, M male, F female, R right, L left, PSNHL progressive sensorineural hearing loss, FI febrile illness, SSNHL sudden sensorineural hearing loss, COM chronic otitis media, N/A not applicable

All subjects had bilateral profound hearing loss (≥90 dB HL) preoperatively. The mean deafness duration was 7.6 ± 6.6 years (range 0.5–20 years). In cases of progressive hearing loss, some degree of brain plasticity presumably develops from the onset of hearing loss, before the onset of deafness (Lazard et al. 2013). However, for our deaf subjects with a long history of progressive hearing loss, the onset of hearing loss based on subjective memory was rather unreliable. Therefore, deafness duration was defined as the time elapsed since the patient could no longer communicate in the auditory mode even with the best-fitted hearing aid. After more than 3 months of practice with hearing aids, none of our patients reached a threshold of 70 dB (the criterion for severe hearing loss) (Song et al. 2009, 2012b, 2013a) at any frequency, and none displayed any improvement in auditory language skills. All enrolled CI users demonstrated good performance as defined by categories of auditory performance (CAP) scores of 6 (understands conversation without lip reading) or 7 (the highest score; uses the telephone with a known speaker) (Archbold et al. 1995). Post-CI speech scores were measured using the Korean Phonetically Balanced (PB) Word Perception test approximately 1 year after CI (6–13 months after CI, mean ± SD 9.7 ± 2.5 months) (Table 1). The PB word perception test is composed of 40 monosyllabic PB words. Two licensed speech therapists with more than 9 years of experience presented the words with auditory cues only, and subjects were instructed to verbally repeat them. The mean duration after CI surgery was 57.8 ± 44.2 months (range 6–147 months). Deafness duration and post-CI speech score showed no significant correlation (P = 0.46, Spearman’s ρ = −0.27) (Fig. 1).
https://static-content.springer.com/image/art%3A10.1007%2Fs00429-013-0704-6/MediaObjects/429_2013_704_Fig1_HTML.gif
Fig. 1

Post-CI speech score of CI users plotted as a function of deafness duration, showing no significant correlation (P = 0.46, Spearman’s ρ = −0.27). Blank diamonds highlight two patients with abrupt deafness
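The reported correlation can be checked against the Table 1 values for the ten patients included in the analyses. The following is a minimal pure-Python sketch of Spearman's rank correlation (using midranks for ties); it is an illustration of the statistic only, not the software used in the study, and small differences from the reported ρ = −0.27 may reflect rounding or tie handling.

```python
def midranks(values):
    """Assign 1-based ranks to values, averaging ranks over tied groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with the current one
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho = Pearson correlation computed on the midranks."""
    rx, ry = midranks(x), midranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5

# Table 1 values for the ten included patients (P03 and P08 excluded)
deafness_years = [0.5, 1, 12, 17, 6, 6, 0.5, 3, 10, 20]
word_scores = [85, 77.8, 33, 45, 45, 75, 80, 100, 100, 85]
rho = spearman_rho(deafness_years, word_scores)  # ~ -0.26, in line with the reported -0.27
```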

Stimulus paradigm

Three Korean native speakers (two men and one woman) were videotaped while pronouncing the Korean numbers from 1 to 9. The Korean numbers 2, 3, and 4 are monosyllabic, while 1, 5, 6, 7, 8, and 9 are bisyllabic words. Differences between syllables were minimized by instructing all three speakers to pronounce each word with normal pronunciation and an even intonation, vocal intensity, and tempo. The production of each stimulus word began and ended in a neutral, closed-mouth position, for a total duration of approximately 1 s. The duration of each stimulus unit, consisting of a neutral lip position, followed by an articulatory movement, and ending in a neutral lip position (800–1,200 ms for each), was 3,000 ms. Video clips and sound tracks were separated and edited using Studio Plus v 10.5 (Pinnacle, Mountain View, CA, USA) to produce four stimulus conditions: auditory stimuli without any visual cue (A-only), congruent AV stimuli (con-AVS) in which the sound matched the lip movement, incongruent AV stimuli (inc-AVS) in which the sound did not match the lip movement, and a baseline condition with a flickering cross at the center of the monitor. For con-AVS and inc-AVS, the onsets of the mouth movement and the presented sound were synchronized. For all conditions, a white crosshair on a black background was presented at the center of the screen at the beginning and end of each session.

Before each experiment, all subjects were given a 10-min training session on the stimuli and tasks, practicing all four conditions to ensure that they understood the instructions and the conditions. However, to maintain a constant attention level, they were not informed of the existence of the target stimuli sequences inserted in each session. In other words, a target stimuli sequence (either congruent or incongruent AV) was inserted between random (mixed congruent and incongruent AV) stimuli sessions, so the subjects were led to believe that all the AV conditions were composed of mixed congruent and incongruent AV stimuli. Moreover, they were instructed to stare at the monitor throughout the experiment and to click the left mouse button whenever an even number was heard, irrespective of the visual stimuli. In this way, they perceived both AV stimuli but had to attend only to the auditory information while disregarding the visual stimuli. Each condition was presented twice, so that a total of eight scans were acquired for each subject. To present the four conditions in a random order and thereby minimize bias from expectation and declining concentration, we used the hospital identification numbers as seed numbers in a random number-generation algorithm.
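The seeded randomization of scan order described above can be sketched as follows. The seed value and Python's built-in generator are stand-ins: the study used each subject's hospital identification number as the seed, and its exact random number-generation algorithm is not specified.

```python
import random

CONDITIONS = ["baseline", "A-only", "con-AVS", "inc-AVS"]

def scan_order(seed):
    """Return a reproducible pseudo-random order of the eight scans
    (each of the four conditions presented twice)."""
    rng = random.Random(seed)  # subject-specific seed -> reproducible order
    order = CONDITIONS * 2     # two scans per condition
    rng.shuffle(order)
    return order

order = scan_order(20130704)   # illustrative seed, not a real hospital ID
```

Seeding per subject makes each subject's order random with respect to the conditions while still allowing the exact sequence to be reconstructed later.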

Stimulus presentation and image acquisition

The stimulus units for each condition were arranged and presented using Presentation software version 12.0 (Neurobehavioral Systems, Albany, CA, USA). The visual stimuli were presented on a 14.1-in. monitor located 80 cm from the subject’s eyes. The video image was 20.6 cm (8.1 in.) high × 27.7 cm (10.9 in.) wide, centered on the monitor over a black background. The sound was presented at a fixed comfortable listening level (approximately 70 dB SPL) through a headset wide enough to cover the microphone of the CI external device. The input level of the sound stimuli was controlled across words and between individuals by setting the output loudness of the stimulation PC equally. The noise level in the scanning room was approximately 50 dB SPL and the noise attenuation of the headset was approximately 20 dB SPL, which ensured that perception of the auditory stimuli was not disrupted. After each stimulus, a log file was automatically created that recorded the response and the time interval between stimulus and response.

Each scan was obtained over 3 min while each stimulus set was presented, and a 10–12 min intermission was given between conditions to take into account the H215O uptake and washout phases. For the A-only condition, each scan was subdivided into a 4-s instruction session and a 176-s target stimuli sequence. For con-AVS and inc-AVS, the 3-min stimulus session was subdivided into a 4-s instruction session, a 20-s period of random stimuli, a 90-s target stimuli sequence, and a 66-s period of random stimuli. For con-AVS and inc-AVS, both congruent and incongruent AV stimuli were presented during the periods of random stimuli to prevent the subject from losing focus during repetitive sequences of congruent or incongruent stimuli.

Depending on the subject’s body weight, an intravenous injection of 46 mCi or less of H215O was delivered at the beginning of each new scanning sequence. Image acquisition was performed for 2 min starting from isotope injection, using an ECAT EXACT 47 (Siemens-CTI, Knoxville, TN) PET scanner (BGO crystal detector, spatial resolution 6.1 mm, axial resolution 4.3 mm, sensitivity 214 kcps/μCi/min) in two-dimensional mode with a 16.2-cm axial field of view.

A transmission scan was performed using a 68Ga rod source to establish attenuation maps immediately before an emission scan. During the emission scan, 47 slices of brain emission images were acquired over a 2-min period, during which time subjects received minimal sensory input (dimmed light and silence). Emission images were reconstructed in a 128 × 128 × 47 matrix with a pixel size of 2.1 × 2.1 × 3.4 mm, using a filtered back projection method with a Shepp filter with a cut-off value of 0.35 cycles/pixel. All reconstructed images were corrected for attenuation, and the transaxial images were realigned to produce sagittal and coronal images.

Each scan was acquired over 2 min in three-dimensional mode with retracted interplane septa, as a series of dynamic scans. For image analysis, each scan dataset consisted of the summation of the dynamic frames covering only the 90-s period of target stimuli. By inserting the target stimuli between the random stimuli and not informing the participants that each session contained a target stimuli period, we precluded the possibility that different attention levels modulated the auditory association cortices (van Atteveldt et al. 2007).

Data analysis

For the behavioral data obtained during scanning, the significance of differences in accuracy and response time (for correct responses only) among the three active stimulus conditions within each group, and between the CI user and control groups, was determined with the Mann–Whitney U test. Statistical analyses were performed using SPSS for Windows version 13.0 (SPSS, Chicago, IL).
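The rank-based comparison can be illustrated with a pure-Python computation of the Mann–Whitney U statistic (midranks for ties). The accuracy values below are hypothetical, not the study's behavioral data, and the associated P value computed by SPSS is omitted here.

```python
def mann_whitney_u(x, y):
    """Mann-Whitney U statistic via the rank-sum formulation."""
    combined = list(x) + list(y)
    order = sorted(range(len(combined)), key=lambda i: combined[i])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(order):  # assign midranks to tied values
        j = i
        while j + 1 < len(order) and combined[order[j + 1]] == combined[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    r1 = sum(ranks[:len(x)])                 # rank sum of the first sample
    u1 = r1 - len(x) * (len(x) + 1) / 2
    return min(u1, len(x) * len(y) - u1)     # report the smaller U

# hypothetical per-subject response accuracies (%) for two groups
group_a = [95, 90, 88, 97, 93]
group_b = [70, 75, 68, 80, 72]
u = mann_whitney_u(group_a, group_b)  # 0.0: the two samples do not overlap at all
```

A U of zero (complete separation of the two samples) corresponds to the strongest possible group difference for these sample sizes; overlapping samples yield larger U values.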

Two subjects (P03 and P08) were excluded from both the behavioral and PET image analyses due to poor image quality. The two groups remained matched with regard to age and sex (P = 0.684 and P = 0.143, respectively, by the Mann–Whitney U test) as well as education level. Twenty subjects (ten per group) were included in the analysis of PET images. Image preprocessing (realignment, spatial normalization, and spatial smoothing with a 16-mm full width at half-maximum Gaussian kernel) and statistical analyses were carried out using the SPM5 package (Wellcome Department of Cognitive Neurology, London, UK) implemented in Matlab 7.1 (Mathworks Inc., Natick, MA). For all factorial design specifications, we used relative threshold masking of 0.8, an implicit mask, and proportional scaling. In the individual analyses, global normalization of H215O uptake was applied so that the mean count of H215O uptake in each subject’s PET images was arbitrarily set to 50. For each subject, three contrast images were created by subtracting the images of the baseline condition from the images of the A-only, con-AVS, and inc-AVS conditions. The outputs of these individual analyses were entered into the 2nd-level group analyses. The possible influence of age and sex was factored out by including them as nuisance variables in all group analyses. The locations of significant clusters were determined initially with the Anatomy Toolbox implemented in SPM5 (Eickhoff et al. 2005), and were reconfirmed by reference to the Talairach and Tournoux atlas (Talairach and Tournoux 1988).
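The two per-subject preprocessing steps described above, proportional scaling of global H215O uptake to an arbitrary mean of 50 and voxelwise subtraction of the baseline image from each active-condition image, can be sketched on toy data. The voxel values are made up, and real SPM images are 3-D volumes rather than short lists; this is only a schematic of the arithmetic.

```python
def scale_global_mean(image, target=50.0):
    """Proportionally rescale voxel values so their mean equals `target`."""
    mean = sum(image) / len(image)
    return [v * target / mean for v in image]

def contrast_image(condition, baseline):
    """Voxelwise subtraction: condition minus baseline."""
    return [c - b for c, b in zip(condition, baseline)]

# toy 'images' (flattened voxel lists), globally normalized to a mean of 50
a_only = scale_global_mean([40.0, 55.0, 80.0, 25.0])
base = scale_global_mean([45.0, 50.0, 60.0, 45.0])
contrast = contrast_image(a_only, base)  # one input to the 2nd-level group analyses
```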

Group analyses of activated cortical areas were carried out in a voxelwise manner with a flexible factorial design with group and condition as factors, by contrasting the brain activities for the three active stimulus conditions with the brain activity for the baseline condition, using a statistical threshold of P = 0.001, uncorrected (k = 25, T = 3.25) (Table 2). Also, as the current study focuses mainly on the CI users’ strategy of multisensory perception, two additional intra-group analyses were performed for the CI user group using the contrasts CI user group (inc-AVS − con-AVS) and CI user group (con-AVS − inc-AVS), with a statistical threshold of uncorrected P = 0.005 (k = 25, T = 2.67) and an inclusive mask at P = 0.005. Between-group analyses were then performed by subtracting the areas of increased H215O uptake in the control group from those of the CI user group for the three active stimulus conditions, or vice versa, with a threshold of uncorrected P = 0.005 (k = 25, T = 2.67) (Table 3) and inclusive masks at P = 0.005.
Table 2

Neural activation under A-only, congruent AV, and incongruent AV stimuli

A-only − baseline

The control group:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| R MTG | 72 −12 −4 | 21 | 1,798 | 6.47 |
| R STG | 68 −36 6 | 22/42 | i.above | 4.71 |
| L STG | −66 −26 0 | 21 | 1,088 | 5.35 |
| L MTG | −64 −56 4 | 21 | i.above | 4.11 |
| L thalamus | −14 −16 12 | | 173 | 3.88 |
| R thalamus | 4 −8 6 | | | 3.49 |

The CI user group:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| L MTG | −62 −24 2 | 22/42 | 1,251 | 4.94 |
| L STG | −60 −2 −4 | 22 | i.above | 3.73 |
| L temporal pole | −54 6 −8 | 22 | i.above | 3.66 |
| R STG | 72 −14 −2 | 21 | 316 | 4.04 |

Congruent AV − baseline

The control group:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| R inferior temporal gyrus | 56 −56 −16 | 37 | 4,125 | 5.91 |
| R temporo-parietal junction | 50 −40 20 | 42 | i.above | 5.46 |
| R lingual gyrus | 12 −88 −12 | 17 | 1,426 | 5.81 |
| L MTG | −70 −28 0 | 21 | 181 | 5.39 |
| L lingual gyrus | −22 −84 −4 | 18 | 49 | 3.62 |

The CI user group:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| R lingual gyrus | 12 −88 −12 | 17 | 1,121 | 4.84 |
| R amygdala | 32 4 −14 | | 97 | 3.70 |
| L MTG | −68 −30 6 | 22 | 34 | 3.51 |

Incongruent AV − baseline

The control group:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| R lingual gyrus | 12 −94 −10 | 17 | 1,889 | 6.62 |
| R STG | 68 −36 6 | 22/42 | 2,168 | 5.15 |
| R MTG | 72 −12 −6 | 21 | i.above | 5.04 |
| R precentral gyrus | 40 −8 26 | 6 | i.above | 3.93 |
| R ITG | 58 −58 −16 | 37 | 183 | 4.13 |
| L MTG, middle | −70 −28 −2 | 21 | 50 | 4.09 |
| L MTG, posterior | −64 −56 4 | 21 | 60 | 3.95 |

The CI user group:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| R calcarine gyrus | 12 −94 −6 | 17 | 1,248 | 5.31 |
| L calcarine gyrus | −6 −104 −6 | 17 | i.above | 5.01 |
| L STG | −68 −30 4 | 22 | 934 | 5.23 |
| L MTG | −46 −54 2 | 21 | i.above | 3.69 |
| L inferior frontal gyrus | −40 20 0 | 47 | 675 | 4.39 |
| R STG | 74 −22 0 | 22 | 433 | 4.26 |
| R pons | 6 −12 20 | | 202 | 4.26 |

Uncorrected P < 0.001, k = 25 voxels, T = 3.25

CI cochlear implant, A-only auditory-only, AV audiovisual, L left, R right, STG superior temporal gyrus, STS superior temporal sulcus, MTG middle temporal gyrus, ITG inferior temporal gyrus, i.above included in the above cluster

Table 3

Areas of relative activation in the CI user group compared to the control group under incongruent AV, congruent AV and A-only conditions, and vice versa

Under the A-only condition

The CI user group > the control group: no suprathreshold cluster

The control group > the CI user group:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| L MTG | −64 −62 4 | 39 | 45 | 4.74 |
| R ITG | 56 −56 −18 | 37 | 162 | 2.93 |

Under the congruent AV condition

The CI user group > the control group:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| R amygdala | 30 2 −12 | | 78 | 3.49 |
| L hippocampal tail | −22 −44 4 | | 38 | 3.15 |

The control group > the CI user group:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| R vPMC | 46 −12 26 | 6 | 1,177 | 4.88 |
| R supramarginal gyrus | 46 −38 20 | 40 | i.above | 3.68 |
| R ITG | 58 −56 −12 | 37 | 513 | 3.55 |
| L inferior occipital gyrus | −26 −84 −6 | 18 | 83 | 3.52 |

Under the incongruent AV condition

The CI user group > the control group:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| L inferior frontal gyrus | −44 14 8 | 47 | 47 | 3.04 |
| L middle temporal sulcus | −42 −52 4 | 37 | 48 | 2.90 |

The control group > the CI user group:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| R vPMC | 42 −8 24 | 6 | 716 | 4.57 |
| R supramarginal gyrus | 54 −38 22 | 40 | 57 | 3.41 |
| R inferior occipital gyrus | 16 −98 −22 | 18 | 70 | 3.32 |

Uncorrected P < 0.005, k = 25 voxels, T = 2.67 (masked at P = 0.005)

CI cochlear implant, A-only auditory-only, AV audiovisual, L left, R right, G gyrus, IFG inferior frontal gyrus, MTG middle temporal gyrus, ITG inferior temporal gyrus, vPMC ventral premotor cortex

To identify areas consistently more activated in the CI user group than in the control group, a conjunction analysis across all three active stimulus conditions was performed using a threshold of P = 0.005, uncorrected (T = 2.67). Moreover, group-by-condition interaction analyses were performed with the contrast “the CI user group (inc-AVS − con-AVS) − the control group (inc-AVS − con-AVS)” and vice versa, “the control group (inc-AVS − con-AVS) − the CI user group (inc-AVS − con-AVS)”, with uncorrected P = 0.005 (T = 2.67) (Table 4).
Table 4

Areas activated in the conjunction and interaction analysis

 

Conjunction analysis: areas more activated in the CI user group than in the control group under the A-only, incongruent AV, and congruent AV conditions

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| L superior frontal gyrus | −26 14 62 | 6 | 69 | 3.30 |
| L medial frontal gyrus | −14 52 12 | 10 | 31 | 3.00 |

Interaction analysis: areas of relative activation in CI users under the incongruent AV condition after subtraction of those under the congruent AV condition

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| L middle frontal gyrus | −46 30 42 | 8 | 4 | 2.81 |

Uncorrected P < 0.005, k = 0, T = 2.67

CI cochlear implant, A-only auditory-only, AV audiovisual, L left

Finally, correlation analyses were performed in a voxelwise manner between the contrast images of each condition and (1) deafness duration, (2) word perception score (the aforementioned Korean PB word perception score), and (3) duration of CI experience (uncorrected P < 0.005, k = 10, T = 3.50, with inclusive masks at P = 0.005; Tables 5, 6), with age and sex controlled for as nuisance variables. Areas with significant effects were selected and reported.
Table 5

The activated areas positively or negatively related to the duration of deafness

Under the A-only condition: no significant positive or negative correlations (n.s.)

Under the congruent AV condition

Positive correlation:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| L hippocampal tail | −12 −38 12 | | 149 | 7.17 |
| Claustrum | 30 8 −10 | | 72 | 4.44 |
| R lingual gyrus | 16 −84 −14 | 18 | 195 | 4.04 |

Negative correlation:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| L superior temporal gyrus | −58 2 −12 | 22 | 18 | 4.21 |

Under the incongruent AV condition: no significant positive or negative correlations (n.s.)

Uncorrected P < 0.005, k = 10 voxels, T = 3.50 with an inclusive mask at P = 0.005

CI cochlear implant, A-only auditory-only, AV audiovisual, n.s. not significant

Table 6

The activated areas positively or negatively related to the CI word score

Under the A-only condition: no significant positive or negative correlations (n.s.)

Under the congruent AV condition

Positive correlation:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| R amygdala | 32 −2 20 | | 27 | 5.55 |

Negative correlation:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| L lingual gyrus | −12 −96 −6 | 18 | 132 | 7.46 |
| R lingual gyrus | 14 −90 −12 | 18 | 22 | 3.85 |

Under the incongruent AV condition

Positive correlation: n.s.

Negative correlation:

| Region | MNI coordinates (x y z) | BA | Cluster size | T |
|---|---|---|---|---|
| R lingual gyrus | 18 −94 −10 | 18 | 134 | 5.25 |
| L lingual gyrus | −14 −102 −6 | 17 | 76 | 4.99 |
| L fusiform gyrus | −32 −64 −2 | 19 | 14 | 4.58 |

Uncorrected P < 0.005, k = 10 voxels, T = 3.50 with an inclusive mask at P = 0.005

CI cochlear implant, A-only auditory-only, AV audiovisual, n.s. not significant

We adopted the statistical threshold of P = 0.001, uncorrected, for detecting intra-group main effects, similar to the previous literature (Lazard et al. 2011; Lee et al. 2007b). We adopted this threshold because the aim of the study was to explore differences between the two groups with regard to activity in areas of higher cognitive processing beyond the primary sensory perception stage, and in this regard it was the most stringent post hoc threshold for detecting hypermetabolic effects in auditory areas under the A-only condition and in auditory and visual areas under the con-AVS condition in both groups. By comparing the results at the statistical threshold of P = 0.001, uncorrected, with those at P = 0.05, false discovery rate (FDR) corrected (Genovese et al. 2002), we confirmed that most of the results were replicable, except that there were no suprathreshold clusters under the con-AVS condition in the CI user group when the FDR-corrected threshold was adopted. Therefore, we report the results obtained at the statistical threshold of P = 0.001, uncorrected.

In addition, similar to previous works (Lazard et al. 2010, 2012b; Lee et al. 2001, 2003, 2007a, b), we adopted P = 0.005, uncorrected, together with an inclusive mask at P = 0.005, for all group comparisons and correlation analyses, to detect effects in functionally relevant areas that were already statistically significant in the intra-group analyses. To compensate for the relatively low statistical threshold, we focused on overlaps between functional relevance (obtained by the correlation analyses) and the results of the conjunction analysis.

Results

Behavioral data during the experiment

On intra-group comparison, response accuracy under the inc-AVS condition was significantly poorer than under the con-AVS condition in the CI user group (P = 0.003), whereas the control group showed no significant accuracy differences among the three stimuli (Fig. 2, left panel). With regard to response time, the CI user group responded more slowly under the inc-AVS condition than under the A-only or con-AVS conditions (P = 0.004 and 0.001, respectively), while the control group showed a significant difference only between the A-only and inc-AVS conditions (P = 0.045; Fig. 2, right panel).
https://static-content.springer.com/image/art%3A10.1007%2Fs00429-013-0704-6/MediaObjects/429_2013_704_Fig2_HTML.gif
Fig. 2

Behavioral data showing significant intra-group differences among the three stimuli conditions in each group (asterisks), and differences between the two groups (daggers) for all three conditions with regard to the correct trials and response time. Error bars represent SE. A-only auditory-only, AVcon congruent audiovisual, AVinc incongruent audiovisual

On inter-group comparison, the CI user group demonstrated overall poorer response accuracy than the NH control group (Fig. 2, left panel). The differences were not statistically significant under the A-only and con-AVS conditions (P = 0.069 and 0.314, respectively), but the CI user group showed significantly lower accuracy than the NH group under the inc-AVS condition (P = 0.001). With regard to response time, the CI users responded more slowly than the controls overall (Fig. 2, right panel), and the discrepancies were prominent for the A-only and inc-AVS conditions (P = 0.009 and 0.007, respectively). In contrast, the group difference was not significant for the con-AVS condition (P = 0.314). These results confirm that the participating CI users retained slower, but good, auditory speech perception ability compared to their normal-hearing peers, and that with the support of congruent visual information their speech perception was behaviorally indistinguishable from that of the controls.

Significant activation in three task conditions and group differences

Figure 3 shows the activated cortical areas in the two groups under all three active stimulus conditions, with areas showing significant group difference overlaid. The activated areas and statistical details are summarized in Tables 2 and 3.
https://static-content.springer.com/image/art%3A10.1007%2Fs00429-013-0704-6/MediaObjects/429_2013_704_Fig3_HTML.gif
Fig. 3

Main activation effects for the three conditions in CI users (yellow) and controls (azure) (uncorrected P < 0.001, k = 25 voxels, T = 3.25), as well as relative activation foci for the contrast “CI users–Controls” (red) and “Controls–CI” (blue) (uncorrected P < 0.005, k = 25 voxels, T = 2.67 masked at P < 0.005). CI cochlear implant, AV audiovisual

For the A-only condition, both groups revealed activation of bilateral primary auditory and auditory association cortices (BAs 21, 22 and 42, Table 2). As for group differences, the CI group demonstrated no suprathreshold clusters compared to the control group. In contrast, greater activation in the NH controls relative to the CI users was observed in the left lateral middle temporal gyrus (MTG, BA 39) and the right inferior temporal gyrus (ITG, BA 37) (Fig. 3, upper panels).

Under the con-AVS condition, both groups showed increased regional cerebral blood flow (rCBF) in the visual cortices (BAs 17 and 18). However, compared to the salient activation of bilateral auditory cortices in the control group, the CI group showed a relatively small activation in the left MTG (BA 22). Unlike the CI group, the control group showed increased rCBF in the right ITG (BA 37) and the right temporo-parietal junction (BA 42, Table 2). The control group demonstrated significant CBF increases in the right ventral premotor cortex (vPMC, BA 6), supramarginal gyrus (SMG, BA 40), and ITG (BA 37) relative to the CI user group. By contrast, the CI user group showed more activation than controls in the right amygdala and the left hippocampal tail (Table 3; Fig. 3, middle panels).

In the inc-AVS condition, both groups revealed activations in the auditory (BAs 21, 22 and 42) and visual (BA 17) cortices. The control group again showed increased rCBF in the right ITG (BA 37), as under the con-AVS condition, whereas the CI user group additionally showed a significant cluster in the left IFG (BA 47) (Table 2). Moreover, the CI group displayed significantly more activation than controls in areas of the left IFG and left MTG (BA 37), whereas the control group showed more activation than the CI group in the right vPMC, the right SMG, and the right inferior occipital gyrus (BA 18) (Fig. 3, lower panels; Fig. 4, left panel and bar graphs).
https://static-content.springer.com/image/art%3A10.1007%2Fs00429-013-0704-6/MediaObjects/429_2013_704_Fig4_HTML.gif
Fig. 4

In the left panel, brain regions where activity is higher in CI users than controls under the incongruent AV condition (red) are displayed over the main activation effects in CI users (yellow) and controls (azure) (from Fig. 3) (uncorrected P < 0.005, k = 25 voxels, T = 2.67). Right panel shows areas of conjunction of “CI users–Controls” contrasts in all three conditions (green), and that of group-by-congruency condition interaction [blue; CI users (incongruent AV–congruent AV)—Controls (incongruent AV–congruent AV)] (uncorrected P < 0.005, T = 2.67). Plots depict the relative effect size across groups and conditions in those regions where CI users exhibited an overactivation relative to controls. A-only auditory-only, AVcon congruent audiovisual, AVinc incongruent audiovisual

For the contrast CI users (con-AVS–inc-AVS), no significant effects were found even at P = 0.01, uncorrected. However, for the reverse contrast, CI users (inc-AVS–con-AVS), a marginally significant effect in the left IFG was found (P = 0.006, T = 2.50, k = 57).

Conjunction and interaction analysis

The conjunction analysis revealed areas in the left superior frontal gyrus (SFG, BA 6) and left medial frontal gyrus (MeFG, BA 10) that were hyperactivated in the CI user group relative to the control group under all three active stimulus conditions (Table 4; Fig. 4). A group-by-condition interaction between the inc-AVS and con-AVS conditions was found in an area of the left middle frontal gyrus (MFG, BA 8), reflecting the effect of [CI users (inc-AVS − con-AVS)] − [controls (inc-AVS − con-AVS)]. The interaction in the opposite direction yielded no suprathreshold clusters (Table 4; Fig. 4, right panel and bar graphs).
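The conjunction across the three "CI users–Controls" contrasts amounts to a minimum-statistic test: a voxel survives only if it exceeds the critical T value in every contrast. A minimal sketch with hypothetical T maps (the arrays and three-voxel grid are illustrative, not the study's data):

```python
import numpy as np

def conjunction_mask(t_maps, t_crit=2.67):
    """A voxel survives the conjunction if it exceeds the critical T value
    in every contrast (minimum-statistic conjunction)."""
    stacked = np.stack(t_maps)          # shape: (n_contrasts, n_voxels)
    return stacked.min(axis=0) > t_crit  # min over contrasts, per voxel

# hypothetical "CI users - Controls" T maps for the three conditions
t_a_only = np.array([3.1, 1.0, 4.0])
t_av_con = np.array([2.9, 3.5, 4.2])
t_av_inc = np.array([3.0, 0.5, 2.0])
mask = conjunction_mask([t_a_only, t_av_con, t_av_inc])
print(mask)  # only voxel 0 exceeds T = 2.67 in all three contrasts
```

The interaction contrast, by comparison, is a single T map for the double subtraction and is thresholded directly rather than through a minimum statistic.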

Correlation analysis with clinical variables

The areas of neural activation under the three conditions that were positively or negatively correlated with deafness duration and CI speech score are summarized in Tables 5 and 6 (uncorrected P < 0.005). In the con-AVS condition, the right lingual gyrus (16, −84, −14) showed a positive correlation with deafness duration (Table 5; Fig. 5, right upper), while the left lingual gyrus (−12, −96, −6) showed a negative correlation with CI speech score (Fig. 5, left lower; Table 6). Under the inc-AVS condition, the right lingual gyrus (18, −94, −10) likewise displayed a negative correlation with CI speech score (Fig. 5, right lower; Table 6). Additionally, in the con-AVS condition, CI users performed better when they showed higher activation in a region of the right amygdala (Table 6; Fig. 6, rightmost).
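The correlation tables report voxelwise T values. For a simple Pearson correlation across the ten CI users (df = n − 2; note that the actual analyses included nuisance covariates such as sex, which would reduce the degrees of freedom), the T statistic follows from r as sketched below:

```python
import math

def corr_t(r, n):
    """T statistic for a Pearson correlation r over n subjects (df = n - 2).
    Nuisance covariates in the real model would lower df slightly."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# With n = 10 subjects, the T = 3.50 threshold used for the correlation
# analyses corresponds to a correlation of roughly |r| = 0.78:
print(corr_t(0.778, 10))
```

This illustrates why, with only ten subjects, only quite strong brain-behavior correlations can cross the reported threshold.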
https://static-content.springer.com/image/art%3A10.1007%2Fs00429-013-0704-6/MediaObjects/429_2013_704_Fig5_HTML.gif
Fig. 5

Regions of early visual cortices showing correlation with clinical factors in deaf CI users. Brain regions correlated positively with deafness duration under the congruent AV condition (red), negatively with the CI speech score under the congruent (pink) and incongruent (azure) AV conditions are displayed over the main activation effects in CI users under the congruent (gray) and incongruent (black) AV conditions (from Fig. 3) (P < 0.005, k = 25 voxels, T = 2.67). Note that areas of the lingual gyri are correlated positively with deafness duration, but negatively with speech score. Circles highlight two patients with abrupt deafness. CI cochlear implant, AV audiovisual

https://static-content.springer.com/image/art%3A10.1007%2Fs00429-013-0704-6/MediaObjects/429_2013_704_Fig6_HTML.gif
Fig. 6

The right amygdala, where activity was higher in CI users than controls under the congruent AV condition (red), is displayed over the main activation effects in CI users (yellow) (P < 0.005, k = 25 voxels, T = 2.67). The plot depicts the relative effect size across groups and conditions in this area. Note the positive correlation between activation of the right amygdala and CI speech score (green). Circles highlight two patients with abrupt deafness. A-only auditory-only, AVcon congruent audiovisual, AVinc incongruent audiovisual

With duration of CI experience as a covariate of interest, however, no significant correlations were found under any of the three active stimulus conditions.

Discussion

By merging multisensory information, one obtains a more reliable percept of the environment. Regions such as the IFG (Hein et al. 2007), PMC (Skipper et al. 2007), posterior parietal cortex (Noppeney et al. 2008), and ventral occipito-temporal cortex (Beauchamp et al. 2004b), as well as the STS, have often been suggested as integrators of AV stimuli. However, little is known about the neural substrates by which CI users merge congruent AV stimuli, or those they utilize to select the critical information needed to resolve conflicting, incongruent AV stimuli. In the current study, we attempted to illuminate the neural substrates involved in AV integration in CI users in the context of the semantic congruency of audiovisual input. In brief, CI users were more vision-reliant when AV stimuli were congruent, whereas they adopted a top-down cognitive pathway when confronted with AV conflicts.

Behavioral results for AV speech stimuli

Recent behavioral studies suggest that although CI users are typically biased toward visual cues when integrating competing AV stimuli (Rouger et al. 2008), particularly proficient CI users show comparable utilization of visual and auditory cues for both congruent (Doucet et al. 2006; Rouger et al. 2007; Tremblay et al. 2010) and incongruent AV stimuli (Champoux et al. 2009; Tremblay et al. 2010). These observations are partially in accordance with our results, which revealed no statistically significant difference with regard to response accuracy and response time between the CI users and NH controls for congruent AV stimuli. However, contrary to these previous observations, the performance of the CI users was markedly degraded by simultaneously presented visual distractors, whereas the NH controls were not affected by distracting visual inputs (Fig. 2, left panel).

Increased load of normal sound processing in CI users as compared to NH controls

We presented non-degraded auditory cues to both CI users and NH controls to investigate the neural substrates involved in the processing of bimodal stimuli similar to those encountered in daily environments. Under the inc-AVS condition, the NH control group may have had an advantage over the CI user group owing to better perception of the auditory cues. The significantly longer response times of the CI users (Fig. 2, right panel), despite no difference in response accuracy under the A-only condition, may also indicate the additional effort required to process lower-quality auditory input. The two groups also showed slightly different activation patterns in the A-only condition. In CI users, significant clusters in the bilateral STG extended to the temporal pole, which might suggest an increased load of semantic processing resulting from degraded auditory information (Vigneau et al. 2006). In contrast, normal controls showed increased activity relative to CI users in the left MTG and right ITG, suggesting a reduced auditory processing load by virtue of clearer sound and priming effects from the previous training sessions using AV stimuli.

CI users rely more on speech reading than NH controls to process congruent AV stimuli even when required to attend only speech sounds

For congruent AV stimuli, both CI users and NH controls demonstrated increased rCBF in areas of the auditory and visual sensory cortices. The increased rCBF in areas of the bilateral STS in NH controls replicates previous reports suggesting the STS as the core integrator of congruent AV stimuli (Werner and Noppeney 2010a; Beauchamp et al. 2010; van Atteveldt et al. 2004). By contrast, compared to the widely distributed bilateral activation in auditory cortical areas encompassing the STS in the NH group, the CI group revealed only slight activation in an area of the auditory cortex centered at the left MTG (Fig. 3).

To compensate for auditory deprivation, post-lingually deafened subjects maintain oral comprehension by developing speech reading (Lazard et al. 2012; Lee et al. 2007b). Even several years after implantation, CI users maintain a high level of reliance on speech reading because the auditory cues remain rudimentary and approximate (Lazard et al. 2012; Rouger et al. 2007). Therefore, the CI user group may have processed congruent AV information mainly by utilizing visual cues with minimal assistance from auditory inputs, even though they had been instructed to respond only to specific auditory information while viewing, but ignoring, the visual information. In this regard, the far lower activation of the left auditory sensory cortical area in the CI group under the con-AVS condition, as compared with the NH group, implies habitual vision dependence in analyzing congruent AV stimuli.

Deafness-induced and CI-related functional reorganization was also found in the limbic system

In addition to the above-mentioned areas of AV processing, two regions of the limbic system showed increased activation in CI users as compared with NH controls during the con-AVS condition, each with a different clinical profile. In a region of the left hippocampus, CI users showed higher activation than NH controls under the con-AVS condition, and this activation was positively correlated with deafness duration (Tables 3, 5). This clinical correlation suggests that, to integrate speech sounds with speaking faces, CI users with longer deafness durations tend to rely on stored auditory memory to overcome the impoverished quality of the auditory input.

A unique activation by congruent stimuli in the CI group was also noted in an area of the right amygdala (Fig. 6; Tables 2, 3), and this activation was positively correlated with speech score (Fig. 6; Table 6). Activation of the amygdala has been posited to be related to higher cognitive working memory load (Schaefer et al. 2006; Yun et al. 2010) and to perceptual AV integration of emotions in NH individuals (Mesulam 1998; Kreifelts et al. 2010). Responding to auditory information while viewing, but ignoring, visual information may evoke uncomfortable emotions (Song et al. 2013b; Vanneste et al. 2010) in CI users, who naturally rely more on visual than auditory information when AV stimuli are congruent. By contrast, the absence of amygdalar activation in CI users under the inc-AVS condition may indicate that they were paradoxically less disturbed while ignoring non-matching visual stimuli, even though they performed more poorly under that condition. As CI users improve auditory performance along with audiovisual performance (Rouger et al. 2007), those with higher auditory performance might feel more uneasy because of their stronger tendency to violate the instruction to ignore the visual cues. Interestingly, a recent study of post-lingually deaf patients indicated that the right amygdala is activated by color imagery, whereas normal-hearing controls activate the same region for auditory imagery (Lazard et al. 2011). Taken together with these reports, we conjecture that the right amygdalar activation may reflect a neural reorganization favoring visual stimuli.

Although the cluster centered on the right amygdala that showed a positive correlation with speech score was relatively small (k = 27), it may be a functionally relevant area because it overlaps with areas where CI users showed activation in both the intra- and inter-group comparisons, as can be observed in Fig. 6. Given that an area significant in an inter-group contrast but not at the intra-group level may have low functional relevance, this overlap of the correlated area with areas significant in both intra- and inter-group comparisons may be functionally important. This is also why we used inclusive masks for all group comparisons and correlation analyses, and why we report areas of correlation even with relatively small cluster sizes: such areas may still be statistically meaningful.

NH individuals utilize vPMC–SMG network for AV integration more than CI users

The right vPMC and SMG displayed increased activity in the NH control group relative to the CI user group under both the con-AVS and inc-AVS conditions (Fig. 3; Table 3). The increased activity in this network may reflect involvement of the mirror neuron system. Viewing another person's actions activates cortical areas belonging to the mirror neuron system, presumably to link action execution and observation; in particular, the observation of mouth movements elicits a covert motor plan to imitate the lip motion (Nishitani and Hari 2002). The premotor area (Skipper et al. 2005; Molenberghs et al. 2010) and the SMG (Aboitiz and Garcia 2009; Molenberghs et al. 2010) are frequently suggested as sites of the human mirror neuron system. In addition, the vPMC–SMG pathway is involved in linking articulatory motor and somatosensory representations during speech perception (Skipper et al. 2007; Guenther et al. 2006). Considering that the NH controls were more experienced with speech production than the CI users, whose average duration of deafness was 8.6 years in our series, the relatively greater activation of the vPMC–SMG network in NH controls may reflect decreased activity of the mirror neuron system for speech-related motor representation in CI users. In this regard, the more salient group difference in this network for the con-AVS than for the inc-AVS condition supports our interpretation that the differential activity of this network results from differences in the recruitment of the speech-related mirror neuron system between the two groups.

CI users adopt top-down strategy to process auditory information with visual distractors

For the inc-AVS condition, NH individuals presented an activation pattern similar to that for the con-AVS condition, and CI users also demonstrated increased rCBF in areas of the bilateral auditory and visual cortices (Fig. 3; Table 2). However, the CI user group revealed distinct activation in an area of the left IFG, and this activation was also prominent in comparison to the NH control group (Fig. 4; Table 3). In addition, the CI user group displayed trend-level activation in an area of the left IFG for the intra-group comparison using the contrast CI users (inc-AVS–con-AVS). An examination of relative activation across conditions (Fig. 4, lower left) shows that the relative increase of activity was obtained in two conditions (A-only and inc-AVS), for which auditory decisions were not aided by congruent visual input. This suggests greater difficulty and increased cognitive load in the absence of visual aids—a situation further exacerbated by visual distractors.

The results for the NH control group are consistent with previous reports arguing for the STS as the "bottom-up" integrator of AV conflicts (van Atteveldt et al. 2004; Hein et al. 2007). In contrast, the increased rCBF in the left IFG encompassing Broca's area indicates "top-down" processing of the same AV conflict in the CI user group. This is intuitively plausible because CI users rely more on visual than auditory inputs: if the task is to extract auditory cues, they must selectively neglect the visual cues that are their primary source of information. The IFG has been reported to subserve cognitive control (Fletcher and Henson 2001; Koechlin et al. 2003), and functional studies in rats with damaged IFG have demonstrated deficits in the detection of response conflict (Haddon and Killcross 2006; Marquis et al. 2007). In this regard, the left IFG may serve as a center for selective auditory cue extraction in the milieu of AV conflict. Another possible explanation for this IFG activation is stop-signal inhibition for AV conflict (Aron et al. 2003): vision-reliant CI users may have recruited the IFG, an area of response inhibition (Garavan et al. 1999; Bunge et al. 2002), because of overrepresented visual processing under the inc-AVS condition.

The conjunction analysis, which revealed areas in the left SFG and MeFG as significantly more activated in the CI users than in the NH controls under all three stimulus conditions (Fig. 4; Table 4), may also indicate uniformly increased engagement of higher cognitive functions for auditory processing in CI users. The SFG contributes particularly to working memory (Park et al. 2011), and the MeFG forms the apex of the executive system for decision-making (Koechlin and Hyafil 2007). These areas have also been reported to be activated by incongruent AV stimuli in NH subjects (Adam and Noppeney 2010; Jones and Callan 2003). From these viewpoints, the increased activity in areas of the left SFG and MeFG under all stimulus conditions may indicate a higher cognitive load for processing auditory stimuli in CI users.

The group-by-congruency interaction analysis yielded an area in the left MFG (Fig. 4, right lower; Table 4) that may also indicate different allocations of cognitive resources between the groups. This area was deactivated in NH controls when they ignored visual conflict, whereas it was activated in CI users by congruent visual input. By virtue of previous sensory experience, task difficulty and attentional load varied between the groups according to the visual stimuli delivered simultaneously with the auditory stimuli (Goldberg et al. 2007; Ruff et al. 2010).

One discrepancy between our results and prior studies should be addressed. While several investigators have described increased activity in the IFG of NH individuals for incongruent AV stimuli (Szycik et al. 2009; Bernstein et al. 2008; Nath and Beauchamp 2012), the NH subjects in our study did not show such activation. This may be attributed to differences in language and task complexity between the current and previous studies. Korean numbers can be differentiated discretely by their articulatory orofacial movements, whereas previous studies used less easily discernible words, such as disyllabic words comprised of rhyming AV cues (Szycik et al. 2009) or monosyllabic McGurk words (Bernstein et al. 2008; Nath and Beauchamp 2012). Therefore, IFG activation for top-down auditory cue extraction may have been less critical in the current study than in previous ones.

Previous deafness-induced plasticity in the visual cortex affects current CI outcome

The negative correlation between speech score and activity in areas of the bilateral lingual gyri (Fig. 5, lower panels; Table 6) may indicate that CI users with poorer performance rely more on visual cues, as they did during the period of deafness. In post-lingually deaf patients prior to implantation, increased metabolism in the visual areas has been reported as a predictor of poorer CI outcome (Giraud and Lee 2007). In the current study, for both AV stimuli, the CI users with the greatest increases in early visual area activity were the least successful in auditory perception with the CI device (Fig. 5, lower). The positive correlation with deafness duration (Fig. 5, right upper) confirms that prior deafness-induced plasticity in these areas is related to current CI auditory ability. Nonetheless, the degraded input delivered by the implant should also be taken into account when analyzing increased activity in the visual cortices.

A crucial difference between the current study and a recent study should be addressed. While the current study revealed negative correlations between visual cortical activity and CI speech score for both AV conditions, a recent study found that the activity cluster most significantly correlated with CI speech outcome for congruent AV stimuli was located in the visual occipital cortex (Strelnikov et al. 2013). This stark discrepancy may have two explanations. First, while our CI users' average duration of CI usage before the PET scan was 66.7 months, that in Strelnikov's study was 7.6 days. Considering that intra-modal compensation by speech reading is the primary means of speech processing during the period of deafness, about a week of CI usage may have been insufficient to induce cross-modal sensory restoration by AV interaction, given the crude nature of CI-derived sound; thus, the CI subjects in Strelnikov's study may have benefitted from their speech-reading ability while processing AV stimuli. In contrast, the average CI usage of six and a half years in our participants may have enabled sufficient cross-modal compensation, so that they performed better when they were less dependent on previously dominant visual cortical activity. Second, while our CI users' average duration of deafness was 7.6 ± 6.6 years, that in Strelnikov's study was at least 16.4 years. Deaf individuals may rely more on their speech-reading skills to process audiovisual information as the duration of deafness increases.
Therefore, we may surmise that CI users with a relatively long duration of deafness, as in Strelnikov's cohort, rely more on visual cortical activity owing to deafness-induced plasticity and perform better when they activate the visual cortex more, whereas those with a shorter duration of deafness, as in our cohort, may utilize auditory information more, such that visual cortical activity negatively affects speech perception. However, as acknowledged above, our cohort showed no correlation between speech perception score and duration of deafness, probably because we recruited only good performers. In contrast, Strelnikov et al. enrolled subjects with relatively long durations of deafness and various outcomes (those with >20 years of deafness ranged from 15 to 85 % speech perception accuracy). Therefore, our analyses of the duration of deafness in isolation are limited in that our cohort is not representative of general CI users, and general deafness-induced plasticity cannot be evaluated from our cohort alone. Future studies with larger numbers of subjects varying in duration of deafness, duration of CI usage, and speech outcome should be performed to confirm whether visual cortical activity is deleterious or beneficial depending on the duration of deafness.

To look for a reversal of the deafness-related decrease in left STG/MTG activity following hearing restoration with a CI (Rouger et al. 2012), we performed correlation analyses with the duration of CI experience and indeed observed a progressive increase of activity in the right STG–STS under all three stimulus conditions; however, this did not reach the statistical threshold used for the correlation analyses in this study (uncorrected P = 0.005, k = 10, T = 3.50) and hence warrants confirmation in future studies with more subjects.

Limitations of the current study and proposed future studies

To the best of our knowledge, the current study is the first to identify candidate neural substrates for the processing of congruent and incongruent AV stimuli in CI users. By disentangling the characteristic mechanisms of CI-assisted speech-processing strategies, these results may serve as a milestone for future studies investigating multisensory integration in CI users. Owing to the limited number of subjects, we could not recruit a sample of CI users with a homogeneous mode of deafness, i.e., sudden or progressive hearing loss. Two of the ten CI users were deafened by sudden hearing loss, and they might differ from those with progressive hearing loss in the plastic changes of their brains. Although these two participants did not exhibit unusual behavioral results (they are indicated by circles in all correlation plots), future studies with larger series of participants should be performed to exclude bias originating from such etiologic differences. In addition, although the influence of sex was factored out by including it as a nuisance variable in all group analyses and correlation analyses, possible effects of sex may still have affected the results and their interpretation, because previous literature has indicated that sex differences affect cortical speech processing (Bitan et al. 2010; Kempe et al. 2012; Koles et al. 2010). Future studies with better sex-matched groups should be performed to exclude this possible bias. Finally, because we focused on the effect of agreeing or conflicting visual inputs on auditory processing in CI subjects, we did not include a visual-only (V-only; lip-reading) condition in the current paradigm. Considering that previous studies have revealed identical performance levels (Tremblay et al. 2010; Rouger et al. 2008) but different cortical activation patterns (Rouger et al. 2012) between NH controls and CI subjects under the V-only condition, future studies comparing V-only and AV stimuli may help us understand the post-CI changes in visual and audiovisual speech-processing networks.

Conclusions

Taken together, comparisons of the AV integration circuits of NH controls and CI users delineated the neural substrates involved in multimodal speech processing in CI users. CI users are vision-reliant when processing congruent AV stimuli, and are disturbed more than NH controls by visual distractors when confronted with incongruent AV stimuli. To cope with this multimodal conflict, CI users activate prefrontal areas such as the left IFG, adopting a top-down cognitive modulation pathway, whereas NH individuals primarily adopt a bottom-up strategy and utilize multisensory integrators such as the STS or the right vPMC–SMG pathway. In sum, deafness-induced plasticity makes CI users depend more on visual processing and on higher cognitive pathways to cope with multimodal environments.

Acknowledgments

The authors thank Dr. Yong-Hwi Ahn for his support on the manuscript. The first author also thanks Dr. DY Yoon for her precious support of the study. This work was supported by the Korean government (MOST) [Korea Science and Engineering Foundation (KOSEF) (no. 2012-0030102)].

Copyright information

© Springer-Verlag Berlin Heidelberg 2014